Helping Students Detect Cyberbullying Vocabulary on the Internet with Web Mining Techniques








Abstract

This article presents a model for analyzing data on the Internet, using Web mining, to extract knowledge from large amounts of information in cyberspace. To test the proposed method, Web pages on cyberbullying were analyzed as a case study. The procedure integrates a Web scraper to locate and download information from the Internet. To recover the vocabulary, Natural Language Processing techniques are used (tokenization, removal of stop words, term frequency, inverse document frequency, synonyms, and stemming). To obtain knowledge, a dataset was constructed using semantic ontologies to define the predictive variables of cyberbullying and supervised learning to define the variable to be predicted. To evaluate the efficiency of the model, two machine learning algorithms, AdaBoost and a neural network, were used. The results show 97% accuracy in the detection of cyberbullying vocabulary, validated through cross-validation, and a reported time saving of 581% with parallel processing compared to sequential processing.
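The vocabulary-recovery steps named in the abstract (tokenization, stop-word removal, term frequency, inverse document frequency) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `tokenize` and `tf_idf`, the tiny `STOP_WORDS` list, and the sample pages are all assumptions made for illustration, and the weighting uses the plain textbook form tf * log(N / df), which may differ from the variant used in the study.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "and", "to", "of", "so"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample pages, not data from the study.
pages = [
    "You are so stupid and ugly",
    "The weather is nice and the park is open",
    "Stupid loser, nobody likes you",
]
scores = tf_idf(pages)
```

In this sketch a term that occurs in fewer pages receives a higher weight than one spread across many pages, which is why such scores are commonly used to rank candidate vocabulary before building a dataset for a classifier.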


Modules


Algorithms


Software And Hardware

• Hardware:
  Processor: i3, i5
  RAM: 4 GB
  Hard disk: 16 GB
• Software:
  Operating system: Windows 2000/XP/7/8/10
  Anaconda, Jupyter, Spyder, Flask
  Frontend: Python
  Backend: MySQL