FAKE PRODUCT REVIEW DETECTION
ABSTRACT:-
There are many online shopping websites in today's world. Customers judge the quality of a product from the reviews left by previous customers, and they check the review section before purchasing. If a product has bad reviews, the customer thinks a thousand times before buying it. However, a product can receive fake reviews even though its quality is good; such fake reviews are posted to reduce the chances of the product being purchased. It is also difficult for customers to read a thousand reviews of a product and then decide whether or not to buy it.
In this paper, we design a system that detects fake reviews before a product is purchased. We aim to examine all the customer reviews of a product and to allow products to be compared on the basis of their reviews in one place. In this system, each step of the detection process is given a weightage based on the accuracy of the algorithm. We use three algorithms, Naive Bayes, SVM and Random Forest, to detect fake reviews. Each algorithm step has 20 percent weightage and each remaining step has 10 percent weightage. The remaining steps are as follows. Customers can give a review only if they have bought the product; otherwise the review is treated as fake. The IP address of each user is traced, and many reviews from the same IP address under different usernames are categorised as fake. If a user posts a review by copy and paste, it is treated as fake. If customers report a review and the report count goes above five, it is treated as fake. When a review is passed to the system, the system checks all of the above possibilities and assigns the weightage accordingly. If the weightage of a review goes above 70 percent, the review is declared fake; otherwise it is declared true. This paper offers several original techniques for these tasks, and our experimental results on reviews of a number of products sold online demonstrate the efficiency of the techniques.
OBJECTIVES OF THE PROJECT:-
The main objectives of this system are:
- To help users find genuine product reviews, and to detect fake reviews and block them automatically.
- To monitor and remove fake product reviews using opinion mining, so that only genuine online product reviews remain.
- To implement different techniques for better fake review detection, namely IP address, account used, negative word dictionary using Senti-strength, and ontology.
- To present an algorithm that performs opinion mining together with fake review detection.
- To incorporate other techniques, such as IP address tracking and ontology, alongside opinion mining to detect fake reviews more accurately.
EXISTING SYSTEM:-
Most existing fake product review detection systems use fully supervised learning, in which classifiers are trained on manually labelled data sets. Manual labelling is costly, and the labels in such a data set may contain errors, which prevents accurate results. These methods therefore give little guarantee of detecting fake reviews.
One existing approach addresses this with a system based on the PU (positive-unlabelled) learning algorithm and behaviour density. It has two parts. The first part trains the classifier from a positive set P and an unlabelled set U. From the full data set, a small part of the data that has been identified as fake reviews is selected as the positive set P, and the remaining large amount of data is declared the unlabelled set U. The reliable negative samples in U are then marked as set RN. The sample sets P and RN and the mixed samples M are used to train the classifiers. The second part calculates the behaviour density, which consists of the fake-behaviour density of customers and the fake-behaviour density of apps. Reviews whose density is higher than the threshold are treated as fake reviews. This approach can overcome the labelling deficiency described above and learn effectively when only a small number of positive samples and a large number of unlabelled samples are available. Through experiments and case analysis, it is shown to have high detection accuracy. The approach presents fake review detection based on the PU learning algorithm and behaviour density, provides experimental results to show its performance, and finally draws conclusions.
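The first part described above, splitting the unlabelled set U into reliable negatives RN and the remaining mixed samples M, can be sketched in Python as follows. The text does not say how RN is selected, so this sketch swaps in a simple centroid-similarity heuristic as an assumption: the reviews in U that are least similar to the known fake reviews in P are taken as RN. The function names, the bag-of-words representation and the `fraction` parameter are hypothetical and used only for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for one review (whitespace tokens, lower-cased)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_reliable_negatives(positive_texts, unlabeled_texts, fraction=0.5):
    """Split U into (RN, M).

    Assumption: the `fraction` of U least similar to the centroid of P
    is treated as reliable negatives RN; the rest is the mixed set M.
    """
    centroid = Counter()
    for text in positive_texts:
        centroid.update(term_vector(text))
    # Sort U from least to most similar to the known fake reviews.
    ranked = sorted(unlabeled_texts, key=lambda t: cosine(term_vector(t), centroid))
    cut = int(len(ranked) * fraction)
    return ranked[:cut], ranked[cut:]
```

A classifier would then be trained with P as the fake class and RN as the genuine class, with M handled as the approach prescribes; that step is not shown here because the text does not specify it.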
PROPOSED SYSTEM:-
Most people check the review section before purchasing a product. The reviews define the product quality and pass a lot of useful information to the user. They tell the user whether the product is genuine and help the user buy the right product for themselves. To buy a product, users go through the review section, but they cannot tell the difference between true and fake reviews. Some companies add good reviews to make their product popular. There is also a possibility that users post fake reviews to degrade the perceived quality of a product. It is difficult to distinguish genuine reviews from fake ones among the thousands of reviews of a product.
In the proposed system, we design a system that filters fake reviews and blocks them automatically. The system detects whether a review is fake or true. It works on three main algorithms, Naïve Bayes, SVM and Random Forest, which carry the maximum weightage in the system. The Naïve Bayes classifier is used to identify whether a review is fake or true. The SVM classifier uses linear, RBF and sigmoid kernels to classify reviews as fake or true. The Random Forest algorithm is also used to classify fake reviews. Each algorithm has 20 percent weightage in this system. A review given by a user who has bought the item is taken as a true review; otherwise it is taken as a fake review. The IP address of each user is traced, and if many reviews are given from the same IP address under different usernames, they are treated as fake reviews. If a user posts the same review by copying and pasting, that review is taken as a fake review, and the timestamp decides which review to keep. This method is called timestamp-based review filtering. If customers report a review and the report count goes above five, it is taken as a fake review. Each of these four steps has 10 percent weightage in classifying fake reviews. When a review is passed into the system and its weightage goes above 70 percent, the review is declared fake; otherwise it is declared true. That is how the system works.
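The weighting scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `Review` record, the function names and the flag names are hypothetical; "many reviews from the same IP address" is taken to mean at least one other username on that IP; and timestamp-based filtering is assumed to keep the earliest copy of a duplicated text. The three classifier outputs are passed in as booleans (True meaning "fake").

```python
from dataclasses import dataclass
from datetime import datetime

# Weights from the text: 20 percent per classifier, 10 percent per rule step.
CLASSIFIER_WEIGHT = 20
RULE_WEIGHT = 10
FAKE_THRESHOLD = 70  # a review scoring above this is declared fake
REPORT_LIMIT = 5     # more than five reports triggers the rule


@dataclass
class Review:  # hypothetical record layout
    username: str
    ip_address: str
    text: str
    posted_at: datetime
    verified_purchase: bool
    report_count: int = 0


def rule_flags(review, all_reviews):
    """Return one boolean per rule-based step; True means 'looks fake'."""
    others = [r for r in all_reviews if r is not review]
    # Assumption: one other username on the same IP is enough to flag.
    shared_ip = any(
        r.ip_address == review.ip_address and r.username != review.username
        for r in others
    )
    # Assumption: the earliest copy of a text is kept, later copies are flagged.
    normalized = review.text.strip().lower()
    later_duplicate = any(
        r.text.strip().lower() == normalized and r.posted_at < review.posted_at
        for r in others
    )
    return {
        "not_purchased": not review.verified_purchase,
        "shared_ip": shared_ip,
        "duplicate_text": later_duplicate,
        "reported": review.report_count > REPORT_LIMIT,
    }


def fake_score(classifier_votes, flags):
    """Combine classifier votes and rule flags into a score from 0 to 100."""
    score = CLASSIFIER_WEIGHT * sum(bool(v) for v in classifier_votes)
    score += RULE_WEIGHT * sum(bool(f) for f in flags.values())
    return score


def is_fake(classifier_votes, flags):
    """A review is declared fake only when its score is above 70."""
    return fake_score(classifier_votes, flags) > FAKE_THRESHOLD
```

With three classifiers at 20 percent each and four rule steps at 10 percent each, the maximum score is 100. A review that all three classifiers flag but that triggers only one rule scores exactly 70 and is therefore still declared true, since the threshold is strict.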
MODULES:-
- User Register: Users have to register to check whether a product review is real or fake.
- User Login: Users have to log in to check whether a product review is real or fake.
- Processing Sentence: Natural language processing (NLP) is used to make it possible for computers to read text, hear speech, interpret it, measure sentiment and determine which parts are important.
- Word to Vector: Transforming text into numerical features is called text vectorization. TF-IDF is used for this step; it mathematically down-weights words that occur commonly in the English language and selects words that are more descriptive of the text.
- Classifying Sentence: Based on the TF-IDF vectors, machine learning algorithms (SVM, Naive Bayes, Random Forest, Decision Tree) classify the text.
- Review fake or real: Based on values such as the customer's IP address and multiple reviews by the same customer, the system classifies whether a customer review is real or fake.
- Buy/add to cart product: Users can add any product to the cart and make a dummy payment.
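The Word to Vector module above can be sketched in plain Python as follows. This is a minimal illustration of the classic TF-IDF weighting (term frequency multiplied by the logarithm of the inverse document frequency), assuming whitespace tokenisation; the function names are hypothetical. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed variant, so their numbers differ from this sketch.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case the text and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # tf = count / total; idf = log(N / df)
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

A word that appears in every review gets weight zero, while a word that appears in only one review gets the highest weight, which is how common words are suppressed before the vectors are passed to the classifiers.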
ADVANTAGES:-
- Users get genuine reviews about the products.
- Users can post their own reviews about a product.
- Users can spend their money on products that are worth it.
- The major advantages of this system are its efficiency and its ease of implementation and use. The admin can easily track the review analysis process in the system.
- The proposed system has an advantage in finding fake reviews because it has a better precision percentage.
HARDWARE AND SOFTWARE REQUIREMENTS
HARDWARE:-
- Processor: Intel Core i3 or higher.
- RAM: 4GB or more.
- Hard disk: 250 GB or more.
SOFTWARE:-
- Operating System: Windows 7, 8 or 10.
- Python
- Anaconda
- Spyder, Jupyter Notebook, Flask
- MySQL