
FAKE PRODUCT REVIEW MONITORING USING ARTIFICIAL INTELLIGENCE

A Project Report submitted in partial fulfillment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY

IN

COMPUTER SCIENCE AND ENGINEERING

Submitted by

Nazeer (1210316648)
Ranadheer (1210316609)
Anusumanth (1210316658)
Harsha (1210316652)

Under the esteemed guidance of

Dr. S. Praveen Kumar

Assistant Professor, Department of CSE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


GITAM
(Deemed to be University)

VISAKHAPATNAM

2016-2020
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

GITAM INSTITUTE OF TECHNOLOGY

GITAM
(Deemed to be University)

CERTIFICATE

This is to certify that the project entitled “FAKE PRODUCT REVIEW MONITORING
USING ARTIFICIAL INTELLIGENCE” is a bona fide work done by Nazeer (1210316648),
Ranadheer (1210316609), Anusumanth (1210316658) and Harsha (1210316652), students of Bachelor
of Engineering, Computer Science and Engineering, GIT, GITAM University, Visakhapatnam,
during the year 2016-2020, submitted for the fulfillment of credits for the Bachelor of Technology
degree in the Department of Computer Science and Engineering.

PROJECT GUIDE
Dr. S. Praveen Kumar
Assistant Professor, Dept. of CSE
GIT, GITAM University
Visakhapatnam

HEAD OF DEPARTMENT
Dr. K. Thammi Reddy
HOD, Dept. of CSE
GIT, GITAM University
Visakhapatnam


DECLARATION

We (Nazeer, Ranadheer, Anusumanth, Harsha) hereby declare that the project report entitled
“Fake Product Review Monitoring Using Artificial Intelligence” is an original and authentic work
done in the Department of Computer Science and Engineering, GITAM, Rushikonda,
Visakhapatnam, submitted in partial fulfillment of the requirements for the award of the degree
of Bachelor of Technology in Computer Science and Engineering. The matter embodied in this
project work has not been submitted earlier for the award of any degree, to the best of our knowledge.

Date:

Place:

Registration no. Name Signature

1210316648 Nazeer

1210316609 Ranadheer

1210316658 Anusumanth

1210316652 Harsha
ABSTRACT

Most people read reviews of a product before spending their money on it. They come across
many reviews on websites, but they cannot tell whether a given review is genuine or fake. On
some review websites, good reviews are posted by the product company's own people, typically
members of a Social Media Optimization team, in order to make the product popular. They write
good reviews for many different products manufactured by their own firm, and users are not able
to find out whether such a review is genuine or fake. To find fake reviews on a website, this
“Fake Product Review Monitoring and Removal for Genuine Online Product Reviews” system is
introduced.

This system finds fake reviews posted by the social media optimization team by identifying the
IP address they are sent from. A user logs in to the system with a user ID and password, views
various products, and posts reviews about them. To decide whether a review is fake or genuine,
the system records the user's IP address; if it observes fake reviews being sent from the same IP
address many times, it informs the admin so that those reviews can be removed from the system.
The system uses a data mining methodology and helps users find the genuine reviews of a product.
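The IP-address check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the `Review` record, the `flag_repeated_ips` function and the threshold of three reviews per IP address per product are all hypothetical choices made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    # Hypothetical review record; field names are illustrative assumptions.
    user_id: str
    product_id: str
    ip_address: str
    text: str


def flag_repeated_ips(reviews, max_per_ip=3):
    """Return every review whose (IP address, product) pair occurs more than
    max_per_ip times, so that the admin can inspect and remove them.

    The default threshold is an assumed value, not one taken from the report.
    """
    counts = Counter((r.ip_address, r.product_id) for r in reviews)
    return [r for r in reviews
            if counts[(r.ip_address, r.product_id)] > max_per_ip]


if __name__ == "__main__":
    reviews = [Review(f"u{i}", "p1", "10.0.0.5", "Great product!") for i in range(4)]
    reviews.append(Review("u9", "p1", "10.0.0.9", "Battery life is short."))
    for r in flag_repeated_ips(reviews):
        print("Notify admin:", r.user_id, r.ip_address)
```

Counting per (IP address, product) pair rather than per IP address alone avoids flagging, for example, one household that reviews several different products. A shared address such as an office network or a mobile carrier gateway can still produce false positives, which is why the sketch only reports reviews to the admin instead of deleting them automatically.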
ACKNOWLEDGEMENTS:-

We sincerely express our deep sense of gratitude to our guide, Dr. S. Praveen Kumar, Assistant Professor,
Department of Computer Science and Engineering, for his perspicacity, wisdom and sagacity coupled with
compassion and patience. It is our great pleasure to submit this work under his guidance.

Our sincere thanks to Professor Dr. K. Thammi Reddy, Head of the Department of Computer Science and
Engineering, GITAM University, for his kind support in the successful completion of this work.

We express our gratitude to Professor K. Lakshmi Prasad, Principal of our institute, for fostering an
excellent academic environment that made our project work possible.

We are thankful to the teaching and non-teaching staff of the Department of Computer Science and
Engineering, GITAM University, for their invaluable support.
NAZEER (1210316648)

RANADHEER (1210316609)

ANUSUMANTH (1210316658)

HARSHA (1210316652)
TABLE OF CONTENTS

1. INTRODUCTION
1.1 Introduction to Artificial Intelligence
1.2 Applications of AI
1.3 Types of AI
1.4 Fake Review Monitoring

2. LITERATURE REVIEW
2.1 Opinion Mining and Sentiment Analysis
2.2 Product Review Analysis and Spam Review Detection
2.3 The Fake-Food Detectives
2.4 Integrated Algorithm Establishment of Spam Detection System
2.5 A Survey on Online Review Spam Detection Techniques
2.6 Spam Review Detection with Imbalanced Data Distributions
2.7 Identifying Manipulated Online Reviews Using Decision Tree
2.8 A Study on Review Manipulation Classification Using Decision Tree
2.9 International Conference on Web Search and Data Mining

3. SYSTEM ANALYSIS
3.1 Problem Statement
3.2 Software Requirements Specifications
3.3 Non-functional Requirements
3.4 Hardware Requirements
3.5 Software Requirements
3.6 Existing System
3.7 Proposed System

4. SYSTEM DESIGN
4.1 System Architecture
4.2 UML Diagrams

5. IMPLEMENTATION
5.1 Introduction to Python
5.2 Libraries Used
5.3 Algorithms Used
5.4 Source Code

6. SYSTEM TESTING
6.1 Types of Testing
6.2 Test Cases

7. CONCLUSION
8. FUTURE SCOPE
9. REFERENCES
LIST OF FIGURES:-

1. Architecture diagram
2. Use case diagram
3. Sequence diagram
4. Activity diagram


1. INTRODUCTION

1.1 Introduction to Artificial Intelligence:

Artificial Intelligence is an approach to making a computer, a robot, or a product think the way intelligent
humans think. AI studies how the human brain thinks, learns, decides and works when it tries to solve
problems, and the outcome of this study is intelligent software systems. The aim of AI is to improve
computer functions related to human knowledge, for example reasoning, learning, and problem-solving.

Intelligence is composed of:

• Reasoning
• Learning
• Problem Solving
• Perception
• Linguistic Intelligence

Many tools are used in AI, including versions of search and mathematical optimization, logic, and methods
based on probability and economics. The AI field draws upon computer science, mathematics, psychology,
linguistics, philosophy, neuroscience, artificial psychology and many other fields.

Need for Artificial Intelligence

1. To create expert systems that exhibit intelligent behavior, with the capability to learn, demonstrate,
explain and advise their users.
2. To help machines find solutions to complex problems the way humans do, and to apply those solutions as
algorithms in a computer-friendly manner.

Approaches include statistical methods, computational intelligence, and traditional symbolic AI. AI research
uses many tools, including search and mathematical optimization, artificial neural networks, and methods
based on statistics, probability, and economics. The field draws on computer science, mathematics,
psychology, linguistics, philosophy and other disciplines.

1.2 APPLICATIONS OF AI:

1.2.1 Artificial Intelligence in Healthcare: Companies are applying machine learning to make better and
faster diagnoses than humans. One of the best-known technologies is IBM’s Watson, which understands
natural language and can respond to questions asked of it. The system mines patient data and other
available data sources to form a hypothesis, which it then presents with a confidence scoring schema. AI,
which aims to emulate human intelligence in computer technology, can assist both doctors and patients in
the following ways:
• By providing a laboratory for the examination, representation and cataloguing of medical information
• By devising novel tools to support decision making and research
• By integrating activities in medical, software and cognitive sciences
• By offering a content-rich discipline for future scientific medical communities.

1.2.2 Artificial Intelligence in Business: Robotic process automation is being applied to highly repetitive
tasks normally performed by humans. Machine learning algorithms are being integrated into analytics and
CRM (customer relationship management) platforms to uncover information on how to better serve
customers. Chatbots have already been incorporated into websites and e-commerce platforms to provide
immediate service to customers. The automation of job positions has also become a talking point among
academics and IT consultancies.

1.2.3 AI in Autonomous Vehicles: Just like humans, self-driving cars need sensors to understand the world
around them and a brain to collect and process the information gathered and to choose specific actions
based on it. Autonomous vehicles are equipped with advanced tools to gather information, including
long-range radar, cameras, and LIDAR. Each of these technologies is used in a different capacity and
collects different information. This information is useless unless it is processed and some action is taken
based on it. This is where artificial intelligence comes into play, and it can be compared to the human
brain. AI has several applications in these vehicles; among the more immediate ones are:
• Directing the car to a gas station or recharging station when it is running low on fuel.
• Adjusting the trip's directions based on known traffic conditions to find the quickest route.
• Incorporating speech recognition for advanced communication with passengers.
• Providing natural language interfaces and virtual assistance technologies.

1.2.4 AI for Robotics: Robotics will allow us to address the challenges of taking care of an aging
population and allow people much longer independence. It will drastically reduce, and may even eliminate,
traffic accidents and deaths, as well as enable disaster response in dangerous situations, for example the
nuclear meltdown at the Fukushima power plant.

1.2.5 AI in education: It automates grading, giving educators more time. It can also assess students and
adapt to their needs, helping them work at their own pace.

1.2.6 Cyborg Technology: One of the main limitations of being human is simply our own bodies and
brains. Researcher Shimon Whiteson thinks that in the future we will be able to augment ourselves with
computers and enhance many of our own natural abilities. Though many of these possible cyborg
enhancements would be added for convenience, others may serve a more practical purpose. Yoky Matsuoka
of Nest believes that AI will become useful for people with amputated limbs, as the brain will be able to
communicate with a robotic limb to give the patient more control. This kind of cyborg technology would
significantly reduce the limitations that amputees deal with daily.

In the future, predictive analytics and artificial intelligence could play an even more fundamental role in
content creation and in software. Open-source information and AI-driven data collection will provide
opportunities for global technological parity, and artificial intelligence can become central to every
domain, including health, environment, public safety and security.

1.3 TYPES OF AI:

Artificial Intelligence involves a variety of technologies and tools; some of the recent ones are as
follows:
1.3.1 Natural Language Generation: A tool that produces text from computer data. Currently used in
customer service, report generation, and summarizing business intelligence insights.

1.3.2 Speech Recognition: Transcribes and transforms human speech into a format useful for computer
applications. Presently used in interactive voice response systems and mobile applications.

1.3.3 Virtual Agent: A Virtual Agent is a computer-generated, animated, artificial intelligence virtual
character (usually with an anthropomorphic appearance) that serves as an online customer service
representative. It conducts an intelligent conversation with users, responds to their questions and performs
appropriate non-verbal behavior. An example of a typical Virtual Agent is Louise, eBay's Virtual Agent,
created by the French/American developer VirtuOz.

1.3.4 Machine Learning: Provides algorithms, APIs (application programming interfaces), development and
training toolkits, data, and computing power to design, train, and deploy models into applications,
processes, and other machines. Currently used in a wide range of enterprise applications, mostly involving
prediction or classification.

1.3.5 Deep Learning Platforms: A special type of machine learning consisting of artificial neural
networks with multiple abstraction layers. Currently used in pattern recognition and classification
applications supported by very large data sets.

1.3.6 Biometrics: Biometrics uses methods for unique recognition of humans based upon one or more
intrinsic physical or behavioral traits. In computer science, particularly, biometrics is used as a form of
identity access management and access control. It is also used to identify individuals in groups that are
under surveillance. Currently used in market research.

1.3.7 Robotic Process Automation: Using scripts and other methods to automate human actions to support
efficient business processes. Currently used where it is inefficient for humans to execute a task.

1.3.8 Text Analytics and NLP: Natural language processing (NLP) uses and supports text analytics by
facilitating the understanding of sentence structure and meaning, sentiment, and intent through statistical
and machine learning methods. Currently used in fraud detection and security, a wide range of automated
assistants, and applications for mining unstructured data.
1.4 FAKE REVIEW MONITORING:

“What other people think” has always been an important source of information for most of us during the
decision-making process. Long before awareness of the World Wide Web (WWW) became widespread, many
of us asked our friends to recommend a mixer or to explain whom they were planning to vote for in
elections, requested reference letters regarding job applicants from friends, or consulted Consumer
Reports to decide which mixer to buy. With the rapid expansion of e-commerce, many products are sold on
the Web, and many people buy products online. To enhance customer satisfaction and the online shopping
experience, it has become common practice for online merchants to let their customers give opinions on
the products they have purchased. As more and more ordinary users become comfortable with the Web, a
growing number of people write reviews and post them, and these reviews benefit others. As a result, the
number of reviews a product receives grows rapidly; some popular products receive hundreds of reviews on
large merchant sites. Our application separates the trustworthy reviews from the rest, so that users can
then decide whether or not to buy a product.

The reason for developing this system is that people nowadays rely heavily on opinions before buying
anything. This prompts many people to write fraudulent and useless opinions about products or services.
Some organizations in the market even hire professionals to write fake reviews that promote their own
products or defame their competitors' products. These fake opinions mislead customers and convince them
to buy products on the basis of false information, so there is a need for a tool that helps them find the
true opinions about products, people and services. The proposed system analyzes opinions and classifies
each one as spam or non-spam.

The scope of and need for online markets and e-commerce platforms are on the rise, and many people buy
products from these platforms. As a result, a large amount of detailed feedback is available for users to
analyze the products they are buying. This can also work against users, because the review section can be
bombarded with extreme opinions that work either in favor of or against the product. This needs to be
addressed, because it can be done either by a merchant to increase the perceived value of his product or
by users to degrade the product's ratings.
2. LITERATURE REVIEW

2.1 Cambria, E; Schuller, B; Xia, Y; Havasi, C (2013). "New avenues in opinion mining and sentiment
analysis". IEEE Intelligent Systems. 28 (2): 15–21. doi:10.1109/MIS.2013.30.

Mining opinions and sentiments from natural language is challenging, because it requires a deep
understanding of the explicit and implicit, regular and irregular, and syntactical and semantic language
rules. Sentiment analysis researchers struggle with NLP’s unresolved problems: coreference resolution,
negation handling, anaphora resolution, named-entity recognition, and word-sense disambiguation. Opinion
mining is a very restricted NLP problem, because the system only needs to understand the positive or
negative sentiments of each sentence and the target entities or topics. Therefore, sentiment analysis is an
opportunity for NLP researchers to make tangible progress on all fronts of NLP, and potentially have a huge
practical impact. Many companies use opinion mining and sentiment analysis as part of their research.

For instance, companies use opinion mining to create and automatically maintain review and
opinion-aggregation websites. Their systems continuously gather a wide array of information from the Web,
such as product reviews, brand perception, and political issues. Other systems might also use opinion
mining and sentiment analysis as subcomponent technology to improve customer relationship management
and recommendation systems through positive and negative customer feedback. Similarly, opinion mining
and sentiment analysis might detect and exclude “flames” (overly heated or antagonistic language) in social
communication and enhance antispam systems.

Typically, a system performs sentiment analysis over on-topic documents, using, for example, the
results of a topic-based search engine. However, several studies suggest that managing these two tasks
jointly might benefit overall performance. For example, a document’s off-topic passages might contain
irrelevant affective information and create an inaccurate global sentiment polarity about the main topic.
Also, a document might contain information on multiple topics that interest the user. In such instances,
it’s important to identify topics and separate the opinions associated with each topic.
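The idea of separating opinions by topic before aggregating polarity can be illustrated with a small lexicon-based sketch. The topic keywords, the tiny sentiment lexicon and the function names `sentence_polarity` and `polarity_by_topic` are hypothetical; practical systems use learned topic models and much larger lexicons, and this is not the method of the cited paper.

```python
import re

# Hypothetical lexicons for illustration only.
POSITIVE = {"good", "great", "excellent", "sharp", "fast"}
NEGATIVE = {"bad", "poor", "slow", "blurry", "terrible"}
TOPICS = {
    "camera": {"camera", "photo", "photos", "lens"},
    "battery": {"battery", "charge", "charging"},
}


def sentence_polarity(words):
    """+1 for each positive word, -1 for each negative word."""
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def polarity_by_topic(document):
    """Split a document into sentences, assign each sentence to the topics
    whose keywords it mentions, and sum the polarity per topic. Sentences
    that mention no known topic are ignored, so off-topic affect does not
    leak into the per-topic scores."""
    scores = {}
    for sentence in re.split(r"[.!?]+", document.lower()):
        words = re.findall(r"[a-z]+", sentence)
        if not words:
            continue
        for topic, keywords in TOPICS.items():
            if keywords.intersection(words):
                scores[topic] = scores.get(topic, 0) + sentence_polarity(words)
    return scores


if __name__ == "__main__":
    text = ("The camera is excellent and the photos are sharp. "
            "Battery charging is slow. The delivery guy was terrible.")
    print(polarity_by_topic(text))
```

In the example, the complaint about the delivery person mentions neither topic and is dropped, whereas a single document-level score would have let it pull the overall polarity down.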

2.2 Shashank Kumar Chauhan, Anupam Goel, Prafull Goel, Avishkar Chauhan and Mahendra K Gurve,
“Research on product review analysis and spam review detection”, 4th International Conference on Signal
Processing and Integrated Networks (SPIN) 2017, ISBN (e): 978-1-50902797-2, September 2017, pp. 1104-1109.

Many e-commerce websites enable their customers to write product reviews and give feedback in the
form of ratings. This gives the company personnel an indication of their products' standing in the
market, while also enabling fellow customers to form an opinion and decide whether to purchase a product.
However, for profit or fame, many target products are promoted or demoted through spam, which may
contain fake reviews or malicious opinions and is therefore misleading. In this paper, we attempt to
detect spam and fake reviews, and to filter out reviews with expletives, vulgar and curse words, by
incorporating sentiment analysis. Other studies solve this problem using only the ratings as a parameter.
This paper, however, uses an Amazon dataset and matches the posted rating of each review against a
calculated rating, obtained by producing a sentiment score with the help of an in-house dictionary.
Finally, we graphically show and analyze the different features of the product that add to its popularity
or demotion.

In future work, we would try to improve the method of calculating the sentiment score of the reviews. We
would also update our dictionary of sentiment words, adding more words and updating the weights given to
them to obtain a more accurate calculated score for each review. In this way, we incorporate sentiment
analysis of reviews into spam review detection.
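The rating-consistency check described in this paper can be sketched as follows: compute a sentiment score from a word dictionary, map it onto the 1 to 5 star scale, and flag a review whose posted rating differs too much from the calculated one. The dictionary weights, the linear mapping from [-1, 1] to [1, 5] and the tolerance of two stars are assumptions for illustration; the authors' in-house dictionary is not reproduced here.

```python
import re

# Hypothetical word weights in the range [-1, 1]; not the authors' dictionary.
WEIGHTS = {
    "excellent": 1.0, "great": 0.8, "good": 0.5, "okay": 0.1,
    "poor": -0.5, "bad": -0.6, "broken": -0.8, "worst": -1.0,
}


def calculated_rating(text):
    """Average the weights of the dictionary words found in the review and
    map the result linearly from [-1, 1] onto the 1..5 star scale."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = [WEIGHTS[w] for w in words if w in WEIGHTS]
    score = sum(hits) / len(hits) if hits else 0.0  # neutral when no known words
    return 3 + 2 * score


def is_suspicious(text, posted_rating, tolerance=2.0):
    """Flag a review whose star rating disagrees with the sentiment of its text."""
    return abs(posted_rating - calculated_rating(text)) > tolerance


if __name__ == "__main__":
    print(is_suspicious("Worst purchase, arrived broken", 5))
    print(is_suspicious("Worst purchase, arrived broken", 1))
```

A five-star rating attached to text that scores about 1.2 stars is flagged, while the same text with a one-star rating passes.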

2.3 Jeneen Interlandi (February 8, 2010). "The fake-food detectives". Newsweek. Archived from the original
on October 21, 2010.
Sellers selling products on the Web often ask customers for reviews of the products they have
purchased. As e-commerce grows and becomes more popular day by day, the number of reviews received
from customers grows rapidly; for a popular product, the reviews can number in the thousands. This makes
it difficult for a potential customer to read them all and decide whether or not to buy the product.
Problems also arise for the manufacturer, who must keep track of and manage customer opinions. The
manufacturer faces additional difficulties because many other merchants' sites may sell the same product
with good ratings, and a manufacturer normally produces many kinds of products. This research identifies
the opinion sentences in each review and decides whether each comment is positive or negative; if an
opinion is found to be fake, the reviewer's e-mail ID is blocked.
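The workflow described above (extract opinion sentences, judge the review, block the reviewer's e-mail ID when it is fake) can be sketched as a small moderation class. The opinion-word list, the class `ReviewModerator` and its messages are hypothetical, and the fake-review detector is passed in as a placeholder function because the summary does not say how fakeness is decided.

```python
import re
from typing import Callable

# Hypothetical opinion words used to pick out opinion sentences.
OPINION_WORDS = {"good", "great", "love", "bad", "poor", "hate"}


def opinion_sentences(review_text):
    """Keep only the sentences that contain at least one opinion word."""
    sentences = re.split(r"(?<=[.!?])\s+", review_text.strip())
    return [s for s in sentences
            if OPINION_WORDS.intersection(re.findall(r"[a-z]+", s.lower()))]


class ReviewModerator:
    """Accepts reviews and blocks the e-mail ID of any reviewer whose review
    is judged fake by the supplied detector (a placeholder here)."""

    def __init__(self, is_fake: Callable[[str], bool]):
        self.is_fake = is_fake
        self.blocked_emails: set = set()
        self.accepted: list = []

    def submit(self, email, review_text):
        if email in self.blocked_emails:
            return "rejected: e-mail ID blocked"
        if self.is_fake(review_text):
            self.blocked_emails.add(email)
            return "rejected: fake review, e-mail ID blocked"
        # Store only the opinion-bearing sentences for later analysis.
        self.accepted.append((email, opinion_sentences(review_text)))
        return "accepted"


if __name__ == "__main__":
    # Toy detector for demonstration only: treats "buy now" as promotional spam.
    mod = ReviewModerator(lambda text: "buy now" in text.lower())
    print(mod.submit("a@example.com", "Great phone! Buy now!!!"))
    print(mod.submit("a@example.com", "The screen is good."))
    print(mod.submit("b@example.com", "Arrived Tuesday. The screen is good."))
```

Injecting the detector keeps the blocking logic independent of how fake reviews are recognised, so a sentiment-based or learned classifier can be substituted without changing the moderation code.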

A direction for future research is to implement the system and check its performance by applying the
proposed approach to various benchmark data sets. Another direction is comparing the performance of
different classification methods to find the best one for the proposed opinion spam classification method.
There also exist other kinds of review-related or reviewer-related features that are likely to contribute
to the prediction task, and in future work we will further investigate different kinds of features to make
more accurate predictions.

2.4 Ruxi Yin, Hanshi Wang and Lizhen Liu, “Research of integrated algorithm establishment of spam
detection system”, 4th International Conference on Computer Science and Network Technology (ICCSNT)
2015, ISBN (e): 978-1-4673-8173-4, pp. 390-393.

Online product reviews of shopping experiences on social media have encouraged users to provide
customer feedback. Nowadays many e-commerce sites allow customers to write their opinions on the
products they buy in the form of reviews or ratings. The reviews given by customers can build or shatter
the good name of a product, and for this reason company personnel get an idea of their product's standing
in the market from them. To demote or promote a product, spiteful or fake reviews, which are deceptive,
are posted on e-commerce sites. This can lead to potential financial losses or to large gains in business.
We propose a project that focuses on detecting fake and spam reviews using sentiment analysis, removes
reviews that contain vulgar and curse words, and makes the e-commerce site an online shopping center
free of fake reviews.
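Removing reviews that contain vulgar or curse words, as proposed above, amounts to whole-word matching against a word list. In this minimal sketch, the mild placeholder words in `BLOCKED_WORDS` and the function names are assumptions; a real deployment would load a maintained list.

```python
import re

# Placeholder list; a real deployment would load a maintained word list.
BLOCKED_WORDS = {"darn", "heck", "idiot"}

# \b word boundaries match whole words only, so "heckle" is not caught by "heck".
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(BLOCKED_WORDS))) + r")\b",
    re.IGNORECASE,
)


def contains_blocked_word(text):
    """True if the review contains any blocked word, in any letter case."""
    return _PATTERN.search(text) is not None


def remove_abusive_reviews(reviews):
    """Return only the reviews that contain no blocked word."""
    return [r for r in reviews if not contains_blocked_word(r)]


if __name__ == "__main__":
    print(remove_abusive_reviews(["Nice fit", "Darn thing broke", "Good value"]))
```

Matching whole words rather than substrings avoids rejecting innocent reviews that merely contain a blocked word inside a longer word; deliberate misspellings of blocked words are not handled by this sketch.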

This work has investigated the user's request and has given the particular prediction and safety
measure for that input. The K-Means clustering technique was used to find the clusters and fatality for the
flight crash investigation. This method also considers other factors such as efficiency, weather impact and
the schedules of other aircraft. Results showed that the proposed algorithm obtained predictions based on
the user's input efficiently. The proposed algorithm also provided improved search results for the user's
query. Possible future work is to improve the efficiency and also increase the number of clusters used.

2.5 SP. Rajamohana, Dr. K. Umamaheshwari, M. Dharani, R. Vedackshya, “A survey on online review SPAM
detection techniques”, International Conference on Innovations in Green Energy and Healthcare
Technologies (IGEHT) 2017, ISBN (e): 978-1-5090-5778-8.

Online product reviews of shopping experiences on social media have encouraged users to provide
customer feedback. Nowadays many e-commerce sites allow customers to write their opinions on the
products they buy in the form of reviews or ratings. The reviews given by customers can build or shatter
the good name of a product, and for this reason company personnel get an idea of their product's standing
in the market from them. To demote or promote a product, spiteful or fake reviews, which are deceptive,
are posted on e-commerce sites. This can lead to potential financial losses or to large gains in business.
We propose a project that focuses on detecting fake and spam reviews using sentiment analysis, removes
reviews that contain vulgar and curse words, and makes the e-commerce site an online shopping center
free of fake reviews.

This method also considers other factors such as efficiency, weather impact and the schedules of other
aircraft. Results showed that the proposed algorithm obtained predictions based on the user's input
efficiently. The proposed algorithm also provided improved search results for the user's query. Possible
future work is to improve the efficiency and also increase the number of clusters used.

2.6 Hamzah Al Najada; Xingquan Zhu, “iSRD: Spam review detection with imbalanced data distributions”,
Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration
(IEEE IRI 2014), ISBN (e): 978-1-4799-5880-1.

The Internet plays an essential role in modern information systems. Applications such as e-commerce
websites have made it common for people to purchase different types of products online. During such an
online shopping process, users often rely on online reviews from previous customers to make the final
decision. Because online reviews play an essential role in the sale of online products (or services), some
vendors (or customers) provide fake/spam reviews to mislead customers. Any false reviews of products may
result in unfair market competition and financial loss for customers or vendors. In this research, we aim
to distinguish between spam and non-spam reviews using supervised classification methods. When training
a classifier to identify spam vs. non-spam reviews, a challenging issue is that spam reviews make up only a
very small portion of online reviews. This naturally leads to a data imbalance issue in training classifiers
for spam review detection, where learning methods that do not emphasize minority samples (i.e., spam) may
perform poorly at detecting spam reviews (although the overall accuracy of the algorithm might be
relatively high). To tackle this challenge, we employ a bagging-based approach to build a number of
balanced datasets, through which we train a set of spam classifiers and use their ensemble to detect review
spam. Experiments and comparisons demonstrate that our method, iSRD, outperforms baseline methods for
review spam detection.

We have addressed the problem of detecting spam online reviews from imbalanced data distributions, and
proposed a new classifier technique to overcome the problem of imbalanced data distributions for review
spam detection. To tackle the data imbalance, we proposed using random under-sampling to generate
balanced training sets. A set of classifiers is trained on the balanced training sets, and the vote of all the
classifiers is used to predict whether a review is spam or non-spam.
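The under-sampling ensemble summarised above can be sketched in plain Python. Each model sees every minority (spam) sample plus an equally sized random subset of majority (non-spam) samples, and predictions are combined by majority vote. The nearest-centroid base learner, the numeric feature vectors and all function names are assumptions chosen to keep the example self-contained; they are not the classifiers used in iSRD.

```python
import random
from collections import Counter


def nearest_centroid_fit(X, y):
    """Stand-in base learner: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Predict the class whose centroid is closest in squared distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def train_undersampled_ensemble(X, y, n_models=5, minority=1, seed=0):
    """Train n_models classifiers, each on all minority samples plus an
    equally sized random sample (without replacement) of majority samples."""
    rng = random.Random(seed)
    minority_idx = [i for i, t in enumerate(y) if t == minority]
    majority_idx = [i for i, t in enumerate(y) if t != minority]
    models = []
    for _ in range(n_models):
        sample = minority_idx + rng.sample(majority_idx, len(minority_idx))
        models.append(nearest_centroid_fit([X[i] for i in sample],
                                           [y[i] for i in sample]))
    return models


def predict_by_vote(models, x):
    """Majority vote over the ensemble (use an odd n_models to avoid ties)."""
    votes = Counter(nearest_centroid_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

Because every training subset is balanced, no single model can reach high accuracy simply by predicting "non-spam" for everything, which is the failure mode the paper describes; the vote then reduces the variance introduced by discarding most of the majority class in each subset.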

2.7 Rajashree S. Jadhav, Prof. Deipali V. Gore, "A New Approach for Identifying
Manipulated Online Reviews using Decision Tree ". (IJCSIT) International Journal of
Computer Science and Information Technologies, Vol. 5 (2), pp 1447-1450, 2014.

Nowadays the Internet has become essential, as it provides many facilities to its users. Many social
networking sites allow users to share their views, and people share their thoughts about politics, social
issues and different products. It is common practice today for a user to check a product's reviews online
before purchasing it. Multiple sites deal with these reviews; they provide ratings for products and show
comparisons between different products. Some enterprises attempt to create fake reviews to influence
customer behaviour and increase their sales, but identifying those fake reviews is a difficult task for
customers. In today’s competitive world, it is necessary for any enterprise to maintain its reputation in the
market, so both enterprises and customers need to identify manipulated reviews. This paper studies
different approaches for identifying manipulated reviews and proposes a new approach to identify them
using a Decision Tree (DT).

The approach focuses on the behavior of the reviewer and on different text properties of the comments.
The proposed method employs a decision tree algorithm to classify manipulated reviews, and the decision
tree is used to select the features that give maximum accuracy. Bagging and boosting methods can be
introduced to increase the classification accuracy.
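A decision tree that combines reviewer behaviour with text properties can be illustrated with a depth-1 tree (a decision stump) chosen by Gini impurity. The four features in `review_features`, the toy data and all function names are hypothetical; the cited paper's feature set and tree depth are not given here, and a full tree would apply the same split search recursively to each side.

```python
def review_features(text, rating, product_avg_rating):
    """Hypothetical features: three text properties plus one behaviour signal."""
    letters = [c for c in text if c.isalpha()]
    return [
        len(text.split()),                                         # length in words
        sum(c.isupper() for c in letters) / max(len(letters), 1),  # share of capitals
        text.count("!"),                                           # exclamation marks
        abs(rating - product_avg_rating),                          # rating deviation
    ]


def gini(labels):
    """Gini impurity of binary labels (1 = manipulated, 0 = genuine)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority_label(labels):
    """Most frequent label; ties and empty sides default to genuine (0)."""
    return int(2 * sum(labels) > len(labels))


def fit_stump(X, y):
    """Depth-1 decision tree: pick the feature and threshold whose split
    gives the lowest weighted Gini impurity."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or impurity < best[0]:
                best = (impurity, f, threshold,
                        majority_label(left), majority_label(right))
    return best[1:]  # (feature index, threshold, left label, right label)


def predict_stump(stump, x):
    f, threshold, left_label, right_label = stump
    return left_label if x[f] <= threshold else right_label


if __name__ == "__main__":
    data = [
        ("Decent phone, battery lasts a day.", 4, 3.9, 0),
        ("Screen is fine but speaker is weak.", 3, 3.9, 0),
        ("Works as described.", 4, 3.9, 0),
        ("BEST PRODUCT EVER!!! BUY IT!!!", 5, 2.1, 1),
        ("AMAZING!!! FIVE STARS!!!", 5, 2.0, 1),
    ]
    X = [review_features(t, r, a) for t, r, a, _ in data]
    y = [label for *_, label in data]
    print(fit_stump(X, y))
```

The split search is also the feature-selection step the summary mentions: the feature whose best threshold yields the purest split is the one the tree keeps.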

2.8 Long-Sheng Chen, Jui-Yu Lin, “A study on Review Manipulation Classification using Decision Tree”,
Kuala Lumpur, Malaysia, pp 3-5, IEEE conference publication, 2013.
Mining of opinions from product reviews, forum posts and blogs is an important research topic with
many applications. However, existing research has been focused on extraction, classification and
summarization of opinions from these sources. An important issue that has not been studied so far is the
opinion spam or the trustworthiness of online opinions. In this paper, we study this issue in the context of
product reviews. To our knowledge, there is still no published study on this topic, although Web page spam
and email spam have been investigated extensively. We will see that review spam is quite different from
Web page spam and email spam, and thus requires different detection techniques. Based on the analysis of
5.8 million reviews and 2.14 million reviewers from amazon.com, we show that review spam is widespread.
In this paper, we first present a categorization of spam reviews and then propose several techniques to
detect them.

2.9 N. Jindal and B. Liu, “Opinion spam and analysis,” International Conference on
Web Search and Data Mining, 2008, pp. 219-230.

Evaluative texts on the Web have become a valuable source of opinions on products, services,
events, individuals, etc. Recently, many researchers have studied such opinion sources as product reviews,
forum posts, and blogs. However, existing research has been focused on classification and summarization of
opinions using natural language processing and data mining techniques. An important issue that has been
neglected so far is opinion spam or trustworthiness of online opinions. In this paper, we study this issue in
the context of product reviews, which are opinion rich and are widely used by consumers and product
manufacturers. In the past two years, several startup companies also appeared which aggregate opinions
from product reviews. It is thus high time to study spam in reviews. To the best of our knowledge, there is
still no published study on this topic, although Web spam and email spam have been investigated
extensively. We will see that opinion spam is quite different from Web spam and email spam, and thus
requires different detection techniques. Based on the analysis of 5.8 million reviews and 2.14 million
reviewers from amazon.com, we show that opinion spam in reviews is widespread. This paper analyzes
such spam activities and presents some novel techniques to detect them.

Detection of such spam is done first by detecting duplicate reviews. We then detect type 2 and type
3 spam reviews by using supervised learning with manually labeled training examples. Results showed that
the logistic regression model is highly effective. However, to detect type 1 opinion spam, the story is quite
different because it is very hard to manually label training examples for type 1 spam. We thus proposed to
use duplicate spam reviews as positive training examples and other reviews as negative examples to build a
model. We showed the effectiveness of the model. The current study, however, only represents an initial
investigation. Much work remains to be done. In our future work, we will further improve the detection
methods, and also look into spam in other kinds of media, e.g., forums and blogs.

3.SYSTEM ANALYSIS

3.1 Problem Statement:

In today’s world, reviews on online websites play a vital role in the sales of a product. People try
to learn all the pros and cons of a product before they buy it, because there are many options for the
same product: different manufacturers may make the same type of product, different sellers may offer
it, and the buying procedure may differ. Since reviews are directly related to sales, it is necessary
for online websites to spot fake reviews, as their own reputation is also at stake. Because it is not
possible to verify every product and sale manually, a Fake Review Detection program is used to detect
fraudulent patterns in the reviews given by customers.

3.2 Software Requirements Specifications:

3.2.1 Purpose:

The purpose of this project is to remove fake reviews from a set of product reviews in order to obtain
genuine reviews.
3.2.2 Scope:

Since anyone can now write any opinion text or review, individuals and organizations are tempted to post
undeserved spam opinions to promote or discredit target products. There is therefore a need for a smart
system that automatically mines opinions and classifies them into spam and non-spam categories. The
proposed opinion spam analyzer will automatically classify user opinions as spam or non-spam. This
automatic system can be useful to business organizations as well as to customers. Business organizations
can monitor their product sales by analyzing and understanding what customers are saying about their
products. Customers can decide whether or not to buy a product. This helps people purchase valuable
products and spend their money on quality products.

3.2.3 Objectives:

This project investigates the problem of finding fake reviews in a set of product reviews. We consider a
dataset of product reviews that is used to find fake reviews by multiple methods. The following are the
objectives of the proposed approach and this work.

➢ To implement different algorithms for better spam detection, i.e., based on IP address, account used,
a negative word dictionary using SentiStrength, and ontology, along with a graphical representation of the work.

➢ To deal with 6 different types of spam reviews.

➢ To present opinion mining on spam-filtered data.

➢ To implement ontology in spam detection.

➢ To present an algorithm that performs opinion mining with spam detection.

3.3 Non-functional Requirements:

3.3.1 User Requirements:


The User Requirements Specification describes the business needs for what users require from the
system. User Requirements Specifications are written early in the validation process, typically before the
system is created. They are written by the system owner and end-users, with input from Quality Assurance.
Requirements outlined in the URS are usually tested in the Performance Qualification or User Acceptance
Testing. User Requirements Specifications are not intended to be a technical document; readers with only a
general knowledge of the system should be able to understand the requirements outlined in the URS.

3.3.2 Usability:

Usability is the ease of use and learnability of a human-made object such as a tool or
device. In software engineering, usability is the degree to which software can be used by specified
consumers to achieve quantified objectives with effectiveness, efficiency, and satisfaction in a quantified
context of use.

3.3.3 Reliability:

Reliability refers to the consistency of a measure. A test is considered reliable if we get the same
result repeatedly. For example, if a test is designed to measure a trait, then each time the test is administered
to a subject, the results should be approximately the same. Unfortunately, it is impossible to calculate
reliability exactly, but it can be estimated in a number of different ways.

3.3.4 Performance:

Project performance measurement isn't just about being on time and on budget. When we measure
the performance of a project, we are interested in metrics such as days over the planned duration,
which are useful for managing projects day to day.

3.3.5 Supportability:

Supportability is the degree to which system design characteristics and planned logistics resources
meet system requirements. Supportability is the capability of a total system design to support operations and
readiness needs throughout the life-cycle of a system at an affordable cost.

3.3.6 Accuracy and Precision:

Accuracy and precision both describe the quality of measurements, but accuracy reflects how close a
measurement is to a known or accepted value, while precision reflects how reproducible measurements
are, even if they are far from the accepted value.

3.3.7 Portability:

In computer science, portability is the ability of a computer program to be ported from one system to
another. Software portability is the usability of a piece of software on multiple platforms.

3.3.8 Modifiability:

Maintainability is the capability of the software product to be modified. The modifiability of
a software system is the ease with which it can be modified in response to changes in the environment,
requirements or functional specification.

3.4 Hardware Requirements:

➢ Processor : Pentium III

➢ RAM : 1GB

➢ Hard Disk : 50GB

➢ Monitor : 15-inch LED

➢ Input devices : keyboard , mouse

3.5 Software Requirements:

➢ Operating System : Windows 10.

➢ Tool : Google Colab (Python), Visual Paradigm.

➢ Language : Python 3.

3.6 EXISTING SYSTEM:

When doing any kind of internet shopping, many users spend time reading other users' reviews if they
are available. A survey performed by Yelp.com has shown that:

➢ More than 80% of users and shoppers check and rely on other people's reviews.

➢ 50% rely on the ratings of the online product they want to buy.

➢ 30% of users compare a product’s reviews with other products’ reviews to find a reliable and
trustworthy item.

Clearly, consumers value the feedback given by other users, as do the companies that sell such
products. Blogs, websites, discussion boards, etc. are repositories of customer suggestions, which are
valuable and important sources of textual data. Therefore, young and old people alike rely extensively
on reviews available online: they decide whether or not to purchase a product by analyzing and
reflecting on the existing opinions about it.

3.7 PROPOSED SYSTEM:

This system finds fake reviews posted by social media optimization teams by identifying the IP address.
A user logs in to the system with a user ID and password, views various products and gives reviews about
them. To determine whether a review is fake or genuine, the system records the IP address of the user;
if the system observes fake reviews sent from the same IP address many times, it informs the admin to
remove those reviews from the system. This system uses a data mining methodology and helps users find
the correct reviews of a product.
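The IP-based check described above can be sketched with pandas. This is a minimal sketch, not the project's implementation: the function name flag_repeated_ips, the threshold of three reviews and the sample data are assumptions; only the "IP Address" column name is taken from the source code in Section 5.4.

```python
import pandas as pd


def flag_repeated_ips(reviews: pd.DataFrame, max_reviews: int = 3) -> list:
    """Return the IP addresses that posted more than max_reviews reviews (hypothetical helper)."""
    # Number of reviews submitted from each IP address.
    counts = reviews.groupby("IP Address").size()
    # IPs above the assumed threshold are reported to the admin for review.
    return sorted(counts[counts > max_reviews].index.tolist())


sample = pd.DataFrame({
    "IP Address": ["10.0.0.1"] * 5 + ["10.0.0.2", "10.0.0.3"],
    "review_body": ["Great!"] * 5 + ["Okay.", "Bad."],
})
print(flag_repeated_ips(sample))  # ['10.0.0.1']
```

In a real deployment the flagged IPs would only be passed to the admin, matching the text's "inform the admin" step, rather than having their reviews deleted automatically.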

Sentiment Analysis :-

Sentiment analysis is used to understand the writer's emotion. We define three lists of words, a positive
vocabulary, a negative vocabulary and a neutral vocabulary, which consist of positive, negative and neutral
words respectively. Every review is passed to an NLTK classifier, which calculates the sentiment score of
the review. Sentiment analysis is contextual mining of text that identifies and extracts subjective
information from source material, helping a business understand the social sentiment around its brand,
product or service while monitoring online conversations.
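A minimal lexicon-based sketch of the three-vocabulary idea is shown below. It is an illustration only: the word lists are hypothetical, and the project itself passes reviews to a classifier as described above. Words from the neutral vocabulary, or from neither list, contribute nothing to the score.

```python
# Hypothetical vocabularies; the report's actual word lists are not reproduced here.
POSITIVE = {"good", "great", "excellent", "love", "best"}
NEGATIVE = {"bad", "poor", "worst", "hate", "broken"}


def sentiment_score(review: str) -> int:
    """Number of positive words minus number of negative words."""
    score = 0
    for word in review.lower().split():
        word = word.strip(".,!?")  # drop surrounding punctuation
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    return score


def sentiment_label(review: str) -> str:
    """Map the net score to positive, negative or neutral."""
    score = sentiment_score(review)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("Great phone, love it!"))  # positive
```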

Content Similarity:-

Content similarity is computed on the reviews given by the same user. We use cosine similarity to measure
the similarity of two reviews. If the cosine value is greater than 0.5, the review is considered to be fake.
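The content-similarity rule can be sketched with scikit-learn's TfidfVectorizer and cosine_similarity, both of which the project's source code imports. The helper name similar_review_pairs is hypothetical, and the default threshold of 0.5 follows the text above (the source code in Section 5.4 uses 0.6 for its 30-minute window check).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def similar_review_pairs(reviews, threshold=0.5):
    """Return index pairs (i, j) of one user's reviews whose TF-IDF cosine similarity exceeds threshold."""
    tfidf = TfidfVectorizer().fit_transform(reviews)  # one TF-IDF row per review
    sim = cosine_similarity(tfidf)                    # square matrix of pairwise similarities
    pairs = []
    for i in range(len(reviews)):
        for j in range(i + 1, len(reviews)):
            if sim[i, j] > threshold:
                pairs.append((i, j))
    return pairs


reviews = [
    "great phone with great battery",
    "great phone with great battery life",
    "terrible service never again",
]
print(similar_review_pairs(reviews))  # [(0, 1)]
```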

METHODS USED TO DETERMINE FAKE REVIEWS:

1. Reviews which have a dual view.

2. Reviews in which the same user is promoting or demoting a particular brand.

3. Reviews in which a person from the same IP address is promoting or demoting a particular
brand.

4. Reviews which are posted as a flood by the same user, where all the reviews are either positive or
negative.

5. Reviews which are posted as a flood by the same person from the same IP address.

6. Similar reviews posted in the same time interval.
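Method 4 can be sketched as a group-by over reviewers. The customer_id and sentiment column names (with 1 for positive and 0 for negative, mirroring getSentiment in Section 5.4) and the minimum of four reviews are assumptions for illustration.

```python
import pandas as pd


def flood_reviewers(reviews: pd.DataFrame, min_reviews: int = 4) -> list:
    """Return customer_ids with at least min_reviews reviews that all share one polarity."""
    flagged = []
    for customer, group in reviews.groupby("customer_id"):
        # nunique() == 1 means every review is positive or every review is negative
        if len(group) >= min_reviews and group["sentiment"].nunique() == 1:
            flagged.append(customer)
    return flagged


sample = pd.DataFrame({
    "customer_id": ["u1"] * 4 + ["u2"] * 4,
    "sentiment": [1, 1, 1, 1, 1, 0, 1, 0],
})
print(flood_reviewers(sample))  # ['u1']
```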

ADVANTAGES:

➢User gets genuine reviews about the product.

➢User can post their own review about the product.

➢User can spend money on valuable products.

4.SYSTEM DESIGN

4.1 System Architecture:

Architecture design gives the real world view of the system. Fig-1 represents the architecture design of
Fake Product Review Monitoring. The reviews which are part of online data on websites are scraped by the
web crawler. This data is taken for preprocessing where the data is transformed into required format and the
abusive reviews are removed. Fake reviews are then identified by the Fake Review Detector. Its output is
used as training data for the classifier, which classifies reviews as fake or genuine.
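The three stages of the architecture (preprocessing, fake review detection, classification) can be sketched end to end. Everything here is a hypothetical stand-in: the abusive-word list is invented, the toy detector simply marks exact duplicates as fake (following the duplicate-based labelling idea from the Jindal and Liu study in the literature survey), and scikit-learn's MultinomialNB is used as the classifier for illustration only.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

ABUSIVE_WORDS = {"idiot", "stupid"}  # hypothetical abusive-word list


def preprocess(reviews):
    """Lower-case, keep only letters and spaces, and drop abusive reviews."""
    cleaned = []
    for text in reviews:
        text = re.sub(r"[^a-z\s]", " ", text.lower())
        text = re.sub(r"\s+", " ", text).strip()
        if not ABUSIVE_WORDS & set(text.split()):
            cleaned.append(text)
    return cleaned


def detect_fake(reviews):
    """Toy detector: label a review fake (1) if its text appears more than once."""
    return [1 if reviews.count(text) > 1 else 0 for text in reviews]


def train_classifier(reviews, labels):
    """Fit a TF-IDF + multinomial naive Bayes classifier on the detector's labels."""
    vectorizer = TfidfVectorizer()
    model = MultinomialNB().fit(vectorizer.fit_transform(reviews), labels)
    return vectorizer, model


raw = ["Great phone!", "great phone", "Battery died in a day", "You idiot seller"]
cleaned = preprocess(raw)
labels = detect_fake(cleaned)
vectorizer, model = train_classifier(cleaned, labels)
print(labels)  # [1, 1, 0]
```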

4.2 UML DIAGRAMS:

UML is a way of visualizing a software program using a collection of diagrams. The notation has
evolved from the work of Grady Booch, James Rumbaugh, Ivar Jacobson, and the Rational Software
Corporation to be used for object-oriented design, but it has since been extended to cover a wider variety of
software engineering projects. Today, UML is accepted by the Object Management Group (OMG) as the
standard for modeling software development.

USE CASE DIAGRAM:

Use case diagrams are a way to capture the system's functionality and requirements in UML
diagrams. It captures the dynamic behavior of a live system. A use case represents a distinct functionality of
a system, a component, a package, or a class.

Contents:

1. Use case

2. Actor

SEQUENCE DIAGRAM:

A sequence diagram simply depicts interaction between objects in a sequential order i.e. the order in which
these interactions take place. We can also use the terms event diagrams or event scenarios to refer to a
sequence diagram. Sequence diagrams describe how and in what order the objects in a system function.
These diagrams are widely used by businessmen and software developers to document and understand
requirements for new and existing systems.

ACTIVITY DIAGRAM:

We use Activity Diagrams to illustrate the flow of control in a system and refer to the steps involved in the
execution of a use case. We model sequential and concurrent activities using activity diagrams. So, we
basically depict workflows visually using an activity diagram. An activity diagram focuses on the
conditions of flow and the sequence in which it happens. We describe or depict what causes a particular
event using an activity diagram.

5.IMPLEMENTATION

5.1 Introduction to Python:


Python is a widely used general-purpose, high-level programming language. It was
initially designed by Guido van Rossum, first released in 1991, and is developed by the Python
Software Foundation. It was developed mainly with an emphasis on code readability, and its
syntax allows programmers to express concepts in fewer lines of code.

Python is a programming language that lets you work quickly and integrate systems
more efficiently. There are two major Python versions, Python 2 and Python 3, which are
quite different.

There are many applications of Python; here are some of them.
1. Web development – Web frameworks like Django and Flask are based on Python. They
help you write server-side code that manages databases, implements backend
programming logic, maps URLs, etc.

2. Machine learning – There are many machine learning applications written in Python.
Machine learning is a way to write logic so that a machine can learn and solve a
particular problem on its own. For example, product recommendation on websites like
Amazon, Flipkart and eBay is a machine learning algorithm that recognises a user’s
interests. Face recognition and voice recognition on your phone are other examples of
machine learning.

3. Data analysis – Data analysis and data visualisation in the form of charts can also be
developed using Python.

4. Scripting – Scripting is writing small programs to automate simple tasks such as
sending automated response emails. Such applications can also be written in the
Python programming language.

5. Game development – You can develop games using Python.

6. Embedded applications – You can develop embedded applications in Python.

7. Desktop applications – You can develop desktop applications in Python using libraries
like Tkinter or Qt.

FEATURES OF PYTHON:

1. Readable: Python is a very readable language.

2. Easy to Learn: Learning Python is easy, as it is an expressive and high-level
programming language, which means it is easy to understand the language and thus easy
to learn.

3. Cross platform: Python is available and can run on various operating systems such as
Mac, Windows, Linux, Unix etc. This makes it a cross platform and portable language.

4. Open Source: Python is an open source programming language.

5. Large standard library: Python comes with a large standard library that has some
handy codes and functions which we can use while writing code in Python.

6. Free: Python is free to download and use. This means you can download it for free and
use it in your application. Python is an example of a FLOSS (Free/Libre Open Source
Software), which means you can freely distribute copies of this software, read its source
code and modify it.

7. Supports exception handling: If you are new, you may wonder what an exception is.
An exception is an event that can occur during program execution and can disrupt the
normal flow of the program. Python supports exception handling, which means we can write
less error-prone code and test various scenarios that can cause an exception later on.

8. Advanced features: Python supports generators and list comprehensions.

9. Automatic memory management: Python supports automatic memory management
which means the memory is cleared and freed automatically. You do not have to bother
clearing the memory.
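The exception-handling, generator and list-comprehension features listed above can be seen together in a few lines. The function parse_ratings and its input are made up for illustration.

```python
def parse_ratings(raw_values):
    """Generator: yield each rating as an int, skipping values that are not numbers."""
    for value in raw_values:
        try:
            yield int(value)
        except ValueError:
            # exception handling: a malformed rating is skipped instead of stopping the program
            continue


ratings = list(parse_ratings(["5", "4", "five", "1"]))
high = [r for r in ratings if r >= 4]  # list comprehension keeps only the high ratings
print(ratings, high)  # [5, 4, 1] [5, 4]
```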

Versions of Python:

Python 2:

Since Python 2 has been the most popular version for over a decade and a half, it is
still entrenched in the software at certain companies.

However, since more companies are moving from Python 2 to 3, someone who
wants to learn Python programming for beginners may wish to avoid spending time on a
version that is becoming obsolete.

Python 3:

Python 2.0 was released in 2000, and the 2.x versions were the prevalent releases
until December 2008. At that time, the development team made the decision to release
version 3.0, which contained a few relatively small but significant changes that were not
backward compatible with the 2.x versions. Python 2 and 3 are very similar, and some
features of Python 3 have been backported to Python 2. But in general, they remain not
quite compatible.

Python is a general purpose and high level programming language. You can
use Python for developing desktop GUI applications, websites and web applications.
Also, Python, as a high level programming language, allows you to focus on core
functionality of the application by taking care of common programming tasks.

Both Python 2 and 3 have continued to be maintained and developed, with
periodic release updates for both. As of this writing, the most recent versions available
are 2.7.15 and 3.6.5. However, an official end-of-life date of January 1, 2020 has been
established for Python 2, after which time it will no longer be maintained.

5.2 Libraries Used:

PANDAS:-

pandas is a Python package providing fast, flexible, and expressive data structures
designed to make working with “relational” or “labeled” data both easy and intuitive.
It aims to be the fundamental high-level building block for doing practical, real world data
analysis in Python.

NLTK:-

NLTK (Natural Language Toolkit) is one of the leading platforms for working with human language
data in Python, and the NLTK module is used here for natural language processing. NLTK can
tokenize text into words and sentences; this project uses its word tokenizer and its English
stopword list.

RANDOM:-

You can generate random numbers in Python by using the random module. These are
pseudo-random numbers, as the sequence of numbers generated depends on the seed. If
the seeding value is the same, the sequence will be the same.
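Seeding can be demonstrated directly: re-seeding with the same value reproduces the same sequence. The seed 42 and the range 1 to 100 are arbitrary choices.

```python
import random

random.seed(42)
first = [random.randint(1, 100) for _ in range(5)]

random.seed(42)  # same seed again
second = [random.randint(1, 100) for _ in range(5)]

print(first == second)  # True: the same seed reproduces the same sequence
```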

SKLEARN:

Scikit-learn is a library in Python that provides many unsupervised and supervised learning
algorithms. It's built upon some of the technology you might already be familiar with, like
NumPy, pandas, and Matplotlib.

NUMPY:

NumPy is a very popular Python library for large multi-dimensional array and matrix
processing, with the help of a large collection of high-level mathematical functions. It is
very useful for fundamental scientific computations in Machine Learning. It is particularly
useful for linear algebra, Fourier transform, and random number capabilities. High-end
libraries like TensorFlow uses NumPy internally for manipulation of Tensors.

SCIPY:

SciPy is a very popular library among Machine Learning enthusiasts as it contains
different modules for optimization, linear algebra, integration and statistics. There is a
difference between the SciPy library and the SciPy stack. The SciPy is one of the core
packages that make up the SciPy stack. SciPy is also very useful for image manipulation.

Tensor flow:

TensorFlow is a very popular open-source library for high performance numerical
computation developed by the Google Brain team at Google. As the name suggests,
TensorFlow is a framework that involves defining and running computations involving
tensors. It can train and run deep neural networks that can be used to develop several AI
applications. TensorFlow is widely used in the field of deep learning research and
application.

Keras:

Keras is a very popular Machine Learning library for Python. It is a high-level neural
networks API capable of running on top of TensorFlow, CNTK, or Theano. It can run
seamlessly on both CPU and GPU. Keras makes it really easy for ML beginners to build and
design a neural network.

5.3 ALGORITHMS USED:

5.3.1 Decision Tree Learning Algorithm:

A decision tree is a tree-like graph with nodes representing the place where we pick
an attribute and ask a question; edges represent the answers to the question; and the
leaves represent the actual output or class label. They are used in non-linear decision
making with simple linear decision surfaces.

Decision trees classify the examples by sorting them down the tree from the root to
some leaf node, with the leaf node providing the classification to the example. Each node
in the tree acts as a test case for some attribute, and each edge descending from that
node corresponds to one of the possible answers to the test case. This process is recursive
in nature and is repeated for every subtree rooted at the new nodes.
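The root-to-leaf classification described above can be tried with scikit-learn's DecisionTreeClassifier. The two features and the six labelled examples are invented toy data, not the project's dataset; export_text prints the test learned at each node.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features per review: [reviews from the same IP that day, review length in words]
X = [[1, 40], [2, 55], [6, 8], [7, 5], [1, 12], [5, 60]]
y = [0, 0, 1, 1, 0, 1]  # toy labels: 1 = fake, 0 = genuine

tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
print(export_text(tree, feature_names=["same_ip_reviews", "length"]))

# Each new sample is sorted from the root down to a leaf, which gives its class.
print(tree.predict([[8, 10], [1, 30]]))  # [1 0]
```

Here a single test on same_ip_reviews separates the toy classes perfectly, so the learned tree has depth 1.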

In a decision tree, the major challenge is to identify the attribute for the root
node at each level. This process is known as attribute selection. We have two popular
attribute selection measures:

1. Information Gain
2. Gini Index

1. Information Gain
When we use a node in a decision tree to partition the training instances into smaller
subsets the entropy changes. Information gain is a measure of this change in entropy.

Entropy
Entropy is the measure of uncertainty of a random variable; it characterizes the impurity of
an arbitrary collection of examples. The higher the entropy, the more the information content.
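The entropy and information-gain measures can be written out directly using the standard formulas $Entropy(S) = -\sum_i p_i \log_2 p_i$ and $Gain(S, A) = Entropy(S) - \sum_{v} \frac{|S_v|}{|S|} Entropy(S_v)$, where $S_v$ is the subset of $S$ whose attribute $A$ has value $v$. The plain-Python sketch below uses illustrative function names and toy labels.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy(S) = sum_i p_i * log2(1 / p_i), i.e. -sum_i p_i * log2(p_i)."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def information_gain(labels, attribute_values):
    """Entropy(S) minus the size-weighted entropy of the subsets S_v for each attribute value v."""
    total = len(labels)
    subsets = {}
    for label, value in zip(labels, attribute_values):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


labels = ["fake", "fake", "real", "real"]
same_ip = ["yes", "yes", "no", "no"]  # hypothetical attribute that separates the classes
print(entropy(labels))                    # 1.0
print(information_gain(labels, same_ip))  # 1.0
```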

2. Gini Index

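➢ The Gini index is a metric that measures how often a randomly chosen element would be
incorrectly identified if it were labelled randomly according to the class distribution in the node.
➢ This means an attribute with a lower Gini index should be preferred.
➢ Sklearn supports the “gini” criterion for the Gini index, and by default it takes the “gini”
value.
➢ The Gini index is calculated as $Gini = 1 - \sum_{i=1}^{c} p_i^2$, where $p_i$ is the fraction of
examples in the node that belong to class $i$ and $c$ is the number of classes.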
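The Gini index of a node, and of a split as the size-weighted average over its child nodes, can be sketched in plain Python. The function names and labels are illustrative; scikit-learn computes the same impurity internally when criterion="gini".

```python
from collections import Counter


def gini_index(labels):
    """Gini = 1 - sum_i p_i**2 for the class proportions p_i in a node."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_gini(subsets):
    """Gini index of a split: each subset's Gini weighted by its share of the examples."""
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * gini_index(s) for s in subsets)


print(gini_index(["fake", "real"]))                      # 0.5
print(split_gini([["fake", "fake"], ["real", "real"]]))  # 0.0
print(split_gini([["fake", "real"], ["fake", "real"]]))  # 0.5
```

The pure split scores 0.0 and is therefore preferred, matching the rule that a lower Gini index is better.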

5.3.2 Naive Bayes Classifier:

The naive Bayes classifier applies to learning tasks where each instance x is
described by a conjunction of attribute values and where the target function f(x) can take
on any value from some finite set V. A set of training examples of the target function is
provided, and a new instance is presented, described by the tuple of attribute values
$(a_1, a_2, \ldots, a_n)$. The learner is asked to predict the target value, or classification, for this
new instance. The Bayesian approach to classifying the new instance is to assign the most
probable target value, $v_{MAP}$, given the attribute values $(a_1, a_2, \ldots, a_n)$ that describe the
instance:

$$v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_n) = \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)$$

The second form follows from Bayes' theorem, because the denominator $P(a_1, a_2, \ldots, a_n)$ is the
same for every $v_j$ and can be dropped.

The naive Bayes classifier is based on the simplifying assumption that the attribute
values are conditionally independent given the target value. In other words, the
assumption is that given the target value of the instance, the probability of observing the
conjunction $a_1, a_2, \ldots, a_n$ is just the product of the probabilities for the individual attributes:
$P(a_1, a_2, \ldots, a_n \mid v_j) = \prod_i P(a_i \mid v_j)$. Substituting this into the expression for
$v_{MAP}$ gives the approach used by the naive Bayes classifier:

$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$$

where $v_{NB}$ denotes the target value output by the naive Bayes classifier. Notice that
in a naive Bayes classifier the number of distinct $P(a_i \mid v_j)$ terms that must be estimated
from the training data is just the number of distinct attribute values times the number of
distinct target values, a much smaller number than if we were to estimate the
$P(a_1, a_2, \ldots, a_n \mid v_j)$ terms as first contemplated.
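A from-scratch sketch of $v_{NB}$ for categorical attributes shows why only the individual $P(a_i \mid v_j)$ terms need to be estimated. It uses plain relative frequencies with no smoothing, so an attribute value never seen with a class gives that class a score of zero; the attribute meanings and training examples are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(examples, targets):
    """Estimate P(v_j) and the counts behind P(a_i | v_j) by relative frequency."""
    class_counts = Counter(targets)
    # value_counts[(i, v)][a] = how often attribute i took value a in examples of class v
    value_counts = defaultdict(Counter)
    for attrs, v in zip(examples, targets):
        for i, a in enumerate(attrs):
            value_counts[(i, v)][a] += 1
    priors = {v: c / len(targets) for v, c in class_counts.items()}
    return priors, value_counts, class_counts


def classify(model, attrs):
    """Return v_NB = argmax over v of P(v) * prod_i P(a_i | v)."""
    priors, value_counts, class_counts = model
    best, best_score = None, -1.0
    for v, prior in priors.items():
        score = prior
        for i, a in enumerate(attrs):
            score *= value_counts[(i, v)][a] / class_counts[v]
        if score > best_score:
            best, best_score = v, score
    return best


# Hypothetical attributes: (burst of reviews from one IP?, headline/body sentiment mismatch?)
X = [("yes", "yes"), ("yes", "no"), ("no", "no"), ("no", "no"), ("yes", "yes")]
y = ["fake", "fake", "real", "real", "fake"]
model = train_naive_bayes(X, y)
print(classify(model, ("yes", "yes")))  # fake
print(classify(model, ("no", "no")))    # real
```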

5.4 Source Code:

import pandas as pd
import random
import re
import nltk

dataset = pd.read_csv("reviews.csv", sep="\t")

# clf (a trained sentiment classifier that predicts 0 for a negative review) and
# tfidf (the TfidfVectorizer fitted on that classifier's training text) must be
# defined before getSentiment is called; they are not part of this listing.

def getSentiment(text):
    # PREPROCESSING THE REVIEW TEXT
    text = str(text)
    text = text.lower()
    # expand common contractions
    text = re.sub(r"that's", "that is", text)
    text = re.sub(r"there's", "there is", text)
    text = re.sub(r"what's", "what is", text)
    text = re.sub(r"where's", "where is", text)
    text = re.sub(r"it's", "it is", text)
    text = re.sub(r"who's", "who is", text)
    text = re.sub(r"i'm", "i am", text)
    text = re.sub(r"she's", "she is", text)
    text = re.sub(r"he's", "he is", text)
    text = re.sub(r"they're", "they are", text)
    text = re.sub(r"who're", "who are", text)
    text = re.sub(r"ain't", "am not", text)
    text = re.sub(r"wouldn't", "would not", text)
    text = re.sub(r"shouldn't", "should not", text)
    text = re.sub(r"can't", "can not", text)
    text = re.sub(r"couldn't", "could not", text)
    text = re.sub(r"won't", "will not", text)

    text = re.sub(r"\W", " ", text)            # replace non-word characters with spaces
    text = re.sub(r"\d", " ", text)            # replace digits with spaces
    text = re.sub(r"\s+[a-z]\s+", " ", text)   # drop single letters inside the text
    text = re.sub(r"^[a-z]\s+", " ", text)     # ... at the start
    text = re.sub(r"\s+[a-z]$", " ", text)     # ... and at the end
    text = re.sub(r"\s+", " ", text)           # collapse repeated whitespace

    sent = clf.predict(tfidf.transform([text]).toarray())
    # predicted sentiment label of the cleaned review

    return sent[0]

remove_reviews = []
# stores the review_id of every review flagged as fake (shared by all the checks below)

# 1. Reviews from the same IP address on the same day that are all positive or all negative

for ip, reviews in dataset.groupby("IP Address"):
    # dataframe of the reviews posted from one IP address
    for date, reviews_for_each_day in reviews.groupby("review_date"):
        # dataframe of the reviews posted from this IP address on one day
        reviews_by_date_for_pos = []
        reviews_by_date_for_neg = []
        for _, row in reviews_for_each_day.iterrows():
            if getSentiment(row["review_body"]) == 0:
                # negative sentiment: append review_id to the list of negative reviews
                reviews_by_date_for_neg.append(row["review_id"])
            else:
                # positive sentiment: append review_id to the list of positive reviews
                reviews_by_date_for_pos.append(row["review_id"])

        # CONDITION FOR CONSIDERING THE FAKE REVIEW
        # more than 3 positive reviews from the same IP address on the same day
        if len(reviews_by_date_for_pos) > 3:
            remove_reviews.extend(reviews_by_date_for_pos)
        # more than 3 negative reviews from the same IP address on the same day
        if len(reviews_by_date_for_neg) > 3:
            remove_reviews.extend(reviews_by_date_for_neg)

# 2. Different sentiment in the review headline and the review body

for i in range(len(dataset)):
    # iterate through the whole dataset
    if getSentiment(dataset["review_headline"][i]) != getSentiment(dataset["review_body"][i]):
        # the sentiments of the headline and the body are not the same
        remove_reviews.append(dataset["review_id"][i])

# 3. Similar reviews posted within the same 30-minute interval

from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# sort by time and renumber the rows 0..n-1 so that row i+k is the k-th review after row i
dataset = dataset.sort_values("timestamp").reset_index(drop=True)

tfidf_vectorizer = TfidfVectorizer()
# assumed default configuration; it raises ValueError when a text yields an empty vocabulary

def OnlyStopwords(text):
    # True if the text contains nothing but English stopwords
    # (requires nltk.download("punkt") and nltk.download("stopwords"))
    words = nltk.word_tokenize(text)
    words = [word for word in words if word not in stopwords.words("english")]
    return len(words) == 0

for i in range(len(dataset)):
    reviews = [str(dataset["review_body"][i])]
    try:
        tfidf_vectorizer.fit_transform(reviews)
    except ValueError:
        # reviews with one word and with no dictionary meaning are invalid
        # e.g. ["c", "O.K."]
        remove_reviews.append(dataset["review_id"][i])
        continue

    Time = dataset["timestamp"][i]
    # timestamp (in seconds) of the review that will be compared

    for j in range(i + 1, len(dataset)):
        if dataset["timestamp"][j] - Time <= 1800:
            # reviews written within 30 minutes are checked for the same pattern
            reviews.append(str(dataset["review_body"][j]))
        else:
            break

    tfidf_matrix = tfidf_vectorizer.fit_transform(reviews)
    # TF-IDF model over review i and the reviews in its 30-minute window

    tfidf_list = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix).tolist()
    # similarity of review i to every review in the window

    i_appended = False
    for k in range(1, len(tfidf_list[0])):
        if tfidf_list[0][k] > 0.6:
            # 0.6 is the similarity threshold
            remove_reviews.append(dataset["review_id"][i + k])
            # i+k is the row of the k-th review in the window
            if not i_appended:
                remove_reviews.append(dataset["review_id"][i])
                i_appended = True

# keep only the reviews that were not flagged (a review may be flagged by more than one check)
dataset = dataset[~dataset["review_id"].isin(set(remove_reviews))]

dataset.to_csv("real_reviews.csv", sep="\t", index=False)

6.SYSTEM TESTING

6.1 TYPES OF TESTING

6.1.1 UNIT TESTING:

Unit testing is performed to test modules against the detailed design. Inputs to the
process are usually compiled modules from the coding process. The modules are
assembled into a larger unit during the unit testing process.

Testing has been performed on each phase of project design and coding. While testing, we
test the module interface to ensure the proper flow of information into and out of the
program unit. We examine the local data structures to make sure that temporarily stored
data maintains its integrity throughout the algorithm's execution. Finally, all error-handling
paths are also tested.
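As an illustration of unit testing a single module, the sketch below tests a hypothetical expand_contractions helper that mirrors part of the regular-expression preprocessing in getSentiment. It uses Python's built-in unittest module and can be run with python -m unittest.

```python
import re
import unittest


def expand_contractions(text):
    """Hypothetical helper mirroring part of the report's preprocessing step."""
    text = text.lower()
    text = re.sub(r"can't", "can not", text)
    text = re.sub(r"won't", "will not", text)
    text = re.sub(r"\s+", " ", text)  # collapse repeated whitespace
    return text.strip()


class ExpandContractionsTest(unittest.TestCase):
    def test_expands_known_contractions(self):
        self.assertEqual(expand_contractions("I Can't  wait"), "i can not wait")

    def test_leaves_plain_text_unchanged(self):
        self.assertEqual(expand_contractions("good phone"), "good phone")
```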

6.1.2 SYSTEM TESTING:

We usually perform system testing to find errors resulting from unanticipated interactions
between the sub-systems and system components. Once the source code is generated, the
software must be tested to detect and rectify all possible errors before delivering it to the
customers. To find errors, a series of test cases must be developed that ultimately
uncovers all the possibly existing errors. Different software techniques can be used for this
process. We test the software using two methods:

White Box testing: Internal program logic is exercised using test case design
techniques.
Black Box testing: Software requirements are exercised using test case design
techniques.

6.1.3 PERFORMANCE TESTING:

It is done to test the run-time performance of the software within the context of the
integrated system. These tests are carried out throughout the testing process. For
example, the performance of individual modules is assessed during white box testing
under unit testing.

6.1.4 VERIFICATION AND VALIDATION:

Verification and validation are two different things. One is performed to ensure that the
software correctly implements a specific functionality, and the other is done to check
whether the customer requirements are properly met by the end product. Verification asks
"are we building the product right?", while validation asks "are we building the
right product?"

6.2 TEST CASES:

A test case is a set of actions executed to verify a particular feature or functionality of your software
application. A test case has a set of test data, preconditions, and expected and actual results developed
for a specific test scenario to verify a requirement. A test case includes specific variables or conditions,
using which a test engineer can determine whether a software product is functioning as per the
requirements of the client or the customer.

A test scenario is defined as any functionality that can be tested. It is a collective set of test cases
which helps the testing team determine the positive and negative characteristics of the project. A test
scenario gives a high-level idea of what we need to test.

CONCLUSION

Nowadays technology is growing day by day, and many websites and applications are
available in the online market through which products are sold. These products have
millions of reviews, and most of the time users buy a product on the basis of its reviews.
Some organizations post fake reviews, on fake products or on genuine products, and users
get trapped.

Our application will help users pay for the right product without getting into any
scams. Our application will perform the analysis [2] and then post the genuine reviews on
genuine products, so users can trust both the products available on the application and
their reviews.

FUTURE SCOPE

Finding opinion spam in huge amounts of unstructured data has become an important research
problem. Business organizations, specialists and academics are now putting forward their efforts
and ideas to find the best system for opinion spam analysis. Although some of the algorithms used
in opinion spam analysis give good results, no algorithm yet resolves all the challenges and
difficulties faced today. More work and knowledge are needed to further improve the performance
of opinion spam analysis. In the future we will further investigate different kinds of features to
make more accurate predictions.
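To show what "different kinds of features" could look like, the following sketch computes a few simple textual and rating-based features for a single review. These features are word count, proportion of capital letters, exclamation marks, first-person pronoun ratio, and deviation from the product's average rating. They are assumptions chosen for illustration and are not the features used in this project. `review_features` is a hypothetical helper.

```python
import re
from statistics import mean
from typing import Dict, List

FIRST_PERSON = {"i", "me", "my", "mine", "we", "our"}


def review_features(text: str, rating: int, product_ratings: List[int]) -> Dict[str, float]:
    """Hypothetical feature extractor for one review (illustration only)."""
    words = re.findall(r"[A-Za-z']+", text)
    letters = [c for c in text if c.isalpha()]
    return {
        # Textual features
        "word_count": float(len(words)),
        "caps_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        "exclamations": float(text.count("!")),
        "first_person_ratio": (
            sum(w.lower() in FIRST_PERSON for w in words) / len(words) if words else 0.0
        ),
        # Rating feature: distance of this rating from the product's average rating
        "rating_deviation": abs(rating - mean(product_ratings)) if product_ratings else 0.0,
    }


features = review_features("BEST product EVER!!! I love it", 5, [2, 3, 2, 3])
print(features["word_count"], features["exclamations"], features["rating_deviation"])  # 6.0 3.0 2.5
```

In a complete system, each such dictionary would probably form one row of the feature matrix passed to a classifier. Adding or removing features is how different kinds of features could be compared.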

REFERENCES

1. E. Cambria, B. Schuller, Y. Xia and C. Havasi, “New avenues in opinion mining and sentiment
analysis”, IEEE Intelligent Systems, vol. 28, no. 2, pp. 15-21, 2013, doi:10.1109/MIS.2013.30.

2. Hamzah Al Najada and Xingquan Zhu, “iSRD: Spam review detection with imbalanced data
distributions”, Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and
Integration (IEEE IRI 2014), ISBN (e): 978-1-4799-5880-1.

3. Jeneen Interlandi, “The fake-food detectives”, Newsweek, February 8, 2010. Archived from the
original on October 21, 2010.

4. Shashank Kumar Chauhan, Anupam Goel, Prafull Goel, Avishkar Chauhan and Mahendra K Gurve,
“Research on product review analysis and spam review detection”, 4th International Conference on Signal
Processing and Integrated Networks (SPIN) 2017, ISBN (e): 978-1-5090-2797-2, September 2017,
pp. 1104-1109.

5. Ruxi Yin, Hanshi Wang and Lizhen Liu, “Research of integrated algorithm establishment of spam
detection system”, 4th International Conference on Computer Science and Network Technology
(ICCSNT) 2015, ISBN (e): 978-1-4673-8173-4, pp. 390-393.

6. SP. Rajamohana, Dr. K. Umamaheshwari, M. Dharani and R. Vedackshya, “A survey on online review
SPAM detection techniques”, International Conference on Innovations in Green Energy and Healthcare
Technologies (IGEHT) 2017, ISBN (e): 978-1-5090-5778-8.

7. Rajashree S. Jadhav and Prof. Deipali V. Gore, “A New Approach for Identifying Manipulated Online
Reviews using Decision Tree”, (IJCSIT) International Journal of Computer Science and Information
Technologies, Vol. 5 (2), pp. 1447-1450, 2014.

8. Rajashree S. Jadhav and Prof. Deipali V. Gore, “A New Approach for Identifying Manipulated Online
Reviews using Decision Tree”, (IJCSIT) International Journal of Computer Science and Information
Technologies, Vol. 5 (2), pp. 1447-1450, 2014.

9. Long-Sheng Chen and Jui-Yu Lin, “A study on Review Manipulation Classification using Decision Tree”,
IEEE conference publication, Kuala Lumpur, Malaysia, pp. 3-5, 2013.

10. Benjamin Snyder and Regina Barzilay, “Multiple Aspect Ranking using the Good Grief Algorithm”,
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 2007.

11. Ivan Titov, “A Joint Model of Text and Aspect Ratings for Sentiment Summarization”, Department of
Computer Science, University of Illinois at Urbana, 2011.

12. N. Jindal and B. Liu, “Analyzing and detecting review spam”, Seventh IEEE International Conference
on Data Mining (ICDM), 2007, pp. 547-552.

13. N. Jindal and B. Liu, “Opinion spam and analysis”, International Conference on Web Search and Data
Mining, 2008, pp. 219-230.
