Amazon Reviews Sentimental Analysis
Shruti Sharma1, Salis Kumar Nayak2, Ankita Nayak3, Swaroop Nayak4, Prativa Das5
Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan (Deemed to be)
University, Bhubaneswar, Odisha, India
Abstract: The goal of this project is to create a tool or algorithm that is effective
for analysing client feedback using sentiment analysis methods. The initiative
attempts to classify input, spot trends, and produce reports for development. It
uses Python and the NLTK package for text analysis and trains a sentiment
prediction model on labelled data. The technology created by the project
effectively analyses and classifies client input and produces recommendations
for improvement. Its scope is limited to text feedback from online marketplaces.
Future studies may examine more modalities and techniques for feedback
analysis. In general, the initiative helps businesses by offering a useful tool for
consumer feedback analysis, enabling changes to products and services.
Keywords: Machine learning, NLTK library, online marketplaces, algorithm
1 Introduction
The "Amazon Reviews Sentiment Analysis" project aims to automatically analyze the
consumer sentiment expressed in Amazon product and service reviews. It develops a
model that categorizes the sentiment of each review by preprocessing the data and
applying classification methods such as logistic regression and K-nearest neighbors.
The method helps companies learn crucial information about customer preferences and
potential areas for improvement, which in turn improves customer satisfaction and
competitiveness. By incorporating sentiment analysis into their feedback monitoring
systems, businesses can make data-driven decisions and use consumer feedback to
enhance their goods and services. Overall, the research provides a useful method for
drawing meaningful conclusions from Amazon user reviews.
1.1 Motivation
The project on Amazon sentiment analysis is motivated by the need to gain
valuable customer insights, achieve a competitive advantage, and make data-
driven decisions. By analyzing sentiment in Amazon reviews, businesses can
understand customer preferences, improve products and services, and
differentiate themselves from competitors. It also helps in managing brand
reputation, guiding research and development efforts, and obtaining market
intelligence. Overall, the project aims to harness the power of sentiment
analysis to enhance customer satisfaction, drive business growth, and stay
ahead in the dynamic e-commerce landscape.
1.2 Objectives
The objectives of the project on Amazon sentiment analysis are to develop an
accurate machine learning model for predicting sentiment in Amazon reviews,
gain insights into customer preferences and satisfaction levels, improve
products and services based on customer feedback, differentiate the business
from competitors using sentiment analysis, make data-driven decisions in
pricing, marketing, and product development, manage brand reputation by
addressing negative sentiment, guide research and development efforts based
on customer sentiment, obtain market intelligence through sentiment analysis,
and drive business growth and customer loyalty through an enhanced
understanding of customer sentiment.
1.3 Original Contributions
The goal of this project is to create a tool or algorithm that is effective for
analysing client feedback using sentiment analysis methods. The initiative
attempts to classify input, spot trends, and produce reports for development. It
uses Python and the NLTK package for text analysis and trains a sentiment
prediction model on labelled data. The technology created by the project
effectively analyses and classifies client input and produces recommendations
for improvement. Its scope is limited to text feedback from online marketplaces.
Future studies may examine more modalities and techniques for feedback
analysis. In general, the initiative helps businesses by offering a useful tool for
consumer feedback analysis, enabling changes to products and services.
1.4 Paper Layout
In the subsequent sections of this paper, we examine automated sentiment analysis
for Amazon reviews in more detail. Section 2 reviews the literature in this area,
Section 3 presents the proposed system, Section 4 covers the experiments,
evaluation, and related discussion, and Section 5 presents our conclusions and
future scope.
2 Literature Survey
In [1], the author focused on sentiment analysis of review comments on the Amazon.in
platform. To analyze the sentiment expressed in the comments, they used SVM (Support
Vector Machine) and NB (Naive Bayes) classifiers. The study found these methods
effective at identifying customer sentiment in Amazon reviews, with a sentiment
classification accuracy of 76%.
In [2], the authors compared Naive Bayes and Support Vector Machine (SVM), two
machine learning techniques, for analyzing the sentiment of Amazon book reviews.
They obtained a dataset of 147,000 reviews from Amazon.in and extracted features
using a TF-IDF vectorizer. The Naive Bayes classifier achieved an accuracy of
82.875%, demonstrating competitive performance on Amazon book reviews, while the SVM
classifier achieved an accuracy of 84%, suggesting that it classifies sentiment
polarity effectively.
In [3], the authors used a Naive Bayes classifier for sentiment evaluation, with
data taken from the UCI Repository. The study's 89% classification accuracy suggests
that the Naive Bayes method is effective at assessing sentiment.
In [4], the authors focused on dual sentiment analysis of Amazon product data for
books, mobile phones, and clothing. For sentiment analysis, they used Naive Bayes,
SVM, and the Bag-of-Words (BoW) model. The SVM classifier achieved an accuracy of
91%, demonstrating its efficacy in dual sentiment analysis, whereas the Naive Bayes
technique performed substantially worse, with an accuracy of only 66.8%.
3 Proposed System/Model
The proposed approach extracts useful insights into buyer sentiment from Amazon
reviews. The algorithm categorises reviews as positive or negative based on their
textual content. This allows businesses to understand customer feedback, pinpoint
areas for improvement, and make data-driven decisions. Because the system can
process a high volume of reviews efficiently, it enables firms to gain deeper
insight into consumer sentiment and improve their products or services accordingly.
3.1 Methodologies Used
This project's approach to sentiment analysis of Amazon reviews includes a number
of crucial components. First, Amazon review data is gathered and preprocessed,
which involves cleaning the text, deleting extraneous details, and putting the
data into a format that is appropriate for analysis.
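The preprocessing step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function name `preprocess` and the tiny `STOP_WORDS` set are assumptions made for the example, and a real pipeline would more likely take its stop-word list from NLTK.

```python
import re

# Hypothetical, deliberately tiny stop-word list; NLTK's English list is far larger.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of", "for", "i", "was"}

def preprocess(review: str) -> list[str]:
    """Lowercase a review, strip HTML tags and non-letters, and drop stop words."""
    text = re.sub(r"<[^>]+>", " ", review)          # remove HTML remnants such as <br />
    text = re.sub(r"[^a-z\s]", " ", text.lower())   # keep lowercase letters and whitespace
    return [tok for tok in text.split() if tok not in STOP_WORDS]

tokens = preprocess("This phone is GREAT!<br />Battery lasts 2 days.")
print(tokens)
```

Digits and punctuation are discarded here for simplicity; whether that is appropriate depends on the dataset, since ratings such as "5 stars" can carry sentiment.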
The textual input is then transformed into numerical representations using feature
extraction algorithms. The reviews' key characteristics and patterns are captured
using techniques like TF-IDF (Term Frequency-Inverse Document Frequency) or
word embeddings.
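To make the TF-IDF idea concrete, the sketch below computes weights by hand for three toy tokenized reviews. It assumes the plain definitions tf = count / document length and idf = ln(N / df); libraries such as scikit-learn apply smoothed and normalized variants, so their numbers will differ. The function name `tf_idf` and the toy documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = [["great", "phone"], ["bad", "phone"], ["great", "battery"]]
vectors = tf_idf(docs)
print(vectors[1])
```

In the second review, "bad" receives a higher weight than "phone" because "bad" occurs in only one review while "phone" occurs in two, which is the behaviour TF-IDF is designed to produce.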
After the features have been extracted, classification techniques are used to
categorize the reviews' sentiment. To create models that can correctly predict the
sentiment, popular methods like Logistic Regression, Naive Bayes, or KNN are
used. These models are trained using labelled data, where the sentiment of each
review is known, and are then used to categorize new, unseen reviews.
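As an illustration of training on labelled reviews and then categorizing unseen ones, the following minimal sketch implements multinomial Naive Bayes with add-one (Laplace) smoothing in plain Python. It is not the project's code, which according to the paper relies on scikit-learn; the function names `train_nb` and `predict_nb` and the four toy reviews are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {t for counts in word_counts.values() for t in counts}
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior for a tokenized review."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps unseen (word, class) pairs above zero.
                score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train_docs = [["great", "phone"], ["love", "battery"], ["bad", "screen"], ["terrible", "battery"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_nb(train_docs, train_labels)
prediction = predict_nb(model, ["great", "battery", "love"])
print(prediction)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together for long reviews.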
Programming languages like Python and packages like NLTK (Natural Language
Toolkit) and scikit-learn are used to implement the technique. These technologies
offer effective ways to preprocess data, extract features, and train models. The
project's goal is to automate the sentiment analysis process using this technique,
allowing companies to learn from Amazon reviews and make data-driven
decisions to enhance their goods and services.
Fig. 1. The above figure shows the methodology steps.
3.2 Schematic Layout of the proposed system
Fig. 2. The above figure gives the proposed model layout.
3.3 System Requirements
Operating system: The system should run on widely used operating systems,
including Windows, macOS, and Linux.
Programming Language: Python, which provides a large selection of libraries
and tools for natural language processing and machine learning, should be used
to construct the system.
Database: The gathered review data may be stored and managed using a
database management system like MySQL or MongoDB.
Data collection: Using web scraping techniques or the Amazon API, the system
should be able to scrape and gather Amazon reviews from a variety of product
categories.
Scalability and Performance: The system must be built to handle enormous
amounts of review data effectively and offer appropriate sentiment analysis
processing times.
Integration: To give a thorough view of customer attitudes and preferences, the
system may integrate with other corporate analytics tools or platforms.
3.4 Proposed Algorithms
K-Nearest Neighbours
K-Nearest Neighbours (KNN) is a straightforward yet effective machine learning
technique used for classification tasks such as sentiment analysis. KNN
categorizes a new data point according to its closeness to the labelled data
points in the training set. The distances between the new data point and all the
labelled data points are calculated, and the K nearest neighbours are selected.
The new data point's class is then decided by a majority vote among those K
neighbours. Because KNN is a non-parametric approach that makes no assumptions
about the data distribution, it is a flexible technique that suits sentiment
analysis tasks where the underlying sentiment patterns may vary.
Fig. 3. KNN Algorithm
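The KNN procedure described above (compute distances, select the K nearest labelled points, take a majority vote) can be sketched in a few lines of plain Python. The function name `knn_predict`, the choice of Euclidean distance, and the two-dimensional feature vectors are assumptions for illustration; real review vectors would have one dimension per vocabulary term.

```python
import math
from collections import Counter

def knn_predict(train_vectors, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every labelled training point.
    distances = [
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors, for example two TF-IDF weights per review.
train_vectors = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.0), (0.1, 0.9), (0.2, 0.8)]
train_labels = ["positive", "positive", "positive", "negative", "negative"]
label = knn_predict(train_vectors, train_labels, query=(0.75, 0.15), k=3)
print(label)
```

An odd value of K is a common choice for two-class problems because it avoids tied votes.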
Logistic Regression
Logistic regression is a common classification approach in machine learning,
including sentiment analysis applications. It models the relationship between
the independent variables and a binary or categorical dependent variable by
estimating class probabilities. Rather than using a linear function directly,
logistic regression passes a linear combination of the input variables through
the logistic (sigmoid) function, which yields a probability between 0 and 1.
To reduce the discrepancy between the predicted probabilities and the actual
class labels, the method optimizes the coefficients of the independent
variables. Logistic regression is widely used because of its simplicity and
interpretability. It is a linear classifier, so capturing non-linear
relationships in the data requires suitably transformed features. As it can
estimate the likelihood of positive or negative sentiment from the input
variables, it is well suited to sentiment analysis.
Fig. 4. Graph showing the logistic algorithm implementation
A well-defined decision boundary separates the classes, so the class of a new
input can be predicted from the side of the boundary on which it falls.
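The following sketch shows one way logistic regression can be fitted: batch gradient descent on the log loss, with the sigmoid function turning a weighted sum of the inputs into a probability. It is a toy illustration under stated assumptions, not the project's implementation; the names `train_logistic` and `predict_proba`, the learning rate, the epoch count, and the four data points are all made up for the example.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error: predicted probability minus the true label (0 or 1).
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += error * xj
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability that x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical 2-D features; label 1 = positive sentiment, 0 = negative.
X = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
p_pos = predict_proba(w, b, (0.85, 0.15))
p_neg = predict_proba(w, b, (0.15, 0.85))
print(round(p_pos, 3), round(p_neg, 3))
```

A review is assigned the positive class when its predicted probability exceeds 0.5, which corresponds to the linear decision boundary where the weighted sum plus bias equals zero.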
4 Experimentation and Model Evaluation
4.1 Depiction Results
We used several evaluation measures to assess the predictive performance of
our system.
Accuracy: The sentiment analysis model achieves an accuracy of 87% with
Logistic Regression and 89% with KNN, meaning that the better-performing model,
KNN, correctly predicts the sentiment of a review 89% of the time.
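For reference, accuracy and precision can be computed from true and predicted labels as in the sketch below. The label lists are invented purely to exercise the functions and do not reproduce the paper's results; the function names are likewise assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision(y_true, y_pred, positive="positive"):
    """Of all reviews predicted positive, the fraction that really are positive."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0  # no positive predictions, so precision is reported as 0 here
    return sum(t == positive for t in predicted_pos) / len(predicted_pos)

# Made-up labels for illustration only.
y_true = ["positive", "positive", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "positive", "positive"]
acc = accuracy(y_true, y_pred)    # 3 of 5 predictions are correct
prec = precision(y_true, y_pred)  # 2 of 3 predicted positives are correct
print(acc, prec)
```

Accuracy alone can be misleading when one class dominates the dataset, which is why precision is worth reporting alongside it.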
Comparative Analysis: Comparing sentiment analysis performance across product
categories shows that the model achieves higher accuracy and precision for
Jewellery and Beauty (90%) than for Electronics and Gadgets (75%).
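A per-category comparison of this kind can be obtained by grouping predictions by product category before scoring them. The sketch below assumes each record is a hypothetical `(category, true_label, predicted_label)` tuple; the category names and labels are invented and are not the paper's data.

```python
from collections import defaultdict

def accuracy_by_category(records):
    """Return {category: accuracy} for (category, true_label, predicted_label) records."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, truth, predicted in records:
        totals[category] += 1
        correct[category] += int(truth == predicted)
    return {cat: correct[cat] / totals[cat] for cat in totals}

# Made-up records for illustration only.
records = [
    ("beauty", "positive", "positive"),
    ("beauty", "negative", "negative"),
    ("electronics", "positive", "negative"),
    ("electronics", "negative", "negative"),
]
per_category = accuracy_by_category(records)
print(per_category)
```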
Fig. 5. Histogram of a sample of Amazon reviews
4.2 Validation/System Performance Evaluation
System Performance Evaluation: Examining the sentiment distribution of the dataset
yields useful information for validation and system performance evaluation.
According to the data, 89% of the reviews show positive sentiment, indicating a
high degree of customer satisfaction. To improve overall performance and user
experience, it is nonetheless important to address the 10% of unfavorable
reviews. Knowing this sentiment distribution makes it easier to assess and
validate the system's performance.
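A sentiment distribution like the one discussed above can be computed by counting labels and converting the counts to percentages. The sketch below uses an invented list of ten labels, so its percentages are illustrative and unrelated to the figures reported in this section; the function name is an assumption.

```python
from collections import Counter

def sentiment_distribution(labels):
    """Return the percentage share of each sentiment label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100.0 * n / total for label, n in counts.items()}

# Made-up labels for illustration only.
labels = ["positive"] * 8 + ["negative"] * 2
dist = sentiment_distribution(labels)
print(dist)
```

Checking the distribution before training is also useful because a heavily skewed dataset lets a model score high accuracy simply by predicting the majority class.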
Fig. 9. Graph of the evaluation under testing against all reviews.
4.3 Discussions on Contributions
Each team member was essential in various elements of the project, contributing
their own talents and knowledge. They made contributions in the areas of data
gathering, preprocessing, choosing an algorithm, creating a model, and outcome
analysis. The team's successful implementation of an efficient sentiment analysis
system for Amazon reviews was the result of teamwork. The discussion focuses
on the precise contributions made by each team member, showcasing both their
unique skills and the collective effect of their work. This section highlights the
team's overall contribution to attaining the project's goals and highlights the
importance of working together to complete challenging tasks.
5 Conclusion and Future Scope
In conclusion, the sentiment analysis of Amazon reviews was successful in capturing
consumer opinion. To classify sentiment reliably, the team created a solid
framework employing preprocessing, feature extraction, and classification algorithms.
The analysis gave firms useful information for improving goods and services in
accordance with customer preferences. Looking towards the future, there are several
areas of potential expansion and enhancement:
Extension to social media and online review sites: Including more platforms
in the study will give a clearer picture of overall customer attitudes.
Enhanced language processing: Giving the model the capacity to handle multiple
languages, visual content, and the subtleties of human language will increase
its accuracy and applicability.
Recommendation systems: Investigating the use of sentiment analysis in product
recommendation systems would allow for personalised suggestions based on
consumer sentiment, improving the overall customer experience.
Thanks to these developments, customer happiness will be further increased, and
business growth will be fueled by insightful information from Amazon review
sentiment analysis.
References
[1] Juyal, Prachi. "Sentimental analysis of amazon customers based on their review
comments." 2022 International Interdisciplinary Humanitarian Conference for
Sustainability (IIHC). IEEE, 2022.
[2] Dey, Sanjay, et al. "A comparative study of support vector machine and Naive
Bayes classifier for sentiment analysis on Amazon product reviews." 2020
International Conference on Contemporary Computing and Applications (IC3A).
IEEE, 2020.
[3] Surya, Prabha PM, and B. Subbulakshmi. "Sentimental analysis using Naive
Bayes classifier." 2019 International conference on vision towards emerging trends in
communication and networking (ViTECoN). IEEE, 2019.
[4] D’souza, Stephina Rodney, and Kavita Sonawane. "Sentiment analysis based on
multiple reviews by using machine learning approaches." 2019 3rd International
Conference on Computing Methodologies and Communication (ICCMC). IEEE,
2019.