Sentiment Analysis On Movie Reviews: Natural Language Processing UML602 Project Report
Project Report
Submitted by:
Submitted to:
1. INTRODUCTION
Sentiment analysis means analyzing the sentiment of a given text or document and assigning it to a specific class or category, such as positive or negative. In this project the classification is binary: every review is labeled either positive or negative. By definition, sentiment analysis refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is also referred to as opinion mining, and it is most commonly applied to social media posts and customer-review data.
Figure 1.1
1.2 Libraries used
NLTK is a leading platform for building Python programs to work with human language data. It
provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along
with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing,
and semantic reasoning.
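The NLTK resources used later in the project have to be fetched once with NLTK's downloader; as a quick sketch:

import nltk

# One-off downloads of the corpora/resources used in this project.
nltk.download('movie_reviews')
nltk.download('stopwords')
nltk.download('punkt')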
2. STEPS OF WORKING
In this project, NLTK's movie_reviews corpus is used as the labeled training data. The movie_reviews corpus contains 2,000 movie reviews annotated with sentiment polarity: 1,000 positive and 1,000 negative. The two categories for classification are positive and negative. Since the corpus already provides these labels, a supervised classification technique is used, in which the classifier is trained on labeled training data.
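As a minimal sketch (variable names are illustrative, not the report's exact code), the labeled documents can be assembled directly from the corpus:

from nltk.corpus import movie_reviews

# One (list_of_words, label) pair per review; the labels are 'pos' and 'neg'.
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]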
The figure below depicts the workflow followed while training and testing the model.
Figure 2.2
2.1 Pre-processing of data
Three different approaches are used to pre-process the data and build feature sets, with the aim of maximizing training and testing accuracy:
a document feature built from the top-N most frequently occurring words in the corpus
bag_of_words: extracts only unigram features from the movie-review words
bag_of_ngrams: extracts bigram features, which are combined with the unigram features
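The two extractor functions are shown only as figures in the report; as an illustrative sketch (the function bodies are assumptions, not the report's exact code), they might look like this:

from nltk import bigrams

def bag_of_words(words):
    # Unigram features: every word maps to True.
    return {word: True for word in words}

def bag_of_ngrams(words):
    # Bigram features combined with the unigram features.
    features = bag_of_words(words)
    features.update({bigram: True for bigram in bigrams(words)})
    return features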
2.2 Training of model
The model is trained using NLTK's Naïve Bayes classifier, which is built into the module. It is a simple, fast classifier that performs well on small datasets. It is a probabilistic classifier based on applying Bayes' theorem, which describes the probability of an event based on prior knowledge of conditions that might be related to the event.
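For reference, this is the relationship the classifier relies on, written for a class c and features f_1, ..., f_n under the naïve assumption that the features are conditionally independent given the class:

P(c \mid f_1, \dots, f_n) \propto P(c) \prod_{i=1}^{n} P(f_i \mid c)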
2.3 Testing of model
The model's accuracy is tested on data held out from the corpus as well as on custom reviews entered by the user.
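As a rough sketch of the custom-input test (the custom_review text is only an example; bag_of_words and classifier are the objects defined in the code section below):

from nltk import word_tokenize

custom_review = "The movie was absolutely wonderful, with a gripping plot."
features = bag_of_words(word_tokenize(custom_review.lower()))
print(classifier.classify(features))  # prints 'pos' or 'neg'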
3. CODE
Pre-processing of data
The code shown below builds a frequency distribution of all the words in the corpus and removes stop words and punctuation from the text; the resulting cleaned words are added to a new list.
Figure 3.1
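Figure 3.1 is not reproduced here; a minimal sketch of what this pre-processing step might look like (variable names are assumptions):

import string
from nltk import FreqDist
from nltk.corpus import movie_reviews, stopwords

stop_words = set(stopwords.words('english'))
punctuation = set(string.punctuation)

# Drop stop words and punctuation, keep the remaining words in a new list.
cleaned_words = [word.lower() for word in movie_reviews.words()
                 if word.lower() not in stop_words and word not in punctuation]

# Frequency distribution of all cleaned words in the corpus.
all_words_freq = FreqDist(cleaned_words)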
Creating document feature using top-N occurring words
The code shown below creates the document feature using the 2,000 most frequently occurring words, then trains the model with the Naïve Bayes classifier and prints the accuracy of the model.
Figure 3.2
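Figure 3.2 is not reproduced here; a sketch of this step under the assumptions above (the 2,000-word vocabulary comes from the report, while the 200-review test split is an illustrative choice):

import random
from nltk import NaiveBayesClassifier, classify

# Feature vocabulary: the 2,000 most frequent cleaned words.
word_features = [word for word, _ in all_words_freq.most_common(2000)]

def document_features(document_words):
    words = set(document_words)
    return {word: (word in words) for word in word_features}

feature_sets = [(document_features(words), label) for words, label in documents]
random.shuffle(feature_sets)
train_set, test_set = feature_sets[200:], feature_sets[:200]

classifier = NaiveBayesClassifier.train(train_set)
print(classify.accuracy(classifier, test_set))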
Creating feature sets using the bag of words method
The code shown below collects the positive and negative reviews into separate lists, which allows the positive and negative data to be pre-processed separately.
Figure 3.3
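Figure 3.3 is not reproduced here; as an illustrative sketch, the reviews can be split into positive and negative lists and turned into labeled feature sets with the bag_of_words extractor defined earlier:

from nltk.corpus import movie_reviews

pos_reviews = [movie_reviews.words(fileid) for fileid in movie_reviews.fileids('pos')]
neg_reviews = [movie_reviews.words(fileid) for fileid in movie_reviews.fileids('neg')]

# One labeled feature set per review, using only unigram features.
pos_features = [(bag_of_words(words), 'pos') for words in pos_reviews]
neg_features = [(bag_of_words(words), 'neg') for words in neg_reviews]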
Bi-Gram Feature
In bag of words feature extraction, we used only unigrams. In the example below, we use both unigram and bigram features, i.e. we deal with both single words and word pairs.
Figure 3.4
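Figure 3.4 is not reproduced here; reusing the bag_of_ngrams extractor sketched earlier, the combined unigram + bigram feature sets might be built like this:

# Combined unigram + bigram features for each review.
pos_features_ngrams = [(bag_of_ngrams(words), 'pos') for words in pos_reviews]
neg_features_ngrams = [(bag_of_ngrams(words), 'neg') for words in neg_reviews]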
Training the model
After pre-processing, NLTK's Naïve Bayes classifier is trained on the created feature sets.
Figure 3.5
Figure 3.6
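Figures 3.5 and 3.6 are not reproduced here; a sketch of the training step on the combined feature sets (the shuffle and the 200-review test split are illustrative assumptions):

import random
from nltk import NaiveBayesClassifier, classify

feature_sets = pos_features_ngrams + neg_features_ngrams
random.shuffle(feature_sets)
train_set, test_set = feature_sets[200:], feature_sets[:200]

# Train the Naive Bayes classifier and report accuracy on the held-out set.
classifier = NaiveBayesClassifier.train(train_set)
print(classify.accuracy(classifier, test_set))
classifier.show_most_informative_features(10)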
4. RESULTS
Figure 4.1
We can see that custom negative reviews are categorized accurately, but in the case of custom positive reviews we get inaccurate results.
In the top-N feature approach, we only used the top 2,000 words in the feature set.
We combined the positive and negative reviews into a single list, randomized the list, and then separated the train and test sets.
This approach can result in an uneven distribution of positive and negative reviews across the train and test sets.
Bag of words Feature –
Figure 4.2
Using the bag of words feature we now get appropriate results on the custom test reviews, but the overall accuracy of the model decreases to 70%.
Bi-gram Feature –
Figure 4.3
The accuracy of the classifier increased significantly when it was trained with the combined (unigram + bigram) feature set, rising to 77%.
5. APPLICATIONS & FUTURE SCOPE
5.1 Brand Monitoring - also called reputation management. A good reputation matters greatly these days, when the majority of us check social media reviews as well as review sites before making a purchase decision.
5.2 Customer support - Social media are now primary channels of communication with your customers, and whenever they are unhappy about something related to you, whether or not it is your fault, they will call you out on Facebook, Twitter, or Instagram.
Such mentions will appear in your dashboard flagged in red, and it is best to start engaging with them as soon as they appear.
People nowadays expect brands to respond on social media almost immediately, and if you are not quick enough, you may well see them moving on to your competitors instead of waiting for your reply.