Maneesha Nidigonda Verzeo Major Project

This document provides a summary of a machine learning project on sentiment analysis using Twitter data. The project uses various machine learning classifiers like RNNs to analyze sentiment in tweets. It extracts user sentiment and opinions from Twitter posts to analyze trends in tweet languages and volumes. Experimental results show the machine learning models achieve good accuracy for sentiment classification. The project is implemented in Python using natural language processing and machine learning techniques.


VERZEO

MACHINE
LEARNING JUNE
MAJOR PROJECT

PRESENTED BY:
MANEESHA NIDIGONDA
ABSTRACT

With the rise and continued growth of social networking, the Internet has become a
promising platform for online learning, exchanging ideas, and sharing opinions.
Social media contain a huge amount of sentiment data in the form of tweets,
blogs, status updates, posts, and so on. In this paper, the most popular
microblogging platform, Twitter, is used. Twitter sentiment analysis is the
application of sentiment analysis to data from Twitter (tweets) in order to extract
users' opinions and sentiments. The main goal is to explore how text analysis
techniques can be used to dig into some of the data in a series of posts, focusing on
different trends in tweet languages and tweet volumes on Twitter. Experimental
evaluations show that the proposed machine learning classifiers are efficient and
perform better in terms of accuracy. The proposed algorithm is implemented in Python.
Keywords – Machine Learning, Natural Language Processing, Python, Sentiment
Analysis
SENTIMENT ANALYSIS

Sentiment Analysis (SA) is an ongoing field of research in text mining. SA is the
computational treatment of opinions, sentiments, and subjectivity in text. This survey
paper gives a comprehensive overview of the latest developments in this field. Many
recently proposed algorithm enhancements and various SA applications are
investigated and presented briefly in this survey. These articles are categorized
according to their contributions to the various SA techniques. The fields related to SA
(transfer learning, emotion detection, and building resources) that have recently
attracted researchers are discussed. The main target of this survey is to give a nearly
complete picture of SA techniques and the related fields, with brief details. The main
contributions of this paper include the sophisticated categorization of a large
number of recent articles and the illustration of the recent trend of research in
sentiment analysis and its related areas.
1. Introduction
Sentiment Analysis (SA) or Opinion Mining (OM) is the computational study of
people’s opinions, attitudes, and emotions toward an entity. The entity can represent
individuals, events, or topics. These topics are most likely to be covered by reviews.
The two expressions, SA and OM, are often used interchangeably to express the same
meaning. However, some researchers have stated that OM and SA have slightly different
notions [1]. Opinion Mining extracts and analyses people’s opinions about an entity,
while Sentiment Analysis identifies the sentiment expressed in a text and then
analyses it. Therefore, the target of SA is to find opinions, identify the sentiments
they express, and then classify their polarity, as shown in Fig. 1.

Figure 1. Sentiment analysis process on product reviews.

Sentiment Analysis can be considered a classification process, as illustrated in Fig. 1.
There are three main classification levels in SA: document-level, sentence-level, and
aspect-level SA. Document-level SA aims to classify an opinion document as expressing a
positive or negative opinion or sentiment. It considers the whole document a basic
information unit (talking about one topic). Sentence-level SA classifies the sentiment
expressed in each sentence. Aspect-level SA is needed because opinion holders can give
different opinions for different aspects of the same entity, as in the sentence “The
voice quality of this phone is not good, but the battery life is long”. This survey
tackles the first two kinds of SA.
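
To make the difference between the levels concrete, here is a minimal sketch that scores the phone sentence quoted above clause by clause and then as a whole. The tiny lexicon, the negation rule, and the split on "but" are all illustrative assumptions, not a method taken from the survey; real systems use much larger resources or learned models.

```python
import re

# Hypothetical toy lexicon and negator list, for illustration only.
LEXICON = {"good": 1, "long": 1, "bad": -1, "poor": -1}
NEGATORS = {"not", "never", "no"}

def clause_score(clause):
    """Sum word polarities, flipping the first lexicon word after a negator."""
    score, negate = 0, False
    for word in re.findall(r"[a-z']+", clause.lower()):
        if word in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(word, 0)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    return score

def label(score):
    """Turn a signed score into a coarse polarity label."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

sentence = "The voice quality of this phone is not good, but the battery life is long"
# Treat each side of "but" as a separate opinion about a separate aspect.
clauses = [c.strip() for c in re.split(r",?\s*\bbut\b\s*", sentence)]
per_clause = [(c, label(clause_score(c))) for c in clauses]
whole_sentence = label(sum(clause_score(c) for c in clauses))

for clause, polarity in per_clause:
    print(f"{polarity:8} | {clause}")
print("whole sentence:", whole_sentence)
```

Scored as one unit, the two opposite opinions cancel out, which is the information that coarser classification levels lose and aspect-level SA tries to keep.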

The data sets used in SA are an important issue in this field. The main sources of data
are product reviews. These reviews are important to business holders,
as they can take business decisions according to the analysis results of users’
opinions about their products. The review sources are mainly review sites.

Sentiment Analysis Dataset

The dataset we use for sentiment analysis is the Internet Movie Database (IMDb)
review dataset, which contains 50,000 movie reviews from all over the world.
It is a binary classification dataset that labels each review as positive or
negative, with 25,000 samples for training and 25,000 for testing.
You don’t need to download it separately for this project, but you can have a
look at it on its official website. Because it is a text dataset, it is very
lightweight, at around 80 MB.
We are going to code all this up in a Jupyter notebook on Google Colab to make
use of the free GPU. If you follow along on your own system, everything will be
pretty much the same except for mounting Google Drive for use as a
persistent storage option.

• So we begin by mounting our Google Drive and navigating to the folder
where we have to work.

    import os
    from google.colab import drive

    # Mount Google Drive so files persist across Colab sessions.
    drive.mount('/content/drive')
    os.chdir('/content/drive/My Drive/Data Flair/Sentiment')

    !ls
Preparation of data
We are going to use Python for this project, and luckily its ecosystem provides
some functionality that helps speed up our work.
The torchtext library is a great tool for NLP projects. It has loaders for some
common NLP datasets, like the one we are going to use today, and also a complete
pipeline that abstracts the vectorization of data, data loaders, and iteration over
data.
    import random
    import torch
    from torchtext.legacy import data
    from torchtext.legacy import datasets

    # Fix the seeds so that runs are repeatable.
    seed = 42
    torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Text field: spaCy tokenization, and keep each review's length for packing later.
    txt = data.Field(tokenize='spacy',
                     tokenizer_language='en_core_web_sm',
                     include_lengths=True)
    labels = data.LabelField(dtype=torch.float)

    # IMDb ships with a train/test split; carve a validation set out of train.
    train_data, test_data = datasets.IMDB.splits(txt, labels)
    # random.seed() returns None; the call seeds Python's RNG before the split.
    train_data, valid_data = train_data.split(random_state=random.seed(seed))

    # Keep the 25,000 most frequent words and attach pre-trained GloVe vectors.
    num_words = 25_000
    txt.build_vocab(train_data,
                    max_size=num_words,
                    vectors="glove.6B.100d",
                    unk_init=torch.Tensor.normal_)
    labels.build_vocab(train_data)
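
The effect of capping the vocabulary with max_size can be seen in a small standard-library sketch. This is an illustration of the idea, not torchtext's implementation: build_vocab here is a hypothetical helper, and the three-sentence corpus is made up. The most frequent tokens get their own ids, and every other word falls back to the unknown token.

```python
from collections import Counter

def build_vocab(tokenized_texts, max_size, specials=("<unk>", "<pad>")):
    """Map the max_size most frequent tokens to integer ids; specials come first."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    itos = list(specials) + [tok for tok, _ in ranked[:max_size]]
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return stoi, itos

# Made-up toy corpus, already tokenized.
corpus = [
    "a great great film".split(),
    "a dull film".split(),
    "great acting".split(),
]
stoi, itos = build_vocab(corpus, max_size=3)

# Words outside the capped vocabulary map to the <unk> id.
unk = stoi["<unk>"]
encoded = [stoi.get(tok, unk) for tok in "a great dull plot".split()]
print(itos)
print(encoded)
```

Because the special tokens are added on top of the cap, the resulting vocabulary is slightly larger than max_size.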

Here we have downloaded the IMDb dataset for Python sentiment analysis
and divided it into train, test, and validation splits. The dataset is already
divided into a train and a test set; we further create a validation set from the
train set. We also limit the number of words the model will learn to 25,000. This
chooses the 25,000 most used words from the dataset and uses them for
training, significantly reducing the work of the model without any real loss
in accuracy.

    btch_size = 64

    # Bucket reviews of similar length together to keep padding small.
    train_itr, valid_itr, test_itr = data.BucketIterator.splits(
        (train_data, valid_data, test_data),
        batch_size=btch_size,
        sort_within_batch=True,
        device=device)

    import torch.nn as nn

    class RNN(nn.Module):
        def __init__(self, word_limit, dimension_embedding, dimension_hidden,
                     dimension_output, num_layers, bidirectional, dropout, pad_idx):
            super().__init__()
            self.embedding = nn.Embedding(word_limit, dimension_embedding,
                                          padding_idx=pad_idx)
            self.rnn = nn.LSTM(dimension_embedding, dimension_hidden,
                               num_layers=num_layers,
                               bidirectional=bidirectional,
                               dropout=dropout)
            # The factor 2 accounts for the forward and backward hidden states.
            self.fc = nn.Linear(dimension_hidden * 2, dimension_output)
            self.dropout = nn.Dropout(dropout)

        def forward(self, text, length_txt):
            embedded = self.dropout(self.embedding(text))
            # Pack the batch so the LSTM skips padding; lengths must be on the CPU.
            packed_embedded = nn.utils.rnn.pack_padded_sequence(
                embedded, length_txt.to('cpu'))
            packed_output, (hidden, cell) = self.rnn(packed_embedded)
            output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)
            # Concatenate the last layer's final forward and backward hidden states.
            hidden = self.dropout(
                torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1))
            return self.fc(hidden)

We define the parameters for the Python sentiment analysis model and pass them to
an instance of the model class we just defined. We set the input (vocabulary)
size, the embedding and hidden dimensions, and the output dimension, along with
the dropout rate and the bidirectionality flag.

    dimension_input = len(txt.vocab)
    dimension_embedding = 100
    dimension_hidden = 256
    dimension_out = 1
    layers = 2
    bidirectional = True
    dropout = 0.5
    idx_pad = txt.vocab.stoi[txt.pad_token]

    model = RNN(dimension_input, dimension_embedding, dimension_hidden,
                dimension_out, layers, bidirectional, dropout, idx_pad)

Now we print some details about our model, namely the number of trainable
parameters present in the model.
We then get the pre-trained embedding weights and copy them to our model
so that it does not need to learn the embeddings and can directly focus on the
job at hand, that is, learning the sentiments related to those embeddings.

    def count_parameters(model):
        return sum(p.numel() for p in model.parameters() if p.requires_grad)

    print(f'The model has {count_parameters(model):,} trainable parameters')

    pretrained_embeddings = txt.vocab.vectors
    print(pretrained_embeddings.shape)

    # Copy the GloVe vectors into the embedding layer, as described above.
    model.embedding.weight.data.copy_(pretrained_embeddings)

    # Zero the vectors of the unknown and padding tokens.
    unique_id = txt.vocab.stoi[txt.unk_token]
    model.embedding.weight.data[unique_id] = torch.zeros(dimension_embedding)
    model.embedding.weight.data[idx_pad] = torch.zeros(dimension_embedding)
    print(model.embedding.weight.data)
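
As a cross-check on the printed parameter count, the total can be worked out by hand from the layer sizes. The sketch below assumes the hyperparameters given above (embedding 100, hidden 256, 2 bidirectional layers, 1 output), assumes the usual four-gate LSTM layout with two bias vectors per gate as in torch.nn.LSTM, and assumes a vocabulary of 25,002 entries (25,000 words plus the unknown and padding tokens). lstm_params is a hypothetical helper written for this illustration.

```python
def lstm_params(input_size, hidden_size, num_layers, bidirectional):
    """Count weights and biases of a stacked LSTM laid out as in torch.nn.LSTM."""
    directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Layers after the first read the concatenated outputs of both directions.
        layer_in = input_size if layer == 0 else hidden_size * directions
        # Four gates, each with input weights, recurrent weights and two biases.
        per_direction = 4 * hidden_size * (layer_in + hidden_size + 2)
        total += per_direction * directions
    return total

vocab_size = 25_002            # assumption: 25,000 words + <unk> + <pad>
embedding = vocab_size * 100   # one 100-dimensional vector per vocabulary entry
lstm = lstm_params(100, 256, num_layers=2, bidirectional=True)
linear = 256 * 2 * 1 + 1       # weights from both directions plus one bias

total = embedding + lstm + linear
print(f"embedding={embedding:,} lstm={lstm:,} linear={linear:,} total={total:,}")
```

Under these assumptions the embedding layer holds more than half of all parameters, which is one reason initializing it from pre-trained vectors helps.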

    import torch.optim as optim

    optimizer = optim.Adam(model.parameters())
    # Binary cross-entropy that takes raw logits (applies the sigmoid internally).
    criterion = nn.BCEWithLogitsLoss()

    model = model.to(device)
    criterion = criterion.to(device)

    def bin_acc(preds, y):
        # Squash logits to probabilities, round to 0 or 1, and compare with labels.
        predictions = torch.round(torch.sigmoid(preds))
        correct = (predictions == y).float()
        acc = correct.sum() / len(correct)
        return acc
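
The accuracy computation can be traced by hand with a few numbers. The following is a standard-library sketch of the same arithmetic, not the tensor code itself; the logits and labels are made up for illustration. It uses a strict > 0.5 threshold on the assumption that torch.round sends a probability of exactly 0.5 to 0.

```python
import math

def sigmoid(z):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_accuracy(logits, targets):
    """Fraction of examples whose thresholded probability matches the 0/1 label."""
    predictions = [1.0 if sigmoid(z) > 0.5 else 0.0 for z in logits]
    correct = sum(p == y for p, y in zip(predictions, targets))
    return correct / len(targets)

# Made-up logits and labels for four reviews.
logits = [2.0, -1.5, 0.3, -0.2]
targets = [1.0, 0.0, 0.0, 0.0]

acc = binary_accuracy(logits, targets)
print(acc)  # three of the four predictions match, so 0.75
```

A positive logit always maps to a probability above 0.5, so thresholding the probability is the same as checking the sign of the logit.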

We define the functions for training and evaluating the model. The process
here is standard. We loop over the number of epochs, and the
number of iterations in each epoch follows from the batch size that we
defined. We pass the text to the model, get the predictions from it, calculate
the loss for each iteration, and then backpropagate that loss.
The only major change in the evaluation function compared with the training
function is that we do not backpropagate the loss through the model: we switch the
model to evaluation mode and wrap the loop in torch.no_grad(), which disables
gradient tracking while evaluating.

    def train(model, itr, optimizer, criterion):
        epoch_loss = 0
        epoch_acc = 0
        model.train()
        for batch in itr:
            optimizer.zero_grad()
            text, length_txt = batch.text
            predictions = model(text, length_txt).squeeze(1)
            loss = criterion(predictions, batch.label)
            acc = bin_acc(predictions, batch.label)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
            epoch_acc += acc.item()
        return epoch_loss / len(itr), epoch_acc / len(itr)

    def evaluate(model, itr, criterion):
        epoch_loss = 0
        epoch_acc = 0
        model.eval()
        with torch.no_grad():
            for batch in itr:
                text, length_txt = batch.text
                predictions = model(text, length_txt).squeeze(1)
                loss = criterion(predictions, batch.label)
                acc = bin_acc(predictions, batch.label)
                epoch_loss += loss.item()
                epoch_acc += acc.item()
        return epoch_loss / len(itr), epoch_acc / len(itr)

We build a helper function, epoch_time, for calculating the time each epoch
takes to complete its run and printing it. We set the number of epochs to 5 and
then begin our training, recording the training and validation loss at each stage
in case we need to understand or plot the training curve at a later point. We save
the Python sentiment analysis model that has the best validation loss.
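
The loop described above can be sketched as follows. This is a minimal illustration of the control flow rather than the notebook's exact code: run_training and its three callables are hypothetical names, and the loss values fed in at the bottom are made up so that the sketch runs without a model.

```python
import time

def epoch_time(start, end):
    """Split an elapsed duration in seconds into whole minutes and seconds."""
    elapsed = int(end - start)
    return elapsed // 60, elapsed % 60

def run_training(train_step, eval_step, save_best, n_epochs=5):
    """Run n_epochs, record losses, and checkpoint on each new best validation loss."""
    history = []
    best_valid_loss = float("inf")
    for epoch in range(n_epochs):
        start = time.time()
        train_loss = train_step()
        valid_loss = eval_step()
        mins, secs = epoch_time(start, time.time())
        # Keep only the checkpoint with the lowest validation loss seen so far.
        if valid_loss < best_valid_loss:
            best_valid_loss = valid_loss
            save_best()
        history.append((train_loss, valid_loss))
        print(f"Epoch {epoch + 1:02} | {mins}m {secs}s | "
              f"train loss {train_loss:.3f} | valid loss {valid_loss:.3f}")
    return history, best_valid_loss

# Stand-in callables with made-up losses, only to show the control flow.
fake_train = iter([0.65, 0.55, 0.45, 0.40, 0.35])
fake_valid = iter([0.60, 0.50, 0.52, 0.48, 0.49])
saves = []
history, best = run_training(
    train_step=lambda: next(fake_train),
    eval_step=lambda: next(fake_valid),
    save_best=lambda: saves.append("checkpoint"),
)
```

In the notebook itself, train_step and eval_step would presumably wrap the train and evaluate functions defined earlier, and save_best would call torch.save on model.state_dict() with the checkpoint file name.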
We load the saved checkpoint of the model and test it on the test set that we
created earlier. During the dry run of the Python sentiment analysis model, we
achieved a decent accuracy score of 85.83%.

    model.load_state_dict(torch.load('tut2-model.pt'))
    test_loss, test_acc = evaluate(model, test_itr, criterion)
    print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')

We can also check the model on our own data. The model is trained on movie
reviews labelled positive or negative, and its output score is then bucketed into
positive, neutral, and negative, so we pass it comparable data for checking. For
that we import and load spaCy to tokenize the data we give to the model. In the
beginning, while defining the preprocessing, we used spaCy through torchtext, but
here we are not using batches, and the preprocessing that we need to do can be
handled by the spaCy library directly. We define a prediction function, pred, for
this. After the preprocessing, we convert the input into tensors ready to be
passed to the model.

    import spacy
    nlp = spacy.load('en_core_web_sm')

    def pred(model, sentence):
        model.eval()
        tokenized = [tok.text for tok in nlp.tokenizer(sentence)]
        # Map tokens to vocabulary ids.
        indexed = [txt.vocab.stoi[t] for t in tokenized]
        length = [len(indexed)]
        tensor = torch.LongTensor(indexed).to(device)
        # Add a batch dimension: [sequence length, 1].
        tensor = tensor.unsqueeze(1)
        length_tensor = torch.LongTensor(length)
        prediction = torch.sigmoid(model(tensor, length_tensor))
        return prediction.item()

We define another helper function that will print the sentiment of the
comment based on the score that the model provides.

    sent = ["positive", "neutral", "negative"]

    def print_sent(x):
        # Which end of the score range means "positive" depends on how the
        # labels were numbered; check labels.vocab.stoi before relying on this.
        if x < 0.3:
            print(sent[0])
        elif x < 0.7:
            print(sent[1])
        else:
            print(sent[2])

Python Sentiment Analysis Output

Summary
We have successfully developed a Python sentiment analysis model based on
LSTM techniques that reached 85.83% accuracy on the test set in our run. As
discussed earlier, sentiment analysis has many use cases, and the model can be
applied according to requirements. We can similarly train it on any other kind
of data just by changing the dataset according to our needs.
