Lecture 3 Sentiment Analysis

This document discusses sentiment analysis and different methods used for sentiment analysis including dictionary-based approaches, corpus-based approaches, and machine learning approaches. It provides examples of using naive bayes classification and n-grams for sentiment analysis. Specifically, it explains how to calculate n-gram probabilities and provides a step-by-step example of using bigrams and logistic regression for sentiment analysis on movie reviews.

Uploaded by

Andrew Chung
Copyright © All Rights Reserved

Lecture 3

Sentiment Analysis
What is sentiment analysis (also called opinion mining or emotion AI)?
Sentiment Analysis in FinTech
Suicide Risk Detection via Social Media
User comments on consumer products and services
Sentiment Analysis Methods

Text Classification
• Naive Bayes
• Logistic regression
• Support-vector machines
• K-Nearest Neighbours
https://www.sciencedirect.com/topics/computer-science/lexicon-based-approach
Sentiment Analysis Methods
1. Dictionary-based approach
• In this approach, a dictionary is created by taking a few seed words initially. Then an online dictionary, thesaurus or WordNet is used to expand that dictionary by incorporating synonyms and antonyms of those words. The dictionary is expanded until no new words can be added to it. The dictionary can then be refined by manual inspection.

https://www.sciencedirect.com/topics/computer-science/lexicon-based-approach
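The expansion loop described on this slide can be sketched as follows. This is only an illustration: `TOY_THESAURUS` is a hypothetical hand-written stand-in for WordNet or an online thesaurus, and the rule that synonyms inherit a word's polarity while antonyms take the opposite one is an assumption of the sketch.

```python
from collections import deque

# Hypothetical toy thesaurus standing in for WordNet or an online dictionary.
TOY_THESAURUS = {
    "good": {"synonyms": ["fine", "great"], "antonyms": ["bad"]},
    "great": {"synonyms": ["good", "superb"], "antonyms": ["awful"]},
    "bad": {"synonyms": ["poor", "awful"], "antonyms": ["good"]},
    "awful": {"synonyms": ["bad"], "antonyms": ["great"]},
}


def expand_lexicon(seeds, thesaurus):
    """Grow a {word: polarity} lexicon from seed words until no new word is added."""
    lexicon = dict(seeds)
    queue = deque(seeds)  # words whose neighbours still have to be looked up
    while queue:
        word = queue.popleft()
        polarity = lexicon[word]
        entry = thesaurus.get(word, {})
        # Assumption: synonyms keep the polarity, antonyms get the opposite one.
        for related, sign in ((entry.get("synonyms", []), 1),
                              (entry.get("antonyms", []), -1)):
            for other in related:
                if other not in lexicon:
                    lexicon[other] = sign * polarity
                    queue.append(other)
    return lexicon


lexicon = expand_lexicon({"good": 1, "bad": -1}, TOY_THESAURUS)
print(lexicon)
```

The loop stops on its own once every reachable word is already in the lexicon, which corresponds to "no new words can be added". The manual inspection step is not modelled here.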
Sentiment Analysis Methods
2. Corpus-based approach
• This finds the sentiment orientation of context-specific words. The two methods of this approach are:
◦ Statistical approach: Words that occur more frequently in positive text are considered to have positive polarity. Words that occur more frequently in negative text have negative polarity. If the frequency is equal in both positive and negative text, then the word has neutral polarity.
◦ Semantic approach: This approach assigns sentiment values to words and to the words that are semantically close to them; this can be done by finding synonyms and antonyms of each word.
https://www.sciencedirect.com/topics/computer-science/lexicon-based-approach
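A minimal sketch of the statistical approach, assuming a tiny hand-made corpus and comparing raw word counts directly (reasonable only when the positive and negative collections are of similar size). The function name `word_polarity` and the example documents are hypothetical.

```python
from collections import Counter


def word_polarity(positive_docs, negative_docs):
    """Label each word by comparing its frequency in positive and negative text."""
    pos_counts = Counter(w for doc in positive_docs for w in doc.lower().split())
    neg_counts = Counter(w for doc in negative_docs for w in doc.lower().split())
    polarity = {}
    for word in set(pos_counts) | set(neg_counts):
        if pos_counts[word] > neg_counts[word]:
            polarity[word] = "positive"
        elif pos_counts[word] < neg_counts[word]:
            polarity[word] = "negative"
        else:
            polarity[word] = "neutral"  # equal frequency in both collections
    return polarity


positive_docs = ["great phone great battery", "the screen is great"]
negative_docs = ["terrible battery", "the screen is terrible"]
polarity = word_polarity(positive_docs, negative_docs)
print(polarity["great"], polarity["terrible"], polarity["battery"])
```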
Sentiment Analysis Methods
4. Machine Learning Approach
Machine learning approaches use probability models and features that are derived from the input text.
There are four subclasses: Hidden Markov Models (HMMs), Conditional Random Fields (CRFs), Support Vector Machines (SVMs), and Maximum Entropy models.

https://www.analyticsvidhya.com/blog/2018/08/nlp-guide-conditional-random-fields-text-classification/
https://www.sciencedirect.com/topics/computer-science/lexicon-based-approach
Example 1: Naïve Bayes Classifier for Sentiment Analysis
• We want to classify each review as positive or negative; these are the two classes to which a document can belong. We pick the class with the highest probability given the document:

$$\hat{c} = \arg\max_{c \in C} P(c \mid d)$$

where P(c|d) is the probability of class c, given document d
C is the set of all possible classes
c is one of these classes
d is the document that we are currently classifying

https://medium.datadriveninvestor.com/implementing-naive-bayes-for-sentiment-analysis-in-python-951fa8dcd928
Example 1: Naïve Bayes Classifier for Sentiment Analysis
• We can rewrite this equation using the well-known Bayes’ Rule, one of the most fundamental rules in machine learning. Since we want to maximize the equation over c, we can drop the denominator P(d), which doesn’t depend on class c.

$$\hat{c} = \arg\max_{c \in C} P(c \mid d) = \arg\max_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)} = \arg\max_{c \in C} P(d \mid c)\,P(c)$$

where P(d|c) is the probability of document d given class c.
P(c) is the probability of having a document from class c.
https://medium.datadriveninvestor.com/implementing-naive-bayes-for-sentiment-analysis-in-python-951fa8dcd928
Example 1: Naïve Bayes Classifier for Sentiment Analysis
• Naive Bayes assumption: given a class c, the presence of an individual feature of our document is independent of the others.
• We consider each individual word of our document to be a feature. If we write this formally we obtain:

$$\hat{c} = \arg\max_{c \in C} P(w_1, w_2, \ldots, w_n \mid c)\,P(c)$$

• The Naive Bayes assumption lets us substitute P(d|c) by the product of the probability of each feature conditioned on the class because it assumes their independence. We can make one more change: maximize the log of our function instead.

$$\hat{c} = \arg\max_{c \in C} P(c) \prod_{i=1}^{n} P(w_i \mid c) \quad\Longrightarrow\quad \hat{c} = \arg\max_{c \in C} \Big[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \Big]$$

https://medium.datadriveninvestor.com/implementing-naive-bayes-for-sentiment-analysis-in-python-951fa8dcd928
Example 1: Naïve Bayes Classifier for Sentiment Analysis
• P(c) is simply the probability of encountering a document of a certain class within our corpus. This is easily calculated by dividing the number of documents of class c by the total number of documents.
• P(w_i|c) is the probability of word w_i occurring in a document of class c. Again we can use the frequencies in our corpus to compute this. This is simply the number of times word w_i occurs in documents of class c, divided by the sum of the counts of each word that appears in documents of class c.
• If a word w_i never occurs in the training documents of class c, the probability P(w_i|c) will be 0, making the second term of the equation go to negative infinity. To avoid this, we apply add-one (Laplace) smoothing, where the constant added is just 1, to the formula. This solves the zero-probability problem.
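Written out, the estimates described above take the following form. The notation is an assumption of this sketch: N_c is the number of training documents of class c, N the total number of documents, count(w, c) the number of times word w occurs in documents of class c, and V the vocabulary.

```latex
\begin{align*}
\hat{P}(c) &= \frac{N_c}{N} \\
\hat{P}(w_i \mid c) &= \frac{\operatorname{count}(w_i, c)}{\sum_{w \in V} \operatorname{count}(w, c)} \\
\hat{P}_{\text{Laplace}}(w_i \mid c) &= \frac{\operatorname{count}(w_i, c) + 1}{\sum_{w \in V} \operatorname{count}(w, c) + |V|}
\end{align*}
```

Adding 1 to every count adds |V| to the denominator, so the smoothed probabilities still sum to 1 over the vocabulary.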
Example 1: Naïve Bayes Classifier for Sentiment Analysis

https://medium.datadriveninvestor.com/implementing-naive-bayes-for-sentiment-analysis-in-python-951fa8dcd928
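The pieces of this example (class prior, word likelihoods, add-one smoothing, log scores) can be put together in a few lines. This is a minimal sketch and not the code from the linked article: the four training reviews are made up, tokenization is a plain whitespace split, and words never seen in training are simply skipped at prediction time.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (text, label). Returns log priors, per-class word counts, vocabulary."""
    class_docs = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    return log_prior, word_counts, vocab


def classify(text, log_prior, word_counts, vocab):
    """Return the class maximizing log P(c) + sum_i log P(w_i|c) with add-one smoothing."""
    scores = {}
    for c in log_prior:
        total = sum(word_counts[c].values())
        score = log_prior[c]
        for w in text.lower().split():
            if w not in vocab:
                continue  # assumption: words unseen in training are ignored
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


train = [
    ("a great and moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("a boring and awful film", "neg"),
    ("awful acting and a dull story", "neg"),
]
model = train_naive_bayes(train)
print(classify("great story", *model))
print(classify("dull and boring", *model))
```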
Example 1: Naïve Bayes Classifier for Sentiment Analysis

Database: Sentiment Analysis of Movie Reviews in NLTK Python
https://medium.com/@joel_34096/sentiment-analysis-of-movie-reviews-in-nltk-python-4af4b76a6f3
Example 1: Naïve Bayes Classifier for Sentiment
Analysis

https://gist.github.com/CateGitau/6608912ca92733036c090676c61c13cd
What is N-gram?
• An N-gram is simply a sequence of N words. For instance, let us take a
look at the following examples.
◦ San Francisco (is a 2-gram)
◦ The Three Musketeers (is a 3-gram)
◦ She stood up slowly (is a 4-gram)
• An N-gram model predicts the occurrence of a word based on the
occurrence of its N – 1 previous words. 
• For instance, a bigram model (N = 2) predicts the occurrence of a word
given only its previous word (as N – 1 = 1 in this case). Similarly, a
trigram model (N = 3) predicts the occurrence of a word based on its
previous two words (as N – 1 = 2 in this case).
https://blog.xrds.acm.org/2017/10/introduction-n-grams-need/
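As a small illustration of the definition, the sketch below lists the N-grams of a tokenized sentence. The helper name `ngrams` and the whitespace tokenization are assumptions made for this example.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "She stood up slowly".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
fourgrams = ngrams(tokens, 4)
print(bigrams)
print(trigrams)
print(fourgrams)
```

A sentence of four words yields three bigrams, two trigrams and a single 4-gram.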
What is N-gram?
• Suppose we have the following sentences as the training corpus:
◦ Thank you so much for your help.
◦ I really appreciate your help.
◦ Excuse me, do you know what time it is?
◦ I’m really sorry for not inviting you.
◦ I really like your watch.
• Suppose I want to write the sentence “I really like your garden.” Because this is a bigram model, the model learns the occurrence of every pair of adjacent words in order to determine the probability of a word occurring after a certain word. For example, from the 2nd, 4th, and 5th sentences above, we know that the word “really” can be followed by “appreciate”, “sorry”, or “like”.
What is N-gram?
• Suppose we’re calculating the probability of word “w1” occurring after the word “w2”. The formula for this is count(w2 w1) / count(w2): the number of times the two words occur in the required sequence, divided by the number of times the word before the expected word occurs in the corpus.
• From our example sentences, let’s calculate the probability of the word “like” occurring after the word “really”:
count(really like) / count(really) = 1/3 = 0.33
• Similarly, for the other two possibilities:
count(really appreciate) / count(really) = 1/3 = 0.33
count(really sorry) / count(really) = 1/3 = 0.33
https://towardsdatascience.com/understanding-word-n-grams-and-n-gram-probability-in-natural-language-processing-9d9eef0fa058
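The same counts can be checked with a short script over the five training sentences. This is a sketch under simple assumptions: text is lowercased, punctuation is dropped by a regular expression, and no smoothing is applied, so unseen pairs get probability zero.

```python
import re
from collections import Counter

corpus = [
    "Thank you so much for your help.",
    "I really appreciate your help.",
    "Excuse me, do you know what time it is?",
    "I'm really sorry for not inviting you.",
    "I really like your watch.",
]


def tokenize(sentence):
    """Lowercase and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())


unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = tokenize(sentence)
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))


def bigram_probability(previous, word):
    """count(previous word) / count(previous), or 0.0 if `previous` was never seen."""
    if unigram_counts[previous] == 0:
        return 0.0
    return bigram_counts[(previous, word)] / unigram_counts[previous]


for candidate in ("like", "appreciate", "sorry"):
    print(candidate, round(bigram_probability("really", candidate), 2))
```

Note that "garden" never appears in the corpus, so `bigram_probability("your", "garden")` is 0 here. Without smoothing, this model would assign zero probability to the target sentence "I really like your garden."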
Example 2: Using N-gram and logistic regression for sentiment analysis
• Text Processing: Stemming/Lemmatizing to convert different forms of
each word into one.
• n-grams: Instead of just single-word tokens (1-gram/unigram) we can
also include word pairs.
• Representations: Instead of simple, binary vectors we can use word
counts or TF-IDF to transform those counts.
• Algorithms: In addition to Logistic Regression, we’ll see how Support
Vector Machines perform.

https://towardsdatascience.com/sentiment-analysis-with-python-part-2-4f71e7bde59a
Example 2: Using N-gram and logistic regression for sentiment analysis
Step 1: Tokenizing by n-gram
Step 2: Counting and filtering n-grams

https://www.tidytextmining.com/ngrams.html
Example 2: Using N-gram and logistic regression for sentiment analysis
Step 2: Counting and filtering n-grams
Step 3: Analysing bigrams

https://www.tidytextmining.com/ngrams.html
Example 2: Using N-gram and logistic regression for sentiment analysis
Step 4: A bigram is treated as a term in a document to calculate the tf-idf.
Step 5: Document analysis results in terms of bigrams

https://www.tidytextmining.com/ngrams.html
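Steps 1 to 4 can be sketched in Python as follows. The linked source works in R, so this is only an analogous illustration: the stop-word list, the two example reviews and the function names are hypothetical, and tf-idf uses one common definition (term frequency within the document times the natural log of the number of documents divided by the number of documents containing the bigram).

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "to", "is", "it", "was"}  # hypothetical tiny list


def bigrams(text):
    """Step 1: tokenize a text into consecutive word pairs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return list(zip(tokens, tokens[1:]))


def filtered_bigram_counts(text):
    """Step 2: count bigrams, dropping any pair that contains a stop word."""
    return Counter(bg for bg in bigrams(text) if not (set(bg) & STOP_WORDS))


def bigram_tf_idf(documents):
    """Step 4: treat each bigram as a term. Returns {doc name: {bigram: tf-idf}}."""
    counts = {name: filtered_bigram_counts(text) for name, text in documents.items()}
    doc_freq = Counter(bg for c in counts.values() for bg in c)
    n_docs = len(documents)
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            bg: (n / total) * math.log(n_docs / doc_freq[bg]) for bg, n in c.items()
        }
    return scores


documents = {
    "review_1": "The plot was not good and the acting was not good",
    "review_2": "Great plot and great acting, not good music though",
}
scores = bigram_tf_idf(documents)
for name, doc_scores in scores.items():
    print(name, sorted(doc_scores.items(), key=lambda kv: -kv[1])[:3])
```

A bigram that appears in every document, such as ("not", "good") here, gets a tf-idf of zero, so the highest-scoring bigrams are the ones that distinguish one document from the others.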
Example 2: Using N-gram and logistic regression for sentiment analysis
Step 6: Bi-gram analysis for sentiment analysis corpus. E.g.
• The bigrams “not like” and “not help” were overwhelmingly the largest causes of misidentification, making the text seem much more positive than it is. But we can see phrases like “not afraid” and “not fail” sometimes suggest text is more negative than it is.
• “Not” isn’t the only term that provides some context for the following word. We could pick four common words (or more) that negate the subsequent term, and use the same joining and counting approach to examine all of them at once.

https://www.tidytextmining.com/ngrams.html
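The joining and counting approach for negated words might look like the sketch below. The negation list, the word scores in `TOY_SCORES` and the example texts are all hypothetical; a real analysis would load a published sentiment lexicon. The "contribution" is the word's score times how often it follows a negation, which shows how strongly a unigram-based score would be misled.

```python
import re
from collections import Counter

NEGATION_WORDS = {"not", "no", "never", "without"}  # assumed set of four negators
# Hypothetical word scores standing in for a real sentiment lexicon.
TOY_SCORES = {"like": 2, "help": 2, "afraid": -2, "fail": -2, "good": 3}


def negated_contributions(texts):
    """Return {(negation, word): score * count} for scored words that follow a negation."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for first, second in zip(tokens, tokens[1:]):
            if first in NEGATION_WORDS and second in TOY_SCORES:
                counts[(first, second)] += 1
    return {pair: TOY_SCORES[pair[1]] * n for pair, n in counts.items()}


texts = [
    "I do not like it and it did not help",
    "She was not afraid and would never fail",
    "I did not like the ending",
]
contributions = negated_contributions(texts)
for pair, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(pair, value)
```

Positive contributions (such as "not like") mark places where the text would look more positive than it is, and negative contributions (such as "not afraid") mark the opposite.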
Example 2: Using N-gram and logistic regression for sentiment analysis
• Step 7: Apply any statistical or machine learning method to cluster or classify comments/documents as good or bad, e.g. counting and correlating positive or negative bigrams across sections for sentiment analysis, or using logistic regression as in the next programming example.

https://www.tidytextmining.com/ngrams.html
Example 2: Using N-gram and logistic regression for sentiment analysis

https://towardsdatascience.com/sentiment-analysis-with-python-part-2-4f71e7bde59a
Example 2: Using N-gram and logistic regression for sentiment analysis

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
https://towardsdatascience.com/sentiment-analysis-with-python-part-2-4f71e7bde59a
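A minimal sketch of this example using the two scikit-learn classes linked above. It is not the code from the referenced article: the eight training reviews are made up, and the settings (`binary=True`, `ngram_range=(1, 2)`, `C=1.0`, `max_iter=1000`) are assumptions chosen for illustration. It requires scikit-learn to be installed.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy reviews; 1 = positive, 0 = negative.
train_texts = [
    "a great movie with a wonderful cast",
    "I really like this film",
    "wonderful story and great acting",
    "one of the best movies this year",
    "this movie was not good at all",
    "I did not like the acting",
    "a boring story and terrible cast",
    "one of the worst films this year",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Unigrams and bigrams as binary presence features.
vectorizer = CountVectorizer(binary=True, ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)

classifier = LogisticRegression(C=1.0, max_iter=1000)
classifier.fit(X_train, train_labels)

test_texts = ["great acting and a wonderful story", "terrible and boring"]
predictions = classifier.predict(vectorizer.transform(test_texts))
print(predictions)
```

With `ngram_range=(1, 2)` the vocabulary contains pairs such as "not good" alongside single words, so the model can give a negated phrase its own weight. On a dataset this small the predictions are only illustrative.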
Example 3: Sentiment Analysis for Fashion using Deep Learning (Neural Network)
• Online shopping is now popular for many kinds of products, such as electronics, clothes, and food items. E-commerce sites sell products and give consumers the option to rate and comment on them, which is a handy and important way to gauge a product’s quality.
• Based on these ratings and comments, other consumers can decide whether to purchase a product or not. Knowing the sentiment towards their products also helps sellers and manufacturers make those products better.

https://pub.towardsai.net/sentiment-analysis-opinion-mining-with-python-nlp-tutorial-d1f173ca4e3c
Example 3: Sentiment Analysis for Fashion using Deep Learning

https://pub.towardsai.net/sentiment-analysis-opinion-mining-with-python-nlp-tutorial-d1f173ca4e3c
Example 4: Using CNN in Word2vec for sentiment analysis

• In the CBOW method, the goal is to predict a word given the surrounding words.
• Skip-gram is the converse: we want to predict a window of words given a single word.
• Both methods use artificial neural networks as their classification algorithm. Initially, each word in the vocabulary is a random N-dimensional vector.
• During training, the algorithm learns the optimal vector for each word using the CBOW or Skip-gram method.
https://www.districtdatalabs.com/modern-methods-for-sentiment-analysis
https://towardsdatascience.com/another-twitter-sentiment-analysis-with-python-part-6-doc2vec-603f11832504
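The difference between the two methods is easiest to see in the training examples each one builds from a sentence. The sketch below only generates those examples and does not train a network; the function names and the window size are assumptions for illustration.

```python
def cbow_examples(tokens, window=2):
    """CBOW: (surrounding words, target word) pairs, predicting the word from its context."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            examples.append((context, target))
    return examples


def skipgram_examples(tokens, window=2):
    """Skip-gram: (input word, one context word) pairs, predicting the context from the word."""
    return [
        (target, context_word)
        for context, target in cbow_examples(tokens, window)
        for context_word in context
    ]


tokens = "the movie was really good".split()
cbow = cbow_examples(tokens, window=1)
skip = skipgram_examples(tokens, window=1)
print(cbow)
print(skip)
```

With a window of 1, CBOW produces one example per word (for instance, predict "movie" from "the" and "was"), while skip-gram produces one example per word-context pair (predict "the" from "movie", and "was" from "movie").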
Example 4: Using CNN in Word2vec for sentiment analysis

https://towardsdatascience.com/another-twitter-sentiment-analysis-with-python-part-6-doc2vec-603f11832504
Example 4: Using CNN in Word2vec for sentiment analysis

https://towardsdatascience.com/another-twitter-sentiment-analysis-with-python-part-11-cnn-word2vec-41f5e28eda74
Example 4: Using CNN in Word2vec for sentiment analysis
• See the example code

https://www.kaggle.com/code/atagunduzalp/sentiment-analysis-word2vec-cnn-and-pytorch/notebook
Example 4: Using CNN in Word2vec for sentiment analysis
• CBOW
https://www.kdnuggets.com/2018/04/implementing-deep-learning-methods-feature-engineering-text-data-cbow.html

• Skip gram
https://towardsdatascience.com/skip-gram-nlp-context-words-prediction-algorithm-5bbf34f84e0c

• Word2Vec
• https://towardsdatascience.com/another-twitter-sentiment-analysis-with-python-part-11-cnn-word2vec-41f5e28eda74

• https://medium.com/swlh/sentiment-classification-using-word-embeddings-word2vec-aedf28fbb8ca
