Lecture 3 Sentiment Analysis
Sentiment Analysis
What is sentiment analysis (also called opinion mining or emotion AI)?
Sentiment Analysis in FinTech
Suicide Risk Detection via social media
User comments on consumer products and services
Sentiment Analysis Methods
Text Classification
• Naive Bayes
• Logistic regression
• Support-vector machines
• K-Nearest Neighbours
https://www.sciencedirect.com/topics/computer-science/lexicon-based-approach
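All four classifiers listed above are available in scikit-learn; a minimal sketch comparing them on a toy sentiment dataset (the example texts and labels are invented for illustration):

# Sketch: comparing the four listed classifiers on a toy sentiment dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

texts = ["I love this phone", "great battery life", "terrible screen",
         "I hate the camera", "excellent value", "awful and slow"]
labels = ["pos", "pos", "neg", "neg", "pos", "neg"]

classifiers = {
    "Naive Bayes": MultinomialNB(),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Support-vector machine": LinearSVC(),
    "K-Nearest Neighbours": KNeighborsClassifier(n_neighbors=3),
}

for name, clf in classifiers.items():
    model = make_pipeline(CountVectorizer(), clf)   # bag-of-words features + classifier
    model.fit(texts, labels)
    print(name, model.predict(["battery is great", "slow and awful screen"]))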
Sentiment Analysis Methods
1. Dictionary-based approach
• In this approach, a dictionary is created by starting with a small set of seed words. An online dictionary, a thesaurus or WordNet is then used to expand the dictionary by incorporating synonyms and antonyms of those words. The dictionary is expanded until no new words can be added, and it can then be refined by manual inspection.
https://www.sciencedirect.com/topics/computer-science/lexicon-based-approach
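As a rough illustration of this expansion process, the sketch below grows a small, arbitrary seed lexicon using WordNet synonyms and antonyms via NLTK (not the slides' own code):

# Sketch: expanding a seed sentiment dictionary with WordNet (requires nltk + the wordnet corpus).
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

seed = {"good": "positive", "bad": "negative"}   # arbitrary seed words
lexicon = dict(seed)

for _ in range(3):   # a few expansion rounds; the full method repeats until no new words appear
    for word, polarity in list(lexicon.items()):
        for syn in wn.synsets(word):
            for lemma in syn.lemmas():
                # synonyms keep the same polarity
                lexicon.setdefault(lemma.name(), polarity)
                # antonyms get the opposite polarity
                for ant in lemma.antonyms():
                    opposite = "negative" if polarity == "positive" else "positive"
                    lexicon.setdefault(ant.name(), opposite)

print(len(lexicon), "words in the expanded lexicon")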
Sentiment Analysis Methods
2. Corpus-based approach
• This approach finds the sentiment orientation of context-specific words. Its two methods are:
Statistical approach: words that occur more frequently in positive text are assigned positive polarity, and words that recur more often in negative text are assigned negative polarity. If the frequency is roughly equal in positive and negative text, the word is treated as neutral.
Semantic approach: this approach assigns sentiment values to words, and words that are semantically close to those words receive similar values; semantic closeness can be determined by finding synonyms and antonyms of the word.
https://www.sciencedirect.com/topics/computer-science/lexicon-based-approach
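A small sketch of the statistical idea above: compare how often a word occurs in positive versus negative documents and assign polarity accordingly (the toy documents are invented):

# Sketch: corpus-based statistical polarity from word frequencies in positive vs. negative text.
from collections import Counter

pos_docs = ["the movie was brilliant and fun", "brilliant acting, great fun"]
neg_docs = ["the plot was dull and boring", "boring dialogue, dull pacing"]

pos_counts = Counter(w for d in pos_docs for w in d.split())
neg_counts = Counter(w for d in neg_docs for w in d.split())

def polarity(word):
    p, n = pos_counts[word], neg_counts[word]
    if p > n:
        return "positive"
    if n > p:
        return "negative"
    return "neutral"          # equal frequency in both kinds of text

for w in ["brilliant", "boring", "the"]:
    print(w, polarity(w))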
Sentiment Analysis Methods
3. Machine Learning Approach
Machine learning approaches use probability models and
features that are derived from the input text.
There are four subclasses: Hidden Markov Models (HMMs), Conditional Random Fields (CRFs), Support Vector Machines (SVMs), and Maximum Entropy models.
https://www.analyticsvidhya.com/blog/2018/08/nlp-guide-conditional-random-fields-text-classification/
https://www.sciencedirect.com/topics/computer-science/lexicon-based-approach
Example 1: Naïve Bayes Classifier for Sentiment
Analysis
• The task is to classify each review correctly as positive or negative; these are the two classes to which each document can belong.
https://medium.datadriveninvestor.com/implementing-naive-bayes-for-sentiment-analysis-in-python-951fa8dcd928
Example 1: Naïve Bayes Classifier for Sentiment
Analysis
• We can rewrite this equation using the well known Bayes’ Rule, one of
the most fundamental rules in machine learning. Since we want to
maximize the equation we can drop the denominator, which doesn’t
depend on class c.
https://medium.datadriveninvestor.com/implementing-naive-bayes-for-sentiment-analysis-in-python-951fa8dcd928
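Written out (a reconstruction; the slide shows the equation as an image), the quantity being maximized is:

  \hat{c} = \arg\max_{c \in C} P(c \mid d)
          = \arg\max_{c \in C} \frac{P(d \mid c)\, P(c)}{P(d)}
          = \arg\max_{c \in C} P(c) \prod_{i} P(w_i \mid c)

where d is the document, C the set of classes (positive/negative), and w_1, ..., w_n the words of d, assumed conditionally independent given the class (the "naive" assumption).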
Example 1: Naïve Bayes Classifier for Sentiment
Analysis
• P(c) is simply the probability of encountering a document of a certain class within our corpus. It is easily calculated by dividing the number of documents of class c by the total number of documents.
• P(w_i|c) is the probability of word w_i occurring in a document of class c. Again we can use the frequencies in our corpus to compute this: it is the number of times word w_i occurs in documents of class c, divided by the sum of the counts of every word that appears in documents of class c.
• If a word never occurs in the training documents of class c, the probability P(w_i|c) will be 0, making the corresponding term of the equation go to negative infinity! To avoid this we apply smoothing: with add-one (Laplace) smoothing, the constant added to each count in the formula is just 1. This solves the zero-probability problem.
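In formulas (standard Naive Bayes estimates with add-one smoothing; the slide shows them as images):

  P(c) = \frac{N_c}{N_{doc}}, \qquad
  P(w_i \mid c) = \frac{\mathrm{count}(w_i, c) + 1}{\left(\sum_{w \in V} \mathrm{count}(w, c)\right) + |V|}

where N_c is the number of documents of class c, N_doc the total number of documents, and V the vocabulary.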
Example 1: Naïve Bayes Classifier for Sentiment Analysis
https://medium.datadriveninvestor.com/implementing-naive-bayes-for-sentiment-analysis-in-python-951fa8dcd928
Example 1: Naïve Bayes
Classifier for Sentiment
Analysis
Dataset: Sentiment Analysis of Movie Reviews in NLTK (Python)
https://medium.com/@joel_34096/sentiment-analysis-of-movie-reviews-in-nltk-python-4af4b76a6f3
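The linked posts build a Naive Bayes classifier on this corpus; below is a minimal sketch along the same lines (not the posts' exact code), using NLTK's movie_reviews corpus and NaiveBayesClassifier:

# Sketch: Naive Bayes sentiment classification on the NLTK movie_reviews corpus.
import random
import nltk
nltk.download("movie_reviews", quiet=True)
from nltk.corpus import movie_reviews

# Each document is a (word list, category) pair, category in {"pos", "neg"}.
documents = [(list(movie_reviews.words(fid)), cat)
             for cat in movie_reviews.categories()
             for fid in movie_reviews.fileids(cat)]
random.shuffle(documents)

# Use the 2000 most frequent words as binary "contains(word)" features.
all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = [w for w, _ in all_words.most_common(2000)]

def document_features(words):
    word_set = set(words)
    return {f"contains({w})": (w in word_set) for w in word_features}

featuresets = [(document_features(words), cat) for words, cat in documents]
train_set, test_set = featuresets[:1600], featuresets[1600:]

classifier = nltk.NaiveBayesClassifier.train(train_set)
print("accuracy:", nltk.classify.accuracy(classifier, test_set))
classifier.show_most_informative_features(10)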
Example 1: Naïve Bayes Classifier for Sentiment
Analysis
https://gist.github.com/CateGitau/6608912ca92733036c090676c61c13cd
What is N-gram?
• An N-gram is simply a sequence of N words. For instance, let us take a
look at the following examples.
San Francisco (is a 2-gram)
The Three Musketeers (is a 3-gram)
She stood up slowly (is a 4-gram)
• An N-gram model predicts the occurrence of a word based on the
occurrence of its N – 1 previous words.
• For instance, a bigram model (N = 2) predicts the occurrence of a word
given only its previous word (as N – 1 = 1 in this case). Similarly, a
trigram model (N = 3) predicts the occurrence of a word based on its
previous two words (as N – 1 = 2 in this case).
https://blog.xrds.acm.org/2017/10/introduction-n-grams-need/
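A quick sketch of extracting n-grams in Python with nltk.util.ngrams (the example sentence is taken from the list above):

# Sketch: extracting unigrams, bigrams and trigrams from a sentence.
from nltk.util import ngrams

sentence = "She stood up slowly"
tokens = sentence.split()          # simple whitespace tokenization for the sketch

for n in (1, 2, 3):
    print(n, list(ngrams(tokens, n)))
# n = 2 -> [('She', 'stood'), ('stood', 'up'), ('up', 'slowly')]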
What is N-gram?
• Suppose we have the following sentences as the training corpus:
Thank you so much for your help.
I really appreciate your help.
Excuse me, do you know what time it is?
I’m really sorry for not inviting you.
I really like your watch.
• I want to write the sentence “I really like your garden.” Now because this is a bigram
model, the model will learn the occurrence of every two words, to determine the
probability of a word occurring after a certain word. For example, from the 2nd, 4th,
and the 5th sentence in the example above, we know that after the word “really” we
can see either the word “appreciate”, “sorry”, or the word “like” occurs.
What is N-gram?
• Suppose we're calculating the probability of word "w1" occurring after the word "w2". The formula for this is: count(w2, w1) / count(w2), i.e. the number of times the words occur in the required sequence, divided by the number of times the word before the expected word occurs in the corpus.
• From our example sentences, let's calculate the probability of the word "like" occurring after the word "really":
count(really like) / count(really) = 1/3 = 0.33
• Similarly, for the other two possibilities:
count(really appreciate) / count(really) = 1/3 = 0.33
count(really sorry) / count(really) = 1/3 = 0.33
https://towardsdatascience.com/understanding-word-n-grams-and-n-gram-probability-in-natural-language-processing-9d9eef0fa058
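The calculation above can be reproduced by counting unigrams and bigrams in the five training sentences; a small Python sketch:

# Sketch: bigram probabilities P(w1 | w2) = count(w2, w1) / count(w2) on the toy corpus above.
from collections import Counter

corpus = [
    "thank you so much for your help",
    "i really appreciate your help",
    "excuse me do you know what time it is",
    "i'm really sorry for not inviting you",
    "i really like your watch",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(w2, w1):
    return bigrams[(w2, w1)] / unigrams[w2]

for w1 in ("like", "appreciate", "sorry"):
    print(f"P({w1} | really) =", round(bigram_prob("really", w1), 2))   # each 0.33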
Example 2: Using N-gram and logistic regression for sentiment analysis
• Text Processing: Stemming/Lemmatizing to convert different forms of
each word into one.
• n-grams: Instead of just single-word tokens (1-gram/unigram) we can
also include word pairs.
• Representations: Instead of simple, binary vectors we can use word
counts or TF-IDF to transform those counts.
• Algorithms: In addition to Logistic Regression, we’ll see how Support
Vector Machines perform.
https://towardsdatascience.com/sentiment-analysis-with-python-part-2-4f71e7bde59a
Example 2: Using N-gram and logistic regression for sentiment analysis
Step 1: Tokenizing by n-gram
Step 2: Counting and filtering n-grams
https://www.tidytextmining.com/ngrams.html
Example 2: Using N-gram and logistic regression for sentiment analysis
Step 2: Counting and filtering n-grams
Step 3: Analysing bigrams
https://www.tidytextmining.com/ngrams.html
Example 2: Using N-gram and logistic regression for sentiment analysis
Step 4: A bigram is treated as a term in a document to calculate the tf-idf.
Step 5: Document analysis results in terms of bigrams.
https://www.tidytextmining.com/ngrams.html
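Step 4 is shown in R with tidytext in the linked book; a rough Python equivalent treats each bigram as a term with scikit-learn's TfidfVectorizer (the example documents are made up):

# Sketch: tf-idf where each bigram is treated as a term in a document.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["I did not like the service at all",
        "the service was not helpful",
        "great food and friendly staff"]

vectorizer = TfidfVectorizer(ngram_range=(2, 2))   # bigrams only
tfidf = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out()[:5])      # bigram terms such as "did not"
print(tfidf.shape)                                 # (n_documents, n_bigram_terms)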
Example 2: Using N-gram and logistic regression for sentiment analysis
Step 6: Bigram analysis for the sentiment analysis corpus, e.g.:
• The bigrams "not like" and "not help" were overwhelmingly the largest causes of misidentification, making the text seem much more positive than it is. But we can see phrases like "not afraid" and "not fail" sometimes suggest text is more negative than it is.
• "Not" isn't the only term that provides some context for the following word. We could pick four common words (or more) that negate the subsequent term, and use the same joining and counting approach to examine all of them at once.
https://www.tidytextmining.com/ngrams.html
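A rough Python sketch of the negation idea above (the linked chapter does this in R with tidytext and a sentiment lexicon; the negation list and sentiment scores below are made up for illustration):

# Sketch: finding "negation + sentiment word" bigrams that can flip the apparent sentiment.
from collections import Counter

negation_words = {"not", "no", "never", "without"}
lexicon = {"like": 2, "help": 2, "afraid": -2, "fail": -2}   # made-up scores for illustration

docs = ["I do not like this product and it did not help",
        "I am not afraid it will not fail"]

negated = Counter()
for doc in docs:
    tokens = doc.lower().split()
    for first, second in zip(tokens, tokens[1:]):
        if first in negation_words and second in lexicon:
            negated[(first, second)] += lexicon[second]   # score contributed by the misread word

for (neg, word), score in negated.items():
    print(f"{neg} {word}: sentiment score {score} counted in the wrong direction")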
Example 2: Using N-gram and logistic regression for sentiment analysis
• Step 7: Apply any statistical or machine learning method to cluster or classify the good or bad comments/documents, e.g. counting and correlating positive or negative bigram words among sections for sentiment analysis, or using logistic regression as in the next programming example.
https://www.tidytextmining.com/ngrams.html
Example 2: Using N-gram and logistic regression for sentiment analysis
https://towardsdatascience.com/sentiment-analysis-with-python-part-2-4f71e7bde59a
Example 2: Using N-gram and logistic regression for sentiment analysis
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
https://towardsdatascience.com/sentiment-analysis-with-python-part-2-4f71e7bde59a
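The slide's code is available at the links above; a minimal sketch in the same spirit (not the tutorial's exact code), using CountVectorizer n-gram features with LogisticRegression on made-up reviews:

# Sketch: n-gram bag-of-words features + logistic regression for sentiment classification.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

reviews = ["loved the movie", "a wonderful film", "great acting and story",
           "what a waste of time", "terrible plot", "not worth watching",
           "really enjoyable", "boring and predictable"]
labels = [1, 1, 1, 0, 0, 0, 1, 0]            # 1 = positive, 0 = negative

# Unigrams and bigrams as features (binary presence; counts or tf-idf would also work).
vectorizer = CountVectorizer(ngram_range=(1, 2), binary=True)
X = vectorizer.fit_transform(reviews)

X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=0)
clf = LogisticRegression(C=0.5, max_iter=1000).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))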
Example 3: Sentiment Analysis for Fashion
using Deep Learning (Neural Network)
• Nowadays, online shopping is trendy and popular for different products such as electronics, clothes and food items. For instance, e-commerce sites sell products and provide an option to rate and write comments about the products, which is a handy and important way to identify a product's quality.
• Based on these comments, other consumers can decide whether to purchase a product or not. It is also beneficial for sellers and manufacturers to know the sentiments about their products so they can make their products better.
https://pub.towardsai.net/sentiment-analysis-opinion-mining-with-python-nlp-tutorial-d1f173ca4e3c
Example 3: Sentiment Analysis for Fashion using Deep Learning
https://pub.towardsai.net/sentiment-analysis-opinion-mining-with-python-nlp-tutorial-d1f173ca4e3c
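The full walkthrough is in the linked tutorial; below is a rough, hedged sketch of the idea only (a small Keras network over learned word embeddings), not the tutorial's exact architecture, and with made-up example reviews:

# Sketch: a small neural-network sentiment classifier in Keras (illustrative only).
import numpy as np
from tensorflow.keras.layers import TextVectorization, Embedding, GlobalAveragePooling1D, Dense
from tensorflow.keras.models import Sequential

reviews = np.array(["love this dress, fits perfectly", "beautiful fabric and colour",
                    "poor quality, returned it", "too small and badly stitched"])
labels = np.array([1, 1, 0, 0])                      # 1 = positive, 0 = negative

# Map raw text to fixed-length sequences of integer word ids.
vectorize = TextVectorization(max_tokens=1000, output_sequence_length=20)
vectorize.adapt(reviews)
X = vectorize(reviews)

model = Sequential([
    Embedding(input_dim=1000, output_dim=16),        # word ids -> dense vectors
    GlobalAveragePooling1D(),                        # average word vectors per review
    Dense(16, activation="relu"),
    Dense(1, activation="sigmoid"),                  # probability of positive sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=10, verbose=0)
print(model.predict(vectorize(np.array(["lovely dress", "bad quality"]))))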
Example 4: Using CNN with Word2vec for sentiment analysis
https://towardsdatascience.com/another-twitter-sentiment-analysis-with-python-part-6-doc2vec-603f11832504
Example 4: Using CNN with Word2vec for sentiment analysis
https://towardsdatascience.com/another-twitter-sentiment-analysis-with-python-part-11-cnn-word2vec-41f5e28eda74
Example 4: Using CNN with Word2vec for sentiment analysis
• See the example code:
https://www.kaggle.com/code/atagunduzalp/sentiment-analysis-word2vec-cnn-and-pytorch/notebook
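The linked notebook has the complete code; below is a condensed, hedged sketch of the idea (train Word2Vec with gensim, then run a 1D CNN over the word-vector sequences); the data and parameter values are made up for illustration:

# Sketch: Word2Vec embeddings feeding a 1D CNN sentiment classifier (illustrative only).
import numpy as np
from gensim.models import Word2Vec
from tensorflow.keras.layers import Conv1D, GlobalMaxPooling1D, Dense, Input
from tensorflow.keras.models import Sequential

texts = ["i love this phone", "best purchase ever", "worst battery ever", "i hate this phone"]
labels = np.array([1, 1, 0, 0])
tokenized = [t.split() for t in texts]

# 1) Train (or load) a Word2Vec model on the tokenized corpus.
w2v = Word2Vec(sentences=tokenized, vector_size=50, window=2, min_count=1, epochs=50)

# 2) Turn each text into a fixed-length sequence of word vectors (pad/truncate to max_len).
max_len = 6
def embed(tokens):
    vecs = [w2v.wv[w] for w in tokens[:max_len]]
    vecs += [np.zeros(50)] * (max_len - len(vecs))       # zero-padding
    return np.stack(vecs)

X = np.stack([embed(t) for t in tokenized]).astype("float32")   # shape: (n_texts, max_len, 50)

# 3) 1D convolution over the word-vector sequence.
model = Sequential([
    Input(shape=(max_len, 50)),
    Conv1D(filters=32, kernel_size=2, activation="relu"),
    GlobalMaxPooling1D(),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=20, verbose=0)
print(model.predict(X).round(2))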
Example 4: Using CNN with Word2vec for sentiment analysis
• CBOW
https://www.kdnuggets.com/2018/04/implementing-deep-learning-methods-feature-engineering-text-data-cbow.html
• Skip-gram
https://towardsdatascience.com/skip-gram-nlp-context-words-prediction-algorithm-5bbf34f84e0c
• Word2Vec
• https://towardsdatascience.com/another-twitter-sentiment-analysis-with-python-part-11-cnn-word2vec-41f5e28eda74
• https://medium.com/swlh/sentiment-classification-using-word-embeddings-word2vec-aedf28fbb8ca