NLP and Sentiment Analysis

This document provides an overview of natural language processing and sentiment analysis. It discusses using NLP for applications like information extraction and automatic summarization. Additionally, it describes sentiment analysis and some common uses, such as determining sentiment towards brands on social media or analyzing customer service notes.

Uploaded by

Ahmad Alhamed

Natural Language Processing and Sentiment Analysis / Opinion Mining
Contents for Presentation

 Introduction
 Natural Language Processing
 Sentiment Analysis
 Paper Case Study 1: Sentiment Analysis Techniques
 Paper Case Study 2: Natural Language Processing for Sentiment Analysis
Goals for today

 This is a very busy research area.


 Even the number of survey articles is large.
 It is impossible to describe all relevant research in an hour.
 My aims:
 Give you a broad overview of the field
 Show "how it works" with examples (high-level!), give you pointers to review articles, datasets, tools, ...
 Encourage a critical view of the topic
 Get you interested in reading further!
NLP (Natural Language Processing) ...?

 Natural language processing (NLP) is the ability of a computer program to understand human language as it is spoken. NLP is a component of artificial intelligence (AI).
 The development of NLP applications is challenging because computers traditionally require humans to "speak" to them in a programming language that is precise, unambiguous and highly structured, or through a limited number of clearly enunciated voice commands. Human speech, however, is not always precise -- it is often ambiguous, and the linguistic structure can depend on many complex variables, including slang, regional dialects and social context.
Why NLP?

 kJfmmfj mmmvvv nnnffn333


 Uj iheale eleee mnster vensi credur
 Baboi oi cestnitze
 Coovoel2^ ekk; ldsllk lkdf vnnjfj?
 Fgmflmllk mlfm kfre xnnn!
Why use NLP?

 Uses of natural language processing

Most of the research being done on natural language processing revolves around search, especially enterprise search. This involves allowing users to query data sets in the form of a question that they might pose to another person. The machine interprets the important elements of the human language sentence, such as those that might correspond to specific features in a data set, and returns an answer.
NLP can be used to interpret free text and make it analyzable. There is a tremendous amount of information stored in free text files, like patients' medical records, for example. Prior to deep learning-based NLP models, this information was inaccessible to computer-assisted analysis and could not be analyzed in any systematic way. NLP now allows analysts to sift through massive troves of free text to find relevant information in the files.
Why NLP? ... Other Aspects

 Classify text into categories
 Extract useful information from resumes
 Index and search large texts
 Automatic summarization
 Automatic translation
 Speech understanding
 Understand phone conversations
 Information retrieval
Sentiment Analysis

 The process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.
 Sentiment analysis is another primary use case for NLP. Using sentiment analysis, data scientists can assess comments on social media to see how their business's brand is performing, for example, or review notes from customer service teams to identify areas where people want the business to perform better.
Sentiment Analysis

 Sentiment analysis is used to find the author's attitude towards something. Sentiment analysis tools categorize pieces of writing as positive, neutral, or negative. Some tools offer a sentiment score, which helps with the gradation of particular emotions.
 A sentiment score is a scaling system that reflects the emotional depth of a piece of text. It detects emotions and assigns them a particular value, for example, from 0 up to 10 – from the most negative to the most positive.
Why is sentiment analysis important?

 First of all, sentiment analysis saves time and effort because the process of sentiment extraction is fully automated – it is the algorithm that analyzes the sentiment data, so human participation is sparse.
 Secondly, sentiment analysis is important because emotions and attitudes towards a topic can become actionable pieces of information useful in numerous areas of business and research. Many industries benefit from knowing the feelings of a target audience towards services, policies, etc. One of the more interesting examples is the Obama administration, which used sentiment analysis to gain insight into the public's sentiment towards policies before the 2012 election.
 And lastly, sentiment analysis is becoming a more and more popular topic as artificial intelligence, machine learning and natural language processing technologies are booming these days.
Applications
 Mainstream applications
 Review-oriented search engines
 Market research (companies, politicians, ...)
 Improve information extraction, summarization, and question answering
 Discard subjective sentences
 Show multiple viewpoints
 Improve communication and HCI?
 Detect flames in emails and forums
 Nudge people to avoid "angry" Facebook posts?
 Augment recommender systems: downgrade items that received a lot of negative feedback
 Detect web pages with sensitive content inappropriate for ad placement
 ...

 "Well kids, I had an awesome birthday thanks to you. =D Just wanted to so thank you for coming and thanks for the gifts and junk. =) I have many pictures and I will post them later." – current mood: hearts

What are the characteristic words of these two moods?

 "Home alone for too many hours, all week long ... screaming child, headache, tears that just won't let themselves loose.... and now I've lost my wedding band. I hate this." – current mood: ...

[Mihalcea, R. & Liu, H. (2006). In Proc. AAAI Spring Symposium CAAW.]
Slides based on Rada Mihalcea's presentation.
Data, data preparation and learning
- or: sentiment analysis is generally a form of text mining

 LiveJournal.com – optional mood annotation
 10,000 blogs:
 5,000 happy entries / 5,000 sad entries
 average size 175 words / entry
 pre-processing – remove SGML tags, tokenization, part-of-speech tagging
 quality of automatic "mood separation"
 Naïve Bayes text classifier
 five-fold cross validation
 Accuracy: 79.13% (>> 50% baseline)
Results: Corpus-derived happiness factors

yay       86.67  |  goodbye  18.81
shopping  79.56  |  hurt     17.39
awesome   79.71  |  tears    14.35
birthday  78.37  |  cried    11.39
lovely    77.39  |  upset    11.12
concert   74.85  |  sad      11.11
cool      73.72  |  cry      10.56
cute      73.20  |  died     10.07
lunch     73.02  |  lonely    9.50
books     73.02  |  crying    5.50

happiness factor of a word =
the number of occurrences in the happy blogposts / the total frequency in the corpus
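The happiness factor above is a direct ratio of counts. A minimal sketch, with an invented mini-corpus for illustration (the factor is returned as a fraction rather than the percentage shown on the slide):

```python
from collections import Counter

def happiness_factor(word, happy_posts, all_posts):
    """occurrences in the happy blogposts / total frequency in the corpus"""
    happy = Counter(w for post in happy_posts for w in post.split())
    total = Counter(w for post in all_posts for w in post.split())
    return happy[word] / total[word] if total[word] else 0.0

# Invented mini-corpus for illustration.
happy_posts = ["yay awesome birthday", "lovely concert yay"]
sad_posts = ["tears and crying", "sad lonely tears"]
corpus = happy_posts + sad_posts
print(happiness_factor("yay", happy_posts, corpus))    # 1.0
print(happiness_factor("tears", happy_posts, corpus))  # 0.0
```

A factor near 1 means the word occurs almost exclusively in happy entries; near 0, almost exclusively in sad ones.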
Computers Lack Knowledge!

 People have no trouble understanding language


Common sense knowledge
Reasoning capacity
Experience
 Computers have
No common sense knowledge
No reasoning capacity
Where does it fit in the CS taxonomy?
Types of Sentiment Analysis

 Fine-grained sentiment analysis involves determining the polarity of the opinion. It can be a simple binary positive/negative sentiment differentiation. This type can also go to a higher specification (for example, very positive, positive, neutral, negative, very negative), depending on the use case (for example, as in five-star Amazon reviews).
 Emotion detection is used to identify signs of specific emotional states presented in the text. Usually, there is a combination of lexicons and machine learning algorithms that determine what is what and why.
 Aspect-based sentiment analysis goes deeper. Its purpose is to identify an opinion regarding a specific element of the product, for example, the brightness of the flashlight in the smartphone. Aspect-based analysis is commonly used in product analytics to keep an eye on how the product is perceived and what the strong and weak points are from the customer's point of view.
 Intent analysis is all about the action. Its purpose is to determine what kind of intention is expressed in the message. It is commonly used in customer support systems to streamline the workflow.
Issues in aspect-/sentence-oriented SA (1)

"Yesterday, I bought a Nokia phone and my girlfriend bought a moto phone. We called each other when we got home. The voice on my phone was not clear. The camera was good. My girlfriend said the sound of her phone was clear. I wanted a phone with good voice quality. So I was satisfied and returned the phone to BestBuy yesterday."

• Object identification
Issues in aspect-/sentence-oriented SA (2)

"Yesterday, I bought a Nokia phone and my girlfriend bought a moto phone. We called each other when we got home. The voice on my phone was not clear. The camera was good. My girlfriend said the sound of her phone was clear. I wanted a phone with good voice quality. So I was satisfied and returned the phone to BestBuy yesterday."

• Object identification
• Aspect extraction
Find only the aspects belonging to the high-level object

 Simple idea: POS and co-occurrence
 find frequent nouns / noun phrases
 find the opinion words associated with them (from a dictionary: e.g. for positive: good, clear, amazing)
 find infrequent nouns co-occurring with these opinion words
 BUT: may find opinions on aspects of other things
 Improvement (Popescu & Etzioni, 2005): meronymy
 evaluate each noun phrase by computing a pointwise mutual information (PMI) score between the phrase and some meronymy discriminators associated with the product class
 e.g., for a scanner class: "of scanner", "scanner has", "scanner comes with", etc., which are used to find components or parts of scanners by searching the Web
 PMI(a, d) = hits(a & d) / ( hits(a) * hits(d) )
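The PMI variant on the slide (raw hit counts, no logarithm) can be sketched as follows; the hit counts are hypothetical stand-ins for web search hits:

```python
def pmi(hits_joint, hits_a, hits_d):
    """PMI variant as written on the slide: hits(a & d) / (hits(a) * hits(d)),
    with no logarithm applied."""
    if hits_a == 0 or hits_d == 0:
        return 0.0
    return hits_joint / (hits_a * hits_d)

# Hypothetical web hit counts for candidate aspect phrases of a scanner
# class and the meronymy discriminator "of scanner".
score_lens = pmi(hits_joint=800, hits_a=10_000, hits_d=50_000)
score_pizza = pmi(hits_joint=2, hits_a=90_000, hits_d=50_000)
print(score_lens > score_pizza)  # True: "lens" is far more likely a scanner part
```

Candidate phrases with a PMI score below some threshold would be discarded as aspects of other things.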
Simultaneous Opinion Lexicon Expansion and Aspect Extraction

 Double propagation (Qiu et al., 2009, 2011): bootstrap by tasks
 1. extracting aspects using opinion words;
 2. extracting aspects using the extracted aspects;
 3. extracting opinion words using the extracted aspects;
 4. extracting opinion words using both the given and the extracted opinion words.
 Adaptation of dependency grammar:
 direct dependency: one word depends on the other word without any additional words in their dependency path, or they both depend on a third word directly.
 POS tagging: opinion words – adjectives; aspects – nouns or noun phrases.
 Input: seed set of opinion words
Simultaneous Opinion Lexicon Expansion and Aspect Extraction

 Example: "Canon G3 produces great pictures" (the opinion word "great" modifies "pictures" via mod)
 Rule: 'a noun on which an opinion word directly depends through mod is taken as an aspect' → allows extraction in both directions
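A toy sketch of that rule, assuming dependencies are already available as (head, relation, dependent) triples. The triples here are hand-written for the slide's example, not real parser output:

```python
# Hand-written (head, relation, dependent) triples for
# "Canon G3 produces great pictures".
deps = [("pictures", "mod", "great"),
        ("produces", "obj", "pictures"),
        ("produces", "subj", "Canon G3")]
opinion_words = {"great"}            # seed opinion lexicon
nouns = {"pictures", "Canon G3"}     # from POS tagging

# Rule: a noun on which an opinion word directly depends through mod is an aspect.
aspects = {head for head, rel, dep in deps
           if rel == "mod" and dep in opinion_words and head in nouns}
# The same edge read in the other direction finds opinion words from known aspects.
opinions = {dep for head, rel, dep in deps if rel == "mod" and head in aspects}
print(aspects)   # {'pictures'}
print(opinions)  # {'great'}
```

Iterating these two reads of the mod edge is the essence of double propagation: each pass can add new aspects and new opinion words.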
Issues in aspect-/sentence-oriented SA (3)

"Yesterday, I bought a Nokia phone and my girlfriend bought a moto phone. We called each other when we got home. The voice on my phone was not clear. The camera was good. My girlfriend said the sound of her phone was clear. I wanted a phone with good voice quality. So I was satisfied and returned the phone to BestBuy yesterday."

• Object identification
• Aspect extraction
• Grouping synonyms
Grouping synonyms

 General-purpose lexical resources provide synonym links
 E.g. WordNet

 But: domain-dependent:
 Movie reviews: movie ~ picture
 Camera reviews: movie → video; picture → photos
 Carenini et al. (2005): extend the dictionary using the corpus
 Input: taxonomy of aspects for a domain
 similarity metrics defined using string similarity, synonyms and distances measured using WordNet
 merge each discovered aspect expression into an aspect node in the taxonomy.
WordNet
Issues in aspect-/sentence-oriented SA (3)

Yesterday, I bought a Nokia ph • Object identification


one and my girlfriend bought a • Aspect extraction
moto phone. We called each ot • Grouping synonyms
her when we got home. The voi • Opinion orientation classification
ce on my phone was not clear.
The camera was good. My girl
friend said the sound of her pho
ne was clear. I wanted a phone
with good voice quality. So I w
as satisfied and returned the pho
ne to BestBuy yesterday.

Small phone – small battery life

Opinion orientation

 Start from a lexicon
 E.g. the dictionary SentiWordNet
 Assign +1/-1 to opinion words, change according to valence shifters (e.g. negation: not, etc.)
 But clauses ("the pictures are good, but the battery life ...")
 Dictionary-based: use semantic relations (e.g. synonyms, antonyms)
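A minimal sketch of lexicon-based orientation with a negation shifter; the word scores are illustrative and not taken from SentiWordNet:

```python
# Illustrative +1/-1 scores; not taken from SentiWordNet.
LEXICON = {"good": 1, "clear": 1, "amazing": 1, "bad": -1, "poor": -1}
SHIFTERS = {"not", "never", "no"}    # valence shifters

def orientation(tokens):
    """Sum lexicon scores, flipping a score when a negation word precedes it."""
    score = 0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            s = LEXICON[tok]
            if i > 0 and tokens[i - 1] in SHIFTERS:
                s = -s                # valence shifter: flip the polarity
            score += s
    return score

print(orientation("the camera was good".split()))      # 1
print(orientation("the voice was not clear".split()))  # -1
```

Note the look-back of one token is a crude window; but-clauses and longer-range shifters need more than this.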
Opinion orientation

 Corpus-based:
 learn from labelled examples
 Disadvantage: need these (expensive!)
 Advantage: captures domain dependence
Issues in aspect-/sentence-oriented SA (4)

"Yesterday, I bought a Nokia phone and my girlfriend bought a moto phone. We called each other when we got home. The voice on my phone was not clear. The camera was good. My girlfriend said the sound of her phone was clear. I wanted a phone with good voice quality. So I was satisfied and returned the phone to BestBuy yesterday."

• Object identification
• Aspect extraction
• Grouping synonyms
• Opinion orientation classification
• Integration / coreference resolution
Coreference resolution: Special characteristics in sentiment analysis

 A well-studied problem in NLP
 Ding & Liu (2010): object & attribute coreference
 Comparative sentences and sentiment consistency:
 "The Sony camera is better than the Canon camera. It is cheap too." → It = Sony
 Lightweight semantics (can be learned from corpus):
 "The picture quality of the Canon camera is very good. It is not expensive either." → It = camera
Not all sentences/clauses carry sentiment

"Yesterday, I bought a Nokia phone and my girlfriend bought a moto phone. We called each other when we got home. The voice on my phone was not clear. The camera was good. My girlfriend said the sound of her phone was clear. I wanted a phone with good voice quality. So I was satisfied and returned the phone to BestBuy yesterday. Small phone – small battery life."

• Neutral sentiment
Not all sentences/clauses in a review carry sentiment

"Headlong's adaptation of George Orwell's 'Nineteen Eighty-Four' is such a sense-overloadingly visceral experience [neutral] that it was only the second time around, as it transfers to the West End, that I realised quite how political it was. [positive]
Writer-directors [...] have reconfigured Orwell's plot, making it less about Stalinism, more about state-sponsored torture. Which makes great, queasy theatre, as Sam Crane's frail Winston stumbles through 101 minutes of disorientating flashbacks, agonising reminisce, blinding lights, distorted roars, walls that explode in hails of sparks, [...] and the almost-too-much-to-bear Room 101 section, which churns past like 'The Prisoner' relocated to Guantanamo Bay. [negative?]
[...] Crane's traumatised Winston lives in two strangely overlapping time zones – 1984 and an unspecified present day. The former, with its two-minute hate and its sexcrime and its Ministry of Love, clearly never happened. But the present day version, in which a shattered Winston groggily staggers through a 'normal' but entirely indifferent world, is plausible. Any individual who has crossed the state – and there are some obvious examples – could go through what Orwell's Winston went through. Second time out, it feels like an angrier and more emotionally righteous play. [neutral?]
Some weaknesses become more apparent second time too."
Linguistics Levels of Analysis

 Speech
 Written language
 Phonology: sounds / letters / pronunciation
 Morphology: the structure of words
 Syntax: how these sequences are structured
 Semantics: meaning of the strings
 Interaction between levels
Issues in Syntax

“the dog ate my homework” - Who did what?


1. Identify the part of speech (POS)
Dog = noun ; ate = verb ; homework = noun
English POS tagging: 95%

2. Identify collocations
mother in law, hot dog
Compositional versus non-compositional collocates
Issues in Syntax

 Shallow parsing:
“the dog chased the bear”
“the dog” “chased the bear”
subject - predicate
Identify basic structures
NP-[the dog] VP-[chased the bear]
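The shallow-parsing idea can be illustrated with a toy rule-based chunker over already-tagged words. The tags are hand-assigned here (a real system would run a POS tagger first), and this simplified version marks the verb on its own rather than building a full VP:

```python
# Toy chunker over (word, POS) pairs, with hand-assigned tags.
tagged = [("the", "DET"), ("dog", "NOUN"), ("chased", "VERB"),
          ("the", "DET"), ("bear", "NOUN")]

def chunk(tagged):
    """Group DET+NOUN pairs into NP chunks; mark verbs separately."""
    chunks, i = [], 0
    while i < len(tagged):
        word, pos = tagged[i]
        if pos == "DET" and i + 1 < len(tagged) and tagged[i + 1][1] == "NOUN":
            chunks.append(("NP", word + " " + tagged[i + 1][0]))
            i += 2
        elif pos == "VERB":
            chunks.append(("V", word))
            i += 1
        else:
            i += 1
    return chunks

print(chunk(tagged))  # [('NP', 'the dog'), ('V', 'chased'), ('NP', 'the bear')]
```

Identifying these basic structures without a full parse tree is exactly what makes the approach "shallow".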
Issues in Semantics

 Understand language! How?


 “plant” = industrial plant
 “plant” = living organism
 Words are ambiguous
 Importance of semantics?
 Machine Translation: wrong translations
 Information Retrieval: wrong information
 Anaphora Resolution: wrong referents
Issues in Information Extraction

 "There was a group of about 8-9 people close to the entrance on Highway M1"
 Who? "8-9 people"
 Where? "Highway M1"

 Extract information
 Detect new patterns:
 Hidden information etc.
 Governments and militaries put lots of money into IE research
Sentiment Analysis Tasks

 Simplest task:
 Is the attitude of the text positive or negative?
 More complex:
 Rank the attitude of the text from 1 to 5
 Advanced:
 Detect the target, source, or complex attitude types
How does Sentiment Analysis work?

Sentiment analysis aims at finding an opinionated point of view and its disposition and highlighting the information of particular interest in the process. It is applied for the following operations:
 Find and extract the opinionated data (aka sentiment data) on a specific platform (customer support, reviews, etc.)
 Determine its polarity (positive or negative)
 Define the subject matter (what is being talked about in general and specifically)
 Identify the opinion holder (on its own and in correlation with the existing audience segments)
More specifically, depending on the purpose, a sentiment analysis algorithm can be used at the following scopes:
 Document level – for the entire text.
 Sentence level – obtains the sentiment of a single sentence.
 Sub-sentence (word) level – obtains the sentiment of sub-expressions within a sentence.
Document-level sentiment analysis

 The major challenges in document-level sentiment analysis are cross-domain sentiment analysis and cross-language sentiment analysis. It has been shown that domain-specific sentiment analysis achieves remarkable accuracy, but that accuracy is highly sensitive to the domain.
 The feature vector used in these tasks contains a bag of words, which is specific to a particular domain and therefore limited. Because it is costly to annotate data for each new domain, a classifier trained on one domain is adapted to others. Spectral feature alignment, structural correspondence learning, and the sentiment-sensitive thesaurus are three classical techniques. They differ in terms of feature-vector expansion, word-relatedness measurement, and the classifier used for classification.
 Many methods for cross-domain classification utilize labeled or unlabeled data, or both. Hence, the techniques give different results for different domains as well as for different purposes.
Document-level sentiment analysis

 Bollegala et al. [1] developed a technique which uses a sentiment-sensitive thesaurus (SST) for performing cross-domain sentiment analysis. They proposed a cross-domain sentiment classifier using an automatically extracted sentiment-sensitive thesaurus.
 To handle the mismatch between features in cross-domain sentiment classification, they utilized labeled data from multiple source domains and unlabeled data from target domains to compute the relatedness of features and construct a sentiment-sensitive thesaurus.
 The created thesaurus is then used to expand feature vectors during the training and testing process for a binary classifier. A relevant subset of the features is selected using L1 regularization.

[1] Bollegala D, Weir D, Carroll J (2013) Cross-domain sentiment classification using a sentiment sensitive thesaurus. IEEE Trans Knowl Data Eng 25(8):1719–1731
Document-level sentiment analysis

 Another document-level task is cross-language sentiment analysis, which has been studied by several researchers. Most of them focus on sentiment classification at the document level.
 Xia et al. [1] proposed a three-stage cascade model for the polarity shift problem in the context of document-level sentiment classification, in which each document is split into a set of sub-sentences, and a hybrid model employing rules and statistical methods is built up to detect explicit and implicit polarity shifts. Then, a polarity shift elimination method is used to remove polarity shifts in negations. Finally, different types of polarity shifts are used to train base classifiers.
 Li et al. [2] proposed a cross-lingual structural correspondence learning (SCL) based on the distributed representation of words; it can learn meaningful one-to-many mappings for pivot words using large amounts of monolingual data and a small dictionary.

[1] Xia R, Xu F, Yu J, Qi Y, Cambria E (2016) Polarity shift detection, elimination and ensemble: a three-stage model for document-level sentiment analysis. Inf Process Manag 52(1):36–45
[2] Li N, Zhai S, Zhang Z, Liu B (2017) Structural correspondence learning for cross-lingual sentiment classification with one-to-many mappings. AAAI 2017:3490–3496
Sentence-level sentiment analysis

 Sometimes document-level sentiment analysis is too coarse for some special purposes. A lot of early work at the sentence level focuses on identifying subjective sentences. But there are complex tasks such as dealing with conditional sentences or dealing with sarcastic sentences. In such cases, sentence-level sentiment analysis is desirable.
 Wu et al. [1] proposed an approach for sentence-level sentiment classification without labeling sentences. It is a unified framework that incorporates two types of weak supervision, with document-level and word-level sentiment labels, to learn the sentence-level sentiment classifier.

[1] Wu F, Zhang J, Yuan Z, Wu S, Huang Y, Yan J (2017) Sentence-level sentiment classification with weak supervision. In: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval, 2017. ACM, pp 973–976
Word-level sentiment analysis

 Document-level analysis focuses on classifying the entire document as subjective or objective, and positive or negative, while sentence-level analysis is more effective than document-level analysis, because a document contains both subjective and objective sentences.
 The word is the basic unit of language, and the polarity of a word is closely related to the subjectivity of the corresponding sentence or document. There is a strong possibility that a sentence containing an adjective is a subjective sentence. In addition, the choice of words reflects not only the individual's demographic characteristics, such as gender and age, but also their motivation, personality, social status, and other psychological or social traits. Therefore, the word is the basis of text sentiment analysis.
 At present, the commonly used methods include the natural language processing technology-based approach and the machine learning-based approach. For sentiment analysis of micro-blog text, most researchers suggest that a term matching-based technique should be adopted. The emotional term is the link between the emotional orientation of the text and the single word. Each word can be regarded as a collection of certain kinds of viewpoint information, which is a clue to the emotion and subjectivity of the text.
Units of analysis, methods, features
The unit of analysis

 community
 another person
 user / author
 document
 sentence or clause
 aspect (e.g. product feature)
The analysis method

 Machine learning
 Supervised
 Unsupervised
 Lexicon-based
 Dictionary
 Flat

 With semantics
 Corpus
 Discourse analysis
Features

 Features:
 Words (bag-of-words)
 N-grams
 Parts-of-speech (e.g. Adjectives and adjective-adverb combinations)
 Opinion words (lexicon-based: dictionary or corpus)
 Valence intensifiers and shifters (for negation); modal verbs; ...
 Syntactic dependency
Features

 Feature selection based on
 frequency
 information gain
 odds ratio (for binary-class models)
 mutual information
 Feature weighting
 Term presence or term frequency
 Inverse document frequency (TF-IDF)
 Term position: e.g. title, first and last sentence(s)
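The TF-IDF weighting mentioned above can be sketched with only the standard library; the three "documents" are invented for illustration:

```python
import math

def tf_idf(docs):
    """Weight = term frequency in the document x log inverse document frequency."""
    n = len(docs)
    df = {}                                   # document frequency per term
    for doc in docs:
        for term in set(doc.split()):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        toks = doc.split()
        weights.append({t: (toks.count(t) / len(toks)) * math.log(n / df[t])
                        for t in set(toks)})
    return weights

docs = ["good camera good lens", "bad battery", "good battery"]
w = tf_idf(docs)
# "camera" (appears in one document) outweighs "good" (appears in two),
# even though "good" occurs twice in the first document.
print(w[0]["camera"] > w[0]["good"])  # True
```

This is why common opinion words get down-weighted while distinctive, document-specific terms stand out as features.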
Stanford Parser

 A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as "phrases") and which words are the subject or object of a verb.

Tasks performed by the Stanford Parser:
 POS Tagging
 Dependency Tree
 Dependency Relation

Working: A simple example
RNN sentiment analysis (using an embedding layer)

input layer (learned words): [ "go", "fuck", "yourself" ]
→ embedding layer (for relationships) → RNN → output at steps T-1, T, T+1

THE PROBLEM(S)

The current approach to sentiment analysis (providing the RNN behaves as expected) suffers from the accuracy of prediction being highly dependent on the quality of the encoded data. For example, one common approach, known as one-hot encoding, does not provide a reliable encoding, as similar terms that might share contextual meaning are encoded as separate entities:

"I like cats. Kittens are really cute."
I        1 0 0 0 0 0 0
like     0 1 0 0 0 0 0
cats     0 0 1 0 0 0 0
Kittens  0 0 0 1 0 0 0
are      0 0 0 0 1 0 0
really   0 0 0 0 0 1 0
cute     0 0 0 0 0 0 1

Other problems related to the common approach come from the type of NN used for prediction. The most common are:
 gradient vanishing
 gradient exploding
These problems are, however, easily fixable through the use of the more advanced LSTM model, which allows a NN to "forget" information.
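The one-hot problem is easy to demonstrate: every word gets its own basis vector, so related words such as "cats" and "Kittens" come out orthogonal and no similarity survives the encoding. A minimal sketch:

```python
def one_hot(vocab):
    """Map each word to its own standard basis vector."""
    return {w: [1 if i == j else 0 for j in range(len(vocab))]
            for i, w in enumerate(vocab)}

vocab = ["I", "like", "cats", "Kittens", "are", "really", "cute"]
vecs = one_hot(vocab)
print(vecs["I"])  # [1, 0, 0, 0, 0, 0, 0]

# "cats" and "Kittens" share contextual meaning, yet their one-hot
# vectors are orthogonal: the dot product is 0.
dot = sum(a * b for a, b in zip(vecs["cats"], vecs["Kittens"]))
print(dot)  # 0
```

A learned embedding layer addresses exactly this: it maps words to dense vectors where related terms end up close together.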
Case Study 1 => Analysis on Tweets
Natural Language Processing for Sentiment Analysis

 Introduction
 Sentiment Lexicon Model
 Tweets Sentiment Analysis
 Extract Sentiment Based on Subject
Sentiment Lexicon Model

 There are many ways to analyze sentiment, but the most common way being used today is the lexicon-based model. This model uses a dictionary of words that are annotated (by humans) with their polarity (good or bad) and strength (how good or how bad). The lexicon-based sentiment analyzer combs through text and picks out specific words or phrases, called tokens, and classifies their polarity and strength to capture the text's opinion towards its main subject matter.
Paper Abstract Points

Sentiment Classification

 Subjectivity Classification
 Semantic Association
 Polarity Classification
Proposed System

 A system is proposed to carry out sentiment analysis on tweets based on a specific topic. Several pre-processing steps are carried out to clean the noise in tweets and present the tweets in formal language. To determine the sentiment of tweets, NLP is implemented to find the subjective portion of the tweets that is associated with the subject, and to classify the sentiment of the tweets. The tweets are labeled as positive, negative or neutral.
Proposed System
Overview of Framework

 Tweets were extracted from a Twitter database for the experiment. All tweets were manually labelled as positive, negative or neutral. This set of tweets was used to evaluate the performance of the proposed system, using metrics such as the accuracy and precision of the predictive result.
 To present the tweets in a structured manner, some preprocessing was done on the dataset before it was further analyzed by the proposed system. Pre-processing ensures that the tweets are prepared in a formal language format that can be read and understood by the machine. After pre-processing, the sentiment of tweets can be determined through sentiment classification.
Classification

 Sentiment classification: subjectivity classification, semantic association and polarity classification.
Subjectivity classification was carried out to judge whether the tweets are subjective or objective, and subjective tweets went forward to semantic association to find the sentiment lexicons associated with the subject. The sentiment classification predicted the tweets as positive, negative or neutral.
Data Set

A data set is a collection of related, discrete items of related data that may be accessed individually or in combination, or managed as a whole entity.

A total of 1513 tweets were extracted from Twitter and manually labelled. These tweets contain the keyword 'Unifi', which is a telecommunication service in Malaysia. It is also the subject in sentiment classification. There are 345 positive tweets, 641 negative tweets and 531 neutral tweets. The tweets were analyzed by the proposed system to obtain the predictive sentiment.
Analyzing Tweets

 Alchemy API and Weka

Alchemy API applies NLP techniques in sentiment analysis, while Weka is a tool which performs data mining using machine learning algorithms:
 Naïve Bayes
 Decision Trees
 Support Vector Machines
Preprocessing

 Pre-processing aims to process and present the tweets in an organized format and to increase the machine's understanding of the text, as most tweets are in the form of unstructured text. It includes URL and hashtag removal, special symbol replacement, repeated character removal, abbreviation and acronym expansion, and subject capitalization.
 Special symbols are replaced with words to avoid confusion during text processing, for example, '>' replaced by 'greater', '&' replaced by 'and'.
 For example, "I'm not going to work 2mr" expands to "I am not going to work tomorrow".
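A rough sketch of those pre-processing steps; the abbreviation and symbol tables are tiny illustrative stand-ins for the fuller dictionaries a real system would use:

```python
import re

# Tiny illustrative tables; a real system would use far larger dictionaries.
ABBREVIATIONS = {"2mr": "tomorrow", "i'm": "I am"}
SYMBOLS = {">": "greater", "&": "and"}

def preprocess(tweet, subject="Unifi"):
    tweet = re.sub(r"https?://\S+", "", tweet)    # URL removal
    tweet = re.sub(r"#\w+", "", tweet)            # hashtag removal
    for sym, word in SYMBOLS.items():             # special symbol replacement
        tweet = tweet.replace(sym, " " + word + " ")
    tweet = re.sub(r"(.)\1{2,}", r"\1\1", tweet)  # squeeze repeated characters
    words = [ABBREVIATIONS.get(w.lower(), w) for w in tweet.split()]
    # subject capitalization, so later steps can find the subject
    words = [subject if w.lower() == subject.lower() else w for w in words]
    return " ".join(words)

print(preprocess("I'm not going to work 2mr http://t.co/x"))
# I am not going to work tomorrow
```

The order matters: URLs and hashtags are stripped before tokenization so their fragments never reach the abbreviation lookup.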
Sentiment Classification
Subjectivity Classification

 Subjectivity classification separates the tweets into subjective or objective. The system scans the tweets word by word, and finds the words that carry sentiment. If a word in the tweet carries positive or negative sentiment weightage, the tweet is classified as subjective. Otherwise, it is objective, which is also neutral.
For example, "Come and get internet package" vs. "Come and get new internet package".
The first tweet does not have any word that carries a sentiment score. It is classified as objective and tagged as neutral. In the second tweet, "new" is a word that has a sentiment score. The tweet is classified as subjective, and proceeds to the next step for semantic association.
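The subjectivity test described above reduces to a lexicon lookup; a minimal sketch with an invented weight table:

```python
# Invented sentiment-weight lexicon for illustration.
WEIGHTS = {"new": 0.3, "love": 0.625, "bad": -0.7}

def classify_subjectivity(tweet):
    """Subjective if any word carries a sentiment weight; otherwise objective (neutral)."""
    if any(w.lower() in WEIGHTS for w in tweet.split()):
        return "subjective"
    return "objective"

print(classify_subjectivity("Come and get internet package"))      # objective
print(classify_subjectivity("Come and get new internet package"))  # subjective
```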
Semantic Association

In semantic association, the sentiment lexicons that relate to the subject are identified through grammatical relationships between the subject and the sentiment lexicons. As tweets are mostly short and straightforward, their grammar structure is simpler than normal text. The sentiment lexicons associated with a subject are mostly adjectives or verbs.

Examples: “I Love Unifi” (the sentiment lexicon ‘Love’ is associated with the subject ‘Unifi’); “Unifi is better than M…?” (a comparative opinion).
Polarity Classification

 In polarity classification, subjective tweets are classified as positive or negative.

The sentiment of a tweet is classified based on the sentiment lexicons associated with the subject.

 For example, in ‘I love Unifi’, the verb ‘love’ is the sentiment lexicon. Checking against SentiWordNet, ‘love’ has a positive score of 0.625. Hence, we conclude that ‘Unifi’ carries positive sentiment and classify the tweet as Positive.
Cont….

 For comparative opinions, the position of the subject is very important. For instance, in the tweet ‘Unifi is better than M’, the adjective ‘better’ is found, but there are two subjects, ‘Unifi’ and ‘M’. The subject that appears after the comparative adjective carries the opposite sentiment to the subject that appears before it. In this case, as ‘better’ carries a positive score of 0.825, ‘Unifi’ is classified as positive and ‘M’ as negative.
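The polarity rules above, including the comparative-opinion rule, can be sketched as follows. The lexicon scores echo the values quoted on the slides, while the parsing is a simplified stand-in for the system's grammatical analysis:

```python
# Illustrative scores; the slides quote SentiWordNet values for these words
SCORES = {"love": 0.625, "better": 0.825, "hate": -0.7}
COMPARATIVES = {"better", "worse"}

def classify_polarity(tweet, subjects):
    words = tweet.lower().replace("?", "").split()
    lexicons = [w for w in words if w in SCORES]
    if not lexicons:
        return {}
    word = lexicons[0]
    polarity = "positive" if SCORES[word] > 0 else "negative"
    opposite = "negative" if polarity == "positive" else "positive"
    if word in COMPARATIVES and len(subjects) == 2:
        # The subject before the comparative takes its polarity;
        # the subject after it takes the contrasting polarity.
        before, after = subjects
        return {before: polarity, after: opposite}
    return {s: polarity for s in subjects}

print(classify_polarity("I love Unifi", ["Unifi"]))
# -> {'Unifi': 'positive'}
print(classify_polarity("Unifi is better than M", ["Unifi", "M"]))
# -> {'Unifi': 'positive', 'M': 'negative'}
```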
Alchemy API

 Alchemy API is an IBM-owned company that uses machine learning (specifically, deep learning) for natural language processing (specifically, semantic text analysis, including sentiment analysis) and computer vision (specifically, face detection and recognition).
Weka Using Machine Learning Algorithms

 Weka is a collection of machine learning algorithms for data mining tasks.

Naïve Bayes
The Naïve Bayes classifier gives great results for textual data analysis, such as natural language processing.
Decision Tree
The Decision Tree algorithm belongs to the family of supervised learning algorithms. It can be used for solving both regression and classification problems.
SVM
Support vector machines are supervised learning models with associated learning algorithms that analyze data for classification.
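To illustrate the kind of classifier Weka provides, here is a tiny multinomial Naïve Bayes for tweet polarity written from scratch. The training tweets are invented for illustration; this is not the paper's Weka setup.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label) pairs; returns the fitted model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, label_counts, vocab

def predict(model, text):
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label in label_counts:
        # log prior + log likelihoods with add-one (Laplace) smoothing
        lp = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train([
    ("love the fast unifi speed", "positive"),
    ("great service love it", "positive"),
    ("slow connection bad service", "negative"),
    ("hate the slow speed", "negative"),
])
print(predict(model, "love the service"))  # positive
print(predict(model, "slow and bad"))      # negative
```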
Results and Comparison
Conclusion

 Twitter is a very popular social media platform. In this paper, we present the preliminary results of our proposed system, which incorporates NLP techniques to extract the subject from tweets and classify the polarity of tweets by analyzing the sentiment lexicons associated with the subject.
 From the experiments, the proposed system performs better than Alchemy API, but still needs improvement, as SVM does better. Future work will focus on enhancing the accuracy of sentiment analysis.
WWW 2018, April 23-27, 2018, Lyon, France

PROPOSED SOLUTION NR. 1

CASE STUDY 2: Sentiment Analysis by Capsules
Yequan Wang, Aixin Sun, Jialong Han, Ying Liu, Xiaoyan Zhu

In order to deal with the drawbacks of the typical approach to sentiment analysis, the authors propose an attention-based RNN-Capsule model.

In short, the model uses an LSTM to encode the given inputs, which are fed to multiple capsules (one capsule for each predictable sentiment). The capsules use an attention mechanism to produce a representation of the input according to the capsule's weight matrix. Each capsule then calculates the probability that it will activate, and attempts to reproduce the initial representation fed to it.

The given solution has excelled in predicting sentiments on the given datasets without having specialized linguistic knowledge of the given context.
A DEEPER LOOK

 The model proposed by the team encodes a given instance through an RNN layer.

 The capsules embed the attention mechanism, the probability of the prediction, and the representation of the instance.
The attention mechanism interprets the encoded instance.
The probability represents how well the sentiment belonging to the capsule is expected to represent the instance.
The reconstruction module attempts to reconstruct the initial representation given to the capsule.

[Figure: the model of a capsule. INPUT feeds an RNN; its output goes to the capsules (CAP 1, CAP 2), each with an attention module, an activation probability (P1, P2) and a representation vector (REP. 1, REP. 2). Figure inspired by the original paper.]

Each module will be introduced by itself later within the presentation.

RECURRENT NEURAL NETWORK

 The recurrent neural network has the purpose of encoding the text-based input. Given their nature, RNNs can use previously encountered information in order to better interpret new information.

 In this case the team has used LSTM-type models for this purpose, which are superior to standard RNNs due to their ability to prevent vanishing or exploding gradients.

 The (encoded) instance representation is an average of the hidden vectors from the RNN.

[Figure: vanishing gradient illustration. The picture has been taken from https://machinelearningmastery.com/; all the credits belong to Jason Brownlee.]
ATTENTION

 The attention model proposed by the team calculates the capsule's representation as the sum of the products of each word and its "attention importance", over all the inputs given to the module.

 The "attention importance" represents the weight given to a term by the attention module.

 The attention layer interprets the content of the encoded instance. In simple terms, the attention mechanism is able to look at how well encoded weighted terms relate to the given context.

[Figure: attention matrix example. The picture has been taken from https://medium.com/@kion.kim/; all the credits belong to the author.]

The given formulas have been provided within the paper.
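The attention step can be sketched numerically as follows, assuming the hidden vectors come from the LSTM encoder and each capsule scores words with its own weight vector; the exact formulas are given in the paper, and all shapes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                      # 5 words, hidden size 8 (illustrative)
H = rng.normal(size=(T, d))      # hidden vectors from the RNN, one per word

# Instance representation: average of the hidden vectors
instance_rep = H.mean(axis=0)

def capsule_representation(H, w):
    scores = H @ w                                 # one score per word
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax "attention importance"
    return alpha @ H                               # weighted sum of hidden vectors

w = rng.normal(size=d)           # a capsule's (learned) attention weights
rep = capsule_representation(H, w)
print(rep.shape)                 # (8,)
```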


PROBABILITY

 The probability module calculates the chance that a given capsule will activate.

 The probability that a capsule will activate is given by the product of a learned weight and the representation of the capsule, combined with a learned bias.

 These two learned values are set during the supervised training of the model, based on two goals described in the paper.

LEARNING GOAL I
Maximizing the probability that the capsule matching the true sentiment will activate, while minimizing the probability that the other capsules will activate.

The given formulas have been provided within the paper.
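A sketch of the activation probability described above, assuming a sigmoid over w·r + b; the precise parameterization is in the paper.

```python
import numpy as np

def activation_probability(r, w, b):
    # p = sigmoid(w . r + b), with w and b learned during training
    return 1.0 / (1.0 + np.exp(-(w @ r + b)))

rng = np.random.default_rng(1)
r = rng.normal(size=8)   # capsule representation vector (illustrative)
w = rng.normal(size=8)   # learned weight (here: random stand-in)
b = 0.1                  # learned bias (here: arbitrary stand-in)
p = activation_probability(r, w, b)
print(0.0 < p < 1.0)     # True: a valid probability
```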


RECONSTRUCTION

 The reconstruction module attempts to replicate the encoded input given to the capsule by using the capsule's representation.

 The reconstruction uses the product of the probability that a capsule will activate and the capsule's representation vector.

 The reconstruction is also related to the second learning goal mentioned below.

LEARNING GOAL II
Minimizing the reconstruction error outputted by the matching capsule, while maximizing the error outputted by the other capsules.

The given formulas have been provided within the paper.
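A sketch of the reconstruction step: each capsule scales its representation by its activation probability and tries to match the encoded input. The squared-error measure here is an illustrative assumption; the paper defines its own objectives.

```python
import numpy as np

def reconstruction(p, r):
    # Probability-weighted representation vector
    return p * r

def reconstruction_error(instance_rep, recon):
    # Squared error, as an illustrative stand-in for the paper's objective
    return float(np.sum((instance_rep - recon) ** 2))

rng = np.random.default_rng(2)
instance_rep = rng.normal(size=8)            # encoded input (from the RNN)
r = instance_rep + 0.1 * rng.normal(size=8)  # a well-matching capsule
err_match = reconstruction_error(instance_rep, reconstruction(0.9, r))
err_other = reconstruction_error(instance_rep, reconstruction(0.9, -r))
print(err_match < err_other)  # True: the matching capsule reconstructs better
```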


RESULTS

The presented model has been trained and tested on three datasets.

Following the procedure, the model has proven to be competitive, scoring the highest accuracy on the "Hospital Review" dataset, the highest accuracy on the "Movie Review" dataset, and the 2nd highest accuracy on the "Stanford Sentiment" dataset.

PROPOSED SOLUTION NR. 2

CASE STUDY 3: LANN (Linguistic-Aware Attention Network)
Zeyang Lei, Yujiu Yang, Yi Liu

The LANN model proposed within the paper attempts to improve the performance of CNN models by using the attention mechanism within a two-stage sentence modeling process.

This combination allows the model to interpret linguistic information, with highly competitive results.

The two-stage process:
Firstly, the model analyses the input and decides which contextual information is relevant to the feeling associated with the given input.
Secondly, the model analyses sentence-wise structures to form a representation for the given input.
A DEEPER LOOK

 As mentioned, the model uses a two-step process to process and analyze input data.

 The two steps are named by the authors as:
I. Word-level interactive attention
II. Phrase-level dynamic attention

 As a note, the phrase-level dynamic attention is combined with a CNN to facilitate separating sentences into manageable slices.

[Figure: INPUT feeds the word-level attention, then a CNN, then the phrase-level attention.]

Note: each of these layers will be explained by itself.


WORD-LEVEL INTERACTIVE ATTENTION

 The word-level interactive attention layer maps individual words to individual feelings (the C matrix).

 One attention weight vector is generated for the context words and one for the sentiment words. These vectors are generated by applying a softmax function over the average values of all rows and, respectively, all columns of the C matrix.

 Correlation vectors for the contextual words and for the sentiment words are then generated.

 Finally, a "sentiment-enhanced" representation and a "context-enhanced" representation are created. W is the notation for these representations; U represents projection parameters; the remaining operator refers to concatenation.

The given formulas have been provided within the paper.
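The row/column softmax described above can be sketched as follows; the C matrix here is random and the shapes are illustrative, with the exact formulas given in the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(3)
n_context, n_sentiment = 6, 4
# Correlation matrix between context words (rows) and sentiment words (columns)
C = rng.normal(size=(n_context, n_sentiment))

alpha_context = softmax(C.mean(axis=1))    # one weight per context word
alpha_sentiment = softmax(C.mean(axis=0))  # one weight per sentiment word

print(alpha_context.shape)    # (6,)
print(alpha_sentiment.shape)  # (4,)
```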


SEMANTIC LEVEL LAYER

 The semantic level layer is composed of a convolutional layer and a phrase-level attention layer.

 This layer has the role of producing a semantically relevant sentence representation.

 Firstly, the convolutional layer extracts different semantic structures from the previously obtained representations. It outputs a feature map for both the contextual words and the sentimental words; the feature maps are labeled Hc and Hs.

 The feature maps are then fed to the phrase-level attention layer, which produces a "sentiment-specific" sentence representation.

NOTE: the sentiment-specific sentence representation is fed to a softmax layer in order to predict the degree of each sentiment found within a particular sentence. In training, the model attempts to minimize a cross-entropy error.

The given formulas have been provided within the paper.
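The final softmax prediction and cross-entropy objective can be sketched as follows; the projection parameters, class count, and shapes are illustrative assumptions, not the paper's values.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(4)
d, n_classes = 8, 3                   # representation size, sentiment classes
s = rng.normal(size=d)                # sentiment-specific sentence representation
W = rng.normal(size=(n_classes, d))   # learned projection (random stand-in)
b = np.zeros(n_classes)               # learned bias (zero stand-in)

probs = softmax(W @ s + b)            # degree of each sentiment in the sentence
true_class = 1                        # illustrative gold label
cross_entropy = -np.log(probs[true_class])

print(round(float(probs.sum()), 6))   # 1.0
print(cross_entropy > 0)              # True
```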


RESULTS

The presented model has been trained and tested on two datasets: Movie Review and Stanford Sentiment Treebank.

Following the procedure, the model scored the highest accuracy of all the tested models.
CONCLUSIONS
The mentioned papers attempt to deal with the difficulty of translating linguistic information into usable ML parameters through the use of ATTENTION. Both suggested models have achieved competitive results.

 It might, however, be beneficial to combine the attention mechanism with other high-performing models, such as the LR-Bi-LSTM.

Thank you.
