Twitter Sentiment Analysis - Final - Report Copy Sahil
of
BACHELOR OF TECHNOLOGY
In
INFORMATION TECHNOLOGY
By
Sahil Khunger (43715603117)
CERTIFICATE
CANDIDATE'S DECLARATION
I hereby certify that the work presented in this project synopsis, entitled
“Twitter Sentiment Analysis”, in partial fulfilment of the requirements for the award of the
Degree of Bachelor of Technology and submitted to the institution, is an authentic record
of my own work carried out during the period 2019 – Present.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
1.1 Purpose
1.2 Objective
1.3 Motivation
1.4 Definition Overview
4.1 Results
4.2 Future Work
REFERENCES
APPENDIX
ABSTRACT
This project addresses the problem of sentiment analysis on Twitter, that is, classifying
tweets according to the sentiment expressed in them: positive, negative or neutral.
Twitter is an online micro-blogging and social-networking platform that allows users
to write short status updates, originally limited to 140 characters (the limit was raised to
280 characters in 2017). It is a rapidly expanding service with over 200 million registered
users, of whom 100 million are active and half log on to Twitter daily, generating nearly
250 million tweets per day. Given this volume of usage, we hope to obtain a reflection of
public sentiment by analyzing the sentiments expressed in tweets. Sentiment analysis has
applications in many domains, including, but not limited to, business intelligence,
politics and sociology. Recent years have also witnessed the advent of social networking
websites, microblogs, wikis and Web applications and, consequently, an unprecedented
growth in user-generated data that is ripe for sentiment mining. Data such as web
postings, tweets and videos express opinions on various topics and events, and offer
immense opportunities to study and analyze human opinions and sentiment. Analyzing
public sentiment is important for many applications, such as firms trying to gauge the
market response to their products, predicting political elections, and predicting
socioeconomic phenomena such as stock market movements. The aim of this project is to
develop a functional classifier for accurate and automatic sentiment classification.
ACKNOWLEDGEMENTS
I am deeply thankful to my advisor, Mr. Shailendra Singh, for helping me throughout the
course in accomplishing my final project. His guidance, support and motivation enabled
me to achieve the objectives of the project.
LIST OF FIGURES
1.1 Purpose
This project addresses the problem of sentiment analysis on Twitter, that is, classifying
tweets according to the sentiment expressed in them: positive, negative or neutral.
Twitter is an online micro-blogging and social-networking platform that allows users
to write short status updates, originally limited to 140 characters (the limit was raised to
280 characters in 2017). It is a rapidly expanding service with over 200 million registered
users, of whom 100 million are active and half log on to Twitter daily, generating nearly
250 million tweets per day. Given this volume of usage, we hope to obtain a reflection of
public sentiment by analyzing the sentiments expressed in tweets. Sentiment analysis has
applications in many domains, including, but not limited to, business intelligence,
politics and sociology. Recent years have also witnessed the advent of social networking
websites, microblogs, wikis and Web applications and, consequently, an unprecedented
growth in user-generated data that is ripe for sentiment mining. Data such as web
postings, tweets and videos express opinions on various topics and events, and offer
immense opportunities to study and analyze human opinions and sentiment. Analyzing
public sentiment is important for many applications, such as firms trying to gauge the
market response to their products, predicting political elections, and predicting
socioeconomic phenomena such as stock market movements. The aim of this project is to
develop a functional classifier for accurate and automatic sentiment classification.
Questions
Main question: How can tweets be automatically and accurately classified with
respect to their sentiment?
The main goal of this project is to accurately classify tweets with respect to their
sentiment. This is achieved by developing a tool that classifies the tweets.
Twitter was chosen for this project because it gives a better approximation of public
sentiment than conventional internet articles and web blogs. The reason is that the
amount of relevant data is much larger for Twitter than for traditional blogging sites.
Moreover, the response on Twitter is more prompt and more general, since the number
of users who tweet daily is substantially larger than the number who write web blogs.
Sentiment analysis (also known as opinion mining) refers to the use of text analysis to
identify and extract subjective information in source materials. Sentiment analysis is
widely applied to reviews and social media for a variety of applications, ranging from
marketing to customer service.
Sentiment analysis is the multidisciplinary field of study that deals with analyzing
people’s sentiments, attitudes, emotions and opinions about entities such as products,
services, individuals, companies, organizations, events and topics; it draws on fields
such as information retrieval, machine learning and artificial intelligence. It is a set of
computational and NLP-based techniques that can be leveraged to extract subjective
information from a given text. Unlike factual information, opinions and sentiments are
subjective.
Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a
writer with respect to some topic, or the overall contextual polarity of a document. The
attitude may be his or her judgment or evaluation, affective state (that is to say, the
emotional state of the author when writing), or intended emotional communication
(that is to say, the emotional effect the author wishes to have on the reader).
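To make "polarity" concrete, the sketch below scores a text by averaging word-level scores from a tiny hand-made lexicon. The lexicon and the function name `lexicon_polarity` are illustrative assumptions; this is not how TextBlob (imported in the appendix) computes polarity, since TextBlob relies on a much larger curated lexicon.

```python
import re

# A tiny hypothetical lexicon mapping words to polarity scores in [-1, 1].
LEXICON = {
    "good": 0.7, "great": 0.8, "love": 0.5, "happy": 0.8,
    "bad": -0.7, "terrible": -1.0, "hate": -0.8, "sad": -0.5,
}


def lexicon_polarity(text):
    """Average the scores of the lexicon words found in `text`.

    Returns 0.0 when no lexicon word occurs, so a text without
    opinion words is treated as neutral.
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


print(lexicon_polarity("What a great day, I love it"))  # average of 0.8 and 0.5
print(lexicon_polarity("The traffic was terrible"))     # -1.0
print(lexicon_polarity("The meeting is at noon"))       # 0.0
```

Averaging keeps the score in [-1, 1] regardless of text length; a plain sum would let long texts dominate.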
▪ Public Actions: Sentiment analysis is also used to monitor and analyze social
phenomena, to spot potentially dangerous situations and to determine the general
mood of the public.
CHAPTER 2: OVERALL DESCRIPTION
The main perspective of Sentiment Analysis is the use of text analysis to identify and
extract subjective information from textual content. There are two types of user-generated
content available on the web: facts and opinions. Facts are statements about topics and,
at present, are easily collected from the Internet using search engines that index
documents by topic keywords. Opinions are user-specific statements expressing positive
or negative sentiment about a certain topic. Opinions are generally hard to categorize
using keywords, so various text analysis and machine learning techniques are used to
mine opinions from a document. Sentiment Analysis finds applications in a variety of
domains.
Installation Process:
Tweepy: Tweepy is a Python client for the official Twitter API.
API key: <your API key from the Twitter Dev Console>
Access token: <your access token from the Twitter Dev Console>
These credentials are secret and should not be published or hard-coded.
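Rather than embedding the keys in the source, the four credentials can be read from environment variables at start-up. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_credentials` are hypothetical choices for this sketch; Tweepy itself does not read these variables.

```python
import os

# Hypothetical environment-variable names for the four Twitter credentials.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the credentials as a dict, or raise if any are missing.

    `env` defaults to os.environ; passing a plain dict makes the
    function testable without touching the real environment.
    """
    if env is None:
        env = os.environ
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Twitter credentials: " + ", ".join(missing))
    return {name: env[name] for name in CREDENTIAL_VARS}
```

The returned values can then be passed to `OAuthHandler(...)` and `set_access_token(...)` in the same way as in the appendix code. Failing early with the list of missing names is easier to debug than an authentication error later.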
Named Entity Recognition - What is the person actually talking about? For example, is
300 Spartans a group of Greeks or a movie?
Parsing - What are the subject and object of the sentence, and which one does the verb
and/or adjective actually refer to?
Sarcasm - If you don't know the author, you have no idea whether 'bad' means
bad or good.
It is assumed that the collected data come from real accounts; that is, every account
in the data is genuine and none is a parody account.
Inputs: The software will receive input from two sources: the user
interface and the Twitter API. The user interface will supply the
keywords and the analysis session duration, while the Twitter API will
supply the Tweet text.
Outputs: The output shows the current mood of the Twitter
community on a given topic in the form of a simple gauge.
Operating System: The software will run on the Microsoft Windows 8.1
and Mac OS X (version 10) operating systems.
Retrieving Input:
The software will receive three inputs: keywords, the analysis session duration, and Tweets.
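One way to hold the user-interface inputs together and test each incoming Tweet against the keywords is sketched below. The `Session` container and the `matches_keywords` function are illustrative names, and the case-insensitive substring match is an assumption rather than a stated requirement.

```python
from collections import namedtuple

# Hypothetical container for the two user-interface inputs.
Session = namedtuple("Session", ["keywords", "duration_seconds"])


def matches_keywords(tweet_text, session):
    """Return True if any session keyword occurs in the tweet.

    The comparison is a case-insensitive substring match.
    """
    text = tweet_text.lower()
    return any(keyword.lower() in text for keyword in session.keywords)


session = Session(keywords=["Donald Trump"], duration_seconds=60)
print(matches_keywords("Rally tonight: donald trump in Minneapolis", session))  # True
print(matches_keywords("Weather update for Boulder", session))                 # False
```

A substring match is simple but can over-match (a short keyword may appear inside longer words); a word-boundary regular expression would be stricter.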
Real-Time Processing:
The software will take input, process data and display output in real time.
This ensures that the snapshot provided by the simple gauge is a
current view of the Twitter community’s mood on the chosen topic.
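A minimal sketch of the real-time loop: tweets are consumed one at a time from any iterable (for example, a live stream), each is processed and the display refreshed immediately, and the loop stops once the session duration has elapsed. The function `run_session` and its `process`, `display` and `clock` parameters are hypothetical; the clock is injectable so the loop can be tested without waiting.

```python
import time


def run_session(tweets, duration_seconds, process, display, clock=time.time):
    """Process and display tweets until `duration_seconds` have elapsed.

    `tweets` is any iterable of tweet texts, `process` turns one tweet
    into a result, and `display` receives the running list of results.
    Returns the number of tweets handled.
    """
    start = clock()
    results = []
    for tweet in tweets:
        if clock() - start >= duration_seconds:
            break
        results.append(process(tweet))
        display(results)  # refresh the output after every tweet
    return len(results)


def show(results):
    print("Processed {} tweets".format(len(results)))


handled = run_session(["good news", "bad news", "no news"], 60, str.upper, show)
```

Because the duration is checked only when a tweet arrives, a quiet stream can run past the deadline; a fuller version would also need a timeout on the stream itself.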
Sentiment Analysis:
Sentiment analysis will be performed on the user-specified keywords
within the Tweet to determine the overall mood of the Tweet relative to
the topic. The sentiment analysis will provide a negative, neutral, or
positive numeric sentiment value.
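The requirement above implies a mapping from a numeric score to one of three labels. Assuming a score in [-1, 1], such as TextBlob's `TextBlob(text).sentiment.polarity`, a simple rule treats positive scores as positive, negative scores as negative and zero as neutral. The function name and the optional `margin` parameter, which widens the neutral band, are assumptions added for illustration.

```python
def polarity_to_label(polarity, margin=0.0):
    """Map a polarity score in [-1, 1] to 'positive', 'neutral' or 'negative'.

    Scores within `margin` of zero count as neutral; with the default
    margin of 0.0 only an exact zero is neutral.
    """
    if polarity > margin:
        return "positive"
    if polarity < -margin:
        return "negative"
    return "neutral"


print(polarity_to_label(0.35))               # positive
print(polarity_to_label(0.0))                # neutral
print(polarity_to_label(-0.05, margin=0.1))  # neutral
```

A small non-zero margin keeps weakly scored tweets from being forced into the positive or negative class.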
Output:
The software must output real-time data in the form of a simple gauge. In
addition, the software may output a graph of mood trends over time, as
well as additional statistics pertaining to a topic (average sentiment over
all analysis sessions and total number of tweets processed). This output
should be clear and easy to understand.
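A plain-text stand-in for the gauge, together with two of the statistics mentioned above (average sentiment and number of tweets processed), might look like this. The gauge width, the characters used and the function names are assumptions for this sketch; a graphical gauge would position its needle from the same average.

```python
def summarize(scores):
    """Return (average polarity, number of tweets) for a list of scores."""
    if not scores:
        return 0.0, 0
    return sum(scores) / float(len(scores)), len(scores)


def text_gauge(average, width=21):
    """Draw a gauge from -1 (left) to +1 (right) with a '|' needle at `average`."""
    average = max(-1.0, min(1.0, average))  # clamp to the valid range
    position = int(round((average + 1) / 2 * (width - 1)))
    cells = ["-"] * width
    cells[position] = "|"
    return "neg [" + "".join(cells) + "] pos"


avg, count = summarize([0.5, -0.2, 0.0, 0.3])
print("Average sentiment: {:.2f} over {} tweets".format(avg, count))
print(text_gauge(avg))
```

Clamping guards against scores slightly outside [-1, 1] so the needle index always stays inside the gauge.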
Availability:
The software will be available at all times on the user’s device, as long as
the device is in proper working order. Its functionality will depend on any
required external services, such as internet access; if those services are
unavailable, the user should be alerted.
Security:
The software should never disclose any personal information of Twitter
users, and should collect no personal information from its own users.
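For the security requirement, one simple safeguard is to replace user handles and links in Tweet text before it is stored or displayed. The patterns, the function name `redact` and the replacement tokens `@user` and `<url>` are assumptions for this sketch; it does not remove every kind of personal information (names written in plain text, for example, survive).

```python
import re

MENTION = re.compile(r"@\w+")         # @handles such as @JoeNBC
URL = re.compile(r"https?://\S+")     # http and https links


def redact(tweet_text):
    """Replace @mentions with '@user' and links with '<url>'."""
    text = MENTION.sub("@user", tweet_text)
    return URL.sub("<url>", text)


print(redact("RT @JoeNBC: see https://t.co/abc now"))  # RT @user: see <url> now
```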
Maintainability:
The software should be written clearly and concisely. The code will be
well documented. Particular care will be taken to design the software
modularly to ensure that maintenance is easy.
Portability: This software will be designed to run on any Python version
2.7 or higher. The software will be forward compatible with all currently
released Python versions.
The software will provide up-to-date information, limited only by the rate of Twitter
input. The gauge output should display the latest results at all times, and if it lags
behind, the user should be notified.
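The lag requirement can be checked by comparing the time of the most recently processed Tweet with the current time. The 30-second threshold and the function name `is_lagging` are assumptions for illustration, not values from the requirements.

```python
import time


def is_lagging(last_tweet_time, now=None, max_lag_seconds=30.0):
    """Return True if the newest processed tweet is older than `max_lag_seconds`.

    Times are Unix timestamps in seconds; `now` defaults to the current time.
    """
    if now is None:
        now = time.time()
    return now - last_tweet_time > max_lag_seconds


if is_lagging(last_tweet_time=time.time() - 45):
    print("Warning: the gauge is lagging behind the live Twitter feed.")
```

On a quiet topic, an old newest Tweet may simply mean few Tweets are being posted, so a real implementation might instead compare against the time the feed was last polled successfully.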
CHAPTER 4: RESULTS AND FUTURE WORK
4.1 Results
Positive tweets percentage: 31.746031746031747 %
Negative tweets percentage: 28.571428571428573 %
Neutral tweets percentage: 39.682539682539684 %
Positive tweets:
RT @Wyn1745: Trump whistleblower reportedly had ‘professional relationship’ with 2020 Democrat -
Whistleblower should be arrested for trea…
RT @brianstelter: The first excerpt from "All the President's Women: Donald Trump and the Making of a Predator," by Barry Levine and
Moniqu…
@Brooke_Kelly87 @realDonaldTrump Keep America Great. Re-elect President Donald J. Trump.
RT @axios: Lindsey Graham tells @jonathanvswan: "If I hear the president say one more time, 'I made a campaign promise to get out of Syria,
…
RT @maddenifico: "He grabbed me there in the front":
Sexual predator Trump allegedly hid behind a tapestry to sexually assault a woman at…
RT @JoeNBC: Ominous for White House: For the first time, 50% of voters support impeaching and removing Donald Trump from office.
https://t…
RT @joncoopertweets: Donald Trump Promised to Eliminate the Deficit in 8 Years. So Far, He Has Increased it by 68%.
https://t.co/UsINtJh5CP
RT @stuartpstevens: A lot of Christian Kurds are facing death because Donald Trump abandoned them. Will the Trump supporting
Evangelicals s…
RT @Will_Bunch: Good morning. There are now 43 NEW allegations of sexual misconduct against Donald Trump. It’s barely 8 a.m.
https://t.co/B…
RT @DearAuntCrabby: Donald Trump's Minneapolis rally to be met with mass protests: "'America First' is a racist lie"
Trump Logic:
"Let's…
Negative tweets:
RT @SexCounseling: @realDonaldTrump They will play their dirty tricks up until election day. That will not stop the red wave and the re el…
RT @jsolomonReports: Ukraine opened new investigation into Hunter Biden linked firm months before Donald Trump’s call with Ukraine’s
presid…
@realDonaldTrump Turkey said its military will cross the border into northern Syria "shortly." DONALD J TRUMP IS A…
https://t.co/96IwQRDtNe
RT @thebradfordfile: With Nancy Pelosi busy raising cash for Trump’s re-election with her fake impeachment, the president will do his part…
RT @RWPUSA: Harvard Psychologist Says Donald Trump's Claims About Destroying Turkey's Economy Would 'Normally Trigger a Mental
Health Hold'…
RT @nitrogenic: Donald Trump's claim that the military ran out of ammunition is "not true," House Armed Services Committee member says
http…
RT @RWPUSA: Between Russia, Turkey and the United States the situation in Syria is an emolumental mess.
It's Barack Obama holding a rally in Minneapolis' Target Center and only being charged $20,0…
RT @Strandjunker: @realDonaldTrump Why #ImpeachmentTaskForce?!
The above is the result of sentiment analysis performed on tweets about “Donald Trump”. The percentages of positive,
negative and neutral tweets on “Donald Trump” are 31.75%, 28.57% and 39.68% respectively.
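Each percentage above is a category's share of all analysed tweets, multiplied by 100. The sketch below reproduces that arithmetic from a list of labels. The counts 20, 18 and 25 (out of 63) are hypothetical values chosen because they yield the same three percentages; they are not counts reported by the program.

```python
from collections import Counter


def sentiment_percentages(labels):
    """Return the percentage of 'positive', 'negative' and 'neutral' labels."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    return {label: 100.0 * counts[label] / total
            for label in ("positive", "negative", "neutral")}


# Hypothetical label list: 20 positive, 18 negative, 25 neutral tweets.
labels = ["positive"] * 20 + ["negative"] * 18 + ["neutral"] * 25
for label, pct in sentiment_percentages(labels).items():
    print("{} tweets percentage: {:.2f} %".format(label.capitalize(), pct))
```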
4.2 Future work
APPENDIX
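import os
import re
import tweepy
from tweepy import OAuthHandler
from textblob import TextBlob


class TwitterClient(object):
    '''
    Generic Twitter Class for sentiment analysis.
    '''
    def __init__(self):
        '''
        Class constructor or initialization method.
        '''
        # keys and tokens from the Twitter Dev Console, read from environment
        # variables instead of being hard-coded (variable names are illustrative)
        consumer_key = os.environ.get('TWITTER_CONSUMER_KEY')
        consumer_secret = os.environ.get('TWITTER_CONSUMER_SECRET')
        access_token = os.environ.get('TWITTER_ACCESS_TOKEN')
        access_token_secret = os.environ.get('TWITTER_ACCESS_TOKEN_SECRET')
        # attempt authentication
        try:
            # create OAuthHandler object
            self.auth = OAuthHandler(consumer_key, consumer_secret)
            # set access token and secret
            self.auth.set_access_token(access_token, access_token_secret)
            # create tweepy API object to fetch tweets
            self.api = tweepy.API(self.auth)
        except Exception as e:
            print("Error: Authentication Failed: " + str(e))

    def clean_tweet(self, tweet):
        '''Remove @mentions, links and special characters (assumed cleaning step).'''
        return ' '.join(re.sub(r"(@\w+)|(\w+://\S+)|([^0-9A-Za-z \t])", " ", tweet).split())

    def get_tweet_sentiment(self, tweet):
        '''Assumed rule: TextBlob polarity > 0 positive, == 0 neutral, < 0 negative.'''
        polarity = TextBlob(self.clean_tweet(tweet)).sentiment.polarity
        if polarity > 0:
            return 'positive'
        elif polarity == 0:
            return 'neutral'
        return 'negative'

    def get_tweets(self, query, count=10):
        '''Fetch tweets matching the query and label each with its sentiment.'''
        tweets = []
        try:
            # call twitter api to fetch tweets (Tweepy 3.x names; Tweepy 4.0
            # renamed search to search_tweets and TweepError to TweepyException)
            fetched_tweets = self.api.search(q=query, count=count)
        except tweepy.TweepError as e:
            # print error (if any)
            print("Error : " + str(e))
            return tweets
        for tweet in fetched_tweets:
            parsed = {'text': tweet.text, 'sentiment': self.get_tweet_sentiment(tweet.text)}
            if parsed not in tweets:  # skip duplicate retweets
                tweets.append(parsed)
        return tweets


def main():
    # creating object of TwitterClient Class
    api = TwitterClient()
    # calling function to get tweets
    tweets = api.get_tweets(query='Donald Trump', count=200)
    if not tweets:
        return
    ptweets = [t for t in tweets if t['sentiment'] == 'positive']
    ntweets = [t for t in tweets if t['sentiment'] == 'negative']
    # percentage of each sentiment class
    print("Positive tweets percentage: {} %".format(100 * len(ptweets) / len(tweets)))
    print("Negative tweets percentage: {} %".format(100 * len(ntweets) / len(tweets)))
    print("Neutral tweets percentage: {} %".format(
        100 * (len(tweets) - len(ptweets) - len(ntweets)) / len(tweets)))


if __name__ == "__main__":
    # calling main function
    main()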