Airline Tweets Classification Using Naive Bayes Classifier

This document summarizes a study that used a Naive Bayes classifier to analyze sentiment in airline tweets. The study took tweets about airline services and classified them as positive, negative, or neutral; most tweets were negative (62.6%). The Naive Bayes algorithm assigns a probability to each word for each sentiment class and classifies a tweet by comparing the resulting class probabilities. A benchmark program written with the NLTK library reached 82.0% accuracy in 8 minutes; the proposed program was 2.5% less accurate but 320 times faster. Improving the tokenization and balancing the dataset could increase the accuracy of the Naive Bayes classifier for sentiment analysis of airline tweets.

Uploaded by

Kaninsan Joshua

Airline Tweets Classification Using Naive Bayes Classifier

Kaninson Joshua R, Sri Haran K, Sudalai Muthu Selva Kumar S.S, Dr. D. Jemi Florinabel *
Dr. Sivanthi Aditanar College of Engineering, Tiruchendur

I) Abstract:
Every second, our modern world produces terabytes of data. In this project, we have developed a program that classifies reviews of a particular service based on its customers' tweets. The project helps service providers analyze overall customer opinion of that service and use the results to improve service quality. Extending the project to a wider range of products and services would lead to a better understanding between service providers and customers. On Twitter, customers of airline services can tweet their opinions about their travel experiences, so Twitter holds a massive amount of data and information about airline services. These tweets are collected and their sentiments explored to track customer satisfaction. This project aims to analyze the Twitter airline dataset to find the overall positive, negative, and neutral tweets using the Naive Bayes algorithm.

II) Introduction
A simple solution would be to fix a set of words that humans consider positive or negative and let the program count the number of positive and negative words in each tweet. The drawback is that such a word set is limited, may not work in the particular domain, and may omit keywords that carry no obvious emotion. The accuracy of that approach would also be very low. A Naive Bayes classifier, however, can be trained on the domain, so it performs better than the word-counting approach. Naive Bayes assigns a weight to each word; hence it can give results with better classification accuracy.

III) Methodology
1. Dataset Collection
We use the Kaggle dataset for this project. We used the columns text and airline_sentiment to classify the tweets.
Percentage of each class of tweets:
1. positive - 19.2%
2. neutral - 21.2%
3. negative - 62.6%

Fig 1.0: Dataset Analysis

2. Bayes Algorithm:
The formula below expresses the Bayes algorithm:
p(A|B) = p(B|A) p(A) / p(B)
1. p(A), p(B) - the probabilities of observing A and B respectively.
2. p(A|B) - the probability of event A occurring given that B is true.
3. p(B|A) - the probability of event B occurring given that A is true.

3. Naive Bayes Classifier Algorithm:
The Naive Bayes classifier is one of the simplest and most effective classification algorithms; it is a fast machine learning model that can make quick predictions. It is called "naive" because it assumes that the occurrence of a particular feature is independent of the occurrence of other features, and "Bayes" because it depends on the Bayes algorithm.
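
A classifier of this kind can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions, not the program evaluated in this paper: the function names (tokenize, train, predict), the whitespace tokenizer, and the toy tweets are all hypothetical. It also adds Laplace (add-one) smoothing and sums log-probabilities instead of multiplying raw probabilities; both are assumptions made here to keep the arithmetic well behaved, and the log form ranks the classes the same way as the product.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase the tweet and split on whitespace.
    return text.lower().split()


def train(tweets, labels):
    """Count tweets per class and token occurrences per class."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for text, label in zip(tweets, labels):
        token_counts[label].update(tokenize(text))
    vocab = {tok for counts in token_counts.values() for tok in counts}
    # Prior of a class = tweets in that class / total tweets.
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    return priors, token_counts, vocab


def predict(text, priors, token_counts, vocab):
    """Return the class with the highest log posterior score."""
    scores = {}
    for c, prior in priors.items():
        total = sum(token_counts[c].values())
        score = math.log(prior)
        for tok in tokenize(text):
            # Laplace (add-one) smoothing is an assumption of this sketch:
            # it stops an unseen token from zeroing the whole product.
            p = (token_counts[c][tok] + 1) / (total + len(vocab))
            score += math.log(p)
        scores[c] = score
    return max(scores, key=scores.get)


# Toy data invented for illustration; not taken from the Kaggle dataset.
tweets = [
    "great flight and friendly crew",
    "flight delayed and luggage lost",
    "what time does boarding start",
]
labels = ["positive", "negative", "neutral"]
model = train(tweets, labels)
print(predict("luggage lost again", *model))
```

On this toy data the call prints "negative", because "luggage" and "lost" occur only in the negative training tweet. With a real dataset, the text and airline_sentiment columns would take the place of the tweets and labels lists.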

4. Working:
The Naive Bayes classifier is used here for text classification. First, the algorithm finds the total number of positive, negative, and neutral tweets. Then it finds the prior probability of each class by dividing the number of tweets in that class by the total number of tweets. Then it separates every tweet into tokens and finds the number of occurrences of these tokens in positive, negative, and neutral tweets. Testing involves the following steps. First, split the words of each tweet into tokens. Then find the probability of being positive for each token and multiply it by the prior probability of the positive class. Then find the probability of being negative for each token and multiply it by the prior probability of the negative class. Then find the probability of being neutral for each token and multiply it by the prior probability of the neutral class. Finally, compare the probabilities of the sentiment classes and return the sentiment with the highest probability.

VIII) Performance Measures:
1. Sample Input/Output:

Fig 1.1: Sample Input/Output

For benchmarking, we compared the program with another program written using the nltk library. The output of the program with the library function is as follows:

Fig 1.2: Result Analysis
1. The program with the library function had an accuracy of 82.0% and took 8 minutes.
2. The proposed program has 2.5% less accuracy and is 320X faster than the benchmarking program.

IX) Conclusion:
The program can be implemented with a better tokenizing algorithm to increase its accuracy slightly. If we train the classifier with a balanced dataset, or use a data-balancing algorithm to balance the dataset, the accuracy is expected to increase.

X) References:
1. https://www.hindawi.com/journals/misy/2019/1790429/
2. Berrar, D., 2018. Bayes' theorem and naive Bayes classifier. Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics; Elsevier Science Publisher: Amsterdam, The Netherlands, pp. 403-412.
3. Yang, Feng-Jen. "An implementation of naive Bayes classifier." In 2018 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 301-306. IEEE, 2018.
4. Keogh, Eamonn. "Naive Bayes classifier." Accessed: Nov 5 (2006): 2017.
