Assignment No 4 - KNN Twitter

This document summarizes sentiment analysis and how it works using a KNN algorithm. Sentiment analysis detects positive and negative sentiment in text. It can analyze customer feedback to understand satisfaction. KNN is an algorithm that classifies new data based on similarity to existing classified data. It works by selecting K neighbors, calculating distances, and assigning the new data to the category of its K closest neighbors. The document implements KNN for sentiment analysis on tweets, preprocessing data, extracting features, performing KNN classification, and evaluating performance with metrics like accuracy and F1 score.

Uploaded by

Vaishnavi Gurav


Honours* in Data Science
Fourth Year of Engineering (Semester VII)
410502: Machine Learning and Data Science Laboratory
Dr. Girija Gireesh Chiddarwar

Assignment No 4 - Text classification for sentiment analysis using KNN
Note: Use Twitter data

Sentiment Analysis
Sentiment analysis is the process of detecting positive or negative sentiment in text.
It’s often used by businesses to detect sentiment in social data, gauge brand
reputation, and understand customers.
Sentiment analysis models focus on polarity (positive, negative, neutral) but also
on feelings and emotions (angry, happy, sad, etc.), urgency (urgent, not urgent) and
even intentions (interested vs. not interested).
Depending on how you want to interpret customer feedback and queries, you can
define and tailor your categories to meet your sentiment analysis needs.
Automatically analyzing customer feedback, such as opinions in survey responses and
social media conversations, allows brands to learn what makes customers happy or
frustrated, so that they can tailor products and services to meet their customers’
needs.
For example, using sentiment analysis to automatically analyze 4,000+ reviews about
your product could help you discover if customers are happy about your pricing
plans and customer service.

It’s estimated that 90% of the world’s data is unstructured; in other words, it’s
unorganized. Huge volumes of unstructured business data are created every day
(emails, support tickets, chats, social media conversations, surveys, articles,
documents, etc.).

How Does Sentiment Analysis Work?

Sentiment analysis systems generally take one of three approaches:
Rule-based: systems perform sentiment analysis using a set of manually crafted
rules.
Automatic: systems rely on machine learning techniques to learn from data.
Hybrid: systems combine both rule-based and automatic approaches.
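
As a rough illustration of the rule-based approach, the sketch below labels a text by counting words from two small word lists. The lists and the function name rule_based_sentiment are hypothetical and chosen only for this example; a real rule-based system would use a much larger lexicon and handle cases such as negation.

```python
# Hypothetical, tiny word lists used only for illustration.
POSITIVE_WORDS = {"good", "great", "happy", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "sad", "hate", "awful"}


def rule_based_sentiment(text):
    """Label a text by comparing counts of positive and negative words."""
    tokens = text.lower().split()
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    score = positives - negatives
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(rule_based_sentiment("I love this great phone"))
```

An automatic system would replace the hand-written word lists with a classifier learned from labeled examples, which is the route the rest of this assignment takes.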

Automatic Approaches
Automatic methods, contrary to rule-based systems, don't rely on manually crafted
rules, but on machine learning techniques.
A sentiment analysis task is usually modeled as a classification problem, whereby a
classifier is fed a text and returns a category, e.g. positive, negative, or
neutral.

Figure: Working of Sentiment Analysis

K-Nearest Neighbor (KNN) Algorithm for Machine Learning

K-Nearest Neighbor is one of the simplest machine learning algorithms and is based
on the supervised learning technique.
The K-NN algorithm assumes that similar cases lie close together: it compares a new
case with the available cases and puts it into the category whose members it most
resembles.
The K-NN algorithm stores all the available data and classifies a new data point
based on similarity. When new data appears, it can therefore be assigned to the
best-suited category.
K-NN can be used for regression as well as classification, but it is mostly used
for classification problems.
K-NN is a non-parametric algorithm, which means it makes no assumption about the
distribution of the underlying data.
It is also called a lazy learner because it does not build a model from the
training set in advance; instead, it stores the dataset and does its computation at
classification time.
In other words, the training phase only stores the dataset; when new data arrives,
K-NN assigns it to the category of the stored examples most similar to it.

Working of K-NN
Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new data point to each point in
the training data.
Step-3: Take the K nearest neighbors as per the calculated Euclidean distances.
Step-4: Among these K neighbors, count the number of data points in each
category.
Step-5: Assign the new data point to the category with the largest number of
neighbors.
Step-6: Our model is ready.
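
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the assignment's actual code: the function names euclidean_distance and knn_predict and the default k=3 are assumptions made for the example.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    # Step 2: distance from the new point to every stored training point.
    distances = [
        (euclidean_distance(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 3: keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: count how many neighbors fall in each category.
    votes = Counter(label for _, label in nearest)
    # Step 5: return the category with the most votes.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 7.0)]
    labels = ["negative", "negative", "negative", "positive", "positive", "positive"]
    print(knn_predict(points, labels, (1.2, 1.4), k=3))
```

An odd K is a common choice for two-class problems because it avoids tied votes; with more classes ties can still occur, and this sketch resolves them in favor of the label met first among the nearest neighbors.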

Implementation Algorithm
Storing the training and test datasets in their respective dataframes.
Preprocessing
  Parsing the stop_words.txt file and storing all the words in a list.
  Preparing a list of all special characters that are to be removed.
  For both the training and testing data:
    Removing all stopwords from all the tweets.
    Removing hyperlinks from all the tweets, since they are not needed for
    classification.
    Removing usernames from all the tweets.
    Removing hashtags, including their text, from all the tweets. Hashtags are
    treated as unhelpful because their run-together words cannot be split on spaces.
    Removing all special characters from all the tweets.
  Finding all the unique words in the Tweet column of the training and testing data.
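
The preprocessing steps might look like the sketch below. It is a hedged illustration rather than the assignment's code: STOP_WORDS is a hypothetical stand-in for the contents of stop_words.txt, the regular expressions are assumptions about what counts as a hyperlink, username, or hashtag, and the steps are applied in a slightly different order (links, usernames, and hashtags first, stopwords last) so that stripping punctuation does not break the earlier patterns.

```python
import re

# Hypothetical stand-in for the words parsed from stop_words.txt.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "at"}


def preprocess_tweet(tweet, stop_words=STOP_WORDS):
    """Return the cleaned, lowercased tokens of one tweet."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # hyperlinks
    text = re.sub(r"@\w+", " ", text)                   # usernames
    text = re.sub(r"#\w+", " ", text)                   # hashtags, including their text
    text = re.sub(r"[^a-z\s]", " ", text)               # special characters and digits
    return [word for word in text.split() if word not in stop_words]


def build_vocabulary(token_lists):
    """Collect the unique words across all tokenized tweets, sorted for stable order."""
    return sorted({word for tokens in token_lists for word in tokens})


if __name__ == "__main__":
    print(preprocess_tweet("@united the flight is LATE again!! http://t.co/abc #fail"))
```

Sorting the vocabulary keeps the column order of the later feature matrix reproducible between runs.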
Feature Extraction
  For both the training and testing data: extracting features and storing them in
  the training and testing feature matrices.
  Calculating the distance between every test instance and every training instance.
  This returns a 2D distance matrix, with one row per test instance and one column
  per training instance.
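
The source does not say which features are extracted, so the sketch below assumes simple bag-of-words counts over the vocabulary of unique words, followed by Euclidean distances. The function names bag_of_words and distance_matrix are hypothetical, and math.dist requires Python 3.8 or later.

```python
import math


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one tokenized tweet."""
    return [tokens.count(word) for word in vocabulary]


def distance_matrix(test_features, train_features):
    """Euclidean distances: one row per test tweet, one column per training tweet."""
    return [
        [math.dist(test_row, train_row) for train_row in train_features]
        for test_row in test_features
    ]


if __name__ == "__main__":
    vocabulary = ["bad", "flight", "good"]
    train = [bag_of_words(t, vocabulary) for t in (["good", "flight"], ["bad", "flight"])]
    test = [bag_of_words(["good", "good"], vocabulary)]
    print(distance_matrix(test, train))
```

Each row of the result can then be sorted to find the K closest training tweets for that test tweet.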
K Nearest Neighbors & Performance Measures by plotting graphs
  Making a general structure of our confusion matrix.
  Extracting values from the Frequency DataFrame and assigning them to specific
  cells in the confusion matrix.
  Extracting the per-class recall and precision values from the matrix to compute
  the macro-averaged F1 score, recall and precision.
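
A confusion matrix and the macro-averaged scores can be computed as sketched below. This is an illustration under stated assumptions: it works from plain lists rather than the Frequency DataFrame mentioned above, the function names are hypothetical, and macro F1 is taken as the mean of the per-class F1 scores (one common definition among several).

```python
def confusion_matrix(actual, predicted, labels):
    """matrix[i][j] counts items whose actual label is labels[i] and predicted label is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def macro_scores(matrix):
    """Return macro-averaged (precision, recall, F1) from a square confusion matrix."""
    n = len(matrix)
    precisions, recalls, f1_scores = [], [], []
    for i in range(n):
        true_positive = matrix[i][i]
        actual_total = sum(matrix[i])                          # row sum
        predicted_total = sum(matrix[r][i] for r in range(n))  # column sum
        precision = true_positive / predicted_total if predicted_total else 0.0
        recall = true_positive / actual_total if actual_total else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return sum(precisions) / n, sum(recalls) / n, sum(f1_scores) / n


if __name__ == "__main__":
    labels = ["negative", "neutral", "positive"]
    actual = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
    predicted = ["positive", "negative", "negative", "negative", "neutral", "positive"]
    print(macro_scores(confusion_matrix(actual, predicted, labels)))
```

Macro averaging gives every class equal weight regardless of how many tweets it contains, which matters when one sentiment dominates the dataset.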

Performance Evaluation
Confusion matrix

Accuracy: 99.9%

F1 Score: The F1 score is useful when you want a balance between precision and
recall.
Accuracy can be inflated by a large number of true negatives, which in most
business circumstances are not the focus, whereas false negatives and false
positives usually carry business costs (tangible and intangible).
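
The gap between accuracy and F1 is easiest to see with numbers. The counts below are made up for illustration and are not results from this assignment: out of 1,000 tweets only 10 are truly positive, and the classifier finds just 2 of them.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) for a binary problem from its four confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Made-up counts: 989 true negatives dominate the total.
    accuracy, f1 = accuracy_and_f1(tp=2, fp=1, fn=8, tn=989)
    print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
```

Here accuracy is 0.991 while F1 is only about 0.308, because the true negatives raise accuracy but do not enter precision or recall at all.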