Supervised ML & Sentiment Analysis
CONTENTS
● Review Supervised ML
● Build your own tweet classifier!
TABLE OF CONTENTS
01 Supervised ML: Training
02 Sentiment Analysis: Use of ML in SA
03 Feature Extraction: Extract Features
04 Logistic Regression: Apply LR for SA
01
SUPERVISED ML
Training
Supervised ML (training)
[Diagram: Features and Parameters feed a Prediction Function that produces an Output; the Cost function compares the Output vs. the Label, and the Parameters are updated to reduce the cost.]
02
SENTIMENT ANALYSIS
Use of ML in SA
Sentiment analysis
Tweet: "I am happy because I am learning NLP"
Positive: 1
Negative: 0
Logistic regression
Sentiment analysis
"I am happy because I am learning NLP" → Train LR → Classify → Positive: 1
Summary
● Features, Labels → Train → Predict
● Extract features → Train LR → Predict sentiment
03
FEATURE EXTRACTION
Extract Features
Outline
● Vocabulary
● Feature extraction
● Sparse representations and some of their issues
Vocabulary
Tweets: [tweet_1, tweet_2, ..., tweet_m]
  "I am happy because I am learning NLP"
  ...
  "I hated the movie"
Vocabulary V: [I, am, happy, because, learning, NLP, ..., hated, the, movie]
Feature extraction
Tweet: "I am happy because I am learning NLP"
Vocabulary: [I, am, happy, because, learning, NLP, ..., hated, the, movie]
Feature vector: [1, 1, 1, 1, 1, 1, ..., 0, 0, 0]
A lot of zeros! That's a sparse representation.
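A minimal Python sketch of this sparse representation (the toy vocabulary is made up for illustration): one entry per vocabulary word, 1 if the word appears in the tweet, 0 otherwise.

vocabulary = ["I", "am", "happy", "because", "learning", "NLP", "hated", "the", "movie"]
tweet = "I am happy because I am learning NLP"
words = tweet.split()
sparse_vector = [1 if word in words else 0 for word in vocabulary]
print(sparse_vector)  # [1, 1, 1, 1, 1, 1, 0, 0, 0] -- mostly zeros once the vocabulary is large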
Problems with sparse representations
Tweet: "I am happy because I am learning NLP"
Feature vector: [1, 1, 1, 1, 1, 1, ..., 0, ..., 0, 0, 0]  (all zeros after the first few entries!)
With a vocabulary of size V, the model has to learn V + 1 parameters, which means:
1. Large training time
2. Large prediction time
Summary
● Vocabulary: set of unique words
● Vocabulary + text → feature vector [1 ... 0 ... 1 ... 0 ... 1 ... 0]
● Sparse representations are problematic for training and prediction times
Negative and Positive Frequencies
Outline
● Populate your vocabulary with a frequency count for each class
Positive and negative counts
Corpus:
  I am happy because I am learning NLP
  I am happy
  I am sad, I am not learning NLP
  I am sad
Vocabulary: [I, am, happy, because, learning, NLP, sad, not]
Positive and negative counts
Positive tweets:
  I am happy because I am learning NLP
  I am happy
Negative tweets:
  I am sad, I am not learning NLP
  I am sad
Positive and negative counts
Positive tweets:
  I am happy because I am learning NLP
  I am happy

Vocabulary   PosFreq (1)
I            3
am           3
happy        2
because      1
learning     1
NLP          1
sad          0
not          0
Positive and negative counts
Negative tweets:
  I am sad, I am not learning NLP
  I am sad

Vocabulary   NegFreq (0)
I            3
am           3
happy        0
because      0
learning     1
NLP          1
sad          2
not          1
Word frequency in classes
freqs: dictionary mapping from (word, class) to frequency

Vocabulary   PosFreq (1)   NegFreq (0)
I            3             3
am           3             3
happy        2             0
because      1             0
learning     1             1
NLP          1             1
sad          0             2
not          0             1
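A rough sketch of how such a freqs dictionary could be built (the build_freqs helper used later may differ in details, e.g. it would preprocess the tweets first):

def build_freqs(tweets, labels):
    freqs = {}
    for tweet, label in zip(tweets, labels):
        for word in tweet.split():            # real code would preprocess first
            pair = (word, label)              # key: (word, class)
            freqs[pair] = freqs.get(pair, 0) + 1
    return freqs

tweets = ["I am happy because I am learning NLP", "I am happy",
          "I am sad, I am not learning NLP", "I am sad"]
labels = [1, 1, 0, 0]
freqs = build_freqs(tweets, labels)
print(freqs[("happy", 1)])  # 2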
Summary
● Divide tweet corpus into two classes: positive and negative
● Count each time each word appears in either class
➔ Feature extraction for training and prediction!
Feature extraction with frequencies
Outline
● Extract features from your frequencies dictionary to create a features vector
Word frequency in classes
freqs: dictionary mapping from (word, class) to frequency

Vocabulary   PosFreq (1)   NegFreq (0)
I            3             3
am           3             3
happy        2             0
because      1             0
learning     1             1
NLP          1             1
sad          0             2
not          0             1
Feature extraction
freqs: dictionary mapping from (word, class) to frequency
Features of tweet m: x_m = [1 (bias), sum of positive frequencies of its words, sum of negative frequencies of its words]
Feature extraction
Tweet: "I am sad, I am not learning NLP"

Vocabulary   PosFreq (1)
I            3
am           3
happy        2
because      1
learning     1
NLP          1
sad          0
not          0

Sum of positive frequencies for the words in the tweet: 3 + 3 + 1 + 1 + 0 + 0 = 8
Feature extraction
Tweet: "I am sad, I am not learning NLP"

Vocabulary   NegFreq (0)
I            3
am           3
happy        0
because      0
learning     1
NLP          1
sad          2
not          1

Sum of negative frequencies for the words in the tweet: 3 + 3 + 1 + 1 + 2 + 1 = 11
Feature extraction
Tweet: "I am sad, I am not learning NLP" → x = [1, 8, 11]  (bias, positive sum, negative sum)
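A hedged sketch of this feature extraction step: [bias, sum of positive frequencies, sum of negative frequencies] over the unique words of one tweet. The freqs values below are copied from the tables above; the actual extract_features helper may differ in details.

freqs = {("I", 1): 3, ("am", 1): 3, ("happy", 1): 2, ("because", 1): 1,
         ("learning", 1): 1, ("NLP", 1): 1,
         ("I", 0): 3, ("am", 0): 3, ("learning", 0): 1, ("NLP", 0): 1,
         ("sad", 0): 2, ("not", 0): 1}

def extract_features(words, freqs):
    pos_sum = sum(freqs.get((w, 1), 0) for w in set(words))  # sum of positive frequencies
    neg_sum = sum(freqs.get((w, 0), 0) for w in set(words))  # sum of negative frequencies
    return [1, pos_sum, neg_sum]                              # 1 is the bias term

words = ["I", "am", "sad", "I", "am", "not", "learning", "NLP"]
print(extract_features(words, freqs))  # [1, 8, 11]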
Summary
● Dictionary mapping (word, class) to frequencies
➔ Cleaning unimportant information from your tweets
Preprocessing
Outline
● Removing stopwords, punctuation, handles and URLs
● Stemming
● Lowercasing
Preprocessing: stop words and punctuation
Tweet: "@Class and @Bishal are tuning a GREAT AI model at https://geeksforgeeks.com!!!"
Stop words: [and, is, are, at, has, for, a]
Punctuation: [ , . : ! " ' ]
Preprocessing: stop words and punctuation
"@Class and @Bishal are tuning a GREAT AI model at https://geeksforgeeks.com!!!"
→ remove stop words →
"@Class @Bishal tuning GREAT AI model https://geeksforgeeks.com!!!"
Preprocessing: stop words and punctuation
"@Class @Bishal tuning GREAT AI model https://geeksforgeeks.com!!!"
→ remove punctuation →
"@Class @Bishal tuning GREAT AI model https://geeksforgeeks.com"
Preprocessing: Handles and URLs
"@Class @Bishal tuning GREAT AI model https://geeksforgeeks.com"
→ remove handles and URLs →
"tuning GREAT AI model"
Preprocessing: Stemming and lowercasing
"tuning GREAT AI model"
Stemming: tune, tuned, tuning → tun
Lowercasing: GREAT, Great, great → great
Preprocessed tweet: [tun, great, ai, model]
Summary
● Stop words, punctuation, handles and URLs
● Stemming
● Lowercasing
● Less unnecessary info → better times
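A rough sketch of this preprocessing pipeline, assuming NLTK is installed and its stopwords corpus has been downloaded (nltk.download('stopwords')); the exact process_tweet helper used later may differ.

import re
import string
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

def process_tweet(tweet):
    tweet = re.sub(r'https?://\S+', '', tweet)        # remove URLs
    tokenizer = TweetTokenizer(preserve_case=False,   # lowercase
                               strip_handles=True)    # remove @handles
    tokens = tokenizer.tokenize(tweet)
    stemmer = PorterStemmer()
    stop = set(stopwords.words('english'))
    return [stemmer.stem(t) for t in tokens           # stem each remaining token
            if t not in stop and t not in string.punctuation]

print(process_tweet("@Class and @Bishal are tuning a GREAT AI model at https://geeksforgeeks.com!!!"))
# roughly ['tune', 'great', 'ai', 'model']; exact stems depend on the stemmer (the slide shows 'tun')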
Putting it all together
Outline
● Generalize the process
● How to code it!
General overview
"I am Happy Because i am learning NLP @deeplearning"
→ Preprocessing → [happy, learn, nlp]
→ Feature Extraction → [1, 4, 2]  (bias, sum of positive frequencies, sum of negative frequencies)
General overview
"I am Happy Because i am learning NLP @geeksforgeeks" → [happy, learn, nlp] → [1, 40, 20]
"I am sad not learning NLP" → [sad, not, learn, nlp] → [1, 20, 50]
... → ... → ...
"I am sad :(" → [sad] → [1, 5, 35]
General overview
X = [[1, 40, 20],
     [1, 20, 50],
     ...,
     [1, 5, 35]]   (one row of features per tweet)
General Implementation

import numpy as np

freqs = build_freqs(tweets, labels)             # Build frequencies dictionary
X = np.zeros((m, 3))                            # Initialize matrix X: m tweets, 3 features each
for i in range(m):                              # For every tweet
    p_tweet = process_tweet(tweets[i])          # Process tweet
    X[i, :] = extract_features(p_tweet, freqs)  # Extract features
Summary
● Implement the feature extraction algorithm for your entire set of tweets
● Almost ready to train!
04
LOGISTIC REGRESSION
Plug the feature vector into the LR model
Outline
● Supervised learning and logistic regression
● Sigmoid function
Overview of logistic regression
[Diagram: Features and Parameters feed the prediction function F (the sigmoid), producing an Output; the Cost compares the Output vs. the Label, and the Parameters are updated.]
Overview of logistic regression
The prediction function is the sigmoid: h(x, θ) = 1 / (1 + e^(-θᵀx)). It maps any real-valued score θᵀx to a value between 0 and 1, interpreted as the probability that the tweet is positive.
Example: "@Class and @Bishal are tuning a GREAT AI model at https://geeksforgeeks.com!!!"
→ preprocessed: [tun, ai, great, model]
→ θᵀx = 4.92, so h(x, θ) ≈ 0.99: the tweet is classified as positive.
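A small sketch of the sigmoid prediction step, assuming a 3-element feature vector [bias, pos_sum, neg_sum] and a parameter vector theta (the values below are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))          # squashes any score into (0, 1)

def predict(x, theta):
    return sigmoid(np.dot(x, theta))     # h(x, theta)

theta = np.array([3e-4, 1.5e-4, -1e-4])  # illustrative, not the trained values
x = np.array([1, 3476, 245])             # illustrative feature vector
h = predict(x, theta)
print(h, "positive" if h >= 0.5 else "negative")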
Summary
● Sigmoid function: h(x, θ) = 1 / (1 + e^(-θᵀx))
● h(x, θ) ≥ 0.5 → positive
● h(x, θ) < 0.5 → negative
Logistic Regression: Training
Outline
● Review the steps in the training process
● Overview of gradient descent
Training LR
[Plot: cost decreasing over training iterations]
Training LR
1. Initialize parameters θ
2. Classify/predict with the current θ
3. Get gradient and update θ
Repeat steps 2 and 3 until good enough (the cost stops improving).
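A hedged sketch of this gradient-descent loop, assuming a feature matrix X of shape (m, 3) and a label column vector y of shape (m, 1); the actual implementation may differ in details such as the learning rate.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent(X, y, alpha=1e-9, num_iters=1000):
    m = X.shape[0]
    theta = np.zeros((X.shape[1], 1))           # 1. initialize parameters
    for _ in range(num_iters):                   # repeat until good enough
        h = sigmoid(X @ theta)                   # 2. classify/predict
        grad = (X.T @ (h - y)) / m               # 3. get gradient
        theta = theta - alpha * grad             # 4. update parameters
    cost = float(-(y.T @ np.log(h) + (1 - y).T @ np.log(1 - h)) / m)
    return theta, cost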
Summary
● Visualize how gradient descent works
● Use gradient descent to train your logistic regression classifier
➔ Compute the accuracy of your model
Logistic Regression: Testing
Outline
● Using your validation set to compute model accuracy
● What the accuracy metric means
Testing logistic regression
● Use the held-out validation set X_val, Y_val and the trained parameters θ
● Predict: pred = (h(X_val, θ) ≥ 0.5), i.e. 1 if the sigmoid output is at least 0.5, else 0
● Accuracy: the fraction of validation examples for which pred equals Y_val
Summary
● Performance on unseen data
● Accuracy
To improve the model: step size, number of iterations, regularization, new features, etc.
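A sketch of the accuracy computation on the validation set, assuming X_val, Y_val and a trained theta in the same shapes used in the training sketch above:

import numpy as np

def test_logistic_regression(X_val, Y_val, theta):
    h = 1 / (1 + np.exp(-(X_val @ theta)))   # sigmoid predictions in (0, 1)
    preds = (h >= 0.5).astype(int)            # threshold at 0.5
    return np.mean(preds == Y_val)            # fraction of correct predictions = accuracy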
Logistic Regression: Cost Function
Outline
● Overview of the logistic cost function, AKA the binary cross-entropy function
Cost function for logistic regression
J(θ) = -(1/m) Σ [ y log h(x, θ) + (1 - y) log(1 - h(x, θ)) ], summed over the m training examples.
The first term, y log h, is active for y = 1 (positive class); the second term, (1 - y) log(1 - h), is active for y = 0 (negative class).
Cost function for logistic regression
Behaviour of the first term, y log h(x, θ):

y   h(x, θ)   y log h(x, θ)
0   any       0
1   0.99      ~0
1   ~0        -inf
Cost function for logistic regression
Behaviour of the second term, (1 - y) log(1 - h(x, θ)):

y   h(x, θ)   (1 - y) log(1 - h(x, θ))
1   any       0
0   0.01      ~0
0   ~1        -inf
Summary
● Strong disagreement = high cost
● Strong agreement = low cost
● Aim for the lowest cost!
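A sketch of the binary cross-entropy cost in NumPy, matching the formula above; h and y are assumed to be arrays of predictions and labels with the same shape.

import numpy as np

def binary_cross_entropy(h, y, eps=1e-15):
    h = np.clip(h, eps, 1 - eps)              # avoid log(0)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

print(binary_cross_entropy(np.array([0.99, 0.01]), np.array([1, 0])))  # strong agreement, low cost
print(binary_cross_entropy(np.array([0.01, 0.99]), np.array([1, 0])))  # strong disagreement, high cost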
ADDITIONAL INFORMATION
Tokenizing: Breaks text into meaningful units (tokens).
Example: "This is an example" → ["This", "is", "an", "example"]

Encoding: Converts tokens into numerical representations (like integer IDs).
Example: ["This", "is", "an", "example"] → [1, 2, 3, 4] (each word is mapped to a unique integer).

Embedding: Maps encoded tokens into continuous vector spaces that capture semantic meanings.
Example: [1, 2, 3, 4] → [[0.1, 0.2, ...], [0.3, 0.1, ...], ...] (each integer is mapped to a vector of real numbers).
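A toy sketch of these three steps; the vocabulary and the embedding values are made up for illustration.

import numpy as np

text = "This is an example"
tokens = text.split()                                    # tokenizing
vocab = {"This": 1, "is": 2, "an": 3, "example": 4}
ids = [vocab[t] for t in tokens]                         # encoding: [1, 2, 3, 4]

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab) + 1, 8))   # one 8-dimensional vector per ID
embeddings = embedding_table[ids]                        # embedding: shape (4, 8)
print(ids, embeddings.shape)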
THANK YOU!!!