Naive Bayes Sentiment Analysis (Twitter) - Step-by-Step
Teaching Guide
A clear, beginner-friendly walkthrough with toy example, formulas, code, common pitfalls and improvement
ideas.
Prepared for: You (the teacher) — use this as a PDF handout or lecture notes
Author: Generated by ChatGPT (assisted)
Date: 2025-08-11
Table of Contents
1. Project overview and goal
2. The twitter_samples dataset
3. Train/test splitting (and shuffle)
4. Preprocessing (process_tweet) — full walkthrough
5. Building the frequency dictionary (count_tweets)
6. Helper functions for frequencies
7. Training Naive Bayes (train_naive_bayes) — step-by-step
8. Log prior and Log likelihood — formulas & code mapping
9. Predicting sentiment (naive_bayes_predict)
10. Testing & evaluation (test_naive_bayes + metrics)
11. Toy example — full numeric run
12. Edge cases, limitations & improvements
13. Interview Q&A (expected questions + answers)
14. Teaching tips & demo ideas
15. Appendix: Full code (cleaned)
16. References
1. Project overview and goal
Goal: Build a simple classifier that reads a tweet and predicts whether its sentiment is positive or negative.
We use the Naive Bayes algorithm because it is fast, interpretable, and effective for text classification.
2. The twitter_samples dataset
What it is:
NLTK provides a small labeled dataset called twitter_samples with files like 'positive_tweets.json' and
'negative_tweets.json'. Each contains ~5,000 tweets already labeled by sentiment. These are perfect for
learning and demos.
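The corpora must be downloaded once per environment before they can be loaded (a small setup step using NLTK's standard downloader):
import nltk
nltk.download('twitter_samples')   # 5,000 positive and 5,000 negative labeled tweets
nltk.download('stopwords')         # English stopword list used later in preprocessing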
3. Train/test splitting (and shuffle)
Why split?
We split data into training and testing so the model learns on one set and is evaluated on unseen data.
Important: always shuffle before splitting to avoid ordering bias (tweets in dataset may be grouped).
Example (python):
all_pos = twitter_samples.strings('positive_tweets.json')
all_neg = twitter_samples.strings('negative_tweets.json')
# shuffle before slicing
import random
import numpy as np
random.seed(42)
random.shuffle(all_pos)
random.shuffle(all_neg)
train_pos = all_pos[:4000]
test_pos = all_pos[4000:]
train_neg = all_neg[:4000]
test_neg = all_neg[4000:]
train_x = train_pos + train_neg
test_x = test_pos + test_neg
train_y = np.append(np.ones(len(train_pos)), np.zeros(len(train_neg)))
test_y = np.append(np.ones(len(test_pos)), np.zeros(len(test_neg)))
4. Preprocessing (process_tweet) — full walkthrough
Purpose:
Tweets are noisy. Preprocessing normalizes text so our model can focus on meaningful tokens. Steps
include removing links, mentions, hashtags (or the '#'), lowercasing, tokenizing, removing stopwords, and
stemming.
Detailed step explanations:
- Remove stock tickers: Regex r'\$\w*' removes tokens like $TSLA which add noise.
- Remove RT: Regex r'^RT[\s]+' removes old-style 'RT' markers.
- Remove hyperlinks: Regex r'https?:\/\/.*[\r\n]*' removes URLs.
- Hashtag handling: We remove the '#' but keep the word (e.g., #happy → happy).
- Tokenize: Use TweetTokenizer to keep emoticons and contractions sensible.
- Stopwords: Remove common words like 'the', 'is' which carry little sentiment.
- Punctuation: Remove punctuation tokens.
- Stemming: Reduce words to root form (PorterStemmer).
process_tweet example (python):
def process_tweet(tweet):
    stemmer = PorterStemmer()
    stopwords_english = stopwords.words('english')
    tweet = str(tweet)
    tweet = re.sub(r'\$\w*', '', tweet)
    tweet = re.sub(r'^RT[\s]+', '', tweet)
    tweet = re.sub(r'https?:\/\/.*[\r\n]*', '', tweet)
    tweet = re.sub(r'#', '', tweet)
    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
    tweet_tokens = tokenizer.tokenize(tweet)
    tweets_clean = []
    for w in tweet_tokens:
        if w not in stopwords_english and w not in string.punctuation:
            tweets_clean.append(stemmer.stem(w))
    return tweets_clean
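Quick check (python), using a made-up tweet; the exact tokens can vary slightly across NLTK versions:
sample = "RT @someone: I am LOVING this movie!!! #happy :) https://example.com"
print(process_tweet(sample))
# Typical output: ['love', 'movi', 'happi', ':)']
# URL, handle, 'RT', stopwords ('i', 'am', 'this') and punctuation are gone;
# 'LOVING' is lowercased and stemmed to 'love'; the emoticon ':)' is kept as one token.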
5. Building the frequency dictionary (count_tweets)
We store counts for (word, label) pairs. Example key: ('happy', 1) -> value: number of times 'happy'
appeared in positive tweets.
def count_tweets(result, tweets, ys):
    for y, tweet in zip(ys, tweets):
        for word in process_tweet(tweet):
            pair = (word, y)
            result[pair] = result.get(pair, 0) + 1
    return result

# After running: freqs = count_tweets({}, train_x, train_y)
# freqs might look like: {('happi', 1): 27, ('happi', 0): 3, ('sad', 0): 15, ...}
# (note that the words appear in their stemmed form, e.g. 'happy' -> 'happi')
Why this matters: Naive Bayes uses these counts to estimate P(word|class).
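To make the counting concrete, here is a tiny run on two made-up tweets (a sketch; the tokens shown
assume the process_tweet above, so words appear in stemmed form):
toy_tweets = ["I am happy", "I am sad"]
toy_labels = [1, 0]
print(count_tweets({}, toy_tweets, toy_labels))
# Expected: {('happi', 1): 1, ('sad', 0): 1}   ('i' and 'am' are removed as stopwords)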
6. Helper functions for frequencies
Purpose:
Small, reusable functions that fetch the positive and negative counts for a word, keeping the later
code simpler and cleaner.
def freq_pos_count(word, freqs):
    return freqs.get((word, 1), 0)

def freq_neg_count(word, freqs):
    return freqs.get((word, 0), 0)
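Usage (hypothetical counts, only to show the lookup):
freqs_demo = {('happi', 1): 27, ('happi', 0): 3}
print(freq_pos_count('happi', freqs_demo))   # 27
print(freq_neg_count('happi', freqs_demo))   # 3
print(freq_pos_count('unseen', freqs_demo))  # 0 -- missing pairs default to 0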
7. Training the Naive Bayes model (train_naive_bayes)
High-level idea:
Compute how often each word appears in positive vs negative tweets; turn those into probabilities. Also
compute the prior (how common positive vs negative tweets are).
Key variables computed in training:
- vocab (V): Set of unique words seen in training
- N_pos, N_neg: Total number of word occurrences in positive/negative tweets
- D_pos, D_neg: Number of positive and negative documents (tweets)
- logprior: log(D_pos) - log(D_neg)
- loglikelihood[word]: log(P(word|pos)) - log(P(word|neg)) for each word
Training code (python):
def train_naive_bayes(freqs, train_x, train_y):
    loglikelihood = {}
    vocab = set([k[0] for k in freqs.keys()])
    V = len(vocab)
    N_pos = N_neg = 0
    for pair in freqs.keys():
        if pair[1] > 0:
            N_pos += freqs[pair]
        else:
            N_neg += freqs[pair]
    D_pos = len(train_y[train_y == 1])
    D_neg = len(train_y[train_y == 0])
    logprior = np.log(D_pos) - np.log(D_neg)
    for word in vocab:
        freq_pos = freq_pos_count(word, freqs)
        freq_neg = freq_neg_count(word, freqs)
        p_w_pos = (freq_pos + 1) / (N_pos + V)  # Laplace smoothing
        p_w_neg = (freq_neg + 1) / (N_neg + V)
        loglikelihood[word] = np.log(p_w_pos) - np.log(p_w_neg)
    return logprior, loglikelihood
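After training, it helps to inspect a few extreme words (a small sketch; the exact words and scores
depend on the shuffle and NLTK version):
logprior, loglikelihood = train_naive_bayes(freqs, train_x, train_y)
print("logprior:", logprior)  # 0.0 here because the classes are balanced (4000 vs 4000)
# Rank words by how strongly they push a tweet toward negative or positive
ranked = sorted(loglikelihood.items(), key=lambda kv: kv[1])
print("most negative words:", ranked[:5])
print("most positive words:", ranked[-5:])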
8. Log prior & Log likelihood — formulas and relationship
Math (plain text):
- Bayes (ratio form): log P(Pos|tweet) - log P(Neg|tweet) = log P(Pos) - log P(Neg) + sum_i [ log P(w_i|Pos) - log P(w_i|Neg) ]
- logprior = log(D_pos) - log(D_neg)
- P(w|Pos) = (count(w,Pos) + 1) / (N_pos + V) # Laplace smoothing
- loglikelihood[word] = log(P(w|Pos)) - log(P(w|Neg))
- Final score = logprior + sum_{words} loglikelihood[word]
Interpretation: logprior is the baseline. Each word's loglikelihood nudges the score toward positive (if
positive value) or negative (if negative value).
9. Predicting sentiment (naive_bayes_predict)
Algorithm (simple):
1) Clean the tweet with process_tweet.
2) Start with p = logprior.
3) For each word in the cleaned tweet, if it appears in loglikelihood, add loglikelihood[word] to p.
4) If p > 0, predict Positive; otherwise predict Negative.
def naive_bayes_predict(tweet, logprior, loglikelihood):
    word_l = process_tweet(tweet)
    p = logprior
    for word in word_l:
        if word in loglikelihood:
            p += loglikelihood[word]
    return p  # sign determines class: > 0 -> positive
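Usage (the scores shown are illustrative; actual values depend on the trained model):
print(naive_bayes_predict("I am happy because I am learning :)", logprior, loglikelihood))
# a positive score (> 0) -> predicted Positive
print(naive_bayes_predict("this movie was terrible", logprior, loglikelihood))
# a negative score (< 0) -> predicted Negative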
10. Testing & evaluation
In code you loop over the test set, predict labels and compare to true labels. Use metrics:
- Accuracy: Correct predictions / total
- Confusion matrix: Counts of TP, FP, FN, TN
- Precision: TP / (TP + FP) - how many predicted positive were actually positive
- Recall: TP / (TP + FN) - how many actual positives were found
- F1-score: Harmonic mean of precision and recall
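The example below calls a helper named test_naive_bayes that is not defined earlier; a minimal sketch
consistent with this pipeline looks like this (it returns the predicted labels so the sklearn metrics can
use them):
def test_naive_bayes(test_x, test_y, logprior, loglikelihood):
    # Predict 1 (positive) when the score is positive, otherwise 0 (negative)
    y_hats = [1 if naive_bayes_predict(tweet, logprior, loglikelihood) > 0 else 0
              for tweet in test_x]
    accuracy = np.mean(np.array(y_hats) == np.array(test_y))
    print("Naive Bayes accuracy:", accuracy)
    return y_hats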
Example (python):
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix
y_hats = test_naive_bayes(test_x, test_y, logprior, loglikelihood)
print("Accuracy:", accuracy_score(test_y, y_hats))
print("Confusion matrix:\n", confusion_matrix(test_y, y_hats))
print("P/R/F1:", precision_recall_fscore_support(test_y, y_hats, average='binary'))
11. Toy example — full numeric run
Small dataset (3 pos, 3 neg) and step-by-step calculations:
Training tweets:
Pos: "I love this", "This is great", "I am happy"
Neg: "I hate this", "This is bad", "I am sad"
Vocabulary (unique words): i, love, this, is, great, am, happy, hate, bad, sad
V = 10
Positive word counts (sum): N_pos = 9
Negative word counts (sum): N_neg = 9
D_pos = 3, D_neg = 3
logprior = log(3) - log(3) = 0
Laplace smoothing: p(w|pos) = (count_pos(w)+1) / (N_pos + V)
Example for 'love':
count_pos('love') = 1, count_neg('love') = 0
p(love|pos) = (1+1)/(9 + 10) = 2/19
p(love|neg) = (0+1)/19 = 1/19
loglikelihood['love'] = log(2) ≈ +0.693
Predict tweet: "I love this"
score = logprior + loglikelihood['i'] + loglikelihood['love'] + loglikelihood['this']
Since 'i' and 'this' appear equally often in positive and negative tweets, loglikelihood['i'] = loglikelihood['this'] = 0, so
score ≈ 0 + 0 + 0.693 + 0 = 0.693 -> Positive
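The same numbers can be verified in a few lines (a sketch that uses plain lowercase splitting instead of
process_tweet, matching the hand counts above):
import numpy as np
pos = ["I love this", "This is great", "I am happy"]
neg = ["I hate this", "This is bad", "I am sad"]
freqs = {}
for label, tweets in [(1, pos), (0, neg)]:
    for t in tweets:
        for w in t.lower().split():
            freqs[(w, label)] = freqs.get((w, label), 0) + 1
V = len(set(w for w, _ in freqs))                        # 10
N_pos = sum(v for (w, l), v in freqs.items() if l == 1)  # 9
N_neg = sum(v for (w, l), v in freqs.items() if l == 0)  # 9
p_love_pos = (freqs.get(('love', 1), 0) + 1) / (N_pos + V)  # 2/19
p_love_neg = (freqs.get(('love', 0), 0) + 1) / (N_neg + V)  # 1/19
print(np.log(p_love_pos) - np.log(p_love_neg))              # ~0.693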
12. Edge cases, limitations & ways to improve
- Negation handling: Use n-grams or a small rule-based negation flip so 'not good' becomes a separate
feature (see the sketch after this list).
- N-grams: Include bi-grams/trigrams to capture short phrases.
- TF-IDF: Use TF-IDF weighting instead of raw counts for less frequent but informative words.
- Word embeddings: Use pretrained Word2Vec/GloVe/BERT for semantic understanding.
- Model upgrade: Try Logistic Regression, SVM, or Transformer-based models for better accuracy.
- Cross-validation: Use k-fold cross-validation instead of single split to get robust performance estimates.
- More metrics: Check precision, recall, F1 and confusion matrix, not only accuracy.
- Handle OOV: Map unseen words to an 'UNK' token or expand training data.
- Class imbalance: Use oversampling, undersampling, or class weights if classes are unbalanced.
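To illustrate the negation idea from the first item, a tiny rule-based marker (a hypothetical helper, not
part of the pipeline above) can prefix the tokens that follow a negation word, so 'not good' becomes the
distinct feature 'NOT_good':
NEGATIONS = {"not", "no", "never", "n't"}
def mark_negation(tokens, window=3):
    # Prefix up to `window` tokens after a negation word with 'NOT_'
    out, countdown = [], 0
    for tok in tokens:
        if tok in NEGATIONS:
            out.append(tok)
            countdown = window
        elif countdown > 0:
            out.append("NOT_" + tok)
            countdown -= 1
        else:
            out.append(tok)
    return out
print(mark_negation(["this", "is", "not", "good"]))   # ['this', 'is', 'not', 'NOT_good']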
13. Interview Q&A (expected questions + short answers)
Q: Why Naive Bayes for text? A: Simple, fast, works well with small text datasets and provides
interpretable word scores.
Q: Why remove stopwords? A: They add noise and little sentiment information.
Q: What is Laplace smoothing? A: Add 1 to counts so unseen words don't make probabilities zero.
Q: How to handle sarcasm? A: Hard for NB; use larger models (transformers) and context-aware
embeddings.
Q: How to improve accuracy? A: Use n-grams, TF-IDF, embeddings, more data, or stronger classifiers.
Q: Is independence assumption realistic? A: No, but NB still often works well in practice for text.
14. Teaching tips & demo ideas
- Start with the toy dataset: manually count words and compute one prediction by hand.
- Show preprocessing effects: compare raw tweet vs cleaned tokens.
- Visualize loglikelihood scores for top positive/negative words.
- Demonstrate errors (sarcasm, negation) to show limitations.
- Provide a live demo where students type tweets and see prediction results.
15. Appendix: Full cleaned code (copy-paste ready)
# --- Full minimal Naive Bayes pipeline (cleaned version) ---
import re, string
import numpy as np
from nltk.corpus import twitter_samples, stopwords
from nltk.tokenize import TweetTokenizer
from nltk.stem import PorterStemmer
import random
# 1. Load and shuffle
all_pos = twitter_samples.strings('positive_tweets.json')
all_neg = twitter_samples.strings('negative_tweets.json')
random.seed(42)
random.shuffle(all_pos); random.shuffle(all_neg)
train_pos = all_pos[:4000]; test_pos = all_pos[4000:]
train_neg = all_neg[:4000]; test_neg = all_neg[4000:]
train_x = train_pos + train_neg; test_x = test_pos + test_neg
train_y = np.append(np.ones(len(train_pos)), np.zeros(len(train_neg)))
test_y = np.append(np.ones(len(test_pos)), np.zeros(len(test_neg)))
# 2. Preprocess
def process_tweet(tweet):
    stemmer = PorterStemmer()
    stopwords_english = stopwords.words('english')
    tweet = str(tweet)
    tweet = re.sub(r'\$\w*', '', tweet)
    tweet = re.sub(r'^RT[\s]+', '', tweet)
    tweet = re.sub(r'https?:\/\/.*[\r\n]*', '', tweet)
    tweet = re.sub(r'#', '', tweet)
    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
    tweet_tokens = tokenizer.tokenize(tweet)
    tweets_clean = []
    for word in tweet_tokens:
        if word not in stopwords_english and word not in string.punctuation:
            stem_word = stemmer.stem(word)
            tweets_clean.append(stem_word)
    return tweets_clean
# 3. Build freqs
def count_tweets(result, tweets, ys):
    for y, tweet in zip(ys, tweets):
        for word in process_tweet(tweet):
            pair = (word, y)
            result[pair] = result.get(pair, 0) + 1
    return result

freqs = count_tweets({}, train_x, train_y)

def freq_pos_count(word, freqs):
    return freqs.get((word, 1), 0)

def freq_neg_count(word, freqs):
    return freqs.get((word, 0), 0)
# 4. Train
def train_naive_bayes(freqs, train_x, train_y):
    vocab = set([k[0] for k in freqs.keys()])
    V = len(vocab)
    N_pos = sum([v for (w, l), v in freqs.items() if l == 1])
    N_neg = sum([v for (w, l), v in freqs.items() if l == 0])
    D_pos = len(train_y[train_y == 1])
    D_neg = len(train_y[train_y == 0])
    logprior = np.log(D_pos) - np.log(D_neg)
    loglikelihood = {}
    for w in vocab:
        freq_pos = freq_pos_count(w, freqs)
        freq_neg = freq_neg_count(w, freqs)
        p_w_pos = (freq_pos + 1) / (N_pos + V)
        p_w_neg = (freq_neg + 1) / (N_neg + V)
        loglikelihood[w] = np.log(p_w_pos) - np.log(p_w_neg)
    return logprior, loglikelihood
logprior, loglikelihood = train_naive_bayes(freqs, train_x, train_y)
# 5. Predict
def naive_bayes_predict(tweet, logprior, loglikelihood):
    word_l = process_tweet(tweet)
    p = logprior
    for word in word_l:
        if word in loglikelihood:
            p += loglikelihood[word]
    return p
# 6. Test and metrics
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix
y_hats = [1 if naive_bayes_predict(t, logprior, loglikelihood) > 0 else 0 for t in test_x]
print("Accuracy:", accuracy_score(test_y, y_hats))
print("Confusion matrix:\n", confusion_matrix(test_y, y_hats))
print("P/R/F1:", precision_recall_fscore_support(test_y, y_hats, average='binary'))
16. References
- NLTK twitter_samples corpus (NLTK documentation).
- Standard Naive Bayes tutorials and machine learning texts.