CS109/Stat121/AC209/E-109
Data Science
Bayesian Methods Continued, Text Data
Hanspeter Pfister, Joe Blitzstein, Verena Kaynig
[Figure: Topics | Documents | Topic proportions and assignments]
Figure 1: The intuitions behind latent Dirichlet allocation. We assume that some number of “topics,” which are distributions over words, exist for the whole collection (far left). Each document is assumed to be generated as follows. First choose a distribution over the topics (the histogram at right); then, for each word, choose a topic assignment (the colored coins) and choose the word from the corresponding topic.
Blei, https://www.cs.princeton.edu/~blei/papers/Blei2011.pdf
This Week
• Project team info is due tonight at 11:59 pm via
the Google form:
http://goo.gl/forms/CzVRluCZk6
• HW4 is due this Thursday (Nov 5) at 11:59 pm
• Before this Thursday’s lecture on interactive
visualizations:
• Download/install Tableau Public at
https://public.tableau.com/
• Download data file (.zip) from
http://bit.ly/cs109data
MCMC as mountain exploration
vs.
http://healthyalgorithms.com/2010/03/12/a-useful-metaphor-for-explaining-mcmc/
Bayesian Hierarchical Models: Radon Example
Example from Gelman: http://www.eecs.berkeley.edu/~russell/classes/cs294/f05/papers/gelman-2005.pdf
Python-based exposition at
http://twiecki.github.io/blog/2014/03/17/bayesian-glms-3/
Complete Pooling vs. No Pooling
complete pooling: $\text{radon}_{i,c} = \alpha + \beta \cdot \text{floor}_{i,c} + \epsilon$
no pooling: $\text{radon}_{i,c} = \alpha_c + \beta_c \cdot \text{floor}_{i,c} + \epsilon_c$
Partial Pooling
[Diagrams: no pooling vs. partial pooling / hierarchical model]
Partial Pooling
$\text{radon}_{i,c} = \alpha_c + \beta_c \cdot \text{floor}_{i,c} + \epsilon_c$
$\alpha_c \sim N(\mu_\alpha, \sigma_\alpha^2)$
$\beta_c \sim N(\mu_\beta, \sigma_\beta^2)$
http://twiecki.github.io/blog/2014/03/17/bayesian-glms-3/
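A partial-pooling model along these lines can be written in a few lines of PyMC3, in the spirit of the linked post. This is only a rough sketch: the tiny synthetic data, the prior choices, and the variable names are placeholders (and it uses PyMC3-era keywords such as sd; newer versions use sigma), not the post's exact code.

import numpy as np
import pymc3 as pm

# Tiny synthetic stand-in for the radon data: log radon level per house,
# floor indicator (0 = basement, 1 = first floor), and the county of each house.
n_counties = 3
county_idx = np.array([0, 0, 1, 1, 2, 2])
floor = np.array([0, 1, 0, 1, 0, 1])
log_radon = np.array([1.1, 0.8, 1.5, 0.9, 0.7, 0.3])

with pm.Model() as hierarchical_model:
    # Hyperpriors: county intercepts and slopes share common means and spreads
    mu_alpha = pm.Normal('mu_alpha', mu=0., sd=10.)
    sigma_alpha = pm.HalfCauchy('sigma_alpha', beta=5.)
    mu_beta = pm.Normal('mu_beta', mu=0., sd=10.)
    sigma_beta = pm.HalfCauchy('sigma_beta', beta=5.)

    # County-level parameters, partially pooled toward the shared means
    alpha = pm.Normal('alpha', mu=mu_alpha, sd=sigma_alpha, shape=n_counties)
    beta = pm.Normal('beta', mu=mu_beta, sd=sigma_beta, shape=n_counties)

    # Measurement noise and likelihood
    eps = pm.HalfCauchy('eps', beta=5.)
    radon_est = alpha[county_idx] + beta[county_idx] * floor
    pm.Normal('radon_like', mu=radon_est, sd=eps, observed=log_radon)

    trace = pm.sample(2000)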
Hierarchical Models Provide:
• a compromise between no pooling and complete pooling
• regularization and shrinkage
• sensible estimates even for small groups
• an interpretable organization of the parameters
• a way to incorporate information at different levels of the hierarchy
(e.g., individual level, county level, state level)
• predictions at various levels of the hierarchy (e.g., for a
new house or for a new county)
Gibbs Sampler
Explore space by updating one coordinate at a time.
2D parameter space version:
Draw new $\theta_1$ from the conditional distribution of $\theta_1 \mid \theta_2$
Draw new $\theta_2$ from the conditional distribution of $\theta_2 \mid \theta_1$
Repeat
http://zoonek.free.fr/blosxom//R/2006-06-22_useR2006_rbiNormGiggs.png
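For concreteness, here is a minimal sketch of that two-coordinate scheme for a standard bivariate normal with correlation rho, where both conditionals are known normals. The target distribution, starting point, and number of steps are illustrative choices, not part of any particular example above.

import numpy as np

def gibbs_bivariate_normal(rho, n_steps=5000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each conditional theta1 | theta2 (and vice versa) is N(rho * other, 1 - rho**2),
    so we alternate exact draws from the two conditionals."""
    rng = np.random.default_rng(seed)
    theta1, theta2 = 0.0, 0.0                 # arbitrary starting point
    samples = np.empty((n_steps, 2))
    for t in range(n_steps):
        theta1 = rng.normal(rho * theta2, np.sqrt(1 - rho**2))  # draw theta1 | theta2
        theta2 = rng.normal(rho * theta1, np.sqrt(1 - rho**2))  # draw theta2 | theta1
        samples[t] = theta1, theta2
    return samples

draws = gibbs_bivariate_normal(rho=0.8)
print(np.corrcoef(draws[1000:].T))            # should be close to 0.8 after burn-in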
Gibbs sampler animation
http://twiecki.github.io/blog/2014/01/02/visualizing-mcmc/
Metropolis-Hastings Algorithm
Modify a Markov chain on a state space of interest to obtain
a new chain with any desired stationary distribution!
1. If $X_n = i$, propose a new state $j$ using the transition probabilities $p_{ij}$ of the original Markov chain.
2. Compute the acceptance probability
$a_{ij} = \min\left(\dfrac{s_j\, p_{ji}}{s_i\, p_{ij}},\ 1\right)$.
3. Flip a coin that lands Heads with probability $a_{ij}$, independently of the Markov chain.
4. If the coin lands Heads, accept the proposal and set $X_{n+1} = j$. Otherwise, stay in state $i$; set $X_{n+1} = i$.
In other words, the modified Markov chain uses the original transition probabilities $p_{ij}$ to propose where to go next, then accepts the proposal with probability $a_{ij}$, staying in its current state otherwise.
https://www.siam.org/pdf/news/637.pdf
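As a toy illustration of steps 1–4, here is a sketch of a random-walk Metropolis sampler targeting an unnormalized density s. The target, starting point, and proposal scale are arbitrary choices; because the normal proposal is symmetric, the $p_{ji}/p_{ij}$ factor cancels in the acceptance ratio.

import numpy as np

def metropolis(log_s, x0=0.0, proposal_sd=1.0, n_steps=10000, seed=0):
    """Random-walk Metropolis: propose x' ~ N(x, proposal_sd^2) and accept
    with probability min(s(x')/s(x), 1), working on the log scale."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        proposal = x + rng.normal(0.0, proposal_sd)   # step 1: propose a move
        log_accept = log_s(proposal) - log_s(x)       # step 2: log acceptance ratio
        if np.log(rng.random()) < log_accept:         # steps 3-4: accept or stay put
            x = proposal
        samples[t] = x
    return samples

# Example target: an unnormalized standard normal, log s(x) = -x^2 / 2.
draws = metropolis(lambda x: -0.5 * x**2)
print(draws[2000:].mean(), draws[2000:].std())        # roughly 0 and 1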
Metropolis-Hastings animation
http://twiecki.github.io/blog/2014/01/02/visualizing-mcmc/
MCMC in Python
• Stan: http://mc-stan.org
• PyMC: https://pymc-devs.github.io/pymc/
Mosteller-Wallace, Federalist Papers Authorship
Mosteller-Wallace, Federalist Papers Authorship
https://www.stat.cmu.edu/Exams/mosteller.pdf
Use of “upon” by Hamilton vs. Madison
In combination with a similar treatment of other “non-contextual” words in these writings, this approach provided strong evidence that Madison was the author of all twelve of the disputed papers, essentially settling the authorship debate.
Rate/1000 words | Authored by Hamilton | Authored by Madison | 12 Disputed Papers
Exactly 0       | 0                    | 41                  | 11
(0.0, 0.4)      | 0                    | 2                   | 0
[0.4, 0.8)      | 0                    | 4                   | 0
[0.8, 1.2)      | 2                    | 1                   | 1
[1.2, 1.6)      | 3                    | 2                   | 0
[1.6, 2.0)      | 6                    | 0                   | 0
[2.0, 3.0)      | 11                   | 0                   | 0
[3.0, 4.0)      | 11                   | 0                   | 0
[4.0, 5.0)      | 10                   | 0                   | 0
[5.0, 6.0)      | 3                    | 0                   | 0
[6.0, 7.0)      | 1                    | 0                   | 0
[7.0, 8.0)      | 1                    | 0                   | 0
Totals          | 48                   | 50                  | 12
Table 1.2.1: Frequency distribution of the word “upon” in 110 essays.
Table from Samaniego, Stochastic Modeling and Mathematical Statistics
But what is the probability that Madison authored a particular disputed document, and how confident should we be about our answer?
Poisson Model
$f(y \mid \lambda) = \dfrac{e^{-\lambda}\,\lambda^{y}}{y!}$
y is the number of occurrences of a specific word
$\lambda$ is the rate parameter
Gamma prior is conjugate: $p(\lambda) \propto \lambda^{a-1} e^{-b\lambda}$
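Concretely, with a Gamma(a, b) prior and counts y_1, ..., y_n of the word in n blocks of equal length, conjugacy gives a Gamma(a + sum(y), b + n) posterior. A small sketch of that update follows; the prior values and counts below are made up for illustration, not the Federalist data.

import numpy as np
from scipy import stats

# Made-up example: counts of a word in n blocks of 1,000 words each,
# each modeled as Poisson(lambda).
y = np.array([1, 0, 2, 1, 3, 0, 1, 2])
a, b = 1.0, 1.0                              # Gamma(a, b) prior on the rate lambda

# Conjugacy: posterior is Gamma(a + sum(y), b + n)
a_post, b_post = a + y.sum(), b + len(y)
posterior = stats.gamma(a_post, scale=1.0 / b_post)

print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))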
Likelihood and Posterior for Madison’s use of “from”
[Figure (Fig. 12.2): the posterior and the likelihood function for the rate of Madison’s use of the word “from” (per 1000 words, horizontal axis roughly 0.9 to 1.6); the plotted posterior density is dgamma(x, shape = 331.6, rate = 270.3).]
n-grams
Data science is fun.
Unigrams: look at individual words.
“data”, “science”, “is”, “fun”
Bigrams: look at word pairs.
“data science”, “science is”, “is fun”
Trigrams: look at word triplets.
“data science is”, “science is fun”
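All three can be read off with zip over shifted copies of the token list; a minimal sketch (for simplicity the period is dropped and the words lowercased):

tokens = "Data science is fun".lower().split()

unigrams = tokens
bigrams = list(zip(tokens, tokens[1:]))                # consecutive pairs
trigrams = list(zip(tokens, tokens[1:], tokens[2:]))   # consecutive triples

print(unigrams)   # ['data', 'science', 'is', 'fun']
print(bigrams)    # [('data', 'science'), ('science', 'is'), ('is', 'fun')]
print(trigrams)   # [('data', 'science', 'is'), ('science', 'is', 'fun')]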
n-grams: Randomized Hobbit
into trees, and then bore to the Mountain to go
through?” groaned the hobbit. “Well, are you
doing, And where are you doing, And where
are you?” it squeaked, as it was no answer.
They were surly and angry and puzzled at
finding them here in their holes
Karl Broman, Randomized Hobbit
http://www.r-bloggers.com/randomized-hobbit/
n-grams: Hobbit/Cat in the Hat Mixture
“I am Gandalf,” said the fish. This is no way at all!
already off his horse and among the goblin and the dragon,
who had remained behind to guard the door. “Something is
outside!” Bilbo’s heart jumped into his boat on to sharp
rocks below; but there was a good game, Said our fish No!
No! Those Things should not fly.
Karl Broman, Randomized Hobbit
http://www.r-bloggers.com/randomized-hobbit/
if current == ".": return " ".join(result) # if "." we're done
n-grams
The sentences it produces are gibberish, but they’re the kind of gibberish you might
put on your website if you were trying to sound data-sciencey. For example:
If you may know which are you want to data sort the data feeds web friend someone on
trending topics as the data in Hadoop is the data science requires a book demonstrates
why visualizations are but we do massive correlations across many commercial disk
drives in Python language and creates more tractable form making connections then
use and uses it to solve a data.
—Bigram Model
We can make the sentences less gibberishy by looking at trigrams, triplets of consecutive words. (More generally, you might look at n-grams consisting of n consecutive words, but three will be plenty for us.) Now the transitions will depend on the previous two words:

trigrams = zip(document, document[1:], document[2:])
trigram_transitions = defaultdict(list)
starts = []

for prev, current, next in trigrams:
    if prev == ".":              # if the previous "word" was a period
        starts.append(current)   # then this is a start word
    trigram_transitions[(prev, current)].append(next)

In hindsight MapReduce seems like an epidemic and if so does that give us new insights into how economies work That’s not a question we could even have asked a few years there has been instrumented.
—Trigram Model

Of course, they sound better because at each step the generation process has fewer choices, and at many steps only a single choice. This means that you frequently generate sentences (or at least long phrases) that were seen verbatim in the original data. Having more data would help; it would also work better if you collected n-grams from multiple essays about data science.

Joel Grus, Data Science from Scratch
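To actually generate text from these transitions, a sampler in the same style picks a start word, then repeatedly draws the next word given the previous two until it emits a period (the stopping line shown earlier, if current == "."). This is a sketch following the book's approach rather than its verbatim code; it assumes document is a list of tokens and uses the starts and trigram_transitions built above.

import random

def generate_using_trigrams():
    current = random.choice(starts)   # a word that followed a period in the data
    prev = "."
    result = [current]
    while True:
        next_word_candidates = trigram_transitions[(prev, current)]
        next_word = random.choice(next_word_candidates)
        prev, current = current, next_word
        result.append(current)
        if current == ".":            # if "." we're done
            return " ".join(result)

print(generate_using_trigrams())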
Topic Modeling
[Figure 1 from Blei (2011): Topics | Documents | Topic proportions and assignments]
Blei, https://www.cs.princeton.edu/~blei/papers/Blei2011.pdf
Topic Modeling
[Figure 1 from Blei (2011), annotated: the example topics labeled “genetics,” “evolution,” “brain,” and “computing.”]
Figure 1: The intuitions behind latent Dirichlet allocation. We assume that some number of “topics,” which are distributions over words, exist for the whole collection (far left). Each document is assumed to be generated as follows. First choose a distribution over the topics (the histogram at right); then, for each word, choose a topic assignment (the colored coins) and choose the word from the corresponding topic. The topics and topic assignments in this figure are illustrative—they are not fit from real data. See Figure 2 for topics fit from data.
Blei, https://www.cs.princeton.edu/~blei/papers/Blei2011.pdf
17,000 articles from Science, 100 topics
[Left panel: bar chart of inferred topic proportions (Probability, 0.0–0.4) over Topics 1–100 for the example article.]

“Genetics”  | “Evolution”  | “Disease”    | “Computers”
human       | evolution    | disease      | computer
genome      | evolutionary | host         | models
dna         | species      | bacteria     | information
genetic     | organisms    | diseases     | data
genes       | life         | resistance   | computers
sequence    | origin       | bacterial    | system
gene        | biology      | new          | network
molecular   | groups       | strains      | systems
sequencing  | phylogenetic | control      | model
map         | living       | infectious   | parallel
information | diversity    | malaria      | methods
genetics    | group        | parasite     | networks
mapping     | new          | parasites    | software
project     | two          | united       | new
sequences   | common       | tuberculosis | simulations
Figure 2: Real inference with LDA. We fit a 100-topic LDA model to 17,000 articles from the journal Science. At left are the inferred topic proportions for the example article in Figure 1. At right are the top 15 most frequent words from the most frequent topics found in this article.
Blei, https://www.cs.princeton.edu/~blei/papers/Blei2011.pdf
Each word in each document is drawn from one of the topics (step 2b of the generative process), where the selected topic is chosen from the per-document distribution over topics (step 2a).
Latent Dirichlet Allocation (LDA):
Generation and Estimation
http://mcburton.net/blog/joy-of-tm/
Dirichlet Distribution
http://blog.bogatron.net/blog/2014/02/02/visualizing-dirichlet-distributions/
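A quick way to build intuition for the Dirichlet, besides the visualizations in the post above, is to draw from it and look at the samples; a small sketch with illustrative concentration parameters:

import numpy as np

rng = np.random.default_rng(0)

# Each draw is a probability vector over 3 topics (nonnegative, sums to 1).
print(rng.dirichlet(alpha=[1.0, 1.0, 1.0], size=3))     # roughly uniform over the simplex
print(rng.dirichlet(alpha=[0.1, 0.1, 0.1], size=3))     # sparse: mass piles onto one topic
print(rng.dirichlet(alpha=[10.0, 10.0, 10.0], size=3))  # concentrated near (1/3, 1/3, 1/3)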
Latent Dirichlet Allocation (LDA):
Generative Model
https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
Latent Dirichlet Allocation (LDA):
Generative Model Example
• Pick 5 to be the number of words in D.
• Decide that D will be 1/2 about food and 1/2 about cute animals.
• Pick the first word to come from the food topic, which then gives you the word “broccoli”.
• Pick the second word to come from the cute animals topic, which gives you “panda”.
• Pick the third word to come from the cute animals topic, giving you “adorable”.
• Pick the fourth word to come from the food topic, giving you “cherries”.
• Pick the fifth word to come from the food topic, giving you “eating”.
http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/
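The same story can be written as a few lines of simulation. The two topics, their word probabilities, and the Dirichlet parameter below are made-up numbers for illustration, not fit to any corpus.

import numpy as np

rng = np.random.default_rng(42)

vocab = ["broccoli", "cherries", "eating", "panda", "adorable", "kitten"]
topics = {
    "food":         np.array([0.4, 0.3, 0.3, 0.0, 0.0, 0.0]),  # word probabilities
    "cute animals": np.array([0.0, 0.0, 0.0, 0.4, 0.3, 0.3]),
}
topic_names = list(topics)

# Step 1: choose the document's topic proportions from a Dirichlet prior.
theta = rng.dirichlet([1.0, 1.0])            # e.g. roughly 1/2 food, 1/2 cute animals

# Step 2: for each of the 5 words, pick a topic, then pick a word from that topic.
doc = []
for _ in range(5):
    z = rng.choice(len(topic_names), p=theta)             # topic assignment
    word = rng.choice(vocab, p=topics[topic_names[z]])    # word from that topic
    doc.append(word)

print(theta, doc)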
Latent Dirichlet Allocation (LDA):
Generative Model
http://mcburton.net/blog/joy-of-tm/
Recommendation Systems and LDA in the NY Times
http://open.blogs.nytimes.com/2015/08/11/building-the-next-new-york-times-recommendation-engine/?_r=2
Recommendation Systems and LDA in the NY Times
http://open.blogs.nytimes.com/2015/08/11/building-the-next-new-york-times-recommendation-engine/?_r=2
LDA Visualization
http://cpsievert.github.io/LDAvis/reviews/reviews.html
pyLDAvis: https://pypi.python.org/pypi/pyLDAvis
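For fitting and then visualizing a model in Python, a typical pipeline is gensim for LDA plus pyLDAvis for the interactive view. A rough sketch with a toy corpus follows; the module path pyLDAvis.gensim reflects older releases, and newer ones moved it to pyLDAvis.gensim_models.

from gensim import corpora, models
import pyLDAvis
import pyLDAvis.gensim  # newer pyLDAvis versions: pyLDAvis.gensim_models

# Toy corpus: each document is already tokenized.
texts = [
    ["gene", "dna", "genetic", "genome"],
    ["brain", "neuron", "nerve", "brain"],
    ["data", "number", "computer", "data"],
    ["dna", "gene", "sequence"],
]

dictionary = corpora.Dictionary(texts)                 # word <-> id mapping
corpus = [dictionary.doc2bow(text) for text in texts]  # bag-of-words vectors

lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
print(lda.print_topics())

# Interactive topic/term visualization (display in a notebook or save to HTML).
vis = pyLDAvis.gensim.prepare(lda, corpus, dictionary)
pyLDAvis.save_html(vis, "lda_vis.html")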