Lexical Semantics
Regina Barzilay
MIT
July, 2005
CuuDuongThanCong.com https://fb.com/tailieudientucntt
Today: Semantic Similarity
It’s not pinin’, it’s passed on! This parrot is no more!
It has ceased to be! It’s expired and gone to meet its
maker! This is a late parrot! It’s a stiff! Bereft of life,
it rests in peace! If you hadn’t nailed him to the perch
he would be pushing up the daisies! Its metabolical
processes are of interest only to historians! It’s hopped the
twig! It’s shuffled off this mortal coil! It’s run down the
curtain and joined the choir invisible! This.... is an
EX-PARROT!
From Monty Python's "Dead Parrot" sketch: http://en.wikipedia.org/wiki/Dead_Parrot
Today: Semantic Similarity
This parrot is no more!
It has ceased to be!
It’s expired and gone to meet its maker!
This is a late parrot!
This. . . is an EX-PARROT!
Motivation
Smoothing for statistical language models
• Two alternative guesses of a speech recognizer:
For breakfast, she ate durian.
For breakfast, she ate Dorian.
• Our corpus contains neither “ate durian” nor “ate
Dorian”
• But, our corpus contains “ate orange”, “ate banana”
Motivation
Aid for Question-Answering and Information Retrieval
• Task: “Find documents about women astronauts”
• Problem: some documents use a paraphrase of
“astronaut”
In the history of Soviet/Russian space exploration, there
have only been three Russian women cosmonauts:
Valentina Tereshkova, Svetlana Savitskaya, and Elena
Kondakova.
Motivation
Exploration in language acquisition
• Miller & Charles: judgments of semantic similarity
can be explained by the degree of contextual
interchangeability
• Can we automatically predict which words humans
perceive as similar?
Computing Semantic Similarity
• Use human-created resources
• Acquire required knowledge from text
Lexicons and Semantic Nets
• Lexicons are word lists augmented with some subset
of information
– Parts-of-speech
– Different word senses
– Synonyms
• Semantic Nets
– Links between terms (IS-A, Part-Of)
WordNet
• A big lexicon with properties of a semantic net
• Started as a language project by George Miller and
Christiane Fellbaum at Princeton
• First became available in 1990
Category     Unique Forms   Number of Senses
Noun         114648         79689
Verb         11306          13508
Adjective    21436          18563
Adverb       4669           3664
Synset Example
1. water, H2O – (binary compound that occurs at room temperature as a
clear colorless odorless tasteless liquid; freezes into ice below 0 degrees
centigrade and boils above 100 degrees centigrade; widely used as a
solvent)
2. body of water, water – (the part of the earth’s surface covered with
water (such as a river or lake or ocean); ”they invaded our territorial
waters”; ”they were sitting by the water’s edge”)
3. water system, water supply, water – (facility that provides a source of
water; ”the town debated the purification of the water supply”; ”first you
have to cut off the water”)
4. water – (once thought to be one of four elements composing the
universe (Empedocles))
5. urine, piss, pee, piddle, weewee, water – (liquid excretory product;
”there was blood in his urine”; ”the child had to make water”)
6. water – (a fluid necessary for the life of most animals and plants; ”he
asked for a drink of water”)
WordNet Relations
• Original core relations:
– Synonymy
– Polysemy
– Metonymy
– Hyponymy/Hyperonymy
– Meronymy
– Antonymy
• New, useful additions for NLP:
– Glosses
– Links between derivationally and semantically related
noun/verb pairs
– Domain/topical terms
– Groups of similar verbs
Synonymy
• Synonyms are different ways of expressing related
concepts
– Examples: marriage, matrimony, union, wedlock
• Synonyms are almost never truly substitutable:
– Used in different contexts
– Have different implications
  * This is a point of contention
Polysemy
• Most words have more than one sense
– Homonymy: same word, unrelated meanings
  * bank (river)
  * bank (financial)
– Polysemy: same word, related meanings
  * Bob has ugly ears.
  * Alice has a good ear for jazz.
Polysemy Information
POS          Monosemous   Polysemous
Noun         99524        15124
Verb         6256         5050
Adjective    16103        5333
Adverb       3901         768
Total        125784       26275
Metonymy
• Use one aspect of something to stand for the whole
– Newscast: The White House released new figures
today.
– Waitperson: The tofu sandwich spilled his drink.
Hyponymy/Hyperonymy (ISA)
A is a hypernym of B if B is a type of A
A is a hyponym of B if A is a type of B
Example:
• bigamy (having two spouses at the same time)
• open marriage (a marriage in which each partner is free to enter
into extraneous sexual relationships without guilt or jealousy
from the other)
• cuckoldom (the state of a husband whose wife has committed
adultery)
• polygamy (having more than one spouse at a time)
– polyandry (having more than one husband at a time)
– polygyny (having more than one wife at a time)
Meronymy
• Part-of relation
– part-of (beak, bird)
– part-of (bark, tree)
• Transitive conceptually but not lexically:
– The knob is a part of the door.
– The door is a part of the house.
– ? The knob is a part of the house.
Antonymy
• Lexical opposites
– antonym (large, small)
– antonym (big, small)
– antonym (big, little)
Computing Semantic Similarity
Suppose you are given the following words. Your task is
to group them according to how similar they are:
apple
banana
grapefruit
grape
man
woman
baby
infant
Using WordNet to Determine Similarity
Hypernym (is-a) chains:
apple → fruit → produce → ...
banana → fruit → produce → ...
man → male, male person → person, individual → organism → ...
woman → female, female person → person, individual → organism
Similarity by Path Length
• Count the edges (is-a links) between two concepts
and scale
• Leacock and Chodorow, 1998:
\[ d(c_1, c_2) = -\log \frac{\mathrm{length}(c_1, c_2)}{2 \cdot \mathrm{MaxDepth}} \]
• Wu and Palmer, 1994:
\[ d(c_1, c_2) = -\log \frac{\mathrm{depth}(\mathrm{lcs}(c_1, c_2))}{\mathrm{depth}(c_1) + \mathrm{depth}(c_2)} \]
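As a rough illustration of the two path-based measures, the sketch below evaluates them on a tiny hand-built is-a hierarchy. The taxonomy, the helper names (`PARENT`, `ancestors`, `depth`, `lcs`, `path_length`), and the convention that the root has depth 1 are assumptions made for this example; none of them come from WordNet itself.

```python
import math

# Hypothetical toy is-a hierarchy (child -> parent); not real WordNet data.
PARENT = {
    "organism": None,
    "person": "organism",
    "male": "person",
    "female": "person",
    "man": "male",
    "woman": "female",
    "plant": "organism",
    "fruit": "plant",
    "apple": "fruit",
}

def ancestors(c):
    """Return the chain [c, parent(c), ..., root]."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain

def depth(c):
    """Depth counted in nodes, so the root has depth 1 (an assumption)."""
    return len(ancestors(c))

def lcs(c1, c2):
    """Lowest common subsumer: the deepest ancestor shared by c1 and c2."""
    seen = set(ancestors(c1))
    for a in ancestors(c2):
        if a in seen:
            return a
    raise ValueError("no common ancestor")

def path_length(c1, c2):
    """Number of is-a edges on the path c1 -> lcs -> c2."""
    common = depth(lcs(c1, c2))
    return (depth(c1) - common) + (depth(c2) - common)

MAX_DEPTH = max(depth(c) for c in PARENT)

def leacock_chodorow(c1, c2):
    # -log(length / (2 * MaxDepth)); undefined when c1 == c2 (length 0).
    return -math.log(path_length(c1, c2) / (2 * MAX_DEPTH))

def wu_palmer(c1, c2):
    # -log(depth(lcs) / (depth(c1) + depth(c2))), the form given above.
    return -math.log(depth(lcs(c1, c2)) / (depth(c1) + depth(c2)))
```

On this toy hierarchy, `man` and `woman` are four edges apart and meet at `person`, while `man` and `apple` meet only at the root. Note that the two formulas point in opposite directions: the first grows as concepts get closer, the second shrinks.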
Why use WordNet?
• Quality
– Developed and maintained by researchers
• Habit
– Many applications are currently using WordNet
• Available software
– SenseRelate (Pedersen et al.):
http://wn-similarity.sourceforge.com
Similarity by Path Length
baby → child, kid → offspring, progeny → relative, relation → person, individual → organism → ...
man → male, male person → person, individual → organism → ...
woman → female, female person → person, individual → organism
Why not use WordNet?
• Incomplete (technical terms may be absent)
• Path lengths are irregular across the hierarchies
• How to relate terms that are not in the same
hierarchies?
The “tennis problem”:
– Player
– Racquet
– Ball
– Net
Learning Similarity from Corpora
• You shall know a word by the company it keeps (Firth
1957)
• Key assumption: Words are similar if they occur in similar
contexts
What is tizguino? (Nida, 1975)
A bottle of tizguino is on the table.
Tizguino makes you drunk.
We make tizguino out of corn.
Learning Similarity from Corpora
CAT
cute smart dirty
DOG
cute smart dirty
PIG
cute smart dirty
Learning Similarity from Corpora
• Define the properties one cares about, and be able
to give numerical values for each property
• Create a vector of length n with the n numerical
values for each item to be classified
• Viewing each n-dimensional vector as a point in an
n-dimensional space, cluster points that are near one
another
Key Parameters
• The properties used in the vector
• The distance metric used to decide if two points are
“close”
• The algorithm used to cluster
Example 1: Clustering by Next Word
Brown et al. (1992)
• C(x) denotes the vector of properties of x (“context”
of x)
• Assume an alphabet (vocabulary) of size K: $w^1, \ldots, w^K$
• $C(w^i) = \langle |w^1|, |w^2|, \ldots, |w^K| \rangle$, where $|w^j|$ is the number of
times $w^j$ followed $w^i$ in the corpus
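A minimal sketch of the next-word context vector, assuming a whitespace-tokenized corpus. The toy corpus and the function name `next_word_vectors` are invented for illustration, and this covers only the representation, not the clustering procedure of Brown et al.

```python
from collections import Counter, defaultdict

def next_word_vectors(tokens):
    """Map each word to a Counter of the words that immediately follow it."""
    contexts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        contexts[current][following] += 1
    return contexts

# Hypothetical toy corpus.
corpus = "she ate an orange and he ate a banana and she ate a pear".split()
vectors = next_word_vectors(corpus)

# Dense vector for "ate": one coordinate per vocabulary word, in sorted order.
vocab = sorted(set(corpus))
ate_vector = [vectors["ate"][w] for w in vocab]
```

Here `vectors["ate"]` records that "ate" was followed by "a" twice and by "an" once; every other coordinate of `ate_vector` is zero.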
Vector Space Model
man woman
grape
orange
apple
Similarity Measure: Euclidean
Euclidean distance:
\[ |\vec{x} - \vec{y}| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \]
               cosmonaut  astronaut  moon  car  truck
Soviet             1          0       0     1     1
American           0          1       0     1     1
spacewalking       1          1       0     0     0
red                0          0       0     1     1
full               0          0       1     0     0
old                0          0       0     1     1
\[ |\mathrm{cosm} - \mathrm{astr}| = \sqrt{(1-0)^2 + (0-1)^2 + (1-1)^2 + (0-0)^2 + (0-0)^2 + (0-0)^2} = \sqrt{2} \]
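The table can be checked numerically with a few lines of code. This is a minimal sketch: the dictionary `VECTORS` simply transcribes the table's columns, and the function name `euclidean` is chosen for this example.

```python
import math

# Feature order: Soviet, American, spacewalking, red, full, old.
VECTORS = {
    "cosmonaut": [1, 0, 1, 0, 0, 0],
    "astronaut": [0, 1, 1, 0, 0, 0],
    "moon":      [0, 0, 0, 0, 1, 0],
    "car":       [1, 1, 0, 1, 0, 1],
    "truck":     [1, 1, 0, 1, 0, 1],
}

def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

cosm_astr = euclidean(VECTORS["cosmonaut"], VECTORS["astronaut"])
car_truck = euclidean(VECTORS["car"], VECTORS["truck"])
```

`cosm_astr` equals the square root of 2, while `car_truck` is 0 because the two columns are identical.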
Similarity Measure: Cosine
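Each word is represented as a vector $\vec{x} = (x_1, x_2, \ldots, x_n)$
• Cosine:
\[ \cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{|\vec{x}|\,|\vec{y}|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}} \]
– Cosine of the angle between the two vectors
– For vectors with non-negative components (such as counts), ranges from 0 (cos(90°) = 0) to 1 (cos(0°) = 1)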
Computing Similarity: Cosine
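               cosmonaut  astronaut  moon  car  truck
Soviet             1          0       0     1     1
American           0          1       0     1     1
spacewalking       1          1       0     0     0
red                0          0       0     1     1
full               0          0       1     0     0
old                0          0       0     1     1
\[ \cos(\mathrm{cosm}, \mathrm{astr}) = \frac{1 \cdot 0 + 0 \cdot 1 + 1 \cdot 1 + 0 \cdot 0 + 0 \cdot 0 + 0 \cdot 0}{\sqrt{1^2 + 0^2 + 1^2 + 0^2 + 0^2 + 0^2}\,\sqrt{0^2 + 1^2 + 1^2 + 0^2 + 0^2 + 0^2}} = \frac{1}{2} \]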
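The same computation as a short sketch. The vectors transcribe three columns of the table (feature order: Soviet, American, spacewalking, red, full, old), and the function name `cosine` is chosen for this example.

```python
import math

def cosine(x, y):
    """Dot product divided by the product of the vector lengths.

    Raises ZeroDivisionError for an all-zero vector.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

cosmonaut = [1, 0, 1, 0, 0, 0]
astronaut = [0, 1, 1, 0, 0, 0]
moon = [0, 0, 0, 0, 1, 0]

cosm_astr = cosine(cosmonaut, astronaut)   # 1 / (sqrt(2) * sqrt(2)), about 0.5
cosm_moon = cosine(cosmonaut, moon)        # no shared features
```

Because of floating-point rounding, `cosm_astr` should be compared to 0.5 with a tolerance rather than with `==`.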
Term Weighting
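Quantity             Symbol         Definition
term frequency       $tf_{i,j}$     # occurrences of $w_i$ in $d_j$
document frequency   $df_i$         # documents that $w_i$ occurs in
\[ \mathrm{tf} \times \mathrm{idf} = \begin{cases} (1 + \log(tf_{i,j})) \log \frac{N}{df_i} & \text{if } tf_{i,j} \geq 1 \\ 0 & \text{otherwise} \end{cases} \]
where N is the total number of documents.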
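A small sketch of this weighting scheme. The three toy documents and the names `tf_idf` and `document_weights` are made up for illustration, and the natural logarithm is an assumption, since the base is not specified above.

```python
import math
from collections import Counter

def tf_idf(tf, df, n_docs):
    """(1 + log tf) * log(N / df) when tf >= 1, and 0 otherwise."""
    if tf < 1:
        return 0.0
    return (1 + math.log(tf)) * math.log(n_docs / df)

# Hypothetical toy collection of N = 3 documents.
docs = [
    "the cosmonaut walked in space".split(),
    "the astronaut walked on the moon".split(),
    "the truck is red".split(),
]

def document_weights(doc, docs):
    """tf-idf weight of every word that occurs in doc."""
    tf = Counter(doc)
    return {
        w: tf_idf(tf[w], sum(1 for d in docs if w in d), len(docs))
        for w in tf
    }

moon_doc_weights = document_weights(docs[1], docs)
```

In the second document, "the" occurs twice but appears in every document, so log(N/df) = log 1 = 0 and its weight vanishes; "moon" occurs in one document only and gets weight log 3.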
Cosine vs. Euclidean
• Cosine applied to normalized vectors gives the same
ranking of similarities as Euclidean distance does.
• Both metrics assume Euclidean space
– Suboptimal for vectors of probabilities (0.0 and
0.1 vs. 0.9 and 1)
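The first claim can be made explicit with a short derivation, assuming both vectors are normalized to unit length, $|\vec{x}| = |\vec{y}| = 1$:

```latex
\begin{align*}
|\vec{x} - \vec{y}|^2 &= \sum_{i=1}^{n} (x_i - y_i)^2 \\
 &= |\vec{x}|^2 + |\vec{y}|^2 - 2\,\vec{x} \cdot \vec{y} \\
 &= 2\,\bigl(1 - \cos(\vec{x}, \vec{y})\bigr)
\end{align*}
```

For unit vectors the Euclidean distance is therefore a decreasing function of the cosine, so ranking neighbors by smallest distance and ranking them by largest cosine give the same order.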
Mutual Information
• Definition: The mutual information I(x; y) of two
particular outcomes x and y is the amount of
information one outcome gives us about the other
\[ I(x; y) = (-\log P(x)) - (-\log P(x \mid y)) = \log \frac{P(x, y)}{P(x)\,P(y)} \]
Example
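\[ I(\mathrm{pancake}; \mathrm{syrup}) = \log \frac{P(\mathrm{pancake}, \mathrm{syrup})}{P(\mathrm{pancake})\,P(\mathrm{syrup})} \]
\[ I(W_i = \mathrm{pancake}; W_{i+1} = \mathrm{syrup}) = \log \frac{P(W_i = \mathrm{pancake}, W_{i+1} = \mathrm{syrup})}{P(W_i = \mathrm{pancake})\,P(W_{i+1} = \mathrm{syrup})} \]
• Suppose “pancake” and “syrup” have no relation to each
other, i.e. P(syrup | pancake) = P(syrup):
\[ I(\mathrm{pancake}; \mathrm{syrup}) = \log \frac{P(\mathrm{pancake}, \mathrm{syrup})}{P(\mathrm{pancake})\,P(\mathrm{syrup})} = \log \frac{P(\mathrm{syrup} \mid \mathrm{pancake})}{P(\mathrm{syrup})} = \log \frac{P(\mathrm{syrup})}{P(\mathrm{syrup})} = 0 \]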
Example (cont.)
• Suppose “pancake” and “syrup” are perfectly coordinated,
i.e. P(pancake, syrup) = P(pancake):
\[ I(\mathrm{pancake}; \mathrm{syrup}) = \log \frac{P(\mathrm{pancake}, \mathrm{syrup})}{P(\mathrm{pancake})\,P(\mathrm{syrup})} = \log \frac{P(\mathrm{pancake})}{P(\mathrm{pancake})\,P(\mathrm{syrup})} = \log \frac{1}{P(\mathrm{syrup})} \]
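The two extreme cases can be reproduced numerically. In this sketch the probabilities are invented solely to illustrate the formulas, and `pmi` is a name chosen for this example.

```python
import math

def pmi(p_xy, p_x, p_y):
    """Mutual information of two outcomes: log P(x, y) / (P(x) P(y))."""
    return math.log(p_xy / (p_x * p_y))

# Hypothetical unigram probabilities.
p_pancake = 0.01
p_syrup = 0.01

# No relation: P(pancake, syrup) = P(pancake) * P(syrup), so the log is 0.
independent = pmi(p_pancake * p_syrup, p_pancake, p_syrup)

# Perfect coordination: P(pancake, syrup) = P(pancake), giving log 1/P(syrup).
coordinated = pmi(p_pancake, p_pancake, p_syrup)
```

The rarer the perfectly coordinated words are, the larger `coordinated` becomes, which is one reason low-frequency pairs can dominate rankings based on this quantity.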
Similarity for LM
Goal: find a word clustering that decreases perplexity
\[ H(L) = -\frac{1}{N} \log P(w_1, \ldots, w_N) \]
\[ \approx -\frac{1}{N-1} \sum_{i=2}^{N} \log P(w_i \mid w_{i-1}) \]
\[ \approx -\frac{1}{N-1} \sum_{w^1 w^2} \mathrm{Count}(w^1 w^2) \log P(w^2 \mid w^1) \]
Similarity for LM
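Cluster-based generalization (with $c_1$, $c_2$ the clusters that the partition $\pi$ assigns to $w^1$, $w^2$):
\[ H(L, \pi) \approx -\frac{1}{N-1} \sum_{w^1 w^2} \mathrm{Count}(w^1 w^2) \log P(c_2 \mid c_1)\,P(w^2 \mid c_2) \]
\[ \approx H(w) - I(c_1, c_2) \]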
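The step to $H(w) - I(c_1, c_2)$ can be filled in as follows. This sketch assumes the class-based bigram model of Brown et al. (1992), $P(w^2 \mid w^1) = P(c_2 \mid c_1)\,P(w^2 \mid c_2)$, with every word belonging to exactly one cluster, so that $P(w^2 \mid c_2)\,P(c_2) = P(w^2)$. It also uses $\mathrm{Count}(w^1 w^2)/(N-1) \approx P(w^1, w^2)$.

```latex
\begin{align*}
-\frac{1}{N-1} \sum_{w^1 w^2} \mathrm{Count}(w^1 w^2) \log P(c_2 \mid c_1)\,P(w^2 \mid c_2)
 &\approx -\sum_{w^1 w^2} P(w^1, w^2) \log \left[ \frac{P(c_2 \mid c_1)}{P(c_2)} \cdot P(w^2 \mid c_2)\,P(c_2) \right] \\
 &= -\sum_{c_1 c_2} P(c_1, c_2) \log \frac{P(c_2 \mid c_1)}{P(c_2)} \;-\; \sum_{w^2} P(w^2) \log P(w^2) \\
 &= -I(c_1, c_2) + H(w)
\end{align*}
```

Since $H(w)$ does not depend on the clustering, decreasing the cross-entropy amounts to choosing clusters that maximize the average mutual information between adjacent clusters.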
Average Mutual Information
• Definition: The average mutual information I(X; Y) of the
random variables X and Y is the amount
of information we get about X from knowing the
value of Y, on average.
\[ I(X; Y) = \sum_{y=1}^{K} \sum_{x=1}^{K} P(w^x, w^y)\, I(w^x; w^y) \]
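A minimal sketch that estimates average mutual information from a list of observed (x, y) pairs using relative frequencies. The function name and the toy pairs are assumptions for illustration; pairs that never occur contribute nothing to the sum (the usual convention that 0 log 0 = 0).

```python
import math
from collections import Counter

def average_mutual_information(pairs):
    """Sum over observed (x, y) of P(x, y) * log(P(x, y) / (P(x) P(y)))."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    total = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        total += p_xy * math.log(p_xy / ((left[x] / n) * (right[y] / n)))
    return total

# Hypothetical data: x fully determines y in the first list,
# and x and y are independent in the second.
coordinated = [("a", "b"), ("c", "d")]
independent = [("a", "b"), ("a", "d"), ("c", "b"), ("c", "d")]

ami_coordinated = average_mutual_information(coordinated)   # log 2
ami_independent = average_mutual_information(independent)   # 0
```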
Example: Syntax-Based Representation
• The vector C(n) for a word n is the distribution of
verbs for which it served as direct object
• $C(n) = \langle P(v^1 \mid n), P(v^2 \mid n), \ldots, P(v^K \mid n) \rangle$
• Representation can be expanded to account for
additional syntactic relations (subject, object,
indirect-object, neutral)
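A sketch of how such a representation might be estimated, assuming (verb, direct object) pairs have already been extracted by a parser. The pairs below and the function name `object_verb_distributions` are invented for illustration.

```python
from collections import Counter, defaultdict

def object_verb_distributions(verb_object_pairs):
    """Estimate P(v | n) for each noun n by relative frequency."""
    counts = defaultdict(Counter)
    for verb, noun in verb_object_pairs:
        counts[noun][verb] += 1
    distributions = {}
    for noun, verb_counts in counts.items():
        total = sum(verb_counts.values())
        distributions[noun] = {v: c / total for v, c in verb_counts.items()}
    return distributions

# Hypothetical (verb, direct object) pairs.
pairs = [
    ("eat", "apple"), ("eat", "apple"), ("peel", "apple"),
    ("eat", "banana"), ("peel", "banana"),
    ("drive", "car"),
]
C = object_verb_distributions(pairs)
```

In this toy data "apple" and "banana" share their verbs and so receive overlapping distributions, while "car" shares none with either.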
Kullback-Leibler Distance (Relative Entropy)
• Definition: The relative entropy D(p||q) is a
measure of the inefficiency of assuming that the
distribution is q when the true distribution is p
\[ D(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)} \]
• Properties:
– Non-negative
– D(p||q) = 0 iff p = q
– Not symmetric and doesn’t satisfy triangle
inequality
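The properties listed above can be checked on a small example. In this sketch the two distributions are invented, distributions are plain dictionaries, and returning infinity when q assigns zero probability to an outcome that p allows is a common convention adopted here as an assumption.

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) log(p(x) / q(x)), with 0 log 0 taken as 0."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf
        total += px * math.log(px / qx)
    return total

# Hypothetical verb distributions for two nouns.
p = {"eat": 0.7, "peel": 0.2, "drive": 0.1}
q = {"eat": 0.1, "peel": 0.1, "drive": 0.8}

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
d_pp = kl_divergence(p, p)
```

Here `d_pp` is 0, both `d_pq` and `d_qp` are positive, and they differ from each other, which illustrates the lack of symmetry.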
Representation
• Representation
– Syntactic vs. Window-based
– Context granularity
– Alphabet size
– Counts vs. Probability
• Distance
– Vector-based vs. Probabilistic
– Weighted vs. Unweighted
Problems with Corpus-based Similarity
• Low-frequency words skew the results
– “breast-undergoing”, “childhood-psychosis”,
“outflow-infundibulum”
• Semantic similarity does not imply synonymy
– “large-small”, “heavy-light”, “shallow-coastal”
• Distributional information may not be sufficient for
true semantic grouping
Not-so-semantic grouping
Method Clinker
Direct Object pollution increase failure
Next Word addiction medalist inhalation Arabia growers
Adjective full increase
State-of-the-art Methods
http://www.cs.ualberta.ca/~lindek/demos/depsim.htm
Closest words for president
leader 0.264431, minister 0.251936, vice president 0.238359,
Clinton 0.238222, chairman 0.207511, government 0.206842,
Governor 0.193404, official 0.191428, Premier 0.177853,
Yeltsin 0.173577, member 0.173468, foreign minister
0.171829, Mayor 0.168488, head of state 0.167166, chief
0.164998, Ambassador 0.162118, Speaker 0.161698, General
0.159422, secretary 0.156158, chief executive 0.15158
State-of-the-art Methods
Closest words for ?
anthropology 0.275881, sociology 0.247909, comparative
literature 0.245912, computer science 0.220663, political
science 0.219948, zoology 0.210283, biochemistry 0.197723,
mechanical engineering 0.191549, biology 0.189167,
criminology 0.178423, social science 0.176762, psychology
0.171797, astronomy 0.16531, neuroscience 0.163764,
psychiatry 0.163098, geology 0.158567, archaeology 0.157911,
mathematics 0.157138
Beyond Pairwise Similarity
• Clustering is “the art of finding groups in
data” (Kaufman and Rousseeuw)
• Clustering algorithms divide a data set into
homogeneous groups (clusters), based on their
similarity under the given representation.
Hierarchical Clustering
Greedy, bottom-up version:
• Initialization: Create a separate cluster for each object
• Each iteration: Find the two most similar clusters and merge
them
• Termination: All the objects are in the same cluster
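The three steps above can be sketched as follows. This is a deliberately naive version that re-scans all cluster pairs at every iteration. The similarity values reuse the five-object example from these slides, and single-link (similarity of the two most similar members) is plugged in as one possible cluster similarity; the function names are chosen for this example.

```python
from itertools import combinations

# Symmetric pairwise similarities for five objects.
SIM = {
    ("A", "B"): 0.8, ("A", "C"): 0.2, ("A", "D"): 0.2, ("A", "E"): 0.1,
    ("B", "C"): 0.2, ("B", "D"): 0.1, ("B", "E"): 0.1,
    ("C", "D"): 0.7, ("C", "E"): 0.0,
    ("D", "E"): 0.6,
}

def sim(a, b):
    return SIM[(a, b)] if (a, b) in SIM else SIM[(b, a)]

def single_link(c1, c2):
    """Similarity of the two most similar members."""
    return max(sim(a, b) for a in c1 for b in c2)

def agglomerate(objects, linkage):
    """Greedy bottom-up clustering; returns the merges in the order made."""
    clusters = [frozenset([o]) for o in objects]          # initialization
    merges = []
    while len(clusters) > 1:                              # until one cluster
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: linkage(*pair))     # most similar pair
        merges.append((set(c1), set(c2), linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges

merges = agglomerate("ABCDE", single_link)
```

With these values the merges happen at similarities 0.8 (A with B), 0.7 (C with D), 0.6 (E with {C, D}), and finally 0.2 for the two remaining clusters.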
Agglomerative Clustering
     E     D     C     B
A   0.1   0.2   0.2   0.8
B   0.1   0.1   0.2
C   0.0   0.7
D   0.6
[Figure: dendrogram with leaves A, B, C, D, E]
Clustering Function
Similarity between cluster {C, D} and E in the matrix above, under three clustering functions:
• Single-link: 0.6
• Complete-link: 0.0
• Group-average: 0.3
Clustering Function
• Single-link: Similarity of two most similar members
• Complete-link: Similarity of two least similar members
• Group-average: Average similarity between members
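The three definitions can be written as interchangeable functions over two clusters. This sketch assumes the pairwise similarities of the five-object example from these slides. "Group-average" is read here as the average over pairs with one member in each cluster, which is an assumption (some formulations also include within-cluster pairs).

```python
# Symmetric pairwise similarities for five objects.
SIM = {
    ("A", "B"): 0.8, ("A", "C"): 0.2, ("A", "D"): 0.2, ("A", "E"): 0.1,
    ("B", "C"): 0.2, ("B", "D"): 0.1, ("B", "E"): 0.1,
    ("C", "D"): 0.7, ("C", "E"): 0.0,
    ("D", "E"): 0.6,
}

def sim(a, b):
    return SIM[(a, b)] if (a, b) in SIM else SIM[(b, a)]

def cross_similarities(c1, c2):
    """Similarities of all pairs with one member in each cluster."""
    return [sim(a, b) for a in c1 for b in c2]

def single_link(c1, c2):
    return max(cross_similarities(c1, c2))

def complete_link(c1, c2):
    return min(cross_similarities(c1, c2))

def group_average(c1, c2):
    values = cross_similarities(c1, c2)
    return sum(values) / len(values)

cd, e = {"C", "D"}, {"E"}
scores = (single_link(cd, e), complete_link(cd, e), group_average(cd, e))
```

For the clusters {C, D} and {E}, the three functions see the same two pairwise values (0.0 and 0.6) and return their maximum, minimum, and mean respectively.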
Single-Link Clustering
• Achieves Local Coherence
• Complexity $O(n^2)$
• Fails when clusters are not well separated
Complete-Link Clustering
• Achieves Global Coherence
• Complexity $O(n^2 \log n)$
• Fails when clusters aren’t spherical, or of uniform
size
K-Means Algorithm: Example
Iterative, hard, flat clustering algorithm based on
Euclidean distance
K-Means Algorithm
1. Choose k points at random as cluster centers
2. Assign each instance to its closest cluster center
3. Calculate the centroid (mean) for each cluster, use it as a
new cluster center
4. Iterate (2-3) until the cluster centers don’t change
anymore
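The four steps can be sketched in plain Python as below. The points, the fixed random seed, the iteration cap `max_iter`, and the choice to keep the old center when a cluster ends up empty are all assumptions made for this example.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = rng.sample(points, k)                    # 1. random initial centers
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                               # 2. assign to closest center
            nearest = min(range(k),
                          key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [mean(c) if c else centers[i]    # 3. recompute centroids
                       for i, c in enumerate(clusters)]
        if new_centers == centers:                     # 4. stop when unchanged
            break
        centers = new_centers
    return centers, clusters

# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = k_means(points, k=2)
```

On data this well separated, the procedure ends with one center at the mean of each group. In general the result depends on the random initialization, so several restarts are commonly used.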
K-Means Algorithm: Hard EM
1. Guess initial parameters
2. Use the model to make the best guess of $c_i$ (E-step)
3. Use the new complete data to learn a better model (M-step)
4. Iterate (2-3) until convergence
Evaluating Clustering Methods
• Perform task-based evaluation
• Test the resulting clusters intuitively, i.e., inspect
them and see if they make sense. Not advisable.
• Have an expert generate clusters manually, and test
the automatically generated ones against them.
• Test the clusters against a predefined classification if
there is one
Comparing Clustering Methods
(Meila, 2002)
$n$        total # of points
$n_k$      # of points in cluster $C_k$
$K$        # of nonempty clusters
$N_{11}$   # of pairs that are in the same cluster under both $C$ and $C'$
$N_{00}$   # of pairs that are in different clusters under both $C$ and $C'$
$N_{10}$   # of pairs that are in the same cluster under $C$ but not under $C'$
$N_{01}$   # of pairs that are in the same cluster under $C'$ but not under $C$
Comparing by Counting Pairs
• Wallace criteria
\[ W_1(C, C') = \frac{N_{11}}{\sum_{k} n_k (n_k - 1)/2} \]
\[ W_2(C, C') = \frac{N_{11}}{\sum_{k'} n'_{k'} (n'_{k'} - 1)/2} \]
• Fowlkes-Mallows criterion
\[ F(C, C') = \sqrt{W_1(C, C')\, W_2(C, C')} \]
Problems: ?
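A sketch of the pair-counting criteria, with each clustering given as a list of cluster labels (one label per point). The label lists and function names are invented for illustration. The code uses the fact that the number of pairs sharing a cluster under C, the sum over k of n_k (n_k - 1)/2, equals N11 + N10, and likewise N11 + N01 for C'.

```python
from itertools import combinations
from math import sqrt

def pair_counts(labels_a, labels_b):
    """Return (N11, N10, N01, N00) over all unordered pairs of points."""
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            n11 += 1
        elif same_a:
            n10 += 1
        elif same_b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00

def wallace_and_fowlkes_mallows(labels_a, labels_b):
    """Return (W1, W2, F); undefined if either clustering is all singletons."""
    n11, n10, n01, _ = pair_counts(labels_a, labels_b)
    w1 = n11 / (n11 + n10)   # pairs together under C that stay together under C'
    w2 = n11 / (n11 + n01)   # pairs together under C' that stay together under C
    return w1, w2, sqrt(w1 * w2)

# Hypothetical clusterings of six points.
clustering_a = [0, 0, 0, 1, 1, 1]
clustering_b = [0, 0, 1, 1, 2, 2]
counts = pair_counts(clustering_a, clustering_b)
w1, w2, f = wallace_and_fowlkes_mallows(clustering_a, clustering_b)
```

For these two clusterings the counts are N11 = 2, N10 = 4, N01 = 1, N00 = 8, giving W1 = 1/3 and W2 = 2/3.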
Comparing Clustering by Set Matching
Contingency table M is a $K \times K'$ matrix whose $(k, k')$
element $m_{kk'}$ is the number of points in the intersection of
clusters $C_k$ and $C'_{k'}$
\[ L(C, C') = \frac{1}{K} \sum_{k} \max_{k'} \frac{2\, m_{kk'}}{n_k + n'_{k'}} \]
Problems: ?
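A sketch of the set-matching score, again with clusterings given as label lists. The label lists and the function name are invented for illustration, and the contingency table is stored sparsely as a counter over (k, k') label pairs.

```python
from collections import Counter

def set_matching_score(labels_a, labels_b):
    """L(C, C') = (1/K) * sum_k max_k' 2 m_kk' / (n_k + n'_k')."""
    size_a = Counter(labels_a)                    # n_k
    size_b = Counter(labels_b)                    # n'_k'
    overlap = Counter(zip(labels_a, labels_b))    # contingency table M
    total = 0.0
    for k, n_k in size_a.items():
        total += max(2 * overlap[(k, kp)] / (n_k + n_kp)
                     for kp, n_kp in size_b.items())
    return total / len(size_a)

# Hypothetical clusterings of six points.
clustering_a = [0, 0, 0, 1, 1, 1]
clustering_b = [0, 0, 1, 1, 2, 2]

score_ab = set_matching_score(clustering_a, clustering_b)
score_ba = set_matching_score(clustering_b, clustering_a)
```

Each cluster of the first clustering is scored only against its best match in the second, so `score_ab` and `score_ba` generally differ: the measure is not symmetric, and points outside the best-matching clusters do not affect it.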
Comparing Clustering by Set Matching
\[ L(C, C') = \frac{1}{K} \sum_{k} \max_{k'} \frac{2\, m_{kk'}}{n_k + n'_{k'}} \]
[Figure: three clusterings C, C', and C'' of the same points, each with clusters C1, C2, C3]
Summary
• Lexicon-based Similarity Computation
– WordNet relations
– Path-based similarity
• Corpus-based Similarity Computation
– Vector Space Model
– Similarity Measures
– Hierarchical Clustering