Ranking Problems: 9.520 Class 09, 08 March 2006 Giorgos Zacharia
Types of information available
• Preference modeling:
– Metric based:
• User rated configuration x_i with y_i = U(x_i)
– Choice based:
• Given choices x_1, x_2, …, x_d, the user chose x_f
– Prior information about the features:
• Cheaper is better
• Faster is better
• etc
Types of information available
• Information Retrieval:
– Metric based:
• Users clicked on link x_i with a frequency y_i = U(x_i)
– Choice based:
• Given choices x_1, x_2, …, x_d, the user clicked on x_f
– Prior information about the features:
• Keyword matches (the more the better)
• Unsupervised similarity scores (TF-IDF)
• etc
Types of information available
• Information Extraction:
– Choice based:
• Given tagging choices x_1, x_2, …, x_d, the hand labeling chose x_f
– Prior information about the features:
• Unsupervised scores
• Multiclass:
– Choice based:
• Given, for each vector, the confidence scores c_1, c_2, …, c_d for class labels 1, 2, …, d, the correct label was y_f. The confidence scores may come from a set of weak classifiers and/or OVA (one-vs-all) comparisons.
– Prior information about the features:
• The higher the confidence score the more likely to represent the
correct label.
(Semi-)Unsupervised Ranking Problems
$$\tau = \frac{P - Q}{P + Q} = 1 - \frac{2Q}{\binom{n}{2}} = \frac{2P}{\binom{n}{2}} - 1$$
• P is the number of concordant pairs
• Q is the number of discordant pairs
Example
Person          A  B  C  D  E  F  G  H
Rank by Height  1  2  3  4  5  6  7  8
Rank by Weight  3  4  1  2  5  7  8  6
• P = 5 + 4 + 5 + 4 + 3 + 1 + 0 + 0 = 22

$$\tau = \frac{2P}{\binom{n}{2}} - 1 = \frac{44}{28} - 1 \approx 0.57$$
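To make the arithmetic concrete, here is a minimal Python sketch (variable names are ours, not from the slides) that counts the concordant pairs P for the table above and recovers τ ≈ 0.57:

```python
from itertools import combinations

height_rank = [1, 2, 3, 4, 5, 6, 7, 8]  # persons A..H, ordered by height
weight_rank = [3, 4, 1, 2, 5, 7, 8, 6]  # their ranks by weight

n = len(height_rank)
pairs = list(combinations(range(n), 2))  # all C(n,2) = 28 pairs

# A pair (i, j) is concordant when both rankings order it the same way.
P = sum((height_rank[i] - height_rank[j]) * (weight_rank[i] - weight_rank[j]) > 0
        for i, j in pairs)

tau = 2 * P / len(pairs) - 1
print(P, round(tau, 2))  # 22 0.57
```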
Minimizing discordant pairs
Maximize Kendall's τ:
$$\tau = 1 - \frac{2Q}{\binom{n}{2}}$$
Equivalent to satisfying all constraints:
$$\forall\, r(x_i) \geq r(x_j): \quad w\Phi(x_i) \geq w\Phi(x_j)$$
Familiar problem
With slack variables the constraints become
$$w\Phi(x_i) \geq w\Phi(x_j) + 1 - \xi_{ij}, \qquad \xi_{ij} \geq 0$$
Rearranging:
$$w\left( \Phi(x_i) - \Phi(x_j) \right) \geq 1 - \xi_{ij}$$
Equivalent to classification of pairwise difference vectors.
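A minimal sketch of this reduction (the function name and setup are illustrative, not from the slides): emit one +1 example per preferred-over pair plus the mirrored −1 example, then train any binary classifier on the result:

```python
import numpy as np

def pairwise_difference_dataset(X, ranks):
    """X: (n, d) feature matrix; ranks: higher rank = preferred."""
    diffs, labels = [], []
    n = len(ranks)
    for i in range(n):
        for j in range(n):
            if ranks[i] > ranks[j]:        # x_i preferred over x_j
                diffs.append(X[i] - X[j])  # want w.(x_i - x_j) >= 1 - xi_ij
                labels.append(+1)
                diffs.append(X[j] - X[i])  # mirrored pair, opposite label
                labels.append(-1)
    return np.array(diffs), np.array(labels)
```

Training a standard soft-margin linear SVM on (diffs, labels) recovers exactly the slack constraints above.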
Regularized Ranking
$$\min_{f \in \mathcal{H}_K} \sum_{i,j=1}^{l} V\left( y_i - y_j,\, f(x_i - x_j) \right) + \gamma \left\| f \right\|_K^2$$
Notes:
V(·) can be any relevant loss function.
We could use any binary classifier: RLSC, SVM, boosted trees, etc.
The framework of classifying difference vectors is general enough to apply to both metric-based and choice-based problems.
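As one concrete instantiation (square loss and a linear kernel are our choices for illustration; γ is an arbitrary value), RLSC on the difference vectors has a closed-form solution:

```python
import numpy as np

def rlsc_rank(X, y, gamma=1e-2):
    """Regularized least-squares ranking on pairwise differences.
    X: (n, d) features; y: (n,) metric scores."""
    n = len(y)
    idx = [(i, j) for i in range(n) for j in range(n) if i != j]
    D = np.array([X[i] - X[j] for i, j in idx])        # difference vectors
    t = np.array([y[i] - y[j] for i, j in idx])        # difference targets
    K = D @ D.T                                        # linear kernel
    m = len(t)
    c = np.linalg.solve(K + gamma * m * np.eye(m), t)  # (K + gamma*m*I) c = t
    return D, c                                        # f(x) = sum_k c_k <D_k, x>
```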
Bound on Mean Average Precision
If p_i is the rank position of the i-th of n relevant results and Q the number of discordant pairs, then
$$\sum_{i=1}^{n} p_i = Q + \frac{n(n+1)}{2}$$
which yields the lower bound
$$\text{AvgPrec} \geq \frac{1}{n} \left( \sum_{i=1}^{n} \sqrt{i} \right)^{2} \left( Q + \frac{n(n+1)}{2} \right)^{-1}$$
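A small numerical illustration of the bound (assuming the reconstruction above; the helper name is ours):

```python
import math

def avgprec_lower_bound(Q, n):
    """Lower bound on average precision given Q discordant pairs
    and n relevant documents."""
    s = sum(math.sqrt(i) for i in range(1, n + 1))
    return (s ** 2 / n) / (Q + n * (n + 1) / 2)

# Even a perfect ranking (Q = 0) gives ~0.94 rather than 1.0: the
# Cauchy-Schwarz step behind the bound is not tight at the optimum.
print(round(avgprec_lower_bound(Q=0, n=5), 2))  # 0.94
```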
Prior Information
• Ranking problems come with a lot of prior
knowledge
– Positivity constraints
• For a pairwise comparison where all attributes are equal except one, the instance with the higher (lower) value of that attribute is preferred.
– If A is better than B, then B is worse than A
Prior information
Positivity constraints:
$$w_f \geq 1 - \xi_f, \quad \forall f = 1, \ldots, m$$
Symmetric comparisons: if A is better than B, then B is worse than A.
Learning distance metrics

Schultz&Joachims learn a weighted distance metric from relative comparisons:
$$d_{A,W}(x, y)^2 = \sum_{i=1}^{l} W_{ii} \left( K(x, x_i) - K(y, x_i) \right)^{2}$$
This leads to:
$$\min \frac{1}{2} \left\| A W A^{T} \right\|^{2} + C \sum_{i,j,k} \xi_{ijk}$$
or, equivalently:
$$\min \frac{1}{2} w^{T} L w + C \sum_{i,j,k} \xi_{ijk}$$
with $A = \Phi$ and $L = (A^{T}A) \circ (A^{T}A)$ (elementwise product), so that $\left\| A W A^{T} \right\|^{2} = w^{T} L w$.
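A quick numerical check of that identity (our own verification, on random data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))  # stands in for A = Phi
w = rng.normal(size=3)       # diagonal of W

lhs = np.linalg.norm(A @ np.diag(w) @ A.T, "fro") ** 2  # ||A W A^T||_F^2
L = (A.T @ A) * (A.T @ A)    # elementwise square of A^T A
rhs = w @ L @ w              # w^T L w
print(np.isclose(lhs, rhs))  # True
```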
Experiments (Schultz&Joachims)
Note: Schultz&Joachims report that they got the best results with a linear kernel, where A=I. They do not regularize the complexity of their weighted distance metric (recall Regularized Manifolds from the previous class).
Learning from seemingly-unrelated comparisons
(Evgeniou&Pontil; Chapelle&Harchaoui)
Given l comparisons from the same user
and u comparisons from seemingly-unrelated users:
$$\min_{f \in \mathcal{H}_K} \sum_{i=1}^{l} V\left( y_i - f(x_i) \right) + \mu \sum_{i=l+1}^{l+u} V\left( y_i - f(x_i) \right) + \gamma \left\| f \right\|_K^2, \qquad 0 \leq \mu \leq 1$$
where $y_i = y_j - y_k$ and $x_i = x_j - x_k$, $\forall j \neq k$.
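A minimal numpy sketch of this weighted objective with the square loss, in the spirit of the RLSC experiments below (the function name, linear kernel, and default parameter values are our assumptions):

```python
import numpy as np

def rlsc_with_unrelated(X_own, y_own, X_unrel, y_unrel, mu=0.2, gamma=1e-2):
    """X_*: difference vectors; y_*: difference targets."""
    X = np.vstack([X_own, X_unrel])
    y = np.concatenate([y_own, y_unrel])
    l, u = len(y_own), len(y_unrel)
    s = np.concatenate([np.ones(l), mu * np.ones(u)])  # per-example loss weight
    K = X @ X.T                                        # linear kernel
    S = np.diag(s)
    # weighted RLSC normal equations: (S K + gamma*(l+u) I) c = S y
    c = np.linalg.solve(S @ K + gamma * (l + u) * np.eye(l + u), s * y)
    return X, c                                        # f(x) = sum_i c_i <X_i, x>
```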
Results of RLSC experiments with l=10 comparisons per user, u instances of seemingly-unrelated comparisons, and weight μ on the loss contributed by the seemingly-unrelated data (table entries appear to be error rates).
           u=10      u=20      u=30      u=50      u=100
μ=0        18.141%   18.090%   18.380%   18.040%   18.430%
μ=0.000001 18.268%   18.117%   17.847%   18.152%   18.009%
μ=0.00001  17.897%   18.123%   18.217%   18.182%   18.164%
μ=0.0001   17.999%   18.135%   18.067%   18.089%   18.036%
μ=0.001    18.182%   17.835%   18.092%   18.140%   18.135%
μ=0.01     17.986%   17.905%   18.043%   18.023%   18.174%
μ=0.1      17.132%   16.508%   16.225%   15.636%   15.242%
μ=0.2      16.133%   15.520%   15.157%   15.323%   15.276%
μ=0.3      15.998%   15.602%   15.918%   16.304%   17.055%
μ=0.4      16.581%   16.786%   17.162%   17.812%   19.494%
μ=0.5      17.455%   17.810%   18.676%   19.838%   22.090%
μ=0.6      18.748%   19.589%   20.440%   22.355%   25.258%
Ranking learning with seemingly-unrelated data
• More seemingly-unrelated comparisons in the
training set improve results
• There is no measure of similarity of the
seemingly-unrelated data (recall
Schultz&Joachims)
Regularized Manifolds
$$f^{*} = \underset{f \in \mathcal{H}_K}{\operatorname{argmin}}\ \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \left\| f \right\|_K^2 + \frac{\gamma_I}{(u+l)^2} \sum_{i,j=1}^{u+l} \left( f(x_i) - f(x_j) \right)^{2} W_{ij}$$
$$= \underset{f \in \mathcal{H}_K}{\operatorname{argmin}}\ \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \left\| f \right\|_K^2 + \frac{\gamma_I}{(u+l)^2} f^{T} L f$$
with graph Laplacian $L = D - W$.
Laplacian RLSC:
$$\min_{f \in \mathcal{H}_K} \frac{1}{l} \sum_{i=1}^{l} \left( y_i - f(x_i) \right)^{2} + \gamma_A \left\| f \right\|_K^2 + \frac{\gamma_I}{(u+l)^2} f^{T} L f$$
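A hedged numpy sketch of solving this objective for the coefficients α in the expansion f(x) = Σ_i α_i K(x, x_i) over all l+u points; reusing the Gram matrix K as the graph adjacency W, and the values of γ_A and γ_I, are our assumptions:

```python
import numpy as np

def laplacian_rlsc(K, y, l, gamma_A=1e-2, gamma_I=1e-2):
    """K: (l+u, l+u) Gram matrix over labeled + unlabeled points;
    y: length l+u targets, zeros in the unlabeled slots."""
    n = K.shape[0]                            # n = l + u
    W = K.copy()                              # adjacency weights (assumption: reuse K)
    L = np.diag(W.sum(axis=1)) - W            # graph Laplacian L = D - W
    J = np.diag([1.0] * l + [0.0] * (n - l))  # selects the labeled points
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n ** 2) * (L @ K)
    alpha = np.linalg.solve(A, J @ y)
    return alpha                              # f(x) = sum_i alpha_i K(x, x_i)
```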
Laplacian RLSC for ranking with seemingly-unrelated data
$$\min_{f \in \mathcal{H}_K} \frac{1}{l} \sum_{i=1}^{l} \left( y_i - f(x_i) \right)^{2} + \frac{\mu}{u} \sum_{i=l+1}^{l+u} \left( y_i - f(x_i) \right)^{2} + \gamma_A \left\| f \right\|_K^2 + \frac{\gamma_I}{(u+l)^2} f^{T} L f$$
$$= \min_{f \in \mathcal{H}_{K_\mu}} \frac{1}{l} \sum_{i=1}^{l+u} \left( y_i - f(x_i) \right)^{2} + \gamma_A \left\| f \right\|_{K_\mu}^2 + \frac{\gamma_I}{(u+l)^2} f^{T} L f$$
where the weight μ is absorbed into a modified kernel K_μ.
Laplacian RLSC for ranking with seemingly-unrelated data
$$f^{*}(x) = \sum_{i=1}^{l+u} \alpha_i^{*} K_\mu(x, x_i)$$