Introduction to Information Retrieval
CS276: Information Retrieval and Web Search
Christopher Manning and Pandu Nayak
Lecture 15: Learning to Rank
Machine learning for IR ranking? [IIR Sec. 15.4]
We’ve looked at methods for ranking documents in IR:
Cosine similarity, inverse document frequency, proximity, pivoted document length normalization, PageRank, …
We’ve looked at methods for classifying documents using supervised machine learning classifiers:
Naïve Bayes, Rocchio, kNN, SVMs
Surely we can also use machine learning to rank the documents displayed in search results?
Machine learning for IR ranking
This “good idea” has been actively researched – and actively deployed by major web search engines – in the last 7 or so years
Why didn’t it happen earlier?
Modern supervised ML has been around for about 20 years…
Naïve Bayes has been around for about 50 years…
Machine learning for IR ranking
It’s true that the IR community wasn’t very well connected to the ML community
But there were a whole bunch of precursors:
Wong, S.K. et al. 1988. Linear structure in information retrieval. SIGIR 1988.
Fuhr, N. 1992. Probabilistic methods in information retrieval. Computer Journal.
Gey, F. C. 1994. Inferring probability of relevance using the method of logistic regression. SIGIR 1994.
Herbrich, R. et al. 2000. Large Margin Rank Boundaries for Ordinal Regression. Advances in Large Margin Classifiers.
Why weren’t early attempts very successful/influential?
Sometimes an idea just takes time to be appreciated…
Limited training data
Especially for real-world use (as opposed to writing academic papers), it was very hard to gather test collection queries and relevance judgments that are representative of real user needs and judgments on the documents returned
This has changed, both in academia and industry
Poor machine learning techniques
Insufficient customization to the IR problem
Not enough features for ML to show value
Why wasn’t ML much needed?
Traditional ranking functions in IR used a very small number of features, e.g.,
Term frequency
Inverse document frequency
Document length
It was easy to tune weighting coefficients by hand
And people did
Why is ML needed now?
Modern systems – especially on the Web – use a great number of features:
Arbitrary useful features – not a single unified model
Log frequency of query word in anchor text?
Query word in color on page?
# of images on page?
# of (out) links on page?
PageRank of page?
URL length?
URL contains “~”?
Page edit recency?
Page length?
The New York Times (2008-06-03) quoted Amit Singhal as saying Google was using over 200 such features
Simple example: Using classification for ad hoc IR [IIR Sec. 15.4.1]
Collect a training corpus of (q, d, r) triples
Relevance r is here binary (but it may be multiclass, with 3–7 values)
The document is represented by a feature vector x = (α, ω), where α is cosine similarity and ω is minimum query window size
ω is the shortest text span that includes all query words (a sketch of computing it follows below)
Query term proximity is a very important new weighting factor
Train a machine learning model to predict the class r of a document-query pair
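To make ω concrete, here is a minimal sketch (not from the lecture) of computing the minimum query window over a tokenized document; the function name and the tokenization scheme are assumptions for illustration.

```python
from collections import defaultdict

def min_query_window(doc_tokens, query_terms):
    """Return the smallest span (in tokens) of doc_tokens that contains every
    query term at least once, or None if some query term is missing."""
    query = set(query_terms)
    # positions of query-term occurrences, in document order
    hits = [(i, t) for i, t in enumerate(doc_tokens) if t in query]
    if {t for _, t in hits} != query:
        return None
    best = None
    counts = defaultdict(int)   # occurrences of each query term in the window
    covered, left = 0, 0
    for right in range(len(hits)):
        pos_r, term_r = hits[right]
        counts[term_r] += 1
        if counts[term_r] == 1:
            covered += 1
        # shrink from the left while all query terms remain covered
        while covered == len(query):
            pos_l, term_l = hits[left]
            width = pos_r - pos_l + 1
            best = width if best is None else min(best, width)
            counts[term_l] -= 1
            if counts[term_l] == 0:
                covered -= 1
            left += 1
    return best

# min_query_window("to be or not to be that is".split(), ["be", "not"]) -> 3
```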
Simple example: Using classification for ad hoc IR [IIR Sec. 15.4.1]
A linear score function is then
Score(d, q) = Score(α, ω) = aα + bω + c
And the linear classifier is
Decide relevant if Score(d, q) > θ
… just like when we were doing text classification
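As a toy illustration of this linear classifier (the weights and threshold below are made up for the example, not learned from any data):

```python
# hypothetical coefficients for Score(alpha, omega) = a*alpha + b*omega + c
a, b, c = 0.6, -0.1, 0.0       # smaller query windows should help, hence b < 0
theta = 0.25                   # decision threshold

def score(alpha, omega):
    """alpha: cosine similarity; omega: minimum query window size (tokens)."""
    return a * alpha + b * omega + c

def is_relevant(alpha, omega):
    return score(alpha, omega) > theta

print(is_relevant(alpha=0.8, omega=2))   # 0.48 - 0.2 = 0.28 > 0.25 -> True
```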
Simple example: Using classification for ad hoc IR [IIR Sec. 15.4.1]
[Figure: training examples plotted by term proximity (x-axis, 2 to 5) against cosine score (y-axis, 0 to 0.05), labeled R (relevant) and N (nonrelevant), with a linear decision surface separating the two classes]
More complex example of using classification for search ranking [Nallapati 2004]
We can generalize this to classifier functions over more features
We can use methods we have seen previously for learning the linear classifier weights
An SVM classifier for information retrieval [Nallapati 2004]
Let g(r|d,q) = w·f(d,q) + b
SVM training: want g(r|d,q) ≤ −1 for nonrelevant documents and g(r|d,q) ≥ 1 for relevant documents
SVM testing: decide relevant iff g(r|d,q) ≥ 0
Features are not word presence features (how would you deal with query words not in your training data?) but scores like the summed (log) tf of all query terms
Unbalanced data (which can result in trivial always-say-nonrelevant classifiers) is dealt with by undersampling nonrelevant documents during training (a sketch follows below)
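A rough sketch of this setup, using scikit-learn and synthetic data; the feature layout and undersampling details are illustrative, not the paper’s exact protocol:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# synthetic stand-in for (query, document) pairs: 6 tf/idf-style features each,
# with roughly 10% relevant (+1) and 90% nonrelevant (-1) -- heavily unbalanced
X = rng.normal(size=(1000, 6))
y = np.where(rng.random(1000) < 0.1, 1, -1)

# undersample nonrelevant pairs so the classifier is not trivially "say nonrelevant"
pos = np.flatnonzero(y == 1)
neg = rng.choice(np.flatnonzero(y == -1), size=len(pos), replace=False)
idx = np.concatenate([pos, neg])

clf = LinearSVC(C=1.0).fit(X[idx], y[idx])   # learns g(d, q) = w.f(d, q) + b

# at test time, rank documents for a query by the signed margin g(d, q)
scores = clf.decision_function(X)            # reusing X here just for illustration
ranking = np.argsort(-scores)
```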
An SVM classifier for information retrieval [Nallapati 2004]
Experiments:
4 TREC data sets
Comparisons with Lemur, a state-of-the-art open source IR engine (Language Model (LM)-based – see IIR ch. 12)
Linear kernel normally best or almost as good as quadratic kernel, and so used in reported results
6 features, all variants of tf, idf, and tf.idf scores
An SVM classifier for information retrieval [Nallapati 2004]
Train \ Test      Disk 3    Disk 4-5   WT10G (web)
Disk 3     LM     0.1785    0.2503     0.2666
           SVM    0.1728    0.2432     0.2750
Disk 4-5   LM     0.1773    0.2516     0.2656
           SVM    0.1646    0.2355     0.2675
At best the results are about equal to LM
Actually a little bit below
Paper’s advertisement: Easy to add more features
This is illustrated on a homepage finding task on WT10G:
Baseline LM 52% success@10, baseline SVM 58%
“Learning to rank” [IIR Sec. 15.4.2]
Classification probably isn’t the right way to think about approaching ad hoc IR:
Classification problems: Map to an unordered set of classes
Regression problems: Map to a real value
Ordinal regression problems: Map to an ordered set of classes
A fairly obscure sub-branch of statistics, but what we want here
This formulation gives extra power:
Relations between relevance levels are modeled
Documents are good versus other documents for the same query
“Learning to rank”
Assume a number of categories C of relevance exist
These are totally ordered: c1 < c2 < … < cJ
This is the ordinal regression setup
Assume training data is available consisting of document-query pairs represented as feature vectors ψi and relevance ranking ci
We could do point-wise learning, where we try to map items of a certain relevance rank to a subinterval (e.g., Crammer et al. 2002 PRank)
But most work does pair-wise learning, where the input is a pair of results for a query, and the class is the relevance ordering relationship between them
Point-wise learning
Goal is to learn a threshold to separate each rank
Pairwise learning: The Ranking SVM [Herbrich et al. 1999, 2000; Joachims et al. 2002] [IIR Sec. 15.4.2]
Aim is to classify instance pairs as correctly ranked or incorrectly ranked
This turns an ordinal regression problem back into a binary classification problem
We want a ranking function f such that
ci > ck iff f(ψi) > f(ψk)
… or at least one that tries to do this with minimal error
Suppose that f is a linear function
f(ψi) = w·ψi
The Ranking SVM [Herbrich et al. 1999, 2000; Joachims et al. 2002] [IIR Sec. 15.4.2]
[Figure: Ranking Model f(ψi)]
The Ranking SVM [Herbrich et al. 1999, 2000; Joachims et al. 2002] [IIR Sec. 15.4.2]
Then (combining the two equations on the last slide):
ci > ck iff w·(ψi − ψk) > 0
Let us then create a new instance space from such pairs:
Φu = Φ(di, dk, q) = ψi − ψk
zu = +1, 0, −1 as ci >, =, < ck
We can build the model over just the cases with zu = +1: ties (zu = 0) are dropped, and since ordering is antisymmetric the zu = −1 cases add nothing new
From training data S = {Φu}, we train an SVM (a sketch follows below)
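A minimal sketch of the pairwise construction and training, assuming scikit-learn as the SVM library and a single query’s documents (with multiple queries, only documents from the same query would be paired); the toy features and grades are made up:

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def pairwise_transform(psi, grades):
    """Build Phi_u = psi_i - psi_k for all document pairs with different
    relevance grades; alternate the orientation so the binary SVM sees both
    classes (by antisymmetry of the ordering, nothing is lost)."""
    Phi, z = [], []
    for u, (i, k) in enumerate(combinations(range(len(psi)), 2)):
        if grades[i] == grades[k]:
            continue                      # ties carry no ordering information
        diff = psi[i] - psi[k]
        label = 1 if grades[i] > grades[k] else -1
        if u % 2:                         # flip every other pair
            diff, label = -diff, -label
        Phi.append(diff)
        z.append(label)
    return np.array(Phi), np.array(z)

# toy example: 4 documents for one query, 3 features each, grades 2 > 1 > 0
psi = np.array([[0.9, 0.2, 0.1],
                [0.6, 0.1, 0.3],
                [0.4, 0.4, 0.2],
                [0.1, 0.0, 0.5]])
grades = np.array([2, 1, 1, 0])

Phi, z = pairwise_transform(psi, grades)
rank_svm = LinearSVC(fit_intercept=False, C=10.0).fit(Phi, z)  # f(psi) = w . psi
w = rank_svm.coef_.ravel()
ranking = np.argsort(-(psi @ w))          # order documents by learned score
```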
[Figure: two queries in the original space]
[Figure: two queries in the pairwise space]
The Ranking SVM [Herbrich et al. 1999, 2000; Joachims et al. 2002] [IIR Sec. 15.4.2]
The SVM learning task is then like other examples that we saw before
Find w and ξu ≥ 0 such that
½ wᵀw + C Σu ξu is minimized, and
for all Φu (with zu = +1), w·Φu ≥ 1 − ξu
We need only one sign of zu, as ordering is antisymmetric
You can again use SVMlight (or other good SVM libraries) to train your model (the SVMrank specialization)
Aside: The SVM loss function
The minimization
minw ½ wᵀw + C Σu ξu
subject to w·Φu ≥ 1 − ξu (and ξu ≥ 0) for all Φu
can be rewritten as
minw (1/2C) wᵀw + Σu ξu
subject to ξu ≥ 1 − w·Φu (and ξu ≥ 0) for all Φu
Now, taking λ = 1/2C, we can reformulate this as
minw Σu [1 − w·Φu]+ + λ wᵀw
where [ ]+ is the positive part (0 if the term is negative)
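For concreteness, a tiny sketch of evaluating this unconstrained objective (all pairs oriented so the correct label is +1; the names here are mine):

```python
import numpy as np

def ranking_svm_objective(w, Phi, lam):
    """sum_u [1 - w . Phi_u]_+  +  lam * w . w
    (equivalent to the constrained SVM above with lam = 1 / (2C))."""
    margins = Phi @ w
    hinge = np.maximum(0.0, 1.0 - margins).sum()
    return hinge + lam * (w @ w)
```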
Aside: The SVM loss function
The reformulation
minw Σu [1 − w·Φu]+ + λ wᵀw
shows that an SVM can be thought of as having an empirical “hinge” loss (the first term) combined with a weight regularizer on ‖w‖ (the second term)
[Figure: hinge loss as a function of w·Φu: zero for w·Φu ≥ 1, rising linearly as w·Φu falls below 1]
Adapting the Ranking SVM for (successful) Information Retrieval [Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, Hsiao-Wuen Hon, SIGIR 2006]
A Ranking SVM model already works well
Using things like vector space model scores as features
As we shall see, it outperforms them in evaluations
But it does not model important aspects of practical IR well
This paper addresses two customizations of the Ranking SVM to fit an IR utility model
The Ranking SVM fails to model the IR problem well …
1. Correctly ordering the most relevant documents is crucial to the success of an IR system, while misordering less relevant results matters little
The Ranking SVM considers all ordering violations as the same
2. Some queries have many (somewhat) relevant documents, and other queries few. If we treat all pairs of results for a query equally, queries with many results will dominate the learning
But actually queries with few relevant results are at least as important to do well on
Based on the LETOR test collection
From Microsoft Research Asia
An openly available standard test collection with pregenerated features, baselines, and research results for learning to rank
Its availability has really driven research in this area
OHSUMED, MEDLINE subcollection for IR
350,000 articles
106 queries
16,140 query-document pairs
3 class judgments: Definitely Relevant (DR), Partially Relevant (PR), Non-Relevant (NR)
TREC GOV collection (predecessor of GOV2, cf. IIR p. 142)
1 million web pages
125 queries
[Figure: principal components projection of 2 queries; solid = q12, open = q50; circle = DR, square = PR, triangle = NR]
[Figure: discrepancy (r3 = Definitely Relevant, r2 = Partially Relevant, r1 = Nonrelevant)]
[Figure: number of training documents per query discrepancy; solid = q12, open = q50]
Recap: Two Problems with Direct Application of the Ranking SVM
Cost sensitivity: negative effects of making errors on top-ranked documents
d: definitely relevant, p: partially relevant, n: not relevant
ranking 1: p d p n n n n
ranking 2: d p n p n n n
Query normalization: the number of instance pairs varies with the query (a counting helper follows below)
q1: d p p n n n n
q2: d d p p p n n n n n
q1 pairs: 2*(d, p) + 4*(d, n) + 8*(p, n) = 14
q2 pairs: 6*(d, p) + 10*(d, n) + 15*(p, n) = 31
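The pair counts above can be checked with a short helper (illustrative only):

```python
from collections import Counter
from itertools import combinations

def num_preference_pairs(grades):
    """Count document pairs with different relevance grades for one query."""
    counts = Counter(grades)
    return sum(counts[a] * counts[b] for a, b in combinations(counts, 2))

print(num_preference_pairs("dppnnnn"))     # 2 + 4 + 8  = 14
print(num_preference_pairs("ddpppnnnnn"))  # 6 + 10 + 15 = 31
```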
These problems are solved with a new loss function (sketched below)
τ weights for the type of rank difference
Estimated empirically from the effect on NDCG
μ weights for the size of the ranked result set
Linearly scaled versus the biggest result set
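The exact form of the loss is not reproduced on the slide; the following is a hedged sketch of how such τ (pair-type) and μ (per-query) weights could enter a pairwise hinge loss. All names and the precise weighting scheme here are assumptions rather than the paper’s definition:

```python
import numpy as np

def weighted_pairwise_hinge(w, Phi, pair_type, qid, tau, mu, lam):
    """Cost-sensitive, query-normalized hinge loss (sketch, not Cao et al.'s
    exact formulation). Each pair u is weighted by tau[pair_type[u]] (how much
    misordering that grade combination hurts, e.g. estimated from NDCG impact)
    times mu[qid[u]] (down-weighting queries that contribute many pairs).
    All pairs are assumed oriented so the correct label is +1."""
    margins = Phi @ w                       # w . Phi_u for every pair
    weights = tau[pair_type] * mu[qid]      # per-pair importance weights
    return (weights * np.maximum(0.0, 1.0 - margins)).sum() + lam * (w @ w)
```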
Experiments
OHSUMED (from LETOR)
Features:
6 that represent versions of tf, idf, and tf.idf factors
BM25 score (IIR sec. 11.4.3)
A scoring function derived from a probabilistic approach to IR, which has traditionally done well in TREC evaluations, etc. (a sketch of one common variant follows below)
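For reference, a sketch of one common BM25 variant (IIR sec. 11.4.3 gives the precise formulation used in the book; k1 and b below are typical default values, and the smoothed idf is one of several options):

```python
import math

def bm25(query_terms, doc_tf, doc_len, avg_doc_len, df, N, k1=1.2, b=0.75):
    """doc_tf: term -> frequency in this document; df: term -> document
    frequency in the collection; N: number of documents in the collection."""
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0 or t not in df:
            continue
        idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))  # smoothed idf
        norm = k1 * (1 - b + b * doc_len / avg_doc_len)        # length normalization
        score += idf * tf * (k1 + 1) / (tf + norm)
    return score

# bm25(["learning", "rank"], {"learning": 3, "rank": 1}, 120, 150.0,
#      {"learning": 2000, "rank": 500}, 100000)
```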
Experimental Results (OHSUMED)
MSN Search [now Bing]
Second experiment with MSN search
Collection of 2198 queries
6 relevance levels rated:
Definitive: 8990
Excellent: 4403
Good: 3735
Fair: 20463
Bad: 36375
Detrimental: 310
Experimental Results (MSN search)
Alternative: Optimizing Rank-Based Measures [Yue et al. SIGIR 2007]
If we think that NDCG is a good approximation of the user’s utility function from a result ranking
Then, let’s directly optimize this measure
As opposed to some proxy (weighted pairwise prefs)
But, there are problems …
Objective function no longer decomposes
Pairwise prefs decomposed into each pair
Objective function is flat or discontinuous
Discontinuity Example
NDCG is computed using rank positions
Ranking is via retrieval scores
Slight changes to model parameters → slight changes to retrieval scores → no change to ranking → no change to NDCG
                  d1     d2     d3
Retrieval score   0.9    0.6    0.3
Rank              1      2      3
Relevance         0      1      0
NDCG = 0.63; NDCG is discontinuous w.r.t. model parameters! (see the sketch below)
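The flatness can be checked directly; a small sketch using the same three documents (the NDCG variant with gain = rel and a 1/log2(rank+1) discount reproduces the 0.63 on the slide):

```python
import math

def ndcg(rels_in_ranked_order):
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rels_in_ranked_order))
    ideal = sorted(rels_in_ranked_order, reverse=True)
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

relevance = {"d1": 0, "d2": 1, "d3": 0}

for scores in ({"d1": 0.9, "d2": 0.6, "d3": 0.3},      # original retrieval scores
               {"d1": 0.88, "d2": 0.61, "d3": 0.31}):  # slightly perturbed scores
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(ranking, round(ndcg([relevance[d] for d in ranking]), 2))
# both lines print NDCG = 0.63: small score changes leave the ranking, and hence
# NDCG, unchanged, so the objective is flat (then jumps when documents swap)
```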
Structural SVMs [Tsochantaridis et al., 2007]
Structural SVMs are a generalization of SVMs where the output classification space is not binary or one of a set of classes, but some complex object (such as a sequence or a parse tree)
Here, it is a complete (weak) ranking of documents for a query
The Structural SVM attempts to predict the complete ranking for the input query and document set
The true labeling is a ranking where the relevant documents are all ranked in the front, e.g.,
An incorrect labeling would be any other ranking, e.g.,
There are an intractable number of rankings, thus an intractable number of constraints!
Structural SVM training [Tsochantaridis et al., 2007]
Structural SVM training proceeds incrementally by starting with a working set of constraints, and adding in the most violated constraint at each iteration
Original SVM problem:
Exponential constraints
Most are dominated by a small set of “important” constraints
Structural SVM approach:
Repeatedly finds the next most violated constraint…
…until a set of constraints which is a good approximation is found
Other machine learning methods for learning to rank
Of course!
I’ve only presented the use of SVMs for machine learned relevance, but other machine learning methods have also been used successfully
Boosting: RankBoost
Ordinal regression loglinear models
Neural Nets: RankNet
(Gradient-boosted) Decision Trees
The Limitation of Machine Learning
Everything that we have looked at (and most work in this area) produces linear models of features by weighting different base features
This contrasts with most of the clever ideas of traditional IR, which are nonlinear scalings and combinations of basic measurements
log term frequency, idf, pivoted length normalization
At present, ML is good at weighting features, but not at coming up with nonlinear scalings
Designing the basic features that give good signals for ranking remains the domain of human creativity
Summary
The idea of learning ranking functions has been around for about 20 years
But only recently have ML knowledge, availability of training datasets, a rich space of features, and massive computation come together to make this a hot research area
It’s too early to give a definitive statement on what methods are best in this area … it’s still advancing rapidly
But machine learned ranking over many features now easily beats traditional hand-designed ranking functions in comparative evaluations [in part by using the hand-designed functions as features!]
And there is every reason to think that the use of machine learning in IR ranking will continue to grow
Resources
IIR secs 6.1.2–3 and 15.4
LETOR benchmark datasets
Website with data, links to papers, benchmarks, etc.
http://research.microsoft.com/users/LETOR/
Everything you need to start research in this area!
Nallapati, R. Discriminative models for information retrieval. SIGIR 2004.
Cao, Y., Xu, J., Liu, T.-Y., Li, H., Huang, Y. and Hon, H.-W. Adapting Ranking SVM to Document Retrieval. SIGIR 2006.
Yue, Y., Finley, T., Radlinski, F. and Joachims, T. A Support Vector Method for Optimizing Average Precision. SIGIR 2007.