Information Retrieval: Practice Exercises

The document gives answers to five practice exercises on information retrieval. 1) It computes the relevance of the chapter's practice exercises to the query "SQL relation" from term frequencies. 2) It describes an efficient algorithm, based on reference counts, that finds the documents containing at least k of a given set of n keywords. 3) No answer is given for implementing PageRank when the T matrix does not fit in memory. 4) It suggests indexing a document that contains a specific word under the more general concepts as well, so that queries on those concepts retrieve it. 5) It explains how upper bounds on scores let the merging of inverted lists stop early when only the top K answers are needed.

Uploaded by

NUBG Gamer


CHAPTER 21
Information Retrieval

Practice Exercises
21.1 Compute the relevance (using appropriate definitions of term frequency and inverse document frequency) of each of the Practice Exercises in this chapter to the query “SQL relation”.
Answer: We omit the questions that contain neither keyword, since their relevance to the query is zero. The word count of each question includes stop words. We use the equations given in Section 21.2 to compute relevance; the logarithm in the term-frequency equation is taken to base 2.

Q#  #words  #“SQL”  #“relation”  “SQL” TF  “relation” TF  “SQL” relv.  “relation” relv.  Total relv.
1   84      1       1            0.0170    0.0170         0.0002       0.0002            0.0004
4   22      0       1            0.0000    0.0641         0.0000       0.0029            0.0029
5   46      1       1            0.0310    0.0310         0.0006       0.0006            0.0013
6   22      1       0            0.0641    0.0000         0.0029       0.0000            0.0029
7   33      1       1            0.0430    0.0430         0.0013       0.0013            0.0026
8   32      1       3            0.0443    0.1292         0.0013       0.0040            0.0054
9   77      0       1            0.0000    0.0186         0.0000       0.0002            0.0002
14  30      1       0            0.0473    0.0000         0.0015       0.0000            0.0015
15  26      1       1            0.0544    0.0544         0.0020       0.0020            0.0041
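As a check on the term-frequency columns, here is a minimal Python sketch assuming the Section 21.2 definition TF(d, t) = log(1 + n(d, t)/n(d)), where n(d, t) is the number of occurrences of term t in question d and n(d) is the question's word count, with the log taken to base 2 as stated in the answer. The function name `term_frequency` is hypothetical. Its values agree with the table's term-frequency columns to within 0.0001; the table appears to truncate rather than round to four decimal places.

```python
import math


def term_frequency(term_count: int, num_words: int) -> float:
    """TF(d, t) = log2(1 + n(d, t) / n(d)); n(d) counts all words in the
    question, stop words included (assumed definition, base-2 log)."""
    return math.log2(1 + term_count / num_words)


# (question, #words, #"SQL", #"relation") for a few rows of the table above
for q, words, sql, relation in [(1, 84, 1, 1), (4, 22, 0, 1), (8, 32, 1, 3)]:
    print(q, f"{term_frequency(sql, words):.4f}",
          f"{term_frequency(relation, words):.4f}")
```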

21.2 Suppose you want to find documents that contain at least k of a given set of n keywords. Suppose also you have a keyword index that gives you a (sorted) list of identifiers of documents that contain a specified keyword. Give an efficient algorithm to find the desired set of documents.
Answer: Let S be the set of n keywords. An algorithm that finds all documents containing at least k of these keywords is given below.


This algorithm calculates a reference count for each document identifier. While the keywords are being processed, a reference count of i for a document identifier d means that at least i of the keywords in S occur in the document identified by d; once all keywords have been processed, it is exactly the number of keywords of S that occur in d. The algorithm maintains a list L of records, each with two fields: a document identifier and the reference count for that identifier. L is kept sorted on the document identifier field.

initialize the list L to the empty list;

for (each keyword c in S) do
begin
    D := the list of document identifiers corresponding to c;
    for (each document identifier d in D) do
        if (a record R with document identifier d is on list L) then
            R.reference_count := R.reference_count + 1;
        else begin
            make a new record R;
            R.document_id := d;
            R.reference_count := 1;
            insert R into L, keeping L sorted on document_id;
        end;
end;
for (each record R in L) do
    if (R.reference_count >= k) then
        output R;

Note that executing the inner for loop (over the identifiers in D) effectively merges the list D into the list L. Since both L and D are sorted, this merge takes time proportional to the sum of the lengths of the two lists. L never holds more records than the total number of document identifiers over all the keyword lists, so each of the n merges takes time at most proportional to that total. Thus the algorithm runs in time at most proportional to n times the total number of document identifiers corresponding to the keywords in S.
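The pseudocode can be made runnable as follows. This is a minimal Python sketch assuming each keyword's list is a sorted, duplicate-free list of integer document identifiers; the function name `docs_with_at_least_k` is hypothetical. The list L is represented as a sorted list of (document id, reference count) pairs, and each keyword's list D is merged into it in one linear pass, which is where the time bound in the answer comes from.

```python
from typing import List, Sequence, Tuple


def docs_with_at_least_k(posting_lists: Sequence[Sequence[int]], k: int) -> List[int]:
    """Return, in sorted order, the ids of documents that occur in at least k
    of the given posting lists (one sorted, duplicate-free list per keyword)."""
    # L: (document id, reference count) records, kept sorted on document id.
    merged: List[Tuple[int, int]] = []
    for postings in posting_lists:                # one keyword c in S at a time
        result: List[Tuple[int, int]] = []
        i = j = 0
        # Linear merge of L with D, bumping the count of ids found in both.
        while i < len(merged) and j < len(postings):
            doc, count = merged[i]
            d = postings[j]
            if doc < d:
                result.append((doc, count))
                i += 1
            elif d < doc:
                result.append((d, 1))             # new record with count 1
                j += 1
            else:
                result.append((doc, count + 1))   # d already in L
                i += 1
                j += 1
        result.extend(merged[i:])
        result.extend((d, 1) for d in postings[j:])
        merged = result
    return [doc for doc, count in merged if count >= k]


print(docs_with_at_least_k([[1, 3, 5, 7], [3, 4, 5], [5, 7, 9]], k=2))  # [3, 5, 7]
```

Building a new merged list per keyword, instead of inserting into L in place, keeps each step a single sequential pass over both sorted lists.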
21.3 Suggest how to implement the iterative technique for computing PageRank given that the T matrix (even in adjacency list representation) does not fit in memory.
Answer: No answer
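One common way to handle this, sketched here under stated assumptions rather than as the book's answer, is to keep only the rank vectors in memory and stream the link structure from disk once per iteration. The sketch assumes the update P[j] = δ/N + (1 - δ) Σ_i T[i, j] P[i], with T[i, j] = 1/N_i for each link from page i to page j (N_i being the number of out-links of i); it also assumes the old and new rank vectors of N floats do fit in memory, and it uses a hypothetical edge file with one "src dst" pair of integer page ids per line. The function name `pagerank_streaming`, the default δ = 0.15 and the fixed iteration count are illustrative choices.

```python
import os
import tempfile
from typing import List


def pagerank_streaming(edge_path: str, num_nodes: int,
                       delta: float = 0.15, iterations: int = 50) -> List[float]:
    """Iterative PageRank with the link structure kept on disk.

    Only the out-degree array and the old and new rank vectors (each of
    length num_nodes) are held in memory; edges are re-read every iteration.
    """
    # One streaming pass to count out-links, so that T[i, j] = 1 / out_degree[i].
    out_degree = [0] * num_nodes
    with open(edge_path) as f:
        for line in f:
            src, _dst = map(int, line.split())
            out_degree[src] += 1

    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Random-jump term delta / N for every page.
        new_rank = [delta / num_nodes] * num_nodes
        # Stream the edges: each link i -> j adds (1 - delta) * P[i] / N_i to P[j].
        with open(edge_path) as f:
            for line in f:
                src, dst = map(int, line.split())
                new_rank[dst] += (1 - delta) * rank[src] / out_degree[src]
        rank = new_rank
    # Pages without out-links get no special treatment in this sketch.
    return rank


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("0 1\n0 2\n1 2\n2 0\n")
        path = tmp.name
    try:
        print(pagerank_streaming(path, 3))
    finally:
        os.remove(path)
```

Each iteration is one sequential scan of the edge file, the access pattern disks handle best; if even the rank vectors were too large for memory, they would have to be partitioned as well, which this sketch does not attempt.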
21.4 Suggest how a document containing a word (such as “leopard”) can be indexed such that it is efficiently retrieved by queries using a more general concept (such as “carnivore” or “mammal”). You can assume that the concept hierarchy is not very deep, so each concept has only a few generalizations (a concept can, however, have a large number of specializations). You can also assume that you are provided with a function that returns the concept for each word in a document. Also suggest how a query using a specialized concept can retrieve documents using a more general concept.
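Answer: Besides adding the document to the index list of the concept for each word it contains, also add it to the index lists of all the more general concepts of that concept. Because the concept hierarchy is shallow, each concept has only a few generalizations, so this adds only a few index entries per word, and a query on a general concept such as “carnivore” retrieves the document directly from that concept's index list.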
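A minimal Python sketch of the indexing side of this answer, assuming the concept hierarchy is given as a hypothetical `parents` mapping from each concept to its (few) direct generalizations, and using a placeholder `concept_of` in place of the word-to-concept function the exercise says is provided. Each document is posted under the concept of every word it contains and, walking up the hierarchy, under all of that concept's generalizations.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


def build_concept_index(docs: Dict[int, Iterable[str]],
                        concept_of: Callable[[str], str],
                        parents: Dict[str, List[str]]) -> Dict[str, Set[int]]:
    """Map each concept to the ids of documents mentioning it or any of its
    specializations."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            stack = [concept_of(word)]
            # Shallow hierarchy: only a few generalizations are visited per word.
            while stack:
                concept = stack.pop()
                if doc_id in index[concept]:
                    continue  # already handled; its generalizations were queued then
                index[concept].add(doc_id)
                stack.extend(parents.get(concept, []))
    return index


# Hypothetical hierarchy and an identity word-to-concept function for illustration.
parents = {"leopard": ["carnivore"], "carnivore": ["mammal"]}
index = build_concept_index({1: ["the", "leopard"], 2: ["mammal"]},
                            concept_of=lambda w: w, parents=parents)
print(sorted(index["carnivore"]), sorted(index["mammal"]))  # [1] [1, 2]
```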

21.5 Suppose inverted lists are maintained in blocks, with each block noting the largest popularity rank and TF-IDF scores of documents in the remaining blocks in the list. Suggest how merging of inverted lists can stop early if the user wants only the top K answers.
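Answer: For every document whose score is not yet complete, use the upper bounds recorded in the blocks to compute the best possible score it could still attain; a document not yet seen in any list is bounded by the sum of the upper bounds of all the lists. If the Kth largest completed score is greater than the largest of these best possible scores, merging can stop, and the top K answers are output.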
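The answer can be sketched in Python as follows. This is a minimal illustration under several assumptions not in the source: each posting carries a single nonnegative score (standing in for the combined popularity rank and TF-IDF score), a document's overall score is the sum of its scores in the individual lists, and `make_blocks` and `top_k_early` are hypothetical names. The blocks of all lists are read round-robin, one block per list per round, and after each round the test described in the answer decides whether merging can stop.

```python
import heapq
from typing import Dict, List, Sequence, Set, Tuple

Posting = Tuple[int, float]            # (document id, score in this list)
Block = Tuple[List[Posting], float]    # (postings, largest score in later blocks)


def make_blocks(postings: Sequence[Posting], size: int) -> List[Block]:
    """Split one inverted list into blocks, recording in each block the
    largest score found in the blocks after it (0.0 if none)."""
    chunks = [list(postings[i:i + size]) for i in range(0, len(postings), size)]
    blocks: List[Block] = []
    rest_max = 0.0
    for chunk in reversed(chunks):
        blocks.append((chunk, rest_max))
        rest_max = max([rest_max] + [score for _, score in chunk])
    blocks.reverse()
    return blocks


def top_k_early(lists: Sequence[List[Block]], k: int) -> List[Posting]:
    """Merge block-structured inverted lists, stopping once the top k is certain."""
    if k <= 0:
        return []
    n = len(lists)
    pos = [0] * n                                         # next block to read, per list
    bound = [float("inf") if b else 0.0 for b in lists]   # max score still unread
    partial: Dict[int, float] = {}                        # sum of scores seen so far
    seen_in: Dict[int, Set[int]] = {}                     # lists each doc was seen in
    while True:
        # Read the next block of every list that is not yet exhausted.
        for i, blocks in enumerate(lists):
            if pos[i] < len(blocks):
                postings, rest_max = blocks[pos[i]]
                for doc, score in postings:
                    partial[doc] = partial.get(doc, 0.0) + score
                    seen_in.setdefault(doc, set()).add(i)
                bound[i] = rest_max
                pos[i] += 1
        # A document not seen in any list yet can score at most sum(bound).
        best_incomplete = sum(bound)
        complete: List[Tuple[float, int]] = []
        for doc, score in partial.items():
            slack = sum(bound[i] for i in range(n) if i not in seen_in[doc])
            if slack == 0.0:
                complete.append((score, doc))             # score can no longer grow
            else:
                best_incomplete = max(best_incomplete, score + slack)
        top = heapq.nlargest(k, complete)
        finished = all(pos[i] == len(lists[i]) for i in range(n))
        if finished or (len(top) == k and top[-1][0] > best_incomplete):
            return [(doc, score) for score, doc in top]


A = make_blocks([(1, 0.9), (2, 0.8), (3, 0.1), (4, 0.05)], size=2)
B = make_blocks([(1, 0.7), (2, 0.6), (4, 0.2), (3, 0.1)], size=2)
print(top_k_early([A, B], k=1))
```

In this example, merging stops after the first block of each list: document 1's completed score, 1.6, already exceeds 0.3 (0.1 + 0.2), the best score that any document with an incomplete score, or not yet seen, could still reach.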
