
Self-Quiz Review: Information Retrieval

Uploaded by

Song Benard
Copyright
© All Rights Reserved

10/31/24, 4:35 PM Self-Quiz Unit 1: Attempt review | Home

Started on Friday, 6 September 2024, 5:54 AM


State Finished
Completed on Friday, 6 September 2024, 5:56 AM
Time taken 2 mins 11 secs
Grade 10.00 out of 10.00 (100%)

Question 1
Correct
Mark 1.00 out of 1.00

A model of information retrieval in which we can pose any query in which search terms are combined with the operators AND, OR, and
NOT:

Select one:
a. Ad Hoc Retrieval
b. Ranked Retrieval Model
c. Boolean Information Model 
d. Proximity Query Model

The correct answer is: Boolean Information Model
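
A minimal sketch of how such a query can be evaluated, assuming a tiny made-up collection and treating each term as the set of document IDs that contain it. The names `docs` and `docs_with` are hypothetical and chosen only for illustration; AND, OR, and NOT map onto set intersection, union, and difference.

```python
# Toy collection: document ID -> text (hypothetical data for illustration).
docs = {
    1: "brutus killed caesar",
    2: "caesar praised brutus and calpurnia",
    3: "calpurnia warned caesar",
}

def docs_with(term):
    """Return the set of document IDs whose text contains the term."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}

# brutus AND caesar AND NOT calpurnia
result = (docs_with("brutus") & docs_with("caesar")) - docs_with("calpurnia")

# brutus OR calpurnia
either = docs_with("brutus") | docs_with("calpurnia")
```

A document either matches the expression or it does not; no ranking is involved.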

Question 2
Correct
Mark 1.00 out of 1.00

A data structure that maps terms back to the parts of a document in which they occur is called an (select the best answer):

Select one:
a. Postings list
b. Incidence Matrix
c. Dictionary
d. Inverted Index 

The correct answer is: Inverted Index
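
As a rough illustration, an inverted index can be sketched as a dictionary from each term to a sorted postings list of document IDs. The function name `build_inverted_index`, the whitespace tokenization, and the sample documents are assumptions made for this example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted list of the document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    # Sorted postings lists make later intersection efficient.
    return {term: sorted(ids) for term, ids in index.items()}

docs = {1: "new home sales", 2: "home sales rise", 3: "new prices rise"}
index = build_inverted_index(docs)
```

Here the keys of `index` play the role of the dictionary and each value is a postings list.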


Question 3
Correct
Mark 1.00 out of 1.00

A process to efficiently intersect lists to be able to quickly find documents that contain both terms is referred to as merging postings
lists.

Select one:
True 
False

The correct answer is 'True'.
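
The merge can be sketched as a two-pointer walk over two postings lists, assuming both are sorted by document ID. The function name `intersect` is an illustrative choice.

```python
def intersect(p1, p2):
    """Intersect two sorted postings lists in O(len(p1) + len(p2)) time."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # Document appears in both lists: keep it and advance both pointers.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer that is behind
        else:
            j += 1
    return answer
```

Because each step advances at least one pointer, the work is linear in the total length of the two lists.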

Question 4
Correct
Mark 1.00 out of 1.00

The model of information retrieval in which we can pose any query in the form of a Boolean expression is called the ranked retrieval
model.

Select one:
True
False 

The correct answer is 'False'.

Question 5
Correct
Mark 1.00 out of 1.00

The number of times that a word or term occurs in a document is called the:

Select one:
a. Proximity Operator
b. Vocabulary Lexicon
c. Term Frequency 
d. Indexing Granularity

The correct answer is: Term Frequency
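
For illustration, term frequency can be counted with the standard library, assuming lowercase whitespace tokenization (a simplification; real tokenizers do more).

```python
from collections import Counter

def term_frequencies(text):
    """Count how many times each whitespace-separated term occurs in the text."""
    return Counter(text.lower().split())

tf = term_frequencies("to be or not to be")
```

A `Counter` returns 0 for terms that never occur, which is convenient when scoring.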


Question 6
Correct
Mark 1.00 out of 1.00

Stemming increases the size of the vocabulary.

Select one:
True
False 

The correct answer is 'False'.

Question 7
Correct
Mark 1.00 out of 1.00

In information retrieval, extremely common words that appear to be of little value in helping select documents, and that are therefore
excluded from the index vocabulary, are called:

Select one:
a. Stop Words 
b. Tokens
c. Lemmatized Words
d. Stemmed Terms

The correct answer is: Stop Words

Question 8
Correct
Mark 1.00 out of 1.00

A crude heuristic process that chops off the ends of the words to reduce inflectional forms of words and reduce the size of the
vocabulary is called:

Select one:
a. Lemmatization
b. Case Folding
c. True casing
d. Stemming 

The correct answer is: Stemming
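
The sketch below shows why stemming shrinks the vocabulary rather than growing it. It is a deliberately crude suffix stripper with a made-up rule set (`SUFFIXES`), not the Porter stemmer or any other published algorithm.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, deliberately tiny rule set

def crude_stem(word):
    """Chop the first matching suffix off the end of the word."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["jump", "jumps", "jumped", "jumping"]
stems = {crude_stem(w) for w in words}
```

Four distinct surface forms collapse to one index term, so the vocabulary gets smaller.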


Question 9
Correct
Mark 1.00 out of 1.00

An advantage of a positional index is that it reduces the asymptotic complexity of a postings intersection operation.

Select one:
True
False 

The correct answer is 'False'.
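
A positional index stores, for every term and document, the positions at which the term occurs. The sketch below assumes whitespace tokenization and 0-based positions, and the helper names are hypothetical. It shows what the extra information buys (adjacent-word phrase matching) and why intersection does not get cheaper: every position has to be examined, not just every document ID.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map term -> {doc_id: [positions]} using 0-based token positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index

def phrase_query(index, first, second):
    """Return sorted IDs of documents where `second` directly follows `first`."""
    hits = []
    for doc_id, positions in index.get(first, {}).items():
        following = set(index.get(second, {}).get(doc_id, []))
        if any(p + 1 in following for p in positions):
            hits.append(doc_id)
    return sorted(hits)

docs = {1: "to be or not to be", 2: "be happy to help"}
pos_index = build_positional_index(docs)
```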

Question 10
Correct
Mark 1.00 out of 1.00

An index that includes sequences of words or terms of variable length that have been extracted from a source document is called a:

Select one:
a. Phrase Index 
b. Biword index
c. Positional index
d. Inverted Index

The correct answer is: Phrase Index

10/31/24, 4:35 PM Self-Quiz Unit 2: Attempt review | Home

Started on Saturday, 14 September 2024, 3:48 AM


State Finished
Completed on Saturday, 14 September 2024, 3:50 AM
Time taken 2 mins 47 secs
Marks 9.00/9.00
Grade 10.00 out of 10.00 (100%)

Question 1
Correct
Mark 1.00 out of 1.00

One disadvantage, as outlined in our text, of using a permuterm index for wild card queries is:

Select one:
a. It requires complex code that is difficult to maintain
b. It has the risk of key collisions which are difficult to resolve
c. The required rotations create a very large dictionary
d. It cannot be used to find terms that are not spelled correctly

The correct answer is: The required rotations create a very large dictionary
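
To see where the growth comes from, the sketch below generates the rotations for a single term, assuming the usual convention of appending `$` as an end-of-term marker. The function name is illustrative.

```python
def permuterm_rotations(term):
    """Return every rotation of term + '$', where '$' marks the end of the term."""
    marked = term + "$"
    return [marked[i:] + marked[:i] for i in range(len(marked))]

rotations = permuterm_rotations("hello")
```

A term of length n contributes n + 1 dictionary entries, which is why the permuterm dictionary becomes several times larger than the plain one.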

Question 2
Correct
Mark 1.00 out of 1.00

Which of the following is a technique for context sensitive spelling correction:

Select one:
a. the Jaccard Coefficient
b. Soundex algorithms 
c. k-gram indexes
d. Levenshtein distance

The correct answer is: Soundex algorithms


Question 3
Correct
Mark 1.00 out of 1.00

For a very large collection of books of classic literature the most appropriate indexing algorithm would be:

Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm
c. Distributed Map-Reduce indexing algorithm 
d. Dynamic indexing process employing an auxiliary index

The correct answer is: Distributed Map-Reduce indexing algorithm

Question 4
Correct
Mark 1.00 out of 1.00

For a large collection of documents such as the internet that experience frequent change the most appropriate indexing algorithm
would be:

Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm
c. Distributed Map-Reduce indexing algorithm
d. Dynamic indexing process employing an auxiliary index 

The correct answer is: Dynamic indexing process employing an auxiliary index

Question 5
Correct
Mark 1.00 out of 1.00

Given two strings s1 and s2, the edit distance between them is sometimes known as the:

Select one:
a. Levenshtein distance 
b. isolated-term distance
c. k-gram overlap
d. Jaccard Coefficient

The correct answer is: Levenshtein distance
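
A minimal dynamic-programming sketch of the edit distance, assuming unit cost for each insertion, deletion, and substitution. It keeps only two rows of the table to save memory.

```python
def levenshtein(s1, s2):
    """Edit distance with unit-cost insert, delete, and substitute operations."""
    prev = list(range(len(s2) + 1))  # distances from the empty prefix of s1
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # turning s1[:i] into the empty string takes i deletions
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(
                prev[j] + 1,         # delete c1
                curr[j - 1] + 1,     # insert c2
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]
```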


Question 6
Correct
Mark 1.00 out of 1.00

For a moderately large collection of static documents maintained on a single system the most appropriate indexing algorithm would
be:

Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm 
c. Distributed Map-Reduce indexing algorithm
d. Dynamic indexing process employing an auxiliary index

The correct answer is: Single-pass in memory indexing algorithm

Question 7
Correct
Mark 1.00 out of 1.00

For a small collection of documents on a personal computer that don't experience any change the most appropriate indexing
algorithm would be:

Select one:
a. Block sort-based indexing algorithm 
b. Single-pass in memory indexing algorithm
c. Distributed Map-Reduce indexing algorithm
d. Dynamic indexing process employing an auxiliary index

The correct answer is: Block sort-based indexing algorithm

Question 8
Correct
Mark 1.00 out of 1.00

Hashing is a process where an item is reduced, through a mathematical process, to an integer.

Select one:
True 
False

The correct answer is 'True'.


Question 9
Correct
Mark 1.00 out of 1.00

The size of the document collection that can be indexed by the single-pass in-memory indexing algorithm is limited by the size of the disk
storage available to the computer running the indexer process.

Select one:
True 
False

The correct answer is 'True'.

10/31/24, 4:35 PM Self-Quiz Unit 3: Attempt review | Home

Started on Friday, 20 September 2024, 2:21 AM


State Finished
Completed on Friday, 20 September 2024, 2:23 AM
Time taken 1 min 53 secs
Marks 6.00/6.00
Grade 10.00 out of 10.00 (100%)

Question 1
Correct
Mark 1.00 out of 1.00

The formula used to estimate the vocabulary size of a collection is known as:

Select one:
a. Zipf's law
b. Power law
c. Heaps' law
d. Compression ratio

The correct answer is: Heaps' law
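
The law estimates vocabulary size M from the number of tokens T as M = k * T^b. The sketch below uses k = 44 and b = 0.49 as illustrative defaults in the range often quoted for English news text; both values are assumptions here, not figures taken from the quiz.

```python
def heaps_vocabulary_size(num_tokens, k=44, b=0.49):
    """Estimate the number of distinct terms as M = k * T**b."""
    return k * num_tokens ** b

estimate = heaps_vocabulary_size(1_000_020)
```

Because b is below 1, the vocabulary keeps growing with collection size, but more slowly than the number of tokens.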

Question 2
Correct
Mark 1.00 out of 1.00

Which of the following is NOT a benefit of index compression?

Select one:
a. Simplified algorithm design 
b. Reduction of disk space
c. Faster transfer of data from disk to memory
d. Increased use of caching

The correct answer is: Simplified algorithm design


Question 3
Correct
Mark 1.00 out of 1.00

A compression algorithm that results in some loss of data is called:

Select one:
a. zipf compression
b. dictionary compression
c. lossless compression
d. lossy compression 

The correct answer is: lossy compression

Question 4
Correct
Mark 1.00 out of 1.00

An approach to compression that takes advantage of the redundancy in the dictionary that results from common prefixes that come
from sorted terms is called:

Select one:
a. Front Coding 
b. Blocked storage
c. Prefix Coding
d. Variable byte encoding

The correct answer is: Front Coding
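
The idea can be sketched as follows: each term in a sorted list is stored as the length of the prefix it shares with the previous term plus the remaining suffix. This is a simplified variant written for illustration; it is not the exact block-based layout a textbook or production index would use, and the function names are hypothetical.

```python
def front_encode(sorted_terms):
    """Encode each term as (shared prefix length with previous term, suffix)."""
    encoded = []
    prev = ""
    for term in sorted_terms:
        shared = 0
        while shared < min(len(prev), len(term)) and prev[shared] == term[shared]:
            shared += 1
        encoded.append((shared, term[shared:]))
        prev = term
    return encoded

def front_decode(encoded):
    """Rebuild the original terms from (shared length, suffix) pairs."""
    terms = []
    prev = ""
    for shared, suffix in encoded:
        term = prev[:shared] + suffix
        terms.append(term)
        prev = term
    return terms

terms = ["automata", "automate", "automatic", "automation"]
encoded = front_encode(terms)
```

The encoding is lossless: decoding returns exactly the original term list.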

Question 5
Correct
Mark 1.00 out of 1.00

A disadvantage of compression is that it reduces the transfer of data from disk to memory.

Select one:
True
False 

The correct answer is 'False'.


Question 6
Correct
Mark 1.00 out of 1.00

The observation that the 30 most common words account for 30% of the tokens in written text is known as front coding.

Select one:
True
False 

The correct answer is 'False'.

10/31/24, 4:35 PM Self-Quiz Unit 4: Attempt review | Home

Started on Sunday, 29 September 2024, 11:47 PM


State Finished
Completed on Sunday, 29 September 2024, 11:50 PM
Time taken 2 mins 57 secs
Grade 10.00 out of 10.00 (100%)

Question 1
Correct
Mark 1.00 out of 1.00

Weighted zone scoring is sometimes referred to as ranked Boolean retrieval.

Select one:
True 
False

The correct answer is 'True'.
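
A minimal sketch of weighted zone scoring for a single query term, assuming each document is a mapping from zone name to text and that the zone weights sum to 1. The zone names, weights, and sample document are made up for illustration.

```python
def weighted_zone_score(doc_zones, term, zone_weights):
    """Sum the weights of every zone whose text contains the term."""
    score = 0.0
    for zone, weight in zone_weights.items():
        # Each zone contributes a Boolean match (0 or 1) times its weight.
        if term in doc_zones.get(zone, "").lower().split():
            score += weight
    return score

zone_weights = {"title": 0.5, "author": 0.2, "body": 0.3}  # hypothetical weights
doc = {
    "title": "Information Retrieval",
    "author": "Manning",
    "body": "An introduction to retrieval",
}
```

The per-zone match is Boolean, but the weighted sum gives a score in [0, 1] that can be used for ranking, hence the name ranked Boolean retrieval.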

Question 2
Correct
Mark 1.00 out of 1.00

In the bag of words model, the exact ordering of terms within the document is both significant and relevant to processing.

Select one:
True
False 

The correct answer is 'False'.


Question 3
Correct
Mark 1.00 out of 1.00

The purpose of the inverse document frequency is to increase the weight of terms with high collection frequency.

Select one:
True
False 

The correct answer is 'False'.

Question 4
Correct
Mark 1.00 out of 1.00

A scheme where a weight is assigned to a term based upon the number of occurrences of the term within a document is called:

Select one:
a. Bag of Words
b. Document Frequency
c. Term Frequency 
d. Optimal weight

The correct answer is: Term Frequency

Question 5
Correct
Mark 1.00 out of 1.00

The number of documents within a collection that contain a particular term is the collection frequency of the term.

Select one:
True
False 

The correct answer is 'False'.


Question 6
Correct
Mark 1.00 out of 1.00

A metric derived by taking the log of N divided by the document frequency where N is the total number of documents in a collection is
called:

Select one:
a. document frequency
b. tf-idf weight
c. collection frequency
d. inverse document frequency 

The correct answer is: inverse document frequency

Question 7
Correct
Mark 1.00 out of 1.00

The tf-idf weight is highest when a term t occurs many times within a small number of documents.

Select one:
True 
False

The correct answer is 'True'.
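
The behaviour described in the last two questions can be checked numerically. The sketch below assumes idf = log10(N / df) and an unscaled raw term frequency; the base of the logarithm and the function names are choices made for this example.

```python
import math

def idf(num_docs, doc_freq):
    """Inverse document frequency: log10(N / df)."""
    return math.log10(num_docs / doc_freq)

def tf_idf(term_freq, doc_freq, num_docs):
    """Raw term frequency scaled by inverse document frequency."""
    return term_freq * idf(num_docs, doc_freq)

rare_and_frequent = tf_idf(term_freq=10, doc_freq=10, num_docs=1000)
common_term = tf_idf(term_freq=10, doc_freq=500, num_docs=1000)
rare_but_single = tf_idf(term_freq=1, doc_freq=10, num_docs=1000)
```

The weight is largest for a term that occurs often in the document but in few documents overall, and it drops to zero for a term that appears in every document.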

Question 8
Correct
Mark 1.00 out of 1.00

The tf-idf weight is lower when a term t occurs many times in a document or occurs in relatively few documents.

Select one:
True
False 

The correct answer is 'False'.


Question 9
Correct
Mark 1.00 out of 1.00

A measure of similarity between two vectors which is determined by measuring the angle between them is called:

Select one:
a. cosine similarity 
b. sin similarity
c. vector similarity
d. vector scoring

The correct answer is: cosine similarity
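
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below assumes plain Python lists of equal length and returns 0.0 for a zero vector, which is a convention chosen for this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_u * norm_v)
```

Vectors pointing in the same direction score 1 regardless of length, which removes the bias toward long documents.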

Question 10
Correct
Mark 1.00 out of 1.00

An index that is often supplemental to the inverted index and contains terms from only a particular field or section of a document is
called a parametric index.

Select one:
True 
False

The correct answer is 'True'.

10/31/24, 4:34 PM Self-Quiz Unit 5: Attempt review | Home

Started on Friday, 4 October 2024, 4:21 AM


State Finished
Completed on Friday, 4 October 2024, 4:23 AM
Time taken 2 mins 13 secs
Marks 8.00/8.00
Grade 10.00 out of 10.00 (100%)

Question 1
Correct
Mark 1.00 out of 1.00

An approach to retrieval in a search that is likely, but not guaranteed, to produce the top K scoring documents is called:

Select one:
a. Exact top K document retrieval
b. top scoring document retrieval
c. Inexact top K document retrieval 
d. Imprecise top K document retrieval

The correct answer is: Inexact top K document retrieval

Question 2
Correct
Mark 1.00 out of 1.00

An approach to computing scores in an IR system that pre-computes for each term in the dictionary, the set of documents with the
highest weights for the term is:

Select one:
a. Champion list 
b. Impact ordering
c. Cluster pruning
d. Tiered indexes

The correct answer is: Champion list
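
A rough sketch of the precomputation, assuming the term weights are already available as a mapping from term to `{doc_id: weight}` and that `r` is the chosen list length. The function name and sample weights are hypothetical.

```python
import heapq

def champion_lists(weights, r):
    """For each term, keep the r documents with the highest weight for that term."""
    return {
        term: [doc for doc, _ in
               heapq.nlargest(r, postings.items(), key=lambda kv: kv[1])]
        for term, postings in weights.items()
    }

weights = {"caesar": {1: 0.9, 2: 0.1, 3: 0.5}, "brutus": {2: 0.4}}
champions = champion_lists(weights, r=2)
```

At query time only documents from the champion lists of the query terms are scored, which is what makes the resulting top K inexact.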


Question 3
Correct
Mark 1.00 out of 1.00

An approach to computing scores in an IR system that orders documents in the posting list of a term by decreasing order of term
frequency is called:

Select one:
a. Champion list
b. Impact ordering 
c. Cluster pruning
d. Tiered indexes

The correct answer is: Impact ordering

Question 4
Correct
Mark 1.00 out of 1.00

An approach to computing scores in an IR system that selects a sample of documents randomly from the collection as leaders which
are in the index and links similar documents to it (followers) is called:

Select one:
a. Champion list
b. Impact ordering
c. Cluster pruning 
d. Tiered indexes

The correct answer is: Cluster pruning

Question 5
Correct
Mark 1.00 out of 1.00

Which of the following items is not a component of a complete search system?

Select one:
a. Document cache
b. Indexers
c. Spell correction
d. Horizontal index 

The correct answer is: Horizontal index


Question 6
Correct
Mark 1.00 out of 1.00

Which of the following is NOT one of the types of queries in a complete search system discussed in our text?

Select one:
a. Wildcard Query
b. Boolean retrieval
c. Phrase Query
d. Ranked retrieval Query 

The correct answer is: Ranked retrieval Query

Question 7
Correct
Mark 1.00 out of 1.00

Considering only documents containing terms whose idf exceeds a preset threshold is a form of index elimination.

Select one:
True 
False

The correct answer is 'True'.

Question 8
Correct
Mark 1.00 out of 1.00

A scoring function that computes an aggregate of a document's relevance from multiple sources is called evidence accumulation.

Select one:
True 
False

The correct answer is 'True'.

10/31/24, 4:34 PM Self-Quiz Unit 6: Attempt review | Home

Started on Friday, 11 October 2024, 7:17 AM


State Finished
Completed on Friday, 11 October 2024, 7:21 AM
Time taken 3 mins 31 secs
Marks 7.00/7.00
Grade 10.00 out of 10.00 (100%)

Question 1
Correct
Mark 1.00 out of 1.00

To evaluate the effectiveness of an IR system the output from a standard query executed against the test IR system is compared with
the known output from a:

Select one:
a. internet collection
b. reference book
c. separate IR system.
d. standard test collection 

The correct answer is: standard test collection

Question 2
Correct
Mark 1.00 out of 1.00

Precision is the fraction of retrieved documents that are relevant.

Select one:
True 
False

The correct answer is 'True'.


Question 3
Correct
Mark 1.00 out of 1.00

Recall is the fraction of non relevant documents that are retrieved.

Select one:
True
False 

The correct answer is 'False'.

Question 4
Correct
Mark 1.00 out of 1.00

Accuracy is typically the most accurate measure of IR system effectiveness.

Select one:
True
False 

The correct answer is 'False'.

Question 5
Correct
Mark 1.00 out of 1.00

The F-measure is a single measure that balances precision versus recall.

Select one:
True 
False

The correct answer is 'True'.
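
The three measures can be computed from the set of retrieved documents and the set of relevant documents. The sketch below uses the balanced F-measure (F1, the harmonic mean of precision and recall); the function name and the zero-division conventions are assumptions of this example.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: fraction of retrieved documents that are relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant documents that are retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6, 8, 10])
```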


Question 6
Correct
Mark 1.00 out of 1.00

The purpose of the inverse document frequency is to increase the weight of terms with high collection frequency.

Select one:
True
False 

The correct answer is 'False'.

Question 7
Correct
Mark 1.00 out of 1.00

The standard approach to information retrieval system evaluation revolves around the notion of:

Select one:
a. Quantity of documents in the collection
b. Relevant and non relevant documents. 
c. Accuracy
d. user happiness

The correct answer is: Relevant and non relevant documents.

10/31/24, 4:33 PM Self-Quiz Unit 7: Attempt review | Home

Started on Friday, 18 October 2024, 5:59 AM


State Finished
Completed on Friday, 18 October 2024, 6:12 AM
Time taken 12 mins 57 secs
Marks 9.00/9.00
Grade 10.00 out of 10.00 (100%)

Question 1
Correct
Mark 1.00 out of 1.00

A web server communicates with a client (browser) using which protocol:

Select one:
a. HTML
b. HTTP 
c. FTP
d. Telnet

The correct answer is: HTTP

Question 2
Correct
Mark 1.00 out of 1.00

The basic operation of a web browser is to pass a request to the web server. This request is an address for a web page and is known as
the:

Select one:
a. UAL: Universal Address Locator
b. HTML: Hypertext Markup Language
c. URL: Universal Resource Locator 
d. HTTP: Hypertext transfer protocol

The correct answer is: URL: Universal Resource Locator


Question 3
Correct
Mark 1.00 out of 1.00

A web page whose content doesn't vary from one request to another is called a:

Select one:
a. Text Page
b. Dynamic Page
c. Active Server Page
d. Static Page 

The correct answer is: Static Page

Question 4
Correct
Mark 1.00 out of 1.00

A web link within a web page that references another part of the same page is called a:

Select one:
a. Out link
b. Vector
c. In link 
d. Tendril

The correct answer is: In link

Question 5
Correct
Mark 1.00 out of 1.00

In the context of web search engines the manipulation of web page content for the purpose of appearing high up in search results for
selected query terms is called:

Select one:
a. Paid inclusion
b. SPAM 
c. SEO
d. Link Analysis

The correct answer is: SPAM


Question 6
Correct
Mark 1.00 out of 1.00

Results from a search engine that are based upon the retrieval of items using a method of term weighting such as cosine similarity are a
form of:

Select one:
a. Sponsored Search
b. Algorithmic Search 
c. Informational Search
d. Navigational Search

The correct answer is: Algorithmic Search

Question 7
Correct
Mark 1.00 out of 1.00

A program that captures and indexes content from web pages is known as what insect:

Select one:
a. Fly
b. Centipede
c. Mosquito
d. Spider 

The correct answer is: Spider

Question 8
Correct
Mark 1.00 out of 1.00

The list of web pages that a web crawler has queued up to index is called the:

Select one:
a. Web Page Queue
b. Seed set
c. URL Filter
d. URL Frontier 

The correct answer is: URL Frontier
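
A toy sketch of how a frontier drives a crawl, assuming a breadth-first policy and an in-memory link graph in place of real HTTP fetches. The `links` mapping and page names are made up, and a real frontier would also handle politeness and prioritization.

```python
from collections import deque

def crawl_order(seed_urls, links):
    """Visit pages breadth-first, using a queue as the URL frontier."""
    frontier = deque(seed_urls)  # URLs waiting to be fetched
    seen = set(seed_urls)        # URLs already queued, to avoid duplicates
    order = []
    while frontier:
        url = frontier.popleft()
        order.append(url)        # stand-in for fetching and indexing the page
        for out_link in links.get(url, []):
            if out_link not in seen:
                seen.add(out_link)
                frontier.append(out_link)
    return order

links = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
visited = crawl_order(["a"], links)
```

The seed set only initializes the frontier; newly discovered links are what keep it populated.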


Question 9
Correct
Mark 1.00 out of 1.00

In order to access a particular web site on the internet, the URL must be converted into an IP address. Which service performs this
conversion?

Select one:
a. HTTP
b. TNS
c. DNS 
d. DHCP

The correct answer is: DNS

10/31/24, 4:33 PM Self-Quiz Unit 7: Attempt review | Home

Started on Friday, 18 October 2024, 5:51 AM


State Finished
Completed on Friday, 18 October 2024, 5:58 AM
Time taken 6 mins 47 secs
Marks 7.00/9.00
Grade 7.78 out of 10.00 (77.78%)

Question 1
Correct
Mark 1.00 out of 1.00

A web server communicates with a client (browser) using which protocol:

Select one:
a. HTML
b. HTTP 
c. FTP
d. Telnet

The correct answer is: HTTP

Question 2
Correct
Mark 1.00 out of 1.00

The basic operation of a web browser is to pass a request to the web server. This request is an address for a web page and is known as
the:

Select one:
a. UAL: Universal Address Locator
b. HTML: Hypertext Markup Language
c. URL: Universal Resource Locator 
d. HTTP: Hypertext transfer protocol

The correct answer is: URL: Universal Resource Locator


Question 3
Correct
Mark 1.00 out of 1.00

A web page whose content doesn't vary from one request to another is called a:

Select one:
a. Text Page
b. Dynamic Page
c. Active Server Page
d. Static Page 

The correct answer is: Static Page

Question 4
Correct
Mark 1.00 out of 1.00

A web link within a web page that references another part of the same page is called a:

Select one:
a. Out link
b. Vector
c. In link 
d. Tendril

The correct answer is: In link

Question 5
Incorrect
Mark 0.00 out of 1.00

In the context of web search engines the manipulation of web page content for the purpose of appearing high up in search results for
selected query terms is called:

Select one:
a. Paid inclusion
b. SPAM
c. SEO 
d. Link Analysis

The correct answer is: SPAM


Question 6
Correct
Mark 1.00 out of 1.00

Results from a search engine that are based upon the retrieval of items using a method of term weighting such as cosine similarity are a
form of:

Select one:
a. Sponsored Search
b. Algorithmic Search 
c. Informational Search
d. Navigational Search

The correct answer is: Algorithmic Search

Question 7
Correct
Mark 1.00 out of 1.00

A program that captures and indexes content from web pages is known as what insect:

Select one:
a. Fly
b. Centipede
c. Mosquito
d. Spider 

The correct answer is: Spider

Question 8
Incorrect
Mark 0.00 out of 1.00

The list of web pages that a web crawler has queued up to index is called the:

Select one:
a. Web Page Queue
b. Seed set
c. URL Filter 
d. URL Frontier

The correct answer is: URL Frontier


Question 9
Correct
Mark 1.00 out of 1.00

In order to access a particular web site on the internet, the URL must be converted into an IP address. Which service performs this
conversion?

Select one:
a. HTTP
b. TNS
c. DNS 
d. DHCP

The correct answer is: DNS

10/31/24, 4:27 PM Review Quiz: Attempt review | Home

Started on Monday, 28 October 2024, 7:52 AM


State Finished
Completed on Monday, 28 October 2024, 8:02 AM
Time taken 9 mins 30 secs
Marks 50.00/59.00
Grade 84.75 out of 100.00

Question 1
Correct
Mark 1.00 out of 1.00

A measure of similarity between two vectors which is determined by measuring the angle between them is called:

Select one:
a. cosine similarity 
b. sin similarity
c. vector similarity
d. vector scoring

The correct answer is: cosine similarity

Question 2
Correct
Mark 1.00 out of 1.00

The tf-idf weight is highest when a term t occurs many times within a small number of documents.

Select one:
True 
False

The correct answer is 'True'.


Question 3
Correct
Mark 1.00 out of 1.00

A data structure that maps terms back to the parts of a document in which they occur is called an (select the best answer):

Select one:
a. Postings list
b. Incidence Matrix
c. Dictionary
d. Inverted Index 

The correct answer is: Inverted Index

Question 4
Incorrect
Mark 0.00 out of 1.00

The size of the document collection that can be indexed by the single-pass in-memory indexing algorithm is limited by the size of the disk
storage available to the computer running the indexer process.

Select one:
True
False 

The correct answer is 'True'.

Question 5
Correct
Mark 1.00 out of 1.00

Which of the following is NOT a benefit of index compression?

Select one:
a. Simplified algorithm design 
b. Reduction of disk space
c. Faster transfer of data from disk to memory
d. Increased use of caching

The correct answer is: Simplified algorithm design


Question 6
Correct
Mark 1.00 out of 1.00

To evaluate the effectiveness of an IR system the output from a standard query executed against the test IR system is compared with
the known output from a:

Select one:
a. internet collection
b. reference book
c. separate IR system.
d. standard test collection 

The correct answer is: standard test collection

Question 7
Correct
Mark 1.00 out of 1.00

Which of the following items is not a component of a complete search system?

Select one:
a. Document cache
b. Indexers
c. Spell correction
d. Horizontal index 

The correct answer is: Horizontal index

Question 8
Correct
Mark 1.00 out of 1.00

The standard approach to information retrieval system evaluation revolves around the notion of:

Select one:
a. Quantity of documents in the collection
b. Relevant and non relevant documents. 
c. Accuracy
d. user happiness

The correct answer is: Relevant and non relevant documents.


Question 9
Correct
Mark 1.00 out of 1.00

An approach to retrieval in a search that is likely, but not guaranteed, to produce the top K scoring documents is called:

Select one:
a. Exact top K document retrieval
b. top scoring document retrieval
c. Inexact top K document retrieval 
d. Imprecise top K document retrieval

The correct answer is: Inexact top K document retrieval

Question 10
Incorrect
Mark 0.00 out of 1.00

The number of documents within a collection that contain a particular term is the collection frequency of the term.

Select one:
True 
False

The correct answer is 'False'.

Question 11
Correct
Mark 1.00 out of 1.00

The basic operation of a web browser is to pass a request to the web server. This request is an address for a web page and is known as
the:

Select one:
a. UAL: Universal Address Locator
b. HTML: Hypertext Markup Language
c. URL: Universal Resource Locator 
d. HTTP: Hypertext transfer protocol

The correct answer is: URL: Universal Resource Locator


Question 12
Correct
Mark 1.00 out of 1.00

The list of web pages that a web crawler has queued up to index is called the:

Select one:
a. Web Page Queue
b. Seed set
c. URL Filter
d. URL Frontier 

The correct answer is: URL Frontier

Question 13
Correct
Mark 1.00 out of 1.00

For a moderately large collection of static documents maintained on a single system the most appropriate indexing algorithm would
be:

Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm 
c. Distributed Map-Reduce indexing algorithm
d. Dynamic indexing process employing an auxiliary index

The correct answer is: Single-pass in memory indexing algorithm

Question 14
Correct
Mark 1.00 out of 1.00

Which of the following is NOT one of the types of queries in a complete search system discussed in our text?

Select one:
a. Wildcard Query
b. Boolean retrieval
c. Phrase Query
d. Ranked retrieval Query 

The correct answer is: Ranked retrieval Query


Question 15
Incorrect
Mark 0.00 out of 1.00

Which of the following is a technique for context sensitive spelling correction:

Select one:
a. the Jaccard Coefficient
b. Soundex algorithms
c. k-gram indexes 
d. Levenshtein distance

The correct answer is: Soundex algorithms

Question 16
Correct
Mark 1.00 out of 1.00

A scoring function that computes an aggregate of a document's relevance from multiple sources is called evidence accumulation.

Select one:
True 
False

The correct answer is 'True'.

Question 17
Correct
Mark 1.00 out of 1.00

Given two strings s1 and s2, the edit distance between them is sometimes known as the:

Select one:
a. Levenshtein distance 
b. isolated-term distance
c. k-gram overlap
d. Jaccard Coefficient

The correct answer is: Levenshtein distance


Question 18
Correct
Mark 1.00 out of 1.00

A scheme where a weight is assigned to a term based upon the number of occurrences of the term within a document is called:

Select one:
a. Bag of Words
b. Document Frequency
c. Term Frequency 
d. Optimal weight

The correct answer is: Term Frequency

Question 19
Correct
Mark 1.00 out of 1.00

The F-measure is a single measure that balances precision versus recall.

Select one:
True 
False

The correct answer is 'True'.

Question 20
Incorrect
Mark 0.00 out of 1.00

Accuracy is typically the most accurate measure of IR system effectiveness.

Select one:
True 
False

The correct answer is 'False'.


Question 21
Correct
Mark 1.00 out of 1.00

The purpose of the inverse document frequency is to increase the weight of terms with high collection frequency.

Select one:
True
False 

The correct answer is 'False'.

Question 22
Correct
Mark 1.00 out of 1.00

A web page whose content doesn't vary from one request to another is called a:

Select one:
a. Text Page
b. Dynamic Page
c. Active Server Page
d. Static Page 

The correct answer is: Static Page

Question 23
Correct
Mark 1.00 out of 1.00

Recall is the fraction of non relevant documents that are retrieved.

Select one:
True
False 

The correct answer is 'False'.


Question 24
Correct
Mark 1.00 out of 1.00

In information retrieval, extremely common words that appear to be of little value in helping select documents, and that are therefore
excluded from the index vocabulary, are called:

Select one:
a. Stop Words 
b. Tokens
c. Lemmatized Words
d. Stemmed Terms

The correct answer is: Stop Words

Question 25
Correct
Mark 1.00 out of 1.00

A compression algorithm that results in some loss of data is called:

Select one:
a. zipf compression
b. dictionary compression
c. lossless compression
d. lossy compression 

The correct answer is: lossy compression

Question 26
Correct
Mark 1.00 out of 1.00

For a large collection of documents such as the internet that experience frequent change the most appropriate indexing algorithm
would be:

Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm
c. Distributed Map-Reduce indexing algorithm
d. Dynamic indexing process employing an auxiliary index 

The correct answer is: Dynamic indexing process employing an auxiliary index


Question 27
Correct
Mark 1.00 out of 1.00

The observation that the 30 most common words account for 30% of the tokens in written text is known as front coding.

Select one:
True
False 

The correct answer is 'False'.

Question 28
Correct
Mark 1.00 out of 1.00

An index that includes sequences of words or terms of variable length that have been extracted from a source document is called a:

Select one:
a. Phrase Index 
b. Biword index
c. Positional index
d. Inverted Index

The correct answer is: Phrase Index

Question 29
Correct
Mark 1.00 out of 1.00

Results from a search engine that are based upon the retrieval of items using a method of term weighting such as cosine similarity are a
form of:

Select one:
a. Sponsored Search
b. Algorithmic Search 
c. Informational Search
d. Navigational Search

The correct answer is: Algorithmic Search


Question 30
Correct
Mark 1.00 out of 1.00

Weighted zone scoring is sometimes referred to as ranked Boolean retrieval.

Select one:
True 
False

The correct answer is 'True'.
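
Study note: weighted zone scoring adds up a weight for every zone in which the query matches in the Boolean sense. The sketch below assumes each zone is matched with a simple AND over whitespace-separated tokens; the zone names, the weights, and the sample document are hypothetical.

```python
def weighted_zone_score(doc_zones, query_terms, weights):
    """Return the sum of g_i * s_i over zones, where s_i is 1 if every
    query term appears in zone i (a Boolean AND match) and 0 otherwise."""
    score = 0.0
    for zone, g in weights.items():
        zone_tokens = set(doc_zones.get(zone, "").lower().split())
        s = 1 if all(t.lower() in zone_tokens for t in query_terms) else 0
        score += g * s
    return score


# Hypothetical zone weights, assumed to sum to 1.
weights = {"author": 0.2, "title": 0.3, "body": 0.5}
doc = {
    "author": "william shakespeare",
    "title": "the merchant of venice",
    "body": "in sooth i know not why i am so sad",
}
print(weighted_zone_score(doc, ["shakespeare"], weights))  # matches only the author zone
```

The Boolean match per zone is what makes the result a ranked form of Boolean retrieval: documents are still matched with Boolean logic, but they receive a score in the range 0 to 1.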

Question 31
Correct
Mark 1.00 out of 1.00

A process to efficiently intersect lists to be able to quickly find documents that contain both terms is referred to as merging postings
lists.

Select one:
True 
False

The correct answer is 'True'.
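
Study note: the merge can be sketched as a two-pointer walk over both lists. This assumes each postings list is a Python list of document IDs sorted in increasing order; the function name `intersect` and the sample lists are illustrative.

```python
def intersect(p1, p2):
    """Intersect two postings lists sorted by increasing docID."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # Document contains both terms.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer at the smaller docID
        else:
            j += 1
    return answer


brutus = [1, 2, 4, 11, 31, 45, 173, 174]
calpurnia = [2, 31, 54, 101]
print(intersect(brutus, calpurnia))  # [2, 31]
```

Each pointer only moves forward, so the number of comparisons is linear in the combined length of the two lists.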

Question 32
Correct
Mark 1.00 out of 1.00

A model of information retrieval in which we can pose any query in which search terms are combined with the operators AND, OR, and
NOT:

Select one:
a. Ad Hoc Retrieval
b. Ranked Retrieval Model
c. Boolean Information Model 
d. Proximity Query Model

The correct answer is: Boolean Information Model

Question 33
Correct
Mark 1.00 out of 1.00

The purpose of the inverse document frequency is to increase the weight of terms with high collection frequency.

Select one:
True
False 

The correct answer is 'False'.

Question 34
Incorrect
Mark 0.00 out of 1.00

An advantage of a positional index is that it reduces the asymptotic complexity of a postings intersection operation.

Select one:
True 
False

The correct answer is 'False'.

Question 35
Correct
Mark 1.00 out of 1.00

An approach to compression that takes advantage of the redundancy in the dictionary that results from common prefixes that come
from sorted terms is called:

Select one:
a. Front Coding 
b. Blocked storage
c. Prefix Coding
d. Variable byte encoding

The correct answer is: Front Coding
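
Study note: a simplified sketch of the idea. It encodes each term against the previous term in sorted order, storing only the shared prefix length and the differing suffix. This incremental variant is an assumption made for brevity; it is not the exact block layout a production dictionary would use.

```python
import os


def front_encode(sorted_terms):
    """Encode sorted terms as (shared_prefix_length, suffix) pairs."""
    encoded = []
    prev = ""
    for term in sorted_terms:
        shared = len(os.path.commonprefix([prev, term]))
        encoded.append((shared, term[shared:]))
        prev = term
    return encoded


def front_decode(encoded):
    """Rebuild the original terms from (shared_prefix_length, suffix) pairs."""
    terms = []
    prev = ""
    for shared, suffix in encoded:
        term = prev[:shared] + suffix
        terms.append(term)
        prev = term
    return terms


terms = ["automata", "automate", "automatic", "automation"]
print(front_encode(terms))
# [(0, 'automata'), (7, 'e'), (7, 'ic'), (8, 'on')]
```

Only the first term is stored in full; the savings come from the long prefixes that neighbouring terms share once the dictionary is sorted.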

Question 36
Incorrect
Mark 0.00 out of 1.00

For a small collection of documents on a personal computer that doesn't experience any change, the most appropriate indexing
algorithm would be:

Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm 
c. Distributed Map-Reduce indexing algorithm
d. Dynamic indexing process employing an auxiliary index

The correct answer is: Block sort-based indexing algorithm

Question 37
Incorrect
Mark 0.00 out of 1.00

An approach to computing scores in an IR system that randomly selects a sample of documents from the collection as leaders, which
are in the index, and links similar documents (followers) to them is called:

Select one:
a. Champion list 
b. Impact ordering
c. Cluster pruning
d. Tiered indexes

The correct answer is: Cluster pruning
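
Study note: a rough sketch of cluster pruning, assuming documents are dense numeric vectors and that about sqrt(N) leaders are drawn at random. The function names, the `seed` parameter, and the choice to search only the single nearest leader's cluster are illustrative assumptions.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_clusters(doc_vectors, seed=0):
    """Pick about sqrt(N) random leaders; attach every other document
    (a follower) to the leader it is most similar to."""
    rng = random.Random(seed)
    doc_ids = sorted(doc_vectors)
    n_leaders = max(1, round(math.sqrt(len(doc_ids))))
    leaders = rng.sample(doc_ids, n_leaders)
    clusters = {leader: [leader] for leader in leaders}
    for doc_id in doc_ids:
        if doc_id in clusters:
            continue  # leaders already belong to their own cluster
        nearest = max(leaders, key=lambda l: cosine(doc_vectors[doc_id], doc_vectors[l]))
        clusters[nearest].append(doc_id)
    return clusters


def search(query_vector, doc_vectors, clusters, k=3):
    """Score only the documents in the cluster of the leader closest to the query."""
    leader = max(clusters, key=lambda l: cosine(query_vector, doc_vectors[l]))
    candidates = clusters[leader]
    ranked = sorted(candidates, key=lambda d: cosine(query_vector, doc_vectors[d]), reverse=True)
    return ranked[:k]
```

At query time only the leaders and one cluster are scored, which is why the method is fast but can miss some of the true top-k documents.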

Question 38
Correct
Mark 1.00 out of 1.00

An index that is often supplemental to the inverted index and contains terms from only a particular field or section of a document is
called a parametric index.

Select one:
True 
False

The correct answer is 'True'.

Question 39
Correct
Mark 1.00 out of 1.00

The formula used to estimate the vocabulary size of a collection is known as:

Select one:
a. Zipf's law
b. Power law
c. Heaps' law 
d. Compression ratio

The correct answer is: Heaps' law
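
Study note: Heaps' law estimates vocabulary size M from the number of tokens T as M = k * T^b. The sketch below uses k = 44 and b = 0.49 as defaults; these are assumed here as illustrative values of the kind reported for a newswire collection, and real collections need their own fitted parameters.

```python
def heaps_vocabulary_size(num_tokens, k=44, b=0.49):
    """Estimate vocabulary size with Heaps' law: M = k * T**b.

    k and b are collection-dependent; the defaults are illustrative assumptions.
    """
    return k * num_tokens ** b


for tokens in (10_000, 1_000_000, 100_000_000):
    print(tokens, round(heaps_vocabulary_size(tokens)))
```

Since b is below 1, the vocabulary keeps growing with collection size but far more slowly than the number of tokens.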

Question 40
Incorrect
Mark 0.00 out of 1.00

The tf-idf weight is lower when a term t occurs many times in a document or occurs in relatively few documents.

Select one:
True 
False

The correct answer is 'False'.
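
Study note: a small sketch showing why the statement is false. It assumes the basic weighting tf-idf = tf * log10(N / df) with raw term frequency; the function names are illustrative.

```python
import math


def idf(num_docs, doc_freq):
    """Inverse document frequency: log10(N / df)."""
    return math.log10(num_docs / doc_freq)


def tf_idf(term_freq, num_docs, doc_freq):
    """tf-idf weight using raw term frequency."""
    return term_freq * idf(num_docs, doc_freq)


N = 1000
print(tf_idf(3, N, 10))    # frequent in the document, rare in the collection: high weight
print(tf_idf(1, N, 10))    # fewer occurrences in the document: lower weight
print(tf_idf(3, N, 1000))  # term occurs in every document: weight is zero
```

The weight rises with more occurrences in the document and with fewer documents containing the term, which is the opposite of what the quiz statement claims.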

Question 41
Correct
Mark 1.00 out of 1.00

The model of information retrieval in which we can pose any query in the form of a Boolean expression is called the ranked retrieval
model.

Select one:
True
False 

The correct answer is 'False'.

Question 42
Correct
Mark 1.00 out of 1.00

The number of times that a word or term occurs in a document is called the:

Select one:
a. Proximity Operator
b. Vocabulary Lexicon
c. Term Frequency 
d. Indexing Granularity

The correct answer is: Term Frequency

Question 43
Correct
Mark 1.00 out of 1.00

An approach to computing scores in an IR system that orders documents in the posting list of a term by decreasing order of term
frequency is called:

Select one:
a. Champion list
b. Impact ordering 
c. Cluster pruning
d. Tiered indexes

The correct answer is: Impact ordering
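
Study note: impact ordering can be pictured as re-sorting one term's postings. The sketch assumes each posting is a `(doc_id, term_frequency)` pair and breaks ties by document ID; both choices are illustrative.

```python
def impact_order(postings):
    """Sort (doc_id, term_frequency) postings by decreasing term frequency."""
    return sorted(postings, key=lambda p: (-p[1], p[0]))


postings = [(1, 2), (4, 7), (9, 2), (12, 5)]
print(impact_order(postings))  # [(4, 7), (12, 5), (1, 2), (9, 2)]
```

With the highest-impact documents first, scoring can stop early once the remaining postings can no longer change the top results.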

Question 44
Correct
Mark 1.00 out of 1.00

Hashing is a process where an item is reduced, through a mathematical process, to an integer.

Select one:
True 
False

The correct answer is 'True'.

Question 45
Correct
Mark 1.00 out of 1.00

A crude heuristic process that chops off the ends of the words to reduce inflectional forms of words and reduce the size of the
vocabulary is called:

Select one:
a. Lemmatization
b. Case Folding
c. True casing
d. Stemming 

The correct answer is: Stemming

Question 46
Correct
Mark 1.00 out of 1.00

An approach to computing scores in an IR system that precomputes, for each term in the dictionary, the set of documents with the
highest weights for the term is:

Select one:
a. Champion list 
b. Impact ordering
c. Cluster pruning
d. Tiered indexes

The correct answer is: Champion list
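
Study note: a minimal sketch of building champion lists, assuming the term weights are already available as a nested dictionary `{term: {doc_id: weight}}`. The parameter `r` (the list size) and the sample weights are hypothetical.

```python
import heapq


def build_champion_lists(weights, r=2):
    """For each term, keep the IDs of the r documents with the highest weights."""
    return {
        term: [doc for doc, _ in heapq.nlargest(r, doc_weights.items(), key=lambda kv: kv[1])]
        for term, doc_weights in weights.items()
    }


weights = {
    "caesar": {1: 0.9, 2: 0.1, 3: 0.5},
    "brutus": {2: 0.7},
}
print(build_champion_lists(weights, r=2))  # {'caesar': [1, 3], 'brutus': [2]}
```

At query time, scores are computed only for documents in the champion lists of the query terms, trading some accuracy for speed.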

Question 47
Correct
Mark 1.00 out of 1.00

One disadvantage, as outlined in our text, of using a permuterm index for wild card queries is:

Select one:
a. It requires complex code that is difficult to maintain
b. It has the risk of key collisions which are difficult to resolve
c. The required rotations create a very large dictionary 
d. It cannot be used to find terms that are not spelled correctly

The correct answer is: The required rotations create a very large dictionary
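
Study note: the dictionary blow-up is easy to see by generating the rotations for one term. The sketch assumes `$` is used as the end-of-term marker; the function name is illustrative.

```python
def permuterm_rotations(term):
    """Return every rotation of term + '$' that a permuterm index would store."""
    augmented = term + "$"
    return [augmented[i:] + augmented[:i] for i in range(len(augmented))]


print(permuterm_rotations("hello"))
# ['hello$', 'ello$h', 'llo$he', 'lo$hel', 'o$hell', '$hello']
```

A term of length n contributes n + 1 dictionary entries, so the permuterm dictionary is several times larger than the plain term dictionary.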

Question 48
Correct
Mark 1.00 out of 1.00

A web server communicates with a client (browser) using which protocol:

Select one:
a. HTML
b. HTTP 
c. FTP
d. Telnet

The correct answer is: HTTP

Question 49
Correct
Mark 1.00 out of 1.00

Stemming increases the size of the vocabulary.

Select one:
True
False 

The correct answer is 'False'.

Question 50
Correct
Mark 1.00 out of 1.00

A program that captures and indexes content from web pages is known as what creature:

Select one:
a. Fly
b. Centipede
c. Mosquito
d. Spider 

The correct answer is: Spider

Question 51
Correct
Mark 1.00 out of 1.00

In the bag of words model, the exact ordering of terms within the document is both significant and relevant to processing.

Select one:
True
False 

The correct answer is 'False'.
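
Study note: a short sketch of the bag of words view, assuming lowercase whitespace tokenization. Two sentences with the same words in a different order produce the same representation.

```python
from collections import Counter


def bag_of_words(text):
    """Map a text to its term counts, discarding word order."""
    return Counter(text.lower().split())


a = bag_of_words("Mary is quicker than John")
b = bag_of_words("John is quicker than Mary")
print(a == b)  # True: the ordering of terms is not recorded
```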

Question 52
Correct
Mark 1.00 out of 1.00

A disadvantage of compression is that it reduces the transfer of data from disk to memory.

Select one:
True
False 

The correct answer is 'False'.

Question 53
Correct
Mark 1.00 out of 1.00

Considering only documents containing terms whose idf exceeds a preset threshold is a form of index elimination.

Select one:
True 
False

The correct answer is 'True'.

Question 54
Correct
Mark 1.00 out of 1.00

In order to access a particular website on the internet, the host name in its URL must be converted into an IP address. Which service performs this
conversion?

Select one:
a. HTTP
b. TNS
c. DNS 
d. DHCP

The correct answer is: DNS

Question 55
Incorrect
Mark 0.00 out of 1.00

In the context of web search engines, the manipulation of web page content for the purpose of appearing high up in search results for
selected query terms is called:

Select one:
a. Paid inclusion
b. SPAM
c. SEO 
d. Link Analysis

The correct answer is: SPAM

Question 56
Correct
Mark 1.00 out of 1.00

For a very large collection of books of classic literature, the most appropriate indexing algorithm would be:

Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm
c. Distributed Map-Reduce indexing algorithm 
d. Dynamic indexing process employing an auxiliary index

The correct answer is: Distributed Map-Reduce indexing algorithm

Question 57
Correct
Mark 1.00 out of 1.00

A metric derived by taking the log of N divided by the document frequency, where N is the total number of documents in a collection, is
called:

Select one:
a. document frequency
b. tf-idf weight
c. collection frequency
d. inverse document frequency 

The correct answer is: inverse document frequency

Question 58
Correct
Mark 1.00 out of 1.00

A web link within a web page that references another part of the same page is called a:

Select one:
a. Out link
b. Vector
c. In link 
d. Tendril

The correct answer is: In link

Question 59
Correct
Mark 1.00 out of 1.00

Precision is the fraction of retrieved documents that are relevant.

Select one:
True 
False

The correct answer is 'True'.
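
Study note: the definition translates directly into a few lines of code. The sketch assumes retrieved and relevant documents are given as collections of document IDs and returns 0.0 when nothing was retrieved, which is a convention chosen here for convenience.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0  # convention assumed for an empty result set
    return len(retrieved & set(relevant)) / len(retrieved)


print(precision([1, 2, 3, 4], [2, 4, 6]))  # 0.5
```

Here two of the four retrieved documents are relevant, so precision is 0.5, regardless of the relevant document that was never retrieved.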
