Self-Quiz Review: Information Retrieval
Question 1
Correct
Mark 1.00 out of 1.00
A model of information retrieval in which we can pose any query in which search terms are combined with the operators AND, OR, and
NOT:
Select one:
a. Ad Hoc Retrieval
b. Ranked Retrieval Model
c. Boolean Information Model
d. Proximity Query Model
Question 2
Correct
Mark 1.00 out of 1.00
A data structure that maps terms back to the parts of a document in which they occur is called an (select the best answer):
Select one:
a. Postings list
b. Incidence Matrix
c. Dictionary
d. Inverted Index
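As an illustration of a structure that maps terms back to the documents containing them, here is a minimal sketch. It assumes whitespace tokenization, lowercasing, and integer document IDs; the sample documents and the name `build_inverted_index` are hypothetical and not taken from the quiz.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: tokens are lowercased, whitespace-separated words.
        for token in text.lower().split():
            index[token].add(doc_id)
    # Sorted postings make later list intersection straightforward.
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical three-document collection.
docs = {1: "new home sales", 2: "home prices rise", 3: "new car sales"}
index = build_inverted_index(docs)
print(index["home"])
print(index["sales"])
```

Each dictionary key plays the role of a vocabulary term, and each value is that term's list of document IDs.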
10/31/24, 4:35 PM Self-Quiz Unit 1: Attempt review | Home
Question 3
Correct
Mark 1.00 out of 1.00
A process to efficiently intersect lists to be able to quickly find documents that contain both terms is referred to as merging postings
lists.
Select one:
True
False
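The intersection of two postings lists mentioned in this question can be sketched as a linear merge. This is a minimal version that assumes both lists are sorted by ascending document ID; the function name and sample lists are illustrative only.

```python
def intersect(p1, p2):
    """Intersect two postings lists sorted by ascending document ID."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # The document contains both terms.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


print(intersect([1, 2, 4, 11, 31, 45, 173, 174], [2, 31, 54, 101]))
```

Because each step advances at least one pointer, the walk takes time proportional to the combined length of the two lists.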
Question 4
Correct
Mark 1.00 out of 1.00
The model of information retrieval in which we can pose any query in the form of a Boolean expression is called the ranked retrieval
model.
Select one:
True
False
Question 5
Correct
Mark 1.00 out of 1.00
The number of times that a word or term occurs in a document is called the:
Select one:
a. Proximity Operator
b. Vocabulary Lexicon
c. Term Frequency
d. Indexing Granularity
Question 6
Correct
Mark 1.00 out of 1.00
Select one:
True
False
Question 7
Correct
Mark 1.00 out of 1.00
In information retrieval, extremely common words that appear to be of little value in helping select documents, and that are therefore excluded from the index vocabulary, are called:
Select one:
a. Stop Words
b. Tokens
c. Lemmatized Words
d. Stemmed Terms
Question 8
Correct
Mark 1.00 out of 1.00
A crude heuristic process that chops off the ends of the words to reduce inflectional forms of words and reduce the size of the
vocabulary is called:
Select one:
a. Lemmatization
b. Case Folding
c. True casing
d. Stemming
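To make "chopping off the ends of words" concrete, here is a deliberately crude sketch. It is a toy suffix stripper written for this review, not the Porter stemmer or any other published algorithm; the suffix list and the minimum stem length of three characters are assumptions.

```python
# Hypothetical suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for word in ["indexing", "indexed", "indexes", "cats", "caring"]:
    print(word, "->", crude_stem(word))
```

The last word shows why the heuristic is called crude: "caring" is reduced to "car", which conflates unrelated words.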
Question 9
Correct
Mark 1.00 out of 1.00
An advantage of a positional index is that it reduces the asymptotic complexity of a postings intersection operation.
Select one:
True
False
Question 10
Correct
Mark 1.00 out of 1.00
An index that includes sequences of words or terms of variable length that have been extracted from a source document is called a:
Select one:
a. Phrase Index
b. Biword index
c. Positional index
d. Inverted Index
10/31/24, 4:35 PM Self-Quiz Unit 2: Attempt review | Home
Question 1
Correct
Mark 1.00 out of 1.00
One disadvantage, as outlined in our text, of using a permuterm index for wildcard queries is:
Select one:
a. It requires complex code that is difficult to maintain
b. It has the risk of key collisions which are difficult to resolve
c. The required rotations create a very large dictionary
d. It cannot be used to find terms that are not spelled correctly
The correct answer is: The required rotations create a very large dictionary
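The dictionary growth named in the answer can be seen by generating the rotations directly. This is a minimal sketch that assumes `$` is used as the end-of-term marker; the function name is hypothetical.

```python
def permuterm_rotations(term):
    """Return every rotation of the term with an end marker appended."""
    augmented = term + "$"
    return [augmented[i:] + augmented[:i] for i in range(len(augmented))]


rotations = permuterm_rotations("hello")
print(rotations)
print(len(rotations))
```

A term of n characters contributes n + 1 rotations, so the permuterm dictionary is several times larger than the plain vocabulary.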
Question 2
Correct
Mark 1.00 out of 1.00
Select one:
a. the Jaccard Coefficient
b. Soundex algorithms
c. k-gram indexes
d. Levenshtein distance
Question 3
Correct
Mark 1.00 out of 1.00
For a very large collection of books of classic literature the most appropriate indexing algorithm would be:
Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm
c. Distributed Map-Reduce indexing algorithm
d. Dynamic indexing process employing an auxiliary index
Question 4
Correct
Mark 1.00 out of 1.00
For a large collection of documents, such as the internet, that experiences frequent change, the most appropriate indexing algorithm would be:
Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm
c. Distributed Map-Reduce indexing algorithm
d. Dynamic indexing process employing an auxiliary index
The correct answer is: Dynamic indexing process employing an auxiliary index
Question 5
Correct
Mark 1.00 out of 1.00
Given two strings s1 and s2, the edit distance between them is sometimes known as the:
Select one:
a. Levenshtein distance
b. isolated-term distance
c. k-gram overlap
d. Jaccard Coefficient
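Edit distance between two strings is commonly computed with dynamic programming. The sketch below assumes unit cost for insertion, deletion, and substitution, and keeps only two rows of the table; the function name is illustrative.

```python
def levenshtein(s1, s2):
    """Minimum number of single-character edits turning s1 into s2."""
    # previous[j] holds the distance between s1[:i-1] and s2[:j].
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(
                previous[j] + 1,         # delete c1
                current[j - 1] + 1,      # insert c2
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
print(levenshtein("cat", "dog"))
```

The running time is proportional to the product of the two string lengths.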
Question 6
Correct
Mark 1.00 out of 1.00
For a moderately large collection of static documents maintained on a single system the most appropriate indexing algorithm would
be:
Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm
c. Distributed Map-Reduce indexing algorithm
d. Dynamic indexing process employing an auxiliary index
Question 7
Correct
Mark 1.00 out of 1.00
For a small collection of documents on a personal computer that don't experience any change the most appropriate indexing
algorithm would be:
Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm
c. Distributed Map-Reduce indexing algorithm
d. Dynamic indexing process employing an auxiliary index
Question 8
Correct
Mark 1.00 out of 1.00
Select one:
True
False
Question 9
Correct
Mark 1.00 out of 1.00
The size of the document collection that can be indexed by the single-pass in-memory indexing algorithm is limited by the amount of disk storage available to the computer running the indexer process.
Select one:
True
False
10/31/24, 4:35 PM Self-Quiz Unit 3: Attempt review | Home
Question 1
Correct
Mark 1.00 out of 1.00
The formula used to estimate the vocabulary size of a collection is known as:
Select one:
a. Zipf's law
b. Power law
c. Heaps' law
d. Compression ratio
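Heaps' law is usually written as M = k * T^b, where M is the vocabulary size and T is the number of tokens in the collection. The sketch below evaluates that formula; the default values k = 44 and b = 0.49 are illustrative assumptions, since both parameters depend on the collection and on how it is tokenized.

```python
def heaps_vocabulary_size(num_tokens, k=44.0, b=0.49):
    """Estimate vocabulary size M = k * T**b for T tokens.

    k and b are collection-dependent; the defaults are illustrative only.
    """
    return k * num_tokens ** b


for tokens in (10_000, 1_000_000, 100_000_000):
    print(tokens, round(heaps_vocabulary_size(tokens)))
```

Because b is below 1, the estimated vocabulary keeps growing with collection size, but more slowly than the number of tokens.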
Question 2
Correct
Mark 1.00 out of 1.00
Select one:
a. Simplified algorithm design
b. Reduction of disk space
c. Faster transfer of data from disk to memory
d. Increased Use of caching
Question 3
Correct
Mark 1.00 out of 1.00
Select one:
a. zipf compression
b. dictionary compression
c. lossless compression
d. lossy compression
Question 4
Correct
Mark 1.00 out of 1.00
An approach to compression that takes advantage of the redundancy in the dictionary that results from common prefixes that come
from sorted terms is called:
Select one:
a. Front Coding
b. Blocked storage
c. Prefix Coding
d. Variable byte encoding
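The redundancy from shared prefixes in a sorted term list can be illustrated with a simplified encoding. This sketch stores, for each term, the length of the prefix it shares with the previous term plus the remaining suffix; it is a simplified variant for illustration, not the exact block layout used by any particular index, and the function names are hypothetical.

```python
def front_encode(sorted_terms):
    """Encode each term as (shared prefix length with previous term, suffix)."""
    encoded = []
    previous = ""
    for term in sorted_terms:
        shared = 0
        limit = min(len(previous), len(term))
        while shared < limit and previous[shared] == term[shared]:
            shared += 1
        encoded.append((shared, term[shared:]))
        previous = term
    return encoded


def front_decode(encoded):
    """Rebuild the original term list from (shared length, suffix) pairs."""
    terms = []
    previous = ""
    for shared, suffix in encoded:
        term = previous[:shared] + suffix
        terms.append(term)
        previous = term
    return terms


terms = ["automata", "automate", "automatic", "automation"]
encoded = front_encode(terms)
print(encoded)
print(front_decode(encoded) == terms)
```

Only the first term is stored in full; each later entry stores a small integer and a short suffix.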
Question 5
Correct
Mark 1.00 out of 1.00
A disadvantage of compression is that it reduces the transfer of data from disk to memory.
Select one:
True
False
Question 6
Correct
Mark 1.00 out of 1.00
The observation that the 30 most common words account for 30% of the tokens in written text is known as front coding.
Select one:
True
False
10/31/24, 4:35 PM Self-Quiz Unit 4: Attempt review | Home
Question 1
Correct
Mark 1.00 out of 1.00
Select one:
True
False
Question 2
Correct
Mark 1.00 out of 1.00
In the bag of words model, the exact ordering of terms within the document is both significant and relevant to processing.
Select one:
True
False
Question 3
Correct
Mark 1.00 out of 1.00
The purpose of the inverse document frequency is to increase the weight of terms with high collection frequency.
Select one:
True
False
Question 4
Correct
Mark 1.00 out of 1.00
A scheme where a weight is assigned to a term based upon the number of occurrences of the term within a document is called:
Select one:
a. Bag of Words
b. Document Frequency
c. Term Frequency
d. Optimal weight
Question 5
Correct
Mark 1.00 out of 1.00
The number of documents within a collection that contain a particular term is the collection frequency of the term.
Select one:
True
False
Question 6
Correct
Mark 1.00 out of 1.00
A metric derived by taking the log of N divided by the document frequency where N is the total number of documents in a collection is
called:
Select one:
a. document frequency
b. tf-idf weight
c. collection frequency
d. inverse document frequency
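The metric described in this question, the log of N divided by the document frequency, can be computed directly. The sketch below assumes a base-10 logarithm (the question does not fix the base) and adds a hypothetical `tf_idf` helper that multiplies the result by a raw term count.

```python
import math


def idf(num_docs, doc_freq):
    """log10(N / df): larger for terms that occur in fewer documents."""
    return math.log10(num_docs / doc_freq)


def tf_idf(term_freq, num_docs, doc_freq):
    """Raw term frequency scaled by the idf of the term."""
    return term_freq * idf(num_docs, doc_freq)


N = 1_000_000  # hypothetical collection size
print(idf(N, 1_000))      # rare term
print(idf(N, 1_000_000))  # term present in every document
print(tf_idf(3, N, 1_000))
```

A term that appears in every document gets a value of zero, so it contributes nothing to a tf-idf score no matter how often it occurs.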
Question 7
Correct
Mark 1.00 out of 1.00
The tf-idf weight is highest when a term t occurs many times within a small number of documents.
Select one:
True
False
Question 8
Correct
Mark 1.00 out of 1.00
The tf-idf weight is lower when a term t occurs many times in a document or occurs in relatively few documents.
Select one:
True
False
Question 9
Correct
Mark 1.00 out of 1.00
A measure of similarity between two vectors which is determined by measuring the angle between them is called:
Select one:
a. cosine similarity
b. sin similarity
c. vector similarity
d. vector scoring
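An angle-based similarity between two vectors is typically computed as the dot product divided by the product of the vector lengths. A minimal sketch, assuming equal-length lists of numbers and returning 0.0 when either vector is all zeros (a convention chosen here, not taken from the quiz):

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_u * norm_v)


print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal
```

Dividing by the lengths means a long document and a short one with the same term proportions score as identical.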
Question 10
Correct
Mark 1.00 out of 1.00
An index that is often supplemental to the inverted index and contains terms from only a particular field or section of a document is
called a parametric index.
Select one:
True
False
10/31/24, 4:34 PM Self-Quiz Unit 5: Attempt review | Home
Question 1
Correct
Mark 1.00 out of 1.00
An approach to retrieval that is likely, but not guaranteed, to produce the top K scoring documents is called:
Select one:
a. Exact top K document retrieval
b. top scoring document retrieval
c. Inexact top K document retrieval
d. Imprecise top K document retrieval
Question 2
Correct
Mark 1.00 out of 1.00
An approach to computing scores in an IR system that precomputes, for each term in the dictionary, the set of documents with the highest weights for that term is:
Select one:
a. Champion list
b. Impact ordering
c. Cluster pruning
d. Tiered indexes
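Precomputing, for each term, the documents with the highest weights can be sketched as follows. The input layout (term mapped to a dictionary of document weights), the parameter `r`, and the sample weights are all assumptions made for illustration.

```python
def build_champion_lists(term_weights, r):
    """Keep, for each term, the r documents with the highest weight.

    term_weights maps term -> {doc_id: weight}.
    """
    champions = {}
    for term, postings in term_weights.items():
        # Sort by descending weight; break ties by ascending document ID.
        ranked = sorted(postings.items(), key=lambda item: (-item[1], item[0]))
        champions[term] = [doc_id for doc_id, _ in ranked[:r]]
    return champions


# Hypothetical weights for two terms.
term_weights = {
    "index": {1: 0.2, 2: 0.9, 3: 0.5, 4: 0.7},
    "query": {2: 0.4, 5: 0.8},
}
champions = build_champion_lists(term_weights, r=2)
print(champions)
```

At query time, scoring only the union of the precomputed lists for the query terms trades exactness for speed.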
Question 3
Correct
Mark 1.00 out of 1.00
An approach to computing scores in an IR system that orders documents in the posting list of a term by decreasing order of term
frequency is called:
Select one:
a. Champion list
b. Impact ordering
c. Cluster pruning
d. Tiered indexes
Question 4
Correct
Mark 1.00 out of 1.00
An approach to computing scores in an IR system that randomly selects a sample of documents from the collection as leaders, which are in the index, and links similar documents (followers) to them is called:
Select one:
a. Champion list
b. Impact ordering
c. Cluster pruning
d. Tiered indexes
Question 5
Correct
Mark 1.00 out of 1.00
Select one:
a. Document cache
b. Indexers
c. Spell correction
d. Horizontal index
Question 6
Correct
Mark 1.00 out of 1.00
Which of the following is NOT one of the types of queries in a complete search system discussed in our text?
Select one:
a. Wildcard Query
b. Boolean retrieval
c. Phrase Query
d. Ranked retrieval Query
Question 7
Correct
Mark 1.00 out of 1.00
Considering only documents containing terms whose idf exceeds a preset threshold is a form of index elimination.
Select one:
True
False
Question 8
Correct
Mark 1.00 out of 1.00
A scoring function that computes an aggregate of a document's relevance from multiple sources is called evidence accumulation.
Select one:
True
False
10/31/24, 4:34 PM Self-Quiz Unit 6: Attempt review | Home
Question 1
Correct
Mark 1.00 out of 1.00
To evaluate the effectiveness of an IR system the output from a standard query executed against the test IR system is compared with
the known output from a:
Select one:
a. internet collection
b. reference book
c. separate IR system.
d. standard test collection
Question 2
Correct
Mark 1.00 out of 1.00
Select one:
True
False
Question 3
Correct
Mark 1.00 out of 1.00
Select one:
True
False
Question 4
Correct
Mark 1.00 out of 1.00
Select one:
True
False
Question 5
Correct
Mark 1.00 out of 1.00
Select one:
True
False
Question 6
Correct
Mark 1.00 out of 1.00
The purpose of the inverse document frequency is to increase the weight of terms with high collection frequency.
Select one:
True
False
Question 7
Correct
Mark 1.00 out of 1.00
The standard approach to information retrieval system evaluation revolves around the notion of:
Select one:
a. Quantity of documents in the collection
b. Relevant and nonrelevant documents
c. Accuracy
d. user happiness
10/31/24, 4:33 PM Self-Quiz Unit 7: Attempt review | Home
Question 1
Correct
Mark 1.00 out of 1.00
Select one:
a. HTML
b. HTTP
c. FTP
d. Telnet
Question 2
Correct
Mark 1.00 out of 1.00
The basic operation of a web browser is to pass a request to the web server. This request is an address for a web page and is known as
the:
Select one:
a. UAL: Universal Address Locator
b. HTML: Hypertext Markup Language
c. URL: Universal Resource Locator
d. HTTP: Hypertext transfer protocol
Question 3
Correct
Mark 1.00 out of 1.00
A web page whose content doesn't vary from one request to another is called a:
Select one:
a. Text Page
b. Dynamic Page
c. Active Server Page
d. Static Page
Question 4
Correct
Mark 1.00 out of 1.00
A web link within a web page that references another part of the same page is called a:
Select one:
a. Out link
b. Vector
c. In link
d. Tendril
Question 5
Correct
Mark 1.00 out of 1.00
In the context of web search engines the manipulation of web page content for the purpose of appearing high up in search results for
selected query terms is called:
Select one:
a. Paid inclusion
b. SPAM
c. SEO
d. Link Analysis
Question 6
Correct
Mark 1.00 out of 1.00
Results from a search engine that are based upon the retrieval of items using a method of term weighting such as cosine similarity are a form of:
Select one:
a. Sponsored Search
b. Algorithmic Search
c. Informational Search
d. Navigational Search
Question 7
Correct
Mark 1.00 out of 1.00
A program that captures and indexes content from web pages is known as what insect:
Select one:
a. Fly
b. Centipede
c. Mosquito
d. Spider
Question 8
Correct
Mark 1.00 out of 1.00
The list of web pages that a web crawler has queued up to index is called the:
Select one:
a. Web Page Queue
b. Seed set
c. URL Filter
d. URL Frontier
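The queue of pages a crawler still has to fetch can be sketched as a simple data structure. This minimal version assumes first-in, first-out ordering and exact-string duplicate detection; a production crawler would also handle politeness delays and prioritization, which are omitted here. The class name and the example.com URLs are hypothetical.

```python
from collections import deque


class URLFrontier:
    """FIFO queue of URLs waiting to be crawled, with duplicate filtering."""

    def __init__(self, seeds):
        self._queue = deque()
        self._seen = set()
        for url in seeds:
            self.add(url)

    def add(self, url):
        # Only enqueue URLs that have never been added before.
        if url not in self._seen:
            self._seen.add(url)
            self._queue.append(url)

    def next_url(self):
        """Return the next URL to fetch, or None when the frontier is empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        return len(self._queue)


frontier = URLFrontier(["https://example.com/"])
frontier.add("https://example.com/about")
frontier.add("https://example.com/")  # duplicate, ignored
first = frontier.next_url()
print(first, len(frontier))
```

The seed URLs initialize the queue, and links extracted from each fetched page would be passed back through `add`.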
Question 9
Correct
Mark 1.00 out of 1.00
To access a particular website on the internet, the host name in the URL must be converted into an IP address. Which service performs this conversion?
Select one:
a. HTTP
b. TNS
c. DNS
d. DHCP
10/31/24, 4:33 PM Self-Quiz Unit 7: Attempt review | Home
Question 1
Correct
Mark 1.00 out of 1.00
Select one:
a. HTML
b. HTTP
c. FTP
d. Telnet
Question 2
Correct
Mark 1.00 out of 1.00
The basic operation of a web browser is to pass a request to the web server. This request is an address for a web page and is known as
the:
Select one:
a. UAL: Universal Address Locator
b. HTML: Hypertext Markup Language
c. URL: Universal Resource Locator
d. HTTP: Hypertext transfer protocol
Question 3
Correct
Mark 1.00 out of 1.00
A web page whose content doesn't vary from one request to another is called a:
Select one:
a. Text Page
b. Dynamic Page
c. Active Server Page
d. Static Page
Question 4
Correct
Mark 1.00 out of 1.00
A web link within a web page that references another part of the same page is called a:
Select one:
a. Out link
b. Vector
c. In link
d. Tendril
Question 5
Incorrect
Mark 0.00 out of 1.00
In the context of web search engines the manipulation of web page content for the purpose of appearing high up in search results for
selected query terms is called:
Select one:
a. Paid inclusion
b. SPAM
c. SEO
d. Link Analysis
Question 6
Correct
Mark 1.00 out of 1.00
Results from a search engine that are based upon the retrieval of items using a method of term weighting such as cosine similarity are a form of:
Select one:
a. Sponsored Search
b. Algorithmic Search
c. Informational Search
d. Navigational Search
Question 7
Correct
Mark 1.00 out of 1.00
A program that captures and indexes content from web pages is known as what insect:
Select one:
a. Fly
b. Centipede
c. Mosquito
d. Spider
Question 8
Incorrect
Mark 0.00 out of 1.00
The list of web pages that a web crawler has queued up to index is called the:
Select one:
a. Web Page Queue
b. Seed set
c. URL Filter
d. URL Frontier
Question 9
Correct
Mark 1.00 out of 1.00
To access a particular website on the internet, the host name in the URL must be converted into an IP address. Which service performs this conversion?
Select one:
a. HTTP
b. TNS
c. DNS
d. DHCP
10/31/24, 4:27 PM Review Quiz: Attempt review | Home
Question 1
Correct
Mark 1.00 out of 1.00
A measure of similarity between two vectors which is determined by measuring the angle between them is called:
Select one:
a. cosine similarity
b. sin similarity
c. vector similarity
d. vector scoring
Question 2
Correct
Mark 1.00 out of 1.00
The tf-idf weight is highest when a term t occurs many times within a small number of documents.
Select one:
True
False
Question 3
Correct
Mark 1.00 out of 1.00
A data structure that maps terms back to the parts of a document in which they occur is called an (select the best answer):
Select one:
a. Postings list
b. Incidence Matrix
c. Dictionary
d. Inverted Index
Question 4
Incorrect
Mark 0.00 out of 1.00
The size of the document collection that can be indexed by the single-pass in-memory indexing algorithm is limited by the amount of disk storage available to the computer running the indexer process.
Select one:
True
False
Question 5
Correct
Mark 1.00 out of 1.00
Select one:
a. Simplified algorithm design
b. Reduction of disk space
c. Faster transfer of data from disk to memory
d. Increased Use of caching
Question 6
Correct
Mark 1.00 out of 1.00
To evaluate the effectiveness of an IR system the output from a standard query executed against the test IR system is compared with
the known output from a:
Select one:
a. internet collection
b. reference book
c. separate IR system.
d. standard test collection
Question 7
Correct
Mark 1.00 out of 1.00
Select one:
a. Document cache
b. Indexers
c. Spell correction
d. Horizontal index
Question 8
Correct
Mark 1.00 out of 1.00
The standard approach to information retrieval system evaluation revolves around the notion of:
Select one:
a. Quantity of documents in the collection
b. Relevant and nonrelevant documents
c. Accuracy
d. user happiness
Question 9
Correct
Mark 1.00 out of 1.00
An approach to retrieval that is likely, but not guaranteed, to produce the top K scoring documents is called:
Select one:
a. Exact top K document retrieval
b. top scoring document retrieval
c. Inexact top K document retrieval
d. Imprecise top K document retrieval
Question 10
Incorrect
Mark 0.00 out of 1.00
The number of documents within a collection that contain a particular term is the collection frequency of the term.
Select one:
True
False
Question 11
Correct
Mark 1.00 out of 1.00
The basic operation of a web browser is to pass a request to the web server. This request is an address for a web page and is known as
the:
Select one:
a. UAL: Universal Address Locator
b. HTML: Hypertext Markup Language
c. URL: Universal Resource Locator
d. HTTP: Hypertext transfer protocol
Question 12
Correct
Mark 1.00 out of 1.00
The list of web pages that a web crawler has queued up to index is called the:
Select one:
a. Web Page Queue
b. Seed set
c. URL Filter
d. URL Frontier
Question 13
Correct
Mark 1.00 out of 1.00
For a moderately large collection of static documents maintained on a single system the most appropriate indexing algorithm would
be:
Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm
c. Distributed Map-Reduce indexing algorithm
d. Dynamic indexing process employing an auxiliary index
Question 14
Correct
Mark 1.00 out of 1.00
Which of the following is NOT one of the types of queries in a complete search system discussed in our text?
Select one:
a. Wildcard Query
b. Boolean retrieval
c. Phrase Query
d. Ranked retrieval Query
Question 15
Incorrect
Mark 0.00 out of 1.00
Select one:
a. the Jaccard Coefficient
b. Soundex algorithms
c. k-gram indexes
d. Levenshtein distance
Question 16
Correct
Mark 1.00 out of 1.00
A scoring function that computes an aggregate of a document's relevance from multiple sources is called evidence accumulation.
Select one:
True
False
Question 17
Correct
Mark 1.00 out of 1.00
Given two strings s1 and s2, the edit distance between them is sometimes known as the:
Select one:
a. Levenshtein distance
b. isolated-term distance
c. k-gram overlap
d. Jaccard Coefficient
Question 18
Correct
Mark 1.00 out of 1.00
A scheme where a weight is assigned to a term based upon the number of occurrences of the term within a document is called:
Select one:
a. Bag of Words
b. Document Frequency
c. Term Frequency
d. Optimal weight
Question 19
Correct
Mark 1.00 out of 1.00
Select one:
True
False
Question 20
Incorrect
Mark 0.00 out of 1.00
Select one:
True
False
Question 21
Correct
Mark 1.00 out of 1.00
The purpose of the inverse document frequency is to increase the weight of terms with high collection frequency.
Select one:
True
False
Question 22
Correct
Mark 1.00 out of 1.00
A web page whose content doesn't vary from one request to another is called a:
Select one:
a. Text Page
b. Dynamic Page
c. Active Server Page
d. Static Page
Question 23
Correct
Mark 1.00 out of 1.00
Select one:
True
False
Question 24
Correct
Mark 1.00 out of 1.00
In information retrieval, extremely common words that appear to be of little value in helping select documents, and that are therefore excluded from the index vocabulary, are called:
Select one:
a. Stop Words
b. Tokens
c. Lemmatized Words
d. Stemmed Terms
Question 25
Correct
Mark 1.00 out of 1.00
Select one:
a. zipf compression
b. dictionary compression
c. lossless compression
d. lossy compression
Question 26
Correct
Mark 1.00 out of 1.00
For a large collection of documents, such as the internet, that experiences frequent change, the most appropriate indexing algorithm would be:
Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm
c. Distributed Map-Reduce indexing algorithm
d. Dynamic indexing process employing an auxiliary index
The correct answer is: Dynamic indexing process employing an auxiliary index
Question 27
Correct
Mark 1.00 out of 1.00
The observation that the 30 most common words account for 30% of the tokens in written text is known as front coding.
Select one:
True
False
Question 28
Correct
Mark 1.00 out of 1.00
An index that includes sequences of words or terms of variable length that have been extracted from a source document is called a:
Select one:
a. Phrase Index
b. Biword index
c. Positional index
d. Inverted Index
Question 29
Correct
Mark 1.00 out of 1.00
Results from a search engine that are based upon the retrieval of items using a method of term weighting such as cosine similarity are a form of:
Select one:
a. Sponsored Search
b. Algorithmic Search
c. Informational Search
d. Navigational Search
Question 30
Correct
Mark 1.00 out of 1.00
Select one:
True
False
Question 31
Correct
Mark 1.00 out of 1.00
A process to efficiently intersect lists to be able to quickly find documents that contain both terms is referred to as merging postings
lists.
Select one:
True
False
Question 32
Correct
Mark 1.00 out of 1.00
A model of information retrieval in which we can pose any query in which search terms are combined with the operators AND, OR, and
NOT:
Select one:
a. Ad Hoc Retrieval
b. Ranked Retrieval Model
c. Boolean Information Model
d. Proximity Query Model
Question 33
Correct
Mark 1.00 out of 1.00
The purpose of the inverse document frequency is to increase the weight of terms with high collection frequency.
Select one:
True
False
Question 34
Incorrect
Mark 0.00 out of 1.00
An advantage of a positional index is that it reduces the asymptotic complexity of a postings intersection operation.
Select one:
True
False
Question 35
Correct
Mark 1.00 out of 1.00
An approach to compression that takes advantage of the redundancy in the dictionary that results from common prefixes that come
from sorted terms is called:
Select one:
a. Front Coding
b. Blocked storage
c. Prefix Coding
d. Variable byte encoding
Question 36
Incorrect
Mark 0.00 out of 1.00
For a small collection of documents on a personal computer that don't experience any change the most appropriate indexing
algorithm would be:
Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm
c. Distributed Map-Reduce indexing algorithm
d. Dynamic indexing process employing an auxiliary index
Question 37
Incorrect
Mark 0.00 out of 1.00
An approach to computing scores in an IR system that randomly selects a sample of documents from the collection as leaders, which are in the index, and links similar documents (followers) to them is called:
Select one:
a. Champion list
b. Impact ordering
c. Cluster pruning
d. Tiered indexes
Question 38
Correct
Mark 1.00 out of 1.00
An index that is often supplemental to the inverted index and contains terms from only a particular field or section of a document is
called a parametric index.
Select one:
True
False
Question 39
Correct
Mark 1.00 out of 1.00
The formula used to estimate the vocabulary size of a collection is known as:
Select one:
a. Zipf's law
b. Power law
c. Heaps' law
d. Compression ratio
Question 40
Incorrect
Mark 0.00 out of 1.00
The tf-idf weight is lower when a term t occurs many times in a document or occurs in relatively few documents.
Select one:
True
False
Question 41
Correct
Mark 1.00 out of 1.00
The model of information retrieval in which we can pose any query in the form of a Boolean expression is called the ranked retrieval
model.
Select one:
True
False
Question 42
Correct
Mark 1.00 out of 1.00
The number of times that a word or term occurs in a document is called the:
Select one:
a. Proximity Operator
b. Vocabulary Lexicon
c. Term Frequency
d. Indexing Granularity
Question 43
Correct
Mark 1.00 out of 1.00
An approach to computing scores in an IR system that orders documents in the posting list of a term by decreasing order of term
frequency is called:
Select one:
a. Champion list
b. Impact ordering
c. Cluster pruning
d. Tiered indexes
Question 44
Correct
Mark 1.00 out of 1.00
Select one:
True
False
Question 45
Correct
Mark 1.00 out of 1.00
A crude heuristic process that chops off the ends of the words to reduce inflectional forms of words and reduce the size of the
vocabulary is called:
Select one:
a. Lemmatization
b. Case Folding
c. True casing
d. Stemming
Question 46
Correct
Mark 1.00 out of 1.00
An approach to computing scores in an IR system that precomputes, for each term in the dictionary, the set of documents with the highest weights for that term is:
Select one:
a. Champion list
b. Impact ordering
c. Cluster pruning
d. Tiered indexes
Question 47
Correct
Mark 1.00 out of 1.00
One disadvantage, as outlined in our text, of using a permuterm index for wildcard queries is:
Select one:
a. It requires complex code that is difficult to maintain
b. It has the risk of key collisions which are difficult to resolve
c. The required rotations create a very large dictionary
d. It cannot be used to find terms that are not spelled correctly
The correct answer is: The required rotations create a very large dictionary
Question 48
Correct
Mark 1.00 out of 1.00
Select one:
a. HTML
b. HTTP
c. FTP
d. Telnet
Question 49
Correct
Mark 1.00 out of 1.00
Select one:
True
False
Question 50
Correct
Mark 1.00 out of 1.00
A program that captures and indexes content from web pages is known as what insect:
Select one:
a. Fly
b. Centipede
c. Mosquito
d. Spider
Question 51
Correct
Mark 1.00 out of 1.00
In the bag of words model, the exact ordering of terms within the document is both significant and relevant to processing.
Select one:
True
False
Question 52
Correct
Mark 1.00 out of 1.00
A disadvantage of compression is that it reduces the transfer of data from disk to memory.
Select one:
True
False
Question 53
Correct
Mark 1.00 out of 1.00
Considering only documents containing terms whose idf exceeds a preset threshold is a form of index elimination.
Select one:
True
False
Question 54
Correct
Mark 1.00 out of 1.00
To access a particular website on the internet, the host name in the URL must be converted into an IP address. Which service performs this conversion?
Select one:
a. HTTP
b. TNS
c. DNS
d. DHCP
Question 55
Incorrect
Mark 0.00 out of 1.00
In the context of web search engines the manipulation of web page content for the purpose of appearing high up in search results for
selected query terms is called:
Select one:
a. Paid inclusion
b. SPAM
c. SEO
d. Link Analysis
Question 56
Correct
Mark 1.00 out of 1.00
For a very large collection of books of classic literature the most appropriate indexing algorithm would be:
Select one:
a. Block sort-based indexing algorithm
b. Single-pass in memory indexing algorithm
c. Distributed Map-Reduce indexing algorithm
d. Dynamic indexing process employing an auxiliary index
Question 57
Correct
Mark 1.00 out of 1.00
A metric derived by taking the log of N divided by the document frequency where N is the total number of documents in a collection is
called:
Select one:
a. document frequency
b. tf-idf weight
c. collection frequency
d. inverse document frequency
Question 58
Correct
Mark 1.00 out of 1.00
A web link within a web page that references another part of the same page is called a:
Select one:
a. Out link
b. Vector
c. In link
d. Tendril
Question 59
Correct
Mark 1.00 out of 1.00
Select one:
True
False