Information Retrieval MCQ

This document is a question bank for a course on Information Retrieval (IR) containing multiple-choice questions covering various topics such as infrared spectroscopy, indexing algorithms, and evaluation metrics in IR systems. It includes questions on concepts like Boolean retrieval, document frequency, and collaborative filtering. The questions are designed to test knowledge on both theoretical and practical aspects of information retrieval systems.

Uploaded by

alexaussie2000

SEM: VI CLASS: TYCS SUB: INFORMATION RETRIEVAL (IR)
Multiple Choice Questions (Question Bank)

1) Which of the following is not a source used in Mid Infrared Spectrophotometer?

a) Nernst glower
b) High pressure mercury arc lamp
c) Globar
d) Nichrome wire.
2) Which of the following is the wave number range of a near infrared spectrometer?
a) 4000 – 200 cm-1
b) 200 – 10 cm-1
c) 12500 – 4000 cm-1
d) 50 – 1000 cm-1.
3) Which of the following is not a component of a Nernst glower or Nernst filament?
a) Oxides of Zirconium
b) Oxides of Barium
c) Oxides of Yttrium
d) Oxides of Thorium
4) What is the composition of Globar rod which is used as a source in Mid IR
spectroscopy?
a) Silicon carbide
b) Silver chloride
c) Silicon dioxide
d) Silver carbide
5) Bolometer, a type of detector, is also known as
a) Resistance temperature detector (RTD)
b) Thermistor
c) Thermocouple
d) Golay cell
6) Which of the following is not used as pyroelectric material used in pyroelectric
transducers in Infrared spectroscopy?
a) Triglycine Sulphate
b) Deuterated Triglycine Sulphate
c) Some Polymers
d) Tetraglycine sulphate
7) A model of information retrieval in which we can pose any query in which
search terms are combined with the operators AND, OR, and NOT
a) Ad Hoc Retrieval
b) Ranked Retrieval Model
c) Boolean Information Model
d) Proximity Query Model
8) A data structure that maps terms back to the parts of a document in which they occur
is called an
a) Postings list
b) Incidence Matrix
c) Dictionary
d) Inverted Index
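As a rough illustration of the ideas behind questions 7 and 8, the sketch below builds a tiny inverted index and answers a Boolean AND query against it. The three-document collection and the helper names `build_inverted_index` and `boolean_and` are assumptions made for this example, not part of the question bank.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of docIDs (its postings list)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, term1, term2):
    """Return the docIDs that contain both terms."""
    return sorted(set(index.get(term1, [])) & set(index.get(term2, [])))


# Hypothetical toy collection, used only for illustration.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}
index = build_inverted_index(docs)
print(index["july"])                         # [2, 3]
print(boolean_and(index, "sales", "july"))   # [2, 3]
```

Whitespace splitting and lowercasing stand in for a real tokenizer here; a production indexer would normalize terms more carefully.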
9) Stemming increases the size of the vocabulary.
True
False
10) In information retrieval, extremely common words which would appear to be of little value in
helping select documents that are excluded from the index vocabulary are called:
a) Stop Words
b) Tokens
c) Lemmatized Words
d) Stemmed Terms
11) A crude heuristic process that chops off the ends of the words to reduce inflectional forms of words
and reduce the size of the vocabulary is called
a) Lemmatization
b) Case Folding
c) True casing
d) Stemming
12) Which of the following is a technique for context sensitive spelling correction
a) the Jaccard Coefficient
b) Soundex algorithms
c) k-gram indexes
d) Levenshtein distance
13) For a very large collection of books of classic literature the most appropriate indexing algorithm
would be
a) Block sort-based indexing algorithm
b) Single-pass in memory indexing algorithm
c) Distributed Map-Reduce indexing algorithm
d) Dynamic indexing process employing an auxiliary index
14) An index that includes sequences of words or terms of variable length that have been extracted from
a source document is called a
a) Phrase Index
b) Biword index
c) Positional index
d) Inverted Index
15) For a large collection of documents such as the internet that experience frequent change the most
appropriate indexing algorithm would be

a) Block sort-based indexing algorithm

b) Single-pass in memory indexing algorithm
c) Distributed Map-Reduce indexing algorithm
d) Dynamic indexing process employing an auxiliary index
15) Hashing is a process where an item is reduced, through a mathematical process, to an integer.
True
False
16) The formula used to estimate the vocabulary size of a collection is known as:
a) Zipf's law
b) Power law
c) Heaps' law
d) Compression ratio
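For reference on question 16, vocabulary growth is commonly modelled as M = k * T^b, where T is the number of tokens and M the estimated number of distinct terms. The sketch below evaluates that formula; the function name and the default values of `k` and `b` are illustrative assumptions, since suitable parameters depend on the collection.

```python
def estimate_vocabulary_size(num_tokens, k=44.0, b=0.49):
    """Estimate the number of distinct terms as M = k * T**b.

    k and b are assumed, collection-dependent parameters; the defaults
    here are placeholders for illustration only.
    """
    return k * num_tokens ** b


# Vocabulary grows with collection size, but more slowly than the token count.
for tokens in (10_000, 1_000_000, 100_000_000):
    print(tokens, round(estimate_vocabulary_size(tokens)))
```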
17) An approach to compression that takes advantage of the redundancy in the dictionary that results
from common prefixes that come from sorted terms is called:
a) Front Coding
b) Blocked storage
c) Prefix Coding
d) Variable byte encoding
18) A scheme where a weight is assigned to a term based upon the number of occurrences of the term
within a document is called
a) Bag of Words
b) Document Frequency
c) Term Frequency
d) Optimal weight
19) A measure of similarity between two vectors which is determined by measuring the angle between
them is called:
a) Cosine similarity
b) Sin similarity
c) Vector similarity
d) Vector scoring
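The angle-based measure in question 19 can be sketched as follows: the similarity is the dot product of two vectors divided by the product of their lengths. The function name and the example vectors are assumptions for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical term-weight vectors over a shared three-term vocabulary.
doc = [1.0, 2.0, 3.0]
query = [2.0, 4.0, 6.0]   # same direction as doc, twice the length
print(cosine_similarity(doc, query))          # close to 1.0
print(cosine_similarity([1, 0], [0, 1]))      # orthogonal vectors give 0.0
```

Because the score depends only on direction, a long document and a short one with the same term proportions receive the same similarity.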
20) A group of related documents against which information retrieval is employed is called:
a) Corpus
b) Text Database
c) Index Collection
d) Repository
21) A metric derived by taking the log of N divided by the document frequency where N is the total
number of documents in a collection is called
a) document frequency
b) tf-idf weight
c) collection frequency
d) inverse document frequency
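The quantity described in question 21 can be computed directly. This is a minimal sketch that assumes a base-10 logarithm (the question does not fix the base) and uses made-up collection statistics; `tf_idf` shows one common way the result is combined with a raw term count.

```python
import math


def inverse_document_frequency(num_docs, doc_freq):
    """log10(N / df_t); the base-10 logarithm is an assumption of this sketch."""
    return math.log10(num_docs / doc_freq)


def tf_idf(term_freq, num_docs, doc_freq):
    """Weight a term by its raw count in the document times its idf."""
    return term_freq * inverse_document_frequency(num_docs, doc_freq)


# Hypothetical collection of 1,000,000 documents.
N = 1_000_000
print(inverse_document_frequency(N, 1_000))    # rare term: about 3
print(inverse_document_frequency(N, 100_000))  # common term: about 1
print(tf_idf(3, N, 1_000))                     # about 9
```

Rare terms receive a larger weight than common ones, which is the purpose of the measure.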
22) A web page whose content doesn't vary from one request to another is called a:
a) Text Page
b) Dynamic Page
c) Active Server Page
d) Static Page
23) A program that captures and indexes content from web pages is known as what insect:
a) Fly
b) Centipede
c) Mosquito
d) Spider
24) To evaluate the effectiveness of an IR system the output from a standard query executed against the
test IR system is compared with the known output from a:
a) internet collection
b) reference book
c) separate IR system.
d) standard test collection
25) Which of the following is NOT one of the types of queries in a complete search system discussed in
our text?
a) Wildcard Query
b) Boolean retrieval
c) Phrase Query
d) Ranked retrieval Query

26) The standard approach to information retrieval system evaluation revolves around the notion of:

a) Quantity of documents in the collection

b) Relevant and non relevant documents.

c) Accuracy

d) user happiness

27) Which of the following items is not a component of a complete search system?

a) Document cache

b) Indexers

c) Spell correction

d) Horizontal index

28) An approach to computing scores in an IR system that orders documents in the posting list of a term
by decreasing order of term frequency is called:

a) Champion list

b) Impact ordering

c) Cluster pruning

d) Tiered indexes

29) A web link within a web page that references another part of the same page is called a:

a) Out link

b) Vector

c) In link

d) Tendril

30) Information retrieval is querying of ______ textual data.

a) structured

b) unstructured

c) Formatted

d) None

31) The number of documents in the collection that contain a term t is called as
a) Document Index dit

b) Document frequency dft

c) Document Inverse dint

d) Document Incidence Matrix dimt

32) CPM stands for

a) Cost per migrating

b) Cost per making

c) Cost per manage

d) Cost per mil

33) ______ is the fraction of the returned results that are relevant to the information need.

a) Proximity

b) Posting Merge

c) Precision

d) Posting list

34) A dictionary of terms is sometimes also referred to as

a) Corpus

b) Collection

c) Lexicon

d) None of the above

35) SEO stands for

a) Search engine order

b) Search engine organizer

c) Search engine option

d) Search engine optimization

36) ______ filtering recommends products which are similar to the ones that a user has liked in the past.

a) Collaborative based

b) Context based

c) Collection based
d) Content based

37) ______ is the fraction of the relevant documents in the collection returned by the system.

a) Reconnect

b) Recall

c) Reciprocal

d) Retrieved

38) ______ is a page that contains actual information on a topic.

a) Authority

b) Hub

c) Hyperlinks

d) Image

39) Given two strings s1 and s2, the edit distance between them is sometimes known as the

a) Levenshtein distance

b) Isolated-term distance

c) k-gram overlap

d) Jaccard Coefficient
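The edit distance in question 39 counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The following is a minimal dynamic-programming sketch with unit costs; the function name is an assumption.

```python
def edit_distance(s1, s2):
    """Minimum number of insertions, deletions, and substitutions (unit cost)."""
    # prev[j] holds the distance between the processed prefix of s1 and s2[:j].
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            substitution_cost = 0 if c1 == c2 else 1
            curr.append(min(
                prev[j] + 1,                       # delete c1
                curr[j - 1] + 1,                   # insert c2
                prev[j - 1] + substitution_cost,   # substitute or match
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("cat", "cat"))         # 0
```

Keeping only two rows of the table reduces memory to O(len(s2)) while the running time stays O(len(s1) * len(s2)).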

40) Hadoop is a framework that works with a variety of related tools. Common cohorts include

a) MapReduce, Hive and HBase

b) MapReduce, MySQL and Google Apps

c) MapReduce, Hummer and Iguana

d) MapReduce, Heron and Trumpet

41) The purpose of the inverse document frequency is to increase the weight of terms with high
collection frequency

a) True

b) False

42) The basic operation of a web browser is to pass a request to the web server. This request is an
address for a web page and is known as the

a) UAL: Universal Address Locator

b) HTML: Hypertext Markup Language

c) URL: Uniform Resource Locator

d) HTTP: Hypertext transfer protocol

43) Collaborative Filtering has following problems

a) Cold Start

b) Scalability

c) Sparsity

d) All of the above

44) Input, Purpose and Output are the factors of ______.

a) Summarization

b) Question Answering

c) Page Rank

d) Personalized Search

45) Information retrieval systems have much in common with

a) Filing systems

b) Transaction systems

c) Database systems

d) Management systems

46) A deadlock can be broken down by

a) Committing one or more transactions

b) Aborting one or more transactions

c) Rolling back one or more transactions

d) Terminating one or more transactions

47) Which one of the following is not Test Collection and Evaluation Series

a) Text Retrieval Conference (TREC)

b) NII Test Collections for IR Systems (NTCIR)

c) Cross Language Evaluation Forum (CLEF)

d) Collaborative Filtering
48) Information is

a) Data

b) Processed Data

c) Manipulated input

d) Computer output

49) Online transaction processing is used because

a) Disk is used for storing files

b) It is efficient

c) It can handle random queries.

d) Transactions occur in batches

50) The quality of information which is based on understanding user needs

a) Complete

b) Trustworthy

c) Relevant

d) None of the above

51) The primary storage medium for storing archival data is

a) Floppy disk

b) Magnetic disk

c) Magnetic tape

d) CD-ROM

53) Organizations have hierarchical structures because

a) It is convenient to do so

b) It is done by every organization

c) Specific responsibilities can be assigned for each level

d) It provides opportunities for promotions

54) Operational information is

a) Haphazard

b) Well organized
c) Unstructured

d) Partly structured

55) Operational information is needed for

a) Day to day operations

b) Meet government requirements

c) Long range planning

d) Short range planning

56) Data by itself is not useful unless

a) It is massive

b) It is processed to obtain information

c) It is collected from diverse sources

d) It is properly stated

57) For taking decisions data must be

a) Very accurate

b) Massive

c) Processed correctly

d) Collected from diverse sources

58) One of the applications of Personalized Search is:

a) Google

b) Yahoo

c) IBM

d) Alpha Search Engine

59) Boolean retrieval model does not provide provision for:

a) Ranked search

b) Proximity search

c) Phrase search

d) Both proximity and ranked search

60) Which is a good idea for using skip pointers?

a) Fewer skips, larger skip spans

b) None

c) Depends upon the no. of comparisons needed

d) More skips, shorter skip spans

70) Edit distance (Levenshtein distance) is a way of:

a) Context-sensitive spelling correction

b) Document correction

c) Isolated word correction

d) Phonetic correction
71) Permuterm indices are used for solving:
a) None
b) Boolean queries
c) Phrase queries
d) Wildcard queries
72) Benefits of using a hash table is
a) Do not need to rehash everything periodically if vocabulary keeps growing.
b) Lookup in a hash table is faster than lookup in a tree.
c) All of the above
d) No prefix search is required
73) Variable-size postings lists is used when:
a) More seek time is desired and the corpus is dynamic
b) Less seek time is desired and the corpus is dynamic
c) No seek time is desired and the corpus is static

d) Time is desired and the corpus is dynamic

74) Unstructured data tends to refer to information on the web and is processed using:

a) Both

b) Database systems

c) IR systems

d) None

75) If list lengths are x and y, merge takes:

a) O(Yn) operations

b) O(xy) operations

c) O(xn) operations

d) O(x+y) operations
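The merge in question 75 walks two docID-sorted postings lists with one pointer each, so every step advances at least one pointer. The sketch below is one way to write that intersection; the function name and the sample postings are assumptions.

```python
def intersect_postings(p1, p2):
    """Intersect two postings lists that are sorted by docID."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1   # advance the pointer at the smaller docID
        else:
            j += 1
    return answer


# Hypothetical postings lists for two terms.
print(intersect_postings([1, 2, 4, 11, 31, 45, 173, 174], [2, 31, 54, 101]))  # [2, 31]
```

The loop runs at most len(p1) + len(p2) times, which is why both lists must be kept in the same sort order.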

76) Term-document incidence matrix is:

a) Sparse

b) Depends upon the data

c) Dense

d) Cannot predict

77) Blocked sort-based Indexing is a method of:

a) Sorting with more disk seeks.

b) Merging with fewer disk seeks.

c) Comparing with fewer disk seeks.

d) Sorting with fewer disk seeks.

78) Issues in biword indexes are:

a) Any one

b) Index blowup due to bigger dictionary

c) Both

d) False positives

79) Best implementation approach for dynamic indexing is:

a) Periodic re-indexing

b) Using Invalidation bit-vector for deleted docs

c) None

d) Using logarithmic merge

80) The goal of IR is to:

a) Find documents relevant to an information need

b) Find documents relevant to an information need from a given document set

c) Find documents relevant to an information need from a large document set

d) Find documents relevant to an information need from a small document set

81) For postings of length L, no. of skip pointers required are:

a) Use L evenly-spaced skip pointers

b) Use L^2 evenly-spaced skip pointers.

c) Use √L evenly-spaced skip pointers

d) Use 2L evenly-spaced skip pointers.

82) Postings list should be sorted by:

a) Document Frequency

b) DocID

c) TermID

d) Term frequency

83) Benefits of using B-trees:

a) Re-balancing is cheap

b) Balanced trees allow efficient retrieval

c) Faster O(log M)

d) Solves the prefix problem

84) For ad hoc information retrieval system evaluation, ______ is/are the test collections.

a) Cranfield

b) TREC

c) Only a

d) Both a and b

85) The basic formula for paid placement is

a) Pay-per-click ($) = Advertising cost ($) ÷ Ads clicked (#)

b) Pay-per-click ($) = Advertising cost ($) * Ads clicked (#)

c) Pay-per-click ($) = Advertising cost ($) * Ads clicked (#)

d) Both a and b

86) Every web page is assigned ______ score(s).

a) 1

b) 2

c) 4

d) 3

87) ______ maintains the file system tree and the metadata for all the files and directories present in the system.

a) Namenode

b) Datanode

c) Mapper

d) Tracker
88) ______: nodes that can be reached from the giant SCC but cannot reach it.
a) In
b) Out
c) Gcc
d) in-out
89) The first special index for general wild card queries is the ______.
a) k-term index
b) Permuterm index
c) B-tree
d) Hashes
90) ______ mainly encodes numerical and non-text attribute-value data.
a) Data centric XML
b) Text centric XML
c) Both a and b
d) User centric XML

91) Permuterm indexes are used for solving

a) Spelling Checking

b) Boolean queries

c) Phrase queries

d) Wildcard queries
92) A query such as mon* is known as a

a) Trailing wildcard query

b) Leading wildcard query

c) Both a and b

d) Mixed wildcard query

93) CLEF stands for

a) Cross Language Evaluation Forum

b) Cross lingual evaluating field

c) Cross Language Evaluating Field

d) Cross Language Evaluating Forum

94) Precision (P) is the fraction of

a) P(retrieved/relevant)

b) P(relevant/true)

c) P(relevant/retrieved)

d) P(retrieved/true)

95) Each node of the tree is an XML element and is written with an

a) Opening tag

b) Closing tag

c) Both a and b

d) Only a

96) ______ is not one of the basic ranking models of an information retrieval system.

a) Boolean Retrieval

b) Vector Space model

c) Probabilistic model

d) Data model

97) A good ______ page for a topic links to many authority pages for that topic.
a) Crawler

b) SEO

c) Web

d) Hub

98) ______ is the number of documents that contain the term.

a) Term

b) Df

c) Idf

d) Inverse df

99) ______ includes link building, increasing link popularity by submitting to open directories, search engines, link exchange, etc.

a) Off Page SEO

b) In Page SEO

c) Middle Page SEO

d) Both a and b

100) In information retrieval, extremely common words which would appear to be of little value in
helping select documents that are excluded from the index vocabulary are called:

a) Stop Words

b) Tokens

c) Lemmatized Words

d) Stemmed Terms

101) Document frequency of a term is the

a) Number of documents that contain the term

b) None of the above

c) Number of times the term appears in the document

d) Number of times the term appears in the collection

102) Boolean queries often result in

a) Too many or too few results

b) None of the above

c) Too few results

d) Too many results

103) Ranked retrieval models take as input

a) None of the given

b) Boolean queries

c) Logical queries

d) Free text queries

104) What is contiguity hypothesis in vector space classification

a) Documents from different classes don’t overlap

b) Documents in the same class form a contiguous region of space

c) All of the above.

d) Intra cluster similarity is higher than inter-cluster similarity

105) Information is

a) Data

b) Processed Data

c) Manipulated input

d) Computer output

106) Strategic information is needed for

a) Day to day operations

b) Meet government requirements

c) Long range planning

d) Short range planning

107) Strategic information is required by

a) Middle managers

b) Line managers
c) Top managers

d) All workers

108) Tactical information is needed for

a) Day to day operations

b) Meet government requirements

c) Long range planning

d) Short range planning

109) The ______ is a wild card that represents one or more characters

a) Question mark

b) Asterisk

c) Exclamation mark

d) Dollar sign

110) The Search tool is best used when searching for which kind of data.

a) Simple

b) Multiple

c) Unique

d) Formatted

111) Given a document collection which has 35 relevant documents, if an IR system retrieves 10 relevant
and 13 irrelevant documents, what is the precision value of the system?

a) 0.43

b) 0.28

c) 0.33

d) 0.66
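The arithmetic behind question 111 can be checked with a few lines. Precision divides the relevant retrieved documents by everything retrieved, while recall divides them by all relevant documents in the collection. The function names are assumptions; the numbers come from the question.

```python
def precision(relevant_retrieved, total_retrieved):
    """Fraction of retrieved documents that are relevant."""
    return relevant_retrieved / total_retrieved


def recall(relevant_retrieved, total_relevant):
    """Fraction of relevant documents that were retrieved."""
    return relevant_retrieved / total_relevant


relevant_retrieved = 10
irrelevant_retrieved = 13
total_relevant = 35

p = precision(relevant_retrieved, relevant_retrieved + irrelevant_retrieved)
r = recall(relevant_retrieved, total_relevant)
print(round(p, 2))  # 0.43
print(round(r, 2))  # 0.29
```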

112) If the two postings list are of length X and Y , then maximum number of operations needed for
merge is

a) Max(X, Y)

b) X+Y

c) X*Y
d) Min(X, Y)

113) A computer based information system is needed because

(i) The size of organizations has become large and data is massive

(ii) Timely decisions are to be taken based on available data

(iii) Computers are available

(iv) Difficult to get clerks to process data

a) (ii) and (iii)

b) (i) and (ii)

c) (i) and (iv)

d) (iii) and (iv)

114) Measures of Similarity are as Follows :

i. The lengths of the Documents.

ii. The number of terms in common.

iii. Whether the terms are common or unusual.

iv. How many times each term appears.

a) i) & ii)

b) ii) & iii)

c) iii) & iv)

d) i), ii), iii) & iv)

115) Proximity operator is a way of specifying that

a) Two terms in a query must occur close to each other in a document

b) Two terms in a query must occur in between in a document

c) Two terms in a query must occur close to each other in a document

d) None of the above

116) ______ is the task of chopping documents into pieces.

a) Ranked

b) Wild card
c) Tokenization

d) Boolean retrieval

117) A ______ is the class of all tokens containing the same character sequence.

a) Term

b) Token

c) Type

d) Sequence

118) The DOM represents

a) Elements

b) Attributes

c) Text

d) All of the above

119) Data-centric XML mainly encodes

a) Numerical

b) Non text attribute value data

c) Both a and b

d) None of the above

120) XML document retrieval is characterized by

a) Long text field

b) Inexact matching

c) Relevance-ranked results

d) All a, b and c

121) One disadvantage, as outlined in our text, of using a permuterm index for wild card queries is:

a) It requires complex code that is difficult to maintain

b) It has the risk of key collisions which are difficult to resolve

c) The required rotations create a very large dictionary

d) It cannot be used to find terms that are not spelled correctly
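To make the rotations mentioned in question 121 concrete, the sketch below generates every rotation a permuterm index would store for one term, using `$` as an end-of-term marker, and shows how a query with a single `*` can be rotated so the wildcard lands at the end. Both function names are assumptions, and only single-wildcard queries are handled.

```python
def permuterm_rotations(term):
    """All rotations of term + '$' that a permuterm index would store."""
    augmented = term + "$"
    return [augmented[i:] + augmented[:i] for i in range(len(augmented))]


def rotate_wildcard_query(query):
    """Rotate a query containing exactly one '*' so the '*' is the last character."""
    augmented = query + "$"
    star = augmented.index("*")
    return augmented[star + 1:] + augmented[:star + 1]


print(permuterm_rotations("hello"))
# ['hello$', 'ello$h', 'llo$he', 'lo$hel', 'o$hell', '$hello']
print(rotate_wildcard_query("mon*"))  # $mon*
print(rotate_wildcard_query("m*n"))   # n$m*
```

A term of n characters contributes n + 1 dictionary entries, which is where the growth in dictionary size comes from.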

122) Which of the following is NOT a benefit of index compression?

a) Simplified algorithm design

b) Reduction of disk space

c) Faster transfer of data from disk to memory

d) Increased Use of caching

123) Which is not an option for Filter on a text field

a) Begins With

b) Between

c) Contains

d) Ends With

124) Which major database object stores all data

a) Field

b) Query

c) Record

d) Table

125) Given a document containing the sentence “I left my left bag at my home” the number of tokens in
the sentence is

a) 2

b) 8

c) 6

d) 4
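The distinction that question 125 relies on, between tokens (every occurrence) and types (distinct character sequences), can be checked as follows. Splitting on whitespace is an assumption of this sketch; real tokenizers also handle punctuation and case.

```python
sentence = "I left my left bag at my home"

tokens = sentence.split()   # every occurrence counts as a token
types = set(tokens)         # distinct character sequences

print(len(tokens))  # 8
print(len(types))   # 6
```

The words "left" and "my" each occur twice, so they contribute four tokens but only two types.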

126) Phrase queries can be solved using N-grams.

True

False

127) When Lemmatization is applied to the term “Destruction”, to which of the following forms is it
reduced?

a) Destination

b) Destruct
c) Destroy

d) Destruc

128) What is the soundex code for the term “amazing”?

a) A552

b) A252

c) A525

d) A255
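A code like the one asked for in question 128 can be derived mechanically. The sketch below follows one common textbook formulation of Soundex: keep the first letter, map the remaining letters to digits (vowels and H, W, Y to 0), collapse runs of identical digits, drop the zeros, then pad or truncate to four characters. This is a simplified variant and the function name is an assumption; other implementations treat the first letter and H/W differently and can return different codes for some names.

```python
# Digit classes of the simplified Soundex variant used in this sketch.
SOUNDEX_CODES = {}
for letters, digit in (
    ("AEIOUHWY", "0"), ("BFPV", "1"), ("CGJKQSXZ", "2"),
    ("DT", "3"), ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(term):
    """Four-character phonetic code: first letter plus three digits."""
    letters = [ch for ch in term.upper() if ch in SOUNDEX_CODES]
    if not letters:
        return ""
    digits = [SOUNDEX_CODES[ch] for ch in letters[1:]]
    collapsed = []
    for d in digits:
        if not collapsed or collapsed[-1] != d:   # collapse runs of equal digits
            collapsed.append(d)
    tail = "".join(d for d in collapsed if d != "0")
    return (letters[0] + tail + "000")[:4]


print(soundex("amazing"))  # A525
print(soundex("Hermann"))  # H655
```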

128) Hashing is a process where an item is reduced, through a mathematical process, to an integer.

True

False

129) A compression algorithm that results in some loss of data is called:

a) Zipf compression

b) Dictionary compression

c) Lossless compression

d) Lossy compression

130) The 30 most common words account for 30% of the tokens in written text is known as front coding.

True

False

131) An approach to retrieval in a search that is likely (but not precisely) to produce the top K
scoring documents is called:

a) Exact top K document retrieval

b) Top scoring document retrieval

c) Inexact top K document retrieval

d) Imprecise top K document retrieval

132) Recall is the fraction of non relevant documents that are retrieved.

True

False
133) In the context of web search engines the manipulation of web page content for the purpose of
appearing high up in search results for selected query terms is called:

a) Paid inclusion

b) SPAM

c) SEO

d) Link Analysis

134) Results from a search engine that are based upon the retrieval of items using a method of
term weighting such as cosine similarity is a form of

a) Sponsored Search

b) Algorithmic Search

c) Informational Search

d) Navigational Search

135) The list of web pages that a web crawler has queued up to index is called the:

a) Web Page Queue

b) Seed set

c) URL Filter

d) URL Frontier

136) In order to access a particular web site in the internet, the URL must be converted into an
IP address. Which service does this conversion?

a) HTTP

b) TNS

c) DNS

d) DHCP

137) The Search tool CANNOT be used on which major Access object

a) Forms

b) Queries

c) Reports

d) Tables
138) CLEF stands for

a) Cross Language Evaluation Forum

b) Cross lingual evaluating field

c) Cross Language Evaluating Field

d) Cross Language Evaluating Forum

139) Which of the following is not a technique for preparing solid samples in IR spectroscopy?

a) Solids run in solution

b) Mull technique

c) Solid films

d) Thin films

140) Which of the following is the principle of Golay cell which is used as a detector in IR spectroscopy?

a) Expansion of gas upon heating

b) Increase in resistance due to an increase in temperature and vice versa

c) Temperature difference gives rise to a potential difference in the material

d) Decrease in resistance due to an increase in temperature

141) For a moderately large collection of static documents maintained on a single system the most
appropriate indexing algorithm would be:

a) Block sort-based indexing algorithm

b) Single-pass in memory indexing algorithm

c) Distributed Map-Reduce indexing algorithm

d) Dynamic indexing process employing an auxiliary index

142) Weighted zone scoring is sometimes referred to as ranked Boolean retrieval.

True

False

143) An approach to computing scores in an IR system that orders documents in the posting list of a term
by decreasing order of term frequency is called:

a) Champion list

b) Impact ordering
c) Cluster pruning

d) Tiered indexes

144) The process where multiple lists are evaluated using AND or OR operators in a Boolean retrieval
query is called an intersection operation.

True

False

145) Which of the following applications are used in IR

a) Indexing

b) Ranked retrieval

c) Web search

d) All of the above

146) The Components of IR are

a) The user-system interface

b) The matching subsystem

c) Both a and b

d) None of them.

147) The function of Information Retrieval is

a) To make necessary adjustment in the system based on feedback

b) The human-computer interface

c) Computer Vision

d) Cognitive Theory.

148) Arrange the following in sequence

a) Archie , web crawler , Google , wiseNut

b) Archie , google, wiseNut, web crawler

c) Google, Archie, web crawler, wiseNut

d) WiseNut, google, Archie, web crawler

149) Web can be characterised by

a) Search engines

b) Web directories

c) Hyperlink search

d) All of the above

150) SEO stands for

a) System effect off

b) Search engine optimization

c) Search effect optimization

d) System engine off

151) What is direct addressing

a) Distinct array position for every possible key

b) Fewer array position than keys

c) Fewer keys than array positions

d) None of the mentioned

152) What can be the technique to avoid collision ?

a) Make the hash function appear random

b) Use the chaining method

c) Use uniform hashing

d) All of the mentioned

153) What is a hash function ?

a) A function has allocated memory to keys

b) A function that computes the location of the key in the array

c) A function that creates an array

d) None of the mentioned
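Questions 151 to 153 can be tied together with a small example: a hash function maps a key to an array position, and chaining stores colliding keys in the same bucket. The sketch below is illustrative only; the polynomial hash with multiplier 31, the table size, and the class and function names are all assumptions.

```python
def hash_term(term, table_size):
    """Map a string key to a bucket index in the range [0, table_size)."""
    h = 0
    for ch in term:
        h = (h * 31 + ord(ch)) % table_size
    return h


class ChainedHashTable:
    """Hash table that resolves collisions by chaining (one list per bucket)."""

    def __init__(self, table_size=8):
        self.buckets = [[] for _ in range(table_size)]

    def put(self, key, value):
        bucket = self.buckets[hash_term(key, len(self.buckets))]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        bucket = self.buckets[hash_term(key, len(self.buckets))]
        for existing_key, value in bucket:
            if existing_key == key:
                return value
        return default


# With only two buckets, three keys are guaranteed to produce a collision.
table = ChainedHashTable(table_size=2)
for doc_freq, term in enumerate(["index", "query", "corpus"], start=1):
    table.put(term, doc_freq)
print(table.get("query"))    # 2
print(table.get("missing"))  # None
```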

154) A document is ______ if it is one that the user perceives as containing information of value with
respect to their need.

a) Query
b) Relevant

c) Adhoc

d) Irrelevant

155) An ______ need is the topic about which the user desires to know more.

a) Information

b) Relevant

c) Statistical

d) None of the above

156) A search tree commonly used for a dictionary is the ______.

a) Subtrees

b) B-tree

c) Interval tree

d) Web tree

157) The best known search tree is the ______, in which each internal node has two children.

a) Balanced tree

b) Unbalanced tree

c) Internal Node

d) Binary tree

158) ______ is used to communicate with web servers on the internet, which enables it to download and
display the web pages.

a) Web server

b) Search service

c) Web browser

d) None of the above

159) ______ is finding material of an unstructured nature that satisfies an information need from a large collection.

a) Adhoc query

b) Information retrieval
c) Conflation

d) Stemming

160) The core indexing step is ______ the list so that the terms are arranged alphabetically.

a) Grouped

b) Normalized

c) Sorting

d) Recording

161) A search value can be an exact value or it can be

a) Logical operator

b) Relationship

c) Wild card character

d) Comparison operation

162) Instances of the same term are grouped and the result is split into

a) Classes

b) Columns

c) Both a and b

d) Dictionary

163) The ______ operation is efficient so that we can quickly find the documents.

a) Intersection

b) Minus

c) Union

d) Matrix

164) The ______ model is an algebraic model for representing text documents as vectors of identifiers.

a) Index
b) Sorting
c) Relevant
d) Vector Space

165) In web search, the vocabulary size keeps ______.

a) Constant

b) Reducing

c) Fluctuating

d) Growing

166) A ______ function may become insufficient after several years.

a) Variant

b) Hash

c) B-tree

d) Primitive

167) Term ______ is the number of times a term occurs in a document.

a) Relevant

b) Lists

c) Accumulate

d) Frequency

168) The different types of queries used by the user

a) Informational query
b) Transactional query
c) Navigational query
d) All of the above

169) Given two engines A and B, the size of their union may be estimated as

a) |A ∪ B| = |A| + |B| + |A - B|

b) |A ∪ B| = |A| + |B| - |A ∩ B|
c) |A ∪ B| = |A| - |B| + |A ∩ B|
d) |A ∪ B| = |A| - |B| - |A ∩ B|
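The set identity that question 169 turns on can be checked numerically with Python sets: pages indexed by both engines are counted twice when the two sizes are added, so the overlap has to be subtracted once. The page identifiers below are made up for illustration.

```python
# Hypothetical sets of pages indexed by two search engines.
engine_a = {"p1", "p2", "p3", "p4"}
engine_b = {"p3", "p4", "p5"}

overlap = len(engine_a & engine_b)
union_size = len(engine_a) + len(engine_b) - overlap

print(union_size)                 # 5
print(len(engine_a | engine_b))   # 5, computed directly for comparison
```

In practice the overlap of two real engines cannot be enumerated and has to be estimated by sampling.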
170) To process queries from users as quickly as possible is called

a) Speed
b) Quality
c) Interface
d) Query processor
171) The relationship between sites and pages indicated by hyperlinks gives rise to

a) Static page
b) Dynamic page
c) Web graph
d) Size of web page

172) The process that occurs in a series of time-steps in each of which a random choice is made is

a) Markov Chains
b) Rank page
c) Link
d) Transition
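The process described in question 172 can be simulated on a toy example. In the sketch below, each time-step multiplies the current probability distribution by a transition matrix, and repeating the step approaches a steady-state distribution, which is the idea that link-analysis scores build on. The two-state matrix and the function names are assumptions chosen so the result is easy to verify by hand.

```python
def step(distribution, transition):
    """Advance the distribution over states by one time-step."""
    n = len(transition)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


def steady_state(transition, iterations=100):
    """Approximate the stationary distribution by repeated stepping."""
    n = len(transition)
    distribution = [1.0 / n] * n   # start from the uniform distribution
    for _ in range(iterations):
        distribution = step(distribution, transition)
    return distribution


# Hypothetical chain: row i holds the probabilities of moving from state i.
P = [
    [0.5, 0.5],
    [0.25, 0.75],
]
pi = steady_state(P)
print([round(x, 4) for x in pi])  # [0.3333, 0.6667]
```

Each row of the matrix must sum to 1, since the walker always moves to some state at every step.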

173) Two documents are ------------ if they contain some of the same terms.

a) Unique
b) Equal
c) Both a and b
d) Similar

174) Shared Word Count is

a) Weighting is used

b) No weighting is used
c) Some weighting is used
d) None of them

175) NTCIR stands for

a) NII Test Collections for IR systems

b) Nil Test Collections for IR
c) Null Technique Collections for IR
d) Nil Test collaboration for IR

176) Deep expert is the capacity to deliver ------------------- that is relevant to each individual inquirer

a) Same Information
b) False Information
c) Unique Information
d) True Information

177) It requires a large amount of existing data on a user in order to make accurate recommendation

a) Hot start
b) Cold start
c) Both a and b
d) None of them

178) ______ builds systems that automatically answer questions posed by humans in a natural language

a) Query
b) Solution
c) Question Answering
d) Multiple Solution

179) The information needs to be translated into a query by the user

a) The User Task

b) Logical View
c) Logical Task
d) None

180) It contains document by document data

a) Inverted File
b) Combination File
c) Both a and b
d) Sequential File

181) It is group of documents that retrieval is performed on.

a) Term
b) Query
c) Collection
d) Posting
182) The main goal is to find the important meaning and create an internal representation

a) Query evaluation
b) Document Indexing
c) System evaluation
d) None

183) ______ were the first to adopt Information Retrieval systems for retrieving information
a) Laboratory
b) Libraries
c) Industry
d) All of the above

184) It is the topic which the user desires to know more and is differentiated from a query.

a) Posting
b) Term
c) Documents
d) Information need

185) It serves as a witness who knows specific information on a given event.

a) Shallow expert
b) Expert
c) Deep expert
d) None

186) Collaborative filtering has following problems.

a) Cold Start
b) Scalability
c) Both a and b
d) None of them

187) Factors of Summarization are

a) Input, Purpose, Output

b) Purpose, Output, Input
c) Output, Purpose, Input
d) Input, Output, Purpose.

187) XML stands for

a) Extensible Main Language

b) Extensible Markup Language
c) Exists Markup Language
d) Extensible Markup Lingual.

188) Many documents on the web are not in -------------- format.

a) Multicode
b) Unicode
c) Same code
d) Different code
189) It improves search engine ranking of a website.

a) White Hat SEO

b) Black Hat SEO
c) On page SEO
d) Off page SEO

190) Building data structures that enable searching

a) Web Process
b) Index process
c) Query process
d) None

191) Query process comprises of the following sequence.

a) User interaction, Ranking, Evaluation.

b) Ranking, Evaluation, User interaction.
c) Evaluation, User interaction, Ranking
d) Evaluation, Ranking, User information.

192) An advantage of a positional index is that it reduces the asymptotic complexity of a postings
intersection operation.

a) True
b) False

193) Each document has a unique serial number known as

a) Document identifier
b) Document name
c) Document type
d) None of the above

194) A ______ is a sequence of K characters.

a) K-gram
b) Boolean
c) Post filter
d) None of the above
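Character sequences of the kind question 194 refers to can be generated as shown below, together with the Jaccard coefficient that is often used to compare the k-gram sets of a query term and a candidate term during spelling correction. Padding the term with `$` boundary markers and the function names are assumptions of this sketch.

```python
def kgrams(term, k=2):
    """All length-k character sequences of '$' + term + '$'."""
    padded = "$" + term + "$"
    return [padded[i:i + k] for i in range(len(padded) - k + 1)]


def jaccard(grams_a, grams_b):
    """Size of the intersection divided by size of the union of two gram sets."""
    a, b = set(grams_a), set(grams_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


print(kgrams("castle", k=3))
# ['$ca', 'cas', 'ast', 'stl', 'tle', 'le$']
print(jaccard(kgrams("bord"), kgrams("board")))
```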

195) Structure of Web has following entities:

i. Web Graph
ii. Static and Dynamic Pages
iii. Hidden web pages
iv. Size of web page

a) i) & ii)
b) i) & ii)
c) iii) & iv)
d) i),ii),iii) & iv)

196) An XML document can contain

a) Wide variety of data

b) Unique data
c) Simple data
d) Single data

197) Regular keyword queries as in unstructured information retrieval is

a) CO Topics
b) CAS Topics
c) Both a and b
d) None of them.

198) There is ------------ collection of markup tags.

a) Fixed
b) Vast
c) No fixed
d) Large.

199) MapReduce consists of two pieces of code:

a) The Mapper and The Reducer

b) The index and Page rank
c) Input and Output
d) Map and Shuffle.

200) ______ is the transformation of a string of characters into a usually shorter fixed-length value which
represents the original key.

a) Hashing
b) Indexing
c) Querying
d) Searching
[1] Data By Itself Is Not Useful Unless
(A) => It is massive
(B) => It is processed to obtain information
(C) => It is collected from diverse sources
Answer =>> It is processed to obtain information

[2] For Taking Decisions Data Must Be

(A) => Very accurate
(B) => Massive
(C) => Processed correctly
Answer =>> Processed correctly

[3] Strategic Information Is Needed For

(A) => Day to Day operations
(B) => Meet government requirements
(C) => Long range planning
Answer =>> Long range planning

[4] Strategic Information Is Required By

(A) => Middle managers
(B) => Line managers
(C) => Top managers
Answer =>> Top managers

[5] Tactical Information Is Needed For

(A) => Day to Day operations
(B) => Short range planning
(C) => Meet government requirements
Answer =>> Short range planning

[6] Tactical Information Is Required By

(A) => Middle managers
(B) => Line managers
(C) => Top managers
Answer =>> Middle managers

[7] Operational Information Is Needed For

(A) => Day to Day operations
(B) => Meet government requirements
(C) => Long range planning
Answer =>> Day to Day operations

[8] Operational Information Is Required By

(A) => Middle managers
(B) => Line managers
(C) => Top managers
Answer =>> Line managers

[9] Statutory Information Is Needed For

(A) => Day to Day operations
(B) => Meet government requirements
(C) => Long range planning
Answer =>> Meet government requirements

[10] In Motor Car Manufacturing The Following Type Of Information Is Strategic

(A) => Decision on introducing a new model

(B) => Scheduling production
(C) => Assessing competitor car
Answer =>> Decision on introducing a new model

[11] In Motor Car Manufacturing, The Following Type Of Information Is Tactical

(A) => Decision on introducing a new model

(B) => Scheduling production
(C) => Assessing competitor car
Answer =>> Assessing competitor car

[12] A Computer Based Information System Is Needed Because

(A) => The size of organizations has become large and data is massive
(B) => Computers are available
(C) => Difficult to get clerks to process data.
Answer =>> The size of organizations has become large and data is massive

[13] Organizations Are Divided Into Departments Because

(A) => It is convenient to do so
(B) => Each department can be assigned a specific functional responsibility
(C) => It provides opportunities for promotions
Answer =>> Each department can be assigned a specific functional responsibility

[14] Organizations Have Hierarchical Structures Because

(A) => It is convenient to do so
(B) => It is done by every organization
(C) => Specific responsibilities can be assigned for each level
Answer =>> Specific responsibilities can be assigned for each level
[15] Which Of The Following Functions Is Least Likely In An Insurance Company
(A) => Training
(B) => Giving loans
(C) => Bill of material
Answer =>> Bill of material

[16] Which Of The Following Functions Is Most Likely In A University

(A) => Admissions
(B) => Accounting
(C) => Conducting examinations
Answer =>> Conducting examinations

[17] Every Record Stored In A Master File Has A Key Field Because
(A) => It is the most important field
(B) => It acts as a unique identification of records
(C) => It is the key to the database
Answer =>> It acts as a unique identification of records

[18] The Primary Storage Medium For Storing Archival Data Is

(A) => Floppy disc
(B) => Magnetic disk
(C) => Magnetic tape
Answer =>> Magnetic tape

[19] Master Files Are Normally Stored In

(A) => A hard disk
(B) => A tape
(C) => CD-ROM
Answer =>> A hard disk

[20] Master File Is A File Containing

(A) => All master records
(B) => All record relevant to the application
(C) => A collection of data items
Answer =>> All record relevant to the application

[21] Edit Program Is Required To

(A) => Authenticate data entered by an operator
(B) => Format correctly input data
(C) => Detect errors in input data
Answer =>> Detect errors in input data
[22] Data Rejected By Edit Program Are
(A) => Corrected and re-entered
(B) => Removed from processing
(C) => Collected for later use
Answer =>> Corrected and re-entered

[23] Online Transaction Processing Is Used Because

(A) => It is efficient
(B) => Disk is used for storing files
(C) => It can handle random queries
Answer =>> It can handle random queries

[24] A Management Information System Is One Which

(A) => Is required by all managers of the organizations
(B) => Processes data to yield information of value in tactical management
(C) => Provides operational information
Answer =>> Processes data to yield information of value in tactical management

[25] Data Mining Is Used To Aid In

(A) => Operational management
(B) => Analyzing past decision made by managers
(C) => Detecting patterns in operational data
Answer =>> Detecting patterns in operational data

[26] Data Mining Requires

(A) => Large quantities of operational data stored over a period of time
(B) => Lots of tactical data
(C) => Several tape drives to store archival data
Answer =>> Large quantities of operational data stored over a period of time

[27] Decision Support Systems Are Used For

(A) => Management decision making
(B) => Providing tactical information to management
(C) => Providing strategic information to management
Answer =>> Providing strategic information to management

[28] Decision Support Systems Are Used By

(A) => Line managers
(B) => Top-level managers
(C) => Middle level managers
Answer =>> Top-level managers

[29] Decision Support Systems Are Essential For

(A) => Day-to-Day operations of an organization
(B) => Providing statutory information
(C) => Top level strategic decision making
Answer =>> Top level strategic decision making

[30] A Data Dictionary Has A Consolidated List Of Data Contained In

(A) => Data flows
(B) => Data inputs
(C) => Data outputs
Answer =>> Data flows

[31] By Metadata We Mean

(A) => Very large data
(B) => Data about data
(C) => Data dictionary
Answer =>> Data about data

[32] A Data Dictionary Is Usually Developed

(A) => At requirement specification phase
(B) => During feasibility analysis
(C) => When DFD is developed
Answer =>> When DFD is developed

[33] A Data Dictionary Has Information About

(A) => Every data element in a data flow
(B) => Only key data element in a data flow
(C) => Only important data element in a data flow
Answer =>> Every data element in a data flow

[34] A Data Element In A Data Dictionary May Have

(A) => Only integer value
(B) => Only value
(C) => Only real value
Answer =>> Only value

[35] It Is Necessary To Carefully Design Data Input To A Computer Based System Because

(A) => It is good to be careful

(B) => The volume of data handled is large
(C) => The volume of data handled is small
Answer =>> The volume of data handled is large

[36] Error Occurs More Often When

(A) => Data is entered by users
(B) => Data is entered by operators
(C) => When data is hand written by users and entered by operators
Answer =>> When data is hand written by users and entered by operators

[37] In Online Data Entry It Is Possible To

(A) => Give immediate feedback if incorrect data is entered
(B) => Eliminate all errors
(C) => Save data entry operators time
Answer =>> Give immediate feedback if incorrect data is entered

[38] In Interactive Data Input A Menu Is Used To

(A) => Enter new data
(B) => Add/Delete data
(C) => Select one out of many alternatives often by a mouse click
Answer =>> Select one out of many alternatives often by a mouse click

[39] Data Inputs Which Requires Coding Are

(A) => Fields which specify prices
(B) => Key fields
(C) => Name field such as product name
Answer =>> Key fields

[40] By The Term ‘Meaningful Code’ We Understand That The Code

(A) => Conveys information on item being coded
(B) => Is of small length
(C) => Can add new item easi

MCQ

1) A model of information retrieval in which we can pose any query in which search terms are combined with the
operators AND, OR, and NOT:

Ad Hoc Retrieval

Ranked Retrieval Model

Boolean Information Model

Proximity Query Model

2)A data structure that maps terms back to the parts of a document in which they occur is called an (select the
best answer):
Postings list

Incidence Matrix

Dictionary

Inverted Index

The correct answer is: Inverted Index

3)A process to efficiently intersect lists to be able to quickly find documents that contain both terms is referred to
as merging postings lists.

True

False

The correct answer is 'True'.
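
The merge described in this question can be illustrated with a minimal sketch. It assumes both postings lists are already sorted by document ID; the function name `intersect` and the sample lists are hypothetical and chosen only for illustration.

```python
def intersect(p1, p2):
    """Intersect two postings lists that are sorted by document ID."""
    answer = []
    i = j = 0
    # Walk both lists in step; total work is linear in len(p1) + len(p2).
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])  # document contains both terms
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer that is behind
        else:
            j += 1
    return answer


# Hypothetical postings lists for two terms.
postings_a = [1, 2, 4, 11, 31, 45, 173, 174]
postings_b = [2, 31, 54, 101]
result = intersect(postings_a, postings_b)
print(result)
```

Because each pointer only moves forward, no document ID is compared more than once per list, which is why sorted postings make AND queries cheap.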

4)The model of information retrieval in which we can pose any query in the form of a Boolean expression is called
the ranked retrieval model.

True

False

The correct answer is 'False'.

5)The number of times that a word or term occurs in a document is called the:

Proximity Operator

Vocabulary Lexicon

Term Frequency

Indexing Granularity

The correct answer is: Term Frequency

6)Stemming increases the size of the vocabulary.

True

False

The correct answer is 'False'.

7)In information retrieval, extremely common words which would appear to be of little value in helping
select documents that are excluded from the index vocabulary are called:

Stop Words

Tokens

Lemmatized Words

Stemmed Terms

The correct answer is: Stop Words

8)A crude heuristic process that chops off the ends of the words to reduce inflectional forms of words
and reduce the size of the vocabulary is called:

Lemmatization

Case Folding

Truecasing

Stemming

The correct answer is: Stemming

9)An advantage of a positional index is that it reduces the asymptotic complexity of a postings
intersection operation.

True

False

The correct answer is 'False'.

10)An index that includes sequences of words or terms of variable length that have been extracted
from a source document is called a:

Phrase Index

Biword index

Positional index

Inverted Index

The correct answer is: Phrase Index

11)One disadvantage, as outlined in our text, of using a permuterm index for wild card queries is:

It requires complex code that is difficult to maintain

It has the risk of key collisions which are difficult to resolve

The required rotations creates a very large dictionary

It cannot be used to find terms that are not spelled correctly

The correct answer is: The required rotations creates a very large dictionary
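
To see why the rotations inflate the dictionary, here is a small sketch that generates the permuterm entries for a single term. It assumes the usual convention of appending an end-of-term marker `$` before rotating; the function name `permuterm_rotations` is hypothetical.

```python
def permuterm_rotations(term):
    """Return every rotation of term + '$' (the end-of-term marker)."""
    augmented = term + "$"
    return [augmented[i:] + augmented[:i] for i in range(len(augmented))]


rotations = permuterm_rotations("hello")
for rotation in rotations:
    print(rotation)
```

A term of length n contributes n + 1 dictionary entries under this scheme, so the permuterm dictionary grows roughly in proportion to the average term length.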

12)Which of the following is a technique for context sensitive spelling correction:

the Jaccard Coefficient

Soundex algorithms

k-gram indexes
Levenshtein distance

The correct answer is: Soundex algorithms

13)For a very large collection of books of classic literature the most appropriate indexing
algorithm would be:

Block sort-based indexing algorithm

Single-pass in memory indexing algorithm

Distributed Map-Reduce indexing algorithm

Dynamic indexing process employing an auxiliary index

The correct answer is: Distributed Map-Reduce indexing algorithm

14)For a large collection of documents such as the internet that experience frequent change the most
appropriate indexing algorithm would be:

Block sort-based indexing algorithm

Single-pass in memory indexing algorithm

Distributed Map-Reduce indexing algorithm

Dynamic indexing process employing an auxiliary index

The correct answer is: Dynamic indexing process employing an auxiliary index

15)Given two strings s1 and s2, the edit distance between them is sometimes known as the:

Levenshtein distance

isolated-term distance

k-gram overlap

Jaccard Coefficient

The correct answer is: Levenshtein distance
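
As an illustration, the edit distance can be computed with the standard dynamic-programming table. This is a minimal sketch that assumes unit cost for each insertion, deletion and substitution; the function name `edit_distance` is hypothetical.

```python
def edit_distance(s1, s2):
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    m, n = len(s1), len(s2)
    # dist[i][j] = edit distance between s1[:i] and s2[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(n + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution_cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                      # deletion
                dist[i][j - 1] + 1,                      # insertion
                dist[i - 1][j - 1] + substitution_cost,  # match or substitution
            )
    return dist[m][n]


print(edit_distance("kitten", "sitting"))
```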

16)For a moderately large collection of static documents maintained on a single system the most
appropriate indexing algorithm would be:

Block sort-based indexing algorithm

Single-pass in memory indexing algorithm

Distributed Map-Reduce indexing algorithm

Dynamic indexing process employing an auxiliary index

The correct answer is: Single-pass in memory indexing algorithm

17)For a small collection of documents on a personal computer that don't experience any change the
most appropriate indexing algorithm would be:

Block sort-based indexing algorithm

Single-pass in memory indexing algorithm

Distributed Map-Reduce indexing algorithm

Dynamic indexing process employing an auxiliary index

The correct answer is: Block sort-based indexing algorithm

18)Hashing is a process where an item is reduced, through a mathematical process, to an integer.

True

False

The correct answer is 'True'.

19)The size of the document collection that can be indexed by single-pass in-memory indexing
algorithm is limited by the size of the disk storage the computer running the indexer process
has access to.

True
False

The correct answer is 'False'.

20)The formula used to estimate the vocabulary size of a collection is known as:

Zipf's law

Power law

Heaps' law

Compression ratio

The correct answer is: Heaps' law

21)Which of the following is NOT a benefit of index compression?

Simplified algorithm design

Reduction of disk space

Faster transfer of data from disk to memory

Increased Use of caching

The correct answer is: Simplified algorithm design

22)A compression algorithm that results in some loss of data is called:

zipf compression

dictionary compression

lossless compression

lossy compression

The correct answer is: lossy compression

23)An approach to compression that takes advantage of the redundancy in the dictionary that results
from common prefixes that come from sorted terms is called:

Front Coding

Blocked storage

Prefix Coding
Variable byte encoding

The correct answer is: Front Coding
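
The idea behind front coding can be sketched as follows. This is a simplified variant, assumed for illustration only, in which each term in the sorted dictionary is stored as the length of the prefix it shares with the previous term plus the remaining suffix; block-based layouts used in practice differ in detail. The names `front_encode` and `front_decode` are hypothetical.

```python
def front_encode(sorted_terms):
    """Encode each term as (shared prefix length with previous term, suffix)."""
    encoded = []
    previous = ""
    for term in sorted_terms:
        shared = 0
        limit = min(len(previous), len(term))
        while shared < limit and previous[shared] == term[shared]:
            shared += 1
        encoded.append((shared, term[shared:]))
        previous = term
    return encoded


def front_decode(encoded):
    """Rebuild the original term list from (shared length, suffix) pairs."""
    terms = []
    previous = ""
    for shared, suffix in encoded:
        term = previous[:shared] + suffix
        terms.append(term)
        previous = term
    return terms


terms = ["automata", "automate", "automatic", "automation"]
encoded = front_encode(terms)
print(encoded)
```

Only the first term is stored in full; the later entries keep one or two characters each, which is where the saving on sorted dictionaries comes from.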

24)A disadvantage of compression is that it reduces the transfer of data from disk to memory.

True

False

The correct answer is 'False'.

25)The observation that the 30 most common words account for 30% of the tokens in written text is known as front coding.

True

False

The correct answer is 'False'.

26)Weighted zone scoring is sometimes referred to as ranked Boolean retrieval.

True

False

The correct answer is 'True'.

27)In the bag of words model, the exact ordering of terms within the document is both significant and
relevant to processing.

True

False

The correct answer is 'False'.

28)The purpose of the inverse document frequency is to increase the weight of terms with
high collection frequency.

True
False

The correct answer is 'False'.

29)A scheme where a weight is assigned to a term based upon the number of occurrences of the
term within a document is called:

Bag of Words

Document Frequency

Term Frequency

Optimal weight

The correct answer is: Term Frequency

30)The number of documents within a collection that contain a particular term is the collection
frequency of the term.

True

False

The correct answer is 'False'.

31)A metric derived by taking the log of N divided by the document frequency where N is the total
number of documents in a collection is called:

document frequency

tf-idf weight

collection frequency

inverse document frequency

The correct answer is: inverse document frequency
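
The metric in this question can be written out as a short sketch. The question only says "log", so the base-10 logarithm used here is an assumption, as are the function names `idf` and `tf_idf` and the sample numbers.

```python
import math


def idf(n_docs, doc_freq):
    """Inverse document frequency: log of N divided by the document frequency."""
    return math.log10(n_docs / doc_freq)


def tf_idf(term_freq, n_docs, doc_freq):
    """Weight of a term in a document: term frequency times idf."""
    return term_freq * idf(n_docs, doc_freq)


# Hypothetical collection of 1000 documents.
rare = idf(1000, 10)       # term appears in only 10 documents
common = idf(1000, 1000)   # term appears in every document
print(rare, common, tf_idf(3, 1000, 10))
```

A term that appears in every document gets an idf of zero, while a rare term gets a high idf, which is why tf-idf is highest for terms that occur often in a small number of documents.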

32)The tf-idf weight is highest when a term t occurs many times within a small number of documents.
True
False

The correct answer is 'True'.

33)The tf-idf weight is lower when a term t occurs many times in a document or occurs in relatively few
documents.

True

False

The correct answer is 'False'.

34)A measure of similarity between two vectors which is determined by measuring the angle between
them is called:

cosine similarity

sin similarity

vector similarity

vector scoring

The correct answer is: cosine similarity
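
A minimal sketch of cosine similarity for two term-weight vectors of equal length follows. The function name `cosine_similarity` and the sample vectors are hypothetical, and returning 0.0 for a zero-length vector is a convention chosen here, not something the question specifies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_u * norm_v)


# Hypothetical query and document vectors over the same three terms.
query = [1.0, 2.0, 3.0]
document = [2.0, 4.0, 6.0]
print(cosine_similarity(query, document))
```

Vectors that point in the same direction score close to 1 regardless of their lengths, so a long document is not favoured merely for repeating terms more often.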

35)An index that is often supplemental to the inverted index and contains terms from only a particular
field or section of a document is called a parametric index.

True

False

The correct answer is 'True'.

36)A scheme where a weight is assigned to a term based upon the number of occurrences of the
term within a document is called:

Select one:

a. Bag of Words

b. Document Frequency
c. Term Frequency

d. Optimal weight

The correct answer is: Term Frequency

37)A group of related documents against which information retrieval is employed is called:

a. Corpus

b. Text Database

c. Index Collection

d. Repository

The correct answer is: Corpus

38)Weighted zone scoring is referred to as:

a. ranked Boolean retrieval

b. Zipf retrieval

c. Ad Hoc query retrieval

d. Jaccard retrieval

The correct answer is: ranked Boolean retrieval

39)An approach to compression that takes advantage of the redundancy in the dictionary that results
from common prefixes that come from sorted terms is called:

a. Front Coding

b. Blocked storage

c. Prefix Coding

d. Variable byte encoding

The correct answer is: Front Coding

40)True/False: Given two strings s1 and s2, the edit distance between them is sometimes known as the
Levenshtein distance.

True

False

The correct answer is 'True'.

41)True/False: Ad hoc retrieval is a model of information retrieval in which we can pose any query in
which search terms are combined with the operators AND, OR, and NOT.

Select one:

True

False

The correct answer is 'False'.

42)True/False: An advantage of compression is that it reduces the transfer of data from disk to memory.

True

False

The correct answer is 'True'.

43)True/False: The process where multiple lists are evaluated using AND or OR operators in a Boolean
retrieval query is called an intersection operation.

True

False

The correct answer is 'True'.

44)For a small collection of documents on a personal computer that don't experience any change the
most appropriate indexing algorithm would be:

Select one:

a. Block sort-based indexing algorithm

b. Single-pass in memory indexing algorithm

c. Distributed Map-Reduce indexing algorithm

d. Dynamic indexing process employing an auxiliary index

The correct answer is: Block sort-based indexing algorithm

45)True/False: The number of documents within a collection that contain a particular term is the
collection frequency of the term.

True

False

The correct answer is 'False'.

46)True/False: In the bag of words model, the exact ordering of terms within the document is
not relevant to processing.

Select one:

True

False

The correct answer is 'True'.

47)In information retrieval, extremely common words which would appear to be of little value in helping
select documents that are excluded from the index vocabulary are called:

a. Stop Words

b. Tokens

c. Lemmatized Words

d. Stemmed Terms

The correct answer is: Stop Words

48)A process that reduces the size of a vocabulary by reducing to the 'root' of words is called:
a. Stemming

b. Lemmatizing

c. Removal of stop words

d. Posting

e. pruning

The correct answer is: Stemming

49)Which of the following is NOT a benefit of index compression?

a. Simplified algorithm design

b. Reduction of disk space

c. Faster transfer of data from disk to memory

d. Increased Use of caching

The correct answer is: Simplified algorithm design

50)To evaluate the effectiveness of an IR system the output from a standard query executed against the
test IR system is compared with the known output from a:

Select one:

a. internet collection

b. reference book

c. separate IR system.

d. standard test collection

The correct answer is: standard test collection

51)The standard approach to information retrieval system evaluation revolves around the notion of:

a. Quantity of documents in the collection

b. Relevant and non relevant documents.

c. Accuracy

d. user happiness

The correct answer is: Relevant and non relevant documents

52)A web server communicates with a client (browser) using which protocol:

Select one:

a. HTML

b. HTTP

c. FTP

d. Telnet

The correct answer is: HTTP

53)The basic operation of a web browser is to pass a request to the web server. This request is an
address for a web page and is known as the:

a. UAL: Universal Address Locator

b. HTML: Hypertext Markup Language

c. URL: Uniform Resource Locator

d. HTTP: Hypertext transfer protocol

The correct answer is: URL: Uniform Resource Locator

54)A web page whose content doesn't vary from one request to another is called a:

a. Text Page

b. Dynamic Page

c. Active Server Page

d. Static Page

The correct answer is: Static Page

55)A web link within a web page that references another part of the same page is called a:

a. Out link

b. Vector

c. In link

d. Tendril

The correct answer is: In link

56)In the context of web search engines the manipulation of web page content for the purpose of
appearing high up in search results for selected query terms is called:

Select one:

a. Paid inclusion

b. SPAM

c. SEO

d. Link Analysis

The correct answer is: SPAM

57)Results from a search engine that are based upon the retrieval of items using a method of
term weighting such as cosine similarity is a form of:

a. Sponsored Search

b. Algorithmic Search

c. Informational Search

d. Navigational Search

The correct answer is: Algorithmic Search

58)A program that captures and indexes content from web pages is known as what insect:

a. Fly
b. Centipede

c. Mosquito

d. Spider

The correct answer is: Spider

59)The list of web pages that a web crawler has queued up to index is called the:

a. Web Page Queue

b. Seed set

c. URL Filter

d. URL Frontier

The correct answer is: URL Frontier

60)In order to access a particular web site in the internet, the URL must be converted into an IP
address. Which service does this conversion?

a. HTTP

b. TNS

c. DNS

d. DHCP

The correct answer is: DNS

61)For a very large collection of books of classic literature the most appropriate indexing
algorithm would be:

a. Block sort-based indexing algorithm

b. Single-pass in memory indexing algorithm

c. Distributed Map-Reduce indexing algorithm

d. Dynamic indexing process employing an auxiliary index

The correct answer is: Distributed Map-Reduce indexing algorithm

62)Which of the following is a technique for context sensitive spelling correction:

a. the Jaccard Coefficient

b. Soundex algorithms

c. k-gram indexes

d. Levenshtein distance

The correct answer is: Soundex algorithms

63)The formula used to estimate the vocabulary size of a collection is known as:

a. Zipf's law

b. Power law

c. Heaps' law

d. Compression ratio

The correct answer is: Heaps' law

THEORY

Page Rank

PageRank (PR) is an algorithm used by Google Search to rank websites in their search engine
results. PageRank was named after Larry Page, one of the founders of Google. PageRank is a
way of measuring the importance of website pages. According to Google:

PageRank works by counting the number and quality of links to a page to determine a rough
estimate of how important the website is. The underlying assumption is that more important
websites are likely to receive more links from other websites.

How it is calculated

The PageRank is calculated by the number and value of incoming links to a website. Initially,
one link from a site equaled one vote for the site that it was linked to. However, later versions
of the PageRank set 0.25 as the initial value for a new website (based on an assumed
probability distribution between 0 and 1).
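
The idea that a page's score depends on the number and value of its incoming links can be sketched with a simple power iteration. This follows the commonly published formulation, not Google's production method. The damping factor of 0.85, the uniform starting value of 1/N, the even redistribution from pages with no outgoing links, and the four-page graph are all assumptions made for illustration; the function name `pagerank` is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over {page: [pages it links to]}.

    Assumes every linked-to page also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # uniform starting distribution
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its current rank evenly among its out-links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Page with no out-links: spread its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph page C is linked to by three pages and ends up with the highest score, while page D, which nothing links to, ends up with the lowest.
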
MapReduce

MapReduce is a processing technique and a programming model for distributed computing based on
Java. The MapReduce algorithm contains two important tasks, namely Map and Reduce. Map
takes a set of data and converts it into another set of data, where individual elements are
broken down into tuples (key/value pairs). The Reduce task takes the output from a
map as its input and combines those data tuples into a smaller set of tuples. As the sequence of
the name MapReduce implies, the reduce task is always performed after the map job.

Algorithm

Generally, the MapReduce paradigm is based on sending the computation to where the data resides.
A MapReduce program executes in three stages, namely the map stage, shuffle stage, and reduce
stage. Map stage: the map or mapper's job is to process the input data. Generally the input
data is in the form of a file or directory and is stored in the Hadoop file system (HDFS).
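
The three stages can be imitated on a single machine with a word-count sketch. This is plain Python rather than Hadoop code, so nothing here is distributed; the function names `map_phase`, `shuffle_phase` and `reduce_phase` and the sample documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


documents = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(counts)
```

In a real cluster each call to the map function would run on the node holding that piece of input, and the shuffle would move the grouped pairs across the network to the reducers.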

MapReduce in Hadoop

MapReduce Overview. Apache Hadoop MapReduce is a framework for processing large data
sets in parallel across a Hadoop cluster. Data analysis uses a two step map and reduce process.
The job configuration supplies map and reduce analysis functions and the Hadoop
framework provides the scheduling, distribution, and parallelization services. By default, the
MapReduce framework gets input data from the Hadoop Distributed File System (HDFS).

Hadoop (by default, Hadoop uses the cleverly named Hadoop Distributed File System (HDFS))

The Apache Hadoop software library is a framework that allows for the distributed processing of
large data sets across clusters of computers using simple programming models. It is designed to
scale up from single servers to thousands of machines, each offering local computation and
storage.

It’s the tool that actually gets data processed.

It tends to drive people slightly crazy when they work with it.

Link Analysis

Link analysis is a data analysis technique used in network theory to evaluate the
relationships or connections between network nodes. Link analysis is often used in search
engine optimization as well as in intelligence, in security analysis and in market and medical research.

Question answering (QA)

Question answering (QA) is a computer science discipline within the fields of information
retrieval and natural language processing (NLP), which is concerned with building
systems that automatically answer questions posed by humans in a natural language.
A question answering implementation, usually a computer program, may construct its answers by
querying a structured database of knowledge or information, usually a knowledge base. More
commonly, question answering systems can pull answers from an unstructured collection of natural
language documents.

Some examples of natural language document collections used for question answering systems
include:

a local collection of reference texts

internal organization documents and web pages

compiled newswire reports

a set of Wikipedia pages

a subset of World Wide Web pages

Summarization

Text summarization is a way to condense a large amount of information into a concise form by
selecting important information and discarding unimportant and redundant information.

1) Information is
a) Data b) Processed data
c) Manipulated input d) Computer output
Ans: b

2) Which of the following is a characteristic of Data?

a) Numerically expressed b) Affected by various causes
c) Aggregates of facts d) All of these
Ans: d

3) Which of the following is a characteristic of information?

a) Pre-determined objectives b) Collection of data in systematic manner
c) Accuracy in data collection d) All of these
Ans: d

4) A computer based information system is needed because

i) The size of organizations has become large and data is massive
ii) Timely decisions are to be taken based on available data
iii) Computers are available
iv ) Difficult to get clerks to process data
a) (ii) and (iii) b) (i) and (ii)
c) (i) and (iv) d) (iii) and (iv)
Ans: b

5) An MIS objective can be stated as

a) Increase product sales b) Reduce marketing cost
c) Increase sale of product A by 10% in the next year d) All of the above
Ans: b

6) Information systems are organized combination of

a) People, hardware, software, computer networks and data resources b) Hardware, software
c) Computer cables d) None of these
Ans: a

7) One of the main capabilities of ‘IS’ is ______

a) Provide computer for working b) Provide fast and accurate transaction processing
c) Both of above d) None of these
Ans: b

8) IS is needed because it provides support for

a) Business processes, decision-making, and competitive advantage b) Generating reports only
c) Demonstration effect d) None of these
Ans: a

9) Main dimensions of information systems are

a) Organizations and management b) Management and technology
c) Organizations, management and technology d) None of these
Ans: c

10) Components of information systems are

a) Computer and network b) Computer and software
c) People, hardware, software, data and networks d) None of the above
Ans: c

11) The major components of a computer are

a) Memory b) I/O Devices
c) CPU d) All of the above
Ans: d

12) The Central Processing Unit

a) Is operated from the Control Panel b) Controls the Storage Unit
c) Is controlled by the input data entering the system d) Controls all input, output and processing
Ans: d

13) The CPU (Central Processing Unit) consists of

a) Input, output, and processing
b) Input, processing, and storage
c) Control Unit, Arithmetic and Logic Unit, and Primary Storage
d) Control unit, primary storage, and secondary storage
Ans: c

14) Memory is
a) Device that performs a sequence of operations specified by instructions in memory
b) The device where information is stored
c) A sequence of instruction
d) Typically characterized by interactive processing and time slicing of the CPU's time to allow quick response to
each user
Ans: b

15) Which is the component that allows the computer to permanently retain large amounts of data?
a) CPU b) Primary Memory
c) Mass Storage Device d) None of the above
Ans: c

16) Which of the following loses its contents when the computer is turned off?
a) RAM b) ROM
c) PROM d) All of the above
Ans: a

17) The fastest memory in a computer system is

a) ROM b) RAM
c) Cache d) None of these
Ans: c

18) Which of the following is a portable computer

a) Laptops b) Notebook Computer
c) PDAs d) All of the above
Ans: d

19) Why is a desktop computer called a Personal Computer?

a) Because it belongs to a single person
b) Because only one person can use it at any point of time
c) Because only persons can use it, not organizations
d) Because it needs personal attention
Ans: b

20) Which of the following is System Software?

a) MS-Word b) Tally
c) MS-PowerPoint d) Operating System
Ans: d

21) Which of the following is not application Software?

a) Word Processing b) Spreadsheet
c) UNIX d) Desktop Publishing
Ans: c

22) Which of the following is not an output device?

a) Printer b) Keyboard
c) Projector d) Plotter
Ans: b

23) Mouse is which type of device?

a) Extracting device b) Pointing Device
c) Hand device d) Gaming device
Ans: b

24) The wheel on a mouse that is used for scrolling is called

a) Scroll wheel b) Wheel
c) Roller d) None of these
Ans: a

25) Some of the most basic types of output devices is/are

a) Monitors, printers b) Plotters, computer output firms
c) Audio output d) All of the above
Ans: a

26) Mouse, trackball, and joystick are the examples of

a) scanning devices b) storing devices
c) pointing devices d) Multimedia devices
Ans: c

27) The device which is used to input images into the computer is
a) Mouse b) Digital Camera
c) Joystick d) None of the above
Ans: b

28) Which topology requires a central controller or hub?

a) Mesh b) star
c) Bus d) Ring
Ans: b

29) Which topology requires a multipoint connection?

a) Mesh b) star
c) Bus d) Ring
Ans: c

UNIT - II
1) DBMS stands for
a) Data base marginal system c) Data base management system
b) Directory based memory standard d) Dual bus mask storage
Ans: c

2) A Database Management System is

a) Collection of interrelated data b) Collection of programs to access data
c) Collection of data describing one particular enterprise d) All of the above
Ans: a

3) In the relational model, cardinality is termed as:

a) Number of tuples c) Number of tables
b) Number of attributes d) Number of constraints
Ans: a

4) Architecture of the database can be viewed as

a) Two levels b) Four levels
c) Three levels d) One level
Ans: c

5) In a relational model, relations are termed as

a) Tuples b) Attributes
c) Tables d) Rows
Ans: c

6) Related fields in a database are grouped to form a

a) Data File b) Data Record
c) Menu d) Bank
Ans: b

7) The database environment has all of the following components except

a) Users b) Separate files
c) Database d) Database administrator
Ans: b

8) An advantage of the database management approach is

a) Data is dependent on programs b) Database redundancy increases
c) Data is integrated and can be accessed by multiple programs. d) None of the above.
Ans: c

9) The RDBMS terminology for a row is

a) Tuple b) Relation
c) Attribute d) Degree
Ans: a

10) ________ includes review of the existing procedures and information flow.

a) Feasibility Study b) Feasibility report
c) System Design d) System analysis
Ans: a

11) ________ refers to the collection of information pertinent to a systems project.

a) Data transfer b) Data gathering
c) Data Embedding d) Data Request
Ans: b

13) System Development process is also called as

a) System Development Life Cycle b) System Life Cycle
c) Both A and B d) System Process Cycle
Ans: a

15) Which of these sequences is correct for the systems development lifecycle?
a) Initiation, analysis, design, build b) Design, initiation, analysis, build
c) Analysis, design, initiation, build d) Analysis, initiation, design, build
Ans: a

16) Which is not a software life cycle model

a) Spiral Model b) Waterfall Model
c) Prototyping Model d) Capability Maturity Model
Ans: d

17) RAD stands for

a) Rapid Application Development b) Relative Application Development
c) Ready Application Development d) Repeated Application Development
Ans: a

18) The major goal of requirement determination phase of information system development is
a) Determine whether information is needed by an organization
b) Determine what information is needed by an organization
c) Determine how information needed by an organization can be provided
d) Determine when information is to be given
Ans: b

19) Information requirements of an organization can be determined by

a) Interviewing managers and users and arriving at the requirements based on consensus
b) Finding out what similar organizations do
c) Telling organization what they need based on your experience
d) Sending a questionnaire to all employees of the organization
Ans: a

20) A feasibility study is carried out

a) After final requirements specifications are drawn up
b) During the period when requirements specifications are drawn up
c) Before the final requirements specifications are drawn up
d) At any time
Ans: c

21) The main objective of feasibility study is

a) To assess whether it is possible to meet the requirements specifications
b) To assess if it is possible to meet the requirements specified subject to constraints of budget, human
resource and hardware
c) To assist the management in implementing the desired system
d) To remove bottlenecks in implementing the desired system
Ans: b

22) Feasibility study is carried out by

a) Managers of the organization
b) System analyst in consultation with managers of the organization
c) Users of the proposed system
d) Systems designers in consultation with the prospective users of the system
Ans: b

23) The expansion of CASE tools is:

a) Computer Assisted Self Evaluation b) Computer Aided Software Engineering
c) Computer Aided Software Environment d) Core Aids for Software Engineering
Ans: b

24) CASE tools are used by industries to

a) Improve productivity of their software engineers b) Reduce time to develop applications
c) Improve documentation d) All of the above
Ans: d

25) CASE tools are useful

a) Only during system design stage b) During all the phases of system life cycle
c) Only for system documentation d) Only during System analysis stage
Ans: b

26) CASE tools are

a) A Set of rules to be used during system analysis and design
b) Program, packages used during system analysis and design
c) A set of tools used by analysts
d) Needed for use case development
Ans: b

27) Which of the following is NOT a key component of object oriented programming?
a) Inheritance b) Encapsulation
c) Polymorphism d) Parallelism
Ans: d

28) Which of these is TRUE of the relationship between objects and classes?
a) A class is an instance of an object. b) An object is the ancestor of its subclass.
c) An object is an instance of a class. d) An object is the descendant of its super-class
Ans: c
1.Distributed indexing is used in:

Select one:
a. All of the above
b. Web-scale indexing
c. Google data centres
d. Parallel tasking

Ans: a. All of the above

2.Which is a good idea for using skip pointers? Select one:

a. Fewer skips, larger skip spans
b. None
c. Depends upon the no. of comparisons needed
d. More skips, shorter skip spans

Ans: c. Depends upon the no. of comparisons needed

3. Edit distance (Levenshtein distance) is a way of:

Select one:
a. Context-sensitive spelling correction
b. Document correction
c. Isolated word correction
d. Phonetic correction

Ans: c. Isolated word correction

4.Boolean retrieval model does not provide provision for: Select one:
a. Ranked search
b. Proximity search
c. Phrase search
d. Both proximity and ranked search

Ans: d. Both proximity and ranked search

5. Permuterm indices are used for solving:

Select one:
a. None
b. Boolean queries
c. Phrase queries
d. Wildcard queries

Ans: d. Wildcard queries

6. A large repository of documents in IR is called as:

Select one:

a. Corpus
b. Database
c. Dictionary
d. Collection

Ans: a. Corpus

7. Benefits of using a hash table is:

Select one:

a. Do not need to rehash everything periodically if vocabulary keeps growing.

b. Lookup in a hash table is faster than lookup in a tree.

c. All of the above

d. No prefix search is required

Ans: b. Lookup in a hash table is faster than lookup in a tree.

8. Variable-size postings lists is used when:

Select one:
a. More seek time is desired and the corpus is static
b. Less seek time is desired and the corpus is dynamic
c. Less seek time is desired and the corpus is static
d. More seek time is desired and the corpus is dynamic

Ans: d. More seek time is desired and the corpus is dynamic

9. An alternative to equivalence classing is to do:

Select one:
a. Asymmetric expansion
b. Symmetric expansion
c. Case folding
d. Normalization

Ans: a. Asymmetric expansion

10. We need external sorting algorithms to:

Select one:

a. Maximize the disk seek time.

b. Maintain constant disk seek time
c. Minimize the disk seek time.
d. None

Ans: c. Minimize the disk seek time.

11. Benefits of using B-trees:

Select one:
a. Re-balancing is cheap
b. Balanced trees allow efficient retrieval
c. Faster O(log M)
d. Solves the prefix problem.

Ans: d. Solves the prefix problem.

12. Postings list should be sorted by:

Select one:
a. Document Frequency
b. DocID
c. TermID
d. Term frequency

Ans: b. DocID

13. Key idea behind Single-pass in-memory indexing is:

Select one:
a. Don’t sort, Accumulate postings in postings lists as they occur.
b. Generate separate dictionaries for each block.
c. All of the above
d. No need to maintain term-termID mapping across blocks.

Ans: c. All of the above

14. For postings of length L, no. of skip pointers required are:

Select one:
a. Use L evenly-spaced skip pointers

b. Use L^2 evenly-spaced skip pointers.

c. Use L^(1/2) (i.e. √L) evenly-spaced skip pointers

d. Use 2L evenly-spaced skip pointers.

Ans: c. Use L^(1/2) (i.e. √L) evenly-spaced skip pointers

15. For query optimization while intersecting two postings list, we should:

Select one:
a. Process in the order of increasing document frequency
b. Process in any order
c. None of the above
d. Process in the order of decreasing document frequency

Ans: a. Process in the order of increasing document frequency

16. The goal of IR is to:

Select one:
a. find documents relevant to an information need
b. find documents relevant to an information need from a given document set
c. find documents relevant to an information need from a large document set
d. find documents relevant to an information need from a small document set

Ans: c. find documents relevant to an information need from a large document set

17. Best implementation approach for dynamic indexing is:

Select one:
a. Periodic re-indexing
b. Using Invalidation bit-vector for deleted docs
c. None
d. Using logarithmic merge

Ans: d. Using logarithmic merge

18. Issues in biword indexes are:

Select one:
a. Any one
b. Index blowup due to bigger dictionary
c. Both
d. False positives

Ans: c. Both

19. Any string of terms of the following form is called an extended biword:

Select one:
a. NNX*
b. NXNN
c. *NNX
d. NX*N

Ans: d. NX*N

20. Structured data allows for:

Select one:

a. Does not depend on data complexity

b. Less complex queries

c. No relationship

d. More complex queries

Ans: d. More complex queries

21. Blocked sort-based Indexing is a method of:

Select one:
a. Sorting with more disk seeks.
b. Merging with fewer disk seeks.
c. Comparing with fewer disk seeks.
d. Sorting with fewer disk seeks.

Ans: d. Sorting with fewer disk seeks.

22. Term-document incidence matrix is:

Select one:
a. Sparse
b. Depends upon the data
c. Dense
d. Cannot predict

Ans: a. Sparse

23. Lemmatization is a technique for:

Select one:
a. Ranking documents
b. Case folding
c. Normalization
d. Tokenization

Ans: c. Normalization

24. If list lengths are x and y, merge takes:

Select one:
a. O(Yn) operations
b. O(xy) operations
c. O(xn) operations
d. O(x+y) operations

Ans: d. O(x+y) operations
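
The O(x+y) bound comes from the standard two-pointer merge over postings lists sorted by DocID: each comparison advances at least one pointer. A minimal sketch follows; the function name `intersect` and the sample postings are illustrative assumptions, not part of the question bank.

```python
def intersect(p1, p2):
    """Intersect two DocID-sorted postings lists with a linear merge."""
    i = j = 0
    answer = []
    comparisons = 0
    while i < len(p1) and j < len(p2):
        comparisons += 1
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer at the smaller DocID
        else:
            j += 1
    return answer, comparisons


# Hypothetical postings lists, used only for illustration.
list_x = [1, 2, 4, 11, 31, 45, 173, 174]
list_y = [2, 31, 54, 101]
common, steps = intersect(list_x, list_y)
print(common, steps)
```

Because every loop iteration moves at least one pointer forward, the loop body runs at most x + y times.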

25. Unstructured data tends to refer to information on the web and is processed using: Select one:
a. Both
b. Database systems
c. IR systems
d. None

Ans: c. IR systems
Question 1
Consider the following documents:
D1. Cat in the hat
D2. The cat chased the rat
D3. The rat died
D4: The cat died
What is the space requirement for an uncompressed Boolean term-document incidence matrix of the above
documents?

Select one:
7 bytes
28 bits
28 bytes
7 bits
Feedback
The correct answer is: 28 bits
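
The figure can be checked with a short sketch that builds the incidence matrix. It assumes case folding and whitespace tokenization with no stemming or stop-word removal; the variable names are illustrative.

```python
docs = {
    "D1": "Cat in the hat",
    "D2": "The cat chased the rat",
    "D3": "The rat died",
    "D4": "The cat died",
}

# Assumption: lowercase everything and split on whitespace only.
tokenized = {name: text.lower().split() for name, text in docs.items()}
vocabulary = sorted({tok for toks in tokenized.values() for tok in toks})

# One bit per (term, document) cell of the incidence matrix.
matrix = {
    term: [int(term in toks) for toks in tokenized.values()]
    for term in vocabulary
}

bits = len(vocabulary) * len(docs)
print(len(vocabulary), "terms x", len(docs), "documents =", bits, "bits")
```

The vocabulary has 7 distinct terms (cat, chased, died, hat, in, rat, the), and 7 terms times 4 documents gives 28 cells of one bit each.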
Question 2
Which of the following terms have the same soundex code?

Select one or more:
Brightsite
Briteside
Brightside
Feedback
The correct answer is: Brightside, Brightsite
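
A sketch of one common textbook variant of Soundex reproduces this result. As a simplifying assumption, only the letters after the first are encoded; the helper name `soundex` and the table layout are illustrative.

```python
# Letter classes for the Soundex variant sketched here; vowels and H, W, Y map to 0.
CODES = {}
for letters, digit in [
    ("AEIOUHWY", "0"), ("BFPV", "1"), ("CGJKQSXZ", "2"),
    ("DT", "3"), ("L", "4"), ("MN", "5"), ("R", "6"),
]:
    for ch in letters:
        CODES[ch] = digit


def soundex(term):
    """Return a 4-character code: first letter plus three digits."""
    word = [ch for ch in term.upper() if ch in CODES]
    if not word:
        return ""
    digits = [CODES[ch] for ch in word[1:]]
    collapsed = []
    for d in digits:
        # Drop one of each pair of consecutive identical digits.
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    tail = "".join(d for d in collapsed if d != "0")
    return (word[0] + tail + "000")[:4]


for term in ["Brightsite", "Briteside", "Brightside"]:
    print(term, soundex(term))
```

Brightside and Brightsite both reduce to B623, while Briteside becomes B632 because the T precedes the S.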
Question 3
Consider an index for 100000 documents each having a length of 750 words. Assume there are 200K
distinct terms in total. What is the minimum number of bits required for representing the Doc-ID?

Select one:
8 bits
18 bits
17 bits
20 bits
Feedback
The correct answer is: 17 bits
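
Only the number of documents matters here: N distinct DocIDs need ceil(log2 N) bits. A small sketch, assuming DocIDs are numbered from 0 to N-1 (the function name `docid_bits` is illustrative):

```python
def docid_bits(num_docs):
    """Minimum bits to give each of num_docs documents a distinct ID (0 .. num_docs - 1)."""
    # The largest ID is num_docs - 1; bit_length avoids floating-point log2 rounding.
    return (num_docs - 1).bit_length()


print(docid_bits(100000))
```

Since 2^16 = 65,536 is less than 100,000 and 2^17 = 131,072 is greater, 17 bits suffice and 16 do not. The document length and the vocabulary size do not affect the DocID width.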
Question 4
Which of the following is(are) NOT true with Google Search Engine?
Select one:
It offers specialized search services
It does stemming
It does stop-word removal
None of the choices
Feedback
The correct answer is: None of the choices
Question 5
A fragment from an inverted index (augmented with positional information) is given below.
Information: d1:12 ; d2:23,32,43; d3:13, d5:32,45,80
systems: d1:15; d2:34,42; d3: 35, d5: 38
Which of the following phrase(s) has(have) possible occurrences in the above document
sequence?

Select one or more:

“Information retrieval systems”
“Information systems”
“Information theory retrieval systems”
None of the choices
Feedback
The correct answer is: “Information retrieval systems”, “Information theory retrieval systems”
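
Each phrase fixes the distance between "Information" and "systems": 1 position for the two-word phrase, 2 when one word sits between them, 3 when two do. A sketch of that check over the fragment above (the function name `docs_with_gap` is an illustrative assumption):

```python
information = {"d1": [12], "d2": [23, 32, 43], "d3": [13], "d5": [32, 45, 80]}
systems = {"d1": [15], "d2": [34, 42], "d3": [35], "d5": [38]}


def docs_with_gap(first, second, gap):
    """Docs where `second` occurs exactly `gap` positions after `first`."""
    hits = []
    for doc, positions in first.items():
        later = set(second.get(doc, []))
        if any(p + gap in later for p in positions):
            hits.append(doc)
    return hits


for phrase, gap in [
    ("Information systems", 1),
    ("Information retrieval systems", 2),
    ("Information theory retrieval systems", 3),
]:
    print(phrase, docs_with_gap(information, systems, gap))
```

The index fragment cannot confirm the middle words, which is why the question asks only for possible occurrences.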
Question 6
Consider the following two postings list with the skip pointers shown. How many
postings comparisons will be made while intersecting the two lists with skip pointers?

Select one:
7
8
6
9
Feedback
The correct answer is: 9
Question 7
Consider the following fragment of a positional index with the format:
word: document: (position, position, ...); document: (position, ...); ...
Gates: 1: (3); 2: (6); 3: (2,17); 4: (1);
IBM: 4: (3); 7: (14);
Microsoft: 1: (1); 2: (1,21); 3: (3); 5: (16,22,51);
The /k operator, word1 /k word2 finds occurrences of word1 within k words of word2
(either on left or right side), where k is a positive integer argument. Thus k = 1 demands
that word1 be adjacent to word2.
What is the set of documents that satisfy the query Gates /2 Microsoft?

Select one:
1,3
3
1
No document satisfies the query
Feedback
The correct answer is: 1,3
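
The /k operator can be sketched directly from its definition, reading "within k words" as an absolute position difference of at most k. The function name `within_k` is an illustrative assumption, and the brute-force pairwise check is chosen for clarity rather than efficiency.

```python
gates = {1: [3], 2: [6], 3: [2, 17], 4: [1]}
microsoft = {1: [1], 2: [1, 21], 3: [3], 5: [16, 22, 51]}


def within_k(postings_a, postings_b, k):
    """Docs in which some occurrence of A lies within k positions of some occurrence of B."""
    result = []
    for doc in sorted(postings_a.keys() & postings_b.keys()):
        if any(abs(pa - pb) <= k
               for pa in postings_a[doc]
               for pb in postings_b[doc]):
            result.append(doc)
    return result


print(within_k(gates, microsoft, 2))
print(within_k(gates, microsoft, 1))
```

In document 1 the positions are 3 and 1 (difference 2), in document 3 they are 2 and 3 (difference 1), and in document 2 the closest pair is 6 and 1 (difference 5).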
Question 8
Given the query uni*e , if you want to search for permuterm wildcard index, which of the
following keys can be looked upon?

Select one:
e$uni*
e$uin*
$unie*
ie$un*
Feedback
The correct answer is: e$uni*
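
In a permuterm index, a query of the form X*Y is rotated so that the wildcard ends up last, giving the lookup key Y$X*. A minimal sketch for single-wildcard queries; the function names are illustrative assumptions.

```python
def permuterm_key(query):
    """Rotate a single-wildcard query X*Y into the lookup key Y$X*."""
    if query.count("*") != 1:
        raise ValueError("this sketch handles exactly one '*'")
    prefix, suffix = query.split("*")
    return suffix + "$" + prefix + "*"


def rotations(term):
    """All rotations of term + '$', as stored in a permuterm index."""
    marked = term + "$"
    return [marked[i:] + marked[:i] for i in range(len(marked))]


key = permuterm_key("uni*e")
print(key)
# A term matches if one of its rotations starts with the key minus the trailing '*'.
print([t for t in ["unite", "unique", "union"]
       if any(r.startswith(key[:-1]) for r in rotations(t))])
```

The key is then used as a prefix lookup over the stored rotations, so any term that starts with "uni" and ends with "e" is found.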
Question 9
If X denotes the length of string s1 and Y denotes the length of the string s2, then the edit
distance between s1 and s2 is never more than --------------------

Select one:
Min(X,Y)
None of the Choices
Max(X,Y)
X+Y
Feedback
The correct answer is: Max(X,Y)
Question 10
What is the soundex code for the term “amazing”?

Select one:
A552
A252
A525
A255
Feedback
The correct answer is: A525
Question 11
Given a document collection of 1000 documents which has 110 relevant documents for a
given query and if the IR system retrieves 30 relevant and 15 irrelevant documents, what
is the recall value of the system?

Select one:
0.03
0.27
0.33
0.66
Feedback
The correct answer is: 0.27
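
Recall is the number of relevant documents retrieved divided by the number of relevant documents in the collection; precision divides by the number retrieved instead. A small sketch (the function name `precision_recall` is an illustrative assumption):

```python
def precision_recall(relevant_retrieved, irrelevant_retrieved, total_relevant):
    """Return (precision, recall) for one query."""
    retrieved = relevant_retrieved + irrelevant_retrieved
    precision = relevant_retrieved / retrieved
    recall = relevant_retrieved / total_relevant
    return precision, recall


precision, recall = precision_recall(30, 15, 110)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")  # precision = 0.67, recall = 0.27
```

The collection size of 1000 does not enter either formula; recall here is 30/110.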
Question 12
When lemmatization is applied to the term “Destruction”, to which of the following forms
is it reduced?

Select one:
Destruct
Destroy
Destruc
Feedback
The correct answer is: Destroy
Question 13
Variable-size postings lists is used when

Select one:
Less seek time is desired and the corpus is dynamic
Less seek time is desired and the corpus is static
More seek time is desired and the corpus is dynamic
More seek time is desired and the corpus is static
Feedback
The correct answer is: More seek time is desired and the corpus is dynamic
Question 14
Inverted Index Dictionary is sorted by

Select one:
Term frequency
Document Frequency
Term/TermID
DocID
Feedback
The correct answer is: Term/TermID
Question 15
Which of the following is called an extended biword?

Select one:
NXNN
NNX*
NX*N
*NNX
Feedback
The correct answer is: NX*N
Question 16
If the two postings lists are of length X and Y, then the maximum number of operations needed
for merge is

Select one:
max(X,Y)
X+Y
X*Y
min(X,Y)
Feedback
The correct answer is: X+Y
Question 17
Given the Boolean query (cat OR bat) AND NOT (dog OR mat), which
of the following is the equivalent Disjunctive Normal Form of the
above query?

Select one:
(cat AND (NOT dog) AND (NOT mat)) OR (cat AND bat AND(NOT dog))
(cat AND (NOT dog) AND (NOT mat)) OR (bat AND (NOT mat) AND(NOT dog))
None of the Choices
(cat AND bat AND (NOT dog)) OR (cat AND bat AND (NOT mat))
Feedback
The correct answer is: (cat AND (NOT dog) AND (NOT mat)) OR (bat AND (NOT mat) AND (NOT dog))
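
The equivalence follows from De Morgan's law, NOT (dog OR mat) = (NOT dog) AND (NOT mat), followed by distributing AND over OR. It can also be checked exhaustively over all 16 truth assignments, as in this sketch (function names are illustrative):

```python
from itertools import product


def original(cat, bat, dog, mat):
    return (cat or bat) and not (dog or mat)


def dnf(cat, bat, dog, mat):
    return (cat and not dog and not mat) or (bat and not mat and not dog)


# Compare the two forms on every combination of truth values.
equivalent = all(
    original(*values) == dnf(*values)
    for values in product([False, True], repeat=4)
)
print(equivalent)  # True
```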
Question 18
If string s1 = filosophi and s2 = philosophy, what is the minimum edit distance
between s1 and s2?
Select one:
3
5
4
2
Feedback
The correct answer is: 3
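
One edit sequence of length 3 is: substitute f with p, insert h, substitute the final i with y. The standard dynamic-programming computation of Levenshtein distance, sketched below with unit costs for insert, delete and substitute, confirms that no shorter sequence exists (the function name `edit_distance` is illustrative):

```python
def edit_distance(s1, s2):
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(s2) + 1))  # distances from the empty prefix of s1
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # delete c1
                            curr[j - 1] + 1,      # insert c2
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(edit_distance("filosophi", "philosophy"))  # 3
```

The result also respects the bound from Question 9: it never exceeds the length of the longer string.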
Question 19
Given a document containing the sentence “I left my left bag at my home” the number of
tokens in the sentence is
Select one:
8
6
4
Feedback
The correct answer is: 8
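
Tokens count every occurrence, whereas types count distinct tokens. A short sketch, assuming plain whitespace tokenization, shows the difference:

```python
sentence = "I left my left bag at my home"

tokens = sentence.split()   # every occurrence counts
types = set(tokens)         # distinct tokens only

print(len(tokens), "tokens,", len(types), "types")  # 8 tokens, 6 types
```

The words "left" and "my" each occur twice, so the sentence has 8 tokens but only 6 types.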
Question 20
Given a document collection which has 35 relevant documents, if an IR system retrieves 10
relevant and 13 irrelevant documents, what is the precision value of the system?

Select one:
0.43
0.28
0.33
0.66
Feedback
The correct answer is: 0.43
Question 21
Consider the following documents:
Doc1: new home sales top forecasts
Doc2: home sales rise in july
Doc3: increase in home sales in july
Doc4: july new home sales rise
When the term-document incidence matrix is constructed and the query home AND (new OR
july) is executed on it, the documents retrieved will be

Select one:
Doc1
Doc1, Doc3, Doc4
Doc1, Doc4
Doc1, Doc2, Doc3, Doc4
Feedback
The correct answer is: Doc1, Doc2,Doc3,Doc4
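
With an incidence matrix, each term is a bit vector over the documents, and the query becomes bitwise AND and OR on those vectors. A minimal sketch using Python integers as bit vectors (bit i stands for the i-th document; the names are illustrative assumptions):

```python
docs = {
    "Doc1": "new home sales top forecasts",
    "Doc2": "home sales rise in july",
    "Doc3": "increase in home sales in july",
    "Doc4": "july new home sales rise",
}
doc_ids = list(docs)

# Build one integer bit vector per term.
incidence = {}
for i, name in enumerate(doc_ids):
    for term in docs[name].split():
        incidence[term] = incidence.get(term, 0) | (1 << i)


def matches(mask):
    """Translate a bit vector back into document names."""
    return [name for i, name in enumerate(doc_ids) if (mask >> i) & 1]


result = incidence["home"] & (incidence["new"] | incidence["july"])
print(matches(result))
```

Every document contains "home", and each one also contains "new" or "july", so all four match.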
Question 22
Yahoo search engine uses stemming for its Index generation

Select one:
True False
Feedback
The correct answer is 'False'.
Question 23
When stemming is used, it should be used for both indexing and query processing. Select one:
True False
Feedback
The correct answer is 'True'.
Question 24
Boolean Retrieval model maintains the term frequency. Is the statement True or False.

Select one:
True False
Feedback
The correct answer is 'False'.
Question 25
Phrase queries can be solved using N-grams.

Select one:
True False
Feedback
The correct answer is 'False'.
TYCS SEM-6th Information Retrieval (MCQ) Question Bank

1) IR Stands for______________.

a) Information Retrieval
b) Information Retired
c) Inform Retrieval
d) Information Ready

2) Each item in the list is called as______________.

a) Items
b) Posting
c) Query
d) Information
3) etr term is called _________k-grams wildcard query.

a) 3
b)4
c) 1
d)2
4) To search document by _______________ in IR.
a)id

b)docID

c)number

d)#digits

5) SEO stands for _____________ .

a) Search English Optimization
b) Search Engine Optimization
c) Search Engine Operator
d) Search Engine Operation

6) Dictionary performed by _________________pair

a) Key and Value

b) Value and Number
c) Id and Number
d) Name and code
7) An advantage of a positional index is that it reduces the asymptotic complexity of a postings intersection operation.

A) True
B) False

8) _________ can best be described as a programming model used to develop Hadoop-based applications that can
process massive amounts of data.
A) MapReduce
B) Mahout
C) Oozie
D) All of the mentioned

9) The purpose of the inverse document frequency is to increase the weight of terms with high collection frequency.

A) True
B) False

10) URL Stands for ______________________.

a) Uniform Ravar Location

b) Uniform Resource Locator
c) Uni Resource Locate
d) Uniform Reverse Locator
11) A data structure that maps terms back to the parts of a document in which they occur is called an

A) Postings list

B) Incidence Matrix
C) Dictionary
D) Inverted Index
12) The first large information retrieval research group was formed by ____________ at Cornell in 1960.

a) Gerard Salton
b) Ratan Tata
c) Ramesh Bush
d) Think Roy
13) Input, Purpose and Output are the factors of _________ .

a) Summarization

b) Question Answering
c) Page Rank
d) Personalized Search

14) A deadlock can be broken down by

a) Committing one or more transactions

b) Aborting one or more transactions
c) Rolling back one or more transactions
d) Terminating one or more transactions.

15) NLTK stands for ______________ .

a) Natural Language Toolkit

b) Natural Lang Tool
c) Natural Long Tooltip
d) Nature Language Toolkit
16) Online transaction processing is used because

a) disk is used for storing files

b) it is efficient
c) it can handle random queries.
d) Transactions occur in batches

17) The primary storage medium for storing archival data is

a)floppy disk

b)magnetic disk

c)magnetic tape

d)CD- ROM

18) Organizations have hierarchical structures because

a) it is convenient to do so

b) it is done by every organization

c) specific responsibilities can be assigned for each level
d) it provides opportunities for promotions

19) Spelling correction only depends on___________factor.

a) Query
b) term
c) indexpowerd
d)Postings

20) Boolean query operator?

a) +
b) -
c) AND, OR, NOT
d) <<<

21) A computer based information system is needed because

(i) The size of organizations has become large and data is massive
(ii) Timely decisions are to be taken based on available data
(iii) Computers are available
(iv) Difficult to get clerks to process data
a)(ii) and (iii)

b)(i) and (ii)

c)(i) and (iv)

d)(iii) and (iv)

22) Operational information is needed for

a) Day to day operations

b) Meet government requirements

c) Long range planning
d) Short range planning

23) Data by itself is not useful unless

a) It is massive

b) It is processed to obtain information

c) It is collected from diverse sources
d) It is properly stated

24) For taking decisions data must be

a) Very accurate

b) Massive
c) Processed correctly
d) Collected from diverse sources

25) CLEF stands for________

a) Cross Language Evaluation Forum

b) Cross lingual evaluating field
c) Cross Language Evaluating Field
d) Cross Language Evaluating Forum

26)Variable size postings lists is used when

A) More seek time is desired and the corpus is dynamic

B) Less seek time is desired and the corpus is dynamic
C) Less seek time is desired and the corpus is static
D) More seek time is desired and the corpus is dynamic

27)Best implementation approach for dynamic indexing is

A) Periodic re indexing
B) Using Invalidation bit vector for deleted docs
C) None
D) Using logarithmic merge
28)Structured data allows for

A) Does not depend on data complexity

B) Less complex queries
C) No relationship
D) More complex queries

29) Data is represented in _________________ format in an IR system

a) Text

b) Image
c) Audio text media
d) Options a,b,c

30)Term document incidence matrix is

A) Sparse
B) Depends upon the data
C) Dense
D) Cannot predict

31) What is contiguity hypothesis in vector space classification

A) Documents from different classes don't overlap

B) Documents in the same class form a contiguous region of space
C) All of the above.
D) Intra cluster similarity is higher than inter-cluster similarity

32) Tactical information is needed for

A) Day to day operations

B) Meet government requirements
C) Long range planning
D) Short range planning

33) Strategic information is required by

a) Middle managers

b) Line managers
c) Top managers
d) All workers

34) Postings List is like Array structure in IR?

a) True
b) false
35) An index that includes sequences of words or terms of variable length that have been extracted from a source
document is called a

a) Phrase Index
b) Biword index
c) Positional index
d) Inverted Index
36) A process to efficiently intersect lists to be able to quickly find documents that contain both terms is referred to as
merging postings lists.

a) True
b) false

37) The formula used to estimate the vocabulary size of a collection is known as:

a) Zipf's law

b) Power law
c) Heaps' law
d) Compression ratio

38) Weighted zone scoring is sometimes referred to as ranked Boolean retrieval.

a)True

b)False

39)In the bag of words model, the exact ordering of terms within the document is both significant and relevant to
processing.

a)True

b) False

40) The number of times that a word or term occurs in a document is called the:

a)Proximity Operator

b)Vocabulary Lexicon

c)Term Frequency

d)Indexing Granularity
