Boolean Retrieval Model

Information Retrieval

Uploaded by

saajalkumarborhade


A model is an idealization or abstraction of an actual process.

Information retrieval models can describe the computational process, for example how
documents are ranked. Note that how documents or indexes are stored is a matter of
implementation, not of the model.

The goal of information retrieval (IR) is to provide users with those documents that will
satisfy their information need. Retrieval models can also attempt to describe the human process,
such as the information need and the interaction.

Retrieval models can be categorized as

1. Boolean retrieval model
2. Vector space model
3. Probabilistic model
4. Model based on belief nets

The Boolean model of information retrieval is a classical information retrieval (IR) model
and is the first and most widely adopted one. Many commercial IR systems still support it today.

Exact vs Best match

In exact match, a query specifies precise criteria. Each document either matches or fails to
match the query. The result of an exact match is a set of documents (without ranking).

In best match, a query describes good or best matching documents. In this case the result is a
ranked list of documents. The Boolean model discussed here is the most common
exact match model.
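
The difference between the two result types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy collection `docs`, the helper names `exact_match` and `best_match`, and the overlap-count score used for ranking are all hypothetical and are not part of the notes.

```python
# Toy collection (assumed for illustration): doc id -> set of lowercase terms.
docs = {
    1: {"boolean", "retrieval", "model"},
    2: {"vector", "space", "model"},
    3: {"boolean", "logic"},
}

def exact_match(required_terms):
    """Return the unordered set of doc ids that contain every required term."""
    return {doc_id for doc_id, terms in docs.items() if required_terms <= terms}

def best_match(query_terms):
    """Return doc ids ranked by the number of query terms they share.

    The overlap count is an assumed, deliberately simple score.
    """
    scores = {doc_id: len(query_terms & terms) for doc_id, terms in docs.items()}
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [d for d in ranked if scores[d] > 0]

query = {"boolean", "model"}
exact_result = exact_match(query)   # a set: documents match or they do not
ranked_result = best_match(query)   # a list: best matching document first
```

Here `exact_result` contains only document 1, while `ranked_result` lists all three documents with document 1 first, because partial matches are kept and ordered rather than discarded.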

Basic Assumption of Boolean Model

1. An index term is either present (1) or absent (0) in the document.
2. All index terms provide equal evidence with respect to information needs.
3. Queries are Boolean combinations of index terms.
o X AND Y: represents the documents that contain both X and Y
o X OR Y: represents the documents that contain X, Y, or both
o NOT X: represents the documents that do not contain X

Consider the sentences below:
1. I am a cow.
2. Cow is what I am.
3. Today is Tuesday.
Now suppose I ask you a question: which sentences contain the term ‘cow’ but
not ‘Tuesday’?
As humans, it is easy for us to say that the answer is sentence 1 and sentence 2.
But how can we model this problem mathematically so that it can be solved by a machine?

Term-Document Incidence Matrix

The term-document incidence matrix is one of the basic techniques to represent text data,
where
> We collect the unique words across all the documents.
> For each document, we put 1 in the cell if the term occurs in the document and 0 otherwise.

For the sentences in our problem statement, the term-document incidence matrix
looks something like this:

Term       Sentence 1   Sentence 2   Sentence 3
a              1            0            0
am             1            1            0
cow            1            1            0
i              1            1            0
is             0            1            1
today          0            0            1
tuesday        0            0            1
what           0            1            0

Term-Document Incidence Matrix for sentences 1, 2 and 3.

Note: Words are normalized, i.e. the same word is not counted twice across the
documents/sentences; each unique word gets exactly one row.
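
The two construction steps above can be sketched in Python as follows. This is a minimal sketch, assuming lowercase folding and a purely alphabetic tokenizer; the names `tokenize`, `vocabulary` and `incidence` are chosen here for illustration only.

```python
import re

sentences = ["I am a cow.", "Cow is what I am.", "Today is Tuesday."]

def tokenize(text):
    """Lowercase the text and split it into alphabetic words (assumed normalization)."""
    return re.findall(r"[a-z]+", text.lower())

# Step 1: unique terms across all sentences, sorted for a stable row order.
vocabulary = sorted({term for s in sentences for term in tokenize(s)})

# Step 2: incidence[term][j] is 1 if the term occurs in sentence j + 1, else 0.
incidence = {
    term: [1 if term in tokenize(s) else 0 for s in sentences]
    for term in vocabulary
}

# Print one row per term, one column per sentence.
for term, row in incidence.items():
    print(f"{term:<8} {row}")
```

With this normalization the row for `cow` is `[1, 1, 0]` and the row for `tuesday` is `[0, 0, 1]`. A dictionary of lists is used only because it is easy to read; for a large collection this matrix is very sparse, so a dense layout like this one would not be a practical choice.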

Boolean Retrieval Model

It is one of the applications of this matrix: we can answer any query that is in the form
of a Boolean expression of terms, that is, in which terms are combined with the
operators AND, OR, and NOT.

For our query, i.e. get the sentences which contain the term ‘cow’ but not ‘tuesday’:
> We get the term vector of each query term, which is simply the row for that term
in the term-document matrix. Example: for ‘cow’, the vector is [1,1,0].
> We complement the vector of every negated term, then perform a bitwise AND operation
between the vectors of the terms in the input query.

Let’s apply the algorithm and see if we get the right answer.

1. Cow vector = [1,1,0]

2. Tuesday vector = [0,0,1]

3. NOT Tuesday vector = [1,1,0]. The NOT vector is obtained by taking the complement of the
original vector.

Perform the bitwise AND operation:

[1,1,0] & [1,1,0] => [1,1,0]

Inference from the result:

In the result of the bitwise AND operation, every index that holds a 1 identifies a
sentence that satisfies the input query. Hence sentences 1 and 2 contain the word ‘cow’
but not ‘tuesday’ and are returned as the result for the query.
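
The same computation can be written as a short Python sketch. The vectors are copied from the worked example; the helper names `vec_not` and `vec_and` are hypothetical, and plain lists of 0/1 values stand in for real bit vectors.

```python
# Term vectors taken from the worked example (one entry per sentence).
cow = [1, 1, 0]
tuesday = [0, 0, 1]

def vec_not(v):
    """Complement a 0/1 vector element by element."""
    return [1 - bit for bit in v]

def vec_and(a, b):
    """Bitwise AND of two 0/1 vectors of equal length."""
    return [x & y for x, y in zip(a, b)]

# Query: cow AND NOT tuesday
result = vec_and(cow, vec_not(tuesday))

# Positions holding a 1 are the matching sentences (numbered from 1).
matching = [i + 1 for i, bit in enumerate(result) if bit == 1]
print(result, matching)
```

Here `result` is `[1, 1, 0]` and `matching` is `[1, 2]`, which agrees with the answer derived by hand.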

Conclusion

The term-document incidence matrix is one of the basic mathematical models for representing text,
and it can be used to answer Boolean expression queries via the Boolean retrieval
model. Below are the key points to consider:

1. It can answer any query which is a Boolean expression.

2. It views a document as a set of terms.

3. It gives good precision, since documents are retrieved only if the condition is matched.
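
To illustrate the first two points, the sketch below treats each sentence as a set of terms and evaluates a nested Boolean expression over them. It is only one possible encoding: the tuple format for queries (for example `("AND", "cow", ("NOT", "tuesday"))`) and the function name `evaluate` are assumptions made for this example, not something the notes prescribe.

```python
# Each sentence from the example, viewed as a set of lowercase terms.
docs = {
    1: {"i", "am", "a", "cow"},
    2: {"cow", "is", "what", "i", "am"},
    3: {"today", "is", "tuesday"},
}
all_ids = set(docs)

def evaluate(query):
    """Return the set of doc ids matching a query.

    A query is either a term (str) or a tuple in an assumed format:
    ("NOT", q), ("AND", q1, q2) or ("OR", q1, q2).
    """
    if isinstance(query, str):
        return {doc_id for doc_id, terms in docs.items() if query in terms}
    op, *args = query
    if op == "NOT":
        return all_ids - evaluate(args[0])
    left, right = (evaluate(arg) for arg in args)
    if op == "AND":
        return left & right
    if op == "OR":
        return left | right
    raise ValueError(f"unknown operator: {op}")

print(evaluate(("AND", "cow", ("NOT", "tuesday"))))
```

The query `cow AND NOT tuesday` evaluates to the set `{1, 2}`. Because every operator maps to a set operation (intersection, union, difference from the full collection), expressions can be nested to any depth.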

Advantages

• Clean formalism
• Easy to implement
• Intuitive concept
• If the resulting document set is either too small or too big, it is directly clear which
operators will produce respectively a bigger or smaller set.
• It gives (expert) users a sense of control over the system. It is immediately clear why a
document has been retrieved given a query.

Disadvantages

• Exact matching may retrieve too few or too many documents
• Hard to translate a query into a Boolean expression
• All terms are equally weighted
• More like data retrieval than information retrieval
• Retrieval based on binary decision criteria with no notion of partial matching
• No ranking of the documents is provided (absence of a grading scale)
• Information need has to be translated into a Boolean expression, which most users find
awkward
• The Boolean queries formulated by the users are most often too simplistic
• The model frequently returns either too few or too many documents in response to a user
query
