Course Name: Advanced Information Retrieval

The document discusses inverted indexes. It begins by defining an inverted index as a database index that maps content like words or numbers to their locations in documents. It notes inverted indexes allow for fast full-text searches but at the cost of increased processing when adding documents. There are two types: record-level indexes containing word to document mappings, and word-level containing word positions within documents. Advantages are fast searches and common use in search engines, while disadvantages are large storage and maintenance costs for updates. Python code is provided as an implementation example using CountVectorizer.

Uploaded by

jewar

JIMMA UNIVERSITY

JIMMA INSTITUTE OF TECHNOLOGY

FACULTY OF COMPUTING AND INFORMATICS

MSC. IN INFORMATION SCIENCE
(ELECTRONIC AND DIGITAL RESOURCE MANAGEMENT)

Course Name: Advanced Information Retrieval

Assignment 2: Inverted Index

Prepared by: Ruth Wondu

Submitted to: Dr Getachew

January, 2021
1. Definition of Inverted Index
An inverted index (also referred to as a postings file or inverted file) is a database index that
stores a mapping from content, such as words or numbers, to its locations in a table, in a
document, or in a set of documents. It is named in contrast to a forward index, which maps from
documents to content. The purpose of an inverted index is to allow fast full-text searches, at the
cost of increased processing when a document is added to the database. The inverted file may be
the database file itself, rather than its index. It is the most popular data structure used in
document retrieval systems and is used on a large scale, for example in search engines.
Additionally, several significant general-purpose mainframe-based database management systems
have used inverted list architectures, including ADABAS, DATACOM/DB, and Model 204.

Put simply, an inverted index is a hash map-like data structure that guides you from a word to the
documents or web pages that contain it. This is what makes quick full-text searches possible: a
query term is looked up directly in the index instead of being searched for in every document. The
price is extra processing whenever a document is added to the database, because each of its words
must be entered into the index.
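The contrast between a forward index and an inverted index can be sketched in a few lines of Python. This is only an illustration: the sample documents and the function name build_inverted_index are assumptions made for the example, not part of the assignment.

```python
from collections import defaultdict

# Hypothetical sample data: a forward index maps document IDs to their words.
forward_index = {
    1: ["hello", "everyone"],
    2: ["inverted", "index", "is", "fast"],
    3: ["hash", "map", "is", "fast"],
}

def build_inverted_index(forward):
    """Invert a forward index: map each word to the sorted IDs of the documents containing it."""
    inverted = defaultdict(set)
    for doc_id, words in forward.items():
        for word in words:
            inverted[word].add(doc_id)
    return {word: sorted(ids) for word, ids in inverted.items()}

inverted_index = build_inverted_index(forward_index)
print(inverted_index["is"])     # IDs of the documents that contain "is"
print(inverted_index["hello"])  # IDs of the documents that contain "hello"
```

A plain dictionary is used here because it matches the hash map analogy; real retrieval systems typically store postings in more compact, disk-friendly structures.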

There are two types of inverted indexes:

1. A record-level inverted index, also called an 'inverted file index' or just 'inverted file'.
This variant contains a list of references to documents for each word.

2. A word-level inverted index, also called a 'full inverted index' or 'inverted list'. This
variant additionally contains the positions of each word within a document. This form provides
more functionality (such as phrase searches), but it needs more processing power and space to
create.

Suppose we want to search the texts "hello everyone," "this article is based on inverted index,"
and "which is hash map like data structure". If we index by (text, word within the text), counting
both from 1, each word maps to a list of such pairs. For example, "hello" maps to (1, 1),
"everyone" maps to (1, 2), and "is" maps to (2, 3) and (3, 2).
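A word-level index over the three example texts can be sketched as follows. This is a minimal sketch under stated assumptions: words are split on whitespace and lowercased, texts and positions are counted from 1, and the helper names build_positional_index and phrase_search are hypothetical.

```python
from collections import defaultdict

texts = [
    "hello everyone",
    "this article is based on inverted index",
    "which is hash map like data structure",
]

def build_positional_index(texts):
    """Map each word to a list of (text number, word position) pairs, both counted from 1."""
    index = defaultdict(list)
    for text_no, text in enumerate(texts, start=1):
        for position, word in enumerate(text.lower().split(), start=1):
            index[word].append((text_no, position))
    return dict(index)

def phrase_search(index, phrase):
    """Return the numbers of the texts in which the words of `phrase` appear consecutively."""
    words = phrase.lower().split()
    if not words or any(word not in index for word in words):
        return []
    matches = set()
    for text_no, start in index[words[0]]:
        # Every following word must sit exactly one position after the previous one.
        if all((text_no, start + offset) in index[word]
               for offset, word in enumerate(words[1:], start=1)):
            matches.add(text_no)
    return sorted(matches)

positional_index = build_positional_index(texts)
print(positional_index["is"])
print(phrase_search(positional_index, "inverted index"))
```

The phrase search only works because positions are stored; a record-level index could report that both words occur in a text, but not that they are adjacent.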

In short, whichever variant is used, an inverted index stores a mapping from content, such as
words or numbers, to its locations in a document or a set of documents.

2. Advantages and Disadvantages

Advantages

• It allows fast full-text searches.
• It is easy to develop.
• It is the most popular data structure used in document retrieval systems, and is used on a
large scale, for example in search engines.

Disadvantages

• Large storage overhead.
• High maintenance costs on every update, delete, and insert, because processing increases each
time a document is added to the database.
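The trade-off between cheap lookups and costly maintenance can be illustrated with a toy class. This is a sketch only: the class InvertedIndex, its method names, and the sample texts are assumptions made for the example, and real systems use more elaborate update strategies.

```python
from collections import defaultdict

class InvertedIndex:
    """Toy record-level inverted index that supports adding and removing documents."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of document IDs

    def add_document(self, doc_id, text):
        # Adding a document touches one postings list per distinct word.
        for word in set(text.lower().split()):
            self.postings[word].add(doc_id)

    def remove_document(self, doc_id):
        # Without a forward index, deletion has to scan every postings list.
        for word in list(self.postings):
            self.postings[word].discard(doc_id)
            if not self.postings[word]:
                del self.postings[word]

    def search(self, word):
        # A lookup is a single dictionary access on average.
        return sorted(self.postings.get(word.lower(), set()))

index = InvertedIndex()
index.add_document(1, "digital information retrieval")
index.add_document(2, "digital communication systems")
print(index.search("digital"))
index.remove_document(1)
print(index.search("digital"))
```

In this sketch the search stays a single lookup, while every insert and delete has to modify many postings lists, which is where the maintenance cost comes from.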

3. Implementation using a programming language (Python/Java ...); Python is recommended

from sklearn.feature_extraction.text import CountVectorizer

# min_df=1 keeps every term that appears in at least one document
vectorizer = CountVectorizer(min_df=1)

documents = ['Computer program used to retrieve digital information.',
             'Software is necessary for users to access digital information.',
             'Digital ICT is communication through computer-based systems.']

# Rows are documents, columns are vocabulary terms, cells are term counts
In = vectorizer.fit_transform(documents).toarray()
print('In={0}'.format(In))

# vocabulary_ maps each term to its column index in the matrix
print('{0}'.format(vectorizer.vocabulary_))
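The code above prints a term-document count matrix and a vocabulary rather than explicit postings lists. As a complementary sketch, the same three documents can be turned into a record-level inverted index in plain Python. The regular expression below is an assumption chosen to resemble CountVectorizer's default tokenization (lowercased tokens of two or more word characters), and build_postings is a hypothetical helper name.

```python
import re
from collections import defaultdict

documents = ['Computer program used to retrieve digital information.',
             'Software is necessary for users to access digital information.',
             'Digital ICT is communication through computer-based systems.']

# Lowercased tokens of two or more word characters, similar to CountVectorizer's defaults.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")

def build_postings(docs):
    """Return {term: sorted list of 0-based document indices that contain the term}."""
    postings = defaultdict(set)
    for doc_index, doc in enumerate(docs):
        for term in TOKEN_PATTERN.findall(doc.lower()):
            postings[term].add(doc_index)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

postings = build_postings(documents)
for term, doc_ids in postings.items():
    print(term, doc_ids)
```

Each printed line is one entry of the inverted index: a term followed by the indices of the documents in which it occurs.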

