Information Retrieval & Search Engines
Instructor: Prof. Shereen Taie
Information Retrieval & Search Engines
BIS216E
Course: Information Retrieval & Search Engines
Course References
• Textbook:
Essential Books:
– SEO 2018: Learn Search Engine Optimization with
Smart Internet Marketing Strategies, Adam Clarke,
Simple Effectiveness Publishing, 2018.
Recommended Books:
– Search Engine Optimization All-in-One For
Dummies, 4th Edition, Bruce Clay and Kristopher B.
Jones, For Dummies (Business & Personal Finance), 2022.
Assessment of Participants
Assessment will be based on the following deliverables:
• Week 7:
• Midterm Exam (15 grades)
• Assignments (15 grades)
• Week 12:
• Evaluation (20 grades): practical assignments and
quizzes
• Participation (10 grades)
• End-of-Term Exam (40 grades)
To pass:
Achieve at least 50% of the total score and at least 12 out of
40 on the final exam.
Group project:
• The aim of this project is for students to develop a simple search
engine.
• Groups of 3 to 5 students.
• The features will be described as separate tasks in the lab.
• Present your project in week 12.
• Presentations should be no longer than 15 minutes.
Introduction to
Information Retrieval
Introducing Information Retrieval
& Search Engines
Information Retrieval
• Information Retrieval (IR) is finding material (usually documents) of
an unstructured nature (usually text) that satisfies an information
need from within large collections (usually stored on computers).
– These days we frequently think first of web search,
but there are many other cases:
• E-mail search
• Searching your laptop
• Corporate knowledge bases
• Legal information retrieval
The problem of IR
• Goal = find documents relevant to an information
need from a large document set
[Figure: an information need is expressed as a query; the IR system performs retrieval over the document collection and returns an answer list.]
Example
Google
Web
What is a Document?
• Examples:
– web pages, email, books, news stories, scholarly
papers, text messages, Word, PowerPoint, and PDF files,
forum postings, patents, IM sessions, etc.
• Common properties
– Significant text content
– Some structure (e.g., title, author, date for papers;
subject, sender, destination for email)
Documents vs. Database Records
• Database records (or tuples in relational databases) are typically
made up of well-defined fields (or attributes)
– e.g., bank records with account numbers,
balances, names, addresses, social security
numbers, dates of birth, etc.
• Easy to compare fields with well-defined semantics to queries in
order to find matches
• Text is more difficult
Documents vs. Records
• Example bank database query
– Find records with balance > $50,000 in branches
located in Amherst, MA.
– Matches easily found by comparison with field values
of records
• Example search engine query
– bank scandals in western mass
– This text must be compared to the text of entire news
stories
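The contrast above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bank records, the news stories, and the helper names (find_records, search_stories, tokenize) are all hypothetical, and the term-overlap score is a deliberately naive stand-in for what a real search engine does.

```python
import re

# Hypothetical bank records: every field has a well-defined meaning.
records = [
    {"account": "A-100", "balance": 72000, "city": "Amherst", "state": "MA"},
    {"account": "A-101", "balance": 12000, "city": "Amherst", "state": "MA"},
    {"account": "A-102", "balance": 95000, "city": "Boston", "state": "MA"},
]

def find_records(records, min_balance, city, state):
    """Database-style query: compare field values directly."""
    return [
        r["account"]
        for r in records
        if r["balance"] > min_balance and r["city"] == city and r["state"] == state
    ]

# Hypothetical news stories: free text with no fields to compare against.
stories = {
    "s1": "Regulators investigate a bank scandal in western Massachusetts.",
    "s2": "Local bank opens a new branch in Amherst.",
    "s3": "Western mass transit plan draws criticism.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def search_stories(stories, query):
    """Naive text search: score a story by how many distinct query terms it contains."""
    query_terms = set(tokenize(query))
    scores = {}
    for story_id, text in stories.items():
        overlap = query_terms & set(tokenize(text))
        if overlap:
            scores[story_id] = len(overlap)
    # Highest score first; ties broken by story id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

matching_accounts = find_records(records, 50_000, "Amherst", "MA")
ranked_stories = search_stories(stories, "bank scandals in western mass")
print(matching_accounts)
print(ranked_stories)
```

In this toy data the field comparison is exact, while the text search is not: "scandals" does not match "scandal", "mass" does not match "Massachusetts", and the transit story still scores because it happens to contain "western" and "mass". This is one way to see why text is more difficult.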
Unstructured (text) vs. structured
(database) data in the mid-nineties
Unstructured (text) vs. structured
(database) data today
Sec. 1.1
Basic assumptions of
Information Retrieval
• Collection: A set of documents
– Assume it is a static collection for the
moment
• Goal: Retrieve documents with information
that is relevant to the user’s information need
and helps the user complete a task
The classic search model
• User task: Get rid of mice in a politically correct way
– Misconception?
• Info need: Info about removing mice without killing them
– Misformulation?
• Query: how trap mice alive
• Search engine: searches the collection with the query
• Results: may lead to query refinement and a new search
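The query refinement loop in the classic search model can be sketched as follows. This is a minimal illustration, not how a real search engine works: the three-document collection, the refined query "trap mouse alive", and the search and tokenize helpers are hypothetical, and matching requires every query term to appear in the document (a simple Boolean AND, chosen here only to keep the example short).

```python
import re

# Hypothetical static collection of three short documents.
collection = {
    "d1": "How to trap a mouse alive with a humane catch and release trap.",
    "d2": "Snap traps and poison for killing mice quickly.",
    "d3": "Keeping pet mice alive and healthy.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def search(collection, query):
    """Return ids of documents that contain every query term (Boolean AND)."""
    terms = set(tokenize(query))
    return sorted(
        doc_id
        for doc_id, text in collection.items()
        if terms <= set(tokenize(text))
    )

# First attempt: the query from the slide. No document contains all four
# terms (d1 says "mouse", not "mice"), so nothing is returned.
first_results = search(collection, "how trap mice alive")

# Refinement: the user reformulates the query after seeing the empty result.
refined_results = search(collection, "trap mouse alive")

print(first_results)
print(refined_results)
```

The first query is a misformulation with respect to this collection: the relevant document exists, but its wording differs from the query. Inspecting the results and rewording the query is the refinement step in the model.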