PIR MEHR ALI SHAH ARID AGRICULTURE UNIVERSITY
University Institute of Information Technology
CS-802 Information Retrieval
Credit Hours: 3(3-0) Prerequisites: None
Course Learning Outcomes (CLOs)
At the end of the course, students will be able to: (Domain, BT Level*)
1. Understand the basic concepts of information retrieval. (C, 1)
2. Learn tools and techniques to do cutting-edge research in the area of information
retrieval or text mining. (C, 2)
3. Identify the involvement of information retrieval in modern lifestyle and social
media. (C, 3)
4. Get hands-on project experience by developing real-world applications, such as
intelligent tools for improving search accuracy from user feedback, email spam
detection, recommendation systems, or scientific literature organization and
mining. (C, 4)
*BT- Bloom’s Taxonomy, C=Cognitive domain, P=Psychomotor domain, A=Affective domain
Course Contents:
Information retrieval is the process through which a computer system can respond to a user's
query for text-based information on a specific topic. IR was one of the first and remains one of
the most important problems in the domain of natural language processing (NLP). Web search
is the application of information retrieval techniques to the largest corpus of text anywhere --
the web -- and it is the area in which most people interact with IR systems most frequently.
In this course, we will cover basic and advanced techniques for building text-based
information systems, including the following topics:
Efficient text indexing
Boolean and vector-space retrieval models
Evaluation and interface issues
IR techniques for the web, including crawling, link-based algorithms, and metadata
usage
Document clustering and classification
Traditional and machine learning-based ranking approaches
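As a rough illustration of the indexing and Boolean retrieval topics listed above, the following minimal sketch builds an inverted index over a toy corpus and answers a conjunctive (AND) query. The corpus, the whitespace tokenizer, and the function names `build_inverted_index` and `boolean_and` are assumptions made for this example; they are not taken from the course materials or the textbook.

```python
from collections import defaultdict

# Hypothetical toy corpus: document ID -> text (illustrative only).
DOCS = {
    1: "information retrieval on the web",
    2: "boolean retrieval models",
    3: "web crawling and link analysis",
}


def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, query):
    """Return the sorted IDs of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # .get avoids creating empty postings for unseen terms.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


index = build_inverted_index(DOCS)
print(boolean_and(index, "web retrieval"))  # [1]
```

A real system would add tokenization rules, stopword handling, and sorted postings lists with skip pointers; the set intersection here only shows the core idea.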
Course Objective:
By the end of this course the student should:
understand the theoretical basis behind the standard models of IR (Boolean,
Vector-space, Probabilistic and Logical models),
understand the difficulty of representing and retrieving documents, images,
speech, etc.,
be able to implement, run and test a standard IR system,
understand the standard methods for Web indexing and retrieval,
understand how techniques from natural language processing, artificial
intelligence, human-computer interaction and visualization integrate with IR, and
be familiar with various algorithms and systems.
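To give a sense of what implementing a standard IR system can involve, the sketch below ranks a toy corpus against a query using the vector-space model with TF-IDF weights and cosine similarity. It is one common weighting variant (raw term frequency times log(N/df)), chosen here as an assumption; the corpus and all function names are hypothetical and not prescribed by the course.

```python
import math
from collections import Counter

# Hypothetical toy corpus (illustrative only).
DOCS = {
    "d1": "efficient text indexing for search",
    "d2": "vector space retrieval models",
    "d3": "text retrieval and text classification",
}


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def idf_table(docs):
    """Inverse document frequency, log(N / df), for every term in the corpus."""
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are dropped."""
    tf = Counter(tokenize(text))
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by decreasing cosine similarity."""
    idf = idf_table(docs)
    q = tfidf_vector(query, idf)
    scores = {d: cosine(q, tfidf_vector(text, idf)) for d, text in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for doc_id, score in rank("text retrieval", DOCS):
    print(doc_id, round(score, 3))
```

On this corpus, d3 ranks first because it contains both query terms, with "text" occurring twice.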
Teaching Methodology:
Lectures, Written Assignments, Practical labs, Semester Project, Presentations
Course Assessment:
Exams, Assignments, Quizzes. The course will be assessed using a combination of written
examinations, assignments, and quizzes.
Reference Materials:
There are several good textbooks for the topic of information retrieval. The first book listed
below is our official textbook, and the others are recommended references.
1. Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan, and
Hinrich Schuetze, Cambridge University Press, 2007.
2. Search Engines: Information Retrieval in Practice. Bruce Croft, Donald Metzler, and
Trevor Strohman, Pearson Education, 2009.
3. Modern Information Retrieval. Ricardo Baeza-Yates and Berthier Ribeiro-Neto, 2nd
edition, Addison-Wesley, 2011.
4. Information Retrieval: Implementing and Evaluating Search Engines. Stefan Büttcher,
Charles L. A. Clarke, and Gordon V. Cormack, MIT Press, 2010.
Week Contents Theory
1 Introduction to Information Retrieval
Motivation
Information Retrieval vs Data Retrieval
Flashback
2 Models of Information Retrieval
Boolean Model
Vector Space Model
Probabilistic Model
Alternative Models
3 Retrieval Evaluation
Recall and Precision
Alternative Measures
Reference Collections and Evaluation of IR systems
4 Query Languages for IR
Keywords
Boolean Queries
Context Queries
Natural Language Queries
Structural Queries
5 Advanced Query Operations
Relevance Feedback
Query Expansion
Automatic Local Analysis
Automatic Global Analysis
6 Text Indexing, Preprocessing and File Organization
Stopwords, stemming, thesauri
File (text) organization (inverted files, suffix trees/arrays)
Text statistics (properties)
Text compression
7 Text Searching
Knuth-Morris-Pratt
Boyer-Moore family
Suffix automaton
Phrases and Proximity
8 Document Clustering
MID TERM
9 Multimedia Information Retrieval
Similarity Queries
Feature-based Indexing and Searching
Spatial Access Methods
Searching in Multidimensional Spaces
10 Parallel and Distributed IR
MIMD and SIMD Architectures
Collection Partitioning
Source Selection
Query Processing
Peer-to-Peer Architectures and Systems
11 Meta-Ranking
Integrated vs Isolated Methods
Interleaving
Voting
12 Web Search
History of Web
Indexing
Spidering/Crawling
Link Analysis (HITS, PageRank)
13 User Interfaces and Visualization
14 Link Analysis
Ranking the web frontier
The WebGraph framework I: Compression techniques
Extrapolation methods for accelerating PageRank computations
Searching the workplace web
15 Crawling and near-duplicate pages
Mercator: A scalable, extensible web crawler.
A standard for robot exclusion
16 Search applications
Introduce modern applications in search systems, including recommendation,
personalization, and online advertising, if time allows.
Final Exam
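
Week 3 of the schedule lists recall and precision as the core evaluation measures. As a small worked illustration, the sketch below computes both for a single query using the standard set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The document IDs and the function name `precision_recall` are made up for this example.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query.

    Returns 0.0 for a measure whose denominator would be zero.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and relevance judgments (illustrative only).
retrieved = [3, 7, 9, 12]
relevant = [3, 9, 15, 20, 21]

p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.5 0.4
```

Two of the four retrieved documents are relevant (precision 0.5), and two of the five relevant documents were found (recall 0.4).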