Unit 2.3 Vector Model

Uploaded by

Sai Buvanesh

Vector Model

Unit II
Introduction
• The vector space model, or term vector model, is
an algebraic model for representing text
documents (and objects in general)
as vectors of identifiers (such as index terms).
• It is used in information filtering, information
retrieval, indexing, and relevance ranking. It was
first used in the SMART Information
Retrieval System.
Definition

Documents and queries are represented as vectors. Each dimension corresponds to a separate
term. If a term occurs in the document, its value in the vector is non-zero. Several different
ways of computing these values, also known as (term) weights, have been developed. One of the
best-known schemes is tf-idf weighting.
The definition of term depends on the application. Typically terms are single words, keywords,
or longer phrases. If words are chosen to be the terms, the dimensionality of the vector is the
number of words in the vocabulary (the number of distinct words occurring in the corpus).
Vector operations can be used to compare documents with queries.
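
The weighting described above can be sketched in Python. This is a minimal illustration, assuming whitespace-tokenized documents and one common tf-idf variant (raw term frequency times log(N / df)); the function name `tfidf_vectors` and the two sample documents are hypothetical and do not come from the slides.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    Weight = raw term frequency * log(N / document frequency).
    This is one common tf-idf variant, chosen here as an assumption.
    """
    # One dimension per distinct term in the corpus.
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # missing terms count as 0
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors

# Hypothetical two-document corpus.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]
vocab, vectors = tfidf_vectors(docs)
```

Each document becomes a vector with one entry per vocabulary word. With this particular idf variant, a term that occurs in every document (such as "the" here) receives weight 0, so it does not help to tell documents apart.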
Vector Model
• The vector model recognizes that Boolean (exact)
matching is too limiting and proposes a framework
in which partial matching is possible.
Ranking Function
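Assuming the ranking function meant here is the cosine similarity named under Advantages, it can be written as follows. The notation is an assumption, not taken from the slides: w_{i,j} and w_{i,q} are the weights of term i in document d_j and in query q, and t is the number of index terms.

```latex
\mathrm{sim}(d_j, q)
  = \frac{\vec{d_j} \cdot \vec{q}}{\lVert \vec{d_j} \rVert \, \lVert \vec{q} \rVert}
  = \frac{\sum_{i=1}^{t} w_{i,j}\, w_{i,q}}
         {\sqrt{\sum_{i=1}^{t} w_{i,j}^{2}} \; \sqrt{\sum_{i=1}^{t} w_{i,q}^{2}}}
```

With non-negative weights the value lies between 0 (no shared terms) and 1 (vectors pointing in the same direction).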
Advantages
• Its term-weighting scheme improves retrieval
quality
• Its partial matching strategy allows retrieval of
documents that approximate the query
conditions.
• Its cosine ranking formula sorts the documents
according to their degree of similarity to the query.
• Document length normalization is naturally built
into the ranking.
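
The partial matching and length normalization listed above can be seen in a small Python sketch. It assumes cosine similarity as the ranking formula; the weights, document ids, and the three-term vocabulary are hypothetical values chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return (doc_id, score) pairs sorted by decreasing similarity."""
    scores = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical weights over the vocabulary (information, retrieval, vector).
docs = {
    "d1": [1.0, 1.0, 0.0],  # contains both query terms
    "d2": [1.0, 0.0, 1.0],  # contains only one query term (partial match)
    "d3": [5.0, 5.0, 0.0],  # same direction as d1, five times "longer"
    "d4": [0.0, 0.0, 1.0],  # shares no term with the query
}
query = [1.0, 1.0, 0.0]
ranking = rank(query, docs)
```

In this sketch d2 is still retrieved with a score of about 0.5 even though it matches only one query term, d1 and d3 receive practically the same score despite their different lengths, and d4 scores 0.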
Disadvantage
• The index terms are assumed to be mutually
independent.
• In practice, leveraging term dependencies is
challenging and might lead to poor results if
not done appropriately.
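
One consequence of the independence assumption can be sketched as follows: because every term gets its own orthogonal axis, a document that uses only a synonym of the query term scores zero. The vocabulary, the binary weighting, and the helper `to_vector` are assumptions made for this illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

# Hypothetical vocabulary: each term is a separate, independent dimension.
vocab = ["car", "automobile", "repair"]

def to_vector(text):
    """Binary term vector over the vocabulary above."""
    words = text.split()
    return [1.0 if term in words else 0.0 for term in vocab]

query = to_vector("car")
doc_synonym = to_vector("automobile")  # related in meaning, different term
doc_exact = to_vector("car repair")    # shares the literal term "car"

sim_synonym = cosine(query, doc_synonym)
sim_exact = cosine(query, doc_exact)
```

Here `sim_synonym` is 0 while `sim_exact` is positive: the model has no way to know that "car" and "automobile" are related unless term dependencies are modeled explicitly.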
