Information Retrieval and Search
Information retrieval
Retrieval is carried out by querying the database where the structured information is
stored, using an appropriate query language. It is necessary to consider the key elements
that make the search possible and determine its degree of relevance and accuracy, such as
indexes, keywords, and thesauri, as well as the phenomena that can occur in the process,
such as documentary noise and documentary silence. One of the problems that arises when
searching for information is whether what we retrieve is "too much or too little": depending
on the type of search, a multitude of documents or only a very small number may be retrieved.
These phenomena are called documentary noise (too many documents) and documentary silence
(too few).
Essential components
Tools
Databases
Internet
Electronic journals
Search engines. Search engines are tools that allow you to locate and retrieve the
information stored on the internet. They work much like databases: they store pages
together with certain characteristics (metadata) and then, when the user enters some
keywords, they return a list of the most relevant pages.
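The operation described above can be illustrated with a minimal sketch. It assumes a tiny in-memory collection; the page URLs, the metadata fields, and the scoring rule (count of matching keywords) are hypothetical choices made for illustration, not how any particular search engine works.

```python
from collections import defaultdict

# Hypothetical pages, each stored with some metadata (title and keywords).
PAGES = [
    {"url": "https://example.org/roses", "title": "Growing roses",
     "keywords": ["flower", "rose", "garden"]},
    {"url": "https://example.org/florist", "title": "City florist",
     "keywords": ["flower", "shop"]},
    {"url": "https://example.org/bikes", "title": "Bike repair",
     "keywords": ["bicycle", "repair"]},
]


def build_index(pages):
    """Map each metadata term to the set of URLs whose metadata contains it."""
    index = defaultdict(set)
    for page in pages:
        terms = page["title"].lower().split() + [k.lower() for k in page["keywords"]]
        for term in terms:
            index[term].add(page["url"])
    return index


def search(index, query):
    """Return URLs ordered by number of matching query keywords (ties by URL)."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url in index.get(term, ()):
            scores[url] += 1
    return sorted(scores, key=lambda url: (-scores[url], url))


INDEX = build_index(PAGES)
print(search(INDEX, "flower garden"))
```

The rose page matches both keywords and is listed first; the florist page matches only one and follows it.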
General search engines
. Google
. AlltheWeb
. AltaVista
. Excite
. Infoseek
. Lycos
. WebCrawler
. HotBot
Directories. Directories are organized lists that give access to structured, hierarchical
information. They are classified into categories, and the user follows links from the most
general to the most specific. They are recommended for searches where the user does not
know much about the specific topic.
. The Google directory
. Ozú
. El Índice
. Yahoo
Specialized directories and search engines
. Humbul
. Librarians' Index to the Internet
. Internet Public Library
. Scirus
. Search4Science
Meta-search engines. These are search engines that do not search a single database:
when the search terms are entered, they sweep through several different databases, so
the breadth of results is greater.
o Vivisimo
o Dogpile
o Kartoo
o Qbsearch
o Metacrawler
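The sweep through several databases that a meta-search engine performs can be sketched as follows. The two engines here are hypothetical stand-ins, each backed by its own small dictionary; the merge rule (keep first occurrence, drop duplicates) is one simple assumption among many possible strategies.

```python
def metasearch(query, engines):
    """Send the query to every engine and merge the results without duplicates."""
    merged = []
    seen = set()
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Two hypothetical engines, each with its own small "database".
def engine_a(query):
    data = {"thesaurus": ["a.example/1", "shared.example/x"]}
    return data.get(query, [])


def engine_b(query):
    data = {"thesaurus": ["shared.example/x", "b.example/2"]}
    return data.get(query, [])


print(metasearch("thesaurus", [engine_a, engine_b]))
```

The merged list is broader than what either engine returns alone, which is the point made in the definition above.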
Selective search engines. They use a database specialized in a single subject.
o Ask
o Teoma
o Electric Library
o Hieros Gamos
Search programs
Copernic
Intelligent agents. Intelligent agents are tools that locate information automatically.
The user only needs to define a search profile and where it should be run (databases,
websites, etc.), and the agent automatically presents a report on the new information
that appears.
o BookWhere
o BullsEye Pro
o WebSeeker 5
o WebFerret
Indexes.
Subject index: terms ordered according to the subjects covered by the database, the
search engine, etc.
Alphabetical index: list of terms in alphabetical order.
KWIC index (Key Word In Context): a type of permuted index in which the subject content
of a work is represented by keywords taken from its title or from another source of
information in the document.
KWOC index (Key Word Out of Context): a type of permuted index that differs from the KWIC
index in its presentation: the keywords appear as headings on a separate line. Under each
heading appear all the titles, complete or truncated, that contain the keyword in question.
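The difference between the KWIC and KWOC presentations can be sketched in a few lines. The titles and the stopword list are hypothetical, and the KWIC entries are kept as simple (keyword, left context, right context) tuples rather than a typeset index.

```python
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}
TITLES = ["Retrieval of information", "Information noise"]


def kwic(titles):
    """One entry per keyword, kept inside its context: (keyword, left, right)."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() not in STOPWORDS:
                left = " ".join(words[:i])
                right = " ".join(words[i + 1:])
                entries.append((word.lower(), left, right))
    return sorted(entries)


def kwoc(titles):
    """Keyword as a separate heading, followed by the full titles containing it."""
    index = {}
    for title in titles:
        for word in title.split():
            key = word.lower()
            if key not in STOPWORDS and title not in index.setdefault(key, []):
                index[key].append(title)
    return dict(sorted(index.items()))


for keyword, left, right in kwic(TITLES):
    print(f"{left:>15} {keyword.upper()} {right}")

for heading, titles in kwoc(TITLES).items():
    print(heading.upper())
    for title in titles:
        print("   ", title)
```

In the KWIC output each keyword sits in the middle of its title; in the KWOC output it is pulled out as a heading on its own line with the titles listed underneath.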
Keywords.
A significant term in natural language that represents the content of the document.
In information searching, keywords are essential because they allow us to narrow down and
specify the information. The problem lies in choosing the exact word that represents the
content, so it is advisable to use qualifiers. For example, if we enter the word "flower"
in any search engine, we could be looking for the nearest florist, an image of flowers,
or a study of flowers in the different seasons of the year.
Thesaurus
It is a controlled list of terms for an area or field of knowledge, in which the terms
maintain semantic and generic relationships with one another.
Components:
Relationships:
Languages
Each retrieval system has its own query language, which allows the user to "speak" the
same language as the database. Like any other language, it has its own syntax, which
specifies the particular features of the search and determines at every moment the
relationship between the search elements. The grammar rules of the query language are
the operators.
There are no guidelines that tell us exactly how to conduct every search, because each
query is different. That is why it is advisable to define a basic working procedure.
Applying the keywords in the selected search tools
Simple equations
Composite equations
Operators
Logical or Boolean: They allow the words of the query to be converted into mathematical
sets and operated on as sets. The basic operations are addition (OR), subtraction (NOT),
and multiplication (AND).
Logical AND (AND)
Logical NOT (NOT)
Logical OR (OR)
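The treatment of query words as sets can be shown directly with Python's built-in set type. The postings below (which document IDs contain which word) are hypothetical data chosen for illustration.

```python
# Hypothetical postings: for each word, the set of document IDs that contain it.
POSTINGS = {
    "flower": {1, 2, 3},
    "florist": {2, 4},
    "season": {3, 5},
}


def docs(word):
    """Return the set of documents containing the word (empty if unknown)."""
    return POSTINGS.get(word, set())


both = docs("flower") & docs("florist")    # flower AND florist: intersection
either = docs("flower") | docs("florist")  # flower OR florist: union
without = docs("flower") - docs("season")  # flower NOT season: difference

print(both, either, without)
```

AND narrows the result to documents shared by both sets, OR widens it to every document in either set, and NOT removes the documents of the second set from the first.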
Positional: They allow the position of words within the document to be specified.
o Near (NEAR)
o Adjacent
o Phrases
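The positional operators listed above can be sketched over a tokenized document. The exact semantics of NEAR and adjacency differ between retrieval systems; the window of three words and the whitespace tokenization used here are assumptions made only for this illustration.

```python
def positions(tokens, word):
    """Return every index at which the word occurs in the token list."""
    return [i for i, token in enumerate(tokens) if token == word]


def near(text, a, b, max_distance=3):
    """True if a and b occur within max_distance words of each other, in any order."""
    tokens = text.lower().split()
    return any(abs(i - j) <= max_distance
               for i in positions(tokens, a)
               for j in positions(tokens, b))


def adjacent(text, a, b):
    """True if b immediately follows a."""
    tokens = text.lower().split()
    return any(j - i == 1
               for i in positions(tokens, a)
               for j in positions(tokens, b))


def phrase(text, words):
    """True if the words occur consecutively and in the given order."""
    tokens = text.lower().split()
    target = words.lower().split()
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


DOC = "the search engine returns a list of relevant documents"
print(near(DOC, "search", "returns"))      # two words apart
print(adjacent(DOC, "search", "engine"))   # consecutive, in this order
print(phrase(DOC, "relevant documents"))   # exact phrase
```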
Existence: Indicates whether the presence or absence of a word is required in the
retrieved documents.
Presence
Absence
Accuracy: This type of operator is used when the intended query is less specific, since
it allows a search word to be truncated to its root.
Proximity
By field
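Truncating a search word to its root, as described under the accuracy operators, can be sketched as a prefix match. The vocabulary is hypothetical, and the asterisk as truncation symbol is an assumption; the actual symbol depends on the retrieval system.

```python
# Hypothetical vocabulary of indexed terms.
VOCABULARY = ["library", "librarian", "libraries", "liberty", "document", "documentary"]


def truncate_search(root, vocabulary):
    """Return every term that starts with the root (right truncation, e.g. 'librar*')."""
    root = root.lower().rstrip("*")
    return [term for term in vocabulary if term.lower().startswith(root)]


print(truncate_search("librar*", VOCABULARY))
```

A single truncated root retrieves the singular, plural, and derived forms at once, which is why the query becomes less specific.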
Navigation is the consultation and retrieval of information through hypertext systems,
by means of a browser program.
Differences
The essential difference between the two concepts lies in the way information is obtained:
in information retrieval it is obtained linearly, whereas navigation obtains it through
hypertext. This means that knowledge is acquired gradually, as the user delves into the
information nodes on one subject or another according to their interest.
Search engines vs. directories

Search engines: The information is updated automatically over the network.
Directories: The information is updated by hand by a person, who registers the site in
the directory when creating a website.

Search engines: They gather all the information stored on the page.
Directories: They do not store all web content, only the most relevant fields such as
the title, the keywords, etc.

Search engines: They store the information in their own database.
Directories: They store the information in directories classified into categories.

Search engines: The search is conducted in the database through the search equation.
Directories: The search is conducted hierarchically according to the established
categories.

Search engines: The results are presented in order of relevance, according to criteria
established in the search equation.
Directories: The results are presented as a list of all documents in the corresponding
category, without any presentation criterion.

Search engines: Suitable for locating specific information.
Directories: Suitable for locating general information about a topic.
Metadata
Retrieval quality
The following are some basic criteria for ensuring that the retrieval carried out is of
good quality.
Relevance: the quality of a retrieved document of being suited to the information need.
Relevance rate: the coefficient obtained by dividing the number of relevant documents
retrieved by the total number of documents retrieved.
Precision: the ability of the search system to match the search equation with the most
relevant documents. In other words, it concerns the relevant documents retrieved.
Precision rate: the coefficient obtained by dividing the number of relevant documents
retrieved by the total number of documents in the collection.
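The two rates can be computed as simple ratios. The sketch below follows the definitions exactly as given in this text, with hypothetical counts. Note that in much of the information retrieval literature the ratio of relevant retrieved documents to all retrieved documents is called precision, and the ratio of relevant retrieved documents to all relevant documents in the collection is called recall.

```python
def relevance_rate(relevant_retrieved, total_retrieved):
    """Relevant documents retrieved divided by all documents retrieved."""
    if total_retrieved == 0:
        return 0.0
    return relevant_retrieved / total_retrieved


def precision_rate(relevant_retrieved, total_in_collection):
    """Relevant documents retrieved divided by all documents in the collection,
    following the definition used in this text."""
    if total_in_collection == 0:
        return 0.0
    return relevant_retrieved / total_in_collection


# Hypothetical search: 20 documents retrieved, 15 of them relevant,
# from a collection of 1000 documents.
print(relevance_rate(15, 20))
print(precision_rate(15, 1000))
```

A low relevance rate signals documentary noise: many of the retrieved documents do not answer the information need.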