Information Retrieval and Search

The document discusses the process of information retrieval, emphasizing the importance of using various tools such as databases, search engines, and thesauri to effectively locate and recover information. It outlines key concepts like documentary silence and noise, as well as essential components of information retrieval systems, including structured documents and query languages. Additionally, it highlights the skills and competencies required for effective information searching and retrieval.

Information retrieval is the next step after determining information needs. Information can be retrieved through different tools: databases, the Internet, thesauri, ontologies, maps, and so on. Knowing and managing these tools contributes to quality retrieval.

Information retrieval
The retrieval process is carried out by querying the database where the structured information is stored, using an appropriate query language. It is necessary to consider the key elements that enable the search and determine its relevance and accuracy, such as indexes, keywords, and thesauri, as well as the phenomena that can occur in the process, such as documentary noise and silence. One of the problems that arises when searching for information is whether what we retrieve is "much or little": depending on the type of search, either a multitude of documents or only a very small number may be retrieved. These phenomena are called documentary noise and documentary silence, respectively.

Documentary silence: documents that are stored in the database but have not been retrieved, because the search strategy was too specific or the keywords used were not appropriate to define the search.
Documentary noise: documents that are retrieved by the system but are not relevant. This usually happens when the search strategy has been defined too generically.
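
The two phenomena can be illustrated with plain set operations. The sketch below is only an illustration: the document identifiers and the relevance judgments are invented, and a real system would not know the relevant set in advance.

```python
# Hypothetical document identifiers; none of these come from a real database.
relevant = {"d1", "d2", "d3"}     # documents that actually answer the need
retrieved = {"d2", "d3", "d5"}    # documents the system returned

# Documentary silence: relevant documents that were not retrieved.
silence = relevant - retrieved

# Documentary noise: retrieved documents that are not relevant.
noise = retrieved - relevant

print("silence:", sorted(silence))
print("noise:", sorted(noise))
```

A narrower query tends to shrink the retrieved set and increase silence, while a broader query tends to enlarge it and increase noise.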

Concept of information retrieval system

A process in which previously stored information is accessed through computer tools that allow specific search equations to be established. That information must have been structured before it was stored.

Essential components

Structured documents. It is necessary to establish a process where... indexing tools and terminology control.
Databases where the documents are stored. It is necessary to define the query languages and operators that the database will support, and to establish what types of search equations will be allowed.

Tools
Databases

Internet

Electronic journals
Search engines. Search engines are tools that allow you to locate and retrieve information stored on the Internet. They work much like databases: they store pages together with certain characteristics (metadata) and then, given some keywords, return a list of the most relevant pages.
General search engines
- Google [Link]
- AlltheWeb [Link]
- AltaVista [Link]
- Excite [Link]
- Infoseek [Link]
- Lycos [Link]
- WebCrawler [Link]
- HotBot [Link]
Directories. Directories are organized lists that give access to structured, hierarchical information. They are classified into categories, and the user navigates from the most general to the most specific. They are recommended for searches where the user does not know much about the specific topic.
- Google Directory [Link]
- Ozú [Link]
- El Índice [Link]
- Yahoo! [Link]
Specialized directories and search engines
- Humbul [Link]
- Librarians' Index to the Internet [Link]
- Internet Public Library [Link]
- Scirus [Link]
- Search4Science [Link]
Meta-search engines. These are search engines that do not search a single database: when the search terms are entered, they sweep through several databases, so the breadth of results is greater.
- Vivisimo [Link]
- Dogpile [Link]
- Kartoo [Link]
- Qbsearch [Link]
- Metacrawler [Link]
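
The idea of sweeping several sources and merging the results can be sketched as follows. This is a minimal illustration, not how any of the services listed above is implemented: `engine_a`, `engine_b`, and the example URLs are hypothetical stand-ins for real engines.

```python
from typing import Callable, List

# Hypothetical engines: each maps a query to a ranked list of result URLs.
def engine_a(query: str) -> List[str]:
    return ["http://example.org/a", "http://example.org/b"]

def engine_b(query: str) -> List[str]:
    return ["http://example.org/b", "http://example.org/c"]

def meta_search(query: str, engines: List[Callable[[str], List[str]]]) -> List[str]:
    """Send the query to every engine and merge the results, dropping duplicates."""
    merged: List[str] = []
    seen = set()
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

results = meta_search("thesaurus", [engine_a, engine_b])
print(results)
```

Keeping the first occurrence of each result is only one possible merging policy; a real meta-search engine would also need to re-rank the combined list.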
Selective search engines. They use a database specialized in one subject.
- Ask [Link]
- Teoma [Link]
- Electric Library [Link]
- the Sacred Gamos [Link]
Search programs
- Copernic [Link]
Intelligent agents. Intelligent agents are tools that locate information automatically. The user only needs to define a search profile and where it should be run (databases, websites, etc.), and the agent automatically presents a report on new information as it appears.
- BookWhere [Link]
- BullsEye Pro [Link]
- WebSeeker 5 [Link]
- WebFerret [Link]

Indexing languages and terminological control

Indexes.

A list of standardized terms that represent the content of a resource. Some types are:

Subject index: terms ordered according to the subjects covered by the database, the search engine, etc.
Alphabetical index: a list of terms in alphabetical order.
KWIC index: a type of permuted index in which the thematic content of a work is represented by keywords taken from its title or from another source of information about the document.
KWOC index: a type of permuted index that differs from the KWIC index in its presentation: the keywords appear as headings on a separate line. Under each heading appear all the titles, complete or truncated, that contain the keyword in question.
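
The two permuted indexes can be sketched in a few lines. This is a simplified illustration under stated assumptions: the stop-word list and the sample titles are invented, and the KWIC variant shown rotates each title so that the keyword comes first (printed KWIC indexes usually align the keyword in a central column instead).

```python
from typing import Dict, List, Tuple

STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}  # assumed stop list

def kwic(titles: List[str]) -> List[Tuple[str, str]]:
    """Return (keyword, rotated title) pairs sorted by keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            # Rotate the title so the keyword leads; "/" marks the original start.
            rotated = " ".join(words[i:] + ["/"] + words[:i])
            entries.append((word.lower(), rotated))
    return sorted(entries)

def kwoc(titles: List[str]) -> Dict[str, List[str]]:
    """Group full titles under each keyword heading."""
    headings: Dict[str, List[str]] = {}
    for title in titles:
        for word in title.split():
            if word.lower() not in STOP_WORDS:
                headings.setdefault(word.lower(), []).append(title)
    return headings

titles = ["Introduction to Information Retrieval", "Retrieval of Structured Documents"]
kwic_index = kwic(titles)
kwoc_index = kwoc(titles)

for keyword, line in kwic_index:
    print(f"{keyword:<14}{line}")
```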

Keywords.

A significant term in natural language that represents the content of the document.

Keywords are essential when searching for information, because they allow us to narrow down and specify the information sought. The difficulty lies in choosing the exact word that represents the content, so it is advisable to add qualifiers. For example, if we enter the word "flower" in any search engine, we could be looking for the nearest florist, an image of flowers, or a study of flowers in the different seasons of the year.

Meta keywords. To locate resources, most search engines use the keywords of each web page. For this reason, it is essential that each page have a tag that includes the keywords that define it. The exact definition of each keyword is also important, because search engines rely on these keywords to locate a resource or not.
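
As a rough illustration of how a tool might read these tags, the sketch below uses Python's standard `html.parser` module to collect the page title and the contents of a `<meta name="keywords">` tag. The sample page is invented, and the class name `MetaKeywordParser` is hypothetical.

```python
from html.parser import HTMLParser

class MetaKeywordParser(HTMLParser):
    """Collect the <title> text and the content of <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            # Keywords are conventionally separated by commas.
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Invented sample page.
page = """<html><head><title>Seasonal flowers</title>
<meta name="keywords" content="flowers, seasons, botany"></head><body></body></html>"""

parser = MetaKeywordParser()
parser.feed(page)
print(parser.title, parser.keywords)
```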

Thesaurus

A thesaurus is a controlled list of terms about an area or field of knowledge, in which the terms maintain semantic and generic relationships with each other.

Its main feature is that the terms are arranged hierarchically, which allows terminological precision when searching for information.

Components:

Accepted or preferred descriptors: standardized terms (they have undergone a filtering process that eliminates plurals, avoids synonyms, etc.) that the thesaurus considers suitable for assigning to a document and that subsequently facilitate retrieval.
Non-authorized descriptors: terms that, although normalized, are not considered suitable for use (they are usually synonyms, terms not used in the field in question, etc.).

Relationships:

Hierarchical: indicate when one term is more specific than another.
Associative: indicate that the terms are related in some way.
Synonymy: indicate that two terms are synonyms and which of them is used as the accepted term.
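
The components and relationships described above map naturally onto a small data structure. The sketch below is illustrative only: the terms, the field names (`broader`, `narrower`, `related`, `use`), and the helper functions are assumptions, not part of any particular thesaurus standard or tool.

```python
# Hypothetical mini-thesaurus; terms and relationships are invented.
thesaurus = {
    "plants":  {"narrower": ["flowers", "trees"], "related": ["botany"]},
    "flowers": {"broader": ["plants"], "narrower": ["roses"], "related": ["floristry"]},
    "roses":   {"broader": ["flowers"]},
    "blooms":  {"use": "flowers"},  # non-authorized term pointing to the accepted descriptor
}

def preferred(term):
    """Resolve a non-authorized term to its accepted descriptor."""
    return thesaurus.get(term, {}).get("use", term)

def expand(term):
    """Return the accepted descriptor plus all of its narrower terms, recursively."""
    term = preferred(term)
    result = [term]
    for child in thesaurus.get(term, {}).get("narrower", []):
        result.extend(expand(child))
    return result

print(preferred("blooms"))
print(expand("plants"))
```

Expanding a query term to its narrower terms is one way a hierarchical thesaurus can reduce documentary silence without the user having to list every specific term.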

Query languages and search equations

Languages

Each retrieval system has its own query language, which allows the user to "speak" the same language as the database. Like any other language, it has its own syntax, which specifies the particular features of the search and determines at each moment the relationship between the search elements. The grammar rules of the query language are the operators.

How to formulate a search strategy

There are no guidelines that tell us exactly how to conduct every search, because each query is different. That is why it is advisable to define a basic working procedure:

Presenting the topic from different points of view.
Determining what is known about the topic.
Formulating the search by:
  - selecting keywords that represent what we are looking for (using dictionaries, synonyms, thesauri, ontologies, etc.);
  - translating important words into other languages (English).
Selecting search tools (indexes, search engines, meta-search engines). It is recommended to use different tools simultaneously.
Applying the keywords in the selected search tools.

Simple equations

Composite equations

Operators

Logical or Boolean: they convert the words of the query into mathematical sets and operate on the words as if they were sets. The basic operations are addition (OR), subtraction (NOT), and multiplication (AND).
  - Logical AND
  - Logical NOT
  - Logical OR
Positional: they specify the position of words within the document.
  - Near (NEAR)
  - Adjacent
  - Phrases
Existence: they indicate whether the presence or absence of a word is required in the retrieved documents.
  - Presence
  - Absence
Accuracy: this type of operator is used when the intended query is less specific, since it allows a search word to be truncated to its root.
  - Proximity
  - By fields
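
The Boolean operators and truncation can be sketched as set operations over an inverted index. The three-document collection below is invented, and the `*` truncation symbol is an assumption: real systems use different symbols and syntaxes.

```python
from collections import defaultdict

# Hypothetical three-document collection.
docs = {
    1: "flowers bloom in spring",
    2: "florist shops sell flowers",
    3: "spring weather forecast",
}

# Inverted index: each word maps to the set of documents that contain it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

def term(word):
    """Documents containing the word; a trailing * truncates the word to its root."""
    if word.endswith("*"):
        root = word[:-1]
        matches = [ids for w, ids in index.items() if w.startswith(root)]
        return set().union(*matches)
    return set(index.get(word, set()))

both = term("flowers") & term("spring")     # AND: intersection ("multiplication")
either = term("flowers") | term("spring")   # OR: union ("addition")
without = term("flowers") - term("spring")  # NOT: difference ("subtraction")
truncated = term("flo*")                    # matches "flowers" and "florist"

print(both, either, without, truncated)
```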

Navigation versus Information Retrieval


Concept

Navigation refers to the program that allows you to consult and obtain information through hypertext systems.

Differences

The essential difference between the two concepts lies in the way information is obtained: in information retrieval it is obtained linearly, whereas navigation obtains it through hypertext. This means that knowledge is acquired gradually, as the user, depending on their interest, delves into the information nodes on one subject or another.

Directories versus Search Engines

Search engines | Directories
--- | ---
The information is updated automatically over the network. | The information is updated by hand, by a person who registers it in the directory when a website is created.
They gather all the information stored on the page. | They do not store all web content, only the most relevant fields such as the title, the keywords, etc.
They store the information in their own database. | They store the information in directories classified into categories.
The search is conducted in the database through the search equation. | The search is conducted hierarchically according to the established categories.
The results are presented in order of relevance, according to criteria established in the search equation. | The results are presented as a list of all documents in the corresponding category, without any presentation criterion.
Suitable for locating specific information. | Suitable for locating general information about a topic.

Metadata

Metadata are used in navigation and information retrieval to detect relevant information quickly and efficiently. Tags describe the content of the web resource, and search tools later use them to locate and access the resource. It is mainly the keyword and title tags that make it possible to locate the document.

Quality of recovery
The following are some basic criteria for ensuring that the retrieval carried out is of good quality.

Consistency: the ability of a search system to coordinate its classification system with the search language, so that search equations can be built on accepted terms.
Exhaustiveness: the quality of an information system that retrieves all of the relevant documents held in a collection, in accordance with the requirements established in the search strategy.
Success rate: coefficient obtained by dividing the number of relevant documents retrieved by the total number of relevant documents in the collection.
Relevance: the characteristic of a retrieved document that meets the information needs.
Relevance rate: coefficient obtained by dividing the number of relevant documents retrieved by the total number of documents retrieved.
Precision: the ability of the search system to match the equation with the most relevant documents. In other words, it concerns the relevant documents retrieved.
Precision rate: coefficient obtained by dividing the number of relevant documents retrieved by the total number of documents in the collection.
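
The success rate and relevance rate defined above can be computed directly from sets of documents. The figures below are invented for illustration; in the wider literature the success rate as defined here is usually called recall.

```python
def rate(numerator, denominator):
    """Ratio of two set sizes, or 0.0 when the denominator set is empty."""
    return len(numerator) / len(denominator) if denominator else 0.0

# Hypothetical judgments for a single query.
relevant = {1, 2, 3, 4, 5}    # relevant documents in the collection
retrieved = {3, 4, 5, 6}      # documents returned by the system

relevant_retrieved = relevant & retrieved

# Success rate: relevant retrieved / all relevant documents in the collection.
success_rate = rate(relevant_retrieved, relevant)

# Relevance rate: relevant retrieved / all retrieved documents.
relevance_rate = rate(relevant_retrieved, retrieved)

print(f"success rate: {success_rate:.2f}")
print(f"relevance rate: {relevance_rate:.2f}")
```

A low success rate signals documentary silence, and a low relevance rate signals documentary noise.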

Skills and competencies


Formulating a plan for the information search: defining the subject or aspects to look for, using a list of appropriate keywords, and delimiting the search according to chronological and linguistic criteria.
Knowledge of potential and actual sources of information.
Skills for locating relevant printed and electronic resources in the context of the information need.
Ability to select the most appropriate search tool and formulate the most appropriate strategy.
Mastery of advanced techniques for information retrieval on the Internet, using search engines, directories, and intelligent agents.
Skills for evaluating search results, reflecting on successes, failures, and alternative strategies.
Determining the location of and access to information, respecting ethical and legal principles.

Extracted from E-COMS (Electronic Content Management Skills). Available at: [Link]
