Entity Search Engine

ENTITY SEARCH ENGINE: A NEW SEARCH TOOL
Speaker: Tanmay Mondal, MSLIS 2013-2015
Indian Statistical Institute, Bangalore
Documentation Research and Training Centre
Seminar (1) - 2014

Overview
Present Approach
Entity Search
Benefit of Entity Search
Entity & Its Facets
Main Work of ESE
Popular Entity Search
OKKAM-Enabling a Web of Entities
Workflow of Okkam
My Library
References

Present Approach
● Information is everywhere and is growing exponentially
● The traditional information extraction approach is to scan every
document in a collection
● Here the document collection is the set of all web pages indexed by a
search engine
● Getting pinpointed information this way is time-consuming for users

Person
Location
Organization
Nationality
Religion
Product
For specific Information
Phone Number
Email Address/URL
Distance
Date
Time
Money
Generic Number
Problem of identifying and linking / grouping different
manifestations of the same real world object

Web of Documents Web of Entities
Cluster the records that correspond to the same entity
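As a rough illustration of clustering records that refer to the same entity, the sketch below groups records by a normalized name plus type. The record fields (`name`, `type`, `source`) and the grouping key are assumptions made for this example; real entity resolution would use far richer similarity measures.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase the name and drop punctuation so spelling variants compare equal."""
    kept = (ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return "".join(kept).strip()


def cluster_records(records):
    """Group records that share a normalized name and a type (a deliberately crude key)."""
    clusters = defaultdict(list)
    for record in records:
        key = (normalize(record["name"]), record["type"])
        clusters[key].append(record)
    return list(clusters.values())


# Hypothetical records, as if extracted from four different web pages.
records = [
    {"name": "I.B.M.", "type": "Organization", "source": "page-1"},
    {"name": "IBM", "type": "Organization", "source": "page-2"},
    {"name": "ibm", "type": "Organization", "source": "page-3"},
    {"name": "Bangalore", "type": "Location", "source": "page-4"},
]
clusters = cluster_records(records)
for cluster in clusters:
    print([r["source"] for r in cluster])
```

The three IBM spellings land in one cluster and Bangalore in another. An exact-match key like this misses abbreviations and misspellings, which is why entity matching is listed among the difficulties later in the talk.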

Entity Search
● An entity is any object or thing that can be uniquely identified in
the world
● Entity search better matches search queries against a database containing
hundreds of millions of "entities"
● Each entity is related to many other entities
● Answer entities carry specific information, so the right
relationships among the entities must be identified
● Semantic or faceted search on entities

Why ?
● When people use retrieval systems they are often not searching for
documents or text passages
● Summarization of entities and concepts
● The named entities (persons, organizations, locations, products...) play a
central role in answering such information needs
● At least 20-30% of the queries submitted to Web search engines are simply entities
● ~71% of Web search queries contain named entities
Source: Building Taxonomy of Web Search Intents for Name Entity
Queries, by Xiaoxin Yin & Sarthak Shah

Benefit of Entity Search
● Entities are often categorized into a taxonomy
● Primary task of the user is often to make a decision
● More structured than document-based search
● An entity is associated with the same URI across different repositories
● Entity information integration
● More understandable by humans
● Increases precision and is less time-consuming

Entity & Its Facets
● An entity must be distinguishable from other entities. It can be anything,
including an abstract thing such as a disease or imaginary art.
● Type refers to the generic class into which a given entity is
classified.
● Attribute refers to a property (predicate) associated with an entity.
● Value refers to the value of an attribute (for a given entity).
● Relation links an entity to other entities and so provides more information
● Example entities: Prof. S.R. Ranganathan is a person; IBM is an organization
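The facets above (type, attribute, value, relation) can be pictured as a small data structure. This is only a sketch: the class name `Entity`, its fields, the `ex:` identifiers and the `founded` relation are illustrative choices, not part of any system described in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    identifier: str                                   # must distinguish this entity from all others
    entity_type: str                                  # generic class, e.g. "Person"
    attributes: dict = field(default_factory=dict)    # attribute name -> value
    relations: list = field(default_factory=list)     # (relation name, target identifier)

    def relate(self, relation: str, other: "Entity") -> None:
        """Record a named relation from this entity to another one."""
        self.relations.append((relation, other.identifier))


# Illustrative entities; identifiers use a made-up "ex:" prefix.
ranganathan = Entity("ex:ranganathan", "Person", {"name": "Prof. S.R. Ranganathan"})
ibm = Entity("ex:ibm", "Organization", {"name": "IBM"})
drtc = Entity("ex:drtc", "Organization",
              {"name": "Documentation Research and Training Centre"})

ranganathan.relate("founded", drtc)
print(ranganathan.entity_type, ranganathan.relations)
```

Keeping relations as pairs of (relation name, target identifier) means an entity only needs to know the identifier of the other entity, not its full profile.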

Main Work of ESE
● Entity Retrieval: Entity search engines can return a ranked list of
entities most relevant to a user query
● Entity Relationship / Fact Mining and Navigation: Discovers
interesting relationships / facts about the entities associated with users'
queries
● Prominence Ranking: Detects the popularity of an entity and enables
users to browse entities in different categories
● Entity Description Retrieval: Retrieves the description blocks for each entity;
information about an object in a web page is generally grouped together
as an object block
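To make the idea of a ranked list of entities concrete, here is a minimal sketch of entity retrieval. It assumes each entity carries a short text description and scores it by the number of query terms it contains; the entity descriptions and the scoring rule are invented for illustration and are much simpler than what a real ESE would use.

```python
def score(query: str, entity: dict) -> int:
    """Count how many distinct query terms occur in the entity description."""
    terms = set(query.lower().split())
    words = set(entity["description"].lower().split())
    return len(terms & words)


def search(query: str, entities: list, top_k: int = 3) -> list:
    """Return names of matching entities, best score first (ties broken by name)."""
    scored = [(score(query, e), e["name"]) for e in entities]
    matches = [(s, name) for s, name in scored if s > 0]
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in matches[:top_k]]


# Hypothetical entity descriptions.
entities = [
    {"name": "IBM", "description": "technology organization computers"},
    {"name": "Bangalore", "description": "city location in India"},
    {"name": "GeoNames", "description": "geographical names database location"},
]
results = search("location india", entities)
print(results)  # ['Bangalore', 'GeoNames']
```

Prominence ranking could be layered on top by adding a popularity term to the score, but that is left out here to keep the sketch short.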

Popular Entity Search
● Product search: various products such as books, electronics, clothes, etc.
● People search: experts, friends, profiles of famous persons, etc.
● Location search: travel, addresses, businesses, govt. offices, etc.

Idea about entity search engine

Various ESE
● Freebase - http://www.freebase.com/
● Sindice - http://sindice.com/
● Geneview - http://bc3.informatik.hu-berlin.de/
● Okkam - http://www.okkam.org/
● WolframAlpha - http://www.wolframalpha.com/
● Yatedo - http://www.yatedo.com/
● GeoNames - http://www.geonames.org/
● DBpedia - http://dbpedia.org/About
● EntityCube - http://entitycube.research.microsoft.com/
etc.

OKKAM-Enabling a Web of Entities
● Any collection of data and information about any type of entities
published on the Web can be integrated into a single virtual,
decentralized, open knowledge base.
● It leads to a faster, more efficient and more precise way to
deal with the flood of information available on the Web today
Entities should not be multiplied beyond necessity

OKKAM ENS
● OKKAM ENS (Entity Name System) is for entity search: its storage, indexing
and matching technology was built for finding an entity given
its description
● Every entity (individual, instance, “thing”) is assigned a
global identifier, ideally unique
● A repository of more than 7.5 million entities in a more structured
form
Entity identifiers should not be multiplied beyond necessity

Project Partners
● University of Trento, Italy (Coordinator)
● L3S Research Center, Germany
● SAP Research, Germany
● Expert System, Italy
● Elsevier B.V., Netherlands
● Europe Unlimited SA, Belgium
● National Microelectronics Application Center (MAC), Ireland
● Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
● DERI Galway, Ireland
● University of Malaga, Spain
● INMARK, Spain
● Agenzia Nazionale Stampa Associata (ANSA), Italy

Sources Of Information
● Wikipedia: provides lists of countries, cities and members of particular
domains, which are very common in search queries
● GeoNames: contains over 10 million geographical names and consists of
over 9 million unique features, of which 2.8 million are populated places, and 5.5
million alternate names
● OkkamDBManager: another important information source for OKKAM
is generic databases such as extranets, online shops or publishing
houses
● OkkamManualEntry: new entities can also be inserted
manually

Data is extracted more effectively from any unstructured source

Cogito Semantic Technology
● Semantic analysis engine and complete semantic
network for a complete understanding of text
● Transforming unstructured information into structured
data
● Identifies the most relevant concepts
● Interprets the meaning of texts
● Precisely extracts information
● Automatically connects entities extracted from sources

Sensigrafo
● Enables the disambiguation of terms
● It allows Cogito to understand the meaning of words and
context
● Extraction of data and metadata
● Applied in product development, competitive intelligence, marketing,
finance, media & publishing, oil & gas, life sciences &
pharma, government, telecommunications and many other
activities where knowledge sharing is critical
● More than 1 million concepts and more than 4 million
relationships

Workflow of Okkam
● Storage: A scalable repository of entity profiles, in which billions of
entities are assigned an ID and a profile, to distinguish one entity from
another
● Matching: Requests from client applications arrive in the form of a bag
of keywords or a collection of name-value pairs (unstructured or semi-structured
queries)
● ID storage and management: stores, maintains and makes available
for reuse IDs (URIs) for anything which is named in a networked
environment
● Lifecycle Management: Takes care of the evolution of the
repository and of all entity profiles over time
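The storage, matching and ID-management steps above can be sketched as a toy name system that either reuses an existing identifier or mints a new one. This is not the OKKAM implementation or its API: the class `EntityNameSystem`, the `resolve` method, the similarity rule, the threshold and the `ex:entity/` identifier scheme are all assumptions made for illustration.

```python
import itertools


class EntityNameSystem:
    """Toy ENS-style store: entity profiles keyed by identifier (illustrative only)."""

    def __init__(self):
        self._profiles = {}                  # identifier -> {attribute name: value}
        self._counter = itertools.count(1)   # source of new identifier numbers

    @staticmethod
    def _similarity(query: dict, profile: dict) -> float:
        """Fraction of query name-value pairs found in the profile, ignoring case."""
        if not query:
            return 0.0
        hits = sum(
            1 for name, value in query.items()
            if name in profile and str(profile[name]).lower() == str(value).lower()
        )
        return hits / len(query)

    def resolve(self, query: dict, threshold: float = 0.5) -> str:
        """Reuse the best-matching identifier if it is good enough, else mint a new one."""
        best_id, best_score = None, 0.0
        for identifier, profile in self._profiles.items():
            current = self._similarity(query, profile)
            if current > best_score:
                best_id, best_score = identifier, current
        if best_id is not None and best_score >= threshold:
            return best_id
        new_id = f"ex:entity/{next(self._counter)}"
        self._profiles[new_id] = dict(query)
        return new_id


ens = EntityNameSystem()
first = ens.resolve({"name": "IBM", "type": "Organization"})
again = ens.resolve({"name": "ibm", "type": "organization"})
other = ens.resolve({"name": "Bangalore", "type": "Location"})
print(first, again, other)
```

The second request gets the same identifier as the first, which is the point of the motto that entity identifiers should not be multiplied beyond necessity. Lifecycle management (merging, updating or retiring profiles over time) is not modeled here.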

Entity Query & Matching in Okkam

Wolfram|Alpha
● Wolfram|Alpha is an engine for computing answers and
providing knowledge
● It generates output by doing computations from its own
internal knowledge base, instead of searching the
web and returning links
● It is an online service that answers factual queries
directly by computing the answer
● Its goal is to make all systematic knowledge immediately computable
and accessible to everyone

How many newspapers are available in the globe

Overall Difficulties
● The number of entities could be huge
● Information Redundancy
● Information Fragmentation
● Entity Information Integration
● A single algorithm for fine-grained
entity matching may not exist
● Store and retrieve using IR-based techniques
● Matching on very large datasets
● Natural Language Processing

Contd...
● Few knowledge bases are available
● Multi-domain entities
● Deduplication problem
● Some names and relationships could be incorrect, and the
information may not be up to date
● Name disambiguation is still largely unsolved
● ESEs are at an early stage
Creating knowledge bases from text and unstructured data is the
goal

My Library
● Entities are for Use
● Each Entity has its own attributes & relations
● Every Entity has its importance
● Save the Time for finding out Entities
● Entities are growing rapidly

References
1. Statistical Entity Extraction from Web, by Zaiqing Nie, Ji-Rong Wen
and Wei-Ying Ma, Fellow, IEEE
2. State of the art in IE, overview, comparison and analysis, by Stefan
Dumitrescu, PhD Student
3. The Entity Name System: Enabling the Web of Entities, by Heiko
Stoermer, Themis Palpanas, George Giannakopoulos, University of
Trento
4. Hybrid entity clustering using crowds and data, by Jongwuk Lee,
Hyunsouk Cho, Jin-Woo Park, Young-rok Cha, Seung-won Hwang, Zaiqing
Nie, Ji-Rong Wen
5. Supporting Entity Search: A Large-Scale Prototype Search Engine, by
Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang

References...
6. OKKAM: Enabling a Web of Entities, by Paolo Bouquet, Heiko
Stoermer, Daniel Giacomuzzi, University of Trento
7. Entity Data Management in OKKAM, by Themis Palpanas (1), Junaid
Chaudhry (2), Periklis Andritsos (1), Yannis Velegrakis (1); (1) University of
Trento, (2) Ajou University
8. SPACE AND TIME ENTITY REPOSITORY, Human-enhanced time-aware
multimedia search, funded by EU07
See: http://issuu.com/cubrikproject/docs/issuu.cubrik.d41.unitn.wp4.v1.0
9. http://api.okkam.org/search/
10. http://www.wolframalpha.com/
