SEMANTIC SEARCH AND RESULT PRESENTATION WITH ENTITY CARDS
FAEGHEH HASIBI | SEARCH ENGINES AMSTERDAM | JUNE 30, 2017
SEMANTIC SEARCH
“Search with Meaning”
KNOWLEDGE BASE
(Core data-enabling component of semantic search)
Albert Einstein
ENTITY
<dbr:Albert_Einstein, foaf:name, Albert Einstein>
<dbr:Albert_Einstein, dbo:birthDate, 1879-03-14>
<dbr:Albert_Einstein, dbo:birthPlace, dbr:Ulm>
<dbr:Albert_Einstein, dbo:birthPlace, dbr:German_Empire>
<dbr:Albert_Einstein, dbp:description, dbr:Physicist>
…
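To make the knowledge base representation concrete, the sketch below (not part of the original slides) holds the facts above as plain (subject, predicate, object) tuples and groups them by predicate, which is the shape of data the entity card work operates on.

```python
# Illustrative sketch (not part of the original slides): the DBpedia facts
# above as plain (subject, predicate, object) tuples, grouped by predicate.
from collections import defaultdict

EINSTEIN_FACTS = [
    ("dbr:Albert_Einstein", "foaf:name", "Albert Einstein"),
    ("dbr:Albert_Einstein", "dbo:birthDate", "1879-03-14"),
    ("dbr:Albert_Einstein", "dbo:birthPlace", "dbr:Ulm"),
    ("dbr:Albert_Einstein", "dbo:birthPlace", "dbr:German_Empire"),
    ("dbr:Albert_Einstein", "dbp:description", "dbr:Physicist"),
]

def group_by_predicate(facts):
    """Group objects by predicate, as an entity card does for multi-valued predicates."""
    grouped = defaultdict(list)
    for _subject, predicate, obj in facts:
        grouped[predicate].append(obj)
    return dict(grouped)

print(group_by_predicate(EINSTEIN_FACTS))
# e.g. 'dbo:birthPlace' -> ['dbr:Ulm', 'dbr:German_Empire']
```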
SEMANTIC SEARCH
“Search with Meaning”
an umbrella term that encompasses various techniques
BUILDING BLOCKS
‣ Knowledge acquisition and curation
‣ Query understanding
‣ Answer retrieval
‣ Result presentation
‣ …
OVERVIEW
‣ Result presentation
‣ Resources
RESULT PRESENTATION
Summarizing Entities for Entity Cards
F. Hasibi, K. Balog, and S.E. Bratsberg. “Dynamic Factual Summaries for Entity Cards”. In Proceedings of SIGIR ’17.
ENTITY CARDS
(Figure: an entity card, with its two components labeled Entity and Summary)
ENTITY SUMMARIZATION
Albert Einstein
dbo:almaMater dbr:ETH_Zurich
dbo:almaMater dbr:University_of_Zurich
dbo:award dbr:Max_Planck_Medal
dbo:award dbr:Nobel_Prize_in_Physics
dbo:birthDate 1879-03-14
dbo:birthPlace dbr:Ulm
dbo:birthPlace dbr:German_Empire
dbo:citizenship dbr:Austria-Hungary
dbo:children dbr:Eduard_Einstein
dbo:children dbr:Hans_Albert_Einstein
dbo:deathDate 1955-04-18
dbo:deathPlace dbr:Princeton,_New_Jersey
dbo:spouse dbr:Elsa_Einstein
dbo:spouse dbr:Mileva_Marić
dbp:influenced dbr:Leo_Szilard
… and ~700 more facts
ENTITY SUMMARIES
Example queries: “einstein awards”, “einstein family”
Other applications
‣ News search
• hovering over an entity in entity-annotated documents
‣ Job search
• company descriptions for a given topic
ENTITY SUMMARIES
Question: How to generate query-dependent entity summaries that can directly address users’ information needs?
METHOD
Fact ranking: ranking a set of entity facts (given a search query) with respect to some criterion
Summary generation: constructing an entity summary from the ranked entity facts, for a given size
RANKING CRITERIA
Importance: the general importance of a fact in describing the entity, irrespective of any particular information need.
Relevance: the relevance of a fact to the query, reflecting how well the fact supports the information need underlying the query.
RANKING CRITERIA
Utility: the utility of a fact combines its general importance and its relevance to the query into a single number.
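The slide only names the two ingredients; one natural reading, shown here purely as an illustration (the weighting scheme is an assumption, not taken from the paper), is a convex combination with a mixing weight α:

```latex
\mathrm{utility}(f, q) \;=\; \alpha \,\mathrm{importance}(f) \;+\; (1-\alpha)\,\mathrm{relevance}(f, q),
\qquad \alpha \in [0, 1].
```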
FACT RANKING
(Feature groups: Importance and Relevance)
‣ Supervised ranking with fact-query pairs as learning instances
‣ Learning is optimized on utility with different weights
• more bias towards importance or relevance
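As a rough illustration of what “supervised ranking with fact-query pairs as learning instances” can look like in practice, here is a minimal pointwise sketch: each pair becomes a feature vector with a graded utility label, a regressor predicts utility, and facts are sorted by the prediction. The feature values, labels, and choice of regressor are illustrative assumptions, not the paper's exact setup.

```python
# Minimal pointwise sketch of supervised fact ranking (illustrative; not the
# paper's exact learning setup). Rows are (fact, query) pairs, columns are
# features (e.g., KB-statistics importance features, query-fact relevance
# features), and the target is a graded utility label.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy training data with two hypothetical features per (fact, query) pair.
X_train = np.array([[0.8, 0.1],
                    [0.3, 0.9],
                    [0.2, 0.2],
                    [0.9, 0.7]])
y_train = np.array([1.0, 2.0, 0.0, 3.0])  # graded utility labels

ranker = GradientBoostingRegressor(n_estimators=100, max_depth=3)
ranker.fit(X_train, y_train)

def rank_facts(facts, feature_vectors):
    """Sort facts by predicted utility, highest first."""
    scores = ranker.predict(np.asarray(feature_vectors))
    order = np.argsort(-scores)
    return [facts[i] for i in order]
```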
FACT RANKING
‣ Knowledge base statistics as ingredients for importance features
• in the absence of query logs
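One concrete, hypothetical example of such a knowledge-base-statistics feature (not necessarily one of the paper's exact features) is the relative frequency of a fact's predicate across the KB:

```python
# Illustrative importance feature from knowledge base statistics (an example
# of the idea, not necessarily one of the paper's features): how common a
# fact's predicate is across the whole KB.
from collections import Counter

def predicate_frequency(kb_triples):
    """Count how many triples in the KB use each predicate."""
    return Counter(predicate for _s, predicate, _o in kb_triples)

def importance_features(entity_facts, predicate_counts, kb_size):
    """One feature per fact: relative frequency of its predicate in the KB."""
    return [predicate_counts[predicate] / kb_size
            for _s, predicate, _o in entity_facts]
```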
METHOD
Fact ranking: ranking a set of entity facts (given a search query) with respect to some criterion
Summary generation: constructing an entity summary from the ranked entity facts, for a given size
SUMMARY GENERATION
1. dbo:birthDate 1879-03-14
2. dbp:placeOfBirth Ulm
3. dbo:birthPlace dbr:Ulm
4. dbo:deathDate 1955-04-18
5. dbo:award dbr:Nobel_Prize_in_Physics
6. dbo:deathPlace dbr:Princeton,_New_Jersey
7. dbo:birthPlace dbr:German_Empire
8. dbo:almaMater dbr:ETH_Zurich
9. dbo:award dbr:Max_Planck_Medal
10. dbp:influenced dbr:Nathan_Rosen
11. dbo:almaMater dbr:University_of_Zurich
…
multi-valued
predicates
SUMMARY GENERATION
1. dbo:birthDate 1879-03-14
2. dbp:placeOfBirth Ulm
3. dbo:birthPlace dbr:Ulm
4. dbo:deathDate 1955-04-18
5. dbo:award dbr:Nobel_Prize_in_Physics
6. dbo:deathPlace dbr:Princeton,_New_Jersey
7. dbo:birthPlace dbr:German_Empire
8. dbo:almaMater dbr:ETH_Zurich
9. dbo:award dbr:Max_Planck_Medal
10. dbp:influenced dbr:Nathan_Rosen
11. dbo:almaMater dbr:University_of_Zurich
…
identical facts
SUMMARY GENERATION
Algorithm 1 Summary generation algorithm
Input: Ranked facts Fe, max height h, max width w
Output: Entity summary lines
1: M ← Predicate-Name-Mapping(Fe)
2: headings ← []                      ▷ Determine line headings
3: for f in Fe do
4:   pname ← M[fp]
5:   if (pname ∉ headings) AND (size(headings) < h) then
6:     headings.add((fp, pname))
7:   end if
8: end for
9: values ← []                        ▷ Determine line values
10: for f in Fe do
11:   if fp ∈ headings then
12:     values[fp].add(fo)
13:   end if
14: end for
15: lines ← []                        ▷ Construct lines
16: for (fp, pname) in headings do
17:   line ← pname + ‘:’
18:   for v in values[fp] do
19:     if len(line) + len(v) ≤ w then
20:       line ← line + v             ▷ Add comma if needed
21:     end if
22:   end for
23:   lines.add(line)
24: end for
‣ Creates a summary of a given size (length and width)
‣ Resolves identical facts (RF feature)
‣ Groups multi-valued predicates (GF feature)
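To make the algorithm concrete, below is a small, runnable Python re-implementation sketch (not the authors' code). It assumes facts arrive as (predicate, object) pairs already ordered by the fact ranker, and a predicate_names mapping that gives display names and maps predicates expressing the same information (e.g., dbo:birthPlace and dbp:placeOfBirth) to one name, which is how identical facts get resolved here; all names and example values are illustrative.

```python
# Runnable sketch of Algorithm 1 (illustrative re-implementation, not the
# authors' code). `ranked_facts` is a list of (predicate, object) pairs sorted
# by the fact ranker; `predicate_names` maps predicates to display names and,
# by mapping equivalent predicates (e.g. dbo:birthPlace / dbp:placeOfBirth)
# to the same name, also resolves identical facts (RF).
def generate_summary(ranked_facts, predicate_names, max_height, max_width):
    # 1) Determine line headings: one line per display name, at most max_height.
    headings = []                       # list of (predicate, display name)
    used_names = set()
    for pred, _obj in ranked_facts:
        name = predicate_names.get(pred, pred)
        if name not in used_names and len(headings) < max_height:
            headings.append((pred, name))
            used_names.add(name)

    # 2) Determine line values: group objects of multi-valued predicates (GF).
    values = {pred: [] for pred, _name in headings}
    for pred, obj in ranked_facts:
        if pred in values:
            values[pred].append(obj)

    # 3) Construct lines, respecting the maximum line width.
    lines = []
    for pred, name in headings:
        line = name + ": "
        parts = []
        for obj in values[pred]:
            candidate = ", ".join(parts + [obj])
            if len(line) + len(candidate) <= max_width:
                parts.append(obj)
        lines.append(line + ", ".join(parts))
    return lines


# Example with facts from the ranked list shown on the previous slides;
# display names are made up for illustration.
ranked = [
    ("dbo:birthDate", "1879-03-14"),
    ("dbp:placeOfBirth", "Ulm"),
    ("dbo:birthPlace", "dbr:Ulm"),
    ("dbo:deathDate", "1955-04-18"),
    ("dbo:award", "dbr:Nobel_Prize_in_Physics"),
    ("dbo:award", "dbr:Max_Planck_Medal"),
]
names = {
    "dbo:birthDate": "Born", "dbp:placeOfBirth": "Birth place",
    "dbo:birthPlace": "Birth place", "dbo:deathDate": "Died",
    "dbo:award": "Awards",
}
print(generate_summary(ranked, names, max_height=4, max_width=60))
```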
EVALUATION
QUERIES
Taken from the DBpedia-entity collection. Query types, with example queries:
‣ Named entity: “madrid”, “brooklyn bridge”
‣ Keyword: “vietnam war facts”, “eiffel”
‣ List search: “states that border oklahoma”
‣ Natural language: “What is the second highest mountain?”
K. Balog and R. Neumayer. “A Test Collection for Entity Search in DBpedia”. In Proceedings of SIGIR ’13.
EVALUATION (FACT RANKING)
Benchmark construction by crowdsourcing experiments
‣ rate the importance of the fact w.r.t. the entity
  (“How important is this fact for the given entity?” — Very important / Important / Not important)
‣ rate the relevance of the fact to the query for the given entity
EVALUATION (FACT RANKING)
Benchmark construction by crowdsourcing experiments
‣ Collecting judgments for ~4K facts
‣ 5 judgments per record
‣ Fleiss’ Kappa of 0.52 and 0.41 for importance and relevance, respectively (moderate agreement)
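For reference, agreement figures like these can be computed with statsmodels' Fleiss' Kappa implementation; the ratings below are toy values invented only to show the call, not the actual crowdsourced judgments.

```python
# Illustrative computation of Fleiss' Kappa with statsmodels; the ratings
# below are made-up toy data (5 judgments per record, 3 importance levels),
# not the actual crowdsourced judgments.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = facts (records), columns = the 5 crowd workers,
# values = rating (0 = not important, 1 = important, 2 = very important)
ratings = np.array([
    [2, 2, 1, 2, 2],
    [0, 1, 0, 0, 1],
    [1, 1, 2, 1, 1],
    [0, 0, 0, 1, 0],
])

table, _categories = aggregate_raters(ratings)   # per-record category counts
print(fleiss_kappa(table, method="fleiss"))
```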
RESULTS (FACT RANKING)
Table 2: Comparison of fact ranking against state-of-the-art approaches with URI-only objects. Significance for lines 4–5 is tested against lines 1–3, and for lines 2–3 against lines 1–2 (superscript markers as in the original table).
Model | Importance NDCG@5 | Importance NDCG@10 | Utility NDCG@5 | Utility NDCG@10
RELIN | 0.6368 | 0.7130 | 0.6300 | 0.7066
LinkSum | 0.7018M | 0.7031 | 0.6504 | 0.6648
SUMMARUM | 0.7181N | 0.7412M | 0.6719 | 0.7111
DynES/imp | 0.8354NNN | 0.8604NNN | 0.7645NNN | 0.8117NNN
DynES | 0.8291NNN | 0.8652NNN | 0.8164NNN | 0.8569NNN
Table 4: Fact ranking performance when removing features; features are sorted by the relative difference they make.
Group | Removed feature | NDCG@10 | Δ% | p
DynES (all features) | – | 0.7873 | – | –
Imp. | NEFp | 0.7757 | −1.16 | 0.08
Imp. | TypeImp | 0.7760 | −1.13 | 0.14
16% improvement over the best baseline
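For readers who want to reproduce numbers like these, a self-contained NDCG@k over graded labels looks roughly as follows; the gain and discount conventions here are the common ones and may differ in detail from the paper's evaluation script.

```python
# Standard NDCG@k from graded relevance labels (illustrative; the paper's
# evaluation script may use a slightly different gain/discount convention).
import math

def dcg_at_k(labels, k):
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels[:k]))

def ndcg_at_k(ranked_labels, k):
    ideal_dcg = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Example: utility labels of the top-ranked facts for one (entity, query) pair.
print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=5))
```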

EVALUATION (SUMMARY GENERATION)
‣ Users consume all facts displayed in the summary
‣ The quality of the whole summary should be assessed
‣ Side-by-side evaluation of factual summaries by human assessors
RESULTS (SUMMARY GENERATION)
Figure 4 (boxplots of the distribution of user preferences per query, for DynES vs. DynES/imp and DynES vs. DynES/rel): users preferred DynES summaries over DynES/imp (or DynES/rel) summaries; ties are ignored. Considering all queries, the utility-based summaries (DynES) are generally preferred over the other two, and especially over the relevance-based ones.
Table 5: Side-by-side evaluation of summaries for different fact ranking methods.
Model | Win | Loss | Tie | RI
DynES vs. DynES/imp | 46 | 23 | 31 | 0.23
DynES vs. DynES/rel | 75 | 12 | 13 | 0.63
DynES vs. RELIN | 95 | 5 | 0 | 0.90
Utility vs. Importance | 47 | 16 | 37 | 0.31
Table 6: Side-by-side evaluation of summaries for different summary generation algorithms.
Model | Win | Loss | Tie | RI
DynES vs. DynES(-GF)(-RF) | 84 | 1 | 15 | 0.83
DynES vs. DynES(-GF) | 74 | 0 | 26 | 0.74
DynES vs. DynES(-RF) | 46 | 2 | 52 | 0.44
• Users preferred utility-based summaries over the others
• Grouping of multi-valued predicates (GF) is perceived as more important by the users than the resolution of identical facts (RF)
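The RI column in Tables 5 and 6 is consistent with the net win rate (wins minus losses) over all comparisons; a tiny sketch under that reading (the formula is inferred from the table values, not quoted from the paper):

```python
# Relative improvement (RI) as it appears consistent with Tables 5 and 6:
# (wins - losses) / total pairwise comparisons, with ties counted in the total.
def relative_improvement(win, loss, tie):
    return (win - loss) / (win + loss + tie)

print(relative_improvement(46, 23, 31))  # DynES vs. DynES/imp        -> 0.23
print(relative_improvement(84, 1, 15))   # DynES vs. DynES(-GF)(-RF)  -> 0.83
```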
RESOURCES
1. Entity search toolkit
2. Test collection
SEMANTIC SEARCH
“Search with Meaning”
an umbrella term that encompasses various techniques
SEMANTIC SEARCH TOOLKIT
Entity retrieval: Returns a ranked list of entities in
response to a query
Entity linking: Identifies entities in a query and links them
to the corresponding entry in the knowledge base
Target type identification: Detects the target types
(or categories) of a query
Functionalities
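As a hedged illustration of how such a toolkit can be called as a service: the snippet below uses a placeholder URL and parameter name, since the actual Nordlys endpoints and parameters should be taken from its documentation rather than from this sketch.

```python
# Hedged sketch of calling an entity retrieval service over HTTP. The URL and
# the "q" parameter below are HYPOTHETICAL placeholders; the real Nordlys
# endpoints and parameters are documented on the Nordlys site.
import requests

def entity_retrieval(query, service_url="http://example.org/nordlys/er"):
    """Return the service's ranked entity list for a query (JSON)."""
    response = requests.get(service_url, params={"q": query}, timeout=10)
    response.raise_for_status()
    return response.json()

# Example (against a running service): entity_retrieval("albert einstein")
```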
SEMANTIC SEARCH TOOLKIT
Highlights
• Web interface, API, and command-line usage
• 3-tier architecture
• Online source code and documentation
NORDLYS
http://nordlys.cc/
DBPEDIA-ENTITY V2
Details
• (37 + 17) runs + old qrels
• Pool size: 150K
• 3-point Likert scale
• 5 judgments per query-entity pair
DBPEDIA-ENTITY V2
Model (NDCG@10 / NDCG@100 per query subset)
SemSearch ES | INEX-LD | ListSearch | QALD-2 | Total
@10 @100 | @10 @100 | @10 @100 | @10 @100 | @10 @100
BM25 0.2497 0.4110 0.1828 0.3612 0.0627 0.3302 0.2751 0.3366 0.2558 0.3582
PRMS 0.5340 0.6108 0.3590 0.4295 0.3684 0.4436 0.3151 0.4026 0.3905 0.4688
MLM-all 0.5528 0.6247 0.3752 0.4493 0.3712 0.4577 0.3249 0.4208 0.4021 0.4852
LM 0.5555 0.6475 0.3999 0.4745 0.3925 0.4723 0.3412 0.4338 0.4182 0.5036
SDM 0.5535 0.6672 0.4030 0.4911 0.3961 0.4900 0.3390 0.4274 0.4185 0.5143
LM+ELR 0.5557 0.6477 0.4013 0.4763 0.4037 0.4885 0.3464 0.4377 0.4228 0.5093
SDM+ELR 0.5533 0.6676 0.4097 0.4975 0.4142 0.5058 0.3434 0.4350 0.4257 0.5220
MLM-CA 0.6247 0.6854 0.4029 0.4796 0.4021 0.4786 0.3365 0.4301 0.4365 0.5143
BM25-CA 0.5858 0.6883 0.4120 0.5050 0.4220 0.5142 0.3566 0.4426 0.4399 0.5329
FSDM 0.6521 0.7220 0.4214 0.5043 0.4196 0.4952 0.3401 0.4358 0.4524 0.5342
BM25F-CA 0.6281 0.7200 0.4394 0.5296 0.4252 0.5106 0.3689 0.4614 0.4605 0.5505
FSDM+ELR 0.6568 0.7260 0.4397 0.5144 0.4246 0.5011 0.3467 0.4450 0.4607 0.5416
Table 8.3: Results, broken down into query subtypes, on DBpedia-entity v2.
Baseline runs: generative models are available on Nordlys
DBPEDIA-ENTITY COLLECTIONS
Entity Retrieval: 468 queries, 19K relevant entities
Entity Summarization: 100 queries, 4K entity facts
Target Type Identification: 485 queries, ~900 types
Built with DBpedia 2015-10
https://github.com/iai-group/DBpedia-Entity
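If you want to use the collection programmatically, the relevance judgments are commonly distributed as TREC-style qrels; the loader below assumes that format with tab-separated fields (an assumption — check the files in the repository above), and the file name in the usage comment is hypothetical.

```python
# Loader for TREC-style qrels ("query_id  iteration  entity_id  grade"),
# assuming tab-separated fields; adjust the split if the files in the
# repository use plain whitespace. The file name below is hypothetical.
from collections import defaultdict

def load_qrels(path):
    qrels = defaultdict(dict)
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            query_id, _iteration, entity_id, grade = line.rstrip("\n").split("\t")
            qrels[query_id][entity_id] = int(grade)
    return qrels

# qrels = load_qrels("qrels-v2.txt")
```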
ACKNOWLEDGEMENT
Nordlys: Krisztian Balog, Dario Garigliotti, Shuo Zhang, Heng Ding
DBpedia-Entity Collection: Fedor Nikolaev, Chenyan Xiong, Svein Erik Bratsberg, Krisztian Balog, Alexander Kotov, Jamie Callan
THANK YOU
