Link Discovery Tutorial
Introduction
Axel-Cyrille Ngonga Ngomo(1), Irini Fundulaki(2), Mohamed Ahmed Sherif(1)
(1) Institute for Applied Informatics, Germany
(2) FORTH, Greece
October 18th, 2016
Kobe, Japan
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 1 / 24
Introduction
Disclaimer
No pursuit of completeness [Nen+15]
Focus on
Basic ideas and principles
Principles
Evaluation
Open questions and challenges
Guidelines
http://iswc2016ldtutorial.aksw.org/
Question? Just ask!
Comment? Go for it!
Be kind :)
Let’s Go!
Linked Data Principles
Why Link Discovery?
1 Linked Open Data Cloud
  130+ billion triples
  ≈ 0.5 billion links
  Mostly owl:sameAs
2 Decentralized dataset creation
3 Complex information needs
  ⇒ Need to consume data across knowledge bases
4 Links are central for
  Cross-ontology QA
  Data Integration
  Reasoning
  Federated Queries
  ...
Cross-Ontology QA
Example
Give me the name and description of all drugs that cure their side-effect. [SNA13]
1 Need information from
  Drugbank (Drug description)
  Sider (Side-effects)
  DBpedia (Description)
2 Gathering information via SPARQL query using links
SELECT ?drug ?name ?desc WHERE
{
  ?drug a drugbank:Drug .
  ?drug rdfs:label ?name .
  ?drug drugbank:cures ?disease .
  ?drug owl:sameAs ?drug2 .
  ?drug owl:sameAs ?drug3 .
  ?drug2 sider:hasSideEffect ?effect .
  ?effect owl:sameAs ?disease .
  ?drug3 dbo:hasWikiPage ?desc .
}
Cross-Ontology QA
Example (DEQA)
Give me flats near kindergartens in Kobe. [Leh+12]
SELECT ?flat WHERE
{
  ?flat a deqa:Flat .
  ?flat deqa:near ?school .
  ?school a lgdo:School .
  ?school lgdo:city lgdo:Kobe .
}
Data Integration
Federated Queries on Patient Data [Kha+14]
Federated Queries
Example (FedBench CD2)
Return Barack Obama’s party membership and news pages. [Sal+15]
SELECT ?party ?page WHERE
{
  dbr:Barack_Obama dbo:party ?party .
  ?x nytimes:topicPage ?page .
  ?x owl:sameAs dbr:Barack_Obama .
}
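How the owl:sameAs link lets this query combine both sources can be traced with a small in-memory join. The triples below are toy placeholders: the nyt:obama identifier, the party value, and the page URL are invented for the example and do not reflect the actual datasets. The matching function is a deliberately naive stand-in for a SPARQL engine, not how a federation system evaluates the query.

```python
# Toy triples standing in for two separate sources; all values are placeholders.
DBPEDIA = [
    ("dbr:Barack_Obama", "dbo:party", "dbr:Example_Party"),
]
NYTIMES = [
    ("nyt:obama", "nytimes:topicPage", "http://example.org/topic/obama"),
    ("nyt:obama", "owl:sameAs", "dbr:Barack_Obama"),
]


def match(triples, s=None, p=None, o=None):
    """Yield triples matching the pattern; None acts as an unbound variable."""
    for triple in triples:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


def party_and_pages(graph):
    """Evaluate the three triple patterns of the query as nested joins."""
    results = []
    for _, _, party in match(graph, s="dbr:Barack_Obama", p="dbo:party"):
        for x, _, _ in match(graph, p="owl:sameAs", o="dbr:Barack_Obama"):
            for _, _, page in match(graph, s=x, p="nytimes:topicPage"):
                results.append((party, page))
    return results


rows = party_and_pages(DBPEDIA + NYTIMES)
```

If the owl:sameAs triple is missing, the join yields no rows, which is the sense in which links are central for federated queries.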
Definition
Definition (Link Discovery, informal)
Given two knowledge bases S and T, find links of type R between S and T
Here, declarative link discovery
Definition (Declarative Link Discovery, formal, similarities)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : R(s, t)}
Common approach: Find M = {(s, t) ∈ S × T : σ(s, t) ≥ θ}
Definition (Declarative Link Discovery, formal, distances)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : R(s, t)}
Common approach: Find M = {(s, t) ∈ S × T : δ(s, t) ≤ τ}
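The similarity-based formulation can be sketched in a few lines of Python. This is a brute-force illustration only, not one of the algorithms covered in the tutorial: the character-trigram Jaccard measure standing in for σ, the threshold value, and all function names are assumptions made for the example.

```python
from itertools import product


def trigrams(text):
    """Return the set of character trigrams of a lower-cased, padded string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def sigma(s, t):
    """Jaccard similarity over character trigrams (an assumed choice of sigma)."""
    a, b = trigrams(s), trigrams(t)
    return len(a & b) / len(a | b)


def discover_links(source, target, theta):
    """Compute M = {(s, t) in S x T : sigma(s, t) >= theta} over all pairs."""
    return {(s, t) for s, t in product(source, target) if sigma(s, t) >= theta}


# Hypothetical labels for the resources in S and T.
S = ["Kobe", "Leipzig"]
T = ["kobe", "Heraklion"]
M = discover_links(S, T, theta=0.8)
```

The distance-based variant only flips the comparison: keep a pair when δ(s, t) ≤ τ instead of σ(s, t) ≥ θ.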
Most common: R = owl:sameAs
Also known as deduplication [Nen+15]
Goal: Address all possible relations R
Declarative Link Discovery: Similarity/distance defined using property values (incl. property chains)
Example: R = :sameModel
:s770fm rdfs:label "S770FM"@en .
:s770fm rdf:type :SABER .
:s770fm :model :770 .
:s770fm :top :FlamedMaple .
:s770fm :producer :Ibanez .
:s770bem rdfs:label "S770BEM"@en .
:s770bem rdf:type :SABER .
:s770bem :model :770 .
:s770bem :top :BirdEyeMaple .
:s770bem :producer :Ibanez .
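A declarative specification for :sameModel could, for instance, compare the values of :model and :producer. The sketch below hard-codes the two resources from the example as Python dictionaries. The identifier :s770bem for the second resource and the rule itself are assumptions for illustration, not a specification given in the tutorial.

```python
from itertools import combinations

# The two guitars from the example, keyed by resource identifier.
# ":s770bem" is an assumed identifier for the second resource.
resources = {
    ":s770fm": {
        "rdfs:label": "S770FM", "rdf:type": ":SABER", ":model": ":770",
        ":top": ":FlamedMaple", ":producer": ":Ibanez",
    },
    ":s770bem": {
        "rdfs:label": "S770BEM", "rdf:type": ":SABER", ":model": ":770",
        ":top": ":BirdEyeMaple", ":producer": ":Ibanez",
    },
}

# Properties whose values must agree for a :sameModel link (assumed rule).
SAME_MODEL_PROPERTIES = (":model", ":producer")


def same_model(a, b):
    """True if both resources carry equal values for every compared property."""
    return all(a.get(p) is not None and a.get(p) == b.get(p)
               for p in SAME_MODEL_PROPERTIES)


def discover_same_model(kb):
    """Return all unordered pairs of distinct resources that satisfy the rule."""
    return {(s, t) for s, t in combinations(sorted(kb), 2)
            if same_model(kb[s], kb[t])}


links = discover_same_model(resources)
```

The labels and tops differ, so a specification based on rdfs:label or :top alone would miss this pair; that is why the choice of properties matters.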
Why is it difficult?
1 Time complexity
  Large number of triples (e.g., LinkedTCGA with 20.4 billion triples [Sal+14])
  Quadratic a-priori runtime
  69 days for mapping cities from DBpedia to Geonames
  Solutions usually in-memory (insufficient heap space)
2 Accuracy
  Combination of several attributes required for high precision
  Tedious discovery of most adequate mapping
  Dataset-dependent similarity functions
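To make the quadratic cost concrete, the sketch below counts how many pairs actually reach the distance function once a simple lossless filter is applied. The filter relies on the fact that two strings whose lengths differ by more than τ cannot have an edit distance of at most τ. It is only an illustration on assumed inputs and is not one of the algorithms presented later in the tutorial.

```python
def levenshtein(s, t):
    """Edit distance via the classic dynamic program, one row at a time."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,               # delete a character of s
                current[j - 1] + 1,            # insert a character of t
                previous[j - 1] + (cs != ct),  # substitute (free on a match)
            ))
        previous = current
    return previous[-1]


def link_with_length_filter(source, target, tau):
    """Return (links, comparisons) for delta = Levenshtein and threshold tau."""
    links, comparisons = set(), 0
    for s in source:
        for t in target:
            if abs(len(s) - len(t)) > tau:
                continue  # the distance is at least the length difference
            comparisons += 1
            if levenshtein(s, t) <= tau:
                links.add((s, t))
    return links, comparisons


# Hypothetical city labels; brute force would compare all 3 * 4 = 12 pairs.
source = ["Kobe", "Leipzig", "Heraklion"]
target = ["Kobe", "Leipzig", "Iraklion", "Rome"]
links, comparisons = link_with_length_filter(source, target, tau=1)
```

On these inputs the filter leaves 5 of the 12 pairs to compare, and no pair within the threshold is lost. The saving depends entirely on how the string lengths are distributed.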
Structure
1 Time complexity (≈ 60 min)
  LIMES algorithm [NA11]
  MultiBlock [IJB11]
  HR³ [Ngo12]
  AEGLE [GSN16]
  Summary and Challenges
2 Accuracy (≈ 30 min)
  RAVEN [Ngo+11]
  EAGLE [NL12]
  COALA [NLC13]
  Summary and Challenges
3 Benchmarking (≈ 30 min)
  Benchmarking [NGF16]
  Synthetic Benchmarks [Sav+15]
  Real Benchmarks [Mor+11]
  Summary and Challenges
4 Hands-On Session (≈ 45 min)
That’s all Folks!
Axel Ngonga
AKSW Research Group
Institute for Applied Informatics
ngonga@informatik.uni-leipzig.de
Irini Fundulaki
ICS FORTH
fundul@ics.forth.gr
Mohamed Ahmed Sherif
AKSW Research Group
Institute for Applied Informatics
sherif@informatik.uni-leipzig.de
Acknowledgment
This work was supported by grants from the EU H2020 Framework Programme
provided for the project HOBBIT (GA no. 688227).
References
Kleanthi Georgala, Mohamed Ahmed Sherif, and
Axel-Cyrille Ngonga Ngomo. “An Efficient Approach for the
Generation of Allen Relations”. In: ECAI 2016 - 22nd European
Conference on Artificial Intelligence, 29 August-2 September 2016,
The Hague, The Netherlands - Including Prestigious Applications of
Artificial Intelligence (PAIS 2016). Ed. by Gal A. Kaminka et al.
Vol. 285. Frontiers in Artificial Intelligence and Applications. IOS
Press, 2016, pp. 948–956. isbn: 978-1-61499-671-2. doi:
10.3233/978-1-61499-672-9-948. url:
http://dx.doi.org/10.3233/978-1-61499-672-9-948.
Robert Isele, Anja Jentzsch, and Christian Bizer. “Efficient
Multidimensional Blocking for Link Discovery without losing Recall”.
In: Proceedings of the 14th International Workshop on the Web and
Databases 2011, WebDB 2011, Athens, Greece, June 12, 2011. Ed. by
Amélie Marian and Vasilis Vassalos. 2011. url:
http://webdb2011.rutgers.edu/papers/Paper%2039/silk.pdf.
Yasar Khan et al. “SAFE: Policy Aware SPARQL Query Federation
Over RDF Data Cubes”. In: Proceedings of the 7th International
Workshop on Semantic Web Applications and Tools for Life Sciences,
Berlin, Germany, December 9-11, 2014. Ed. by Adrian Paschke et al.
Vol. 1320. CEUR Workshop Proceedings. CEUR-WS.org, 2014. url:
http://ceur-ws.org/Vol-1320/Preface_SWAT4LS2014.pdf.
Jens Lehmann et al. “deqa: Deep Web Extraction for Question
Answering”. In: The Semantic Web - ISWC 2012 - 11th International
Semantic Web Conference, Boston, MA, USA, November 11-15, 2012,
Proceedings, Part II. Ed. by Philippe Cudré-Mauroux et al. Vol. 7650.
Lecture Notes in Computer Science. Springer, 2012, pp. 131–147.
isbn: 978-3-642-35172-3. doi: 10.1007/978-3-642-35173-0_9.
url: http://dx.doi.org/10.1007/978-3-642-35173-0_9.
Mohamed Morsey et al. “DBpedia SPARQL Benchmark -
Performance Assessment with Real Queries on Real Data”. In: The
Semantic Web - ISWC 2011 - 10th International Semantic Web
Conference, Bonn, Germany, October 23-27, 2011, Proceedings, Part
I. Ed. by Lora Aroyo et al. Vol. 7031. Lecture Notes in Computer
Science. Springer, 2011, pp. 454–469. isbn: 978-3-642-25072-9. doi:
10.1007/978-3-642-25073-6_29. url:
http://dx.doi.org/10.1007/978-3-642-25073-6_29.
Axel-Cyrille Ngonga Ngomo and Sören Auer. “LIMES - A
Time-Efficient Approach for Large-Scale Link Discovery on the Web
of Data”. In: IJCAI 2011, Proceedings of the 22nd International Joint
Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July
16-22, 2011. Ed. by Toby Walsh. IJCAI/AAAI, 2011, pp. 2312–2317.
isbn: 978-1-57735-516-8. doi:
10.5591/978-1-57735-516-8/IJCAI11-385. url:
http://dx.doi.org/10.5591/978-1-57735-516-8/IJCAI11-385.
Markus Nentwig et al. “A survey of current Link Discovery
frameworks”. In: Semantic Web Preprint (2015), pp. 1–18.
Axel-Cyrille Ngonga Ngomo, Alejandra García-Rojas, and
Irini Fundulaki. “HOBBIT: Holistic Benchmarking of Big Linked
Data”. In: ERCIM News 2016.105 (2016). url:
http://ercim-news.ercim.eu/en105/r-i/hobbit-holistic-benchmarking-of-big-linked-data.
Axel-Cyrille Ngonga Ngomo et al. “RAVEN - active learning of link
specifications”. In: Proceedings of the 6th International Workshop on
Ontology Matching, Bonn, Germany, October 24, 2011. Ed. by
Pavel Shvaiko et al. Vol. 814. CEUR Workshop Proceedings.
CEUR-WS.org, 2011. url:
http://ceur-ws.org/Vol-814/om2011_Tpaper3.pdf.
Axel-Cyrille Ngonga Ngomo. “Link Discovery with Guaranteed
Reduction Ratio in Affine Spaces with Minkowski Measures”. In: The
Semantic Web - ISWC 2012 - 11th International Semantic Web
Conference, Boston, MA, USA, November 11-15, 2012, Proceedings,
Part I. Ed. by Philippe Cudré-Mauroux et al. Vol. 7649. Lecture Notes
in Computer Science. Springer, 2012, pp. 378–393. isbn:
978-3-642-35175-4. doi: 10.1007/978-3-642-35176-1_24. url:
http://dx.doi.org/10.1007/978-3-642-35176-1_24.
Axel-Cyrille Ngonga Ngomo and Klaus Lyko. “EAGLE: Efficient Active
Learning of Link Specifications Using Genetic Programming”. In: The
Semantic Web: Research and Applications - 9th Extended Semantic
Web Conference, ESWC 2012, Heraklion, Crete, Greece, May 27-31,
2012. Proceedings. Ed. by Elena Simperl et al. Vol. 7295. Lecture
Notes in Computer Science. Springer, 2012, pp. 149–163. isbn:
978-3-642-30283-1. doi: 10.1007/978-3-642-30284-8_17. url:
http://dx.doi.org/10.1007/978-3-642-30284-8_17.
Axel-Cyrille Ngonga Ngomo, Klaus Lyko, and Victor Christen.
“COALA - Correlation-Aware Active Learning of Link Specifications”.
In: The Semantic Web: Semantics and Big Data, 10th International
Conference, ESWC 2013, Montpellier, France, May 26-30, 2013.
Proceedings. Ed. by Philipp Cimiano et al. Vol. 7882. Lecture Notes
in Computer Science. Springer, 2013, pp. 442–456. isbn:
978-3-642-38287-1. doi: 10.1007/978-3-642-38288-8_30. url:
http://dx.doi.org/10.1007/978-3-642-38288-8_30.
Muhammad Saleem et al. “TopFed: TCGA Tailored Federated Query
Processing and Linking to LOD”. In: J. Biomedical Semantics 5
(2014), p. 47. doi: 10.1186/2041-1480-5-47. url:
http://dx.doi.org/10.1186/2041-1480-5-47.
Muhammad Saleem et al. “A fine-grained evaluation of SPARQL
endpoint federation systems”. In: Semantic Web 7.5 (2015),
pp. 493–518. doi: 10.3233/SW-150186. url:
http://dx.doi.org/10.3233/SW-150186.
Tzanina Saveta et al. “LANCE: Piercing to the Heart of Instance
Matching Tools”. In: The Semantic Web - ISWC 2015 - 14th
International Semantic Web Conference, Bethlehem, PA, USA,
October 11-15, 2015, Proceedings, Part I. Ed. by Marcelo Arenas
et al. Vol. 9366. Lecture Notes in Computer Science. Springer, 2015,
pp. 375–391. isbn: 978-3-319-25006-9. doi:
10.1007/978-3-319-25007-6_22. url:
http://dx.doi.org/10.1007/978-3-319-25007-6_22.
Saeedeh Shekarpour, Axel-Cyrille Ngonga Ngomo, and Sören Auer.
“Question answering on interlinked data”. In: 22nd International
World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, May
13-17, 2013. Ed. by Daniel Schwabe et al. International World Wide
Web Conferences Steering Committee / ACM, 2013, pp. 1145–1156.
isbn: 978-1-4503-2035-1. url:
http://dl.acm.org/citation.cfm?id=2488488.
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 24 / 24

More Related Content

PDF
Link Discovery Tutorial Part V: Hands-On
PDF
Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
PDF
Link Discovery Tutorial Part II: Accuracy
PDF
Link Discovery Tutorial Part I: Efficiency
PDF
How well does your Instance Matching system perform? Experimental evaluation ...
PDF
Can Deep Learning Techniques Improve Entity Linking?
PDF
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
PDF
Enhancing Entity Linking by Combining NER Models
Link Discovery Tutorial Part V: Hands-On
Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
Link Discovery Tutorial Part II: Accuracy
Link Discovery Tutorial Part I: Efficiency
How well does your Instance Matching system perform? Experimental evaluation ...
Can Deep Learning Techniques Improve Entity Linking?
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Enhancing Entity Linking by Combining NER Models

What's hot (20)

PDF
Framester: A Wide Coverage Linguistic Linked Data Hub
PPT
Computing with Directed Labeled Graphs
PPTX
DS2014: Feature selection in hierarchical feature spaces
PPTX
RDF2Vec: RDF Graph Embeddings for Data Mining
PDF
Applications of Word Vectors in Text Retrieval and Classification
PPTX
mchristy-Dh2014- emop-postOCR-triage
PDF
Learning Multilingual Semantic Parsers for Question Answering over Linked Dat...
PPTX
Mchristy-eMOP-workflows2-24x7
PPTX
mchristy-DH2014-emop-bookhistory-tools
PDF
ESWC 2013 Poster: Representing and Querying Negative Knowledge in RDF
PPTX
Entity Linking in Queries: Tasks and Evaluation
PDF
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
PPTX
Semantic Web questions we couldn't ask 10 years ago
PDF
Entity Search: The Last Decade and the Next
PPTX
The Empirical Turn in Knowledge Representation
PDF
Table Retrieval and Generation
PPTX
eMOP-PennSt-lunch
PPTX
Quick tour all handout
PDF
Unsupervised Learning of an Extensive and Usable Taxonomy for DBpedia
PDF
Platforms and the Semantic Web
Framester: A Wide Coverage Linguistic Linked Data Hub
Computing with Directed Labeled Graphs
DS2014: Feature selection in hierarchical feature spaces
RDF2Vec: RDF Graph Embeddings for Data Mining
Applications of Word Vectors in Text Retrieval and Classification
mchristy-Dh2014- emop-postOCR-triage
Learning Multilingual Semantic Parsers for Question Answering over Linked Dat...
Mchristy-eMOP-workflows2-24x7
mchristy-DH2014-emop-bookhistory-tools
ESWC 2013 Poster: Representing and Querying Negative Knowledge in RDF
Entity Linking in Queries: Tasks and Evaluation
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
Semantic Web questions we couldn't ask 10 years ago
Entity Search: The Last Decade and the Next
The Empirical Turn in Knowledge Representation
Table Retrieval and Generation
eMOP-PennSt-lunch
Quick tour all handout
Unsupervised Learning of an Extensive and Usable Taxonomy for DBpedia
Platforms and the Semantic Web

Similar to Link Discovery Tutorial Introduction (20)

PPTX
Knowledge Graph Introduction
PDF
Retrieval, Crawling and Fusion of Entity-centric Data on the Web
PPTX
Lotico oct 2010
PPTX
Wi2015 - Clustering of Linked Open Data - the LODeX tool
PDF
Relaxing global-as-view in mediated data integration from linked data
PDF
Improving Entity Retrieval on Structured Data
PDF
Ontologies & linked open data
PDF
Hide the Stack: Toward Usable Linked Data
PDF
Session 1.5 supporting virtual integration of linked data with just-in-time...
ODP
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
PPTX
Mining and Managing Large-scale Linked Open Data
PPTX
Mining and Managing Large-scale Linked Open Data
PPTX
Linked Open Data - Masaryk University in Brno 8.11.2016
PDF
The technical case for a semantic web
ODP
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
PPTX
Processing Life Science Data at Scale - using Semantic Web Technologies
PPTX
Linked Open Data Utrecht University Library
PDF
Early Analysis and Debuggin of Linked Open Data Cubes
PPTX
Linked Data Modeling for Beginner
PDF
Linked Open Data
Knowledge Graph Introduction
Retrieval, Crawling and Fusion of Entity-centric Data on the Web
Lotico oct 2010
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Relaxing global-as-view in mediated data integration from linked data
Improving Entity Retrieval on Structured Data
Ontologies & linked open data
Hide the Stack: Toward Usable Linked Data
Session 1.5 supporting virtual integration of linked data with just-in-time...
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
Mining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open Data
Linked Open Data - Masaryk University in Brno 8.11.2016
The technical case for a semantic web
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
Processing Life Science Data at Scale - using Semantic Web Technologies
Linked Open Data Utrecht University Library
Early Analysis and Debuggin of Linked Open Data Cubes
Linked Data Modeling for Beginner
Linked Open Data

More from Holistic Benchmarking of Big Linked Data (20)

PDF
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
PDF
Benchmarking Big Linked Data: The case of the HOBBIT Project
PDF
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
PDF
The DEBS Grand Challenge 2018
PPTX
Benchmarking of distributed linked data streaming systems
PDF
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
PDF
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
PPTX
The DEBS Grand Challenge 2017
PDF
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
PDF
Scalable Link Discovery for Modern Data-Driven Applications (poster)
PDF
An Evaluation of Models for Runtime Approximation in Link Discovery
PDF
Scalable Link Discovery for Modern Data-Driven Applications
PDF
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
PPTX
SPgen: A Benchmark Generator for Spatial Link Discovery Tools
PDF
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
PDF
OKE2018 Challenge @ ESWC2018
PDF
MOCHA 2018 Challenge @ ESWC2018
PDF
Dynamic planning for link discovery - ESWC 2018
PDF
Hobbit project overview presented at EBDVF 2017
PDF
Leopard ISWC Semantic Web Challenge 2017 (poster)
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
Benchmarking Big Linked Data: The case of the HOBBIT Project
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
The DEBS Grand Challenge 2018
Benchmarking of distributed linked data streaming systems
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
The DEBS Grand Challenge 2017
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
Scalable Link Discovery for Modern Data-Driven Applications (poster)
An Evaluation of Models for Runtime Approximation in Link Discovery
Scalable Link Discovery for Modern Data-Driven Applications
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
SPgen: A Benchmark Generator for Spatial Link Discovery Tools
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
OKE2018 Challenge @ ESWC2018
MOCHA 2018 Challenge @ ESWC2018
Dynamic planning for link discovery - ESWC 2018
Hobbit project overview presented at EBDVF 2017
Leopard ISWC Semantic Web Challenge 2017 (poster)

Recently uploaded (20)

PPTX
MBC Unit – 7 Nucleic acid metabolism.pptx
DOCX
atomic physics ookikkkkkkkkkkkkkkkkkkkkd
PPTX
Personality for guidance related to theories
PDF
Sujay Rao Mandavilli Degrowth delusion FINAL FINAL FINAL FINAL FINAL.pdf
PPT
what do you want to know about myeloprolifritive disorders .ppt
PPTX
Anesthesia for TURP sundromy anesthesia.ppt
PDF
CITOQUINAS EN ORTODONCIA BIOLOGIA DEL MOVIMIENTO
PDF
naas-journal-rating-2025 for all the journals
PDF
7th Introduction to Waves waves waves .pdf
PDF
FSNRD Proceeding Finalized on May 11 2021.pdf
PPTX
ENDOCRINE_SYSTEM_ANATOMY_AND_PHYSIOLOGY.pptx
PPTX
692953925-General-Chemistry-PPT-1.pptxhj
PPT
Fundamentals of Forensic DNA Typing .ppt
PPTX
Antihypertensive Medicinal Chemistry Unit II BP501T.pptx
PDF
Coronary artery disease.post mi and post
PDF
XUE: The CO2-rich terrestrial planet-forming region of an externally irradiat...
PDF
software engineering for computer science
PDF
Pentose Phosphate Pathway by Rishikanta Usham, Dhanamanjuri University
PPTX
UV-Visible spectroscopy Presentation.
PPTX
IMPROVEMENT IN FOOD AND RESOURCES Class 9th NCERT .pptx
MBC Unit – 7 Nucleic acid metabolism.pptx
atomic physics ookikkkkkkkkkkkkkkkkkkkkd
Personality for guidance related to theories
Sujay Rao Mandavilli Degrowth delusion FINAL FINAL FINAL FINAL FINAL.pdf
what do you want to know about myeloprolifritive disorders .ppt
Anesthesia for TURP sundromy anesthesia.ppt
CITOQUINAS EN ORTODONCIA BIOLOGIA DEL MOVIMIENTO
naas-journal-rating-2025 for all the journals
7th Introduction to Waves waves waves .pdf
FSNRD Proceeding Finalized on May 11 2021.pdf
ENDOCRINE_SYSTEM_ANATOMY_AND_PHYSIOLOGY.pptx
692953925-General-Chemistry-PPT-1.pptxhj
Fundamentals of Forensic DNA Typing .ppt
Antihypertensive Medicinal Chemistry Unit II BP501T.pptx
Coronary artery disease.post mi and post
XUE: The CO2-rich terrestrial planet-forming region of an externally irradiat...
software engineering for computer science
Pentose Phosphate Pathway by Rishikanta Usham, Dhanamanjuri University
UV-Visible spectroscopy Presentation.
IMPROVEMENT IN FOOD AND RESOURCES Class 9th NCERT .pptx

Link Discovery Tutorial Introduction

  • 1. Link Discovery Tutorial Introduction Axel-Cyrille Ngonga Ngomo(1) , Irini Fundulaki(2) , Mohamed Ahmed Sherif(1) (1) Institute for Applied Informatics, Germany (2) FORTH, Greece October 18th, 2016 Kobe, Japan Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 1 / 24
  • 2. Introduction Disclaimer No pursuit of completeness [Nen+15] Focus on Basic ideas and principles Principles Evaluation Open questions and challenges Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 2 / 24
  • 3. Introduction Disclaimer No pursuit of completeness [Nen+15] Focus on Basic ideas and principles Principles Evaluation Open questions and challenges Guidelines http://iswc2016ldtutorial.aksw.org/ Question? Just ask! Comment? Go for it! Be kind :) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 2 / 24
  • 4. Let’s Go! Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 3 / 24
  • 5. Linked Data Principles Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 4 / 24
  • 6. Why Link Discovery? 1 Linked Open Data Cloud 130+ billion triples ≈ 0.5 billion links Mostly owl:sameAs 2 Decentralized dataset creation 3 Complex information needs ⇒ Need to consume data across knowledge bases 4 Links are central for Cross-ontology QA Data Integration Reasoning Federated Queries ... Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 5 / 24
  • 7. Cross-Ontology QA Example Give me the name and description of all drugs that cure their side-effect. [SNA13] 1 Need information from Drugbank (Drug description) Sider (Side-effects) DBpedia (Description) 2 Gathering information via SPARQL query using links Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 6 / 24
  • 8. Cross-Ontology QA Example Give me the name and description of all drugs that cure their side-effect. SELECT ?drug ?name ?desc WHERE { ?drug a drugbank:Drug . ?drug rdfs:label ?name . ?drug drugbank:cures ?disease . ?drug owl:sameAs ?drug2 . ?drug owl:sameAs ?drug3 . ?drug2 sider:hasSideEffect ?effect . ?effect owl:sameAs ?disease . ?drug3 dbo:hasWikiPage ?desc . } Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 7 / 24
  • 9. Cross-Ontology QA Example (DEQA) Give me flats near kindergartens in Kobe. [Leh+12] SELECT ?flat WHERE { ?flat a deqa:Flat . ?flat deqa:near ?school . ?school a lgdo:School . ?school lgdo:city lgdo:Kobe . } Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 8 / 24
  • 10. Data Integration Federated Queries on Patient Data [Kha+14] Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 9 / 24
  • 11. Federated Queries Example (FedBench CD2) Return Barack Obama’s party membership and news pages. [Sal+15] SELECT ?party ?page WHERE { dbr:Barack_Obama dbo:party ?party . ?x nytimes:topicPage ?page . ?x owl:sameAs dbr:Barack_Obama . } Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 10 / 24
  • 12. Definition Definition (Link Discovery, informal) Given two knowledge bases S and T, find links of type R between S and T Here, declarative link discovery Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 11 / 24
  • 13. Definition Definition (Link Discovery, informal) Given two knowledge bases S and T, find links of type R between S and T Here, declarative link discovery Definition (Declarative Link Discovery, formal, similarities) Given sets S and T of resources and relation R Find M = {(s, t) ∈ S × T : R(s, t)} Common approach: Find M = {(s, t) ∈ S × T : σ(s, t) ≥ θ} Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 11 / 24
  • 14. Definition Definition (Link Discovery, informal) Given two knowledge bases S and T, find links of type R between S and T Here, declarative link discovery Definition (Declarative Link Discovery, formal, similarities) Given sets S and T of resources and relation R Find M = {(s, t) ∈ S × T : R(s, t)} Common approach: Find M = {(s, t) ∈ S × T : σ(s, t) ≥ θ} Definition (Declarative Link Discovery, formal, distances) Given sets S and T of resources and relation R Find M = {(s, t) ∈ S × T : R(s, t)} Common approach: Find M = {(s, t) ∈ S × T : δ(s, t) ≤ τ} Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 11 / 24
  • 15. Definition Most common: R = owl:sameAs Also known as deduplication [Nen+15] Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 12 / 24
  • 16. Definition Goal: Address all possible relations R Declarative Link Discovery: Similarity/distance defined using property values (incl. property chains) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 13 / 24
  • 17. Definition Goal: Address all possible relations R Declarative Link Discovery: Similarity/distance defined using property values (incl. property chains) Example: R = :sameModel :s770fm rdfs:label "S770FM"@en :s770fm rdf:type :SABER :s770fm :model :770 :s770fm :top :FlamedMaple :s770fm :producer :Ibanez :s770fm rdfs:label "S770BEM"@en :s770fm rdf:type :SABER :s770fm :model :770 :s770fm :top :BirdEyeMaple :s770fm :producer :Ibanez Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 13 / 24
  • 18. Why is it difficult? 1 Time complexity Large number of triples (e.g., LinkedTCGA with 20.4 billion triples [Sal+14]) Quadratic a-priori runtime 69 days for mapping cities from DBpedia to Geonames Solutions usually in-memory (insufficient heap space) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 14 / 24
  • 19. Why is it difficult? 1 Time complexity Large number of triples (e.g., LinkedTCGA with 20.4 billion triples [Sal+14]) Quadratic a-priori runtime 69 days for mapping cities from DBpedia to Geonames Solutions usually in-memory (insufficient heap space) 2 Accuracy Combination of several attributes required for high precision Tedious discovery of most adequate mapping Dataset-dependent similarity functions Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 14 / 24
  • 20. Structure 1 Time complexity(≈ 60 min) LIMES algorithm [NA11] MultiBlock [IJB11] HR3 [Ngo12] AEGLE [GSN16] Summary and Challenges 2 Accuracy(≈ 30 min) RAVEN [Ngo+11] EAGLE [NL12] COALA [NLC13] Summary and Challenges 3 Benchmarking (≈ 30 min) Benchmarking [NGF16] Synthetic Benchmarks [Sav+15] Real Benchmarks [Mor+11] Summary and Challenges 4 Hands-On Session (≈ 45 min) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 15 / 24
  • 21. That’s all Folks! Axel Ngonga AKSW Research Group Institute for Applied Informatics [email protected] Irini Fundulaki ICS FORTH [email protected] Mohamed Ahmed Sherif AKSW Research Group Institute for Applied Informatics [email protected] Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 16 / 24
  • 22. Acknowledgment This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227). Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 17 / 24
  • 23. References I Kleanthi Georgala, Mohamed Ahmed Sherif, and Axel-Cyrille Ngonga Ngomo. “An Efficient Approach for the Generation of Allen Relations”. In: ECAI 2016 - 22nd European Conference on Artificial Intelligence, 29 August-2 September 2016, The Hague, The Netherlands - Including Prestigious Applications of Artificial Intelligence (PAIS 2016). Ed. by Gal A. Kaminka et al. Vol. 285. Frontiers in Artificial Intelligence and Applications. IOS Press, 2016, pp. 948–956. isbn: 978-1-61499-671-2. doi: 10.3233/978-1-61499-672-9-948. url: http://dx.doi.org/10.3233/978-1-61499-672-9-948. Robert Isele, Anja Jentzsch, and Christian Bizer. “Efficient Multidimensional Blocking for Link Discovery without losing Recall”. In: Proceedings of the 14th International Workshop on the Web and Databases 2011, WebDB 2011, Athens, Greece, June 12, 2011. Ed. by Amélie Marian and Vasilis Vassalos. 2011. url: http://webdb2011.rutgers.edu/papers/Paper%2039/silk.pdf. Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 18 / 24
  • 24. References II
      Yasar Khan et al. “SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes”. In: Proceedings of the 7th International Workshop on Semantic Web Applications and Tools for Life Sciences, Berlin, Germany, December 9-11, 2014. Ed. by Adrian Paschke et al. Vol. 1320. CEUR Workshop Proceedings. CEUR-WS.org, 2014. url: http://ceur-ws.org/Vol-1320/Preface_SWAT4LS2014.pdf.
      Jens Lehmann et al. “deqa: Deep Web Extraction for Question Answering”. In: The Semantic Web - ISWC 2012 - 11th International Semantic Web Conference, Boston, MA, USA, November 11-15, 2012, Proceedings, Part II. Ed. by Philippe Cudré-Mauroux et al. Vol. 7650. Lecture Notes in Computer Science. Springer, 2012, pp. 131–147. isbn: 978-3-642-35172-3. doi: 10.1007/978-3-642-35173-0_9. url: http://dx.doi.org/10.1007/978-3-642-35173-0_9.
  • 25. References III
      Mohamed Morsey et al. “DBpedia SPARQL Benchmark - Performance Assessment with Real Queries on Real Data”. In: The Semantic Web - ISWC 2011 - 10th International Semantic Web Conference, Bonn, Germany, October 23-27, 2011, Proceedings, Part I. Ed. by Lora Aroyo et al. Vol. 7031. Lecture Notes in Computer Science. Springer, 2011, pp. 454–469. isbn: 978-3-642-25072-9. doi: 10.1007/978-3-642-25073-6_29. url: http://dx.doi.org/10.1007/978-3-642-25073-6_29.
      Axel-Cyrille Ngonga Ngomo and Sören Auer. “LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data”. In: IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011. Ed. by Toby Walsh. IJCAI/AAAI, 2011, pp. 2312–2317. isbn: 978-1-57735-516-8. doi: 10.5591/978-1-57735-516-8/IJCAI11-385. url: http://dx.doi.org/10.5591/978-1-57735-516-8/IJCAI11-385.
  • 26. References IV
      Markus Nentwig et al. “A survey of current Link Discovery frameworks”. In: Semantic Web Preprint (2015), pp. 1–18.
      Axel-Cyrille Ngonga Ngomo, Alejandra García-Rojas, and Irini Fundulaki. “HOBBIT: Holistic Benchmarking of Big Linked Data”. In: ERCIM News 2016.105 (2016). url: http://ercim-news.ercim.eu/en105/r-i/hobbit-holistic-benchmarking-of-big-linked-data.
      Axel-Cyrille Ngonga Ngomo et al. “RAVEN - active learning of link specifications”. In: Proceedings of the 6th International Workshop on Ontology Matching, Bonn, Germany, October 24, 2011. Ed. by Pavel Shvaiko et al. Vol. 814. CEUR Workshop Proceedings. CEUR-WS.org, 2011. url: http://ceur-ws.org/Vol-814/om2011_Tpaper3.pdf.
  • 27. References V
      Axel-Cyrille Ngonga Ngomo. “Link Discovery with Guaranteed Reduction Ratio in Affine Spaces with Minkowski Measures”. In: The Semantic Web - ISWC 2012 - 11th International Semantic Web Conference, Boston, MA, USA, November 11-15, 2012, Proceedings, Part I. Ed. by Philippe Cudré-Mauroux et al. Vol. 7649. Lecture Notes in Computer Science. Springer, 2012, pp. 378–393. isbn: 978-3-642-35175-4. doi: 10.1007/978-3-642-35176-1_24. url: http://dx.doi.org/10.1007/978-3-642-35176-1_24.
      Axel-Cyrille Ngonga Ngomo and Klaus Lyko. “EAGLE: Efficient Active Learning of Link Specifications Using Genetic Programming”. In: The Semantic Web: Research and Applications - 9th Extended Semantic Web Conference, ESWC 2012, Heraklion, Crete, Greece, May 27-31, 2012. Proceedings. Ed. by Elena Simperl et al. Vol. 7295. Lecture Notes in Computer Science. Springer, 2012, pp. 149–163. isbn: 978-3-642-30283-1. doi: 10.1007/978-3-642-30284-8_17. url: http://dx.doi.org/10.1007/978-3-642-30284-8_17.
  • 28. References VI
      Axel-Cyrille Ngonga Ngomo, Klaus Lyko, and Victor Christen. “COALA - Correlation-Aware Active Learning of Link Specifications”. In: The Semantic Web: Semantics and Big Data, 10th International Conference, ESWC 2013, Montpellier, France, May 26-30, 2013. Proceedings. Ed. by Philipp Cimiano et al. Vol. 7882. Lecture Notes in Computer Science. Springer, 2013, pp. 442–456. isbn: 978-3-642-38287-1. doi: 10.1007/978-3-642-38288-8_30. url: http://dx.doi.org/10.1007/978-3-642-38288-8_30.
      Muhammad Saleem et al. “TopFed: TCGA Tailored Federated Query Processing and Linking to LOD”. In: J. Biomedical Semantics 5 (2014), p. 47. doi: 10.1186/2041-1480-5-47. url: http://dx.doi.org/10.1186/2041-1480-5-47.
      Muhammad Saleem et al. “A fine-grained evaluation of SPARQL endpoint federation systems”. In: Semantic Web 7.5 (2015), pp. 493–518. doi: 10.3233/SW-150186. url: http://dx.doi.org/10.3233/SW-150186.
  • 29. References VII
      Tzanina Saveta et al. “LANCE: Piercing to the Heart of Instance Matching Tools”. In: The Semantic Web - ISWC 2015 - 14th International Semantic Web Conference, Bethlehem, PA, USA, October 11-15, 2015, Proceedings, Part I. Ed. by Marcelo Arenas et al. Vol. 9366. Lecture Notes in Computer Science. Springer, 2015, pp. 375–391. isbn: 978-3-319-25006-9. doi: 10.1007/978-3-319-25007-6_22. url: http://dx.doi.org/10.1007/978-3-319-25007-6_22.
      Saeedeh Shekarpour, Axel-Cyrille Ngonga Ngomo, and Sören Auer. “Question answering on interlinked data”. In: 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, May 13-17, 2013. Ed. by Daniel Schwabe et al. International World Wide Web Conferences Steering Committee / ACM, 2013, pp. 1145–1156. isbn: 978-1-4503-2035-1. url: http://dl.acm.org/citation.cfm?id=2488488.