SPARQL and Linked Data Benchmarking

Benchmarks
What benchmarks are commonly used and what they mean
Overview
• SP2Bench
• LUBM
• BSBM
• UOBM
• SIB
• DBpedia SPARQL Benchmark
• LODIB
• FedBench
• THALIA Testbed
• Benchmark for Spatial Semantic Web Systems
• LODQA
• LinkBench
SP2Bench
• Language-specific: a SPARQL-based performance benchmark.
• Components: data generator, query set.
• Provides a scalable RDF data generator and a set of benchmark queries designed to test typical SPARQL operator constellations and RDF data access patterns (a sketch follows below).
• Example comparison: http://arxiv.org/pdf/0806.4627v2.pdf
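For illustration, a minimal sketch of the kind of operator constellation SP2Bench stresses, assuming the DBLP-style Dublin Core and FOAF vocabularies the suite builds on; this is an illustrative query, not one of the official benchmark queries.

```sparql
# Illustrative only: OPTIONAL (left outer join) plus FILTER,
# the kind of operator combination SP2Bench exercises.
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?article ?title ?authorName
WHERE {
  ?article dc:title   ?title ;
           dc:creator ?author .
  OPTIONAL { ?author foaf:name ?authorName }   # left outer join
  FILTER (?title != "")                        # filter over the join result
}
ORDER BY ?title
LIMIT 100
```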
LUBM
• The Lehigh University Benchmark.
• “The Lehigh University Benchmark is developed to facilitate the evaluation of Semantic Web repositories in a standard and systematic way. The benchmark is intended to evaluate the performance of those repositories with respect to extensional queries over a large data set that commits to a single realistic ontology.”
• Components: ontology, data generator, test queries (sketched below), tester.
• http://swat.cse.lehigh.edu/projects/lubm/
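For flavor, a hedged sketch of an extensional (instance-level) LUBM-style query; the ub: namespace shown is the commonly cited Univ-Bench one, but the query itself is illustrative rather than one of the official fourteen test queries.

```sparql
# Extensional query: instance-level lookups over the university ontology.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub:  <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>

SELECT ?student ?course
WHERE {
  ?student rdf:type ub:GraduateStudent .   # instance data, not schema
  ?student ub:takesCourse ?course .
}
```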
BSBM
• Berlin SPARQL Benchmark.
• Enables comparing the performance of RDF and named-graph stores, as well as RDF-mapped relational databases and other systems that expose SPARQL endpoints. Designed along an e-commerce use case (see the query sketch below). SPARQL and SQL versions are available.
• http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/
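A hedged sketch of an Explore-style BSBM query, of the kind a shopper narrowing an e-commerce catalogue would issue; the bsbm: property names are recalled from the vocabulary and the feature label is hypothetical.

```sparql
# Illustrative e-commerce lookup: products carrying a given feature.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>

SELECT ?product ?label
WHERE {
  ?product rdfs:label ?label ;
           bsbm:productFeature ?feature .
  ?feature rdfs:label "Feature 42" .   # hypothetical feature label
}
ORDER BY ?label
LIMIT 10
```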
UOBM
• The University Ontology Benchmark.
• Extends the LUBM benchmark in terms of inference and scalability testing (see the inference sketch below).
• Components: ontology and test data set.
• http://www.springerlink.com/content/l0wu543x26350462/
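A sketch of why inference matters here, assuming Univ-Bench-style classes: the data may only assert that a resource is a ub:GraduateStudent, yet a query for ub:Person should still return it via subclass reasoning.

```sparql
# Only complete under RDFS/OWL inference: if the data asserts
# ?x rdf:type ub:GraduateStudent and the ontology declares
# ub:GraduateStudent a subclass of ub:Person, a reasoning store
# returns ?x here while a plain triple-matching store misses it.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub:  <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>

SELECT ?person
WHERE { ?person rdf:type ub:Person . }
```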
SIB
• Social Network Intelligence Benchmark (SIB).
• A benchmark suite developed by people at CWI and OpenLink that takes its schema from social networks to generate test cases where RDF/SPARQL can truly excel, challenging query processing over a highly connected graph (see the traversal sketch below).
• http://www.w3.org/wiki/Social_Network_Intelligence_BenchMark
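A minimal sketch of the kind of highly connected traversal SIB targets, using a SPARQL 1.1 property path; foaf:knows and the person IRI stand in for whatever the SIB schema actually defines.

```sparql
# Friends-of-friends: a two-hop traversal whose result size explodes
# on densely connected graphs, stressing join and dedup machinery.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?fof
WHERE {
  VALUES ?me { <http://example.org/person/alice> }  # placeholder IRI
  ?me foaf:knows/foaf:knows ?fof .                   # two-hop property path
  FILTER (?fof != ?me)
}
```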
DBpedia SPARQL Benchmark
• Designed to benchmark against DBpedia data in order to provide a clear picture of real-world performance (see the sample query below).
• “Performance Assessment with Real Queries on Real Data.”
• http://svn.aksw.org/papers/2011/VLDB_AKSWBenchmark/public.pdf
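The benchmark itself replays query templates mined from real DBpedia endpoint logs; the hand-written query below merely illustrates the style of workload, using the standard dbo:/dbr: namespaces.

```sparql
# Typical DBpedia lookup: people born in Berlin, with English labels.
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?person ?name
WHERE {
  ?person dbo:birthPlace dbr:Berlin ;
          rdfs:label ?name .
  FILTER (langMatches(lang(?name), "en"))   # keep English labels only
}
LIMIT 50
```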
LODIB
• The Linked Data Integration Benchmark.
• A benchmark for comparing the expressivity as well as the runtime performance of Linked Data translation/integration systems (see the CONSTRUCT sketch below).
• http://wifo5-03.informatik.uni-mannheim.de/bizer/lodib/
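One common way to express a Linked Data translation is a SPARQL CONSTRUCT query; the sketch below shows two of the mapping patterns such benchmarks exercise (property renaming and value transformation) over hypothetical source and target vocabularies.

```sparql
# Rewrite a source vocabulary into a target one. src:/tgt: are
# hypothetical namespaces, not LODIB's actual test vocabularies.
PREFIX src: <http://example.org/source/>
PREFIX tgt: <http://example.org/target/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

CONSTRUCT {
  ?item tgt:name  ?label .          # property renaming
  ?item tgt:price ?priceFloat .     # value transformation
}
WHERE {
  ?item src:label ?label ;
        src:price ?price .
  BIND (xsd:float(?price) AS ?priceFloat)   # cast string to float
}
```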
FedBench
• A benchmark for measuring the performance of federated SPARQL query processing (see the federation sketch below).
• ISWC 2011 paper: https://www.uni-koblenz.de/~goerlitz/publications/ISWC2011-FedBench.pdf
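A minimal sketch of federated processing using the SPARQL 1.1 SERVICE keyword; the endpoint URL and predicates are placeholders, whereas FedBench's own queries span several concrete Linked Data sources.

```sparql
# One pattern is evaluated locally, the other at a remote endpoint,
# and the bindings are joined by the federation engine.
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?thing ?remoteLabel
WHERE {
  ?thing owl:sameAs ?remote .               # local link set
  SERVICE <http://example.org/sparql> {     # placeholder remote endpoint
    ?remote rdfs:label ?remoteLabel .
  }
}
```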
THALIA Testbed
• Designed to test the expressiveness of relational-to-RDF mapping languages.
• http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/THALIATestbed
Benchmark for Spatial Semantic Web Systems
• Extends LUBM with sample spatial data (see the bounding-box sketch below).
• https://filebox.vt.edu/users/dkolas/public/ssbm/
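A hedged sketch of a spatial selection expressed in plain SPARQL over the W3C WGS84 lat/long vocabulary; the benchmark's actual data layout and any spatial extension functions may differ, but a spatial store should answer such bounding-box queries via an index rather than a full scan.

```sparql
# Bounding-box selection over per-feature latitude/longitude values.
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT ?feature
WHERE {
  ?feature geo:lat  ?lat ;
           geo:long ?long .
  FILTER (?lat  > 40.0  && ?lat  < 41.0 &&     # latitude band
          ?long > -75.0 && ?long < -74.0)      # longitude band
}
```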
LODQA
• The Linked Open Data Quality Assessment benchmark.
• A benchmark for comparing data quality assessment and data fusion systems (see the conflict-check sketch below).
• https://filebox.vt.edu/users/dkolas/public/ssbm/
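A hedged sketch of the kind of quality problem such systems must detect and resolve: two conflicting values for a property that should be single-valued. ex:population is a hypothetical property, not part of the benchmark.

```sparql
# Report each subject carrying two different values for a property
# that should hold at most one; fusion systems must pick or merge.
PREFIX ex: <http://example.org/vocab/>

SELECT ?city ?v1 ?v2
WHERE {
  ?city ex:population ?v1 , ?v2 .
  FILTER (?v1 < ?v2)    # report each conflicting pair once
}
```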
LinkBench
• A database benchmark designed around the Facebook social graph.
• Whitepaper: http://people.cs.uchicago.edu/~tga/pubs/sigmod-linkbench-2013.pdf
• https://www.facebook.com/notes/facebook-engineering/linkbench-a-database-benchmark-for-the-social-graph/10151391496443920
Performance Results
• Results provided by store implementers themselves:
– Virtuoso BSBM benchmark results (native RDF store versus mapped relational database)
– Jena TDB BSBM benchmark results (native RDF store)
– OWLIM benchmark results (LUBM, BSBM, and Linked Data loading/inference)
– SemWeb .NET library BSBM benchmark results
– Virtuoso LUBM benchmark results
– AllegroGraph 2.0 benchmark for LUBM-50-0
– Sesame NativeStore LUBM benchmark results
– RacerPro LUBM benchmark results
– SwiftOWLIM benchmark results for the LUBM and City benchmarks (from slide 27 onwards)
– Oracle 11g benchmark results for the LUBM and UniProt benchmarks (from slide 20 onwards)
– Jena SDB/Query performance and SDB/Loading performance
– Bigdata BSBM V3 Reduced Query Mix benchmark results
Performance Results
• Results provided by third parties:
– Cudré-Mauroux et al.: NoSQL Databases for RDF: An Empirical Evaluation (November 2013; uses the BSBM benchmark with workloads from 10 million to 1 billion triples to benchmark several NoSQL databases).
– Peter Boncz, Minh-Duc Pham: Berlin SPARQL Benchmark Results for Virtuoso, Jena TDB, BigData, and BigOWLIM (April 2013; 100 million to 150 billion triples; Explore and Business Intelligence use cases).
– Christian Bizer, Andreas Schultz: Berlin SPARQL Benchmark Results for Virtuoso, Jena TDB, 4store, BigData, and BigOWLIM (February 2011; 100 and 200 million triples; Explore and Update use cases).
– Christian Bizer, Andreas Schultz: Berlin SPARQL Benchmark Results for Virtuoso, Jena TDB, and BigOWLIM (November 2009; 100 and 200 million triples).
– L. Sidirourgos et al.: Column-Store Support for RDF Data Management: Not All Swans Are White. An experimental analysis along two dimensions – triple-store vs. vertically partitioned and row-store vs. column-store – individually, before analyzing their combined effects. In VLDB 2008.
– Christian Bizer, Andreas Schultz: Berlin SPARQL Benchmark Results. Benchmark along an e-commerce use case comparing Virtuoso, Sesame, Jena TDB, D2R Server, and MySQL with datasets ranging from 250,000 to 100,000,000 triples, setting the results in relation to two RDBMSs. 2008. (Note: as discussed in Orri Erling's blog, the SQL mix results did not accurately reflect the steady state of all players and should be taken with a grain of salt. Warm-up steps will change for future runs.)
Performance Results
• Results provided by third parties (cont.):
– Michael Schmidt et al.: SP2Bench: A SPARQL Performance Benchmark. Benchmark based on the DBLP data set comparing current versions of ARQ, Redland, Sesame, SDB, and Virtuoso. TR, 2008 (short version of the TR to appear in ICDE 2009).
– Michael Schmidt et al.: An Experimental Comparison of RDF Data Management Approaches in a SPARQL Benchmark Scenario. Benchmarking relational database schemes on top of the SP2Bench suite. In ISWC 2008.
– Atanas Kiryakov: Measurable Targets for Scalable Reasoning.
– Baolin Liu and Bo Hu: An Evaluation of RDF Storage Systems for Large Data Applications.
– Christian Becker: RDF Store Benchmarks with DBpedia, comparing Virtuoso, SDB, and Sesame. 2007.
– Kurt Rohloff et al.: An Evaluation of Triple-Store Technologies for Large Data Stores. Comparing Sesame, Jena, and AllegroGraph. 2007.
– Christian Weiske: SPARQL Engines Benchmark Results.
– Ryan Lee: Scalability Report on Triple Store Applications, comparing Jena, Kowari, 3store, and Sesame. 2004.
– Martin Svihala, Ivan Jelinek: Benchmarking RDF Production Tools. Paper comparing the performance of relational-database-to-RDF mapping tools (METAmorphoses, D2RQ, SquirrelRDF) with native RDF stores (Jena, Sesame).
– Michael Streatfield, Hugh Glaser: Benchmarking RDF Triplestores. 2005.
Observations
• LUBM and BSBM results are often shown on the major vendors' own websites.
• SP2Bench results are harder to find, both on vendor websites and in other resources.
• Many of the results reported by companies highlight performance on very high-end hardware rather than on commodity machines.
• http://www.w3.org/wiki/RdfStoreBenchmarking
