TinkerPop Backed By Accumulo
6/12/2014
Ryan Webb
Associate Professional
Ryan.Webb@jhuapl.edu
Agenda
• Introduction to TinkerPop
• Detailed Implementation
• Obstacles
• Overcoming Obstacles
• Map Reduce Integration
• Performance
Background
• Associate Professional at The Johns Hopkins Applied Physics Laboratory
• Bachelor of Science in Computer Science with a minor in Mathematics from the University of Delaware
• Pursuing a Master's in Computer Science with a focus on Distributed Systems at the Whiting School of Engineering
TinkerPop Blueprints
• Foundational technology for a complete graph stack
• Extensive test suite to ensure implementations follow all the required rules
• Only a simple API
    • getVertex
    • getEdge
    • setProperty
    • getProperty
• Multiple interfaces with incremental features
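To make the four calls listed above concrete, here is a minimal in-memory sketch of a property graph that exposes them. It is not the Blueprints library and not the Accumulo-backed implementation: `MiniGraphSketch` and its nested classes are hypothetical names that only mirror the method names from the slide, and everything lives in plain hash maps.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model; only the method names come from the slide.
class MiniGraphSketch {
    /** Shared behaviour of vertices and edges: an id plus key/value properties. */
    static class Element {
        final String id;
        private final Map<String, Object> properties = new HashMap<>();

        Element(String id) { this.id = id; }

        void setProperty(String key, Object value) { properties.put(key, value); }

        @SuppressWarnings("unchecked")
        <T> T getProperty(String key) { return (T) properties.get(key); }
    }

    static class Vertex extends Element {
        Vertex(String id) { super(id); }
    }

    static class Edge extends Element {
        final Vertex out;   // tail of the directed edge
        final Vertex in;    // head of the directed edge
        final String label;

        Edge(String id, Vertex out, Vertex in, String label) {
            super(id);
            this.out = out;
            this.in = in;
            this.label = label;
        }
    }

    private final Map<String, Vertex> vertices = new HashMap<>();
    private final Map<String, Edge> edges = new HashMap<>();

    Vertex addVertex(String id) {
        Vertex v = new Vertex(id);
        vertices.put(id, v);
        return v;
    }

    Edge addEdge(String id, Vertex out, Vertex in, String label) {
        Edge e = new Edge(id, out, in, label);
        edges.put(id, e);
        return e;
    }

    Vertex getVertex(String id) { return vertices.get(id); }

    Edge getEdge(String id) { return edges.get(id); }

    public static void main(String[] args) {
        MiniGraphSketch g = new MiniGraphSketch();
        Vertex alice = g.addVertex("1");
        alice.setProperty("name", "Alice");
        Vertex bob = g.addVertex("2");
        g.addEdge("E1", alice, bob, "knows");
        String name = g.getVertex("1").getProperty("name");
        System.out.println(name + " " + g.getEdge("E1").label + " vertex " + bob.id);
    }
}
```

A backing store only has to answer these calls correctly; how it lays the data out is its own decision, which is where the Accumulo table design comes in.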
TinkerPop Blueprints Graph API
Graph Creation
// Connection settings for the Accumulo instance that backs the graph
Configuration cfg = new AccumuloGraphConfiguration()
    .instance("accumulo").user("user").zkHosts("zk1")
    .password("password".getBytes()).name("myGraph");
Graph graph = GraphFactory.open(cfg);

// Two vertices with explicit IDs and a "name" property each
Vertex v1 = graph.addVertex("1");
v1.setProperty("name", "Alice");
Vertex v2 = graph.addVertex("2");
v2.setProperty("name", "Bob");

// Directed edge from v1 (out) to v2 (in), labeled "knows"
Edge e1 = graph.addEdge("E1", v1, v2, "knows");
e1.setProperty("since", new Date());
Trade off Spectrum
Consistency
Performance
Accumulo Implementation
• Base naïve implementation passes all required TinkerPop tests
    • Far right of the spectrum
    • As consistent as you can get
• Table Structure
    • Edge and Vertex
    • Edge and Vertex Index table
    • Metadata table for indexes
Table Structure

Vertex
Row ID     Column Family   Column Qualifier      Value
VertexID   Label Flag      Exists Flag           [empty]
VertexID   INVERTEX        OutVertexID_EdgeID    Edge Label
VertexID   OUTVERTEX       InVertexID_EdgeID     Edge Label
VertexID   Property Key    [empty]               Serialized Value

Edge
Row ID     Column Family   Column Qualifier          Value
EdgeID     Label Flag      InVertexID_OutVertexID    Edge Label
EdgeID     Property Key    [empty]                   Serialized Value
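The layout above can be sketched as a function that lists the entries written for a single edge. This is an illustration under stated assumptions, not the author's code: the `Entry` class, the `entriesForEdge` helper, the table names, and the `"LABEL"` string standing in for the "Label Flag" family are all hypothetical. It also assumes the INVERTEX entry is stored on the in-vertex's row and names the out-vertex in its qualifier, which is how the table reads.

```java
import java.util.ArrayList;
import java.util.List;

class EdgeLayoutSketch {
    /** One Accumulo-style entry: table, row ID, column family, column qualifier, value. */
    static final class Entry {
        final String table, row, family, qualifier, value;

        Entry(String table, String row, String family, String qualifier, String value) {
            this.table = table;
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return table + ": " + row + " | " + family + " | " + qualifier + " | " + value;
        }
    }

    // Hypothetical placeholder for the "Label Flag" column family.
    static final String LABEL_FLAG = "LABEL";

    /** Entries for one directed edge outVertexId -> inVertexId, following the slide's layout. */
    static List<Entry> entriesForEdge(String edgeId, String outVertexId,
                                      String inVertexId, String label) {
        List<Entry> entries = new ArrayList<>();
        // Row of the in-vertex: records the incoming edge and where it comes from.
        entries.add(new Entry("vertex", inVertexId, "INVERTEX",
                outVertexId + "_" + edgeId, label));
        // Row of the out-vertex: records the outgoing edge and where it goes.
        entries.add(new Entry("vertex", outVertexId, "OUTVERTEX",
                inVertexId + "_" + edgeId, label));
        // Row of the edge itself: both endpoints in the qualifier, label as the value.
        entries.add(new Entry("edge", edgeId, LABEL_FLAG,
                inVertexId + "_" + outVertexId, label));
        return entries;
    }

    public static void main(String[] args) {
        for (Entry e : entriesForEdge("E1", "1", "2", "knows")) {
            System.out.println(e);
        }
    }
}
```

Because each endpoint's row carries its own copy of the edge, a vertex's neighbors can be read from that one row without touching the edge table.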
Graph Access and Index Creation/Use
// Access before Index
for (Vertex v : graph.getVertices()) {
    String name = v.getProperty("name");
}

((KeyIndexableGraph) graph)
    .createKeyIndex("name", Vertex.class);

// Access after Index
for (Vertex v : graph.getVertices()) {
    String name = v.getProperty("name");
}
Table Structure - Continued

Indexes
Row                Column Family   Column Qualifier   Value
Serialized Value   Property Key    VertexID           [empty]

Metadata
Row          Column Family   Column Qualifier   Value
Index Name   Index Class     [empty]            [empty]
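Because the index row starts with the serialized property value, a lookup by value becomes a short range scan over sorted keys instead of a pass over every vertex. The sketch below models that with a `TreeSet` standing in for the sorted index table. The class name, the NUL separator, and the string encoding are assumptions made for illustration; values containing the separator character are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class KeyIndexSketch {
    // Hypothetical separator between row, column family, and column qualifier.
    private static final char SEP = '\u0000';

    // Sorted entries of the form: serializedValue SEP propertyKey SEP vertexId
    private final TreeSet<String> index = new TreeSet<>();

    /** Adds one index entry, mirroring (row, family, qualifier) from the index table. */
    void put(String propertyKey, String serializedValue, String vertexId) {
        index.add(serializedValue + SEP + propertyKey + SEP + vertexId);
    }

    /** Returns the IDs of all vertices whose property has the given value. */
    List<String> lookup(String propertyKey, String serializedValue) {
        String prefix = serializedValue + SEP + propertyKey + SEP;
        List<String> vertexIds = new ArrayList<>();
        // Entries sharing a prefix are contiguous in sorted order, so start at the
        // first entry >= prefix and stop at the first one that no longer matches.
        for (String entry : index.tailSet(prefix)) {
            if (!entry.startsWith(prefix)) {
                break;
            }
            vertexIds.add(entry.substring(prefix.length()));
        }
        return vertexIds;
    }

    public static void main(String[] args) {
        KeyIndexSketch idx = new KeyIndexSketch();
        idx.put("name", "Alice", "1");
        idx.put("name", "Bob", "2");
        idx.put("name", "Alice", "3");
        System.out.println(idx.lookup("name", "Alice"));
    }
}
```

The cost of a lookup then depends on the number of matching entries rather than on the size of the graph.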
Obstacles
• Existence checking is expensive
    • Required for TinkerPop test suite
• Writing every graph object out is expensive
• Building indexes post ingest is expensive
    • Blocking, full table scan
• Consistency is expensive
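To make the first two costs concrete, the sketch below counts simulated lookups and write round trips for two ingest styles: one that checks existence and writes each vertex immediately, and one that skips the check and writes in batches. It is a toy model under stated assumptions. `IngestCostSketch` and its counters are hypothetical, a `HashSet` stands in for the vertex table, and no real Accumulo client is involved.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IngestCostSketch {
    private final Set<String> table = new HashSet<>();     // stands in for the vertex table
    private final List<String> pending = new ArrayList<>(); // writes not yet sent
    private final boolean checkExistence;
    private final int batchSize;

    int lookups = 0;          // simulated reads caused by existence checks
    int writeRoundTrips = 0;  // simulated trips to the server for writes

    IngestCostSketch(boolean checkExistence, int batchSize) {
        this.checkExistence = checkExistence;
        this.batchSize = batchSize;
    }

    void addVertex(String id) {
        if (checkExistence) {
            lookups++; // every insert pays for one read first
            if (table.contains(id) || pending.contains(id)) {
                throw new IllegalArgumentException("Vertex already exists: " + id);
            }
        }
        pending.add(id);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends all pending writes in one simulated round trip. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        table.addAll(pending);
        pending.clear();
        writeRoundTrips++;
    }

    public static void main(String[] args) {
        IngestCostSketch checked = new IngestCostSketch(true, 1);
        IngestCostSketch batched = new IngestCostSketch(false, 100);
        for (int i = 0; i < 1000; i++) {
            checked.addVertex("v" + i);
            batched.addVertex("v" + i);
        }
        checked.flush();
        batched.flush();
        System.out.println("checked: " + checked.lookups + " lookups, "
                + checked.writeRoundTrips + " write round trips");
        System.out.println("batched: " + batched.lookups + " lookups, "
                + batched.writeRoundTrips + " write round trips");
    }
}
```

The batched path trades safety for speed: without the check, a duplicate ID is silently accepted, so it only suits callers who already know their IDs are unique.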
Overcoming Obstacles
Give more power to users who know they are using an Accumulo Graph
• Ingest Improvements
    • Give option to disable existence checks
    • Allow manual batching
    • Specialized ingest path
• Traversal Improvements
    • Attribute preloading
    • Property caching
    • Element caching
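Property caching, one of the traversal improvements listed above, can be sketched as a small least-recently-used cache in front of the table lookup. This is one possible approach and not necessarily how the presented implementation caches. `PropertyCacheSketch`, its `loader` function, and the `vertexId/key` cache key format are hypothetical; the eviction relies on the standard `LinkedHashMap` access-order mode.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class PropertyCacheSketch {
    private final Function<String, String> loader; // stands in for a table lookup
    private final Map<String, String> cache;

    int loads = 0; // number of times the backing store was consulted

    PropertyCacheSketch(final int maxEntries, Function<String, String> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries; // evict once the cache is over capacity
            }
        };
    }

    /** Returns the property value, reading the backing store only on a cache miss. */
    String getProperty(String vertexId, String key) {
        String cacheKey = vertexId + "/" + key;
        String value = cache.get(cacheKey);
        if (value == null) {
            loads++;
            value = loader.apply(cacheKey);
            cache.put(cacheKey, value);
        }
        return value;
    }

    public static void main(String[] args) {
        PropertyCacheSketch cache =
                new PropertyCacheSketch(2, cacheKey -> "value-of-" + cacheKey);
        cache.getProperty("1", "name");
        cache.getProperty("1", "name"); // served from the cache
        System.out.println("loads: " + cache.loads);
    }
}
```

A cache like this gives up some consistency, since a cached value can go stale if another writer changes the property, which is the trade-off the deck's consistency and performance spectrum describes.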
Simple Bulk Ingest
// Will migrate to BatchGraph
AccumuloBulkIngester g = new AccumuloBulkIngester(cfg);
PropertyBuilder v1 = g.addVertex("ID1");
PropertyBuilder v2 = g.addVertex("ID2");
PropertyBuilder edge = g.addEdge("ID1", "ID2", "knows");
v1.add("name", "alice");
v2.add("name", "bob");
edge.add("since", new Date());
Map Reduce Integration
• In your Tool
j.setInputFormatClass(VertexInputFormat.class);
VertexInputFormat.setAccumuloGraphConfiguration(
    new AccumuloGraphConfiguration()
        .instance("accumulo").zkHosts("zk1").user("root")
        .password("secret".getBytes()).name("myGraph"));
• In your Mapper
public void map(Text k, Vertex v, Context c) {
    System.out.println(v.getId().toString());
}
Results
2 Nodes              4 Nodes               8 Nodes
20 Hours 9 Minutes   13 Hours 47 Minutes   7 Hours 4 Minutes
Cluster Stats
8 Node Cluster
64 GB RAM
Quad-Core Xeon Processor 2.50GHz 10MB
2x 4 TB 6.0Gb/s 7200 RPM Drives
1 Gb/s Networking
Accumulo 1.5.1, Hadoop 2.0.0 – MR1
Stanford SNAP Friendster Graph
65,608,366 Vertices
1,806,067,135 Edges
2 Nodes      4 Nodes      8 Nodes
55 Minutes   29 Minutes   15 Minutes
Vertex Iteration
Ingest
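The two rows of timings can be turned into speedups relative to the 2-node run, which shows how each operation scales as nodes are added. The sketch below only does that arithmetic on the reported numbers. The class and method names are hypothetical, and the rows are called "longer-running" and "shorter-running" because the transcript does not tie each row to its label.

```java
class ScalingSketch {
    /** Converts an "H hours M minutes" timing to minutes. */
    static int toMinutes(int hours, int minutes) {
        return hours * 60 + minutes;
    }

    /** Speedup of a run relative to the baseline run (baseline time / run time). */
    static double speedup(int baselineMinutes, int runMinutes) {
        return (double) baselineMinutes / runMinutes;
    }

    public static void main(String[] args) {
        // Timings for 2, 4, and 8 nodes, as reported in the results above.
        int[] longer = { toMinutes(20, 9), toMinutes(13, 47), toMinutes(7, 4) };
        int[] shorter = { 55, 29, 15 };
        int[] nodes = { 2, 4, 8 };

        for (int i = 1; i < nodes.length; i++) {
            System.out.printf("%d nodes: longer-running %.2fx, shorter-running %.2fx%n",
                    nodes[i],
                    speedup(longer[0], longer[i]),
                    speedup(shorter[0], shorter[i]));
        }
    }
}
```

Going from 2 to 8 nodes quadruples the hardware. The shorter-running operation speeds up by about 3.7x, close to linear, while the longer-running one gains about 2.9x.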
Conclusion
• Simple, easy to read graph API
• Gives developers a lot of tuning points for their implementations
• Performance is “good enough”
• Not meant for high performance, specialized solutions
• Quick to develop new ideas and investigate your graph
• Easy to integrate and already integrated
• Low effort to get REST access to your graph
Future
• Polish and open source
• Iterators
• Locality Groups
• Addressing Security
• Graph Query
• Extending MapReduce Integration
• Upgrading to Accumulo 1.6, TinkerPop 2.5
• Conditional Mutations
• Table namespaces
Resources
• http://www.tinkerpop.com/
• http://snap.stanford.edu/data/com-Friendster.html
Accumulo Summit 2014: Accumulo backed Tinkerpop Implementation
