Neo4j: a NOSQL overview and the benefits of graph databases
#neo4j
Emil Eifrem                  @emileifrem
CEO, Neo Technology          emil@neotechnology.com
What's the plan?

 Why now? – Four trends

 NOSQL overview

 Graph databases && Neo4j

 Conclusions

 Food
Trend 1: data set size
 [Chart: data set size of 40 in 2007 growing to 988 in 2010. Source: IDC 2007]
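Trend 2: connectedness
 [Chart: information connectivity (y-axis) over time (x-axis: 1990, 2000, 2010, 2020)]
 Eras along the time axis: web 1.0, web 2.0, “web 3.0”
 In order of increasing connectivity: text documents, hypertext, RSS, blogs,
 wikis, user-generated content, tagging, folksonomies, RDF, ontologies,
 Giant Global Graph (GGG)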
Trend 3: semi-structure
 Individualization of content!
   In the salary lists of the 1970s, all elements had
   exactly one job
   In the salary lists of the 2000s, we need 5 job
   columns! Or 8? Or 15?

 Trend accelerated by the decentralization of
 content generation that is the hallmark of the age
 of participation (“web 2.0”)
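Aside: RDBMS performance
 [Chart: relational database performance (y-axis) vs. data complexity (x-axis)]
 In order of increasing data complexity: salary list, majority of webapps,
 social network, semantic, trading
 The most complex cases are bracketed as “custom”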
Trend 4: architecture

       1990s: Database as integration hub
Trend 4: architecture

               2000s: (Slowly towards...)
       Decoupled services with own backend
Why NoSQL 2009?

 Trend 1: Size.

 Trend 2: Connectivity.

 Trend 3: Semi-structure.

 Trend 4: Architecture.
NoSQL
overview
First off: the damn name

 NoSQL is NOT “Never SQL”

 NoSQL is NOT “No To SQL”

 NoSQL is NOT “WE HATE CHRIS' DOG”
NOSQL
    is simply


Not Only SQL!
Four (emerging) NOSQL categories
 Key-value stores
   Based on DHTs / Amazon's Dynamo paper
   Data model: (global) collection of K-V pairs
   Example: Dynomite, Voldemort, Tokyo

 BigTable clones
   Based on Google's BigTable paper
   Data model: big table, column families
   Example: HBase, Hypertable
Four (emerging) NOSQL categories
 Document databases
   Inspired by Lotus Notes
   Data model: collections of K-V collections
   Example: CouchDB, MongoDB

 Graph databases
   Inspired by Euler & graph theory
   Data model: nodes, rels, K-V on both
   Example: AllegroGraph, VertexDB, Neo4j
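
To make the four data models above concrete, here is a minimal sketch of each one as plain Java collections. Everything in it (class names, keys, the "profile" column family, the sample values) is an illustrative assumption; none of the products listed above exposes this API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: the four data models as plain Java collections.
class DataModelSketch
{
    // Key-value store: one (global) collection of K-V pairs.
    static Map<String, String> keyValueStore()
    {
        Map<String, String> store = new LinkedHashMap<String, String>();
        store.put( "person:1", "Thomas Anderson" );
        return store;
    }

    // BigTable clone: row key -> column family -> column -> value.
    static Map<String, Map<String, Map<String, String>>> bigTable()
    {
        Map<String, String> columns = new LinkedHashMap<String, String>();
        columns.put( "name", "Thomas Anderson" );
        Map<String, Map<String, String>> families =
            new LinkedHashMap<String, Map<String, String>>();
        families.put( "profile", columns );
        Map<String, Map<String, Map<String, String>>> table =
            new LinkedHashMap<String, Map<String, Map<String, String>>>();
        table.put( "person:1", families );
        return table;
    }

    // Document database: a collection of K-V collections (documents).
    static List<Map<String, Object>> documents()
    {
        Map<String, Object> document = new LinkedHashMap<String, Object>();
        document.put( "name", "Thomas Anderson" );
        document.put( "age", 29 );
        List<Map<String, Object>> collection = new ArrayList<Map<String, Object>>();
        collection.add( document );
        return collection;
    }

    // Graph database: nodes and relationships, with K-V properties on both.
    static class Node
    {
        final Map<String, Object> properties = new LinkedHashMap<String, Object>();
    }

    static class Relationship
    {
        final Node start;
        final Node end;
        final Map<String, Object> properties = new LinkedHashMap<String, Object>();

        Relationship( Node start, Node end, String type )
        {
            this.start = start;
            this.end = end;
            this.properties.put( "type", type );
        }
    }

    // Builds two nodes joined by one KNOWS relationship.
    static Relationship graph()
    {
        Node neo = new Node();
        neo.properties.put( "name", "Thomas Anderson" );
        Node morpheus = new Node();
        morpheus.properties.put( "name", "Morpheus" );
        return new Relationship( neo, morpheus, "KNOWS" );
    }

    public static void main( String[] args )
    {
        System.out.println( keyValueStore().get( "person:1" ) );
        System.out.println( bigTable().get( "person:1" ).get( "profile" ).get( "name" ) );
        System.out.println( documents().get( 0 ).get( "age" ) );
        Relationship knows = graph();
        System.out.println( knows.start.properties.get( "name" ) + " "
            + knows.properties.get( "type" ) + " "
            + knows.end.properties.get( "name" ) );
    }
}
```

Note how the nesting deepens from one model to the next, and how only the graph model makes the link between two records a first-class item with its own properties.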
NOSQL data models
 [Chart: data size (y-axis) vs. data complexity (x-axis)]
 From largest size / lowest complexity to smallest size / highest complexity:
   Key-value stores
   Bigtable clones
   Document databases
   Graph databases
 Annotation on the size axis: 90% of use cases
 Annotation at graph databases: (This is still billions of nodes & relationships)
Graph DBs
& Neo4j intro
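The Graph DB model: representation
 Core abstractions:
   Nodes
   Relationships between nodes
   Properties on both

 [Diagram: three nodes (1, 2, 3) connected by relationships, with example properties]
   On a node: name = “Emil”, age = 29, sex = “yes”
   On a relationship: type = KNOWS, time = 4 years
   On node 3: type = car, vendor = “SAAB”, model = “95 Aero”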
Example: The Matrix
 [Diagram: property graph of characters from The Matrix]
 Nodes:
   1    name = “Thomas Anderson”, age = 29
   2    name = “Trinity”
   3    name = “Cypher”, last name = “Reagan”
   7    name = “Morpheus”, rank = “Captain”, occupation = “Total badass”
   13   name = “Agent Smith”, version = 1.0b, language = C++
   42   name = “The Architect”
 Relationships:
   1 KNOWS 7
   1 KNOWS 2       age = 3 days
   7 KNOWS 2
   7 KNOWS 3       disclosure = public
   3 KNOWS 13      disclosure = secret, age = 6 months
   13 CODED_BY 42
Code (1): Building a node space
NeoService neo = ... // Get factory


// Create Thomas 'Neo' Anderson
Node mrAnderson = neo.createNode();
mrAnderson.setProperty( "name", "Thomas Anderson" );
mrAnderson.setProperty( "age", 29 );

// Create Morpheus
Node morpheus = neo.createNode();
morpheus.setProperty( "name", "Morpheus" );
morpheus.setProperty( "rank", "Captain" );
morpheus.setProperty( "occupation", "Total bad ass" );

// Create a relationship representing that they know each other
mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );
// ...create Trinity, Cypher, Agent Smith, Architect similarly
Code (1): Building a node space
NeoService neo = ... // Get factory
Transaction tx = neo.beginTx();
try
{
   // Create Thomas 'Neo' Anderson
   Node mrAnderson = neo.createNode();
   mrAnderson.setProperty( "name", "Thomas Anderson" );
   mrAnderson.setProperty( "age", 29 );

   // Create Morpheus
   Node morpheus = neo.createNode();
   morpheus.setProperty( "name", "Morpheus" );
   morpheus.setProperty( "rank", "Captain" );
   morpheus.setProperty( "occupation", "Total bad ass" );

   // Create a relationship representing that they know each other
   mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );
   // ...create Trinity, Cypher, Agent Smith, Architect similarly

   tx.success(); // Mark the transaction as successful
}
finally
{
   tx.finish(); // Commits if marked successful, otherwise rolls back
}
Code (1b): Defining RelationshipTypes
// In package org.neo4j.api.core
public interface RelationshipType
{
   String name();
}

// In package org.yourdomain.yourapp
// Example on how to roll dynamic RelationshipTypes
class MyDynamicRelType implements RelationshipType
{
   private final String name;
   MyDynamicRelType( String name ){ this.name = name; }
   public String name() { return this.name; }
}

// Example on how to kick it, static-RelationshipType-like
enum MyStaticRelTypes implements RelationshipType
{
   KNOWS,
   WORKS_FOR,
}
Whiteboard friendly
 [Diagram: whiteboard sketch with the nodes Björn, Big Car and DayCare,
  connected by the relationships owns, build and drives]
The Graph DB model: traversal
 Traverser framework for high-performance traversing
 across the node space

 [Diagram: the example graph from the representation slide (nodes 1, 2 and 3
  with the “Emil”, KNOWS and SAAB properties)]
Example: Mr Anderson’s friends
 [Diagram: the Matrix graph from “Example: The Matrix”, with the same nodes
  (1, 2, 3, 7, 13, 42) and the same KNOWS and CODED_BY relationships]
Code (2): Traversing a node space
// Instantiate a traverser that returns Mr Anderson's friends
Traverser friendsTraverser = mrAnderson.traverse(
   Traverser.Order.BREADTH_FIRST,
   StopEvaluator.END_OF_GRAPH,
   ReturnableEvaluator.ALL_BUT_START_NODE,
   RelTypes.KNOWS,
   Direction.OUTGOING );

// Traverse the node space and print out the result
System.out.println( "Mr Anderson's friends:" );
for ( Node friend : friendsTraverser )
{
   System.out.printf( "At depth %d => %s%n",
       friendsTraverser.currentPosition().getDepth(),
       friend.getProperty( "name" ) );
}
[Diagram: the same Matrix graph, shown next to the traverser call and its output]

friendsTraverser = mrAnderson.traverse(
  Traverser.Order.BREADTH_FIRST,
  StopEvaluator.END_OF_GRAPH,
  ReturnableEvaluator.ALL_BUT_START_NODE,
  RelTypes.KNOWS,
  Direction.OUTGOING );

$ bin/start-neo-example
Mr Anderson's friends:

At depth 1 => Morpheus
At depth 1 => Trinity
At depth 2 => Cypher
At depth 3 => Agent Smith
$
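
For readers without a Neo4j installation, the breadth-first traversal above can be approximated in plain Java. This is only a sketch, not the Neo4j Traverser API: the class and method names are made up for illustration, and the direction of each KNOWS edge is an assumption chosen to be consistent with the output printed on the slide.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Plain-Java stand-in for a breadth-first traverser; not the Neo4j API.
class FriendsTraversalSketch
{
    // Outgoing KNOWS edges of the Matrix example (edge directions are assumed).
    static Map<String, List<String>> knows()
    {
        Map<String, List<String>> knows = new HashMap<String, List<String>>();
        knows.put( "Thomas Anderson", Arrays.asList( "Morpheus", "Trinity" ) );
        knows.put( "Morpheus", Arrays.asList( "Trinity", "Cypher" ) );
        knows.put( "Cypher", Arrays.asList( "Agent Smith" ) );
        return knows;
    }

    // Visits everything reachable from start, breadth first, and reports
    // every node except the start node together with its depth.
    static List<String> friendsOf( String start, Map<String, List<String>> knows )
    {
        List<String> result = new ArrayList<String>();
        Map<String, Integer> depth = new HashMap<String, Integer>();
        Queue<String> queue = new ArrayDeque<String>();
        depth.put( start, 0 );
        queue.add( start );
        while ( !queue.isEmpty() )
        {
            String current = queue.remove();
            List<String> friends = knows.get( current );
            if ( friends == null )
            {
                continue; // no outgoing KNOWS relationships
            }
            for ( String friend : friends )
            {
                if ( depth.containsKey( friend ) )
                {
                    continue; // already visited at an equal or shallower depth
                }
                depth.put( friend, depth.get( current ) + 1 );
                result.add( "At depth " + depth.get( friend ) + " => " + friend );
                queue.add( friend );
            }
        }
        return result;
    }

    public static void main( String[] args )
    {
        System.out.println( "Mr Anderson's friends:" );
        for ( String line : friendsOf( "Thomas Anderson", knows() ) )
        {
            System.out.println( line );
        }
    }
}
```

The visited check plays the role of the traverser visiting each node once, and running until the queue is empty corresponds to StopEvaluator.END_OF_GRAPH.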
Example: Friends in love?
 [Diagram: the Matrix graph from “Example: The Matrix”, now with an added LOVES
  relationship between node 2 (“Trinity”) and node 1 (“Thomas Anderson”)]
Code (3a): Custom traverser
// Create a traverser that returns all “friends in love”
Traverser loveTraverser = mrAnderson.traverse(
   Traverser.Order.BREADTH_FIRST,
   StopEvaluator.END_OF_GRAPH,
   new ReturnableEvaluator()
   {
       public boolean isReturnableNode( TraversalPosition pos )
       {
          return pos.currentNode().hasRelationship(
              RelTypes.LOVES, Direction.OUTGOING );
       }
   },
   RelTypes.KNOWS,
   Direction.OUTGOING );
Code (3b): Custom traverser
// Traverse the node space and print out the result
System.out.println( "Who’s a lover?" );
for ( Node person : loveTraverser )
{
   System.out.printf( "At depth %d => %s%n",
       loveTraverser.currentPosition().getDepth(),
       person.getProperty( "name" ) );
}
[Diagram: the same Matrix graph with the LOVES relationship, shown next to the
 custom evaluator and its output]

new ReturnableEvaluator()
{
  public boolean isReturnableNode(
    TraversalPosition pos)
  {
    return pos.currentNode().
      hasRelationship( RelTypes.LOVES,
         Direction.OUTGOING );
  }
},

$ bin/start-neo-example
Who’s a lover?

At depth 1 => Trinity
$
Bonus code: domain model
    How do you implement your domain model?
    Use the delegator pattern, i.e. every domain entity
    wraps a Neo4j primitive:
// In package org.yourdomain.yourapp
class PersonImpl implements Person
{
   private final Node underlyingNode;
   PersonImpl( Node node ){ this.underlyingNode = node; }

    public String getName()
    {
       // getProperty returns Object, so cast to the expected type
       return (String) this.underlyingNode.getProperty( "name" );
    }
    public void setName( String name )
    {
       this.underlyingNode.setProperty( "name", name );
    }
}
Domain layer frameworks
 Qi4j (www.qi4j.org)
   Framework for doing DDD in pure Java5
   Defines Entities / Associations / Properties
     Sound familiar? Nodes / Rel’s / Properties!
   Neo4j is an “EntityStore” backend

 NeoWeaver (http://components.neo4j.org/neo-weaver)
   Weaves Neo4j-backed persistence into domain
   objects at runtime (dynamic proxy / cglib based)
   Veeeery alpha
Neo4j system characteristics
 Disk-based
   Native graph storage engine with custom binary
   on-disk format
 Transactional
   JTA/JTS, XA, 2PC, Tx recovery, deadlock
   detection, MVCC, etc
 Scales up (what's the x and the y?)
   Several billions of nodes/rels/props on single JVM
 Robust
   6+ years in 24/7 production
Social network pathExists()
 [Diagram: social graph of numbered person nodes]
 ~1k persons
 Avg 50 friends per person
 pathExists(a, b) limit depth 4
 Two backends
 Eliminate disk IO so warm up caches
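
The benchmark code itself is not shown in the deck. As a rough idea of what a depth-limited pathExists(a, b) check does, here is a minimal in-memory sketch in plain Java. The class name, the adjacency-map representation and the six-person chain used in main are all assumptions for illustration; this is neither the Neo4j nor the relational implementation that was measured.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical depth-limited path check over an in-memory friendship graph.
class PathExistsSketch
{
    // Records an undirected friendship between persons a and b.
    static void addFriendship( Map<Integer, Set<Integer>> graph, int a, int b )
    {
        if ( !graph.containsKey( a ) ) graph.put( a, new HashSet<Integer>() );
        if ( !graph.containsKey( b ) ) graph.put( b, new HashSet<Integer>() );
        graph.get( a ).add( b );
        graph.get( b ).add( a );
    }

    // Returns true if b can be reached from a in at most maxDepth hops.
    static boolean pathExists( Map<Integer, Set<Integer>> graph, int a, int b, int maxDepth )
    {
        if ( a == b )
        {
            return true;
        }
        Set<Integer> visited = new HashSet<Integer>();
        List<Integer> frontier = new ArrayList<Integer>();
        visited.add( a );
        frontier.add( a );
        // Expand one level of friends per iteration, breadth first.
        for ( int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++ )
        {
            List<Integer> next = new ArrayList<Integer>();
            for ( int person : frontier )
            {
                Set<Integer> friends = graph.get( person );
                if ( friends == null )
                {
                    continue;
                }
                for ( int friend : friends )
                {
                    if ( friend == b )
                    {
                        return true;
                    }
                    if ( visited.add( friend ) )
                    {
                        next.add( friend );
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    public static void main( String[] args )
    {
        // A chain 1 - 2 - 3 - 4 - 5 - 6 of friendships.
        Map<Integer, Set<Integer>> graph = new HashMap<Integer, Set<Integer>>();
        for ( int i = 1; i < 6; i++ )
        {
            addFriendship( graph, i, i + 1 );
        }
        System.out.println( pathExists( graph, 1, 5, 4 ) ); // 4 hops away
        System.out.println( pathExists( graph, 1, 6, 4 ) ); // 5 hops away
    }
}
```

The work done depends on how many persons sit within four hops of the start, not on the total number of persons stored, which is the property the benchmark is probing.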
Social network pathExists()
 [Diagram: social graph with the persons 1 Mike, 2 Emil, 3 Marcus, 4 Leigh,
  5 Kevin, 7 John and 9 Bruce]

                            # persons    query time
 Relational database            1 000      2 000 ms
 Graph database (Neo4j)         1 000          2 ms
 Graph database (Neo4j)     1 000 000          2 ms
Pros & Cons compared to RDBMS
+ No O/R impedance mismatch (whiteboard friendly)
+ Can easily evolve schemas
+ Can represent semi-structured info
+ Can represent graphs/networks (with performance)


- Lacks tool and framework support
- Few other implementations => potential lock-in
- No support for ad-hoc queries
More consequences
 Ability to capture semi-structured information
   => allowing individualization of content
 No predefined schema
   => easier to evolve model
   => can capture ad-hoc relationships
 Can capture non-normative relations
   => easy to model specific links to specific sets
 All state is kept in transactional memory
   => improves application concurrency
The Neo4j ecosystem
 Neo4j is an embedded database
   Tiny teeny lil jar file
 Component ecosystem
   index-util
   neo-meta
   neo-utils
   pattern-match
   sparql-engine
   ...
 See http://components.neo4j.org
Language bindings
 Neo4j.py – bindings for Jython and CPython
   http://components.neo4j.org/neo4j.py

 Neo4jrb – bindings for JRuby (incl RESTful API)
   http://wiki.neo4j.org/content/Ruby

 Clojure
   http://wiki.neo4j.org/content/Clojure

 Scala (incl RESTful API)
   http://wiki.neo4j.org/content/Scala

 … .NET? Erlang?
Grails Neoclipse screendump
Scale out – replication
 Rolling out Neo4j HA before end-of-year
   Side note: ppl roll it today w/ REST frontends & onlinebackup

 Master-slave replication, 1st configuration
   MySQL style... ish
   Except all instances can write, synchronously
   between writing slave & master (strong consistency)
   Updates are asynchronously propagated to the
   other slaves (eventual consistency)
 This can handle billions of entities...
 … but not 100B
Scale out – partitioning
 Sharding possible today
   … but you have to do manual work
   … just as with MySQL
   Great option: shard on top of resilient, scalable
   OSS app server, see: www.codecauldron.org
 Transparent partitioning? Neo4j 2.0
   100B? Easy to say. Sliiiiightly harder to do.
   Fundamentals: BASE & eventual consistency
   Generic clustering algorithm as base case, but
   give lots of knobs for developers
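
As a rough picture of the "manual work" that sharding involves today, the sketch below routes each entity to one of several stores by its id. It is a hypothetical illustration only: ShardRouterSketch and its methods are invented names, each in-memory map stands in for a separate database instance, and relationships that cross shard boundaries (the hard part for a graph) are not handled at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical router for manual sharding; each map stands in for one database instance.
class ShardRouterSketch
{
    private final List<Map<Long, String>> shards = new ArrayList<Map<Long, String>>();

    ShardRouterSketch( int shardCount )
    {
        for ( int i = 0; i < shardCount; i++ )
        {
            shards.add( new HashMap<Long, String>() );
        }
    }

    // Picks a shard from the entity id; floorMod keeps the index non-negative.
    int shardFor( long entityId )
    {
        return (int) Math.floorMod( entityId, (long) shards.size() );
    }

    // Stores the value on the shard that owns the id.
    void put( long entityId, String value )
    {
        shards.get( shardFor( entityId ) ).put( entityId, value );
    }

    // Looks the value up on the shard that owns the id.
    String get( long entityId )
    {
        return shards.get( shardFor( entityId ) ).get( entityId );
    }

    public static void main( String[] args )
    {
        ShardRouterSketch router = new ShardRouterSketch( 3 );
        router.put( 7L, "Morpheus" );
        System.out.println( "Entity 7 lives on shard " + router.shardFor( 7L ) );
        System.out.println( router.get( 7L ) );
    }
}
```

Routing by id is the simplest possible choice; a domain-specific rule (for example by region or customer) would keep more related nodes on the same shard.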
How ego are you? (aka other impls?)
 Franz’ AllegroGraph      (http://agraph.franz.com)

   Proprietary, Lisp, RDF-oriented but real graphdb
 FreeBase graphd     (http://bit.ly/13VITB)

   In-house at Metaweb
 Kloudshare   (http://kloudshare.com)

   Graph database in the cloud, still stealth mode
 Google Pregel   (http://bit.ly/dP9IP)

   We are oh-so-secret
 Some academic papers from ~10 years ago
   G = {V, E}   #FAIL
Conclusion
 Graphs && Neo4j => teh awesome!
 Available NOW under AGPLv3 / commercial license
   AGPLv3: “if you’re open source, we’re open source”
   If you have proprietary software? Must buy a
   commercial license
   But up to 1M primitives it’s free for all uses!
 Download
   http://neo4j.org
 Feedback
   http://lists.neo4j.org
Party pooper slides
Poop 1
 Key-value stores?
   => the awesome
   … if you have 1000s of BILLIONS of records OR you
   don't care about programmer productivity

 What if you had no variables at all in your programs
 except a single globally accessible hashtable?
 Would your software be maintainable?
Poop 2
 In a not-suck architecture...

 … the only thing that makes sense is to have an
 embedded database.
Poop 3
 Exposing your data model on the wire is bad.
 Period.

 Adding a couple of buzzwords doesn't make it less
 bad.

 If it was bad with SQL-over-sockets (hint: it was)
 then – surprise! – it's still bad even tho you use
 Hype-compliant(tm) JSON-over-REST.

 We don't want to couple everything to a specific
 data model again!
Poop 4
 In-memory database

 What the hell?
   That's an oxymoron!
   Up next: ascii-only JPEG
   Up next: loopback-only web server

 If you're not durable, you're a cache!
 If you happen to asynchronously spill over to disk,
 you're a cache that asynchronously spills over to
 disk.
Ait
so, srsly?
Looking ahead: polyglot persistence


      SQL     &&      NoSQL
Questions?




             Image credit: lost again! Sorry :(
http://neotechnology.com

A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)

  • 1.
    Neo4j a NOSQLoverview and the benefits of graph databases #neo4j Emil Eifrem @emileifrem CEO, Neo Technology [email protected]
  • 2.
    What's the plan? Why now? – Four trends NoSQL NOSQL overview Graph databases && Neo4j Conclusions Food
  • 3.
    Trend 1: data setsize 40 2007 Source: IDC 2007
  • 4.
    988 Trend 1: data setsize 40 2007 2010 Source: IDC 2007
  • 5.
    Trend 2: connectedness Giant Global Information connectivity Graph (GGG) Ontologies RDF Folksonomies Tagging User- Wikis generated content Blogs RSS Hypertext Text documents web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020
  • 6.
    Trend 3: semi-structure Individualization of content! In the salary lists of the 1970s, all elements had exactly one job In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15? Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”)
  • 7.
    Aside: RDBMS performance Relational database Salary List Performance Majority of Webapps Social network Semantic } Trading custom Data complexity
  • 8.
    Trend 4: architecture 1990s: Database as integration hub
  • 9.
    Trend 4: architecture 2000s: (Slowly towards...) Decoupled services with own backend
  • 10.
    Why NoSQL 2009? Trend 1: Size. Trend 2: Connectivity. Trend 3: Semi-structure. Trend 4: Architecture.
  • 11.
  • 12.
    First off: thedamn name NoSQL is NOT “Never SQL” NoSQL is NOT “No To SQL” NoSQL is NOT “WE HATE CHRIS' DOG”
  • 13.
    NOSQL is simply Not Only SQL!
  • 14.
    Four (emerging) NOSQLcategories Key-value stores Based on DHTs / Amazon's Dynamo paper Data model: (global) collection of K-V pairs Example: Dynomite, Voldemort, Tokyo BigTable clones Based on Google's BigTable paper Data model: big table, column families Example: Hbase, Hypertable
  • 15.
    Four (emerging) NOSQLcategories Document databases Inspired by Lotus Notes Data model: collections of K-V collections Example: CouchDB, MongoDB Graph databases Inspired by Euler & graph theory Data model: nodes, rels, K-V on both Example: AllegroGraph, VertexDB, Neo4j
  • 16.
    NOSQL data models Size Key-value stores Bigtable clones Document databases Graph databases Complexity
  • 17.
    NOSQL data models Size Key-value stores Bigtable clones Document databases Graph databases (This is still billions of 90% nodes & relationships) of use cases Complexity
  • 18.
  • 19.
    The Graph DBmodel: representation Core abstractions: name = “Emil” age = 29 Nodes sex = “yes” Relationships between nodes 1 2 Properties on both type = KNOWS time = 4 years 3 type = car vendor = “SAAB” model = “95 Aero”
  • 20.
    Example: The Matrix name = “The Architect” name = “Morpheus” rank = “Captain” name = “Thomas Anderson” occupation = “Total badass” 42 age = 29 disclosure = public KNOWS KNOWS CODED_BY 1 KN O 7 3 WS 13 S KN name = “Cypher” KNOW OW last name = “Reagan” S name = “Agent Smith” disclosure = secret version = 1.0b age = 3 days age = 6 months language = C++ 2 name = “Trinity”
  • 21.
    Code (1): Buildinga node space NeoService neo = ... // Get factory // Create Thomas 'Neo' Anderson Node mrAnderson = neo.createNode(); mrAnderson.setProperty( "name", "Thomas Anderson" ); mrAnderson.setProperty( "age", 29 ); // Create Morpheus Node morpheus = neo.createNode(); morpheus.setProperty( "name", "Morpheus" ); morpheus.setProperty( "rank", "Captain" ); morpheus.setProperty( "occupation", "Total bad ass" ); // Create a relationship representing that they know each other mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS ); // ...create Trinity, Cypher, Agent Smith, Architect similarly
  • 22.
    Code (1): Buildinga node space NeoService neo = ... // Get factory Transaction tx = neo.beginTx(); // Create Thomas 'Neo' Anderson Node mrAnderson = neo.createNode(); mrAnderson.setProperty( "name", "Thomas Anderson" ); mrAnderson.setProperty( "age", 29 ); // Create Morpheus Node morpheus = neo.createNode(); morpheus.setProperty( "name", "Morpheus" ); morpheus.setProperty( "rank", "Captain" ); morpheus.setProperty( "occupation", "Total bad ass" ); // Create a relationship representing that they know each other mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS ); // ...create Trinity, Cypher, Agent Smith, Architect similarly tx.commit();
  • 23.
    Code (1b): DefiningRelationshipTypes // In package org.neo4j.api.core public interface RelationshipType { String name(); } // In package org.yourdomain.yourapp // Example on how to roll dynamic RelationshipTypes class MyDynamicRelType implements RelationshipType { private final String name; MyDynamicRelType( String name ){ this.name = name; } public String name() { return this.name; } } // Example on how to kick it, static-RelationshipType-like enum MyStaticRelTypes implements RelationshipType { KNOWS, WORKS_FOR, }
  • 24.
    Whiteboard friendly owns Björn Big Car build drives DayCare
  • 26.
    The Graph DBmodel: traversal Traverser framework for name = “Emil” high-performance traversing age = 29 sex = “yes” across the node space 1 2 type = KNOWS time = 4 years 3 type = car vendor = “SAAB” model = “95 Aero”
  • 27.
    Example: Mr Anderson’sfriends name = “The Architect” name = “Morpheus” rank = “Captain” name = “Thomas Anderson” occupation = “Total badass” 42 age = 29 disclosure = public KNOWS KNOWS CODED_BY 1 KN O 7 3 WS 13 S KN name = “Cypher” KNOW OW last name = “Reagan” S name = “Agent Smith” disclosure = secret version = 1.0b age = 3 days age = 6 months language = C++ 2 name = “Trinity”
  • 28.
    Code (2): Traversinga node space // Instantiate a traverser that returns Mr Anderson's friends Traverser friendsTraverser = mrAnderson.traverse( Traverser.Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH, ReturnableEvaluator.ALL_BUT_START_NODE, RelTypes.KNOWS, Direction.OUTGOING ); // Traverse the node space and print out the result System.out.println( "Mr Anderson's friends:" ); for ( Node friend : friendsTraverser ) { System.out.printf( "At depth %d => %s%n", friendsTraverser.currentPosition().getDepth(), friend.getProperty( "name" ) ); }
  • 29.
    name = “TheArchitect” name = “Morpheus” rank = “Captain” name = “Thomas Anderson” occupation = “Total badass” 42 age = 29 disclosure = public KNOWS KNOWS CODED_BY 1 KN O 7 3 WS 13 S KN name = “Cypher” KNOW OW last name = “Reagan” S name = “Agent Smith” disclosure = secret version = 1.0b age = 3 days age = 6 months language = C++ 2 name = “Trinity” $ bin/start-neo-example Mr Anderson's friends: At depth 1 => Morpheus friendsTraverser = mrAnderson.traverse( Traverser.Order.BREADTH_FIRST, At depth 1 => Trinity StopEvaluator.END_OF_GRAPH, At depth 2 => Cypher ReturnableEvaluator.ALL_BUT_START_NODE, RelTypes.KNOWS, At depth 3 => Agent Smith Direction.OUTGOING ); $
  • 30.
    Example: Friends inlove? name = “The Architect” name = “Morpheus” rank = “Captain” name = “Thomas Anderson” occupation = “Total badass” 42 age = 29 disclosure = public KNOWS KNOWS CODED_BY 1 7 3 KN O WS 13 S KN KNOW name = “Cypher” OW last name = “Reagan” S name = “Agent Smith” LO disclosure = secret version = 1.0b VE age = 6 months language = C++ S 2 name = “Trinity”
  • 31.
    Code (3a): Custom traverser
    // Create a traverser that returns all “friends in love”
    Traverser loveTraverser = mrAnderson.traverse(
        Traverser.Order.BREADTH_FIRST,
        StopEvaluator.END_OF_GRAPH,
        new ReturnableEvaluator() {
            public boolean isReturnableNode( TraversalPosition pos ) {
                return pos.currentNode().hasRelationship(
                    RelTypes.LOVES, Direction.OUTGOING );
            }
        },
        RelTypes.KNOWS,
        Direction.OUTGOING );
  • 32.
    Code (3a): Custom traverser
    // Traverse the node space and print out the result
    System.out.println( "Who’s a lover?" );
    for ( Node person : loveTraverser ) {
        System.out.printf( "At depth %d => %s%n",
            loveTraverser.currentPosition().getDepth(),
            person.getProperty( "name" ) );
    }
  • 33.
    [Figure: the same Matrix graph with the LOVES relationship, overlaid with the custom ReturnableEvaluator and its terminal output.]
    $ bin/start-neo-example
    Who’s a lover?
    At depth 1 => Trinity
    $
  • 34.
    Bonus code: domain model
    How do you implement your domain model?
    Use the delegator pattern, i.e. every domain entity wraps a Neo4j primitive:
    // In package org.yourdomain.yourapp
    class PersonImpl implements Person {
        private final Node underlyingNode;

        PersonImpl( Node node ) {
            this.underlyingNode = node;
        }

        public String getName() {
            // getProperty() returns Object, so cast to the expected type
            return (String) this.underlyingNode.getProperty( "name" );
        }

        public void setName( String name ) {
            this.underlyingNode.setProperty( "name", name );
        }
    }
  • 35.
    Domain layer frameworks
      Qi4j (www.qi4j.org)
        Framework for doing DDD in pure Java 5
        Defines Entities / Associations / Properties
        Sound familiar? Nodes / Rels / Properties!
        Neo4j is an “EntityStore” backend
      NeoWeaver (http://components.neo4j.org/neo-weaver)
        Weaves Neo4j-backed persistence into domain objects at runtime (dynamic proxy / cglib based)
        Veeeery alpha
  • 36.
    Neo4j system characteristics
      Disk-based
        Native graph storage engine with custom binary on-disk format
      Transactional
        JTA/JTS, XA, 2PC, Tx recovery, deadlock detection, MVCC, etc
      Scales up (what's the x and the y?)
        Several billions of nodes/rels/props on single JVM
      Robust
        6+ years in 24/7 production
  • 37.
    Social network pathExists()
      ~1k persons
      Avg 50 friends per person
      pathExists(a, b), limited to depth 4
      Two backends
      Eliminate disk IO, so warm up caches
    [Figure: small example network of numbered person nodes.]
  • 38.
    Social network pathExists()
    [Figure: example network of person nodes (Emil, Mike, Kevin, John, Marcus, Bruce, Leigh).]
                                # persons    query time
      Relational database           1 000      2 000 ms
      Graph database (Neo4j)        1 000          2 ms
      Graph database (Neo4j)    1 000 000          2 ms
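For readers who want to see what a depth-limited pathExists(a, b) amounts to, here is a minimal plain-Java sketch. It is an assumption about the shape of the query, not the benchmark code behind the numbers above: the class `PathExistsSketch`, the helpers `befriend` and `chain`, and the in-memory adjacency map are all hypothetical, and friendships are treated as undirected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PathExistsSketch {
    // Record an undirected friendship between persons a and b.
    static void befriend(Map<Integer, Set<Integer>> friends, int a, int b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Hypothetical test network: persons 1..n linked in a single chain.
    static Map<Integer, Set<Integer>> chain(int n) {
        Map<Integer, Set<Integer>> friends = new HashMap<>();
        for (int i = 1; i < n; i++) {
            befriend(friends, i, i + 1);
        }
        return friends;
    }

    // True if b can be reached from a in at most maxDepth friendship hops.
    // Expands one breadth-first level per loop iteration.
    static boolean pathExists(Map<Integer, Set<Integer>> friends, int a, int b, int maxDepth) {
        if (a == b) {
            return true;
        }
        Set<Integer> visited = new HashSet<>();
        visited.add(a);
        List<Integer> frontier = List.of(a);
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<Integer> next = new ArrayList<>();
            for (int person : frontier) {
                for (int friend : friends.getOrDefault(person, Set.of())) {
                    if (friend == b) {
                        return true;
                    }
                    if (visited.add(friend)) {   // add() is false if already seen
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> friends = chain(6);
        System.out.println(pathExists(friends, 1, 5, 4)); // 4 hops away
        System.out.println(pathExists(friends, 1, 6, 4)); // 5 hops away
    }
}
```

The work done depends on how many friends-of-friends lie within four hops of the start person, not on the total number of persons stored, which is one way to read the flat 2 ms figure in the table.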
  • 41.
    Pros & Cons compared to RDBMS
      + No O/R impedance mismatch (whiteboard friendly)
      + Can easily evolve schemas
      + Can represent semi-structured info
      + Can represent graphs/networks (with performance)
      - Lacks in tool and framework support
      - Few other implementations => potential lock in
      - No support for ad-hoc queries
  • 42.
    More consequences
      Ability to capture semi-structured information
        => allowing individualization of content
      No predefined schema
        => easier to evolve model
        => can capture ad-hoc relationships
      Can capture non-normative relations
        => easy to model specific links to specific sets
      All state is kept in transactional memory
        => improves application concurrency
  • 43.
    The Neo4j ecosystem
      Neo4j is an embedded database
        Tiny teeny lil jar file
      Component ecosystem
        index-util
        neo-meta
        neo-utils
        pattern-match
        sparql-engine
        ...
      See http://components.neo4j.org
  • 44.
    Language bindings
      Neo4j.py – bindings for Jython and CPython
        http://components.neo4j.org/neo4j.py
      Neo4jrb – bindings for JRuby (incl RESTful API)
        http://wiki.neo4j.org/content/Ruby
      Clojure
        http://wiki.neo4j.org/content/Clojure
      Scala (incl RESTful API)
        http://wiki.neo4j.org/content/Scala
      … .NET? Erlang?
  • 47.
  • 48.
    Scale out – replication
      Rolling out Neo4j HA before end-of-year
        Side note: people roll it today with REST frontends & online backup
      Master-slave replication, 1st configuration
        MySQL style... ish
        Except all instances can write, synchronously between writing slave & master (strong consistency)
        Updates are asynchronously propagated to the other slaves (eventual consistency)
      This can handle billions of entities...
      … but not 100B
  • 49.
    Scale out – partitioning
      Sharding possible today
        … but you have to do manual work
        … just as with MySQL
        Great option: shard on top of resilient, scalable OSS app server, see: www.codecauldron.org
      Transparent partitioning? Neo4j 2.0
        100B? Easy to say. Sliiiiightly harder to do.
        Fundamentals: BASE & eventual consistency
        Generic clustering algorithm as base case, but give lots of knobs for developers
  • 50.
    How ego are you? (aka other impls?)
      Franz’ AllegroGraph (http://agraph.franz.com)
        Proprietary, Lisp, RDF-oriented but real graphdb
      FreeBase graphd (http://bit.ly/13VITB)
        In-house at Metaweb
      Kloudshare (http://kloudshare.com)
        Graph database in the cloud, still stealth mode
      Google Pregel (http://bit.ly/dP9IP)
        We are oh-so-secret
      Some academic papers from ~10 years ago
        G = {V, E} #FAIL
  • 51.
    Conclusion
      Graphs && Neo4j => teh awesome!
      Available NOW under AGPLv3 / commercial license
        AGPLv3: “if you’re open source, we’re open source”
        If you have proprietary software? Must buy a commercial license
        But up to 1M primitives it’s free for all uses!
      Download
        http://neo4j.org
      Feedback
        http://lists.neo4j.org
  • 52.
  • 53.
    Poop 1
      Key-value stores?
        => the awesome
        … if you have 1000s of BILLIONS of records OR you don't care about programmer productivity
      What if you had no variables at all in your programs except a single globally accessible hashtable?
      Would your software be maintainable?
  • 54.
    Poop 2
      In a not-suck architecture...
      … the only thing that makes sense is to have an embedded database.
  • 55.
    Poop 3
      Exposing your data model on the wire is bad. Period.
      Adding a couple of buzzwords doesn't make it less bad.
      If it was bad with SQL-over-sockets (hint: it was) then – surprise! – it's still bad even tho you use Hype-compliant(tm) JSON-over-REST.
      We don't want to couple everything to a specific data model again!
  • 56.
    Poop 4
      In-memory database
        What the hell? That's an oxymoron!
        Up next: ascii-only JPEG
        Up next: loopback-only web server
      If you're not durable, you're a cache!
      If you happen to asynchronously spill over to disk, you're a cache that asynchronously spills over to disk.
  • 57.
  • 58.
    Looking ahead: polyglot persistence
      SQL && NoSQL
  • 60.
    Questions? Image credit: lost again! Sorry :(
  • 61.