Warning
|
Deprecated: Neo4j-Gremlin is not compatible with versions of Neo4j beyond 3.4 (Reached End of Life March 31, 2020). For this reason, use of Neo4j-Gremlin is not recommended for production environments. Neo4j-Gremlin is expected to remain compatible with upcoming releases of TinkerPop, however long term support is not guaranteed. Neo4j-Gremlin may be dropped from future versions of TinkerPop if compatibility cannot reasonably be maintained. Alternative TinkerPop enabled graph providers can be found on the TinkerPop site. |
Warning
|
Neo4j-Gremlin can work with JDK17, but requires the use of the --add-opens flag to be provided to the JVM
as follows: --add-opens=java.base/sun.nio.ch=ALL-UNNAMED .
|
<dependency>
<groupId>org.apache.tinkerpop</groupId>
<artifactId>neo4j-gremlin</artifactId>
<version>x.y.z</version>
</dependency>
<!-- neo4j-tinkerpop-api-impl is NOT Apache 2 licensed - more information below -->
<!-- supports Neo4j 3.4.11 -->
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j-tinkerpop-api-impl</artifactId>
<version>0.9-3.4.0</version>
</dependency>
Neo4j, Inc. are the developers of the OLTP-based Neo4j graph database.
Warning
|
Unless under a commercial agreement with Neo4j, Inc., Neo4j is licensed
AGPL. The neo4j-gremlin module is licensed Apache2
because it only references the Apache2-licensed Neo4j API (not its implementation). Note that neither the
Gremlin Console nor Gremlin Server distribute with the Neo4j implementation
binaries. To access the binaries, use the :install command to download binaries from
Maven Central Repository.
|
Important
|
When connecting to existing Neo4j databases, ensure that this database is compatible with the version of
Neo4j that TinkerPop currently supports in the neo4j-tinkerpop-api-impl .
|
Tip
|
For configuring Grape, the dependency resolver of Groovy, please refer to the Gremlin Applications section. |
gremlin> :install org.apache.tinkerpop neo4j-gremlin x.y.z
==>Loaded: [org.apache.tinkerpop, neo4j-gremlin, x.y.z] - restart the console to use [tinkerpop.neo4j]
gremlin> :q
...
gremlin> :plugin use tinkerpop.neo4j
==>tinkerpop.neo4j activated
gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]]
Tip
|
To host Neo4j in Gremlin Server, the dependencies must first be "installed" or otherwise
copied to the Gremlin Server path. The automated method for doing this would be to execute
bin/gremlin-server.sh install org.apache.tinkerpop neo4j-gremlin x.y.z . Once installed, the Gremlin Server
configuration file must be edited to include the Neo4jGremlinPlugin as shown in conf/gremlin-server-neo4j.yaml .
|
Neo4j 2.x indices leverage vertex labels to partition the index space. TinkerPop does not provide method interfaces for defining schemas/indices for the underlying graph system. Thus, in order to create indices, it is important to call the Neo4j API directly.
Note
|
Neo4jGraphStep will attempt to discern which indices to use when executing a traversal of the form g.V().has() .
|
The Gremlin-Console session below demonstrates Neo4j indices. For more information, please refer to the Neo4j documentation:
gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
==>neo4jgraph[community single [/tmp/neo4j]]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[neo4jgraph[community single [/tmp/neo4j]], standard]
gremlin> graph.cypher("CREATE INDEX ON :person(name)")
gremlin> graph.tx().commit() //// (1)
==>null
gremlin> g.addV('person').property('name','marko')
==>v[0]
gremlin> g.addV('dog').property('name','puppy')
==>v[1]
gremlin> g.V().hasLabel('person').has('name','marko').values('name')
==>marko
gremlin> graph.close()
==>null
-
Schema mutations must happen in a different transaction than graph mutations
Below demonstrates the runtime benefits of indices and demonstrates how if there is no defined index (only vertex labels), a linear scan of the vertex-label partition is still faster than a linear scan of all vertices.
gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
==>neo4jgraph[community single [/tmp/neo4j]]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[neo4jgraph[community single [/tmp/neo4j]], standard]
gremlin> g.io('data/grateful-dead.xml').read().iterate()
gremlin> g.tx().commit()
==>null
gremlin> clock(1000) {g.V().hasLabel('artist').has('name','Garcia').iterate()} //// (1)
==>0.35031228
gremlin> graph.cypher("CREATE INDEX ON :artist(name)") //// (2)
gremlin> g.tx().commit()
==>null
gremlin> Thread.sleep(5000) //// (3)
==>null
gremlin> clock(1000) {g.V().hasLabel('artist').has('name','Garcia').iterate()} //// (4)
==>0.061846079
gremlin> clock(1000) {g.V().has('name','Garcia').iterate()} //// (5)
==>0.6680353569999999
gremlin> graph.cypher("DROP INDEX ON :artist(name)") //// (6)
gremlin> g.tx().commit()
==>null
gremlin> graph.close()
==>null
-
Find all artists whose name is Garcia which does a linear scan of the artist vertex-label partition.
-
Create an index for all artist vertices on their name property.
-
Neo4j indices are eventually consistent so this stalls to give the index time to populate itself.
-
Find all artists whose name is Garcia which uses the pre-defined schema index.
-
Find all vertices whose name is Garcia which requires a linear scan of all the data in the graph.
-
Drop the created index.
NeoTechnology are the creators of the graph pattern-match query language Cypher.
It is possible to leverage Cypher from within Gremlin by using the Neo4jGraph.cypher()
graph traversal method.
gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
==>neo4jgraph[community single [/tmp/neo4j]]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[neo4jgraph[community single [/tmp/neo4j]], standard]
gremlin> g.io('data/tinkerpop-modern.kryo').read().iterate()
gremlin> graph.cypher('MATCH (a {name:"marko"}) RETURN a')
==>[a:v[0]]
gremlin> graph.cypher('MATCH (a {name:"marko"}) RETURN a').select('a').out('knows').values('name')
==>josh
==>vadas
gremlin> graph.close()
==>null
Thus, like match()
-step in Gremlin, it is possible to do a declarative pattern match and then move
back into imperative Gremlin.
Tip
|
For those developers using Gremlin Server against Neo4j, it is possible to do Cypher queries
by simply placing the Cypher string in graph.cypher(…) before submission to the server.
|
TinkerPop requires every Element
to have a single, immutable string label (i.e. a Vertex
, Edge
, and
VertexProperty
). In Neo4j, a Node
(vertex) can have an
arbitrary number of labels while a Relationship
(edge) can have one and only one. Furthermore, in Neo4j, Node
labels are mutable while Relationship
labels are
not. In order to handle this mismatch, three Neo4jVertex
specific methods exist in Neo4j-Gremlin.
public Set<String> labels() // get all the labels of the vertex
public void addLabel(String label) // add a label to the vertex
public void removeLabel(String label) // remove a label from the vertex
An example use case is presented below.
gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
==>neo4jgraph[community single [/tmp/neo4j]]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[neo4jgraph[community single [/tmp/neo4j]], standard]
gremlin> vertex = (Neo4jVertex) g.addV('human::animal').next() //// (1)
==>v[0]
gremlin> vertex.label() //// (2)
==>animal::human
gremlin> vertex.labels() //// (3)
==>animal
==>human
gremlin> vertex.addLabel('organism') //// (4)
==>null
gremlin> vertex.label()
==>animal::human::organism
gremlin> vertex.removeLabel('human') //// (5)
==>null
gremlin> vertex.labels()
==>animal
==>organism
gremlin> vertex.addLabel('organism') //// (6)
==>null
gremlin> vertex.labels()
==>animal
==>organism
gremlin> vertex.removeLabel('human') //// (7)
==>null
gremlin> vertex.label()
==>animal::organism
gremlin> g.V().has(label,'organism') //// (8)
gremlin> g.V().has(label,of('organism')) //// (9)
==>v[0]
gremlin> g.V().has(label,of('organism')).has(label,of('animal'))
==>v[0]
gremlin> g.V().has(label,of('organism').and(of('animal')))
==>v[0]
gremlin> graph.close()
==>null
-
Typecasting to a
Neo4jVertex
is only required in Java. -
The standard
Vertex.label()
method returns all the labels in alphabetical order concatenated using::
. -
Neo4jVertex.labels()
method returns the individual labels as a set. -
Neo4jVertex.addLabel()
method adds a single label. -
Neo4jVertex.removeLabel()
method removes a single label. -
Labels are unique and thus duplicate labels don’t exist.
-
If a label that does not exist is removed, nothing happens.
-
P.eq()
does a full string match and should only be used if multi-labels are not leveraged. -
LabelP.of()
is specific toNeo4jGraph
and used for multi-label matching.
Important
|
LabelP.of() is only required if multi-labels are leveraged. LabelP.of() is used when
filtering/looking-up vertices by their label(s) as the standard P.eq() does a direct match on the :: -representation
of vertex.label()
|
The previous examples showed how to create a Neo4jGraph
with the default configuration, but Neo4j has many other
options to initialize it that are native to Neo4j. In order to expose those, Neo4jGraph
has an open(Configuration)
method which takes a standard Apache Configuration object. The same can be said of the standard method for creating
Graph
instances with GraphFactory
. Each configuration key that Neo4j has must simply be prefixed with
gremlin.neo4j.conf.
and the suffix configuration key will be passed through to Neo4j.
Note
|
Gremlin Server uses GraphFactory to instantiate the Graph instances it manages, so the example below is also
relevant for that purpose as well.
|
For example, a standard configuration file called neo4j.properties
that sets the Neo4j
dbms.index_sampling.background_enabled
setting might look like:
gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph
gremlin.neo4j.directory=/tmp/neo4j
gremlin.neo4j.conf.dbms.index_sampling.background_enabled=true
which can then be used as follows:
gremlin> graph = GraphFactory.open('neo4j.properties')
==>neo4jgraph[community single [/tmp/neo4j]]
gremlin> g = traversal().with(graph)
==>graphtraversalsource[neo4jgraph[community single [/tmp/neo4j]], standard]
Having this ability to set standard Neo4j configurations makes it possible to better control the initialization of Neo4j itself and provides the ability to enable certain features that would not otherwise be accessible.
While Neo4jGraph
enables Gremlin based queries, users may find it helpful to also be able to connect to that graph
with native Neo4j drivers and other tools from that space. It is possible to enable the
Bolt Protocol as a way to do this:
gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph
gremlin.neo4j.directory=/tmp/neo4j
gremlin.neo4j.conf.dbms.connector.0.type=BOLT
gremlin.neo4j.conf.dbms.connector.0.enabled=true
gremlin.neo4j.conf.dbms.connector.0.address=localhost:7687
This configuration is especially relevant to Gremlin Server where one might want to connect to the same graph instance with both Gremlin and Cypher.
gremlin> :install org.neo4j.driver neo4j-java-driver 1.7.2
==>Loaded: [org.neo4j.driver, neo4j-java-driver, 1.7.2]
... // restart Gremlin Console
gremlin> import org.neo4j.driver.v1.*
==>org.apache.tinkerpop.gremlin.structure.*, org.apache.tinkerpop.gremlin.structure.util.*, ... org.neo4j.driver.v1.*
gremlin> driver = GraphDatabase.driver( "bolt://localhost:7687", AuthTokens.basic("neo4j", "neo4j"))
Oct 28, 2019 3:28:20 PM org.neo4j.driver.internal.logging.JULogger info
INFO: Direct driver instance 1385140107 created for server address localhost:7687
==>org.neo4j.driver.internal.InternalDriver@528f8f8b
gremlin> session = driver.session()
==>org.neo4j.driver.internal.NetworkSession@f3fcd59
gremlin> session.run( "CREATE (a:person {name: {name}, age: {age}})",
......1> Values.parameters("name", "stephen", "age", 29))
gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured localhost/127.0.0.1:8182
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182] - type ':remote console' to return to local mode
gremlin> g.V().elementMap()
==>{id=0, label=person, name=stephen, age=29}
TinkerPop supports running Neo4j with its fault tolerant master-slave
replication configuration, referred to as its
High Availability (HA) cluster. From the
TinkerPop perspective, configuring for HA is not that different than configuring for embedded mode as shown above. The
main difference is the usage of HA configuration options that enable the cluster. Once connected to a cluster, usage
from the TinkerPop perspective is largely the same.
In configuring for HA the most important thing to realize is that all Neo4j HA settings are simply passed through the
TinkerPop configuration settings given to the GraphFactory.open()
or Neo4j.open()
methods. For example, to
provide the all-important ha.server_id
configuration option through TinkerPop, simply prefix that key with the
TinkerPop Neo4j key of gremlin.neo4j.conf
.
The following properties demonstrates one of the three configuration files required to setup a simple three node HA cluster on the same machine instance:
gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph
gremlin.neo4j.directory=/tmp/neo4j.server1
gremlin.neo4j.conf.ha.server_id=1
gremlin.neo4j.conf.ha.initial_hosts=localhost:5001\,localhost:5002\,localhost:5003
gremlin.neo4j.conf.ha.host.coordination=localhost:5001
gremlin.neo4j.conf.ha.host.data=localhost:6001
Assuming the intent is to configure this cluster completely within TinkerPop (perhaps within three separate Gremlin Server instances), the other two configuration files will be quite similar. The second will be:
gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph
gremlin.neo4j.directory=/tmp/neo4j.server2
gremlin.neo4j.conf.ha.server_id=2
gremlin.neo4j.conf.ha.initial_hosts=localhost:5001\,localhost:5002\,localhost:5003
gremlin.neo4j.conf.ha.host.coordination=localhost:5002
gremlin.neo4j.conf.ha.host.data=localhost:6002
and the third will be:
gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph
gremlin.neo4j.directory=/tmp/neo4j.server3
gremlin.neo4j.conf.ha.server_id=3
gremlin.neo4j.conf.ha.initial_hosts=localhost:5001\,localhost:5002\,localhost:5003
gremlin.neo4j.conf.ha.host.coordination=localhost:5003
gremlin.neo4j.conf.ha.host.data=localhost:6003
Important
|
The backslashes in the values provided to gremlin.neo4j.conf.ha.initial_hosts prevent that configuration
setting as being interpreted as a List .
|
Create three separate Gremlin Server configuration files and point each at one of these Neo4j files. Since these Gremlin
Server instances will be running on the same machine, ensure that each Gremlin Server instance has a unique port
setting in that Gremlin Server configuration file. Start each Gremlin Server instance to bring the HA cluster online.
Note
|
Neo4jGraph instances will block until all nodes join the cluster.
|
Neither Gremlin Server nor Neo4j will share transactions across the cluster. Be sure to either use Gremlin Server managed transactions or, if using a session without that option, ensure that all requests are being routed to the same server.
This example discussed use of Gremlin Server to demonstrate the HA configuration, but it is also easy to setup with
three Gremlin Console instances. Simply start three Gremlin Console instances and use GraphFactory
to read those
configuration files to form the cluster. Furthermore, keep in mind that it is possible to have a Gremlin Console join
a cluster handled by two Gremlin Servers or Neo4j Enterprise. The only limits as to how the configuration can be
utilized are prescribed by Neo4j itself. Please refer to their
documentation for more information on how
this feature works.