Skip to content

Latest commit

 

History

History
230 lines (186 loc) · 11.8 KB

the-graph.asciidoc

File metadata and controls

230 lines (186 loc) · 11.8 KB

The Graph

gremlin standing

The Introduction discussed the diversity of TinkerPop-enabled graphs, with special attention paid to the different connection models, and how TinkerPop makes it possible to bridge that diversity in an agnostic manner. This particular section deals with elements of the Graph API which was noted as an API to avoid when trying to build an agnostic system. The Graph API refers to the core elements of what composes the structure of a graph within the Gremlin Traversal Machine (GTM), such as the Graph, Vertex and Edge Java interfaces.

To maintain the most portable code, users should only reference these interfaces. To "reference", simply means to utilize it as a pointer. For Graph, that means holding a pointer to the location of graph data and then using it to spawn GraphTraversalSource instances so as to write Gremlin:

graph = TinkerGraph.open()
g = traversal().with(graph)
g.addV('person')

In the above example, "graph" is the Graph interface produced by calling open() on TinkerGraph which creates the instance. Note that while the end intent of the code is to create a "person" vertex, it does not use the APIs on Graph to do that - e.g. graph.addVertex(T.label,'person').

Even if the developer desired to use the graph.addVertex() method there are only a handful of scenarios where it is possible:

  • The application is being developed on the JVM and the developer is using embedded mode

  • The architecture includes Gremlin Server and the user is sending Gremlin scripts to the server

  • The graph system chosen is a Remote Gremlin Provider and they expose the Graph API via scripts

Note that Gremlin Language Variants force developers to use the Graph API by reference. There is no addVertex() method available to GLVs on their respective Graph instances, nor are their graph elements filled with data at the call of properties(). Developing applications to meet this lowest common denominator in API usage will go a long way to making that application portable across TinkerPop-enabled systems.

When considering the remaining sub-sections that follow, recall that they are all generally bound to the Graph API. They are described here for reference and in some sense backward compatibility with older recommended models of development. In the future, the contents of this section will become less and less relevant.

Features

A Feature implementation describes the capabilities of a Graph instance. This interface is implemented by graph system providers for two purposes:

  1. It tells users the capabilities of their Graph instance.

  2. It allows the features they do comply with to be tested against the Gremlin Test Suite - tests that do not comply are "ignored").

The following example in the Gremlin Console shows how to print all the features of a Graph:

graph = TinkerGraph.open()
graph.features()

A common pattern for using features is to check their support prior to performing an operation:

graph.features().graph().supportsTransactions()
graph.features().graph().supportsTransactions() ? g.tx().commit() : "no tx"
Tip
To ensure provider agnostic code, always check feature support prior to usage of a particular function. In that way, the application can behave gracefully in case a particular implementation is provided at runtime that does not support a function being accessed.
Warning
Features of reference graphs which are used to connect to remote graphs do not reflect the features of the graph to which it connects. It reflects the features of instantiated graph itself, which will likely be quite different considering that reference graphs will typically be immutable.

Vertex Properties

vertex properties TinkerPop introduces the concept of a VertexProperty<V>. All the properties of a Vertex are a VertexProperty. A VertexProperty implements Property and as such, it has a key/value pair. However, VertexProperty also implements Element and thus, can have a collection of key/value pairs. Moreover, while an Edge can only have one property of key "name" (for example), a Vertex can have multiple "name" properties. With the inclusion of vertex properties, two features are introduced which ultimately advance the graph modelers toolkit:

  1. Multiple properties (multi-properties): a vertex property key can have multiple values. For example, a vertex can have multiple "name" properties.

  2. Properties on properties (meta-properties): a vertex property can have properties (i.e. a vertex property can have key/value data associated with it).

Possible use cases for meta-properties:

  1. Permissions: Vertex properties can have key/value ACL-type permission information associated with them.

  2. Auditing: When a vertex property is manipulated, it can have key/value information attached to it saying who the creator, deletor, etc. are.

  3. Provenance: The "name" of a vertex can be declared by multiple users. For example, there may be multiple spellings of a name from different sources.

A running example using vertex properties is provided below to demonstrate and explain the API.

graph = TinkerGraph.open()
g = traversal().with(graph)
v = g.addV().property('name','marko').property('name','marko a. rodriguez').next()
g.V(v).properties('name').count() (1)
v.property(list, 'name', 'm. a. rodriguez') (2)
g.V(v).properties('name').count()
g.V(v).properties()
g.V(v).properties('name')
g.V(v).properties('name').hasValue('marko')
g.V(v).properties('name').hasValue('marko').property('acl','private') (3)
g.V(v).properties('name').hasValue('marko a. rodriguez')
g.V(v).properties('name').hasValue('marko a. rodriguez').property('acl','public')
g.V(v).properties('name').has('acl','public').value()
g.V(v).properties('name').has('acl','public').drop() (4)
g.V(v).properties('name').has('acl','public').value()
g.V(v).properties('name').has('acl','private').value()
g.V(v).properties()
g.V(v).properties().properties() (5)
g.V(v).properties().property('date',2014) (6)
g.V(v).properties().property('creator','stephen')
g.V(v).properties().properties()
g.V(v).properties('name').valueMap()
g.V(v).property('name','okram') (7)
g.V(v).properties('name')
g.V(v).values('name') (8)
  1. A vertex can have zero or more properties with the same key associated with it.

  2. If a property is added with a cardinality of Cardinality.list, an additional property with the provided key will be added.

  3. A vertex property can have standard key/value properties attached to it.

  4. Vertex property removal is identical to property removal.

  5. Gets the meta-properties of each vertex property.

  6. A vertex property can have any number of key/value properties attached to it.

  7. property(…​) will remove all existing key’d properties before adding the new single property (see VertexProperty.Cardinality).

  8. If only the value of a property is needed, then values() can be used.

If the concept of vertex properties is difficult to grasp, then it may be best to think of vertex properties in terms of "literal vertices." A vertex can have an edge to a "literal vertex" that has a single value key/value — e.g. "value=okram." The edge that points to that literal vertex has an edge-label of "name." The properties on the edge represent the literal vertex’s properties. The "literal vertex" can not have any other edges to it (only one from the associated vertex).

Tip
A toy graph demonstrating all of the new TinkerPop graph structure features is available at TinkerFactory.createTheCrew() and data/tinkerpop-crew*. This graph demonstrates multi-properties and meta-properties.
the crew graph
Figure 1. TinkerPop Crew
g.V().as('a').
      properties('location').as('b').
      hasNot('endTime').as('c').
      select('a','b','c').by('name').by(value).by('startTime') // determine the current location of each person
g.V().has('name','gremlin').inE('uses').
      order().by('skill',asc).as('a').
      outV().as('b').
      select('a','b').by('skill').by('name') // rank the users of gremlin by their skill level

Graph Variables

Graph.Variables are key/value pairs associated with the graph itself — in essence, a Map<String,Object>. These variables are intended to store metadata about the graph. Example use cases include:

  • Schema information: What do the namespace prefixes resolve to and when was the schema last modified?

  • Global permissions: What are the access rights for particular groups?

  • System user information: Who are the admins of the system?

An example of graph variables in use is presented below:

graph = TinkerGraph.open()
graph.variables()
graph.variables().set('systemAdmins',['stephen','peter','pavel'])
graph.variables().set('systemUsers',['matthias','marko','josh'])
graph.variables().keys()
graph.variables().get('systemUsers')
graph.variables().get('systemUsers').get()
graph.variables().remove('systemAdmins')
graph.variables().keys()
Important
Graph variables are not intended to be subject to heavy, concurrent mutation nor to be used in complex computations. The intention is to have a location to store data about the graph for administrative purposes.
Warning
Attempting to set graph variables in a reference graph will not promote them to the remote graph. Typically, a reference graph has immutable features and will not support this features.

Namespace Conventions

End users, graph system providers, GraphComputer algorithm designers, GremlinPlugin creators, etc. all leverage properties on elements to store information. There are a few conventions that should be respected when naming property keys to ensure that conflicts between these stakeholders do not conflict.

  • End users are granted the flat namespace (e.g. name, age, location) to key their properties and label their elements.

  • Graph system providers are granted the hidden namespace (e.g. ~metadata) to key their properties and labels. Data keyed as such is only accessible via the graph system implementation and no other stakeholders are granted read nor write access to data prefixed with "~" (see Graph.Hidden). Test coverage and exceptions exist to ensure that graph systems respect this hard boundary.

  • VertexProgram and MapReduce developers should leverage qualified namespaces particular to their domain (e.g. mydomain.myvertexprogram.computedata).

  • GremlinPlugin creators should prefix their plugin name with their domain (e.g. mydomain.myplugin).

Important
TinkerPop uses tinkerpop. and gremlin. as the prefixes for provided strategies, vertex programs, map reduce implementations, and plugins.

The only truly protected namespace is the hidden namespace provided to graph systems. From there, it’s up to engineers to respect the namespacing conventions presented.