Skip to content

Latest commit

 

History

History
582 lines (388 loc) · 18.5 KB

graphbinary.asciidoc

File metadata and controls

582 lines (388 loc) · 18.5 KB

GraphBinary

GraphBinary is a binary serialization format suitable for object trees, designed to reduce serialization overhead on both the client and the server, as well as limiting the size of the payload that is transmitted over the wire.

It describes arbitrary object graphs with a fully-qualified format:

{type_code}{type_info}{value_flag}{value}

Where:

  • {type_code} is a single unsigned byte representing the type number.

  • {type_info} is an optional sequence of bytes providing additional information of the type represented. This is specially useful for representing complex and custom types.

  • {value_flag} is a single byte providing information about the value. Each type may have its own specific flags so see each type for more details. Generally, flags have the following meaning:

    • 0x01 The value is null. When this flag is set, no bytes for {value} will be provided.

  • {value} is a sequence of bytes which content is determined by the type.

All encodings are big-endian.

Quick examples, using hexadecimal notation to represent each byte:

  • 01 00 00 00 00 01: a 32-bit integer number, that represents the decimal number 1. It’s composed by the type_code 0x01, and empty flag value 0x00 and four bytes to describe the value.

  • 01 00 00 00 00 ff: a 32-bit integer, representing the number 256.

  • 01 01: a null value for a 32-bit integer. It’s composed by the type_code 0x01, and a null flag value 0x01.

  • 02 00 00 00 00 00 00 00 00 01: a 64-bit integer number 1. It’s composed by the type_code 0x02, empty flags and eight bytes to describe the value.

Version 4.0

Forward Compatibility

The serialization format supports new types being added without the need to introduce a new version.

Changes to existing types require new revision.

Data Type Codes

Core Data Types

  • 0x01: Int

  • 0x02: Long

  • 0x03: String

  • 0x04: DateTime

  • 0x07: Double

  • 0x08: Float

  • 0x09: List

  • 0x0a: Map

  • 0x0b: Set

  • 0x0c: UUID

  • 0x0d: Edge

  • 0x0e: Path

  • 0x0f: Property

  • 0x10: TinkerGraph

  • 0x11: Vertex

  • 0x12: VertexProperty

  • 0x18: Direction

  • 0x20: T

  • 0x22: BigDecimal

  • 0x23: BigInteger

  • 0x24: Byte

  • 0x25: Binary

  • 0x26: Short

  • 0x27: Boolean

  • 0x2b: Tree

  • 0xf0: CompositePDT

  • 0xf1: PrimitivePDT

  • 0xfd: Marker

  • 0xfe: Unspecified null object

Extended Types

  • 0x80: Char

  • 0x81: Duration

Null handling

The serialization format defines two ways to represent null values:

  • Unspecified null object

  • Fully-qualified null

When a parent type can contain any subtype e.g., a object collection, a null value must be represented using the "Unspecified Null Object" type code and the null value flag.

In contrast, when the parent type contains a type parameter that must be specified, a null value is represented using a fully-qualified object using the appropriate type code and type information.

Data Type Formats

Int

Format: 4-byte two’s complement integer.

Example values:

  • 00 00 00 01: 32-bit integer number 1.

  • 00 00 01 01: 32-bit integer number 256.

  • ff ff ff ff: 32-bit integer number -1.

  • ff ff ff fe: 32-bit integer number -2.

Long

Format: 8-byte two’s complement integer.

Example values

  • 00 00 00 00 00 00 00 01: 64-bit integer number 1.

  • ff ff ff ff ff ff ff fe: 64-bit integer number -2.

String

Format: {length}{text_value}

Where:

  • {length} is an Int describing the byte length of the text. Length is a positive number or zero to represent the empty string.

  • {text_value} is a sequence of bytes representing the string value in UTF8 encoding.

Example values

  • 00 00 00 03 61 62 63: the string 'abc'.

  • 00 00 00 04 61 62 63 64: the string 'abcd'.

  • 00 00 00 00: the empty string ''.

DateTime

A date-time with an offset from UTC/Greenwich in the ISO-8601 calendar system, such as 2007-12-03T10:15:30+01:00.

Format: {year}{month}{day}{time}{offset}

Where:

  • {year} is an Int from -999,999,999 to 999,999,999.

  • {month} is a Byte to represent the month, from 1 (January) to 12 (December)

  • {day} is a Byte from 1 to 31.

  • {time} is a Long to represent nanoseconds since midnight, from 0 to 86399999999999

  • {offset} is an Int to represent total zone offset in seconds, from -64800 (-18:00) to 64800 (+18:00).

Double

Format: 8 bytes representing IEEE 754 double-precision binary floating-point format.

Example values

  • 3f f0 00 00 00 00 00 00: Double 1

  • 3f 70 00 00 00 00 00 00: Double 0.00390625

  • 3f b9 99 99 99 99 99 9a: Double 0.1

Float

Format: 4 bytes representing IEEE 754 single-precision binary floating-point format.

Example values

  • 3f 80 00 00: Float 1

  • 3e c0 00 00: Float 0.375

List

An ordered collection of items. The format depends on the {value_flag}.

Format (value_flag=0x00): {length}{item_0}…​{item_n}

Where:

  • {length} is an Int describing the length of the collection.

  • {item_0}…​{item_n} are the items of the list. {item_i} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value}.

Format (value_flag=0x02): {length}{item_0}{bulk_0}…​{item_n}{bulk_n}

Where:

  • {length} is an Int describing the length of the collection.

  • {item_0}…​{item_n} are the items of the list. {item_i} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value}.

  • {bulk_0}…​{bulk_n} are Int that represent how many times that item should be repeated in the expanded list.

Set

A collection that contains no duplicate elements.

Format: Same as List.

Map

A dictionary of keys to values. A {value_flag} equal to 0x02 means that the map is ordered.

Format: {length}{item_0}…​{item_n}

Where:

  • {length} is an Int describing the length of the map.

  • {item_0}…​{item_n} are the items of the map. {item_i} is sequence of 2 fully qualified typed values one representing the key and the following representing the value, each composed of {type_code}{type_info}{value_flag}{value}.

UUID

A 128-bit universally unique identifier.

Format: 16 bytes representing the uuid.

Example

  • 00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff: Uuid 00112233-4455-6677-8899-aabbccddeeff.

Edge

Format: {id}{label}{inVId}{inVLabel}{outVId}{outVLabel}{parent}{properties}

Where:

  • {id} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value}.

  • {label} is a List {value}.

  • {inVId} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value}.

  • {inVLabel} is a List {value}.

  • {outVId} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value}.

  • {outVLabel} is a List {value}.

  • {parent} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value} which contains the parent Vertex. Note that as TinkerPop currently send "references" only, this value will always be null.

  • {properties} is a List of Property objects.

Example values:

01 00  00 00 00 0d                                            id is 13
00 00 00 01  03 00  00 00 00 08  64 65 76 65 6c 6f 70 73      label is a size 1 list with string 'develops'
01 00  00 00 00 0a                                            inVId is 10
00 00 00 01  03 00  00 00 00 08  73 6f 66 74 77 61 72 65      inVLabel is a size 1 list with string 'software'
01 00  00 00 00 01                                            outVId is 1
00 00 00 01  03 00  00 00 00 06  70 65 72 73 6f 6e            outVLabel is a size 1 list with string 'person'
fe 01                                                         parent is always null
09 00  00 00 00 01                                            properties is a size 1 list
0f 00 00 00 00 05  73 69 6e 63 65  01 00  00 00 07 d9  fe 01  property with key 'since' and value 2009 and null parent

Path

Format: {labels}{objects}

Where:

  • {labels} is a fully qualified List in which each item is a fully qualified Set of String.

  • {objects} is a fully qualified List of fully qualified typed values.

Property

Format: {key}{value}{parent}

Where:

  • {key} is a String value.

  • {value} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value}.

  • {parent} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value} which is either an Edge or VertexProperty. Note that as TinkerPop currently sends "references" only this value will always be null.

Graph

A collection of vertices and edges. Note that while similar the vertex/edge formats here hold some differences as compared to the Vertex and Edge formats used for standard serialization/deserialiation of a single graph element.

Format: {vlength}{vertex_0}…​{vertex_n}{elength}{edge_0}…​{edge_n}

Where:

  • {vlength} is an Int describing the number of vertices.

  • {vertex_0}…​{vertex_n} are vertices as described below.

  • {elength} is an Int describing the number of edges.

  • {edge_0}…​{edge_n} are edges as described below.

Vertex Format: {id}{label}{plength}{property_0}…​{property_n}

  • {id} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value}.

  • {label} is a String value.

  • {plength} is an Int describing the number of properties on the vertex.

  • {property_0}…​{property_n} are the vertex properties consisting of {id}{label}{value}{parent}{properties} as defined in VertexProperty where the {parent} is always null and {properties} is a List of Property objects.

Edge Format: {id}{label}{inVId}{inVLabel}{outVId}{outVLabel}{parent}{properties}

Where:

  • {id} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value}.

  • {label} is a String value.

  • {inVId} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value}.

  • {inVLabel} is always null.

  • {outVId} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value}.

  • {outVLabel} is always null.

  • {parent} is always null.

  • {properties} is a List of Property objects.

Vertex

Format: {id}{label}{properties}

Where:

  • {id} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value}.

  • {label} is a List {value}.

  • {properties} is a List of VertexProperty values.

Example values:

01 00  00 00 00 01                                        id is int 1
00 00 00 01  03 00  00 00 00 06  70 65 72 73 6f 6e        label is size 1 list with string 'person'
09 00  00 00 00 01  12 00  02 00  00 00 00 00 00 00 00 09 properties is a size 1 list with VertexProperty id 9
00 00 00 01  03 00  00 00 00 08  6c 6f 63 61 74 69 6f 6e  VertexProperty label is string 'location'
03 00  00 00 00 08  73 61 6e 74 61 20 66 65               VertexProperty value is string 'santa fe'
fe 01                                                     VertexProperty parent is always null
09 00  00 00 00 01                                        VertexProperty has a size 1 list
0f 00  00 00 00 09 73 74 61 72 74 54 69 6d 65             metaproperty with string key 'startTime'
01 00  00 00 07 d5  fe 01                                 VertexProperty metaproperty value is 2005 with null parent

VertexProperty

Format: {id}{label}{value}{parent}{properties}

Where:

  • {id} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value}.

  • {label} is a List {value}.

  • {value} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value}.

  • {parent} is a fully qualified typed value composed of {type_code}{type_info}{value_flag}{value} which contains the parent Vertex. Note that as TinkerPop currently send "references" only, this value will always be null.

  • {properties} is a List of Property objects.

Example values:

02 00  00 00 00 00 00 00 00 00                id is Long 0
00 00 00 01  03 00  00 00 00 04  6e 61 6d 65  label is size 1 list with string 'name'
03 00  00 00 00 05  6d 61 72 6b 6f            value is string 'marko'
fe 01                                         parent is always null
09 00  00 00 00 00                            metaproperties is empty list

Direction

Format: a fully qualified single String representing the enum value.

Example values:

  • 00 00 00 03 4F 55 54: OUT

  • 00 00 00 02 49 4E: IN

T

Format: a fully qualified single String representing the enum value.

Example values:

  • 00 00 00 05 6C 61 62 65 6C: label

  • 00 00 00 02 69 64: id

BigDecimal

Represents an arbitrary-precision signed decimal number, consisting of an arbitrary precision integer unscaled value and a 32-bit integer scale.

Format: {scale}{unscaled_value}

Where:

  • {scale} is an Int.

  • {unscaled_value} is a BigInteger.

BigInteger

A variable-length two’s complement encoding of a signed integer.

Format: {length}{value}

Where:

  • {length} is an Int describing the size of {value} in bytes.

  • {value} is the two’s complement of the BigInteger.

Example values of the two’s complement {value}:

  • 00: Integer 0.

  • 01: Integer 1.

  • 127: Integer 7f.

  • 00 80: Integer 128.

  • ff: Integer -1.

  • 80: Integer -128.

  • ff 7f: Integer -129.

Byte

Format: 1-byte two’s complement integer.

Example values:

  • 01: 8-bit integer number 1.

  • ff: 8-bit integer number -1.

Binary

Format: {length}{value}

Where:

  • {length} is an Int representing the amount of bytes contained in the value.

  • {value} sequence of bytes.

Short

Format: 2-byte two’s complement integer.

Example values:

  • 00 01: 16-bit integer number 1.

  • 01 02: 16-bit integer number 258.

Boolean

Format: A single byte containing the value 0x01 when it’s true and 0 otherwise.

Tree

Format: {length}{item_0}…​{item_n}

Where:

  • {length} is an Int describing the amount of items.

  • {item_0}…​{item_n} are the items of the Tree. {item_i} is composed of a {key} which is a fully-qualified typed value followed by a {Tree}.

Marker

A 1-byte marker used to separate the end of the data and the beginning of the status of a ResponseMessage. This is mainly used by language variants during deserialization.

Format: 1-byte integer with a value of 00.

CompositePDT

A composite custom type, represented as a type and a map of values.

Format: {type}{fields}

Where:

  • {type} is a String containing the implementation specific text identifier of the custom type.

  • {fields} is a Map representing the fields of the composite type.

Example values:

03 00 00 00 00 05 50 6F 69 6E 74: the string "Point"
0A 00 00 00 00 02 31 30: length 2 map header
03 00 00 00 00 01 78 01 00 00 00 00 01: {x:1}
03 00 00 00 00 01 79 01 00 00 00 00 02: {y:2}

PrimitivePDT

A primitive custom type, represented as a type and the stringified value.

Format: {type}{value}

Where:

  • {type} is a String containing the implementation specific text identifier of the custom type.

  • {value} is a String representing the string version of the value.

Example values:

03 00 00 00 00 05 55 69 6E 74 38: the string "Uint8"
03 00 00 00 00 02 31 30: the string "10"

Unspecified Null Object

A null value for an unspecified Object value.

It’s represented using the null {value_flag} set and no sequence of bytes (which is FE 01).

Char

Format: one to four bytes representing a single UTF8 char, according to the Unicode standard.

For characters 0x00-0x7F, UTF-8 encodes the character as a single byte.

For characters 0x80-0x07FF, UTF-8 uses 2 bytes: the first byte is binary 110 followed by the 5 high bits of the character, while the second byte is binary 10 followed by the 6 low bits of the character.

The 3 and 4-byte encodings are similar to the 2-byte encoding, except that the first byte of the 3-byte encoding starts with 1110 and the first byte of the 4-byte encoding starts with 11110.

Example values (hex bytes)

  • 97: Character 'a'.

  • c2 a2: Character '¢'.

  • e2 82 ac: Character '€'

Duration

A time-based amount of time.

Format: {seconds}{nanos}

Where:

  • {seconds} is a Long.

  • {nanos} is an Int.

Request and Response Messages

Request and response messages are special container types used to represent messages from client to the server and the other way around. These messages are independent from the transport layer.

Request Message

Represents a message from the client to the server.

Format: {version}{fields}{gremlin}

Where:

  • {version} is a Byte representing the specification version, with the most significant bit set to one. For this version of the format, the value expected is 0x84 (10000004).

  • {fields} is a Map.

  • {gremlin} is a String.

The total length is not part of the message as the transport layer will provide it. For example: in HTTP, there is the Content-Length header which defines the payload size.

Response Message

Format: {version}{bulked}{result_data}{marker}{status_code}{status_message}{exception}

Where:

  • {version} is a Byte representing the protocol version, with the most significant bit set to one. For this version of the protocol, the value expected is 0x84 (10000004).

  • {bulked} is a Byte representing whether {result_data} is bulked. 00 is false and 01 is true.

  • {result_data} is a sequence of fully qualified typed value composed of {type_code}{type_info}{value_flag}{value}. If {bulked} is 01 then each value is followed by an 8-byte integer denoting the bulk of the preceding value.

  • {marker} is a Marker.

  • {status_code} is an Int.

  • {status_message} is a nullable String.

  • {exception} is a nullable String.

The total length is not part of the message as the transport layer will provide it.