Query execution reference

This page explains the output of a query executed with Query Explain. To learn how to execute a query with Query Explain, see Analyze query execution with Query Explain.

Common Concepts

The following common concepts and terms are used throughout the execution tree.

Rows and records

The terms row and record are used to generically refer to a document or index entry.

Variables

The following internal variables can appear in the execution nodes:

  • __key__-the key is an internal identifier for a document. This is an absolute, unique identifier with the project, database, and the full path of the document.
  • __id__-the ID is a unique identifier for a document within its collection. This is unique within a single collection.
  • __$0__…__$N__-these are context specific variables which are created or referenced in the execution tree. These variables are typically used to refer to the contents of a document or the value of an expression evaluated during the execution of a query.

Consider an example where an extend node is used to extract the __id__ from the document __key__:

Extend
    |  expressions: [_id(__key__) AS __id__]
    |  records returned: 1

Constraints and ranges

Some scan nodes use constraints and ranges attributes to describe the range of values that are scanned. These attributes use a range tree format which contains a list of values. These values correspond to the ordered list of keys which appear in the index definition. For example, the first range which appears in the tree, here (1..5], corresponds to the constraints on the first key, here a, in the ordered list of keys:

| index: type=CollectionGroupIndex, id=CICAgOjXh#EK, keys=[a ASC, b ASC, __key__ ASC]
| constraints: /
               |----(1..5]
                    |----[1L]

Each level of indentation indicates the constraint applying to the next key in the list. Square brackets represent an inclusive range, rounded brackets are an exclusive range. In this case, the constraint translates to 1 < "a" <= 5, and "b" = 1.

In the following example with multiple branches for a, the constraint corresponds to 1 < a <= 5 OR a = 10:

| constraints: /
               |----(1L, 5L]
               |----[10L]

Key Variables

In some scan nodes (such as SequentialScan), there is both a list of keys as part of the index attribute, and a separate keys attribute in the Scan node. The keys attribute in the Scan node denotes the variable name of each key in the index definition, in order. The variables can be used to reference the runtime values of the scanned field further up in the execution tree.

In the following example, the value of the user field for the current document maps to variable __$6__ and the value of date_placed to __$7__.

index: type=CollectionGroupIndex, id=CICAgOjXh4EK, keys=[user ASC, date_placed ASC, __path__ ASC]
keys: [__$6__ ASC, __$7__ ASC, __path__ ASC]

Execution Nodes

A query execution tree can contain the following nodes.

SeekingScan

Represents a dynamic scan where the rows returned may not be along a single sequential range of the index, and multiple distinct scans must be performed to satisfy the query.

For example, a query where a exists and b equals 1 working on an index of ["a" ASC, "b" ASC], would need to scan and return a separate, potentially non-sequential range for each distinct value of a. This is more efficient than a full TableScan, but less efficient than a single SequentialScan on a composite index of ["b" ASC, "a" ASC].

• SeekingScan
| constraints: /
               |----(-∞..+∞)
                    |----[1L]
| index: type=CollectionGroupIndex, id=CAE, keys=[user ASC, quantity ASC, __key__ ASC]
| keys: [__$1__ ASC, __$2__ ASC, __key__ ASC]
| properties: Selection { user }
| records returned: 1
| records scanned: 1

SequentialScan

Represents a scan of a static, sequential range of rows in storage that can be performed in a single read operation.

The key ordering length refers to the number of keys that must be preserved and returned in original key order. For a schema of [k1, k2, k3], a key ordering length of 0 means the scan can return in any order, 1 means order by k1, but rows with the same k1 value can come with any order, 3 returns documents in exact sorted order.

• SequentialScan
| index: type=CollectionGroupIndex, id=CAE, keys=[user ASC, date_placed ASC, __key__ ASC]
| key ordering length: 3
| keys: [__$1__ ASC, __$2__ ASC, __key__ ASC]
| limit: 10
| properties: Selection { a }
| ranges: /
| records returned: 1
| records scanned: 1

UniqueScan

Represents a scan of a static, sequential range of rows in storage with in-memory deduplication of rows.

• UniqueScan
| index: type=CollectionGroupIndex, id=CAE, keys=[user ASC, date_placed ASC, __key__ ASC]
| keys: [__$1__ ASC, __$2__ ASC, __key__ ASC]
| properties: Selection { a }
| ranges: /
          |----(-∞..+∞)
| records returned: 1
| records scanned: 1

TableAccess

Back-joins the supplied row's identifier to the actual row contents from primary storage. TableAccess is required if a parent node (or the final query result) requires a subset of fields from the documents.

• TableAccess
|  order: PRESERVE_INPUT_ORDER
|  peak memory usage: 4.00 KiB (4,096 B)
|  properties: *
|  records returned: 1

TableScan

A full, unordered scan of a collection. Used when a query is run without an associated index.

Order can be either STABLE or UNDEFINED, with STABLE denoting a deterministic ordering.

• TableScan
|  order: STABLE
|  properties: *
|  records returned: 1
|  records scanned: 1
|  source: (default)#/**/collection

HashAggregate

Hash-backed implementation of aggregate operations. Requires materializing the full group in-memory before returning the result and must not exceed the the query memory limit.

• HashAggregate
|  aggregations: [sum(__$0__) AS total]
|  groups: [a]
|  peak memory usage: 4.00 KiB (4,096 B)
|  records returned: 0

StreamAggregate

Specialized aggregate node which only maintains state for a single group at a time, reducing peak memory usage. Used when the underlying child node will return groups sequentially. For example, when grouping by distinct values of a field while using an index on that field.

• StreamAggregate
|  keys: [foo ASC, bar ASC]
|  properties: Selection { baz }
|  aggregations: [$sum(foo) AS baz]

MajorSort

Performs a sort operation on a fixed set of properties. Materializes all records in memory at once and returns the sorted values in order, the size of the sort set is limited by the query memory limit.

When a subsequent limit is provided, a top-k sorting algorithm is used to reduce the memory usage. With it, sorts can be performed on an arbitrarily large set of records so long as the memory used by storing the k considered elements does not exceed the limit.

• MajorSort
|  fields: [a ASC, b DESC]
|  limit: 10
|  peak memory usage: 4.00 KiB (4,096 B)
|  records returned: 1

Concat

Concatenates the results of multiple child nodes and returns the result to the parent node. This node does not deduplicate results that appear in multiple children, and the order of returned results is nondeterministic.

• Concat
├── • TableAccess
...
├── • TableAccess

Restrict

Projects a set of properties to be passed to its parent node. Reduces memory usage by preventing propagation of unused fields as soon as they are irrelevant to the rest of the query.

• Restrict
|  expressions: [foo AS foo, bar AS bar]
|  records returned: 0

Extend

Adds a set of fields to each row in the result set. Used when adding fields to a document and as an intermediate node to support more complex operations.

• Extend
|  expressions: ["bar" AS foo]
|  records returned: 1

Filter

Selectively returns rows if and only if they match the supplied expression.

• Filter
|  expression: $eq(foo, "bar")
|  records returned: 1

Values

Produces a sequence of literal values to work on. Used primarily when a set list of documents is provided as the input to a query.

• Values
| expression: [{__key__=/col/1}, {__key__=/col/2}]

ReplaceWith

Replaces the fields of rows produced by the child node with the fields from the supplied map expression.

• ReplaceWith
|  map: map("full_name", str_concat(first_name, " ", last_name)), current_context())
|  records returned: 1

Unnest

Unnests the value produced by the child node.

• Unnest
|  expression: foo AS unnested_foo

Limit

Limits the number of rows returned to the parent node.

• Limit
|  limit: 10
|  records returned: 1

Offset

Skips a set number of rows produced by the child node.

• Offset
|  offset: 10
|  records returned: 1

Drop

Drops a specified set of fields from results produced by the child node.

• Drop
|  fields to drop: [__key__]
|  records returned: 1