Get behavioral analytics collections
Deprecated
Technical preview
Path parameters
- name
array[string] Required. A list of analytics collections to limit the returned information.
curl \
--request GET 'http://api.example.com/_application/analytics/{name}' \
--header "Authorization: $API_KEY"
{
"my_analytics_collection": {
"event_data_stream": {
"name": "behavioral_analytics-events-my_analytics_collection"
}
},
"my_analytics_collection2": {
"event_data_stream": {
"name": "behavioral_analytics-events-my_analytics_collection2"
}
}
}
Create a behavioral analytics collection
Deprecated
Technical preview
Path parameters
- name
string Required. The name of the analytics collection to be created or updated.
curl \
--request PUT 'http://api.example.com/_application/analytics/{name}' \
--header "Authorization: $API_KEY"
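As a usage sketch, creating a collection named my_analytics_collection (an illustrative name substituted for the {name} placeholder) looks like this:
curl \
--request PUT 'http://api.example.com/_application/analytics/my_analytics_collection' \
--header "Authorization: $API_KEY"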
Compact and aligned text (CAT)
The compact and aligned text (CAT) APIs are intended only for human consumption using the Kibana console or the command line. They are not intended for use by applications. For application consumption, use a corresponding JSON API.
All the cat commands accept a help query string parameter that lists the headers and other information they provide, and the /_cat command alone lists all the available commands.
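For example, the following requests list the available cat commands and ask the aliases command to describe its columns:
curl \
--request GET 'http://api.example.com/_cat' \
--header "Authorization: $API_KEY"
curl \
--request GET 'http://api.example.com/_cat/aliases?help' \
--header "Authorization: $API_KEY"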
Get aliases
Get the cluster's index aliases, including filter and routing information. This API does not return data stream aliases.
IMPORTANT: CAT APIs are only intended for human consumption using the command line or the Kibana console. They are not intended for use by applications. For application consumption, use the aliases API.
Query parameters
- h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
- s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
- expand_wildcards
string | array[string] The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. It supports comma-separated values, such as open,hidden. Supported values include:
  - all: Match any data stream or index, including hidden ones.
  - open: Match open, non-hidden indices. Also matches any non-hidden data stream.
  - closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
  - hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
  - none: Wildcard expressions are not accepted.
- master_timeout
string The period to wait for a connection to the master node. If the master node is not available before the timeout expires, the request fails and returns an error. To indicate that the request should never time out, set it to -1.
curl \
--request GET 'http://api.example.com/_cat/aliases' \
--header "Authorization: $API_KEY"
[
{
"alias": "alias1",
"index": "test1",
"filter": "-",
"routing.index": "-",
"routing.search": "-",
"is_write_index": "true"
},
{
"alias": "alias1",
"index": "test1",
"filter": "*",
"routing.index": "-",
"routing.search": "-",
"is_write_index": "true"
},
{
"alias": "alias3",
"index": "test1",
"filter": "-",
"routing.index": "1",
"routing.search": "1",
"is_write_index": "true"
},
{
"alias": "alias4",
"index": "test1",
"filter": "-",
"routing.index": "2",
"routing.search": "1,2",
"is_write_index": "true"
}
]
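A sketch combining the query parameters above: restrict the output to the alias and index columns and sort by alias in descending order.
curl \
--request GET 'http://api.example.com/_cat/aliases?h=alias,index&s=alias:desc' \
--header "Authorization: $API_KEY"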
Get aliases
Get the cluster's index aliases, including filter and routing information. This API does not return data stream aliases.
IMPORTANT: CAT APIs are only intended for human consumption using the command line or the Kibana console. They are not intended for use by applications. For application consumption, use the aliases API.
Path parameters
- name
string | array[string] Required. A comma-separated list of aliases to retrieve. Supports wildcards (*). To retrieve all aliases, omit this parameter or use * or _all.
Query parameters
- h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
- s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
- expand_wildcards
string | array[string] The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. It supports comma-separated values, such as open,hidden. Supported values include:
  - all: Match any data stream or index, including hidden ones.
  - open: Match open, non-hidden indices. Also matches any non-hidden data stream.
  - closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
  - hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
  - none: Wildcard expressions are not accepted.
- master_timeout
string The period to wait for a connection to the master node. If the master node is not available before the timeout expires, the request fails and returns an error. To indicate that the request should never time out, set it to -1.
curl \
--request GET 'http://api.example.com/_cat/aliases/{name}' \
--header "Authorization: $API_KEY"
[
{
"alias": "alias1",
"index": "test1",
"filter": "-",
"routing.index": "-",
"routing.search": "-",
"is_write_index": "true"
},
{
"alias": "alias1",
"index": "test1",
"filter": "*",
"routing.index": "-",
"routing.search": "-",
"is_write_index": "true"
},
{
"alias": "alias3",
"index": "test1",
"filter": "-",
"routing.index": "1",
"routing.search": "1",
"is_write_index": "true"
},
{
"alias": "alias4",
"index": "test1",
"filter": "-",
"routing.index": "2",
"routing.search": "1,2",
"is_write_index": "true"
}
]
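For example, to limit the listing to aliases that match an illustrative wildcard pattern:
curl \
--request GET 'http://api.example.com/_cat/aliases/alias*' \
--header "Authorization: $API_KEY"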
Get a document count
Get quick access to a document count for a data stream, an index, or an entire cluster. The document count only includes live documents, not deleted documents which have not yet been removed by the merge process.
IMPORTANT: CAT APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the count API.
Path parameters
- index
string | array[string] Required. A comma-separated list of data streams, indices, and aliases used to limit the request. It supports wildcards (*). To target all data streams and indices, omit this parameter or use * or _all.
curl \
--request GET 'http://api.example.com/_cat/count/{index}' \
--header "Authorization: $API_KEY"
[
{
"epoch": "1475868259",
"timestamp": "15:24:20",
"count": "120"
}
]
[
{
"epoch": "1475868259",
"timestamp": "15:24:20",
"count": "121"
}
]
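To count live documents across the entire cluster rather than a specific target, omit the index path parameter:
curl \
--request GET 'http://api.example.com/_cat/count' \
--header "Authorization: $API_KEY"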
Get index information
Get high-level information about indices in a cluster, including backing indices for data streams.
Use this request to get the following information for each index in a cluster:
- shard count
- document count
- deleted document count
- primary store size
- total store size of all shards, including shard replicas
These metrics are retrieved directly from Lucene, which Elasticsearch uses internally to power indexing and search. As a result, all document counts include hidden nested documents. To get an accurate count of Elasticsearch documents, use the cat count or count APIs.
CAT APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use an index endpoint.
Query parameters
- bytes
string The unit used to display byte values. Values are b, kb, mb, gb, tb, or pb.
- expand_wildcards
string | array[string] The type of index that wildcard patterns can match. Supported values include:
  - all: Match any data stream or index, including hidden ones.
  - open: Match open, non-hidden indices. Also matches any non-hidden data stream.
  - closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
  - hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
  - none: Wildcard expressions are not accepted.
- health
string The health status used to limit returned indices. By default, the response includes indices of any health status. Supported values include:
  - green (or GREEN): All shards are assigned.
  - yellow (or YELLOW): All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired.
  - red (or RED): One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned.
- include_unloaded_segments
boolean If true, the response includes information from segments that are not loaded into memory.
- pri
boolean If true, the response only includes information from primary shards.
- time
string The unit used to display time values. Values are nanos, micros, ms, s, m, h, or d.
- master_timeout
string Period to wait for a connection to the master node.
- h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
- s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
curl \
--request GET 'http://api.example.com/_cat/indices' \
--header "Authorization: $API_KEY"
[
{
"health": "yellow",
"status": "open",
"index": "my-index-000001",
"uuid": "u8FNjxh8Rfy_awN11oDKYQ",
"pri": "1",
"rep": "1",
"docs.count": "1200",
"docs.deleted": "0",
"store.size": "88.1kb",
"pri.store.size": "88.1kb",
"dataset.size": "88.1kb"
},
{
"health": "green",
"status": "open",
"index": "my-index-000002",
"uuid": "nYFWZEO7TUiOjLQXBaYJpA ",
"pri": "1",
"rep": "0",
"docs.count": "0",
"docs.deleted": "0",
"store.size": "260b",
"pri.store.size": "260b",
"dataset.size": "260b"
}
]
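A sketch combining the query parameters above: show only yellow-health indices, report byte values in megabytes, and sort by descending store size.
curl \
--request GET 'http://api.example.com/_cat/indices?health=yellow&bytes=mb&s=store.size:desc' \
--header "Authorization: $API_KEY"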
Get datafeeds
Added in 7.7.0
Get configuration and usage information about datafeeds.
This API returns a maximum of 10,000 datafeeds.
If the Elasticsearch security features are enabled, you must have the monitor_ml, monitor, manage_ml, or manage cluster privileges to use this API.
IMPORTANT: CAT APIs are only intended for human consumption using the Kibana console or command line. They are not intended for use by applications. For application consumption, use the get datafeed statistics API.
Path parameters
- datafeed_id
string Required. A numerical character string that uniquely identifies the datafeed.
Query parameters
- allow_no_match
boolean Specifies what to do when the request:
  - Contains wildcard expressions and there are no datafeeds that match.
  - Contains the _all string or no identifiers and there are no matches.
  - Contains wildcard expressions and there are only partial matches.
If true, the API returns an empty datafeeds array when there are no matches and the subset of results when there are partial matches. If false, the API returns a 404 status code when there are no matches or only partial matches.
- h
string | array[string] Comma-separated list of column names to display. Supported values include:
  - ae (or assignment_explanation): For started datafeeds only, contains messages relating to the selection of a node.
  - bc (or buckets.count, bucketsCount): The number of buckets processed.
  - id: A numerical character string that uniquely identifies the datafeed.
  - na (or node.address, nodeAddress): For started datafeeds only, the network address of the node where the datafeed is started.
  - ne (or node.ephemeral_id, nodeEphemeralId): For started datafeeds only, the ephemeral ID of the node where the datafeed is started.
  - ni (or node.id, nodeId): For started datafeeds only, the unique identifier of the node where the datafeed is started.
  - nn (or node.name, nodeName): For started datafeeds only, the name of the node where the datafeed is started.
  - sba (or search.bucket_avg, searchBucketAvg): The average search time per bucket, in milliseconds.
  - sc (or search.count, searchCount): The number of searches run by the datafeed.
  - seah (or search.exp_avg_hour, searchExpAvgHour): The exponential average search time per hour, in milliseconds.
  - st (or search.time, searchTime): The total time the datafeed spent searching, in milliseconds.
  - s (or state): The status of the datafeed: starting, started, stopping, or stopped. If starting, the datafeed has been requested to start but has not yet started. If started, the datafeed is actively receiving data. If stopping, the datafeed has been requested to stop gracefully and is completing its final action. If stopped, the datafeed is stopped and will not receive data until it is re-started.
- s
string | array[string] Comma-separated list of column names or column aliases used to sort the response. It accepts the same values as the h parameter.
- time
string The unit used to display time values. Values are nanos, micros, ms, s, m, h, or d.
curl \
--request GET 'http://api.example.com/_cat/ml/datafeeds/{datafeed_id}' \
--header "Authorization: $API_KEY"
[
{
"id": "datafeed-high_sum_total_sales",
"state": "stopped",
"buckets.count": "743",
"search.count": "7"
},
{
"id": "datafeed-low_request_rate",
"state": "stopped",
"buckets.count": "1457",
"search.count": "3"
},
{
"id": "datafeed-response_code_rates",
"state": "stopped",
"buckets.count": "1460",
"search.count": "18"
},
{
"id": "datafeed-url_scanning",
"state": "stopped",
"buckets.count": "1460",
"search.count": "18"
}
]
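A sketch using the h parameter to request exactly the columns shown in the response above, for any datafeed matching an illustrative wildcard pattern:
curl \
--request GET 'http://api.example.com/_cat/ml/datafeeds/datafeed-*?h=id,state,buckets.count,search.count' \
--header "Authorization: $API_KEY"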
Get anomaly detection jobs
Added in 7.7.0
Get configuration and usage information for anomaly detection jobs.
This API returns a maximum of 10,000 jobs.
If the Elasticsearch security features are enabled, you must have the monitor_ml, monitor, manage_ml, or manage cluster privileges to use this API.
IMPORTANT: CAT APIs are only intended for human consumption using the Kibana console or command line. They are not intended for use by applications. For application consumption, use the get anomaly detection job statistics API.
Query parameters
- allow_no_match
boolean Specifies what to do when the request:
  - Contains wildcard expressions and there are no jobs that match.
  - Contains the _all string or no identifiers and there are no matches.
  - Contains wildcard expressions and there are only partial matches.
If true, the API returns an empty jobs array when there are no matches and the subset of results when there are partial matches. If false, the API returns a 404 status code when there are no matches or only partial matches.
- bytes
string The unit used to display byte values. Values are b, kb, mb, gb, tb, or pb.
- h
string | array[string] Comma-separated list of column names to display. Supported values include:
  - assignment_explanation (or ae): For open anomaly detection jobs only, contains messages relating to the selection of a node to run the job.
  - buckets.count (or bc, bucketsCount): The number of bucket results produced by the job.
  - buckets.time.exp_avg (or btea, bucketsTimeExpAvg): Exponential moving average of all bucket processing times, in milliseconds.
  - buckets.time.exp_avg_hour (or bteah, bucketsTimeExpAvgHour): Exponentially-weighted moving average of bucket processing times calculated in a one-hour time window, in milliseconds.
  - buckets.time.max (or btmax, bucketsTimeMax): Maximum among all bucket processing times, in milliseconds.
  - buckets.time.min (or btmin, bucketsTimeMin): Minimum among all bucket processing times, in milliseconds.
  - buckets.time.total (or btt, bucketsTimeTotal): Sum of all bucket processing times, in milliseconds.
  - data.buckets (or db, dataBuckets): The number of buckets processed.
  - data.earliest_record (or der, dataEarliestRecord): The timestamp of the chronologically earliest input document.
  - data.empty_buckets (or deb, dataEmptyBuckets): The number of buckets which did not contain any data.
  - data.input_bytes (or dib, dataInputBytes): The number of bytes of input data posted to the anomaly detection job.
  - data.input_fields (or dif, dataInputFields): The total number of fields in input documents posted to the anomaly detection job. This count includes fields that are not used in the analysis. However, be aware that if you are using a datafeed, it extracts only the required fields from the documents it retrieves before posting them to the job.
  - data.input_records (or dir, dataInputRecords): The number of input documents posted to the anomaly detection job.
  - data.invalid_dates (or did, dataInvalidDates): The number of input documents with either a missing date field or a date that could not be parsed.
  - data.last (or dl, dataLast): The timestamp at which data was last analyzed, according to server time.
  - data.last_empty_bucket (or dleb, dataLastEmptyBucket): The timestamp of the last bucket that did not contain any data.
  - data.last_sparse_bucket (or dlsb, dataLastSparseBucket): The timestamp of the last bucket that was considered sparse.
  - data.latest_record (or dlr, dataLatestRecord): The timestamp of the chronologically latest input document.
  - data.missing_fields (or dmf, dataMissingFields): The number of input documents that are missing a field that the anomaly detection job is configured to analyze. Input documents with missing fields are still processed because it is possible that not all fields are missing.
  - data.out_of_order_timestamps (or doot, dataOutOfOrderTimestamps): The number of input documents that have a timestamp chronologically preceding the start of the current anomaly detection bucket offset by the latency window. This information is applicable only when you provide data to the anomaly detection job by using the post data API. These out-of-order documents are discarded, since jobs require time series data to be in ascending chronological order.
  - data.processed_fields (or dpf, dataProcessedFields): The total number of fields in all the documents that have been processed by the anomaly detection job. Only fields that are specified in the detector configuration object contribute to this count. The timestamp is not included in this count.
  - data.processed_records (or dpr, dataProcessedRecords): The number of input documents that have been processed by the anomaly detection job. This value includes documents with missing fields, since they are nonetheless analyzed. If you use datafeeds and have aggregations in your search query, the processed record count is the number of aggregation results processed, not the number of Elasticsearch documents.
  - data.sparse_buckets (or dsb, dataSparseBuckets): The number of buckets that contained few data points compared to the expected number of data points.
  - forecasts.memory.avg (or fmavg, forecastsMemoryAvg): The average memory usage in bytes for forecasts related to the anomaly detection job.
  - forecasts.memory.max (or fmmax, forecastsMemoryMax): The maximum memory usage in bytes for forecasts related to the anomaly detection job.
  - forecasts.memory.min (or fmmin, forecastsMemoryMin): The minimum memory usage in bytes for forecasts related to the anomaly detection job.
  - forecasts.memory.total (or fmt, forecastsMemoryTotal): The total memory usage in bytes for forecasts related to the anomaly detection job.
  - forecasts.records.avg (or fravg, forecastsRecordsAvg): The average number of model_forecast documents written for forecasts related to the anomaly detection job.
  - forecasts.records.max (or frmax, forecastsRecordsMax): The maximum number of model_forecast documents written for forecasts related to the anomaly detection job.
  - forecasts.records.min (or frmin, forecastsRecordsMin): The minimum number of model_forecast documents written for forecasts related to the anomaly detection job.
  - forecasts.records.total (or frt, forecastsRecordsTotal): The total number of model_forecast documents written for forecasts related to the anomaly detection job.
  - forecasts.time.avg (or ftavg, forecastsTimeAvg): The average runtime in milliseconds for forecasts related to the anomaly detection job.
  - forecasts.time.max (or ftmax, forecastsTimeMax): The maximum runtime in milliseconds for forecasts related to the anomaly detection job.
  - forecasts.time.min (or ftmin, forecastsTimeMin): The minimum runtime in milliseconds for forecasts related to the anomaly detection job.
  - forecasts.time.total (or ftt, forecastsTimeTotal): The total runtime in milliseconds for forecasts related to the anomaly detection job.
  - forecasts.total (or ft, forecastsTotal): The number of individual forecasts currently available for the job.
  - id: Identifier for the anomaly detection job.
  - model.bucket_allocation_failures (or mbaf, modelBucketAllocationFailures): The number of buckets for which new entities in incoming data were not processed due to insufficient model memory.
  - model.by_fields (or mbf, modelByFields): The number of by field values that were analyzed by the models. This value is cumulative for all detectors in the job.
  - model.bytes (or mb, modelBytes): The number of bytes of memory used by the models. This is the maximum value since the last time the model was persisted. If the job is closed, this value indicates the latest size.
  - model.bytes_exceeded (or mbe, modelBytesExceeded): The number of bytes over the high limit for memory usage at the last allocation failure.
  - model.categorization_status (or mcs, modelCategorizationStatus): The status of categorization for the job: ok or warn. If ok, categorization is performing acceptably well (or not being used at all). If warn, categorization is detecting a distribution of categories that suggests the input data is inappropriate for categorization. Problems could be that there is only one category, more than 90% of categories are rare, the number of categories is greater than 50% of the number of categorized documents, there are no frequently matched categories, or more than 50% of categories are dead.
  - model.categorized_doc_count (or mcdc, modelCategorizedDocCount): The number of documents that have had a field categorized.
  - model.dead_category_count (or mdcc, modelDeadCategoryCount): The number of categories created by categorization that will never be assigned again because another category's definition makes it a superset of the dead category. Dead categories are a side effect of the way categorization has no prior training.
  - model.failed_category_count (or mdcc, modelFailedCategoryCount): The number of times that categorization wanted to create a new category but couldn't because the job had hit its model memory limit. This count does not track which specific categories failed to be created. Therefore, you cannot use this value to determine the number of unique categories that were missed.
  - model.frequent_category_count (or mfcc, modelFrequentCategoryCount): The number of categories that match more than 1% of categorized documents.
  - model.log_time (or mlt, modelLogTime): The timestamp when the model stats were gathered, according to server time.
  - model.memory_limit (or mml, modelMemoryLimit): The upper limit for model memory usage, checked on increasing values.
  - model.memory_status (or mms, modelMemoryStatus): The status of the mathematical models: ok, soft_limit, or hard_limit. If ok, the models stayed below the configured value. If soft_limit, the models used more than 60% of the configured memory limit and older unused models will be pruned to free up space. Additionally, in categorization jobs no further category examples will be stored. If hard_limit, the models used more space than the configured memory limit. As a result, not all incoming data was processed.
  - model.over_fields (or mof, modelOverFields): The number of over field values that were analyzed by the models. This value is cumulative for all detectors in the job.
  - model.partition_fields (or mpf, modelPartitionFields): The number of partition field values that were analyzed by the models. This value is cumulative for all detectors in the job.
  - model.rare_category_count (or mrcc, modelRareCategoryCount): The number of categories that match just one categorized document.
  - model.timestamp (or mt, modelTimestamp): The timestamp of the last record when the model stats were gathered.
  - model.total_category_count (or mtcc, modelTotalCategoryCount): The number of categories created by categorization.
  - node.address (or na, nodeAddress): The network address of the node that runs the job. This information is available only for open jobs.
  - node.ephemeral_id (or ne, nodeEphemeralId): The ephemeral ID of the node that runs the job. This information is available only for open jobs.
  - node.id (or ni, nodeId): The unique identifier of the node that runs the job. This information is available only for open jobs.
  - node.name (or nn, nodeName): The name of the node that runs the job. This information is available only for open jobs.
  - opened_time (or ot): For open jobs only, the elapsed time for which the job has been open.
  - state (or s): The status of the anomaly detection job: closed, closing, failed, opened, or opening. If closed, the job finished successfully with its model state persisted. The job must be opened before it can accept further data. If closing, the job close action is in progress and has not yet completed. A closing job cannot accept further data. If failed, the job did not finish successfully due to an error. This situation can occur due to invalid input data, a fatal error occurring during the analysis, or an external interaction such as the process being killed by the Linux out of memory (OOM) killer. If the job has irrevocably failed, it must be force closed and then deleted. If the datafeed can be corrected, the job can be closed and then re-opened. If opened, the job is available to receive and process data. If opening, the job open action is in progress and has not yet completed.
- s
string | array[string] Comma-separated list of column names or column aliases used to sort the response. It accepts the same values as the h parameter.
- time
string The unit used to display time values. Values are nanos, micros, ms, s, m, h, or d.
curl \
--request GET 'http://api.example.com/_cat/ml/anomaly_detectors' \
--header "Authorization: $API_KEY"
[
{
"id": "high_sum_total_sales",
"s": "closed",
"dpr": "14022",
"mb": "1.5mb"
},
{
"id": "low_request_rate",
"s": "closed",
"dpr": "1216",
"mb": "40.5kb"
},
{
"id": "response_code_rates",
"s": "closed",
"dpr": "28146",
"mb": "132.7kb"
},
{
"id": "url_scanning",
"s": "closed",
"dpr": "28146",
"mb": "501.6kb"
}
]
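The short column names in the response above are the aliases s (state), dpr (data.processed_records), and mb (model.bytes). A sketch requesting those columns explicitly with the h parameter:
curl \
--request GET 'http://api.example.com/_cat/ml/anomaly_detectors?h=id,s,dpr,mb' \
--header "Authorization: $API_KEY"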
Get anomaly detection jobs
Added in 7.7.0
Get configuration and usage information for anomaly detection jobs.
This API returns a maximum of 10,000 jobs.
If the Elasticsearch security features are enabled, you must have the monitor_ml, monitor, manage_ml, or manage cluster privileges to use this API.
IMPORTANT: CAT APIs are only intended for human consumption using the Kibana console or command line. They are not intended for use by applications. For application consumption, use the get anomaly detection job statistics API.
Path parameters
- job_id
string Required. Identifier for the anomaly detection job.
Query parameters
- allow_no_match
boolean Specifies what to do when the request:
  - Contains wildcard expressions and there are no jobs that match.
  - Contains the _all string or no identifiers and there are no matches.
  - Contains wildcard expressions and there are only partial matches.
If true, the API returns an empty jobs array when there are no matches and the subset of results when there are partial matches. If false, the API returns a 404 status code when there are no matches or only partial matches.
- bytes
string The unit used to display byte values. Values are b, kb, mb, gb, tb, or pb.
- h
string | array[string] Comma-separated list of column names to display. It accepts the same values as the h parameter of the collection-level get anomaly detection jobs API described above.
- s
string | array[string] Comma-separated list of column names or column aliases used to sort the response. It accepts the same values as the h parameter.
- time
string The unit used to display time values. Values are nanos, micros, ms, s, m, h, or d.
curl \
--request GET 'http://api.example.com/_cat/ml/anomaly_detectors/{job_id}' \
--header "Authorization: $API_KEY"
(ormrcc
,modelRareCategoryCount
): The number of categories that match just one categorized document.model.timestamp
(ormt
,modelTimestamp
): The timestamp of the last record when the model stats were gathered.model.total_category_count
(ormtcc
,modelTotalCategoryCount
): The number of categories created by categorization.node.address
(orna
,nodeAddress
): The network address of the node that runs the job. This information is available only for open jobs.node.ephemeral_id
(orne
,nodeEphemeralId
): The ephemeral ID of the node that runs the job. This information is available only for open jobs.node.id
(orni
,nodeId
): The unique identifier of the node that runs the job. This information is available only for open jobs.node.name
(ornn
,nodeName
): The name of the node that runs the job. This information is available only for open jobs.opened_time
(orot
): For open jobs only, the elapsed time for which the job has been open.state
(ors
): The status of the anomaly detection job:closed
,closing
,failed
,opened
, oropening
. Ifclosed
, the job finished successfully with its model state persisted. The job must be opened before it can accept further data. Ifclosing
, the job close action is in progress and has not yet completed. A closing job cannot accept further data. Iffailed
, the job did not finish successfully due to an error. This situation can occur due to invalid input data, a fatal error occurring during the analysis, or an external interaction such as the process being killed by the Linux out of memory (OOM) killer. If the job had irrevocably failed, it must be force closed and then deleted. If the datafeed can be corrected, the job can be closed and then re-opened. Ifopened
, the job is available to receive and process data. Ifopening
, the job open action is in progress and has not yet completed.
-
time
string The unit used to display time values.
Values are nanos, micros, ms, s, m, h, or d.
curl \
--request GET 'http://api.example.com/_cat/ml/anomaly_detectors/{job_id}' \
--header "Authorization: $API_KEY"
[
{
"id": "high_sum_total_sales",
"s": "closed",
"dpr": "14022",
"mb": "1.5mb"
},
{
"id": "low_request_rate",
"s": "closed",
"dpr": "1216",
"mb": "40.5kb"
},
{
"id": "response_code_rates",
"s": "closed",
"dpr": "28146",
"mb": "132.7kb"
},
{
"id": "url_scanning",
"s": "closed",
"dpr": "28146",
"mb": "501.6kb"
}
]
Get trained models
Added in 7.7.0
Get configuration and usage information about inference trained models.
IMPORTANT: CAT APIs are only intended for human consumption using the Kibana console or command line. They are not intended for use by applications. For application consumption, use the get trained models statistics API.
Query parameters
-
allow_no_match
boolean Specifies what to do when the request: contains wildcard expressions and there are no models that match; contains the _all string or no identifiers and there are no matches; or contains wildcard expressions and there are only partial matches. If true, the API returns an empty array when there are no matches and the subset of results when there are partial matches. If false, the API returns a 404 status code when there are no matches or only partial matches.
-
bytes
string The unit used to display byte values.
Values are b, kb, mb, gb, tb, or pb.
-
h
string | array[string] A comma-separated list of column names to display.
Supported values include:
create_time (or ct): The time when the trained model was created.
created_by (or c, createdBy): Information on the creator of the trained model.
data_frame_analytics_id (or df, dataFrameAnalytics, dfid): Identifier for the data frame analytics job that created the model. Only displayed if it is still available.
description (or d): The description of the trained model.
heap_size (or hs, modelHeapSize): The estimated heap size to keep the trained model in memory.
id: Identifier for the trained model.
ingest.count (or ic, ingestCount): The total number of documents that are processed by the model.
ingest.current (or icurr, ingestCurrent): The total number of documents that are currently being handled by the trained model.
ingest.failed (or if, ingestFailed): The total number of failed ingest attempts with the trained model.
ingest.pipelines (or ip, ingestPipelines): The total number of ingest pipelines that are referencing the trained model.
ingest.time (or it, ingestTime): The total time that is spent processing documents with the trained model.
license (or l): The license level of the trained model.
operations (or o, modelOperations): The estimated number of operations to use the trained model. This number helps to measure the computational complexity of the model.
version (or v): The Elasticsearch version number in which the trained model was created.
-
s
string | array[string] A comma-separated list of column names or aliases used to sort the response.
Supported values include:
create_time (or ct): The time when the trained model was created.
created_by (or c, createdBy): Information on the creator of the trained model.
data_frame_analytics_id (or df, dataFrameAnalytics, dfid): Identifier for the data frame analytics job that created the model. Only displayed if it is still available.
description (or d): The description of the trained model.
heap_size (or hs, modelHeapSize): The estimated heap size to keep the trained model in memory.
id: Identifier for the trained model.
ingest.count (or ic, ingestCount): The total number of documents that are processed by the model.
ingest.current (or icurr, ingestCurrent): The total number of documents that are currently being handled by the trained model.
ingest.failed (or if, ingestFailed): The total number of failed ingest attempts with the trained model.
ingest.pipelines (or ip, ingestPipelines): The total number of ingest pipelines that are referencing the trained model.
ingest.time (or it, ingestTime): The total time that is spent processing documents with the trained model.
license (or l): The license level of the trained model.
operations (or o, modelOperations): The estimated number of operations to use the trained model. This number helps to measure the computational complexity of the model.
version (or v): The Elasticsearch version number in which the trained model was created.
-
from
number Skips the specified number of trained models.
-
size
number The maximum number of trained models to display.
-
time
string Unit used to display time values.
Values are nanos, micros, ms, s, m, h, or d.
curl \
--request GET 'http://api.example.com/_cat/ml/trained_models' \
--header "Authorization: $API_KEY"
[
{
"id": "ddddd-1580216177138",
"heap_size": "0b",
"operations": "196",
"create_time": "2025-03-25T00:01:38.662Z",
"type": "pytorch",
"ingest.pipelines": "0",
"data_frame.id": "__none__"
},
{
"id": "lang_ident_model_1",
"heap_size": "1mb",
"operations": "39629",
"create_time": "2019-12-05T12:28:34.594Z",
"type": "lang_ident",
"ingest.pipelines": "0",
"data_frame.id": "__none__"
}
]
Get cluster info
Path parameters
-
target
string | array[string] Required Limits the information returned to the specific target. Supports a comma-separated list, such as http,ingest.
Supported values include: _all, http, ingest, thread_pool, script.
curl \
--request GET 'http://api.example.com/_info/{target}' \
--header "Authorization: $API_KEY"
Ping the cluster
curl \
--request HEAD 'http://api.example.com/' \
--header "Authorization: $API_KEY"
Connector
The connector and sync jobs APIs provide a convenient way to create and manage Elastic connectors and sync jobs in an internal index.
Connectors are Elasticsearch integrations for syncing content from third-party data sources, which can be deployed on Elastic Cloud or hosted on your own infrastructure.
This API provides an alternative to relying solely on the Kibana UI for connector and sync job management. The API comes with a set of validations and assertions to ensure that the state representation in the internal index remains valid.
This API requires the manage_connector privilege or, for read-only endpoints, the monitor_connector privilege.
Check in a connector
Technical preview
Update the last_seen field in the connector and set it to the current timestamp.
Path parameters
-
connector_id
string Required The unique identifier of the connector to be checked in
curl \
--request PUT 'http://api.example.com/_connector/{connector_id}/_check_in' \
--header "Authorization: $API_KEY"
{
"result": "updated"
}
Get a connector
Path parameters
-
connector_id
string Required The unique identifier of the connector
Query parameters
-
include_deleted
boolean A flag to indicate if the desired connector should be fetched, even if it was soft-deleted.
curl \
--request GET 'http://api.example.com/_connector/{connector_id}' \
--header "Authorization: $API_KEY"
Create or update a connector
Path parameters
-
connector_id
string Required The unique identifier of the connector to be created or updated. ID is auto-generated if not provided.
Body
-
description
string
-
index_name
string
-
is_native
boolean
-
language
string
-
name
string
-
service_type
string
curl \
--request PUT 'http://api.example.com/_connector/{connector_id}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"index_name\": \"search-google-drive\",\n \"name\": \"My Connector\",\n \"service_type\": \"google_drive\"\n}"'
{
"index_name": "search-google-drive",
"name": "My Connector",
"service_type": "google_drive"
}
{
"index_name": "search-google-drive",
"name": "My Connector",
"description": "My Connector to sync data to Elastic index from Google Drive",
"service_type": "google_drive",
"language": "english"
}
{
"result": "created",
"id": "my-connector"
}
Get all connectors
Query parameters
-
from
number Starting offset (default: 0)
-
size
number The maximum number of results to return.
-
index_name
string | array[string] A comma-separated list of connector index names to fetch connector documents for
-
connector_name
string | array[string] A comma-separated list of connector names to fetch connector documents for
-
service_type
string | array[string] A comma-separated list of connector service types to fetch connector documents for
-
include_deleted
boolean A flag to indicate if the desired connector should be fetched, even if it was soft-deleted.
-
query
string A wildcard query string that filters connectors with a matching name, description, or index name
curl \
--request GET 'http://api.example.com/_connector' \
--header "Authorization: $API_KEY"
Get data streams
Added in 7.9.0
Get information about one or more data streams.
Path parameters
-
name
string | array[string] Required Comma-separated list of data stream names used to limit the request. Wildcard (*) expressions are supported. If omitted, all data streams are returned.
Query parameters
-
expand_wildcards
string | array[string] Type of data stream that wildcard patterns can match. Supports comma-separated values, such as open,hidden.
Supported values include:
all: Match any data stream or index, including hidden ones.
open: Match open, non-hidden indices. Also matches any non-hidden data stream.
closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
none: Wildcard expressions are not accepted.
-
include_defaults
boolean If true, returns all relevant default configurations for the index template.
-
master_timeout
string Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
-
verbose
boolean Whether the maximum timestamp for each data stream should be calculated and returned.
curl \
--request GET 'http://api.example.com/_data_stream/{name}' \
--header "Authorization: $API_KEY"
{
"data_streams": [
{
"name": "my-data-stream",
"timestamp_field": {
"name": "@timestamp"
},
"indices": [
{
"index_name": ".ds-my-data-stream-2099.03.07-000001",
"index_uuid": "xCEhwsp8Tey0-FLNFYVwSg",
"prefer_ilm": true,
"ilm_policy": "my-lifecycle-policy",
"managed_by": "Index Lifecycle Management"
},
{
"index_name": ".ds-my-data-stream-2099.03.08-000002",
"index_uuid": "PA_JquKGSiKcAKBA8DJ5gw",
"prefer_ilm": true,
"ilm_policy": "my-lifecycle-policy",
"managed_by": "Index Lifecycle Management"
}
],
"generation": 2,
"_meta": {
"my-meta-field": "foo"
},
"status": "GREEN",
"next_generation_managed_by": "Index Lifecycle Management",
"prefer_ilm": true,
"template": "my-index-template",
"ilm_policy": "my-lifecycle-policy",
"hidden": false,
"system": false,
"allow_custom_routing": false,
"replicated": false,
"rollover_on_write": false
},
{
"name": "my-data-stream-two",
"timestamp_field": {
"name": "@timestamp"
},
"indices": [
{
"index_name": ".ds-my-data-stream-two-2099.03.08-000001",
"index_uuid": "3liBu2SYS5axasRt6fUIpA",
"prefer_ilm": true,
"ilm_policy": "my-lifecycle-policy",
"managed_by": "Index Lifecycle Management"
}
],
"generation": 1,
"_meta": {
"my-meta-field": "foo"
},
"status": "YELLOW",
"next_generation_managed_by": "Index Lifecycle Management",
"prefer_ilm": true,
"template": "my-index-template",
"ilm_policy": "my-lifecycle-policy",
"hidden": false,
"system": false,
"allow_custom_routing": false,
"replicated": false,
"rollover_on_write": false
}
]
}
Bulk index or delete documents
Perform multiple index, create, delete, and update actions in a single request.
This reduces overhead and can greatly increase indexing speed.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
- To use the create action, you must have the create_doc, create, index, or write index privilege. Data streams support only the create action.
- To use the index action, you must have the create, index, or write index privilege.
- To use the delete action, you must have the delete or write index privilege.
- To use the update action, you must have the index or write index privilege.
- To automatically create a data stream or index with a bulk API request, you must have the auto_configure, create_index, or manage index privilege.
- To make the result of a bulk operation visible to search using the refresh parameter, you must have the maintenance or manage index privilege.
Automatic data stream creation requires a matching index template with data stream enabled.
The actions are specified in the request body using a newline delimited JSON (NDJSON) structure:
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n
The index and create actions expect a source on the next line and have the same semantics as the op_type parameter in the standard index API.
A create action fails if a document with the same ID already exists in the target.
An index action adds or replaces a document as necessary.
NOTE: Data streams support only the create action.
To update or delete a document in a data stream, you must target the backing index containing the document.
An update action expects that the partial doc, upsert, and script and its options are specified on the next line.
A delete action does not expect a source on the next line and has the same semantics as the standard delete API.
NOTE: The final line of data must end with a newline character (\n).
Each newline character may be preceded by a carriage return (\r).
When sending NDJSON data to the _bulk endpoint, use a Content-Type header of application/json or application/x-ndjson.
Because this format uses literal newline characters (\n) as delimiters, make sure that the JSON actions and sources are not pretty printed.
If you provide a target in the request path, it is used for any actions that don't explicitly specify an _index argument.
A note on the format: the idea here is to make processing as fast as possible.
As some of the actions are redirected to other shards on other nodes, only action_meta_data is parsed on the receiving node side.
Client libraries using this protocol should strive to do something similar on the client side and reduce buffering as much as possible.
There is no "correct" number of actions to perform in a single bulk request. Experiment with different settings to find the optimal size for your particular workload. Note that Elasticsearch limits the maximum size of a HTTP request to 100mb by default so clients must ensure that no request exceeds this size. It is not possible to index a single document that exceeds the size limit, so you must pre-process any such documents into smaller pieces before sending them to Elasticsearch. For instance, split documents into pages or chapters before indexing them, or store raw binary data in a system outside Elasticsearch and replace the raw data with a link to the external system in the documents that you send to Elasticsearch.
Client suppport for bulk requests
Some of the officially supported clients provide helpers to assist with bulk requests and reindexing:
- Go: Check out esutil.BulkIndexer
- Perl: Check out Search::Elasticsearch::Client::5_0::Bulk and Search::Elasticsearch::Client::5_0::Scroll
- Python: Check out elasticsearch.helpers.*
- JavaScript: Check out client.helpers.*
- .NET: Check out BulkAllObservable
- PHP: Check out bulk indexing.
Submitting bulk requests with cURL
If you're providing text file input to curl, you must use the --data-binary flag instead of plain -d.
The latter doesn't preserve newlines. For example:
$ cat requests
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
{"took":7, "errors": false, "items":[{"index":{"_index":"test","_id":"1","_version":1,"result":"created","forced_refresh":false}}]}
Optimistic concurrency control
Each index and delete action within a bulk API call may include the if_seq_no and if_primary_term parameters in their respective action and meta data lines.
The if_seq_no and if_primary_term parameters control how operations are run, based on the last modification to existing documents. See Optimistic concurrency control for more details.
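For example, a sketch of a bulk action line for a conditional write; the sequence number and primary term shown are placeholder values you would take from an earlier response:
{ "index" : { "_index" : "test", "_id" : "1", "if_seq_no" : 3, "if_primary_term" : 1 } }
{ "field1" : "new value" }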
Versioning
Each bulk item can include the version value using the version field.
It automatically follows the behavior of the index or delete operation based on the _version mapping.
It also supports the version_type.
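For example, an index action that supplies an external version (the version number shown is illustrative):
{ "index" : { "_index" : "test", "_id" : "1", "version" : 10, "version_type" : "external" } }
{ "field1" : "value1" }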
Routing
Each bulk item can include the routing value using the routing field.
It automatically follows the behavior of the index or delete operation based on the _routing mapping.
NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.
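For example, an index action that routes the document with a custom routing value (user1 is illustrative):
{ "index" : { "_index" : "test", "_id" : "1", "routing" : "user1" } }
{ "field1" : "value1" }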
Wait for active shards
When making bulk calls, you can set the wait_for_active_shards parameter to require a minimum number of shard copies to be active before starting to process the bulk request.
Refresh
Control when the changes made by this request are visible to search.
NOTE: Only the shards that receive the bulk request will be affected by refresh.
Imagine a _bulk?refresh=wait_for request with three documents in it that happen to be routed to different shards in an index with five shards.
The request will only wait for those three shards to refresh.
The other two shards that make up the index do not participate in the _bulk request at all.
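For example, a sketch of a bulk request that returns only once the affected shards have refreshed; the index and document are placeholders:
curl \
--request POST 'http://api.example.com/test/_bulk?refresh=wait_for' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/x-ndjson" \
--data-binary $'{ "index" : { "_index" : "test", "_id" : "1" } }\n{ "field1" : "value1" }\n'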
Path parameters
-
index
string Required The name of the data stream, index, or index alias to perform bulk actions on.
Query parameters
-
include_source_on_error
boolean If true, the document source is included in the error message when there is a parsing error.
-
list_executed_pipelines
boolean If true, the response will include the ingest pipelines that were run for each index or create.
-
pipeline
string The pipeline identifier to use to preprocess incoming documents. If the index has a default ingest pipeline specified, setting the value to _none turns off the default ingest pipeline for this request. If a final pipeline is configured, it will always run regardless of the value of this parameter.
-
refresh
string If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, wait for a refresh to make this operation visible to search. If false, do nothing with refreshes.
Values are true, false, or wait_for.
-
routing
string A custom value that is used to route operations to a specific shard.
-
_source
boolean | string | array[string] Indicates whether to return the _source field (true or false) or contains a list of fields to return.
-
_source_excludes
string | array[string] A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in the _source_includes query parameter. If the _source parameter is false, this parameter is ignored.
-
_source_includes
string | array[string] A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the _source_excludes query parameter. If the _source parameter is false, this parameter is ignored.
-
timeout
string The period each action waits for the following operations: automatic index creation, dynamic mapping updates, and waiting for active shards. The default is 1m (one minute), which guarantees Elasticsearch waits for at least the timeout before failing. The actual wait time could be longer, particularly when multiple waits occur.
-
wait_for_active_shards
number | string The number of shard copies that must be active before proceeding with the operation. Set to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The default is 1, which waits for each primary shard to be active.
-
require_alias
boolean If true, the request's actions must target an index alias.
-
require_data_stream
boolean If true, the request's actions must target a data stream (existing or to be created).
curl \
--request PUT 'http://api.example.com/{index}/_bulk' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{ \"index\" : { \"_index\" : \"test\", \"_id\" : \"1\" } }\n{ \"field1\" : \"value1\" }\n{ \"delete\" : { \"_index\" : \"test\", \"_id\" : \"2\" } }\n{ \"create\" : { \"_index\" : \"test\", \"_id\" : \"3\" } }\n{ \"field1\" : \"value3\" }\n{ \"update\" : {\"_id\" : \"1\", \"_index\" : \"test\"} }\n{ \"doc\" : {\"field2\" : \"value2\"} }"'
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
{ "update" : {"_id" : "1", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_index" : "index1", "retry_on_conflict" : 3} }
{ "script" : { "source": "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
{ "update" : {"_id" : "3", "_index" : "index1", "_source" : true} }
{ "doc" : {"field" : "value"} }
{ "update" : {"_id" : "4", "_index" : "index1"} }
{ "doc" : {"field" : "value"}, "_source": true}
{ "update": {"_id": "5", "_index": "index1"} }
{ "doc": {"my_field": "foo"} }
{ "update": {"_id": "6", "_index": "index1"} }
{ "doc": {"my_field": "foo"} }
{ "create": {"_id": "7", "_index": "index1"} }
{ "my_field": "foo" }
{ "index" : { "_index" : "my_index", "_id" : "1", "dynamic_templates": {"work_location": "geo_point"}} }
{ "field" : "value1", "work_location": "41.12,-71.34", "raw_location": "41.12,-71.34"}
{ "create" : { "_index" : "my_index", "_id" : "2", "dynamic_templates": {"home_location": "geo_point"}} }
{ "field" : "value2", "home_location": "41.12,-71.34"}
{
"took": 30,
"errors": false,
"items": [
{
"index": {
"_index": "test",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 201,
"_seq_no" : 0,
"_primary_term": 1
}
},
{
"delete": {
"_index": "test",
"_id": "2",
"_version": 1,
"result": "not_found",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 404,
"_seq_no" : 1,
"_primary_term" : 2
}
},
{
"create": {
"_index": "test",
"_id": "3",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 201,
"_seq_no" : 2,
"_primary_term" : 3
}
},
{
"update": {
"_index": "test",
"_id": "1",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 200,
"_seq_no" : 3,
"_primary_term" : 4
}
}
]
}
{
"took": 486,
"errors": true,
"items": [
{
"update": {
"_index": "index1",
"_id": "5",
"status": 404,
"error": {
"type": "document_missing_exception",
"reason": "[5]: document missing",
"index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA",
"shard": "0",
"index": "index1"
}
}
},
{
"update": {
"_index": "index1",
"_id": "6",
"status": 404,
"error": {
"type": "document_missing_exception",
"reason": "[6]: document missing",
"index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA",
"shard": "0",
"index": "index1"
}
}
},
{
"create": {
"_index": "index1",
"_id": "7",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1,
"status": 201
}
}
]
}
{
"items": [
{
"update": {
"error": {
"type": "document_missing_exception",
"reason": "[5]: document missing",
"index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA",
"shard": "0",
"index": "index1"
}
}
},
{
"update": {
"error": {
"type": "document_missing_exception",
"reason": "[6]: document missing",
"index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA",
"shard": "0",
"index": "index1"
}
}
}
]
}
Create or update a document in an index
Add a JSON document to the specified data stream or index and make it searchable. If the target is an index and the document already exists, the request updates the document and increments its version.
NOTE: You cannot use this API to send update requests for existing documents in a data stream.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
- To add or overwrite a document using the PUT /<target>/_doc/<_id> request format, you must have the create, index, or write index privilege.
- To add a document using the POST /<target>/_doc/ request format, you must have the create_doc, create, index, or write index privilege.
- To automatically create a data stream or index with this API request, you must have the auto_configure, create_index, or manage index privilege.
Automatic data stream creation requires a matching index template with data stream enabled.
NOTE: Replica shards might not all be started when an indexing operation returns successfully.
By default, only the primary is required. Set wait_for_active_shards to change this default behavior.
Automatically create data streams and indices
If the request's target doesn't exist and matches an index template with a data_stream definition, the index operation automatically creates the data stream.
If the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.
NOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.
If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed.
Automatic index creation is controlled by the action.auto_create_index setting.
If it is true, any index can be created automatically.
You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns, or set it to false to turn off automatic index creation entirely.
Specify a comma-separated list of patterns you want to allow, or prefix each pattern with + or - to indicate whether it should be allowed or blocked.
When a list is specified, the default behavior is to disallow.
NOTE: The action.auto_create_index setting affects the automatic creation of indices only. It does not affect the creation of data streams.
Optimistic concurrency control
Index operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters.
If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409.
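For example, a sketch of a conditional write; the sequence number and primary term are placeholders you would take from a previous read of the document:
curl \
--request PUT 'http://api.example.com/my-index-000001/_doc/1?if_seq_no=10&if_primary_term=2' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{ "user": { "id": "elkbee" } }'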
Routing
By default, shard placement (or routing) is controlled by using a hash of the document's ID value.
For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter.
When setting up explicit mapping, you can also use the _routing field to direct the index operation to extract the routing value from the document itself.
This does come at the (very minimal) cost of an additional document parsing pass.
If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.
NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.
Distributed
The index operation is directed to the primary shard based on its route and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.
Active shards
To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.
If the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.
By default, write operations only wait for the primary shards to be active before proceeding (that is to say, wait_for_active_shards is 1).
This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards.
To alter this behavior per operation, use the wait_for_active_shards request parameter.
Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas+1).
Specifying a negative value or a number greater than the number of shard copies will throw an error.
For example, suppose you have a cluster of three nodes, A, B, and C, and you create an index named index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).
If you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.
This means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.
If wait_for_active_shards is set on the request to 3 (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.
This requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.
However, if you set wait_for_active_shards to all (or to 4, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.
The operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.
It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.
After the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.
The _shards section of the API response reveals the number of shard copies on which replication succeeded and failed.
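For example, a sketch of a write that waits for three active copies of each shard before it proceeds; the index and document are placeholders:
curl \
--request PUT 'http://api.example.com/my-index-000001/_doc/1?wait_for_active_shards=3' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{ "user": { "id": "elkbee" } }'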
No operation (noop) updates
When updating a document by using this API, a new version of the document is always created even if the document hasn't changed.
If this isn't acceptable, use the _update API with detect_noop set to true.
The detect_noop option isn't available on this API because it doesn't fetch the old source and isn't able to compare it against the new source.
There isn't a definitive rule for when noop updates aren't acceptable. It's a combination of many factors, such as how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.
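For example, a sketch of the equivalent update request; if the stored document already contains these values, the response reports a noop result instead of creating a new version:
curl \
--request POST 'http://api.example.com/my-index-000001/_update/1' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{ "doc": { "user": { "id": "elkbee" } }, "detect_noop": true }'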
Versioning
Each indexed document is given a version number.
By default, internal versioning is used that starts at 1 and increments with each update, deletes included.
Optionally, the version number can be set to an external value (for example, if maintained in a database).
To enable this functionality, version_type should be set to external.
The value provided must be a numeric, long value greater than or equal to 0, and less than around 9.2e+18.
NOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations. If no version is provided, the operation runs without any version checks.
When using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document. If true, the document will be indexed and the new version number used. If the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. For example:
PUT my-index-000001/_doc/1?version=2&version_type=external
{
"user": {
"id": "elkbee"
}
}
In this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.
If the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).
A nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.
Even the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.
Path parameters
-
index
string Required The name of the data stream or index to target. If the target doesn't exist and matches the name or wildcard (*) pattern of an index template with a data_stream definition, this request creates the data stream. If the target doesn't exist and doesn't match a data stream template, this request creates the index. You can check for existing targets with the resolve index API.
-
id
string Required A unique identifier for the document. To automatically generate a document ID, use the POST /<target>/_doc/ request format and omit this parameter.
Query parameters
-
if_primary_term
number Only perform the operation if the document has this primary term.
-
if_seq_no
number Only perform the operation if the document has this sequence number.
-
include_source_on_error
boolean If true, the document source is included in the error message when there is a parsing error.
-
op_type
string Set to create to only index the document if it does not already exist (put if absent). If a document with the specified _id already exists, the indexing operation will fail. The behavior is the same as using the <index>/_create endpoint. If a document ID is specified, this parameter defaults to index. Otherwise, it defaults to create. If the request targets a data stream, an op_type of create is required.
Supported values include:
index: Overwrite any documents that already exist.
create: Only index documents that do not already exist.
Values are index or create.
-
pipeline
string The ID of the pipeline to use to preprocess incoming documents. If the index has a default ingest pipeline specified, then setting the value to _none disables the default ingest pipeline for this request. If a final pipeline is configured, it will always run regardless of the value of this parameter.
-
refresh
string If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, it waits for a refresh to make this operation visible to search. If false, it does nothing with refreshes.
Values are true, false, or wait_for.
-
routing
string A custom value that is used to route operations to a specific shard.
-
timeout
string The period the request waits for the following operations: automatic index creation, dynamic mapping updates, waiting for active shards.
This parameter is useful for situations where the primary shard assigned to perform the operation might not be available when the operation runs. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the operation will wait on the primary shard to become available for at least 1 minute before failing and responding with an error. The actual wait time could be longer, particularly when multiple waits occur.
-
version
number An explicit version number for concurrency control. It must be a non-negative long number.
-
version_type
string The version type.
Supported values include:
internal: Use internal versioning that starts at 1 and increments with each update or delete.
external: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.
external_gte: Only index the document if the specified version is equal or higher than the version of the stored document or if there is no existing document. NOTE: The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.
force: This option is deprecated because it can cause primary and replica shards to diverge.
Values are internal, external, external_gte, or force.
-
wait_for_active_shards
number | string The number of shard copies that must be active before proceeding with the operation. You can set it to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The default value of 1 means it waits for each primary shard to be active.
-
require_alias
boolean If true, the destination must be an index alias.
curl \
--request POST 'http://api.example.com/{index}/_doc/{id}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"@timestamp\": \"2099-11-15T13:12:00\",\n \"message\": \"GET /search HTTP/1.1 200 1070000\",\n \"user\": {\n \"id\": \"kimchy\"\n }\n}"'
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
{
"_shards": {
"total": 2,
"failed": 0,
"successful": 2
},
"_index": "my-index-000001",
"_id": "W0tpsmIBdwcYyG50zbta",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"result": "created"
}
{
"_shards": {
"total": 2,
"failed": 0,
"successful": 2
},
"_index": "my-index-000001",
"_id": "1",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"result": "created"
}
Get a document's source
Get the source of a document. For example:
GET my-index-000001/_source/1
You can use the source filtering parameters to control which parts of the _source are returned:
GET my-index-000001/_source/1/?_source_includes=*.id&_source_excludes=entities
Query parameters
-
preference
string The node or shard the operation should be performed on. By default, the operation is randomized between the shard replicas.
-
realtime
boolean If true, the request is real-time as opposed to near-real-time.
-
refresh
boolean If true, the request refreshes the relevant shards before retrieving the document. Setting it to true should be done after careful thought and verification that this does not cause a heavy load on the system (and slow down indexing).
-
string A custom value used to route operations to a specific shard.
-
_source
boolean | string | array[string] Indicates whether to return the
_source
field (true
orfalse
) or lists the fields to return. -
_source_excludes
string | array[string] A comma-separated list of source fields to exclude in the response.
-
_source_includes
string | array[string] A comma-separated list of source fields to include in the response.
-
stored_fields
string | array[string] A comma-separated list of stored fields to return as part of a hit.
-
version
number The version number for concurrency control. It must match the current version of the document for the request to succeed.
-
version_type
string The version type.
Supported values include:
internal: Use internal versioning that starts at 1 and increments with each update or delete.
external: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.
external_gte: Only index the document if the specified version is equal or higher than the version of the stored document or if there is no existing document. NOTE: The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.
force: This option is deprecated because it can cause primary and replica shards to diverge.
Values are internal, external, external_gte, or force.
curl \
--request GET 'http://api.example.com/{index}/_source/{id}' \
--header "Authorization: $API_KEY"
Create or update a document in an index
Add a JSON document to the specified data stream or index and make it searchable. If the target is an index and the document already exists, the request updates the document and increments its version.
NOTE: You cannot use this API to send update requests for existing documents in a data stream.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
- To add or overwrite a document using the PUT /<target>/_doc/<_id> request format, you must have the create, index, or write index privilege.
- To add a document using the POST /<target>/_doc/ request format, you must have the create_doc, create, index, or write index privilege.
- To automatically create a data stream or index with this API request, you must have the auto_configure, create_index, or manage index privilege.
Automatic data stream creation requires a matching index template with data stream enabled.
NOTE: Replica shards might not all be started when an indexing operation returns successfully.
By default, only the primary is required. Set wait_for_active_shards to change this default behavior.
Automatically create data streams and indices
If the request's target doesn't exist and matches an index template with a data_stream definition, the index operation automatically creates the data stream.
If the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.
NOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.
If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed.
Automatic index creation is controlled by the action.auto_create_index setting.
If it is true, any index can be created automatically.
You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns, or set it to false to turn off automatic index creation entirely.
Specify a comma-separated list of patterns you want to allow, or prefix each pattern with + or - to indicate whether it should be allowed or blocked.
When a list is specified, the default behavior is to disallow.
NOTE: The action.auto_create_index setting affects the automatic creation of indices only. It does not affect the creation of data streams.
Optimistic concurrency control
Index operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters.
If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409.
Routing
By default, shard placement (or routing) is controlled by using a hash of the document's ID value.
For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter.
When setting up explicit mapping, you can also use the _routing field to direct the index operation to extract the routing value from the document itself.
This does come at the (very minimal) cost of an additional document parsing pass.
If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.
NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.
Distributed
The index operation is directed to the primary shard based on its route and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.
Active shards
To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.
If the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.
By default, write operations only wait for the primary shards to be active before proceeding (that is to say, wait_for_active_shards is 1).
This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards.
To alter this behavior per operation, use the wait_for_active_shards request parameter.
Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas+1).
Specifying a negative value or a number greater than the number of shard copies will throw an error.
For example, suppose you have a cluster of three nodes, A, B, and C, and you create an index named index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).
If you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.
This means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.
If wait_for_active_shards is set on the request to 3 (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.
This requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.
However, if you set wait_for_active_shards to all (or to 4, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.
The operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.
It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.
After the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.
The _shards section of the API response reveals the number of shard copies on which replication succeeded and failed.
No operation (noop) updates
When updating a document by using this API, a new version of the document is always created even if the document hasn't changed.
If this isn't acceptable, use the _update API with detect_noop set to true.
The detect_noop option isn't available on this API because it doesn't fetch the old source and isn't able to compare it against the new source.
There isn't a definitive rule for when noop updates aren't acceptable. It's a combination of many factors, such as how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.
Versioning
Each indexed document is given a version number.
By default, internal versioning is used that starts at 1 and increments with each update, deletes included.
Optionally, the version number can be set to an external value (for example, if maintained in a database).
To enable this functionality, version_type should be set to external.
The value provided must be a numeric, long value greater than or equal to 0, and less than around 9.2e+18.
NOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations. If no version is provided, the operation runs without any version checks.
When using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document. If true, the document will be indexed and the new version number used. If the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. For example:
PUT my-index-000001/_doc/1?version=2&version_type=external
{
"user": {
"id": "elkbee"
}
}
In this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.
If the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).
A nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.
Even the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.
Path parameters
-
index
string Required The name of the data stream or index to target. If the target doesn't exist and matches the name or wildcard (*) pattern of an index template with a data_stream definition, this request creates the data stream. If the target doesn't exist and doesn't match a data stream template, this request creates the index. You can check for existing targets with the resolve index API.
Query parameters
-
if_primary_term
number Only perform the operation if the document has this primary term.
-
if_seq_no
number Only perform the operation if the document has this sequence number.
-
include_source_on_error
boolean If true, the document source is included in the error message when there is a parsing error.
-
op_type
string Set to create to only index the document if it does not already exist (put if absent). If a document with the specified _id already exists, the indexing operation will fail. The behavior is the same as using the <index>/_create endpoint. If a document ID is specified, this parameter defaults to index. Otherwise, it defaults to create. If the request targets a data stream, an op_type of create is required.
Supported values include:
index: Overwrite any documents that already exist.
create: Only index documents that do not already exist.
Values are index or create.
-
pipeline
string The ID of the pipeline to use to preprocess incoming documents. If the index has a default ingest pipeline specified, then setting the value to _none disables the default ingest pipeline for this request. If a final pipeline is configured, it will always run regardless of the value of this parameter.
-
refresh
string If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, it waits for a refresh to make this operation visible to search. If false, it does nothing with refreshes.
Values are true, false, or wait_for.
-
routing
string A custom value that is used to route operations to a specific shard.
-
timeout
string The period the request waits for the following operations: automatic index creation, dynamic mapping updates, waiting for active shards.
This parameter is useful for situations where the primary shard assigned to perform the operation might not be available when the operation runs. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the operation will wait on the primary shard to become available for at least 1 minute before failing and responding with an error. The actual wait time could be longer, particularly when multiple waits occur.
-
version
number An explicit version number for concurrency control. It must be a non-negative long number.
-
version_type
string The version type.
Supported values include:
internal
: Use internal versioning that starts at 1 and increments with each update or delete.external
: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.external_gte
: Only index the document if the specified version is equal or higher than the version of the stored document or if there is no existing document. NOTE: Theexternal_gte
version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.force
: This option is deprecated because it can cause primary and replica shards to diverge.
Values are
internal
,external
,external_gte
, orforce
. -
wait_for_active_shards
number | string The number of shard copies that must be active before proceeding with the operation. You can set it to
all
or any positive integer up to the total number of shards in the index (number_of_replicas+1
). The default value of1
means it waits for each primary shard to be active. -
require_alias
boolean If
true
, the destination must be an index alias.
curl \
--request POST 'http://api.example.com/{index}/_doc' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"@timestamp\": \"2099-11-15T13:12:00\",\n \"message\": \"GET /search HTTP/1.1 200 1070000\",\n \"user\": {\n \"id\": \"kimchy\"\n }\n}"'
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
{
"_shards": {
"total": 2,
"failed": 0,
"successful": 2
},
"_index": "my-index-000001",
"_id": "W0tpsmIBdwcYyG50zbta",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"result": "created"
}
{
"_shards": {
"total": 2,
"failed": 0,
"successful": 2
},
"_index": "my-index-000001",
"_id": "1",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"result": "created"
}
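As a hedged sketch of the op_type and refresh parameters described above (the index name and document ID are illustrative), the following request succeeds only if document 1 does not already exist and waits for a refresh so the document is immediately visible to search:
curl \
--request PUT 'http://api.example.com/my-index-000001/_doc/1?op_type=create&refresh=wait_for' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000"
}'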
Get multiple documents
Added in 1.3.0
Get multiple JSON documents by ID from one or more indices. If you specify an index in the request URI, you only need to specify the document IDs in the request body. To ensure fast responses, this multi get (mget) API responds with partial results if one or more shards fail.
Filter source fields
By default, the _source
field is returned for every document (if stored).
Use the _source
and _source_includes
or _source_excludes
attributes to filter what fields are returned for a particular document.
You can include the _source
, _source_includes
, and _source_excludes
query parameters in the request URI to specify the defaults to use when there are no per-document instructions.
Get stored fields
Use the stored_fields
attribute to specify the set of stored fields you want to retrieve.
Any requested fields that are not stored are ignored.
You can include the stored_fields
query parameter in the request URI to specify the defaults to use when there are no per-document instructions.
Query parameters
-
preference
string Specifies the node or shard the operation should be performed on. Random by default.
-
realtime
boolean If
true
, the request is real-time as opposed to near-real-time. -
refresh
boolean If
true
, the request refreshes relevant shards before retrieving documents. -
routing
string Custom value used to route operations to a specific shard.
-
_source
boolean | string | array[string] True or false to return the
_source
field or not, or a list of fields to return. -
_source_excludes
string | array[string] A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in
_source_includes
query parameter. -
_source_includes
string | array[string] A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the
_source_excludes
query parameter. If the_source
parameter isfalse
, this parameter is ignored. -
stored_fields
string | array[string] If
true
, retrieves the document fields stored in the index rather than the document_source
.
curl \
--request GET 'http://api.example.com/_mget' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"docs\": [\n {\n \"_id\": \"1\"\n },\n {\n \"_id\": \"2\"\n }\n ]\n}"'
{
"docs": [
{
"_id": "1"
},
{
"_id": "2"
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"_source": false
},
{
"_index": "test",
"_id": "2",
"_source": [ "field3", "field4" ]
},
{
"_index": "test",
"_id": "3",
"_source": {
"include": [ "user" ],
"exclude": [ "user.location" ]
}
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"stored_fields": [ "field1", "field2" ]
},
{
"_index": "test",
"_id": "2",
"stored_fields": [ "field3", "field4" ]
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"routing": "key2"
},
{
"_index": "test",
"_id": "2"
}
]
}
Get multiple documents
Added in 1.3.0
Get multiple JSON documents by ID from one or more indices. If you specify an index in the request URI, you only need to specify the document IDs in the request body. To ensure fast responses, this multi get (mget) API responds with partial results if one or more shards fail.
Filter source fields
By default, the _source
field is returned for every document (if stored).
Use the _source
and _source_includes
or _source_excludes
attributes to filter what fields are returned for a particular document.
You can include the _source
, _source_includes
, and _source_excludes
query parameters in the request URI to specify the defaults to use when there are no per-document instructions.
Get stored fields
Use the stored_fields
attribute to specify the set of stored fields you want to retrieve.
Any requested fields that are not stored are ignored.
You can include the stored_fields
query parameter in the request URI to specify the defaults to use when there are no per-document instructions.
Path parameters
-
index
string Required Name of the index to retrieve documents from when
ids
are specified, or when a document in thedocs
array does not specify an index.
Query parameters
-
preference
string Specifies the node or shard the operation should be performed on. Random by default.
-
realtime
boolean If
true
, the request is real-time as opposed to near-real-time. -
refresh
boolean If
true
, the request refreshes relevant shards before retrieving documents. -
routing
string Custom value used to route operations to a specific shard.
-
_source
boolean | string | array[string] True or false to return the
_source
field or not, or a list of fields to return. -
_source_excludes
string | array[string] A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in
_source_includes
query parameter. -
_source_includes
string | array[string] A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the
_source_excludes
query parameter. If the_source
parameter isfalse
, this parameter is ignored. -
stored_fields
string | array[string] If
true
, retrieves the document fields stored in the index rather than the document_source
.
curl \
--request GET 'http://api.example.com/{index}/_mget' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"docs\": [\n {\n \"_id\": \"1\"\n },\n {\n \"_id\": \"2\"\n }\n ]\n}"'
{
"docs": [
{
"_id": "1"
},
{
"_id": "2"
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"_source": false
},
{
"_index": "test",
"_id": "2",
"_source": [ "field3", "field4" ]
},
{
"_index": "test",
"_id": "3",
"_source": {
"include": [ "user" ],
"exclude": [ "user.location" ]
}
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"stored_fields": [ "field1", "field2" ]
},
{
"_index": "test",
"_id": "2",
"stored_fields": [ "field3", "field4" ]
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"routing": "key2"
},
{
"_index": "test",
"_id": "2"
}
]
}
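When all documents come from the index in the request path and need no per-document options, the body can be shortened to an ids array (a minimal sketch; the index name is illustrative):
curl \
--request GET 'http://api.example.com/my-index-000001/_mget' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "ids": ["1", "2"]
}'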
Get multiple documents
Added in 1.3.0
Get multiple JSON documents by ID from one or more indices. If you specify an index in the request URI, you only need to specify the document IDs in the request body. To ensure fast responses, this multi get (mget) API responds with partial results if one or more shards fail.
Filter source fields
By default, the _source
field is returned for every document (if stored).
Use the _source
and _source_includes
or _source_excludes
attributes to filter what fields are returned for a particular document.
You can include the _source
, _source_includes
, and _source_excludes
query parameters in the request URI to specify the defaults to use when there are no per-document instructions.
Get stored fields
Use the stored_fields
attribute to specify the set of stored fields you want to retrieve.
Any requested fields that are not stored are ignored.
You can include the stored_fields
query parameter in the request URI to specify the defaults to use when there are no per-document instructions.
Path parameters
-
index
string Required Name of the index to retrieve documents from when
ids
are specified, or when a document in thedocs
array does not specify an index.
Query parameters
-
preference
string Specifies the node or shard the operation should be performed on. Random by default.
-
realtime
boolean If
true
, the request is real-time as opposed to near-real-time. -
refresh
boolean If
true
, the request refreshes relevant shards before retrieving documents. -
routing
string Custom value used to route operations to a specific shard.
-
_source
boolean | string | array[string] True or false to return the
_source
field or not, or a list of fields to return. -
_source_excludes
string | array[string] A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in
_source_includes
query parameter. -
_source_includes
string | array[string] A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the
_source_excludes
query parameter. If the_source
parameter isfalse
, this parameter is ignored. -
stored_fields
string | array[string] If
true
, retrieves the document fields stored in the index rather than the document_source
.
curl \
--request POST 'http://api.example.com/{index}/_mget' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"docs\": [\n {\n \"_id\": \"1\"\n },\n {\n \"_id\": \"2\"\n }\n ]\n}"'
{
"docs": [
{
"_id": "1"
},
{
"_id": "2"
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"_source": false
},
{
"_index": "test",
"_id": "2",
"_source": [ "field3", "field4" ]
},
{
"_index": "test",
"_id": "3",
"_source": {
"include": [ "user" ],
"exclude": [ "user.location" ]
}
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"stored_fields": [ "field1", "field2" ]
},
{
"_index": "test",
"_id": "2",
"stored_fields": [ "field3", "field4" ]
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"routing": "key2"
},
{
"_index": "test",
"_id": "2"
}
]
}
Get multiple term vectors
Get multiple term vectors with a single request.
You can specify existing documents by index and ID or provide artificial documents in the body of the request.
You can specify the index in the request body or request URI.
The response contains a docs
array with all the fetched termvectors.
Each element has the structure provided by the termvectors API.
Artificial documents
You can also use mtermvectors
to generate term vectors for artificial documents provided in the body of the request.
The mapping used is determined by the specified _index
.
Path parameters
-
index
string Required The name of the index that contains the documents.
Query parameters
-
ids
array[string] A comma-separated list of document IDs. You must define IDs as a parameter or set "ids" or "docs" in the request body.
-
fields
string | array[string] A comma-separated list or wildcard expressions of fields to include in the statistics. It is used as the default list unless a specific field list is provided in the
completion_fields
orfielddata_fields
parameters. -
field_statistics
boolean If
true
, the response includes the document count, sum of document frequencies, and sum of total term frequencies. -
offsets
boolean If
true
, the response includes term offsets. -
payloads
boolean If
true
, the response includes term payloads. -
positions
boolean If
true
, the response includes term positions. -
preference
string The node or shard the operation should be performed on. It is random by default.
-
realtime
boolean If true, the request is real-time as opposed to near-real-time.
-
routing
string A custom value used to route operations to a specific shard.
-
term_statistics
boolean If true, the response includes term frequency and document frequency.
-
version
number If
true
, returns the document version as part of a hit. -
version_type
string The version type.
Supported values include:
internal
: Use internal versioning that starts at 1 and increments with each update or delete.external
: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.external_gte
: Only index the document if the specified version is equal or higher than the version of the stored document or if there is no existing document. NOTE: Theexternal_gte
version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.force
: This option is deprecated because it can cause primary and replica shards to diverge.
Values are
internal
,external
,external_gte
, orforce
.
curl \
--request POST 'http://api.example.com/{index}/_mtermvectors' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"docs\": [\n {\n \"_id\": \"2\",\n \"fields\": [\n \"message\"\n ],\n \"term_statistics\": true\n },\n {\n \"_id\": \"1\"\n }\n ]\n}"'
{
"docs": [
{
"_id": "2",
"fields": [
"message"
],
"term_statistics": true
},
{
"_id": "1"
}
]
}
{
"ids": [ "1", "2" ],
"fields": [
"message"
],
"term_statistics": true
}
{
"docs": [
{
"_index": "my-index-000001",
"doc" : {
"message" : "test test test"
}
},
{
"_index": "my-index-000001",
"doc" : {
"message" : "Another test ..."
}
}
]
}
Get term vector information
Get information and statistics about terms in the fields of a particular document.
You can retrieve term vectors for documents stored in the index or for artificial documents passed in the body of the request.
You can specify the fields you are interested in through the fields
parameter or by adding the fields to the request body.
For example:
GET /my-index-000001/_termvectors/1?fields=message
Fields can be specified using wildcards, similar to the multi match query.
Term vectors are real-time by default, not near real-time.
This can be changed by setting realtime
parameter to false
.
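For example (a sketch with an illustrative index name and document ID), the following request reads term vectors from the last refreshed state rather than in real time:
curl \
--request GET 'http://api.example.com/my-index-000001/_termvectors/1?fields=message&realtime=false' \
--header "Authorization: $API_KEY"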
You can request three types of values: term information, term statistics, and field statistics. By default, all term information and field statistics are returned for all fields but term statistics are excluded.
Term information
- term frequency in the field (always returned)
- term positions (
positions: true
) - start and end offsets (
offsets: true
) - term payloads (
payloads: true
), as base64 encoded bytes
If the requested information wasn't stored in the index, it will be computed on the fly if possible. Additionally, term vectors can be computed for documents that don't exist in the index but are provided by the user in the request body.
Start and end offsets assume UTF-16 encoding is being used. If you want to use these offsets in order to get the original text that produced this token, you should make sure that the string you are taking a sub-string of is also encoded using UTF-16.
Behavior
The term and field statistics are not accurate.
Deleted documents are not taken into account.
The information is only retrieved for the shard the requested document resides in.
The term and field statistics are therefore only useful as relative measures whereas the absolute numbers have no meaning in this context.
By default, when requesting term vectors of artificial documents, a shard to get the statistics from is randomly selected.
Use routing
only to hit a particular shard.
Path parameters
-
index
string Required The name of the index that contains the document.
Query parameters
-
fields
string | array[string] A comma-separated list or wildcard expressions of fields to include in the statistics. It is used as the default list unless a specific field list is provided in the
completion_fields
orfielddata_fields
parameters. -
field_statistics
boolean If
true
, the response includes:
- The document count (how many documents contain this field).
- The sum of document frequencies (the sum of document frequencies for all terms in this field).
- The sum of total term frequencies (the sum of total term frequencies of each term in this field).
-
offsets
boolean If
true
, the response includes term offsets. -
payloads
boolean If
true
, the response includes term payloads. -
positions
boolean If
true
, the response includes term positions. -
preference
string The node or shard the operation should be performed on. It is random by default.
-
realtime
boolean If true, the request is real-time as opposed to near-real-time.
-
routing
string A custom value that is used to route operations to a specific shard.
-
term_statistics
boolean If
true
, the response includes:
- The total term frequency (how often a term occurs in all documents).
- The document frequency (the number of documents containing the current term).
By default these values are not returned since term statistics can have a serious performance impact.
-
version
number If
true
, returns the document version as part of a hit. -
version_type
string The version type.
Supported values include:
internal
: Use internal versioning that starts at 1 and increments with each update or delete.external
: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.external_gte
: Only index the document if the specified version is equal or higher than the version of the stored document or if there is no existing document. NOTE: Theexternal_gte
version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.force
: This option is deprecated because it can cause primary and replica shards to diverge.
Values are
internal
,external
,external_gte
, orforce
.
Body
-
doc
object An artificial document (a document not present in the index) for which you want to retrieve term vectors.
-
filter
object -
per_field_analyzer
object Override the default per-field analyzer. This is useful for generating term vectors in any fashion, especially when using artificial documents. When providing an analyzer for a field that already stores term vectors, the term vectors will be regenerated.
-
fields
string | array[string] -
field_statistics
boolean If
true
, the response includes:
- The document count (how many documents contain this field).
- The sum of document frequencies (the sum of document frequencies for all terms in this field).
- The sum of total term frequencies (the sum of total term frequencies of each term in this field).
-
offsets
boolean If
true
, the response includes term offsets. -
payloads
boolean If
true
, the response includes term payloads. -
positions
boolean If
true
, the response includes term positions. -
term_statistics
boolean If
true
, the response includes:
- The total term frequency (how often a term occurs in all documents).
- The document frequency (the number of documents containing the current term).
By default these values are not returned since term statistics can have a serious performance impact.
-
routing
string -
version
number -
version_type
string Values are
internal
,external
,external_gte
, orforce
.
curl \
--request POST 'http://api.example.com/{index}/_termvectors' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"fields\" : [\"text\"],\n \"offsets\" : true,\n \"payloads\" : true,\n \"positions\" : true,\n \"term_statistics\" : true,\n \"field_statistics\" : true\n}"'
{
"fields" : ["text"],
"offsets" : true,
"payloads" : true,
"positions" : true,
"term_statistics" : true,
"field_statistics" : true
}
{
"doc" : {
"fullname" : "John Doe",
"text" : "test test test"
},
"fields": ["fullname"],
"per_field_analyzer" : {
"fullname": "keyword"
}
}
{
"doc": {
"plot": "When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil."
},
"term_statistics": true,
"field_statistics": true,
"positions": false,
"offsets": false,
"filter": {
"max_num_terms": 3,
"min_term_freq": 1,
"min_doc_freq": 1
}
}
{
"fields" : ["text", "some_field_without_term_vectors"],
"offsets" : true,
"positions" : true,
"term_statistics" : true,
"field_statistics" : true
}
{
"doc" : {
"fullname" : "John Doe",
"text" : "test test test"
}
}
{
"_index": "my-index-000001",
"_id": "1",
"_version": 1,
"found": true,
"took": 6,
"term_vectors": {
"text": {
"field_statistics": {
"sum_doc_freq": 4,
"doc_count": 2,
"sum_ttf": 6
},
"terms": {
"test": {
"doc_freq": 2,
"ttf": 4,
"term_freq": 3,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 4,
"payload": "d29yZA=="
},
{
"position": 1,
"start_offset": 5,
"end_offset": 9,
"payload": "d29yZA=="
},
{
"position": 2,
"start_offset": 10,
"end_offset": 14,
"payload": "d29yZA=="
}
]
}
}
}
}
}
{
"_index": "my-index-000001",
"_version": 0,
"found": true,
"took": 6,
"term_vectors": {
"fullname": {
"field_statistics": {
"sum_doc_freq": 2,
"doc_count": 4,
"sum_ttf": 4
},
"terms": {
"John Doe": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 8
}
]
}
}
}
}
}
{
"_index": "imdb",
"_version": 0,
"found": true,
"term_vectors": {
"plot": {
"field_statistics": {
"sum_doc_freq": 3384269,
"doc_count": 176214,
"sum_ttf": 3753460
},
"terms": {
"armored": {
"doc_freq": 27,
"ttf": 27,
"term_freq": 1,
"score": 9.74725
},
"industrialist": {
"doc_freq": 88,
"ttf": 88,
"term_freq": 1,
"score": 8.590818
},
"stark": {
"doc_freq": 44,
"ttf": 47,
"term_freq": 1,
"score": 9.272792
}
}
}
}
}
Index
Index APIs enable you to manage individual indices, index settings, aliases, mappings, and index templates.
Get tokens from text analysis
The analyze API performs analysis on a text string and returns the resulting tokens.
Generating an excessive number of tokens may cause a node to run out of memory.
The index.analyze.max_token_count
setting enables you to limit the number of tokens that can be produced.
If more tokens than this limit are generated, an error occurs.
The _analyze
endpoint without a specified index will always use 10000
as its limit.
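As a minimal sketch (the index name and limit value are illustrative), the limit can be set when an index is created:
curl \
--request PUT 'http://api.example.com/my-index-000001' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "settings": {
    "index.analyze.max_token_count": 5000
  }
}'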
Path parameters
-
index
string Required Index used to derive the analyzer. If specified, the
analyzer
or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.
Query parameters
-
index
string Index used to derive the analyzer. If specified, the
analyzer
or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.
Body
-
analyzer
string The name of the analyzer that should be applied to the provided
text
. This could be a built-in analyzer, or an analyzer that’s been configured in the index. -
attributes
array[string] Array of token attributes used to filter the output of the
explain
parameter. -
char_filter
array Array of character filters used to preprocess characters before the tokenizer.
External documentation -
explain
boolean If
true
, the response includes token attributes and additional details. -
field
string Path to a field or an array of paths. Some APIs support wildcards in the path to select multiple fields.
-
filter
array Array of token filters to apply after the tokenizer.
External documentation -
normalizer
string Normalizer to use to convert text into a single token.
-
text
string | array[string]
curl \
--request POST 'http://api.example.com/{index}/_analyze' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"analyzer\": \"standard\",\n \"text\": \"this is a test\"\n}"'
{
"analyzer": "standard",
"text": "this is a test"
}
{
"analyzer": "standard",
"text": [
"this is a test",
"the second text"
]
}
{
"tokenizer": "keyword",
"filter": [
"lowercase"
],
"char_filter": [
"html_strip"
],
"text": "this is a <b>test</b>"
}
{
"tokenizer": "whitespace",
"filter": [
"lowercase",
{
"type": "stop",
"stopwords": [
"a",
"is",
"this"
]
}
],
"text": "this is a test"
}
{
"field": "obj1.field1",
"text": "this is a test"
}
{
"normalizer": "my_normalizer",
"text": "BaR"
}
{
"tokenizer": "standard",
"filter": [
"snowball"
],
"text": "detailed output",
"explain": true,
"attributes": [
"keyword"
]
}
{
"detail": {
"custom_analyzer": true,
"charfilters": [],
"tokenizer": {
"name": "standard",
"tokens": [
{
"token": "detailed",
"start_offset": 0,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "output",
"start_offset": 9,
"end_offset": 15,
"type": "<ALPHANUM>",
"position": 1
}
]
},
"tokenfilters": [
{
"name": "snowball",
"tokens": [
{
"token": "detail",
"start_offset": 0,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 0,
"keyword": false
},
{
"token": "output",
"start_offset": 9,
"end_offset": 15,
"type": "<ALPHANUM>",
"position": 1,
"keyword": false
}
]
}
]
}
}
Get aliases
Retrieves information for one or more data stream or index aliases.
Path parameters
-
index
string | array[string] Required Comma-separated list of data streams or indices used to limit the request. Supports wildcards (
*
). To target all data streams and indices, omit this parameter or use*
or_all
. -
name
string | array[string] Required Comma-separated list of aliases to retrieve. Supports wildcards (
*
). To retrieve all aliases, omit this parameter or use*
or_all
.
Query parameters
-
allow_no_indices
boolean If
false
, the request returns an error if any wildcard expression, index alias, or_all
value targets only missing or closed indices. This behavior applies even if the request targets other open indices. -
expand_wildcards
string | array[string] Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as
open,hidden
. Valid values are:all
,open
,closed
,hidden
,none
.Supported values include:
all
: Match any data stream or index, including hidden ones.open
: Match open, non-hidden indices. Also matches any non-hidden data stream.closed
: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.hidden
: Match hidden data streams and hidden indices. Must be combined withopen
,closed
, orboth
.none
: Wildcard expressions are not accepted.
-
master_timeout
string Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
curl \
--request GET 'http://api.example.com/{index}/_alias/{name}' \
--header "Authorization: $API_KEY"
Create or update an index template
Added in 7.9.0
Index templates define settings, mappings, and aliases that can be applied automatically to new indices.
Elasticsearch applies templates to new indices based on a wildcard pattern that matches the index name. Index templates are applied during data stream or index creation. For data streams, these settings and mappings are applied when the stream's backing indices are created. Settings and mappings specified in a create index API request override any settings or mappings specified in an index template. Changes to index templates do not affect existing indices, including the existing backing indices of a data stream.
You can use C-style /* */
block comments in index templates.
You can include comments anywhere in the request body, except before the opening curly bracket.
Multiple matching templates
If multiple index templates match the name of a new index or data stream, the template with the highest priority is used.
Multiple templates with overlapping index patterns at the same priority are not allowed; attempting to create a template whose pattern and priority collide with an existing index template returns an error.
Composing aliases, mappings, and settings
When multiple component templates are specified in the composed_of
field for an index template, they are merged in the order specified, meaning that later component templates override earlier component templates.
Any mappings, settings, or aliases from the parent index template are merged in next.
Finally, any configuration on the index request itself is merged.
Mapping definitions are merged recursively, which means that later mapping components can introduce new field mappings and update the mapping configuration.
If a field mapping is already contained in an earlier component, its definition will be completely overwritten by the later one.
This recursive merging strategy applies not only to field mappings, but also root options like dynamic_templates
and meta
.
If an earlier component contains a dynamic_templates
block, then by default new dynamic_templates
entries are appended onto the end.
If an entry already exists with the same key, then it is overwritten by the new definition.
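As a hedged sketch of this merge order (the template, pattern, and field names are illustrative), a component template can be created first and then referenced from composed_of, with the index template's own settings merged in last:
curl \
--request PUT 'http://api.example.com/_component_template/my-mappings' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" }
      }
    }
  }
}'
curl \
--request PUT 'http://api.example.com/_index_template/my-template' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "index_patterns": ["logs-*"],
  "composed_of": ["my-mappings"],
  "template": {
    "settings": { "number_of_shards": 1 }
  }
}'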
Path parameters
-
name
string Required Index or template name
Query parameters
-
create
boolean If
true
, this request cannot replace or update existing index templates. -
master_timeout
string Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
-
cause
string A user-defined reason for creating or updating the index template.
Body
Required
-
index_patterns
string | array[string] -
composed_of
array[string] An ordered list of component template names. Component templates are merged in the order specified, meaning that the last component template specified has the highest precedence.
-
template
object -
data_stream
object -
priority
number Priority to determine index template precedence when a new data stream or index is created. The index template with the highest priority is chosen. If no priority is specified the template is treated as though it is of priority 0 (lowest priority). This number is not automatically generated by Elasticsearch.
-
version
number -
_meta
object -
allow_auto_create
boolean This setting overrides the value of the
action.auto_create_index
cluster setting. If set totrue
in a template, then indices can be automatically created using that template even if auto-creation of indices is disabled viaactions.auto_create_index
. If set tofalse
, then indices or data streams matching the template must always be explicitly created, and may never be automatically created. -
ignore_missing_component_templates
array[string] The ignore_missing_component_templates configuration option can be used when an index template references a component template that might not exist.
-
deprecated
boolean Marks this index template as deprecated. When creating or updating a non-deprecated index template that uses deprecated components, Elasticsearch will emit a deprecation warning.
curl \
--request PUT 'http://api.example.com/_index_template/{name}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"index_patterns\" : [\"template*\"],\n \"priority\" : 1,\n \"template\": {\n \"settings\" : {\n \"number_of_shards\" : 2\n }\n }\n}"'
{
"index_patterns" : ["template*"],
"priority" : 1,
"template": {
"settings" : {
"number_of_shards" : 2
}
}
}
{
"index_patterns": [
"template*"
],
"template": {
"settings": {
"number_of_shards": 1
},
"aliases": {
"alias1": {},
"alias2": {
"filter": {
"term": {
"user.id": "kimchy"
}
},
"routing": "shard-1"
},
"{index}-alias": {}
}
}
}
Refresh an index
A refresh makes recent operations performed on one or more indices available for search. For data streams, the API runs the refresh operation on the stream’s backing indices.
By default, Elasticsearch periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds.
You can change this default interval with the index.refresh_interval
setting.
Refresh requests are synchronous and do not return a response until the refresh operation completes.
Refreshes are resource-intensive. To ensure good cluster performance, it's recommended to wait for Elasticsearch's periodic refresh rather than performing an explicit refresh when possible.
If your application workflow indexes documents and then runs a search to retrieve the indexed document, it's recommended to use the index API's refresh=wait_for
query parameter option.
This option ensures the indexing operation waits for a periodic refresh before running the search.
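For example (a minimal sketch; the index name is illustrative), an index request with refresh=wait_for returns only once a refresh has made the document searchable, so no explicit refresh call is needed:
curl \
--request POST 'http://api.example.com/my-index-000001/_doc?refresh=wait_for' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "message": "GET /search HTTP/1.1 200 1070000"
}'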
Path parameters
-
index
string | array[string] Required Comma-separated list of data streams, indices, and aliases used to limit the request. Supports wildcards (
*
). To target all data streams and indices, omit this parameter or use*
or_all
.
Query parameters
-
allow_no_indices
boolean If
false
, the request returns an error if any wildcard expression, index alias, or_all
value targets only missing or closed indices. This behavior applies even if the request targets other open indices. -
expand_wildcards
string | array[string] Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as
open,hidden
. Valid values are:all
,open
,closed
,hidden
,none
.Supported values include:
all
: Match any data stream or index, including hidden ones.open
: Match open, non-hidden indices. Also matches any non-hidden data stream.closed
: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.hidden
: Match hidden data streams and hidden indices. Must be combined withopen
,closed
, orboth
.none
: Wildcard expressions are not accepted.
curl \
--request GET 'http://api.example.com/{index}/_refresh' \
--header "Authorization: $API_KEY"
Refresh an index
A refresh makes recent operations performed on one or more indices available for search. For data streams, the API runs the refresh operation on the stream’s backing indices.
By default, Elasticsearch periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds.
You can change this default interval with the index.refresh_interval
setting.
Refresh requests are synchronous and do not return a response until the refresh operation completes.
Refreshes are resource-intensive. To ensure good cluster performance, it's recommended to wait for Elasticsearch's periodic refresh rather than performing an explicit refresh when possible.
If your application workflow indexes documents and then runs a search to retrieve the indexed document, it's recommended to use the index API's refresh=wait_for
query parameter option.
This option ensures the indexing operation waits for a periodic refresh before running the search.
Path parameters
-
index
string | array[string] Required Comma-separated list of data streams, indices, and aliases used to limit the request. Supports wildcards (
*
). To target all data streams and indices, omit this parameter or use*
or_all
.
Query parameters
-
allow_no_indices
boolean If
false
, the request returns an error if any wildcard expression, index alias, or_all
value targets only missing or closed indices. This behavior applies even if the request targets other open indices. -
expand_wildcards
string | array[string] Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as
open,hidden
. Valid values are:all
,open
,closed
,hidden
,none
.Supported values include:
all
: Match any data stream or index, including hidden ones.open
: Match open, non-hidden indices. Also matches any non-hidden data stream.closed
: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.hidden
: Match hidden data streams and hidden indices. Must be combined withopen
,closed
, orboth
.none
: Wildcard expressions are not accepted.
curl \
--request POST 'http://api.example.com/{index}/_refresh' \
--header "Authorization: $API_KEY"
Roll over to a new index
Added in 5.0.0
TIP: It is recommended to use the index lifecycle rollover action to automate rollovers.
The rollover API creates a new index for a data stream or index alias. The API behavior depends on the rollover target.
Roll over a data stream
If you roll over a data stream, the API creates a new write index for the stream. The stream's previous write index becomes a regular backing index. A rollover also increments the data stream's generation.
Roll over an index alias with a write index
TIP: Prior to Elasticsearch 7.9, you'd typically use an index alias with a write index to manage time series data. Data streams replace this functionality, require less maintenance, and automatically integrate with data tiers.
If an index alias points to multiple indices, one of the indices must be a write index.
The rollover API creates a new write index for the alias with is_write_index
set to true
.
The API also sets is_write_index
to false
for the previous write index.
Roll over an index alias with one index
If you roll over an index alias that points to only one index, the API creates a new index for the alias and removes the original index from the alias.
NOTE: A rollover creates a new index and is subject to the wait_for_active_shards
setting.
Increment index names for an alias
When you roll over an index alias, you can specify a name for the new index.
If you don't specify a name and the current index ends with -
and a number, such as my-index-000001
or my-index-3
, the new index name increments that number.
For example, if you roll over an alias with a current index of my-index-000001
, the rollover creates a new index named my-index-000002
.
This number is always six characters and zero-padded, regardless of the previous index's name.
If you use an index alias for time series data, you can use date math in the index name to track the rollover date.
For example, you can create an alias that points to an index named <my-index-{now/d}-000001>
.
If you create the index on May 6, 2099, the index's name is my-index-2099.05.06-000001
.
If you roll over the alias on May 7, 2099, the new index's name is my-index-2099.05.07-000002
.
Query parameters
-
dry_run
boolean If
true
, checks whether the current index satisfies the specified conditions but does not perform a rollover. -
master_timeout
string Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
-
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
-
wait_for_active_shards
number | string The number of shard copies that must be active before proceeding with the operation. Set to all or any positive integer up to the total number of shards in the index (
number_of_replicas+1
). -
lazy
boolean If set to true, the rollover action will only mark a data stream to signal that it needs to be rolled over at the next write. Only allowed on data streams.
Body
-
aliases
object Aliases for the target index. Data streams do not support this parameter.
-
conditions
object -
mappings
object -
settings
object Configuration options for the index. Data streams do not support this parameter.
curl \
--request POST 'http://api.example.com/{alias}/_rollover/{new_index}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"conditions\": {\n \"max_age\": \"7d\",\n \"max_docs\": 1000,\n \"max_primary_shard_size\": \"50gb\",\n \"max_primary_shard_docs\": \"2000\"\n }\n}"'
{
"conditions": {
"max_age": "7d",
"max_docs": 1000,
"max_primary_shard_size": "50gb",
"max_primary_shard_docs": "2000"
}
}
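A successful rollover returns a response along these lines (a representative sketch; index names and per-condition results vary):
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "old_index": "my-index-000001",
  "new_index": "my-index-000002",
  "rolled_over": true,
  "dry_run": false,
  "conditions": {
    "[max_age: 7d]": false,
    "[max_docs: 1000]": true,
    "[max_primary_shard_size: 50gb]": false,
    "[max_primary_shard_docs: 2000]": false
  }
}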
Create or update a pipeline
Added in 5.0.0
Changes made using this API take effect immediately.
Path parameters
-
id
string Required ID of the ingest pipeline to create or update.
Query parameters
-
master_timeout
string Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
-
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
-
if_version
number Required version for optimistic concurrency control for pipeline updates
Body
Required
-
_meta
object -
description
string Description of the ingest pipeline.
-
on_failure
array[object] Processors to run immediately after a processor failure. Each processor supports a processor-level
on_failure
value. If a processor without anon_failure
value fails, Elasticsearch uses this pipeline-level parameter as a fallback. The processors in this parameter run sequentially in the order specified. Elasticsearch will not attempt to run the pipeline's remaining processors. -
processors
array[object] Processors used to perform transformations on documents before indexing. Processors run sequentially in the order specified.
-
version
number -
deprecated
boolean Marks this ingest pipeline as deprecated. When a deprecated ingest pipeline is referenced as the default or final pipeline when creating or updating a non-deprecated index template, Elasticsearch will emit a deprecation warning.
curl \
--request PUT 'http://api.example.com/_ingest/pipeline/{id}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"description\" : \"My optional pipeline description\",\n \"processors\" : [\n {\n \"set\" : {\n \"description\" : \"My optional processor description\",\n \"field\": \"my-keyword-field\",\n \"value\": \"foo\"\n }\n }\n ]\n}"'
{
"description" : "My optional pipeline description",
"processors" : [
{
"set" : {
"description" : "My optional processor description",
"field": "my-keyword-field",
"value": "foo"
}
}
]
}
{
"description" : "My optional pipeline description",
"processors" : [
{
"set" : {
"description" : "My optional processor description",
"field": "my-keyword-field",
"value": "foo"
}
}
],
"_meta": {
"reason": "set my-keyword-field to foo",
"serialization": {
"class": "MyPipeline",
"id": 10
}
}
}
Delete pipelines
Added in 5.0.0
Delete one or more ingest pipelines.
Path parameters
-
id
string Required Pipeline ID or wildcard expression of pipeline IDs used to limit the request. To delete all ingest pipelines in a cluster, use a value of
*
.
Query parameters
-
master_timeout
string Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
-
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
curl \
--request DELETE 'http://api.example.com/_ingest/pipeline/{id}' \
--header "Authorization: $API_KEY"
Get pipelines
Added in 5.0.0
Get information about one or more ingest pipelines. This API returns a local reference of the pipeline.
Query parameters
-
master_timeout
string Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
-
summary
boolean Return pipelines without their definitions (default: false)
curl \
--request GET 'http://api.example.com/_ingest/pipeline' \
--header "Authorization: $API_KEY"
{
"my-pipeline-id" : {
"description" : "describe pipeline",
"version" : 123,
"processors" : [
{
"set" : {
"field" : "foo",
"value" : "bar"
}
}
]
}
}
Run a grok processor
Added in 6.1.0
Extract structured fields out of a single text field within a document. You must choose which field to extract matched fields from, as well as the grok pattern you expect will match. A grok pattern is like a regular expression that supports aliased expressions that can be reused.
curl \
--request GET 'http://api.example.com/_ingest/processor/grok' \
--header "Authorization: $API_KEY"
Delete an async search
Added in 7.7.0
If the asynchronous search is still running, it is cancelled.
Otherwise, the saved search results are deleted.
If the Elasticsearch security features are enabled, the deletion of a specific async search is restricted to: the authenticated user that submitted the original search request; users that have the cancel_task
cluster privilege.
Path parameters
-
id
string Required A unique identifier for the async search.
curl \
--request DELETE 'http://api.example.com/_async_search/{id}' \
--header "Authorization: $API_KEY"
Run an async search
Added in 7.7.0
When the primary sort of the results is an indexed field, shards get sorted based on the minimum and maximum values that they hold for that field. Partial results become available following the sort criteria that was requested.
Warning: Asynchronous search does not support scroll or search requests that include only the suggest section.
By default, Elasticsearch does not allow you to store an async search response larger than 10MB, and an attempt to do so results in an error.
The maximum allowed size for a stored async search response can be set by changing the search.max_async_search_response_size
cluster level setting.
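For example (a sketch; the size value is illustrative), the limit can be raised with the cluster settings API:
curl \
--request PUT 'http://api.example.com/_cluster/settings' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "persistent": {
    "search.max_async_search_response_size": "20mb"
  }
}'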
Path parameters
-
index
string | array[string] Required A comma-separated list of index names to search; use
_all
or empty string to perform the operation on all indices
Query parameters
-
wait_for_completion_timeout
string Blocks and waits until the search is completed up to a certain timeout. When the async search completes within the timeout, the response won’t include the ID as the results are not stored in the cluster.
-
keep_alive
string Specifies how long the async search needs to be available. Ongoing async searches and any saved search results are deleted after this period.
-
keep_on_completion
boolean If
true
, results are stored for later retrieval when the search completes within thewait_for_completion_timeout
. -
allow_no_indices
boolean Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes
_all
string or when no indices have been specified) -
allow_partial_search_results
boolean Indicate if an error should be returned if there is a partial search failure or timeout
-
analyzer
string The analyzer to use for the query string
-
analyze_wildcard
boolean Specify whether wildcard and prefix queries should be analyzed (default: false)
-
batched_reduce_size
number Affects how often partial results become available, which happens whenever shard results are reduced. A partial reduction is performed every time the coordinating node has received a certain number of new shard responses (5 by default).
-
ccs_minimize_roundtrips
boolean The default value is the only supported value.
-
default_operator
string The default operator for query string query (AND or OR)
Values are
and
,AND
,or
, orOR
. -
df
string The field to use as default where no field prefix is given in the query string
-
docvalue_fields
string | array[string] A comma-separated list of fields to return as the docvalue representation of a field for each hit
-
expand_wildcards
string | array[string] Whether to expand wildcard expression to concrete indices that are open, closed or both.
Supported values include:
all
: Match any data stream or index, including hidden ones.open
: Match open, non-hidden indices. Also matches any non-hidden data stream.closed
: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.hidden
: Match hidden data streams and hidden indices. Must be combined withopen
,closed
, orboth
.none
: Wildcard expressions are not accepted.
-
explain
boolean Specify whether to return detailed information about score computation as part of a hit
-
ignore_throttled
boolean Whether specified concrete, expanded or aliased indices should be ignored when throttled
-
lenient
boolean Specify whether format-based query failures (such as providing text to a numeric field) should be ignored
-
max_concurrent_shard_requests
number The number of concurrent shard requests per node that this search runs concurrently. This value should be used to limit the impact of the search on the cluster in order to limit the number of concurrent shard requests.
-
preference
string Specify the node or shard the operation should be performed on (default: random)
-
request_cache
boolean Specify if request cache should be used for this request or not, defaults to true
-
routing
string A comma-separated list of specific routing values
-
search_type
string Search operation type
Supported values include:
query_then_fetch
: Documents are scored using local term and document frequencies for the shard. This is usually faster but less accurate.dfs_query_then_fetch
: Documents are scored using global term and document frequencies across all shards. This is usually slower but more accurate.
Values are
query_then_fetch
ordfs_query_then_fetch
. -
stats
array[string] Specific 'tag' of the request for logging and statistical purposes
-
stored_fields
string | array[string] A comma-separated list of stored fields to return as part of a hit
-
suggest_field
string Specifies which field to use for suggestions.
-
suggest_mode
string Specify suggest mode
Supported values include:
missing
: Only generate suggestions for terms that are not in the shard.popular
: Only suggest terms that occur in more docs on the shard than the original term.always
: Suggest any matching suggestions based on terms in the suggest text.
Values are
missing
,popular
, oralways
. -
suggest_size
number How many suggestions to return in response
-
suggest_text
string The source text for which the suggestions should be returned.
-
terminate_after
number The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early.
-
timeout
string Explicit operation timeout
-
track_total_hits
boolean | number Indicate if the number of documents that match the query should be tracked. A number can also be specified, to accurately track the total hit count up to the number.
-
track_scores
boolean Whether to calculate and return scores even if they are not used for sorting
-
typed_keys
boolean Specify whether aggregation and suggester names should be prefixed by their respective types in the response
-
rest_total_hits_as_int
boolean Indicates whether hits.total should be rendered as an integer or an object in the rest search response
-
version
boolean Specify whether to return document version as part of a hit
-
_source
boolean | string | array[string] True or false to return the _source field or not, or a list of fields to return
-
_source_excludes
string | array[string] A list of fields to exclude from the returned _source field
-
_source_includes
string | array[string] A list of fields to extract and return from the _source field
-
seq_no_primary_term
boolean Specify whether to return sequence number and primary term of the last modification of each hit
-
q
string Query in the Lucene query string syntax
-
size
number Number of hits to return (default: 10)
-
from
number Starting offset (default: 0)
-
sort
string | array[string] A comma-separated list of field:direction pairs
Body
-
aggregations
object -
collapse
object External documentation -
explain
boolean If true, returns detailed information about score computation as part of a hit.
-
ext
object Configuration of search extensions defined by Elasticsearch plugins.
-
from
number Starting document offset. By default, you cannot page through more than 10,000 hits using the from and size parameters. To page through more hits, use the search_after parameter.
-
highlight
object -
track_total_hits
boolean | number Number of hits matching the query to count accurately. If true, the exact number of hits is returned at the cost of some performance. If false, the response does not include the total number of hits matching the query. Defaults to 10,000 hits.
-
indices_boost
array[object] Boosts the _score of documents from specified indices.
-
docvalue_fields
array[object] Array of wildcard (*) patterns. The request returns doc values for field names matching these patterns in the hits.fields property of the response.
knn
object | array[object] Defines the approximate kNN search to run.
-
min_score
number Minimum _score for matching documents. Documents with a lower _score are not included in search results and results collected by aggregations.
-
post_filter
object An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
External documentation -
profile
boolean -
query
object An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
External documentation rescore
object | array[object] -
script_fields
object Retrieve a script evaluation (based on different fields) for each hit.
-
search_after
array[number | string | boolean | null] A field value.
-
size
number The number of hits to return. By default, you cannot page through more than 10,000 hits using the from and size parameters. To page through more hits, use the search_after parameter.
-
slice
object _source
boolean | object Defines how to fetch a source. Fetching can be disabled entirely, or the source can be filtered.
-
fields
array[object] Array of wildcard (*) patterns. The request returns values for field names matching these patterns in the hits.fields property of the response.
-
suggest
object -
terminate_after
number Maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting. Defaults to 0, which does not terminate query execution early.
-
timeout
string Specifies the period of time to wait for a response from each shard. If no response is received before the timeout expires, the request fails and returns an error. Defaults to no timeout.
-
track_scores
boolean If true, calculate and return document scores, even if the scores are not used for sorting.
-
version
boolean If true, returns document version as part of a hit.
-
seq_no_primary_term
boolean If true, returns sequence number and primary term of the last modification of each hit. See Optimistic concurrency control.
-
stored_fields
string | array[string] -
pit
object -
runtime_mappings
object -
stats
array[string] Stats groups to associate with the search. Each group maintains a statistics aggregation for its associated searches. You can retrieve these stats using the indices stats API.
curl \
--request POST 'http://api.example.com/{index}/_async_search' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"sort\": [\n { \"date\": { \"order\": \"asc\" } }\n ],\n \"aggs\": {\n \"sale_date\": {\n \"date_histogram\": {\n \"field\": \"date\",\n \"calendar_interval\": \"1d\"\n }\n }\n }\n}"'
{
"sort": [
{ "date": { "order": "asc" } }
],
"aggs": {
"sale_date": {
"date_histogram": {
"field": "date",
"calendar_interval": "1d"
}
}
}
}
{
"id" : "FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=",
"is_partial" : true,
"is_running" : true,
"start_time_in_millis" : 1583945890986,
"expiration_time_in_millis" : 1584377890986,
"response" : {
"took" : 1122,
"timed_out" : false,
"num_reduce_phases" : 0,
"_shards" : {
"total" : 562,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 157483,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
}
}
}
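The id in this response can be used to poll for the final results with the get async search API (a sketch; that endpoint is documented elsewhere, and the ID below is the one returned in the example response above):
curl \
--request GET 'http://api.example.com/_async_search/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=' \
--header "Authorization: $API_KEY"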
Run a scrolling search
IMPORTANT: The scroll API is no longer recommended for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after
parameter with a point in time (PIT).
The scroll API gets large sets of results from a single scrolling search request.
To get the necessary scroll ID, submit a search API request that includes an argument for the scroll
query parameter.
The scroll
parameter indicates how long Elasticsearch should retain the search context for the request.
The search response returns a scroll ID in the _scroll_id
response body parameter.
You can then use the scroll ID with the scroll API to retrieve the next batch of results for the request.
If the Elasticsearch security features are enabled, access to the results of a specific scroll ID is restricted to the user or API key that submitted the search.
You can also use the scroll API to specify a new scroll parameter that extends or shortens the retention period for the search context.
IMPORTANT: Results from a scrolling search reflect the state of the index at the time of the initial search request. Subsequent indexing or document changes only affect later search and scroll requests.
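For example, the initial search request might look like this (a sketch; the index name my-index-000001, the 1m keep-alive, and the match_all query are assumptions):
curl \
--request POST 'http://api.example.com/my-index-000001/_search?scroll=1m' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"size":100,"query":{"match_all":{}}}'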
Query parameters
-
scroll
string The period to retain the search context for scrolling.
-
scroll_id
string Deprecated The scroll ID for scrolled search
-
rest_total_hits_as_int
boolean If true, the API response’s hits.total property is returned as an integer. If false, the API response’s hits.total property is returned as an object.
curl \
--request GET 'http://api.example.com/_search/scroll' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"scroll_id\" : \"DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==\"\n}"'
{
"scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
Run a scrolling search
IMPORTANT: The scroll API is no longer recommended for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after
parameter with a point in time (PIT).
The scroll API gets large sets of results from a single scrolling search request.
To get the necessary scroll ID, submit a search API request that includes an argument for the scroll
query parameter.
The scroll
parameter indicates how long Elasticsearch should retain the search context for the request.
The search response returns a scroll ID in the _scroll_id
response body parameter.
You can then use the scroll ID with the scroll API to retrieve the next batch of results for the request.
If the Elasticsearch security features are enabled, access to the results of a specific scroll ID is restricted to the user or API key that submitted the search.
You can also use the scroll API to specify a new scroll parameter that extends or shortens the retention period for the search context.
IMPORTANT: Results from a scrolling search reflect the state of the index at the time of the initial search request. Subsequent indexing or document changes only affect later search and scroll requests.
Path parameters
-
scroll_id
string Required Deprecated The scroll ID
Query parameters
-
scroll
string The period to retain the search context for scrolling.
-
scroll_id
string Deprecated The scroll ID for scrolled search
-
rest_total_hits_as_int
boolean If true, the API response’s hits.total property is returned as an integer. If false, the API response’s hits.total property is returned as an object.
curl \
--request GET 'http://api.example.com/_search/scroll/{scroll_id}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"scroll_id\" : \"DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==\"\n}"'
{
"scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
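When you finish scrolling, the search context can be released before its retention period ends with the clear scroll API (a sketch; that endpoint is not documented in this excerpt, and the scroll ID below is the placeholder used above):
curl \
--request DELETE 'http://api.example.com/_search/scroll' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"scroll_id":"DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="}'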
Run multiple searches
Added in 1.3.0
The format of the request is similar to the bulk API format and makes use of the newline delimited JSON (NDJSON) format. The structure is as follows:
header\n
body\n
header\n
body\n
This structure is specifically optimized to reduce parsing if a specific search ends up redirected to another node.
IMPORTANT: The final line of data must end with a newline character \n
.
Each newline character may be preceded by a carriage return \r
.
When sending requests to this endpoint the Content-Type
header should be set to application/x-ndjson
.
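For example, a two-search request body in NDJSON might look like this (a sketch; the index names and queries are assumptions, and the trailing newline is required):
curl \
--request POST 'http://api.example.com/_msearch' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/x-ndjson" \
--data '{"index":"test1"}
{"query":{"match_all":{}}}
{"index":"test2"}
{"query":{"match":{"title":"elasticsearch"}}}
'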
Path parameters
-
index
string | array[string] Required Comma-separated list of data streams, indices, and index aliases to search.
Query parameters
-
allow_no_indices
boolean If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.
-
ccs_minimize_roundtrips
boolean If true, network roundtrips between the coordinating node and remote clusters are minimized for cross-cluster search requests.
-
expand_wildcards
string | array[string] Type of index that wildcard expressions can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams.
Supported values include:
all
: Match any data stream or index, including hidden ones.open
: Match open, non-hidden indices. Also matches any non-hidden data stream.closed
: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.hidden
: Match hidden data streams and hidden indices. Must be combined withopen
,closed
, orboth
.none
: Wildcard expressions are not accepted.
-
ignore_throttled
boolean If true, concrete, expanded or aliased indices are ignored when frozen.
-
include_named_queries_score
boolean Indicates whether hit.matched_queries should be rendered as a map that includes the name of the matched query associated with its score (true) or as an array containing the names of the matched queries (false). This functionality reruns each named query on every hit in a search response. Typically, this adds a small overhead to a request. However, using computationally expensive named queries on a large number of hits may add significant overhead.
-
max_concurrent_searches
number Maximum number of concurrent searches the multi search API can execute. Defaults to
max(1, (# of data nodes * min(search thread pool size, 10)))
. -
max_concurrent_shard_requests
number Maximum number of concurrent shard requests that each sub-search request executes per node.
-
pre_filter_shard_size
number Defines a threshold that enforces a pre-filter roundtrip to prefilter search shards based on query rewriting if the number of shards the search request expands to exceeds the threshold. This filter roundtrip can limit the number of shards significantly if, for instance, a shard cannot match any documents based on its rewrite method; that is, if date filters are mandatory to match but the shard bounds and the query are disjoint.
-
rest_total_hits_as_int
boolean If true, hits.total are returned as an integer in the response. Defaults to false, which returns an object.
-
routing
string Custom routing value used to route search operations to a specific shard.
-
search_type
string Indicates whether global term and document frequencies should be used when scoring returned documents.
Supported values include:
query_then_fetch
: Documents are scored using local term and document frequencies for the shard. This is usually faster but less accurate.dfs_query_then_fetch
: Documents are scored using global term and document frequencies across all shards. This is usually slower but more accurate.
Values are
query_then_fetch
ordfs_query_then_fetch
. -
typed_keys
boolean Specifies whether aggregation and suggester names should be prefixed by their respective types in the response.
Body
object
Required
-
allow_no_indices
boolean -
expand_wildcards
string | array[string] -
index
string | array[string] -
preference
string -
request_cache
boolean -
routing
string -
search_type
string Values are
query_then_fetch
ordfs_query_then_fetch
. -
ccs_minimize_roundtrips
boolean -
allow_partial_search_results
boolean -
ignore_throttled
boolean
curl \
--request POST 'http://api.example.com/{index}/_msearch' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '[{"allow_no_indices":true,"expand_wildcards":"string","ignore_unavailable":true,"index":"string","preference":"string","request_cache":true,"routing":"string","search_type":"query_then_fetch","ccs_minimize_roundtrips":true,"allow_partial_search_results":true,"ignore_throttled":true}]'
Evaluate ranked search results
Added in 6.2.0
Evaluate the quality of ranked search results over a set of typical search queries.
Path parameters
-
index
string | array[string] Required A comma-separated list of data streams, indices, and index aliases used to limit the request. Wildcard (
*
) expressions are supported. To target all data streams and indices in a cluster, omit this parameter or use_all
or*
.
Query parameters
-
allow_no_indices
boolean If
false
, the request returns an error if any wildcard expression, index alias, or_all
value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targetingfoo*,bar*
returns an error if an index starts withfoo
but no index starts withbar
. -
expand_wildcards
string | array[string] Whether to expand wildcard expressions to concrete indices that are open, closed, or both.
Supported values include:
all
: Match any data stream or index, including hidden ones.open
: Match open, non-hidden indices. Also matches any non-hidden data stream.closed
: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.hidden
: Match hidden data streams and hidden indices. Must be combined withopen
,closed
, orboth
.none
: Wildcard expressions are not accepted.
-
search_type
string Search operation type
curl \
--request GET 'http://api.example.com/{index}/_rank_eval' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"requests":[{"id":"string","request":{"query":{},"size":42.0},"ratings":[{"_id":"string","_index":"string","rating":42.0}],"template_id":"string","params":{"additionalProperty1":{},"additionalProperty2":{}}}],"metric":{"precision":{"k":42.0,"relevant_rating_threshold":42.0,"ignore_unlabeled":true},"recall":{"k":42.0,"relevant_rating_threshold":42.0},"mean_reciprocal_rank":{"k":42.0,"relevant_rating_threshold":42.0},"dcg":{"k":42.0,"normalize":true},"expected_reciprocal_rank":{"k":42.0,"maximum_relevance":42.0}}}'
curl \
--request POST 'http://api.example.com/_render/template' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"id\": \"my-search-template\",\n \"params\": {\n \"query_string\": \"hello world\",\n \"from\": 20,\n \"size\": 10\n }\n}"'
{
"id": "my-search-template",
"params": {
"query_string": "hello world",
"from": 20,
"size": 10
}
}
Run a search
Get search hits that match the query defined in the request.
You can provide search queries using the q
query string parameter or the request body.
If both are specified, only the query parameter is used.
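For example, a quick query string search might look like this (a sketch; the user.id:kimchy query mirrors the request body example later in this section):
curl \
--request GET 'http://api.example.com/_search?q=user.id:kimchy' \
--header "Authorization: $API_KEY"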
If the Elasticsearch security features are enabled, you must have the read index privilege for the target data stream, index, or alias. For cross-cluster search, refer to the documentation about configuring CCS privileges.
To search a point in time (PIT) for an alias, you must have the read
index privilege for the alias's data streams or indices.
Search slicing
When paging through a large number of documents, it can be helpful to split the search into multiple slices to consume them independently with the slice
and pit
properties.
By default, the splitting is done first on the shards and then locally on each shard.
The local splitting partitions the shard into contiguous ranges based on Lucene document IDs.
For instance, if the number of shards is equal to 2 and you request 4 slices, slices 0 and 2 are assigned to the first shard and slices 1 and 3 are assigned to the second shard.
IMPORTANT: The same point-in-time ID should be used for all slices. If different PIT IDs are used, slices can overlap and miss documents. This situation can occur because the splitting criterion is based on Lucene document IDs, which are not stable across changes to the index.
Query parameters
-
allow_no_indices
boolean If
false
, the request returns an error if any wildcard expression, index alias, or_all
value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targetingfoo*,bar*
returns an error if an index starts withfoo
but no index starts withbar
. -
allow_partial_search_results
boolean If
true
and there are shard request timeouts or shard failures, the request returns partial results. Iffalse
, it returns an error with no partial results.To override the default behavior, you can set the
search.default_allow_partial_results
cluster setting tofalse
. -
analyzer
string The analyzer to use for the query string. This parameter can be used only when the
q
query string parameter is specified. -
analyze_wildcard
boolean If
true
, wildcard and prefix queries are analyzed. This parameter can be used only when theq
query string parameter is specified. -
batched_reduce_size
number The number of shard results that should be reduced at once on the coordinating node. If the potential number of shards in the request can be large, this value should be used as a protection mechanism to reduce the memory overhead per search request.
-
ccs_minimize_roundtrips
boolean If
true
, network round-trips between the coordinating node and the remote clusters are minimized when running cross-cluster search (CCS) requests. -
default_operator
string The default operator for the query string query:
AND
orOR
. This parameter can be used only when theq
query string parameter is specified.Values are
and
,AND
,or
, orOR
. -
df
string The field to use as a default when no field prefix is given in the query string. This parameter can be used only when the
q
query string parameter is specified. -
docvalue_fields
string | array[string] A comma-separated list of fields to return as the docvalue representation of a field for each hit.
-
expand_wildcards
string | array[string] The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. It supports comma-separated values such as
open,hidden
.Supported values include:
all
: Match any data stream or index, including hidden ones.open
: Match open, non-hidden indices. Also matches any non-hidden data stream.closed
: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.hidden
: Match hidden data streams and hidden indices. Must be combined withopen
,closed
, orboth
.none
: Wildcard expressions are not accepted.
-
explain
boolean If
true
, the request returns detailed information about score computation as part of a hit. -
ignore_throttled
boolean Deprecated If
true
, concrete, expanded or aliased indices will be ignored when frozen. -
include_named_queries_score
boolean If
true
, the response includes the score contribution from any named queries.This functionality reruns each named query on every hit in a search response. Typically, this adds a small overhead to a request. However, using computationally expensive named queries on a large number of hits may add significant overhead.
-
lenient
boolean If
true
, format-based query failures (such as providing text to a numeric field) in the query string will be ignored. This parameter can be used only when theq
query string parameter is specified. -
max_concurrent_shard_requests
number The number of concurrent shard requests per node that the search runs. Use this value to limit the impact of the search on the cluster and the number of concurrent shard requests.
-
preference
string The nodes and shards used for the search. By default, Elasticsearch selects from eligible nodes and shards using adaptive replica selection, accounting for allocation awareness. Valid values are:
_only_local
to run the search only on shards on the local node._local
to, if possible, run the search on shards on the local node, or if not, select shards using the default method._only_nodes:<node-id>,<node-id>
to run the search on only the specified node IDs. If suitable shards exist on more than one selected node, use shards on those nodes using the default method. If none of the specified nodes are available, select shards from any available node using the default method._prefer_nodes:<node-id>,<node-id>
to run the search on the specified node IDs if possible. If not, select shards using the default method._shards:<shard>,<shard>
to run the search only on the specified shards. You can combine this value with otherpreference
values. However, the_shards
value must come first. For example:_shards:2,3|_local
.<custom-string>
(any string that does not start with_
) to route searches with the same<custom-string>
to the same shards in the same order.
-
pre_filter_shard_size
number A threshold that enforces a pre-filter roundtrip to prefilter search shards based on query rewriting if the number of shards the search request expands to exceeds the threshold. This filter roundtrip can limit the number of shards significantly if, for instance, a shard cannot match any documents based on its rewrite method (if date filters are mandatory to match but the shard bounds and the query are disjoint). When unspecified, the pre-filter phase is executed if any of these conditions is met:
- The request targets more than 128 shards.
- The request targets one or more read-only indices.
- The primary sort of the query targets an indexed field.
-
request_cache
boolean If
true
, the caching of search results is enabled for requests wheresize
is0
. It defaults to index level settings. -
routing
string A custom value that is used to route operations to a specific shard.
-
scroll
string The period to retain the search context for scrolling. By default, this value cannot exceed
1d
(24 hours). You can change this limit by using thesearch.max_keep_alive
cluster-level setting. -
search_type
string Indicates how distributed term frequencies are calculated for relevance scoring.
Supported values include:
query_then_fetch
: Documents are scored using local term and document frequencies for the shard. This is usually faster but less accurate.dfs_query_then_fetch
: Documents are scored using global term and document frequencies across all shards. This is usually slower but more accurate.
Values are
query_then_fetch
ordfs_query_then_fetch
. -
stats
array[string] Specific
tag
of the request for logging and statistical purposes. -
stored_fields
string | array[string] A comma-separated list of stored fields to return as part of a hit. If no fields are specified, no stored fields are included in the response. If this field is specified, the
_source
parameter defaults tofalse
. You can pass_source: true
to return both source fields and stored fields in the search response. -
suggest_field
string The field to use for suggestions.
-
suggest_mode
string The suggest mode. This parameter can be used only when the
suggest_field
andsuggest_text
query string parameters are specified.Supported values include:
missing
: Only generate suggestions for terms that are not in the shard.popular
: Only suggest terms that occur in more docs on the shard than the original term.always
: Suggest any matching suggestions based on terms in the suggest text.
Values are
missing
,popular
, oralways
. -
suggest_size
number The number of suggestions to return. This parameter can be used only when the
suggest_field
andsuggest_text
query string parameters are specified. -
suggest_text
string The source text for which the suggestions should be returned. This parameter can be used only when the
suggest_field
andsuggest_text
query string parameters are specified. -
terminate_after
number The maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting.
IMPORTANT: Use with caution. Elasticsearch applies this parameter to each shard handling the request. When possible, let Elasticsearch perform early termination automatically. Avoid specifying this parameter for requests that target data streams with backing indices across multiple data tiers. If set to
0
(default), the query does not terminate early. -
timeout
string The period of time to wait for a response from each shard. If no response is received before the timeout expires, the request fails and returns an error. It defaults to no timeout.
-
track_total_hits
boolean | number The number of hits matching the query to count accurately. If
true
, the exact number of hits is returned at the cost of some performance. Iffalse
, the response does not include the total number of hits matching the query. -
track_scores
boolean If
true
, the request calculates and returns document scores, even if the scores are not used for sorting. -
typed_keys
boolean If
true
, aggregation and suggester names are prefixed by their respective types in the response. -
rest_total_hits_as_int
boolean Indicates whether
hits.total
should be rendered as an integer or an object in the rest search response. -
version
boolean If
true
, the request returns the document version as part of a hit. -
_source
boolean | string | array[string] The source fields that are returned for matching documents. These fields are returned in the
hits._source
property of the search response. Valid values are:true
to return the entire document source.false
to not return the document source.<string>
to return the source fields that are specified as a comma-separated list that supports wildcard (*
) patterns.
-
_source_excludes
string | array[string] A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in
_source_includes
query parameter. If the_source
parameter isfalse
, this parameter is ignored. -
_source_includes
string | array[string] A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the
_source_excludes
query parameter. If the_source
parameter isfalse
, this parameter is ignored. -
seq_no_primary_term
boolean If
true
, the request returns the sequence number and primary term of the last modification of each hit. -
q
string A query in the Lucene query string syntax. Query parameter searches do not support the full Elasticsearch Query DSL but are handy for testing.
IMPORTANT: This parameter overrides the query parameter in the request body. If both parameters are specified, documents matching the query request body parameter are not returned.
-
size
number The number of hits to return. By default, you cannot page through more than 10,000 hits using the
from
andsize
parameters. To page through more hits, use thesearch_after
parameter. -
from
number The starting document offset, which must be non-negative. By default, you cannot page through more than 10,000 hits using the
from
andsize
parameters. To page through more hits, use thesearch_after
parameter. -
sort
string | array[string] A comma-separated list of
<field>:<direction>
pairs.
Body
-
aggregations
object Defines the aggregations that are run as part of the search request.
External documentation -
collapse
object External documentation -
explain
boolean If
true
, the request returns detailed information about score computation as part of a hit. -
ext
object Configuration of search extensions defined by Elasticsearch plugins.
-
from
number The starting document offset, which must be non-negative. By default, you cannot page through more than 10,000 hits using the
from
andsize
parameters. To page through more hits, use thesearch_after
parameter. -
highlight
object -
track_total_hits
boolean | number Number of hits matching the query to count accurately. If true, the exact number of hits is returned at the cost of some performance. If false, the response does not include the total number of hits matching the query. Defaults to 10,000 hits.
-
indices_boost
array[object] Boost the
_score
of documents from specified indices. The boost value is the factor by which scores are multiplied. A boost value greater than1.0
increases the score. A boost value between0
and1.0
decreases the score.External documentation -
docvalue_fields
array[object] An array of wildcard (
*
) field patterns. The request returns doc values for field names matching these patterns in thehits.fields
property of the response. External documentation
knn
object | array[object] The approximate kNN search to run.
-
min_score
number The minimum
_score
for matching documents. Documents with a lower_score
are not included in search results and results collected by aggregations. -
post_filter
object An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
External documentation -
profile
boolean Set to
true
to return detailed timing information about the execution of individual components in a search request. NOTE: This is a debugging tool and adds significant overhead to search execution. -
query
object An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
External documentation
rescore
object | array[object] Can be used to improve precision by reordering just the top (for example 100 - 500) documents returned by the
query
andpost_filter
phases.-
retriever
object -
script_fields
object Retrieve a script evaluation (based on different fields) for each hit.
-
search_after
array[number | string | boolean | null] A field value.
-
size
number The number of hits to return, which must not be negative. By default, you cannot page through more than 10,000 hits using the
from
andsize
parameters. To page through more hits, use thesearch_after
property. -
slice
object -
_source
boolean | object Defines how to fetch a source. Fetching can be disabled entirely, or the source can be filtered.
-
fields
array[object] An array of wildcard (
*
) field patterns. The request returns values for field names matching these patterns in thehits.fields
property of the response. -
suggest
object -
terminate_after
number The maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting.
IMPORTANT: Use with caution. Elasticsearch applies this property to each shard handling the request. When possible, let Elasticsearch perform early termination automatically. Avoid specifying this property for requests that target data streams with backing indices across multiple data tiers.
If set to
0
(default), the query does not terminate early. -
timeout
string The period of time to wait for a response from each shard. If no response is received before the timeout expires, the request fails and returns an error. Defaults to no timeout.
-
track_scores
boolean If
true
, calculate and return document scores, even if the scores are not used for sorting. -
version
boolean If
true
, the request returns the document version as part of a hit. -
seq_no_primary_term
boolean If
true
, the request returns sequence number and primary term of the last modification of each hit.External documentation -
stored_fields
string | array[string] -
pit
object -
runtime_mappings
object -
stats
array[string] The stats groups to associate with the search. Each group maintains a statistics aggregation for its associated searches. You can retrieve these stats using the indices stats API.
curl \
--request POST 'http://api.example.com/_search' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"query\": {\n \"term\": {\n \"user.id\": \"kimchy\"\n }\n }\n}"'
{
"query": {
"term": {
"user.id": "kimchy"
}
}
}
{
"size": 100,
"query": {
"match" : {
"title" : "elasticsearch"
}
},
"pit": {
"id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
"keep_alive": "1m"
}
}
{
"slice": {
"id": 0,
"max": 2
},
"query": {
"match": {
"message": "foo"
}
},
"pit": {
"id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="
}
}
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 20,
"relation": "eq"
},
"max_score": 1.3862942,
"hits": [
{
"_index": "my-index-000001",
"_id": "0",
"_score": 1.3862942,
"_source": {
"@timestamp": "2099-11-15T14:12:12",
"http": {
"request": {
"method": "get"
},
"response": {
"status_code": 200,
"bytes": 1070000
},
"version": "1.1"
},
"source": {
"ip": "127.0.0.1"
},
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
}
]
}
}
Run a search
Get search hits that match the query defined in the request.
You can provide search queries using the q
query string parameter or the request body.
If both are specified, only the query parameter is used.
If the Elasticsearch security features are enabled, you must have the read index privilege for the target data stream, index, or alias. For cross-cluster search, refer to the documentation about configuring CCS privileges.
To search a point in time (PIT) for an alias, you must have the read
index privilege for the alias's data streams or indices.
Search slicing
When paging through a large number of documents, it can be helpful to split the search into multiple slices to consume them independently with the slice
and pit
properties.
By default, the splitting is done first on the shards and then locally on each shard.
The local splitting partitions the shard into contiguous ranges based on Lucene document IDs.
For instance, if the number of shards is equal to 2 and you request 4 slices, slices 0 and 2 are assigned to the first shard and slices 1 and 3 are assigned to the second shard.
IMPORTANT: The same point-in-time ID should be used for all slices. If different PIT IDs are used, slices can overlap and miss documents. This situation can occur because the splitting criterion is based on Lucene document IDs, which are not stable across changes to the index.
Path parameters
-
index
string | array[string] Required A comma-separated list of data streams, indices, and aliases to search. It supports wildcards (
*
). To search all data streams and indices, omit this parameter or use*
or_all
.
Query parameters
-
allow_no_indices
boolean If
false
, the request returns an error if any wildcard expression, index alias, or_all
value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targetingfoo*,bar*
returns an error if an index starts withfoo
but no index starts withbar
. -
allow_partial_search_results
boolean If
true
and there are shard request timeouts or shard failures, the request returns partial results. Iffalse
, it returns an error with no partial results.To override the default behavior, you can set the
search.default_allow_partial_results
cluster setting tofalse
. -
analyzer
string The analyzer to use for the query string. This parameter can be used only when the
q
query string parameter is specified. -
analyze_wildcard
boolean If
true
, wildcard and prefix queries are analyzed. This parameter can be used only when theq
query string parameter is specified. -
batched_reduce_size
number The number of shard results that should be reduced at once on the coordinating node. If the potential number of shards in the request can be large, this value should be used as a protection mechanism to reduce the memory overhead per search request.
-
ccs_minimize_roundtrips
boolean If
true
, network round-trips between the coordinating node and the remote clusters are minimized when running cross-cluster search (CCS) requests. -
default_operator
string The default operator for the query string query:
AND
orOR
. This parameter can be used only when theq
query string parameter is specified.Values are
and
,AND
,or
, orOR
. -
df
string The field to use as a default when no field prefix is given in the query string. This parameter can be used only when the
q
query string parameter is specified. -
docvalue_fields
string | array[string] A comma-separated list of fields to return as the docvalue representation of a field for each hit.
-
expand_wildcards
string | array[string] The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. It supports comma-separated values such as
open,hidden
.Supported values include:
all
: Match any data stream or index, including hidden ones.open
: Match open, non-hidden indices. Also matches any non-hidden data stream.closed
: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.hidden
: Match hidden data streams and hidden indices. Must be combined withopen
,closed
, orboth
.none
: Wildcard expressions are not accepted.
-
explain
boolean If
true
, the request returns detailed information about score computation as part of a hit. -
ignore_throttled
boolean Deprecated If
true
, concrete, expanded or aliased indices will be ignored when frozen. -
include_named_queries_score
boolean If
true
, the response includes the score contribution from any named queries.This functionality reruns each named query on every hit in a search response. Typically, this adds a small overhead to a request. However, using computationally expensive named queries on a large number of hits may add significant overhead.
-
lenient
boolean If
true
, format-based query failures (such as providing text to a numeric field) in the query string will be ignored. This parameter can be used only when theq
query string parameter is specified. -
max_concurrent_shard_requests
number The number of concurrent shard requests per node that the search runs. Use this value to limit the impact of the search on the cluster and the number of concurrent shard requests.
-
preference
string The nodes and shards used for the search. By default, Elasticsearch selects from eligible nodes and shards using adaptive replica selection, accounting for allocation awareness. Valid values are:
_only_local
to run the search only on shards on the local node._local
to, if possible, run the search on shards on the local node, or if not, select shards using the default method._only_nodes:<node-id>,<node-id>
to run the search on only the specified node IDs. If suitable shards exist on more than one selected node, use shards on those nodes using the default method. If none of the specified nodes are available, select shards from any available node using the default method._prefer_nodes:<node-id>,<node-id>
to run the search on the specified node IDs if possible. If not, select shards using the default method._shards:<shard>,<shard>
to run the search only on the specified shards. You can combine this value with otherpreference
values. However, the_shards
value must come first. For example:_shards:2,3|_local
.<custom-string>
(any string that does not start with_
) to route searches with the same<custom-string>
to the same shards in the same order.
-
pre_filter_shard_size
number A threshold that enforces a pre-filter roundtrip to prefilter search shards based on query rewriting if the number of shards the search request expands to exceeds the threshold. This filter roundtrip can limit the number of shards significantly if, for instance, a shard cannot match any documents based on its rewrite method (if date filters are mandatory to match but the shard bounds and the query are disjoint). When unspecified, the pre-filter phase is executed if any of these conditions is met:
- The request targets more than 128 shards.
- The request targets one or more read-only indices.
- The primary sort of the query targets an indexed field.
-
request_cache
boolean If
true
, the caching of search results is enabled for requests wheresize
is0
. It defaults to index level settings. -
routing
string A custom value that is used to route operations to a specific shard.
-
scroll
string The period to retain the search context for scrolling. By default, this value cannot exceed
1d
(24 hours). You can change this limit by using thesearch.max_keep_alive
cluster-level setting. -
search_type
string Indicates how distributed term frequencies are calculated for relevance scoring.
Supported values include:
query_then_fetch
: Documents are scored using local term and document frequencies for the shard. This is usually faster but less accurate.dfs_query_then_fetch
: Documents are scored using global term and document frequencies across all shards. This is usually slower but more accurate.
Values are
query_then_fetch
ordfs_query_then_fetch
. -
stats
array[string] Specific
tag
of the request for logging and statistical purposes. -
stored_fields
string | array[string] A comma-separated list of stored fields to return as part of a hit. If no fields are specified, no stored fields are included in the response. If this field is specified, the
_source
parameter defaults tofalse
. You can pass_source: true
to return both source fields and stored fields in the search response. -
suggest_field
string The field to use for suggestions.
-
suggest_mode
string The suggest mode. This parameter can be used only when the
suggest_field
andsuggest_text
query string parameters are specified.Supported values include:
missing
: Only generate suggestions for terms that are not in the shard.popular
: Only suggest terms that occur in more docs on the shard than the original term.always
: Suggest any matching suggestions based on terms in the suggest text.
Values are
missing
,popular
, oralways
. -
suggest_size
number The number of suggestions to return. This parameter can be used only when the
suggest_field
andsuggest_text
query string parameters are specified. -
suggest_text
string The source text for which the suggestions should be returned. This parameter can be used only when the
suggest_field
andsuggest_text
query string parameters are specified. -
terminate_after
number The maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting.
IMPORTANT: Use with caution. Elasticsearch applies this parameter to each shard handling the request. When possible, let Elasticsearch perform early termination automatically. Avoid specifying this parameter for requests that target data streams with backing indices across multiple data tiers. If set to
0
(default), the query does not terminate early. -
timeout
string The period of time to wait for a response from each shard. If no response is received before the timeout expires, the request fails and returns an error. It defaults to no timeout.
-
track_total_hits
boolean | number The number of hits matching the query to count accurately. If
true
, the exact number of hits is returned at the cost of some performance. Iffalse
, the response does not include the total number of hits matching the query. -
track_scores
boolean If
true
, the request calculates and returns document scores, even if the scores are not used for sorting. -
typed_keys
boolean If
true
, aggregation and suggester names are prefixed by their respective types in the response. -
rest_total_hits_as_int
boolean Indicates whether
hits.total
should be rendered as an integer or an object in the rest search response. -
version
boolean If
true
, the request returns the document version as part of a hit. -
_source
boolean | string | array[string] The source fields that are returned for matching documents. These fields are returned in the
hits._source
property of the search response. Valid values are:true
to return the entire document source.false
to not return the document source.<string>
to return the source fields that are specified as a comma-separated list that supports wildcard (*
) patterns.
-
_source_excludes
string | array[string] A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in
_source_includes
query parameter. If the_source
parameter isfalse
, this parameter is ignored. -
_source_includes
string | array[string] A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the
_source_excludes
query parameter. If the_source
parameter isfalse
, this parameter is ignored. -
seq_no_primary_term
boolean If
true
, the request returns the sequence number and primary term of the last modification of each hit. -
q
string A query in the Lucene query string syntax. Query parameter searches do not support the full Elasticsearch Query DSL but are handy for testing.
IMPORTANT: This parameter overrides the query parameter in the request body. If both parameters are specified, documents matching the query request body parameter are not returned.
-
size
number The number of hits to return. By default, you cannot page through more than 10,000 hits using the
from
andsize
parameters. To page through more hits, use thesearch_after
parameter. -
from
number The starting document offset, which must be non-negative. By default, you cannot page through more than 10,000 hits using the
from
andsize
parameters. To page through more hits, use thesearch_after
parameter. -
sort
string | array[string] A comma-separated list of
<field>:<direction>
pairs.
Body
-
aggregations
object Defines the aggregations that are run as part of the search request.
External documentation -
collapse
object External documentation -
explain
boolean If
true
, the request returns detailed information about score computation as part of a hit. -
ext
object Configuration of search extensions defined by Elasticsearch plugins.
-
from
number The starting document offset, which must be non-negative. By default, you cannot page through more than 10,000 hits using the
from
andsize
parameters. To page through more hits, use thesearch_after
parameter. -
highlight
object -
track_total_hits
boolean | number Number of hits matching the query to count accurately. If true, the exact number of hits is returned at the cost of some performance. If false, the response does not include the total number of hits matching the query. Defaults to 10,000 hits.
-
indices_boost
array[object] Boost the
_score
of documents from specified indices. The boost value is the factor by which scores are multiplied. A boost value greater than1.0
increases the score. A boost value between0
and1.0
decreases the score.External documentation -
docvalue_fields
array[object] An array of wildcard (
*
) field patterns. The request returns doc values for field names matching these patterns in thehits.fields
property of the response. External documentation
knn
object | array[object] The approximate kNN search to run.
-
min_score
number The minimum
_score
for matching documents. Documents with a lower_score
are not included in search results and results collected by aggregations. -
post_filter
object An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
External documentation -
profile
boolean Set to
true
to return detailed timing information about the execution of individual components in a search request. NOTE: This is a debugging tool and adds significant overhead to search execution. -
query
object An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
External documentation
rescore
object | array[object] Can be used to improve precision by reordering just the top (for example 100 - 500) documents returned by the
query
andpost_filter
phases.-
retriever
object -
script_fields
object Retrieve a script evaluation (based on different fields) for each hit.
-
search_after
array[number | string | boolean | null] A field value.
-
size
number The number of hits to return, which must not be negative. By default, you cannot page through more than 10,000 hits using the
from
andsize
parameters. To page through more hits, use thesearch_after
property. -
slice
object -
_source
boolean | object Defines how to fetch a source. Fetching can be disabled entirely, or the source can be filtered.
-
fields
array[object] An array of wildcard (
*
) field patterns. The request returns values for field names matching these patterns in thehits.fields
property of the response. -
suggest
object -
terminate_after
number The maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting.
IMPORTANT: Use with caution. Elasticsearch applies this property to each shard handling the request. When possible, let Elasticsearch perform early termination automatically. Avoid specifying this property for requests that target data streams with backing indices across multiple data tiers.
If set to
0
(default), the query does not terminate early. -
timeout
string The period of time to wait for a response from each shard. If no response is received before the timeout expires, the request fails and returns an error. Defaults to no timeout.
-
track_scores
boolean If
true
, calculate and return document scores, even if the scores are not used for sorting. -
version
boolean If
true
, the request returns the document version as part of a hit. -
seq_no_primary_term
boolean If
true
, the request returns sequence number and primary term of the last modification of each hit.External documentation -
stored_fields
string | array[string] -
pit
object -
runtime_mappings
object -
stats
array[string] The stats groups to associate with the search. Each group maintains a statistics aggregation for its associated searches. You can retrieve these stats using the indices stats API.
curl \
--request POST 'http://api.example.com/{index}/_search' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"query\": {\n \"term\": {\n \"user.id\": \"kimchy\"\n }\n }\n}"'
{
"query": {
"term": {
"user.id": "kimchy"
}
}
}
{
"size": 100,
"query": {
"match" : {
"title" : "elasticsearch"
}
},
"pit": {
"id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
"keep_alive": "1m"
}
}
{
"slice": {
"id": 0,
"max": 2
},
"query": {
"match": {
"message": "foo"
}
},
"pit": {
"id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="
}
}
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 20,
"relation": "eq"
},
"max_score": 1.3862942,
"hits": [
{
"_index": "my-index-000001",
"_id": "0",
"_score": 1.3862942,
"_source": {
"@timestamp": "2099-11-15T14:12:12",
"http": {
"request": {
"method": "get"
},
"response": {
"status_code": 200,
"bytes": 1070000
},
"version": "1.1"
},
"source": {
"ip": "127.0.0.1"
},
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
}
]
}
}
Get terms in an index
Added in 7.14.0
Discover terms that match a partial string in an index. This API is designed for low-latency look-ups used in auto-complete scenarios.
The terms enum API may return terms from deleted documents. Deleted documents are initially only marked as deleted. It is not until their segments are merged that documents are actually deleted. Until that happens, the terms enum API will return terms from these documents.
Path parameters
-
index
string Required A comma-separated list of data streams, indices, and index aliases to search. Wildcard (
*
) expressions are supported. To search all data streams or indices, omit this parameter or use*
or_all
.
Body
-
field
string Required Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
size
number The number of matching terms to return.
-
timeout
string A duration. Units can be
nanos
,micros
,ms
(milliseconds),s
(seconds),m
(minutes),h
(hours) andd
(days). Also accepts "0" without a unit and "-1" to indicate an unspecified value. -
case_insensitive
boolean When
true
, the provided search string is matched against index terms without case sensitivity. -
index_filter
object An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
External documentation -
string
string The string to match at the start of indexed terms. If it is not provided, all terms in the field are considered.
The prefix string cannot be larger than the largest possible keyword value, which is Lucene's term byte-length limit of 32766.
-
search_after
string The string after which terms in the index should be returned. It allows for a form of pagination if the last result from one request is passed as the
search_after
parameter for a subsequent request.
curl \
--request GET 'http://api.example.com/{index}/_terms_enum' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"field\" : \"tags\",\n \"string\" : \"kiba\"\n}"'
{
"field" : "tags",
"string" : "kiba"
}
{
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"terms": [
"kibana"
],
"complete" : true
}
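Because results are returned in sorted order, the last term from one response can seed the next page (a sketch; it assumes the previous request ended at the term kibana):
curl \
--request GET 'http://api.example.com/{index}/_terms_enum' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"field":"tags","string":"kiba","search_after":"kibana"}'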
Start a transform
Added in 7.5.0
When you start a transform, it creates the destination index if it does not already exist. The number_of_shards
is
set to 1
and the auto_expand_replicas
is set to 0-1
. If it is a pivot transform, it deduces the mapping
definitions for the destination index from the source indices and the transform aggregations. If fields in the
destination index are derived from scripts (as in the case of scripted_metric
or bucket_script
aggregations),
the transform uses dynamic mappings unless an index template exists. If it is a latest transform, it does not deduce
mapping definitions; it uses dynamic mappings. To use explicit mappings, create the destination index before you
start the transform. Alternatively, you can create an index template, though it does not affect the deduced mappings
in a pivot transform.
When the transform starts, a series of validations occur to ensure its success. If you deferred validation when you created the transform, they occur when you start the transform—with the exception of privilege checks. When Elasticsearch security features are enabled, the transform remembers which roles the user that created it had at the time of creation and uses those same roles. If those roles do not have the required privileges on the source and destination indices, the transform fails when it attempts unauthorized operations.
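To use explicit mappings, the destination index can be created before the transform is started (a sketch; the index name my-transform-dest and the total_sales field are assumptions):
curl \
--request PUT 'http://api.example.com/my-transform-dest' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"mappings":{"properties":{"total_sales":{"type":"double"}}}}'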
Path parameters
-
transform_id
string Required Identifier for the transform.
Query parameters
-
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
-
from
string Restricts the set of transformed entities to those changed after this time. Relative times like now-30d are supported. Only applicable for continuous transforms.
curl \
--request POST 'http://api.example.com/_transform/{transform_id}/_start' \
--header "Authorization: $API_KEY"
{
"acknowledged": true
}