elasticsearch_monitoring_cheatsheet

Uploaded by

Andriy Bilokin
Copyright
© All Rights Reserved

Cheatsheet: Elasticsearch Monitoring

Note:
• Windows users should download cURL to use the commands below.
• Some commands require jq to parse JSON for relevant metrics.
• For more info, visit dtdg.co/monitoring-elasticsearch

General monitoring API endpoints
METRIC DESCRIPTION
    COMMAND
Stats from all nodes
    curl 'localhost:9200/_nodes/stats'
Stats from specific nodes
    curl 'localhost:9200/_nodes/node1,node2/stats'
Stats from a specific index
    curl 'localhost:9200/<INDEX_NAME>/_stats'
Cluster-wide stats
    curl 'localhost:9200/_cluster/stats'

Cluster health
METRIC DESCRIPTION
    COMMAND
Cluster status & unassigned shards
    curl 'localhost:9200/_cat/health?v'

Search performance
METRIC DESCRIPTION
    COMMAND
Total number of queries
    curl 'localhost:9200/_cat/nodes?v&h=name,searchQueryTotal'
Total time spent on queries
    curl 'localhost:9200/_cat/nodes?v&h=name,searchQueryTime'
Number of queries currently in progress
    curl 'localhost:9200/_cat/nodes?v&h=name,searchQueryCurrent'
Total number of fetches
    curl 'localhost:9200/_cat/nodes?v&h=name,searchFetchTotal'
Total time spent on fetches
    curl 'localhost:9200/_cat/nodes?v&h=name,searchFetchTime'
Number of fetches currently in progress
    curl 'localhost:9200/_cat/nodes?v&h=name,searchFetchCurrent'

Indexing performance
METRIC DESCRIPTION
    COMMAND
Total number of documents indexed
    curl 'localhost:9200/_cat/nodes?v&h=name,indexingIndexTotal'
Total time spent indexing documents
    curl 'localhost:9200/_cat/nodes?v&h=name,indexingIndexTime'
Number of documents currently being indexed
    curl 'localhost:9200/_cat/nodes?v&h=name,indexingIndexCurrent'
Total number of index flushes to disk
    curl 'localhost:9200/_cat/nodes?v&h=name,flushTotal'
Total time spent on flushing indices to disk
    curl 'localhost:9200/_cat/nodes?v&h=name,flushTotalTime'

JVM heap usage
METRIC DESCRIPTION
    COMMAND
Garbage collection frequency and duration
    curl 'localhost:9200/_nodes/stats/jvm' | jq '.nodes[] | {node_name: .name, young_gc_count: .jvm.gc.collectors.young.collection_count, young_gc_time: .jvm.gc.collectors.young.collection_time_in_millis, old_gc_count: .jvm.gc.collectors.old.collection_count, old_gc_time: .jvm.gc.collectors.old.collection_time_in_millis}'
Percent of JVM heap currently in use
    curl 'localhost:9200/_cat/nodes?v&h=name,heapPercent'

Pending tasks
METRIC DESCRIPTION
    COMMAND
Number of pending tasks
    curl 'localhost:9200/_cluster/pending_tasks'

Thread pool queues & rejections
METRIC DESCRIPTION
    COMMAND
Number of queued threads in a thread pool
    curl 'localhost:9200/_nodes/stats/thread_pool' | jq '.nodes[] | {node_name: .name, bulk_queue: .thread_pool.bulk.queue, search_queue: .thread_pool.search.queue, index_queue: .thread_pool.index.queue}'
Number of rejected threads in a thread pool
    curl 'localhost:9200/_nodes/stats/thread_pool' | jq '.nodes[] | {node_name: .name, bulk_rejected: .thread_pool.bulk.rejected, search_rejected: .thread_pool.search.rejected, index_rejected: .thread_pool.index.rejected}'

Fielddata cache usage
METRIC DESCRIPTION
    COMMAND
Size of the fielddata cache (bytes)
    curl 'localhost:9200/_cat/nodes?v&h=name,fielddataMemory'
Number of evictions from the fielddata cache
    curl 'localhost:9200/_cat/nodes?v&h=name,fielddataEvictions'
Number of times the fielddata circuit breaker has been tripped (ES version >=1.3)
    curl 'localhost:9200/_nodes/stats/breaker' | jq '.nodes[] | {node_name: .name, fielddata: .breakers.fielddata}'

Host-level network and system metrics
METRIC DESCRIPTION
    COMMAND
Disk space total, free, available
    curl 'localhost:9200/_nodes/stats/fs' | jq '.nodes[] | {node_name: .name, disk_total_in_bytes: .fs.total.total_in_bytes, disk_free_in_bytes: .fs.total.free_in_bytes, disk_available_in_bytes: .fs.total.available_in_bytes}'
Percent of disk in use
    curl 'localhost:9200/_cat/allocation?v'
Memory
    curl 'localhost:9200/_nodes/stats/os'
CPU
    curl 'localhost:9200/_nodes/stats/os'
I/O utilization
    Consult a tool like iostat
Used file descriptors percentage
    curl 'localhost:9200/_cat/nodes?v&h=host,name,fileDescriptorPercent'
Network bytes sent/received
    curl 'localhost:9200/_nodes/stats/transport' | jq '.nodes[] | {node_name: .name, network_bytes_sent: .transport.tx_size_in_bytes, network_bytes_received: .transport.rx_size_in_bytes}'
HTTP connections currently open & total opened over time
    curl 'localhost:9200/_nodes/stats/http' | jq '.nodes[] | {node_name: .name, http_current_open: .http.current_open, http_total_opened: .http.total_opened}'

Default directories
               DEBIAN/UBUNTU                RHEL/CENTOS             ZIP OR TAR INSTALLATION
Configuration  /etc/elasticsearch           /etc/elasticsearch      <ELASTICSEARCH INSTALLATION HOME DIRECTORY>/config
Logs           /var/log/elasticsearch       /var/log/elasticsearch  <ELASTICSEARCH INSTALLATION HOME DIRECTORY>/logs
Data           /var/lib/elasticsearch/data  /var/lib/elasticsearch  <ELASTICSEARCH INSTALLATION HOME DIRECTORY>/data
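
The search counters in this cheatsheet (total number of queries, total time spent on queries) are cumulative, so a single reading says little about current latency. A minimal sketch of turning two samples into an average latency for the interval follows. The function name avg_latency_ms is hypothetical, and the four inputs are assumed to be values you sampled yourself, for example from the query_total and query_time_in_millis fields under .indices.search in the node stats output.

```shell
#!/bin/sh
# avg_latency_ms <queries_before> <time_ms_before> <queries_after> <time_ms_after>
# Hypothetical helper: prints the average milliseconds per query between
# two samples of cumulative counters (integer division, rounds down).
avg_latency_ms() {
  local dq=$(( $3 - $1 ))   # queries completed during the interval
  local dt=$(( $4 - $2 ))   # milliseconds spent on those queries
  if [ "$dq" -le 0 ]; then
    # No new queries (or a counter reset after a restart): report 0.
    echo 0
    return
  fi
  echo $(( dt / dq ))
}

# Example: 50 new queries took 1000 ms in total between the two samples.
avg_latency_ms 100 5000 150 6000
```

The same delta approach should apply to the fetch, indexing, and flush counters, since they are reported as running totals as well.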
Cheatsheet: Elasticsearch Tuning
Note:
• Windows users should download cURL to use the commands below.
• Results of each suggested action may vary depending on your particular use case and setup. Please test them out before implementing in production.
• For more info, visit dtdg.co/tuning-elasticsearch

Unassigned shards
Check which shards are unassigned:
    curl 'localhost:9200/_cat/shards' | grep UNASSIGNED
SUGGESTED ACTION
    COMMAND
Reduce number of replicas for an index (master will not assign multiple copies of a shard on the same node)
    curl -XPUT 'localhost:9200/<INDEX_NAME>/_settings' -d '{"number_of_replicas": <DESIRED NUMBER OF REPLICAS>}'
Re-enable shard allocation
    curl -XPUT 'localhost:9200/_cluster/settings' -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'
Manually allocate an unassigned shard
    curl -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands": [{"allocate": {"index": "<INDEX_NAME>", "shard": <SHARD_NUMBER>, "node": "<NODE_NAME>"}}]}'
Check disk usage; master node will not assign shards to any node using >85% of disk
    curl 'localhost:9200/_cat/allocation?v'
Check that every node is running the same version of Elasticsearch; master node will not assign to older version
    curl 'localhost:9200/_cat/nodes?v&h=host,name,version'

Tune the JVM heap size
Note: The Elasticsearch docs recommend setting your heap size below 50% of a node's available memory (and never going above 32GB), to leave more memory for the file system cache.
SUGGESTED ACTION
    COMMAND
Set heap size upon starting up Elasticsearch (DESIRED_SIZE is e.g. "3g")
    ES_HEAP_SIZE=DESIRED_SIZE ./bin/elasticsearch
Set heap as an environment variable (requires Elasticsearch restart; DESIRED_SIZE is e.g. 3g)
    export ES_HEAP_SIZE=DESIRED_SIZE

Backlog of pending tasks
• Allocate more resources to master-eligible nodes.
• Create a new cluster if you suspect that the current cluster's demands have outgrown the master's capabilities.
• Make sure your mappings do not allow users to create an unlimited number of new fields in documents.

Bulk rejections
Implement a linear or exponential backoff strategy until the bulk rejections decrease.

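The exponential backoff suggested for bulk rejections can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the function names backoff_delay and retry_with_backoff are hypothetical, the 1 second base and 60 second cap are arbitrary defaults, and the command being retried is assumed to exit non-zero whenever it detects rejections.

```shell
#!/bin/sh
# backoff_delay <attempt> [base_seconds] [max_seconds]
# Hypothetical helper: prints base * 2^attempt, capped at max_seconds.
backoff_delay() {
  local attempt=$1 base=${2:-1} max=${3:-60}
  local delay=$base i=0
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$max" ]; do
    delay=$(( delay * 2 ))
    i=$(( i + 1 ))
  done
  if [ "$delay" -gt "$max" ]; then delay=$max; fi
  echo "$delay"
}

# retry_with_backoff <max_attempts> <command...>
# Runs the command until it succeeds or max_attempts is reached,
# sleeping for an exponentially growing delay between attempts.
retry_with_backoff() {
  local max_attempts=$1
  shift
  local attempt=0
  while :; do
    if "$@"; then return 0; fi
    attempt=$(( attempt + 1 ))
    if [ "$attempt" -ge "$max_attempts" ]; then return 1; fi
    sleep "$(backoff_delay $(( attempt - 1 )))"
  done
}
```

A call might look like `retry_with_backoff 5 send_bulk_batch batch.json`, where send_bulk_batch is a placeholder for your own function that submits one bulk request and reports failure when documents were rejected. A linear strategy would add a fixed increment per attempt instead of doubling.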
Search performance
Log slow queries in slow search log (replace with your desired thresholds):
    curl -XPUT 'localhost:9200/<INDEX_NAME>/_settings' -d '{
    "index.search.slowlog.threshold.query.warn" : "10s",
    "index.search.slowlog.threshold.fetch.debug": "500ms",
    "index.indexing.slowlog.threshold.index.info": "5s"
    }'
SUGGESTED ACTION
    COMMAND
Route high-priority, low-volume documents of a <DOC_TYPE> to the same place so only one shard will be queried
    curl -XPUT 'localhost:9200/<INDEX_NAME>' -d '{"mappings": {"<DOC_TYPE>": {"_routing": {"required": true}}}}'
Merge segments in an index
    ES versions 2.1.0+:
    curl -XPOST 'localhost:9200/<INDEX_NAME>/_forcemerge'
    ES versions prior to 2.1.0:
    curl -XPOST 'localhost:9200/<INDEX_NAME>/_optimize'

Fielddata usage
SUGGESTED ACTION
    COMMAND
Enable doc values for a non-analyzed string field (enabled by default for ES versions 2.0+)
    curl -XPUT 'localhost:9200/<INDEX_NAME>/_mapping/<DOC_TYPE>' -d '{"properties": {"<FIELD_NAME>": {"type": "string", "index": "not_analyzed", "doc_values": true }}}'

Low disk space
• General actions:
    • Turn off replication for outdated data
    • Store old data off-cluster
• If all nodes are running out of disk space:
    • Add more data-eligible nodes
• If specific nodes are running out of disk space:
    • Reindex the data into a new index with a greater number of primary shards, and make sure you have enough data nodes to evenly distribute the shards
    • Upgrade the hardware on those nodes (scale vertically)

Indexing performance
SUGGESTED ACTION
    COMMAND
Bulk index documents from a JSON file
    curl -XPOST 'localhost:9200/<INDEX_NAME>/<MY_TYPE>/_bulk?pretty' --data-binary "@<YOUR_FILE>.json"
Increase refresh interval (e.g. to "30s") to optimize indexing, rather than making new data immediately searchable
    curl -XPUT 'localhost:9200/<INDEX_NAME>/_settings' -d '{"index": {"refresh_interval": "<DESIRED_INTERVAL>"}}'
Disable merge throttling to leave more resources for indexing, not merging
    curl -XPUT 'localhost:9200/_cluster/settings' -d '{"transient": {"indices.store.throttle.type": "none"}}'
Disable shard replication
    curl -XPUT 'localhost:9200/<INDEX_NAME>/_settings' -d '{"number_of_replicas": 0}'
Commit translog to disk less frequently
    curl -XPUT 'localhost:9200/<INDEX_NAME>/_settings' -d '{"index": {"translog": {"durability": "async"}}}'
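
The bulk indexing command in the indexing performance table expects a file of newline-delimited JSON: each document is preceded by an action line, and the file must end with a newline. A minimal sketch of building such a file is shown here. The helper name bulk_entry, the document IDs, and the sample documents are all made up for illustration; the action line can omit the index and type because the cheatsheet's command already names them in the URL.

```shell
#!/bin/sh
# bulk_entry <doc_id> <json_document>
# Hypothetical helper: prints one action line followed by one source
# line, each terminated by a newline, as the bulk endpoint expects.
bulk_entry() {
  printf '{"index": {"_id": "%s"}}\n%s\n' "$1" "$2"
}

# Example usage (sample documents are invented):
#   {
#     bulk_entry 1 '{"title": "first"}'
#     bulk_entry 2 '{"title": "second"}'
#   } > docs.json
# The resulting docs.json can then be passed to the bulk command
# via --data-binary "@docs.json".
```

Each document must stay on a single line, so pretty-printed JSON would need to be compacted before it is passed to the helper.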
Cheatsheet: Elasticsearch Monitoring with Datadog
Note:
• For metric descriptions and more info: dtdg.co/monitoring-elasticsearch

Datadog's out-of-the-box screenboard. Sections 1-8 correspond to the metric categories outlined below.

1. Cluster health
METRIC DESCRIPTION
    DATADOG METRIC NAME
Cluster status
    elasticsearch.cluster_status
Number of unassigned shards
    elasticsearch.unassigned_shards

2. Search performance
METRIC DESCRIPTION
    DATADOG METRIC NAME
Total number of queries
    elasticsearch.search.query.total
Total time spent on queries (s)
    elasticsearch.search.query.time
Number of queries in progress
    elasticsearch.search.query.current
Total number of fetches
    elasticsearch.search.fetch.total
Total time spent on fetches (s)
    elasticsearch.search.fetch.time
Number of fetches in progress
    elasticsearch.search.fetch.current

3. Indexing performance
METRIC DESCRIPTION
    DATADOG METRIC NAME
Total number of documents indexed
    elasticsearch.indexing.index.total
Total time spent indexing documents (s)
    elasticsearch.indexing.index.time
Number of documents currently being indexed
    elasticsearch.indexing.index.current
Total number of index flushes to disk
    elasticsearch.flush.total
Total time spent on flushing indices to disk (s)
    elasticsearch.flush.total.time

4. JVM heap usage
METRIC DESCRIPTION
    DATADOG METRIC NAME
Garbage collection frequency and duration
    jvm.gc.collectors.young.count
    jvm.gc.collectors.young.collection_time
    jvm.gc.collectors.old.count
    jvm.gc.collectors.old.collection_time
Percent of JVM heap currently in use
    jvm.mem.heap_in_use

5. Pending tasks
METRIC DESCRIPTION
    DATADOG METRIC NAME
Number of pending tasks
    elasticsearch.pending_tasks_total

6. Thread pool queues & rejections
METRIC DESCRIPTION
    DATADOG METRIC NAME
Number of queued threads in a thread pool
    elasticsearch.thread_pool.bulk.queue
    elasticsearch.thread_pool.index.queue
    elasticsearch.thread_pool.search.queue
Number of rejected threads in a thread pool
    elasticsearch.thread_pool.bulk.rejected
    elasticsearch.thread_pool.index.rejected
    elasticsearch.thread_pool.search.rejected

7. Fielddata cache usage
METRIC DESCRIPTION
    DATADOG METRIC NAME
Size of the fielddata cache (bytes)
    elasticsearch.fielddata.size
Number of evictions from the fielddata cache
    elasticsearch.fielddata.evictions
Number of times the fielddata circuit breaker has been tripped (ES version >=1.3)
    elasticsearch.breakers.fielddata.tripped

8. Host-level network and system metrics
METRIC DESCRIPTION
    DATADOG METRIC NAME
Percent of disk space in use
    system.disk.in_use
Page cache usage
    system.mem.cached
CPU
    system.cpu.system
I/O utilization
    system.io.util
Open file descriptors
    elasticsearch.process.open_fd
Network bytes sent/received
    system.net.bytes_sent
    system.net.bytes_rcvd
HTTP connections currently open & total opened over time
    elasticsearch.http.current_open
    elasticsearch.http.total_opened

Default directories
               DEBIAN/UBUNTU                RHEL/CENTOS             ZIP OR TAR INSTALLATION
Configuration  /etc/elasticsearch           /etc/elasticsearch      <ELASTICSEARCH INSTALLATION HOME DIRECTORY>/config
Logs           /var/log/elasticsearch       /var/log/elasticsearch  <ELASTICSEARCH INSTALLATION HOME DIRECTORY>/logs
Data           /var/lib/elasticsearch/data  /var/lib/elasticsearch  <ELASTICSEARCH INSTALLATION HOME DIRECTORY>/data
