ELASTICSEARCH
What’s new since 0.90?
techtalk @ ferret
• Latest stable release: Elasticsearch 1.1.0	

• Released: 25.03.2014	

• Based on Lucene 4.6.1
BREAKING CHANGES
in versions 1.x
CONFIGURATION
• The cluster.routing.allocation settings (disable_allocation,
disable_new_allocation and disable_replica_allocation) have
been replaced by the single setting:
cluster.routing.allocation.enable: all|primaries|new_primaries|none

• Elasticsearch on 64-bit Linux now uses mmapfs by default. Make
sure that you set MAX_MAP_COUNT to a sufficiently high
number. The RPM and Debian packages default this value to
262144.
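
The allocation setting is dynamic, so it can also be changed at runtime via the cluster
settings API. A minimal sketch using the elasticsearch-py client covered at the end of this
talk (the default host and the maintenance scenario are illustrative, not from the slides):

from elasticsearch import Elasticsearch

es = Elasticsearch()  # connects to localhost:9200 by default

# Temporarily stop shard allocation (e.g. before a rolling restart),
# then re-enable it; "transient" settings are lost on a full cluster restart.
es.cluster.put_settings(body={
    "transient": {"cluster.routing.allocation.enable": "none"}
})
# ... perform maintenance ...
es.cluster.put_settings(body={
    "transient": {"cluster.routing.allocation.enable": "all"}
})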
MULTI-FIELDS
Existing multi-fields will be upgraded to the new format automatically.

Old format (multi_field type):

"title": {
    "type": "multi_field",
    "fields": {
        "title": { "type": "string" },
        "raw": {
            "type": "string",
            "index": "not_analyzed"
        }
    }
}

New format (fields on the core type):

"title": {
    "type": "string",
    "fields": {
        "raw": {
            "type": "string",
            "index": "not_analyzed"
        }
    }
}
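
With the new format the not_analyzed sub-field is addressed as title.raw, which is handy for
exact sorting or aggregations. A minimal sketch using the elasticsearch-py client covered at
the end of this talk (the index name "articles" is made up):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Full-text search on the analyzed "title" field,
# exact (unanalyzed) sorting on the "title.raw" sub-field.
es.search(index="articles", body={
    "query": {"match": {"title": "elasticsearch"}},
    "sort": [{"title.raw": "asc"}]
})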
STOPWORDS
• Previously, the standard and pattern analyzers used
the list of English stopwords by default, which
caused some hard-to-debug indexing issues.

• Now they are set to use the empty stopwords list
(i.e. _none_) instead.
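
If you relied on the old behaviour, stopwords can still be enabled explicitly in the index
settings. A minimal sketch (elasticsearch-py; the index and analyzer names are made up):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Recreate the pre-1.0 behaviour: a standard analyzer with English stopwords.
es.indices.create(index="articles", body={
    "settings": {
        "analysis": {
            "analyzer": {
                "standard_with_stopwords": {
                    "type": "standard",
                    "stopwords": "_english_"
                }
            }
        }
    }
})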
RETURN VALUES
• The ok return value has been removed from all response bodies
as it added no useful information.	

• The found, not_found and exists return values have been
unified as found on all relevant APIs.	

• Field values, in response to the fields parameter, are now always
returned as arrays. Metadata fields are always returned as scalars.	

• The analyze API no longer supports the text response format,
but does support JSON and YAML.
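
Client code that checked ok or exists therefore needs a small change. A hedged sketch of
what a get with the fields parameter now returns (index, type, id and field name are
illustrative):

from elasticsearch import Elasticsearch

es = Elasticsearch()

doc = es.get(index="articles", doc_type="article", id=1, fields="title")
doc["found"]            # True/False -- replaces the old exists/not_found values
doc["fields"]["title"]  # field values are now always arrays, e.g. ["My title"]
doc["_version"]         # metadata fields stay scalar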
DEPRECATIONS
• Per-document boosting with the _boost field has been
removed. You can use the function_score query instead.

• The custom_score and custom_boost_score queries are no longer
supported. Use function_score instead (see the sketch after this list).

• The field query has been removed. Use the query_string
query instead.	

• The path parameter in mappings has been deprecated. Use
the copy_to parameter instead.
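
For example, a query that used custom_score to boost selected documents could be rewritten
with function_score roughly like this. A minimal sketch (elasticsearch-py; the index, the
"featured" flag and the boost value are assumptions for illustration):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Boost documents flagged as "featured", replacing the removed custom_score query.
es.search(index="articles", body={
    "query": {
        "function_score": {
            "query": {"match": {"title": "elasticsearch"}},
            "functions": [
                {"filter": {"term": {"featured": True}}, "boost_factor": 2}
            ]
        }
    }
})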
AGGREGATIONS
since version 1.0.0
AGGREGATION TYPES
• Bucketing aggregations

Aggregations that build buckets, where each bucket is associated with a key and a
document criterion.

Examples: range, terms, histogram

Bucketing aggregations can have sub-aggregations (bucketing or metric). The sub-aggregations
are computed for the buckets that their parent aggregation generates.

• Metrics aggregations

Aggregations that keep track of and compute metrics over a set of documents.

Examples: min, max, stats
{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "ranges" : [
                    { "to" : 50 },
                    { "from" : 100 }
                ]
            },
            "aggs" : {
                "price_stats" : {
                    "stats" : { "field" : "price" }
                }
            }
        }
    }
}
{
    "aggregations": {
        "price_ranges" : {
            "buckets": [
                {
                    "to": 50,
                    "doc_count": 2,
                    "price_stats": {
                        "count": 2,
                        "min": 20,
                        "max": 47,
                        "avg": 33.5,
                        "sum": 67
                    }
                }, …
            ]
        }
    }
}
CARDINALITY
The cardinality aggregation is a metric aggregation that computes approximate unique counts
based on the HyperLogLog++ algorithm, which has two nice properties: it is close to accurate
on low cardinalities, and its memory usage is fixed, so estimating high cardinalities doesn't
blow up memory.
{
    "aggs" : {
        "author_count" : {
            "cardinality" : {
                "field" : "author"
            }
        }
    }
}
PERCENTILES
since version 1.1.0
The percentiles aggregation computes (approximate) values of arbitrary percentiles based on
the t-digest algorithm. Computing exact percentiles is not reasonably feasible, as it would
require shards to stream all values to the node that coordinates search execution, which could
be gigabytes on a high-cardinality field.
{
    "aggs" : {
        "load_time_outlier" : {
            "percentiles" : {
                "field" : "load_time"
            }
        }
    }
}
{
    ...
    "aggregations": {
        "load_time_outlier": {
            "1.0": 15,
            "5.0": 20,
            "25.0": 23,
            "50.0": 25,
            "75.0": 29,
            "95.0": 60,
            "99.0": 150
        }
    }
}
SIGNIFICANT_TERMS
since version 1.1.0
An aggregation that identifies terms that are significant, rather than merely popular, in a result set.
Significance is related to the difference between the document frequency observed in everyday use in
the corpus and the frequency observed in the result set.

{
    "query" : {
        "terms" : {
            "force" : [ "British Transport Police" ]
        }
    },
    "aggregations" : {
        "significantCrimeTypes" : {
            "significant_terms" : { "field" : "crime_type" }
        }
    }
}
{
    "aggregations" : {
        "significantCrimeTypes" : {
            "doc_count": 47347,
            "buckets" : [
                {
                    "key": "Bicycle theft",
                    "doc_count": 3640,
                    "score": 0.371235374214817,
                    "bg_count": 66799
                }, …
            ]
        }
    }
}
IMPROVEMENTS
in version 1.1.0
TERMS AGGREGATION
• Before 1.1.0, a terms aggregation returned up to size terms, so the way
to get all matching terms back was to set size to an arbitrarily high
number that would be larger than the number of unique terms.

• Since version 1.1.0, to get ALL terms just set size=0 (see the sketch below).
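
A minimal sketch of such a request (elasticsearch-py; the "articles" index and "tags" field
are made up):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# size=0 (since 1.1.0) returns ALL distinct terms of the "tags" field;
# search_type="count" skips fetching hits, since only the aggregation is needed.
es.search(index="articles", search_type="count", body={
    "aggs": {
        "all_tags": {
            "terms": {"field": "tags", "size": 0}
        }
    }
})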
MULTI-FIELD SEARCH
• The multi_match query now supports three types of execution:

• best_fields (field-centric, default) Find the field that best matches the
query string. Useful for finding a single concept like “full text search” in
either the title or the body field.

• most_fields (field-centric) Find all matching fields and add up their
scores. Useful for matching against multi-fields, where the same text
has been analyzed in different ways to improve the relevance score:
with/without stemming, shingles, edge-ngrams etc.

• cross_fields (term-centric) A new execution mode which looks for
each term in any of the listed fields. Useful for documents whose
identifying features are spread across multiple fields, such as
first_name and last_name, and it supports the minimum_should_match
operator in a more natural way than the other two modes (see the sketch below).
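
A minimal cross_fields sketch (elasticsearch-py; the "users" index, the name fields and the
query text are illustrative):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Treat first_name and last_name as if they were one big field when matching terms.
es.search(index="users", body={
    "query": {
        "multi_match": {
            "query": "Will Smith",
            "type": "cross_fields",
            "fields": ["first_name", "last_name"],
            "operator": "and"
        }
    }
})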
CAT API
since version 1.0.0
JSON is great… for computers. Human eyes, especially when looking at an ssh terminal, need
compact and aligned text. The cat API aims to meet this need.
$ curl 'localhost:9200/_cat/nodes?h=ip,port,heapPercent,name'

192.168.56.40 9300 40.3 Captain Universe
192.168.56.20 9300 15.3 Kaluu
192.168.56.50 9300 17.0 Yellowjacket
192.168.56.10 9300 12.3 Remy LeBeau
192.168.56.30 9300 43.9 Ramsey, Doug
TRIBE NODES
since version 1.0.0
The tribes feature allows a tribe node to act as a federated client across multiple clusters.
tribe:
    t1:
        cluster.name: cluster_one
    t2:
        cluster.name: cluster_two

elasticsearch.yml
The merged global cluster state means that almost all operations work in the same
way as in a single cluster: distributed search, suggest, percolation, indexing, etc.

However, there are a few exceptions:

• The merged view cannot handle indices with the same name in multiple clusters.

• Master-level read operations (eg Cluster State, Cluster Health) will automatically
execute with a local flag set to true since there is no master.

• Master-level write operations (eg Create Index) are not allowed. These should be
performed on a single cluster.
BACKUP & RESTORE
since version 1.0.0
REPOSITORIES
$ curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
    "type": "fs",
    "settings": {
        "location": "/mount/backups/my_backup",
        "compress": true
    }
}'
Before any snapshot or restore operation can be performed, a snapshot
repository has to be registered in Elasticsearch.
Supported repository types:	

• fs (filesystem)	

• S3	

• HDFS (Hadoop)	

• Azure
SNAPSHOTS
$ curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_1" -d '{
    "indices": "index_1,index_2"
}'
A repository can contain multiple snapshots of the same cluster. Snapshots are
identified by unique names within the cluster.

• The index snapshot process is incremental.

• Only one snapshot process can be executed in the cluster at any
time.

• The snapshot process is executed in a non-blocking fashion.
RESTORE
A snapshot can be restored using the following command:

$ curl -XPOST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore" -d '{
    "indices": "index_1,index_2",
    "rename_pattern": "index_(.+)",
    "rename_replacement": "restored_index_$1"
}'
• The restore operation can be performed on a functioning cluster.

• An existing index can only be restored if it is closed.

• The restored persistent settings are added to the existing
persistent settings.
ELASTICSEARCH-PY
Official low-level client for Elasticsearch
Features:	

• translating basic Python data types to and from JSON (datetimes are not
decoded for performance reasons)

• configurable automatic discovery of cluster nodes	

• persistent connections	

• load balancing (with pluggable selection strategy) across all available nodes	

• failed connection penalization (time based - failed connections won’t be
retried until a timeout is reached)	

• thread safety	

• pluggable architecture	

Versioning:
• There are two branches, master and 0.4. The master branch tracks all changes
for Elasticsearch 1.0 and beyond, whereas 0.4 tracks Elasticsearch 0.90.
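
A minimal usage sketch (host, index name and document content are illustrative, not from
the slides):

from datetime import datetime
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# datetime is encoded to JSON on indexing, but not decoded back on retrieval
es.index(index="articles", doc_type="article", id=1, body={
    "title": "What's new since 0.90?",
    "posted": datetime.utcnow()
})

result = es.search(index="articles", body={
    "query": {"match": {"title": "new"}}
})
print(result["hits"]["total"])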
