ELASTICSEARCH –
SCALABILITY AND
MULTITENANCY
Bozhidar Bozhanov
ABOUT ME
• Founder at LogSentinel, an information security startup
• LogSentinel SIEM – product that indexes billions of logs with Elasticsearch
• https://techblog.bozho.net
• https://twitter.com/bozhobg
SCALABILITY AND MULTITENANCY
• Scalability – how to process millions (billions) of documents on multiple machines
• Multitenancy – how to have our system support multiple users/organizations while
segregating their data
• One can exist without the other
• Both are architectural and implementation tasks, not (just) work for Ops.
• „We’ll push the data in whatever form and Ops will take care of the scaling“
ELASTICSEARCH BASICS
• “You know, for search”
• Indexing documents (document = anything)
• Full-text search and keyword search
• Allows for large clusters
• Licensing issues
USE-CASE: TIME-SERIES DATA
• Indexing events (logs, metrics, etc.)
• Widespread and widely applicable scenario
• Documents almost always have a timestamp
SHARDS
ZOOM-IN
LIMITING FACTORS
• One shard shouldn’t be too large
• Ideally between 10 and 50 GB; otherwise recovery after failure may not work
• The number of shards on a node is limited by RAM
• Lucene segments are append-only
• A large number of segments reduces performance
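The shard-size guidance above can be turned into simple arithmetic. This is a minimal sketch, assuming you can estimate the final size of an index in GB; the function name and the 30 GB default target are illustrative choices, not values from the slides.

```python
import math


def primary_shards_needed(expected_index_gb: float, target_shard_gb: float = 30.0) -> int:
    """Estimate how many primary shards keep each shard near the target size.

    target_shard_gb is an assumed midpoint of the 10-50 GB range from the slide.
    """
    # Round up so that no shard exceeds the target; always keep at least one shard.
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


if __name__ == "__main__":
    for size_gb in (5, 100, 600):
        print(size_gb, "GB ->", primary_shards_needed(size_gb), "primary shards")
```

The estimate only addresses shard size; the RAM limit on shards per node still has to be checked separately.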
MULTITENANCY
• Cluster-per-tenant
• Heavy on administration
• No real multitenancy
• Expensive
• Index-per-tenant
• Also heavy on administration
• Doesn’t scale well
• Tenant-based routing
• Recommended in most cases
TENANT-BASED ROUTING
• _routing=<tenantId> or _routing=<tenantOwnedResourceId>
• E.g. userId or dataSourceId
• The routing parameter determines which shard stores the document
• _routing for search requests tells Elasticsearch where to look for the data =>
faster search
• shard_num = hash(_routing) % num_primary_shards
• mappings._routing.required: true
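The shard formula above can be illustrated with a small sketch. It assumes a stand-in hash (CRC32 from the standard library); Elasticsearch uses its own internal hash function, so the actual shard numbers will differ. The function name and tenant IDs are hypothetical.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Illustrate shard_num = hash(_routing) % num_primary_shards.

    CRC32 is only a stand-in here, not the hash Elasticsearch really uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    # Hypothetical tenant IDs; every document of a tenant lands on the same shard.
    for tenant_id in ("tenant-a", "tenant-b", "tenant-c"):
        print(tenant_id, "-> shard", shard_for(tenant_id, 5))
```

Because the result depends on num_primary_shards, changing the shard count of an index would move documents to different shards, which is one reason the shard count is fixed up front.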
STRUCTURE OF INDEXED DATA
• One field can have only one type
• The type is determined at index creation or by the first indexed document that
contains the field
• User1 creates custom param “duration” of type String
• User2 wants to create “duration” of a numeric type -> error
• Solution: custom parameter hierarchies by type: params, numericParams,
dateParams, …
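The typed-hierarchy idea can be sketched as a small pre-processing step before indexing. This is an illustration under assumptions: the function name is hypothetical, and only the three hierarchies named on the slide (params, numericParams, dateParams) are handled.

```python
from datetime import datetime
from numbers import Number


def split_params_by_type(custom_params: dict) -> dict:
    """Place each custom parameter under a hierarchy that matches its value type."""
    doc = {"params": {}, "numericParams": {}, "dateParams": {}}
    for name, value in custom_params.items():
        if isinstance(value, bool):
            # bool is a subclass of int in Python; store it as a string instead.
            doc["params"][name] = str(value)
        elif isinstance(value, Number):
            doc["numericParams"][name] = value
        elif isinstance(value, datetime):
            doc["dateParams"][name] = value.isoformat()
        else:
            doc["params"][name] = str(value)
    return doc


if __name__ == "__main__":
    # User1 sends "duration" as text, User2 as a number: no mapping conflict,
    # because the two values end up in different fields.
    print(split_params_by_type({"duration": "5 minutes"}))
    print(split_params_by_type({"duration": 300}))
```

With this layout "params.duration" and "numericParams.duration" are separate fields, so each keeps a single type.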
SCALABILITY
• „We add more machines and it’s good“?
• Recommended shard size (10-50 GB)
• We can’t change the number of shards on a running index
• Lucene Segments are read-only:
• Deleting a document = bad
• Updating a document = bad
OPTIONS FOR STRUCTURING INDEXES
• We need a structure to allow indexing and searching in an arbitrarily large amount
of data
• One big, ever-growing index
• Convenient for small amounts of data, but faces all scalability problems
• Index-per-day / index-per-week / index-per-size
• Index-per-day-per-retention
• Rollover
• Deletion should be done by deleting whole indexes, not individual documents
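A minimal sketch of the index-per-day option with whole-index deletion, assuming a "<prefix>-YYYY.MM.DD" naming scheme (an assumption for illustration, not taken from the slides). The code only computes names; the actual delete call to the cluster is left out.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def index_name(prefix: str, day: date) -> str:
    """Build the name of the daily index, e.g. logs-2024.01.05."""
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indexes(prefix: str, existing: Iterable[str], today: date,
                    retention_days: int) -> List[str]:
    """Return the daily indexes that are older than the retention period."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in existing:
        # Strip "<prefix>-" and parse the date part of the index name.
        day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = [index_name("logs", date(2024, 1, d)) for d in (5, 7, 9)]
    # These whole indexes would be deleted instead of individual documents.
    print(expired_indexes("logs", names, today=date(2024, 1, 10), retention_days=3))
```

Dropping a whole index removes its segments at once, which avoids the per-document deletes that the slides warn against.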
MANY INDEXES FOR SEARCH, ONE FOR
INDEXING
• One search query can be directed to many indexes based on an index alias
• Supporting one (or several) active indexes for ingesting documents
• All other indexes are read-only
• This solves the problem with:
• Growing data and growing size of shards
• Deleting old data
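One way to get "many indexes for search, one for indexing" is an alias that spans all indexes and marks one of them as the write index. The sketch below only builds the request body for the aliases endpoint (POST /_aliases) and sends nothing; the index and alias names are hypothetical, and the exact request shape should be checked against the documentation for your Elasticsearch version.

```python
import json


def switch_write_index_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Build alias actions that keep both indexes searchable via the alias
    while directing new documents to new_index only."""
    return {
        "actions": [
            {"add": {"index": old_index, "alias": alias, "is_write_index": False}},
            {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names; searches go to "logs", writes land in logs-000002.
    body = switch_write_index_actions("logs", "logs-000001", "logs-000002")
    print(json.dumps(body, indent=2))
```

The rollover feature mentioned earlier in the deck automates this kind of switch based on size or age conditions.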
EFFECTIVE INDEXING
• In real time (problem: too many requests to Elasticsearch)
• Storing in a database and indexing with a batch job
• Message queue (complex to implement; we use Kafka)
• In-memory queue (might lose data)
• Batch-indexing when a given size or time threshold is reached
• Hybrid: bulk processing + database
• Quick indexing with in-memory queue + subsequent check based on the data in the database
• Avoid updates (=delete + insert)
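The "batch-indexing when a given size or time threshold is reached" option can be sketched as a small in-memory buffer. This is an illustration under assumptions: the class name, thresholds, and the flush callback are hypothetical, and the callback stands in for a real bulk request. As the slide notes, an in-memory queue can lose data if the process dies before flushing.

```python
import time
from typing import Callable, List, Optional


class BatchBuffer:
    """Collect documents and hand them to `flush` as one batch when either
    the size threshold or the age threshold is reached."""

    def __init__(self, flush: Callable[[List[dict]], None], max_size: int = 500,
                 max_age_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush
        self._max_size = max_size
        self._max_age = max_age_seconds
        self._clock = clock
        self._docs: List[dict] = []
        self._oldest: Optional[float] = None  # arrival time of the first buffered doc

    def add(self, doc: dict) -> None:
        if not self._docs:
            self._oldest = self._clock()
        self._docs.append(doc)
        self.flush_if_due()

    def flush_if_due(self) -> None:
        """Flush when the batch is full or its oldest document is too old."""
        if not self._docs:
            return
        too_big = len(self._docs) >= self._max_size
        too_old = self._clock() - self._oldest >= self._max_age
        if too_big or too_old:
            batch, self._docs = self._docs, []
            self._oldest = None
            self._flush(batch)


if __name__ == "__main__":
    buffer = BatchBuffer(lambda batch: print("bulk-indexing", len(batch), "docs"),
                         max_size=3)
    for i in range(7):
        buffer.add({"event": i})
```

In a real service flush_if_due would also be called from a periodic timer, so that a half-full batch is still sent once the time threshold passes.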
CONCLUSION
• Elasticsearch is easy to get running
• …and complex to scale
• Changes to a production setup are hard
• We must not hand off scalability and multitenancy tasks to the Ops team; they are
application problems
• Elasticsearch internals impose unintuitive limitations (“The law of leaky
abstractions”)
THANK YOU
Contacts: https://www.linkedin.com/in/bozhidar-bozhanov/
https://techblog.bozho.net
https://twitter.com/bozhobg
RESOURCES
• https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html
• https://techblog.bozho.net/elasticsearch-multitenancy-with-routing/
• https://techblog.bozho.net/near-real-time-indexing-with-elasticsearch/
• https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html
• https://www.loggly.com/blog/nine-tips-configuring-elasticsearch-for-high-performance/
• https://tech.ebayinc.com/engineering/elasticsearch-performance-tuning-practice-at-ebay/
