12/14/21, 6:29 PM Making Prometheus Highly Available (HA) & Scalable with Thanos
Making Prometheus Highly Available (HA) & Scalable with Thanos
Bhavin Gandhi
Dec 8th, 2018
In previous blog posts, we have talked about the basics of autoscaling, autoscaling using custom metrics, and Prometheus Operator, which cover various aspects of monitoring in Kubernetes. One thing we haven’t talked much about is the high availability (HA) of Prometheus in the cluster. What if the single Prometheus instance goes down? Or what if a single instance cannot handle metrics for the whole cluster and you need to scale horizontally? In this post, we will use Thanos to make Prometheus highly available (HA) and scalable.
Sounds exciting? Let’s get started!
Quick Prometheus tour
Prometheus is installed in a cluster and “scrapes” metrics from the application, but what does “scraping” mean? The application exposes the metric values at a particular port – let’s say 8080 – in the exposition format defined by Prometheus. Prometheus keeps hitting this URL (scraping the URL) at a given interval of time and shows these metrics on its dashboard. According to the retention period specified in the configuration, those values are kept in memory and later get stored on the disk where Prometheus is running. Based on the volume and logical isolation needed for various components, Prometheus can be installed in various topologies.
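As a sketch, a minimal scrape configuration for such an application might look like this (the job name and target address are hypothetical):

```yaml
# prometheus.yml – a minimal, hypothetical scrape configuration
global:
  scrape_interval: 30s   # how often Prometheus hits the metrics URL
scrape_configs:
  - job_name: demo-app
    static_configs:
      - targets: ['demo-app.default.svc:8080']  # app exposing metrics on port 8080
```

The retention period itself is set via Prometheus’ storage retention flag rather than in this file.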
Prometheus deployment topologies
Prometheus instance per cluster
If a single Prometheus instance is enough to scrape all of your workload, then the easiest solution is to just increase the number of replicas of Prometheus. This means that all the replicas will have the same scrape configuration and all instances will collect the same data. In practice, based on the interval and data collected, the values will be slightly different, but that is not a deal breaker overall, and this gives higher availability to Prometheus at the cluster level.
Prometheus instance per group of services
If the previous model does not work in your case, then splitting the task between two or three Prometheus instances could be one strategy (which is basically sharding). Each instance will be configured to scrape a group of services instead of scraping all the services. To make these instances highly available, we can scale the replicas per instance as needed.
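For illustration, two such sharded instances could each carry a different subset of scrape jobs (the service names here are hypothetical):

```yaml
# prometheus-shard-1.yml – scrapes only the frontend group (hypothetical services)
scrape_configs:
  - job_name: frontend
    static_configs:
      - targets: ['web.default.svc:8080', 'api.default.svc:8080']
---
# prometheus-shard-2.yml – scrapes only the backend group
scrape_configs:
  - job_name: backend
    static_configs:
      - targets: ['worker.default.svc:8080', 'cache.default.svc:8080']
```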
Prometheus instance per service
To split and scale it further, functional sharding is recommended. In this case, we will have one Prometheus instance scraping one service. And again, to make this highly available (HA) we can scale replicas for each instance. If more sharding is required, then it can be achieved by having separate instances of Prometheus scraping a set of metrics from a service – but in practice, very few use cases need this in my experience.
What about Alertmanager?
One of the side effects of multiple instances of Prometheus is that Alertmanager can get the same alert twice. Prometheus can discover all instances of Alertmanager and fires the alert against all of them. But Alertmanager forms a cluster with a gossip protocol, which takes care of deduplicating these alerts. More on this in the Prometheus Operator documentation and the Prometheus FAQ.
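As a rough sketch, that Alertmanager cluster is formed by pointing each replica at its peers (the names and addresses here are hypothetical, and the flags are from Alertmanager v0.15+):

```yaml
# Container args for one Alertmanager replica in a gossip cluster (sketch)
args:
  - --config.file=/etc/alertmanager/alertmanager.yml
  - --cluster.listen-address=0.0.0.0:9094            # gossip port for this replica
  - --cluster.peer=alertmanager-0.alertmanager:9094  # the other replicas
  - --cluster.peer=alertmanager-1.alertmanager:9094
```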
How to combine multiple Prometheus instances?
Now that we have multiple instances scraping different parts of our application, how do we get a global view of the whole data? How do we view graphs of the data in Grafana without switching data sources? One solution is to configure a meta Prometheus instance which will utilize the federation feature of Prometheus and scrape all the instances for some portion of the data. This way, we will have some kind of overview of all the metrics we are scraping. But is this an actual global view? No, we are probably missing a lot of data which is collected by our instances. Also, we have the overhead of configuring this meta Prometheus correctly to have the data aggregated.
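For reference, such a meta instance would use a federation scrape job along these lines (the shard target names are hypothetical):

```yaml
# Federation job on the meta Prometheus (sketch)
scrape_configs:
  - job_name: federate
    honor_labels: true            # keep the original labels from the shards
    metrics_path: /federate
    params:
      'match[]':
        - '{job=~".+"}'           # selects which portion of series to pull
    static_configs:
      - targets: ['prometheus-shard-1:9090', 'prometheus-shard-2:9090']
```

The `match[]` selector is exactly why this is not a true global view: only the series it matches are federated.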
Data volume!
While our instances collect a huge amount of data, and Prometheus’ TSDB supports a compression mechanism, we still have a limitation on the amount of data we can actually retain, as we will have to use either SSD- or HDD-backed storage for each instance. The cost of these volumes across multiple instances and meta instances can grow quite fast and is not economical beyond a certain scale.
Thanos to the rescue for highly available Prometheus
Thankfully, Thanos can solve some of the above problems. Thanos injects a sidecar into every instance of Prometheus, which makes it possible to have a real global view of metrics. It can also store the data from Prometheus’ disk in S3-compatible storage. Let’s quickly understand the various components that make Thanos work:
Sidecar
- Works as a proxy that serves Prometheus’ local data to the Querier over the gRPC-based Store API
- This allows the Querier to get data based on labels as well as time ranges
- Uploads the data blocks written by Prometheus to S3 storage
- This helps keep the Prometheus + Sidecar pair lightweight
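To make this concrete, the sidecar runs as an extra container in the Prometheus Pod, roughly like this (a sketch; the image tag is hypothetical, the bucket name is the one used later in this post, and flag names may differ between Thanos versions):

```yaml
# Thanos sidecar container alongside Prometheus (sketch)
- name: thanos-sidecar
  image: improbable/thanos:v0.1.0             # hypothetical version tag
  args:
    - sidecar
    - --prometheus.url=http://localhost:9090  # the local Prometheus it proxies
    - --tsdb.path=/var/prometheus             # same volume Prometheus writes blocks to
    - --gcs.bucket=thanos-store               # upload finished blocks to this bucket
```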
Querier
- Stateless, horizontally scalable
- Works like a Prometheus instance to the outside world; serves the same HTTP API as Prometheus
- Queries the Thanos components and Sidecars for data
- Deduplication and aggregation happen in the Querier
- Requires a unique label pair on all Prometheus instances across the cluster so that deduplication works correctly, i.e. if only the replica label differs for a metric, then it’s a duplicate
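A sketch of the corresponding Querier invocation, assuming each Prometheus instance carries an external label called `replica` (flag names may differ between Thanos versions; the `thanos-peers` service is the one created later in this post):

```yaml
# Thanos Querier container args (sketch)
args:
  - query
  - --query.replica-label=replica                      # label that only differs between HA replicas
  - --cluster.peers=thanos-peers.monitoring.svc:10900  # discover sidecars and stores via gossip
```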
Store
- Retrieves the data from the S3 store
- Participates in the gossip cluster and implements the Store API
- It’s just like another Sidecar for the Querier
- The data blocks on S3 may have several large files. Caching them would also require more storage in the Store component. Instead, it tries to align the data and uses an index to minimize the HTTP calls and bandwidth
Compactor
- Applies Prometheus’ local compaction mechanism to the historical data in the store
- Can be run as a periodic batch job
- Down-samples the data to 5-minute and 1-hour resolutions and stores basic query results like min, max, sum and count; the Querier selects the appropriate result from the data
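As an illustration, the Compactor points at the same bucket and a local scratch directory (a sketch; flag names may differ between Thanos versions):

```yaml
# Thanos Compactor container args (sketch)
args:
  - compact
  - --data-dir=/var/thanos/compact   # scratch space for compaction and down-sampling
  - --gcs.bucket=thanos-store        # same bucket the sidecars upload to
```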
Ruler
- Evaluates rules and alerts against the Thanos Querier to give a global, unified view of data
Image credits: Improbable Worlds Ltd
Read more about this in Introducing Thanos: Prometheus at scale by Improbable Worlds.
How to deploy Thanos in a multi-cluster environment: hands on
Multi-cluster deployment is fairly common for building highly available services. Let’s take a scenario of two clusters deployed in AZs of two regions, with Thanos aggregating the data from these two regions. In this case, Thanos makes it really simple to have a global view of data across all the clusters. We can view correctly aggregated graphs as we will be using the same S3-compatible storage bucket as a backend.
Let’s create two clusters in different zones and deploy a demo application on top of them. We are using GKE clusters in this case. The two clusters we have here are:
bhavin-cluster-1 in asia-south1-a
bhavin-cluster-2 in asia-east1-a
Creating the GCP storage bucket
We will create a bucket ‘thanos-store’, where all the data will get stored. We will also create a Service Account which will allow our Thanos components to store and fetch the metrics data to and from the bucket.
Create a bucket: Storage -> Create bucket
Add a Service Account with Roles Storage Object Creator and Viewer
Generate a new key for the Service Account and download the JSON file. Rename it to gcs-[Link]
Creating the secret to store the JSON
The JSON file we downloaded in the last step is used to create a secret in the Kubernetes cluster. We will deploy all the components in the namespace monitoring, so let’s create that first.
# Create the monitoring namespace
$ kubectl create namespace monitoring
namespace/monitoring created
# Create the secret
$ kubectl create secret generic gcs-credentials --from-file=[Link] -n monitoring
secret/gcs-credentials created
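For later reference, the Thanos containers pick these credentials up via a volume mount and the standard GOOGLE_APPLICATION_CREDENTIALS environment variable, roughly like this (the key file name inside the secret is hypothetical):

```yaml
# Mounting the gcs-credentials secret into a Thanos container (sketch)
env:
  - name: GOOGLE_APPLICATION_CREDENTIALS
    value: /etc/gcs/credentials.json   # hypothetical file name inside the secret
volumeMounts:
  - name: gcs-credentials
    mountPath: /etc/gcs
    readOnly: true
volumes:
  - name: gcs-credentials
    secret:
      secretName: gcs-credentials
```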
How to deploy the Prometheus plus Thanos setup for high availability?
Now we will deploy the Prometheus plus Thanos setup. In order to scrape all the Pods from different namespaces, we will have to create RBAC rules as well.
# RBAC for Prometheus
$ kubectl -n monitoring create -f prometheus/[Link]
serviceaccount/prometheus-server created
[Link]/prometheus-server created
[Link]/prometheus-server created
# Deploy Prometheus + Thanos sidecar
$ kubectl -n monitoring create -f thanos/kube/manifests/[Link]
[Link]/prometheus-gcs created
configmap/prometheus-config-gcs created
service/prometheus-gcs created
service/thanos-peers created
# Deploy Thanos Query
$ kubectl -n monitoring create -f thanos/kube/manifests/[Link]
[Link]/thanos-query created
service/thanos-query created
Deploying the application
This will create a deployment ‘mockmetrics’. It generates a few metrics with random values.
$ kubectl create -f [Link]
[Link]/mockmetrics created
Creating a load balancer
Now we have two clusters in different regions with an application
running on them. There are two Prometheus
instances on each cluster
which are scraping the metrics from our application. Let’s create a
single end point,
which can be used with Grafana etc.
thanos-query service is running as NodePort in both the clusters at
port 30909. We will create a load balancer
pointing to the node
pools of both the clusters.
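For context, the thanos-query Service in each cluster is exposed along these lines (a sketch reconstructed from the port mentioned above; the label selector is hypothetical):

```yaml
# thanos-query exposed as NodePort on 30909 (sketch)
apiVersion: v1
kind: Service
metadata:
  name: thanos-query
spec:
  type: NodePort
  selector:
    app: thanos-query   # hypothetical label selector
  ports:
    - port: 9090        # Thanos Query HTTP port
      nodePort: 30909
```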
Accessing Thanos
Visiting the load balancer IP at port 80 will show us the Thanos UI. Tools relying on the Prometheus API endpoint can use this IP.
Data from two instances of Prometheus
Deduplicated view of data
Storage Bucket
Conclusion
Thanos provides an economical, yet scalable way to aggregate data from multiple Prometheus clusters and provides a single pane of glass to users. The project has a lot of promise and makes scaling Prometheus clusters really easy.
Hope you enjoyed the process of making Prometheus highly available and scalable with Thanos. Follow us on Twitter and LinkedIn for regular posts like this.
References & further reading
Thanos’ design doc: Cost comparison
Modified manifests used for deploying Prometheus and Thanos
Thanos GitHub