Monitoring a Kubernetes-backed microservice
architecture with Prometheus
Björn “Beorn” Rabenstein — SoundCloud
Fabian Reinartz — CoreOS
SoundCloud 2012 – from monolith …
… to microservices
Orchestration needed
Run containers in a cluster…
In-house innovation: Bazooka – PaaS, Heroku style.
Problems:
- Only 12-factor apps (stateless etc.).
- Limited resource isolation.
- No sidecars.
- Maturity.
Meanwhile, the open-source world has evolved…
K U B E R N E T E S
Kubernetes
- inspired by Google’s Borg
- not Borg
Today:
Tomorrow:
X
Labels
Labels are the new hierarchies!
name = MyAPI
app = MyAPI
version = 0.4
app = MyAPI
version = 0.5
app = MyAPI
version = 0.4
app = MyDB
cluster = auth
app = MyDB
cluster = misc
select by [ app = MyAPI ]
service
pod
pod
pod
pod
pod
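The "select by [ app = MyAPI ]" idea on this slide can be illustrated with a small Python sketch. It is not how Kubernetes implements selectors; the pod names and the `select` helper are hypothetical, and only the label sets are taken from the slide.

```python
# Hypothetical in-memory pods; the label sets mirror the ones on the slide.
pods = [
    {"name": "pod-1", "labels": {"app": "MyAPI", "version": "0.4"}},
    {"name": "pod-2", "labels": {"app": "MyAPI", "version": "0.5"}},
    {"name": "pod-3", "labels": {"app": "MyAPI", "version": "0.4"}},
    {"name": "pod-4", "labels": {"app": "MyDB", "cluster": "auth"}},
    {"name": "pod-5", "labels": {"app": "MyDB", "cluster": "misc"}},
]


def select(pods, selector):
    """Return the pods whose labels contain every key/value pair in selector."""
    return [
        pod
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    ]


# A service selecting on app = MyAPI picks up every version of the API.
api_pods = select(pods, {"app": "MyAPI"})
# Adding a second label narrows the selection without any fixed hierarchy.
v05_pods = select(pods, {"app": "MyAPI", "version": "0.5"})

print([pod["name"] for pod in api_pods])
print([pod["name"] for pod in v05_pods])
```

Because a selector is just a set of key/value pairs, the same pods can be grouped by app, by version, or by both, which is the sense in which labels replace a single fixed hierarchy.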
Graphite
Monitoring at SC 2012
Service A
Service B
Service C
Monitoring challenges
- A lot of traffic to monitor
  - Monitoring traffic should not be proportional to user traffic
- Way more targets to monitor
  - One host can run many containers
- And they constantly change
  - Deploys, scaling, rescheduling unhealthy instances …
- Need a fleet-wide view
  - What’s my overall 99th percentile latency?
- Still need to be able to drill down for troubleshooting
  - Which instance/endpoint/version/... causes those errors I’m seeing?
- Meaningful alerting
  - Symptom-based alerting for pages, cause-based alerting for warnings
  - See Rob Ewaschuk’s “My philosophy on alerting” https://goo.gl/2vrpSO
Monitor everything, all levels, with the same system
Level                What to monitor (examples)                       What exposes metrics (example)
Network              Routers, switches                                SNMP exporter
Host (OS, hardware)  Hardware failure, provisioning, host resources   Node exporter
Container            Resource usage, performance characteristics      cAdvisor
Application          Latency, errors, QPS, internal state             Your own code
Orchestration        Cluster resources, scheduling                    Kubernetes components
P R O M E T H E U S
“Obviously the solution to all our problems with everything forever, right?”
Benjamin Staffin, Fitbit Site Operations
Prometheus
- inspired by Google’s Borgmon
- not Borgmon
- initially developed at SoundCloud, open-source from the beginning
- public announcement early 2015
- collects metrics at scale via HTTP (think: yet another client of your microservice)
- thousands of targets, millions of time series, 800k samples/s, no dependencies
- easy to scale
Features – multi-dimensional data model
http_requests_total{instance="web-1", path="/index", status="500", method="GET"}
http_requests_total{instance="web-1", path="/index", status="404", method="POST"}
http_requests_total{instance="web-3", path="/index", status="200", method="GET"}
#metrics x #values(instance) x #values(path) x #values(status) x #values(method)
▶ millions of time series
Labels are the new hierarchies!
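The product formula on this slide can be worked through with a few numbers. The counts below are made up for illustration, and the result is an upper bound: in practice not every label combination occurs.

```python
from math import prod

# Hypothetical number of distinct values per label.
label_value_counts = {"instance": 100, "path": 20, "status": 5, "method": 4}


def max_series(num_metrics, label_value_counts):
    """Upper bound on time series: #metrics times the product of #values(label)."""
    return num_metrics * prod(label_value_counts.values())


# 50 metrics carrying these four labels (an assumed figure).
upper_bound = max_series(50, label_value_counts)
print(upper_bound)
```

Even these modest per-label counts multiply out to two million potential series, which is why unbounded label values (user IDs, for example) are best kept out of labels.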
Features – powerful query language
The 3 path-method combinations with the highest number of failing requests?
topk(3, sum by(path, method) (
  rate(http_requests_total{status=~"5.."}[5m])
))
The 99th percentile request latency by request path?
histogram_quantile(0.99, sum by(le, path) (
  rate(http_requests_duration_seconds_bucket[5m])
))
The questions to ask are often not known beforehand.
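To give a feel for what the percentile query estimates, here is a simplified Python sketch of quantile estimation from cumulative histogram buckets, using linear interpolation inside the bucket that contains the target rank. It is an illustration under assumptions, not Prometheus's implementation: the function name is reused only for readability, the bucket data is invented, and several edge cases are handled more loosely than in the real function.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` plays the role of the `le` label: each count covers all
    observations less than or equal to its upper bound, and the last
    bucket is expected to have upper bound +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies above the highest finite bound.
                return prev_bound
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical latency buckets in seconds for one request path.
buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
p75 = histogram_quantile(0.75, buckets)
p995 = histogram_quantile(0.995, buckets)
print(p75, p995)
```

In the query above, `sum by(le, path)` first adds up the bucket rates of all instances, so each path ends up with a single set of buckets before the quantile is estimated. The accuracy of the estimate depends on how well the bucket boundaries fit the real latency distribution.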
Features – powerful query language
topk(3, sum by(path, method) (
  rate(http_requests_total{status=~"5.."}[5m])
))
{path="/api/comments", method="POST"} 105.4
{path="/api/user/:id", method="GET"} 34.122
{path="/api/comment/:id/edit", method="POST"} 29.31
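The aggregation in this query can be mimicked in plain Python to show what `sum by` and `topk` do with the label sets. This is a sketch with invented input values; the `sum_by` and `topk` helpers are hypothetical stand-ins and say nothing about how Prometheus evaluates queries internally.

```python
from collections import defaultdict

# Hypothetical per-series rates of 5xx responses (requests per second),
# i.e. what rate(http_requests_total{status=~"5.."}[5m]) might return.
series = [
    ({"instance": "web-1", "path": "/api/comments", "method": "POST", "status": "500"}, 60.0),
    ({"instance": "web-2", "path": "/api/comments", "method": "POST", "status": "503"}, 45.0),
    ({"instance": "web-1", "path": "/api/user/:id", "method": "GET", "status": "500"}, 34.0),
    ({"instance": "web-3", "path": "/api/comment/:id/edit", "method": "POST", "status": "500"}, 29.0),
    ({"instance": "web-3", "path": "/index", "method": "GET", "status": "500"}, 2.0),
]


def sum_by(series, keys):
    """Sum values over all series that share the same values for `keys`."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[tuple(labels.get(key, "") for key in keys)] += value
    return dict(totals)


def topk(k, grouped):
    """Return the k groups with the largest values, largest first."""
    return sorted(grouped.items(), key=lambda item: item[1], reverse=True)[:k]


top3 = topk(3, sum_by(series, ["path", "method"]))
for (path, method), value in top3:
    print(path, method, value)
```

Labels not named in `sum by` (here `instance` and `status`) are dropped from the result, which is why the output on the slide only carries `path` and `method`.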
Features – easy instrumentation
from prometheus_client import start_http_server, Histogram

# Create a metric to track time spent and requests made.
REQUEST_TIME = Histogram('request_processing_seconds', 'Time spent processing request')

# Decorate function with metric.
@REQUEST_TIME.time()
def process_request(t):
    # do work …
    return

# Expose the metrics over HTTP on port 8000 for Prometheus to scrape.
start_http_server(8000)
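When Prometheus scrapes the HTTP endpoint, it receives the metrics in a plain-text format. As a rough illustration of that format, the sketch below renders a counter by hand. The `render_counter` helper and the sample values are hypothetical, and escaping of special characters in label values is deliberately left out; in real code the client library does all of this.

```python
def render_counter(name, help_text, samples):
    """Render one counter family in the Prometheus text exposition format.

    `samples` is a list of (labels, value) pairs. Label values are assumed
    to contain no quotes, backslashes, or newlines (no escaping is done).
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical sample values for the metric used on the earlier slides.
output = render_counter(
    "http_requests_total",
    "Total HTTP requests.",
    [
        ({"method": "GET", "status": "200"}, 1027),
        ({"method": "POST", "status": "500"}, 3),
    ],
)
print(output, end="")
```

Each line is one time series with its current value, so anything that can serve this text over HTTP can be a scrape target.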
Integrations (selection)
K U B E R N E T E S
P R O M E T H E U S
Monitoring a Kubernetes-backed microservice architecture with Prometheus
Three questions
- How to monitor services running on Kubernetes with Prometheus?
- How to monitor Kubernetes and containers with Prometheus?
- How to run Prometheus on Kubernetes?
Monitoring Services
svc = service-X
namespace = production
version = 22.4
version = 22.5
Monitoring Services via Exporters
svc = mysql
namespace = production
MySQL
exporter
MySQL
Monitoring Kubernetes
namespace = kube-system
apiserver, etcd, kubelet (×6)
Running Prometheus on Kubernetes
- So far: Prometheus ran outside of cluster
  - Pod IPs must be routable
  - Conventional deployment (Chef, Puppet, …)
  - Service discovery needs authentication
- To run Prometheus inside of cluster:
  kubectl run --image="quay.io/prometheus/prometheus:0.18.0" prometheus
Monitoring Services
svc = service-X
namespace = production
version = 22.4
version = 22.5
app = prometheus
Monitoring Kubernetes
namespace = kube-system
apiserver, etcd, kubelet (×6)
app = prometheus
namespace = infra
What about storage?
A) None
B) Network/Cloud volumes
C) Host volumes
DEMO
CC BY 2.0
Author: carstingaxion / Carsten Bach
The end
