Monitoring Flink with Prometheus
Flink Forward Berlin 2018
Maximilian Bode
whoami
Software Engineer with
Focus on
Data-Intensive Applications
Site Reliability Engineering
open-source, metrics-based monitoring system
simple yet powerful data model & query language
client libraries in all popular languages
high-performance and simple to run
 prometheus/prometheus
Metrics
Time series of 64-bit floating-point numbers
Labels
Key-value pairs associated with time series
Scrape
Act of fetching metrics via HTTP request
TSDB
Prometheus storage layer, prometheus/tsdb
PromQL
Query language, used for graphing and alerting
flink_jobmanager_job_uptime{job_name="PrometheusExampleJob"}
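The selector above names a metric and one label. As a rough illustration of how the same pieces (metric name, labels, 64-bit float value) appear in the text a scrape returns, here is a minimal Java sketch that renders one sample line in the Prometheus text exposition format. The class and method names (`ExpositionSketch`, `formatSample`, `escape`) and the sample value are made up for this illustration; this is not how Flink's reporter is implemented.

```java
import java.util.Collections;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of Flink or Prometheus.
class ExpositionSketch {

    // Label values escape backslash, double quote and newline in the text format.
    static String escape(String value) {
        return value.replace("\\", "\\\\")
                    .replace("\"", "\\\"")
                    .replace("\n", "\\n");
    }

    // Renders: name{key="value",...} value
    // The braces are omitted entirely when there are no labels.
    static String formatSample(String name, Map<String, String> labels, double value) {
        StringJoiner joined = new StringJoiner(",", "{", "}");
        joined.setEmptyValue("");
        for (Map.Entry<String, String> label : labels.entrySet()) {
            joined.add(label.getKey() + "=\"" + escape(label.getValue()) + "\"");
        }
        return name + joined + " " + value;
    }

    public static void main(String[] args) {
        // 42.0 is an arbitrary sample value chosen for the example.
        Map<String, String> labels =
                Collections.singletonMap("job_name", "PrometheusExampleJob");
        System.out.println(formatSample("flink_jobmanager_job_uptime", labels, 42.0));
    }
}
```

Each distinct combination of metric name and label values forms its own time series, which is why the PromQL selector can pick one job out by `job_name`.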



grafana/grafana
prometheus/alertmanager
- alert: FlinkJobsMissing
  expr: sum(flink_api_jobs_running) < 2
  for: 3m
  annotations:
    summary: Fewer Flink jobs than expected are running.
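The `for: 3m` clause in the rule above means the alert only fires once the expression has been true continuously for three minutes; until then it is pending. The following Java sketch models that behaviour as a tiny state machine. It is a simplified illustration under stated assumptions: the class `ForClauseSketch` and its API are hypothetical, timestamps are passed in by the caller, and real Prometheus evaluates rules on a fixed interval and tracks state per label set.

```java
import java.time.Duration;

// Simplified, hypothetical model of the "for" clause; not Prometheus code.
class ForClauseSketch {

    enum State { INACTIVE, PENDING, FIRING }

    private final Duration holdFor;
    // Time at which the condition first became true, or null while it is false.
    private Long activeSinceMillis = null;

    ForClauseSketch(Duration holdFor) {
        this.holdFor = holdFor;
    }

    // conditionTrue stands in for the result of an expression
    // such as sum(flink_api_jobs_running) < 2 at time nowMillis.
    State evaluate(boolean conditionTrue, long nowMillis) {
        if (!conditionTrue) {
            activeSinceMillis = null; // any false evaluation resets the timer
            return State.INACTIVE;
        }
        if (activeSinceMillis == null) {
            activeSinceMillis = nowMillis;
        }
        boolean heldLongEnough = nowMillis - activeSinceMillis >= holdFor.toMillis();
        return heldLongEnough ? State.FIRING : State.PENDING;
    }

    public static void main(String[] args) {
        ForClauseSketch rule = new ForClauseSketch(Duration.ofMinutes(3));
        long[] evaluationTimes = {0L, 60_000L, 120_000L, 180_000L};
        for (long t : evaluationTimes) {
            System.out.println(t / 1000 + "s: " + rule.evaluate(true, t));
        }
    }
}
```

The practical effect is that a short dip, for example one job restarting, does not page anyone unless it lasts the full three minutes.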
      
PrometheusReporter
1. Copy reporter jar in lib directory
2. Configure in conf/flink-conf.yaml
cp /opt/flink-metrics-prometheus-1.6.0.jar /lib
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.Prometheus
metrics.reporter.prom.port: 9999
docs
3. Let Prometheus scrape
a) Statically in prometheus.yml
b) Service Discovery
scrape_configs:
  - job_name: 'jobmanager'
    static_configs:
      - targets: ['jobmanager:9999']
  - job_name: 'taskmanager1'
    static_configs:
      - targets: ['taskmanager1:9999']
  - [...]
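A scrape is a plain HTTP GET against each configured target; by default Prometheus requests the `/metrics` path. To make that concrete, here is a self-contained Java sketch that starts a throwaway local HTTP server as a stand-in for a reporter endpoint and fetches it the way a scraper would. Everything here is an assumption for illustration: the class `ScrapeSketch`, the sample body and the loopback address are made up, and no Flink process is involved.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical demo; the local server only imitates a metrics endpoint.
class ScrapeSketch {

    static final String SAMPLE_BODY =
            "flink_jobmanager_job_uptime{job_name=\"PrometheusExampleJob\"} 42.0\n";

    // Performs one "scrape": an HTTP GET that returns the response body as text.
    static String scrape(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Starts the stand-in endpoint on a free port, scrapes it once, then stops it.
    static String demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/metrics", exchange -> {
                byte[] body = SAMPLE_BODY.getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().add("Content-Type", "text/plain; version=0.0.4");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                return scrape("http://127.0.0.1:" + port + "/metrics");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Against a real setup, the same GET pointed at a target from the config above (for example `jobmanager:9999`) is a quick way to check that the reporter is up before debugging Prometheus itself.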
4. Use custom metrics in your jobs
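import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

class CountingMap extends RichMapFunction<Integer, Integer> {
    // Transient: the counter is created in open() rather than serialized with the function.
    private transient Counter eventCounter;

    @Override
    public void open(Configuration parameters) {
        // Register a counter named "events" in this operator's metric group.
        eventCounter = getRuntimeContext().getMetricGroup().counter("events");
    }

    @Override
    public Integer map(Integer value) {
        eventCounter.inc(); // count every element that passes through
        return value;
    }
}
"Praying squirrel" by Michael Seeley (CC BY 2.0)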
Debugging & Monitoring / Metrics [Flink docs]
Introduction / Overview [Prometheus docs]
Prometheus Up & Running, Brian Brazil
prometheus/pushgateway
Remote endpoints & storage [Prometheus docs]
improbable-eng/thanos




mbode/flink-prometheus-example
maximilian.bode@tng.tech
@mxpbode
Maximilian Bode
