Tikal Knowledge
Haggai Philip Zagury - DevOps Group Lead - Tikal Knowledge
FullStack Developers Israel
INTRO - WHO WE ARE
WHO WE ARE?
▸ Tikal helps ISVs in Israel & abroad with their technological challenges.
▸ Our engineers are full-stack developers with expertise in Android, DevOps, Java, JS, Ruby & Python.
▸ We are passionate about technology and specialize in open-source technologies.
▸ Our tech and group leaders help establish & enhance existing software teams with innovative & creative thinking.
https://www.meetup.com/full-stack-developer-il/
INTRODUCTION TO MODERN MONITORING
CURRENT STATUS [ INFRASTRUCTURE ]
▸ AWS, Cloud, Hybrid / Multi Cloud
▸ Define metrics and system health based on experience and application
specific behaviors.
▸ Many False Positives
▸ Scaling is hard [ semi-auto, manual ]
COMMON MONITORING STATUS
▸ OPS own monitoring domain
▸ Define metrics and system health based on experience and application
specific behaviours.
▸ Many False Positives
▸ Scaling is hard [ semi-auto, manual ]
COMMON MONITORING SOLUTIONS
▸ CloudWatch
▸ New Relic
▸ Nagios
▸ AppDynamics
▸ Datadog
▸ Many more …
GOALS
▸ Improve existing monitoring and RCA indicators
▸ Reduce false positives & ‘customer driven alerting’
▸ Proactively identify data anomalies / deviations
▸ Provide meaningful / intelligent notifications [ severity, SLA compliance etc ]
▸ Proactively remediate commonly known issues, or set the foundation of a
robust substitute
▸ Provide KPI integration policy & methodology for both DevOps & R&D teams
CHALLENGES
▸ Preserve the knowledge and insights in the existing Monitoring system
▸ Cultural changes:
▸ APM is part of the development process
▸ Monitoring tools are part of the developer stack (or developers will be woken up by any issue with their code/app)
▸ On-call isn’t only for OPS … Everybody’s accountable
▸ Break down the “wall of confusion” between dev and ops
PHILOSOPHY
The Gap of Traditional Monitoring
- We know what we want to know …
System Metrics
Not enough || Too much, a little too late
We do not always know what we are looking at / for …
Is this OK ?! || Normal
What happened at 4AM
If you’re lucky
+
= No action needed
Go back to sleep
( you still woke up! )
REALITY
Murphy’s law …
Stop using Nagios
(so it can die peacefully)
Feb 13, 2014 [ slideshare ]
In 2 words:
Configuration files…
In a few more:
- resources
- services
- dependencies
- …
Traditional Monitoring
• Reliable
• Durable
• Scalable
Conclusion …
system monitoring does not suffice, enter APM
HOW DID WE GET HERE
TRADITIONAL MONITORING WAS(IS) ALL ABOUT THE “BLACK BOX” | “OS” METRICS
▸ All we care about is that the system is OK …
APPLICATION FRONTEND
APPLICATION BACKEND
APPLICATION DATABASE
OPS ARE WORKING ON OPTIMIZING INFRASTRUCTURE …
▸ Throw more RAM at “Reports”
▸ Add another node to the “FE cluster”
▸ Add another shard to the DB …
▸ ….
APPLICATION …
IN THE PAST ~10 YEARS
▸ Developers have started to implement METRICS
▸ Organizations are adopting Standards
▸ Common metrics have become a commodity
REALITY PREVAILS
APPLICATION FRONTEND
APPLICATION BACKEND
APPLICATION DATABASE
APPLICATION …
Multiple Dimensions
• [ Stability ]
• Ops dimension
• [ Innovation ]
• Dev dimension
• Product dimension
Even More
• Environment [ stg, uat, prod ]
• Application Stack(s) || tags || types
• Business metrics
TEAMS | SCOPES | METRICS - COME TOGETHER
Apply
MONITORING CRITERIA
▸ Server (OS) level monitoring
▸ Application Monitoring (APM)
▸ Perimeter (External website) monitoring
▸ Event driven remediation
▸ Alerting and Escalation
▸ Associated log data & anomaly detection
REQUIRED FEATURES
Accessibility
Scheduling
SLA’s assured
Auth & Authorization
Escalation
Durable & Resilient
Forensics
Automatic
Flexible & Elastic
Accountable
IT’S AN ITERATIVE PROCESS
▸ How quickly did we recover ?
▸ What worked / Didn’t work ?
▸ Iterative improvements [ Chaos Monkey, 10 story test ]
▸ RCA -> Remediation [ a.k.a False positive lifecycle ]
METHODOLOGY
HOW TO DEFINE A METRIC OR ALERT VS. HOW TO STORE DATA
▸ A Metric’s Lifecycle & Design
▸ Time Series Data stream(s) || source(s)
▸ Common tagging
▸ Metric naming conventions and implications
▸ Micro Services, Integration of Traditional and New Generation solutions
▸ Choose short, mid & long term tools / services
A METRIC’S LIFECYCLE
NEW (A) METRIC
INFRASTRUCTURE (OS)
APPLICATION
EXTERNAL (DEPENDENCY / ENDPOINT)
REMEDIABLE ?
ALERTABLE ?
LOG CORRELATION
SCOPE OF IMPACT
LEARN IN DEV | STG -> DEFINE IN DEV | STG -> SHIP TO PROD
A METRIC’S LIFECYCLE - “TAG-ABLE” == FILTERABLE | MEASURABLE | QUANTIFIABLE
ENVIRONMENT: DEVELOPMENT | STAGING | PRODUCTION
A METRIC’S LIFECYCLE
- QUANTIFIABLE METRICS: SEVERITY, CRITICAL STATE
- EXPOSING A SERVICE
- CONSUMING A SERVICE
- - WHY DOES MY SERVICE HAVE AN OS IMPACT ?
- - IS IT BY DESIGN ?
- FALLBACK METHODS ?
- ALTERNATE ENDPOINTS / RETRY ?
- FEATURE TOGGLE
- DEFINE SEVERITY
TSD PRINCIPLES
Credit -> http://opentsdb.net/overview.html
DATAPOINTS
Credit -> https://www.datadoghq.com/blog/the-power-of-tagged-metrics/
In tools like Prometheus you don't need to send the timestamp; the collection (scrape) timestamp is used.
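A minimal sketch of what a tagged datapoint looks like when it is rendered in the Prometheus text exposition format: a metric name, a set of key/value labels and a numeric value, with no timestamp attached. The function name `format_sample` and the example metric and labels are illustrative assumptions, not taken from the slides.

```python
def format_sample(name, labels, value):
    """Render one datapoint as a line of Prometheus text exposition format."""

    def escape(text):
        # Label values escape backslash, double quote and newline.
        return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    if not labels:
        return f"{name} {value}"
    # Sort the labels so the output is stable and easy to compare.
    pairs = ",".join(f'{key}="{escape(val)}"' for key, val in sorted(labels.items()))
    return f"{name}{{{pairs}}} {value}"


if __name__ == "__main__":
    # Hypothetical metric and tags, for illustration only.
    print(format_sample("http_requests_total", {"env": "prod", "method": "GET"}, 1027))
```

Each distinct combination of label values becomes its own time series, which is what makes the tags filterable and measurable later on.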
MIX ’N’ MATCH
SHORT | MID | LONG TERM SOLUTIONS
PROMETHEUS
https://github.com/prometheus/prometheus
FEATURES
▸ Open-source systems monitoring and alerting toolkit
▸ A multi-dimensional data model (time series identified by metric name and key/value pairs)
▸ A flexible query language to leverage this dimensionality
▸ No reliance on distributed storage; single server nodes are autonomous
▸ Time series collection happens via a pull model over HTTP
▸ Pushing time series is supported via an intermediary gateway
▸ Targets are discovered via service discovery or static configuration
▸ Multiple modes of graphing and dashboarding support
PROMETHEUS ARCHITECTURE
[ Architecture diagram: Prometheus server (retrieval / collection, data series storage [DB], PromQL, web UI), Alertmanager, dashboarding, Push Gateway, service discovery providers, Prometheus exporters, other Prometheus server(s) ]
UNTIL NOW
‣ Try providing this to each developer
‣ Sensu has a very similar approach to APM …
‣ Complexity is the barrier …
UNTIL NOW
‣ Pull has become an advantage …
‣ Severity is implied [TSD]
‣ False Positives reduction
‣ Docker makes it super simple
‣ Go Lang lightweight approach
IMPLEMENTATION
‣ Review old system metrics & capabilities and decide what’s good and what’s bad
‣ What can move
‣ What needs to stay | integrate into the new system
‣ Prometheus deployment is automated from day 1
‣ Prometheus exporter services are tagged and labeled per application stack | layer
‣ Preferably Dockerized
‣ Metric design workshops | meetings | Slack group
‣ Alert design workshops | meetings | Slack group
‣ Team metric tags and alerting & escalation
STEP1 - IMPLEMENT DISCOVERY
AWS Discovery -> https://github.com/prometheus/prometheus/tree/master/discovery
[ Diagram labels: NEW NODE DEPLOYMENT, SERVICE DISCOVERY, DEV / STAGING / PRODUCTION, STACK / APP NAME, Alertmanager ]
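The slide points at Prometheus' built-in discovery mechanisms. As a hedged illustration of the same idea (every discovered node ends up as a target labelled with its environment and stack), the sketch below uses a different mechanism, file-based discovery: it groups a hypothetical node inventory into the JSON target-group structure that `file_sd_configs` reads. The inventory fields `host`, `env` and `stack`, and the function name, are assumptions made for this example.

```python
import json
from collections import defaultdict


def build_target_groups(nodes, port=9100):
    """Group nodes by (env, stack) into file_sd-style target groups."""
    groups = defaultdict(list)
    for node in nodes:
        groups[(node["env"], node["stack"])].append(f'{node["host"]}:{port}')
    return [
        {"targets": sorted(targets), "labels": {"env": env, "stack": stack}}
        for (env, stack), targets in sorted(groups.items())
    ]


if __name__ == "__main__":
    # Hypothetical inventory; in practice this would come from the cloud API or CMDB.
    inventory = [
        {"host": "10.0.0.5", "env": "staging", "stack": "reports"},
        {"host": "10.0.1.7", "env": "production", "stack": "reports"},
        {"host": "10.0.1.8", "env": "production", "stack": "reports"},
    ]
    print(json.dumps(build_target_groups(inventory), indent=2))
```

Writing the result to a file referenced from `file_sd_configs` lets a deployment job register new nodes without editing the main Prometheus configuration.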
STEP2 - IMPLEMENT EXPORTERS
https://prometheus.io/docs/instrumenting/exporters/
Official node exporter -> https://github.com/prometheus/node_exporter
Mssql Exporter -> https://hub.docker.com/r/awaragi/prometheus-mssql-exporter/
Nagios Exporter -> https://github.com/m-lab/prometheus-nagios-exporter
STEP3 - IMPLEMENT CUSTOM APPLICATION METRICS
https://prometheus.io/docs/instrumenting/exporters/
Windows WMI -> https://github.com/martinlindhe/wmi_exporter
Java -> https://github.com/prometheus/jmx_exporter
node.js -> https://www.npmjs.com/browse/keyword/prometheus
.Net -> https://github.com/andrasm/prometheus-net
STEP4 - ADAPT TO YOUR INFRA MONITORING [ FILTER || TAG || SELECTOR ]
kubernetes_sd_config
STEP 5 - METRIC DESIGN
‣ Review sample METRICS and GRAPHS
‣ Define | Reuse
‣ Naming conventions { https://prometheus.io/docs/practices/naming/ }
‣ Quantifiable [ numbers not strings … ]
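A small sketch of how the naming conventions could be checked automatically during a metric design review. The regular expression is the character set Prometheus documents for metric names; the `_total` check for counters and the short list of non-base-unit suffixes are a simplified, incomplete reading of the naming guide, and the function name is hypothetical.

```python
import re

# Character set documented for Prometheus metric names.
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")

# Hypothetical, incomplete list of non-base-unit suffixes to flag.
NON_BASE_UNIT_SUFFIXES = ("_ms", "_milliseconds", "_kb", "_kilobytes", "_mb", "_megabytes")


def lint_metric_name(name, kind):
    """Return a list of naming problems for a metric of the given kind."""
    problems = []
    if not METRIC_NAME_RE.match(name):
        problems.append("invalid characters in name")
    if kind == "counter" and not name.endswith("_total"):
        problems.append("counter should end in _total")
    if name.endswith(NON_BASE_UNIT_SUFFIXES):
        problems.append("prefer base units such as seconds or bytes")
    return problems


if __name__ == "__main__":
    for metric, kind in [("http_requests_total", "counter"), ("request_latency_ms", "gauge")]:
        print(metric, lint_metric_name(metric, kind))
```

A check like this could run in the same pipeline that ships alerts and dashboards, so naming problems surface before a metric reaches production.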
DASHBOARDS
DEVELOPER TOOL
DEVELOPER TOOL - SIMPLE GRAPHS
DEVELOPER TOOL - METRICS - USING PROMQL
▸ Simple queries:
▸ rate(http_requests_total[5m])
▸ Linear predictions
▸ predict_linear(node_filesystem_free[1h], 4*3600)
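To make the two queries above less abstract, here is a simplified sketch of the arithmetic behind them, written in plain Python over a list of `(timestamp_seconds, value)` samples. It is not how Prometheus implements these functions: the rate ignores counter resets and window extrapolation, and the prediction is taken relative to the last sample rather than the evaluation time. The function names are hypothetical.

```python
def simple_rate(samples):
    """Average per-second increase between the first and last sample."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


def predict_linear_sketch(samples, seconds_ahead):
    """Fit a least-squares line and evaluate it seconds_ahead past the last sample."""
    count = len(samples)
    mean_t = sum(t for t, _ in samples) / count
    mean_v = sum(v for _, v in samples) / count
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + seconds_ahead)


if __name__ == "__main__":
    # Hypothetical free-space samples (bytes) taken one minute apart.
    free_bytes = [(0, 1000.0), (60, 940.0), (120, 880.0)]
    print(simple_rate(free_bytes))
    print(predict_linear_sketch(free_bytes, 4 * 3600))
```

A negative prediction for free disk space is the kind of signal that can turn a threshold alert into an early warning.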
GRAFANA - SIMILAR WORKING EXPERIENCE - MUCH NICER
STEP 6 - ALERT DESIGN
‣ Review new METRICS and GRAPHS define | design thresholds
‣ Define Severity
‣ Ownership
‣ Escalation ladder
ALERTING
ALERT DESIGN
▸ ALERT <alert name>
▸ IF <expression>
▸ [ FOR <duration> ]
▸ [ LABELS <label set> ]
▸ [ ANNOTATIONS <label set> ]
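The optional `FOR <duration>` clause is what keeps a short spike from paging anyone: the expression has to stay true for the whole duration before the alert fires. The sketch below models that behaviour as a small state machine over a series of rule evaluations. It is an illustration of the idea under simplifying assumptions, not Prometheus' implementation, and the function name and state strings are hypothetical.

```python
def alert_states(evaluations, for_seconds):
    """Map (timestamp, condition_is_true) evaluations to alert states."""
    states = []
    active_since = None
    for timestamp, condition in evaluations:
        if not condition:
            # The condition cleared, so the pending timer starts over.
            active_since = None
            states.append((timestamp, "inactive"))
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        states.append((timestamp, "firing" if held_for >= for_seconds else "pending"))
    return states


if __name__ == "__main__":
    # Hypothetical evaluations every 60 seconds with a 2 minute FOR clause.
    checks = [(0, True), (60, True), (120, False), (180, True), (240, True), (300, True)]
    for timestamp, state in alert_states(checks, for_seconds=120):
        print(timestamp, state)
```

Choosing the duration is part of alert design: too short and false positives return, too long and the notification arrives after the customer does.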
ALERT FOR ANY INSTANCE UNDER HIGH LOAD
ALERT high_load
  IF node_load1 > 0.5
  ANNOTATIONS {
    description = "{{ $labels.instance }} of job {{ $labels.job }} is under high load.",
    summary = "Instance {{ $labels.instance }} under high load"
  }
STILL LOOKING FOR ONLINE EDITOR FOR EASE OF DEVELOPMENT
https://github.com/alerta/prometheus-config
SIMPLE YAML FILE
route:
  receiver: 'slack'          # WHERE TO ROUTE TO
receivers:                   # ROUTER DETAILS
  - name: 'slack'
    slack_configs:
      - send_resolved: true
        username: '<username>'
        channel: '#<channel-name>'
        api_url: '<incoming-webhook-url>'
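The configuration above sends everything to a single receiver. Once teams own their own alerts, routing is usually decided by labels such as severity or team. The sketch below shows that decision in its simplest form: the first route whose label matchers all agree with the alert wins, otherwise the default receiver is used. It deliberately ignores nested routes, regex matchers and grouping, and the route structure, label names and receiver names are assumptions for illustration.

```python
def pick_receiver(alert_labels, routes, default_receiver):
    """Return the receiver of the first route whose matchers all equal the alert's labels."""
    for route in routes:
        if all(alert_labels.get(key) == value for key, value in route["match"].items()):
            return route["receiver"]
    return default_receiver


if __name__ == "__main__":
    # Hypothetical routes: critical alerts page on-call, team alerts go to team channels.
    routes = [
        {"match": {"severity": "critical"}, "receiver": "pagerduty"},
        {"match": {"team": "reports"}, "receiver": "slack-reports"},
    ]
    print(pick_receiver({"severity": "critical", "team": "reports"}, routes, "slack"))
    print(pick_receiver({"severity": "warning", "team": "reports"}, routes, "slack"))
    print(pick_receiver({"severity": "warning"}, routes, "slack"))
```

Keeping the default receiver as a catch-all means an alert without the expected labels is still delivered somewhere.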
ALERTING
global:                      # Variables | Global configuration
  resolve_timeout: 5m
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/generic/2010-04-15/create_event.json
  hipchat_url: https://api.hipchat.com/
  opsgenie_api_host: https://api.opsgenie.com/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: slack
receivers:
  - name: slack
    slack_configs:           # Channel Configuration
      - send_resolved: true
        api_url: <secret>
        channel: '#<channel-name>'
        username: <username>
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
        title: '{{ template "slack.default.title" . }}'
        title_link: '{{ template "slack.default.titlelink" . }}'
        pretext: '{{ template "slack.default.pretext" . }}'
        text: '{{ template "slack.default.text" . }}'
        fallback: '{{ template "slack.default.fallback" . }}'
        icon_emoji: '{{ template "slack.default.iconemoji" . }}'
        icon_url: '{{ template "slack.default.iconurl" . }}'
templates: []
ALERT TEMPLATING
▸ What | How to say …
https://prometheus.io/blog/2016/03/03/custom-alertmanager-templates/
- send_resolved: true
  api_url: <secret>
  channel: '#<channel-name>'
  username: <username>
  color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
  title: '{{ template "slack.default.title" . }}'
  title_link: '{{ template "slack.default.titlelink" . }}'
  pretext: '{{ template "slack.default.pretext" . }}'
  text: '{{ template "slack.default.text" . }}'
  fallback: '{{ template "slack.default.fallback" . }}'
  icon_emoji: '{{ template "slack.default.iconemoji" . }}'
  icon_url: '{{ template "slack.default.iconurl" . }}'
SILENCING, VIA UI / API
ANSWERS REQUIRED FEATURES
Accessibility
Scheduling
SLA’s assured
Auth & Authorization
Escalation
Durable & Resilient
Forensics
Automatic
Flexible & Elastic
Accountable
NEXT STEPS
INFRASTRUCTURE (OS)
APPLICATION
EXTERNAL (DEPENDENCY / ENDPOINT)
REMEDIABLE ?
ALERTABLE ?
LOG CORRELATION
ALERT MANAGER
LEGACY
IDENTIFY
CHOOSE
DEMO TIME
‣ Docker-compose: ready for R&D to start using to create custom application metrics.
‣ Prometheus, node_exporter, Alertmanager, cAdvisor, Grafana
DOCKER SETTINGS - VOLUMES, NETWORKS
version: '2'                 # Docker-compose version
volumes:                     # Docker volumes for Prometheus and Grafana
  prometheus_data: {}
  grafana_data: {}
networks:                    # Docker networks
  front-tier:
    driver: bridge
  back-tier:
    driver: bridge
PROMETHEUS - OFFICIAL CONTAINER
services:
  prometheus:                          # Docker service name
    image: prom/prometheus
    container_name: prometheus
    volumes:                           # Docker volumes for Prometheus config and data
      - ./prometheus/:/etc/prometheus/
      - prometheus_data:/prometheus
    command:                           # Configuration
      - '-config.file=/etc/prometheus/prometheus.yml'
      - '-storage.local.path=/prometheus'
      - '-alertmanager.url=http://alertmanager:9093'
    expose:                            # Port exposed to other containers
      - 9090
    ports:                             # Port published on the host
      - 9090:9090
    links:                             # Link to cadvisor & alertmanager
      - cadvisor:cadvisor
      - alertmanager:alertmanager
    depends_on:
      - cadvisor
    networks:                          # Network placement 'back-tier'
      - back-tier
NODE-EXPORTER [ NODE METRICS COLLECTOR ]
  node-exporter:
    container_name: node-exporter
    image: prom/node-exporter
    volumes:                 # What to mount from OS to container for metric collection
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:                 # Access to /proc /sys
      - '-collector.procfs=/host/proc'
      - '-collector.sysfs=/host/sys'
      - '-collector.filesystem.ignored-mount-points=^(/rootfs|/host|)/(sys|proc|dev|host|etc)($$|/)'
      - '-collector.filesystem.ignored-fs-types=^(sys|proc|auto|cgroup|devpts|ns|au|fuse.lxc|mqueue)(fs|)$$'
    expose:
      - 9100
    networks:
      - back-tier
ALERT MANAGER
  alertmanager:
    image: prom/alertmanager
    ports:
      - 9093:9093
    volumes:
      - ./alertmanager/:/etc/alertmanager/
    networks:
      - back-tier
    command:
      - '-config.file=/etc/alertmanager/config.yml'
      - '-storage.path=/alertmanager'
CADVISOR
  cadvisor:
    image: google/cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    expose:
      - 8080
    networks:
      - back-tier
GRAFANA
  grafana:
    image: grafana/grafana
    depends_on:
      - prometheus
    ports:
      - 3000:3000
    volumes:
      - grafana_data:/var/lib/grafana
    env_file:
      - config.monitoring
    networks:
      - back-tier
      - front-tier
DOCKER PS
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3dcfd7c289cb grafana/grafana "/run.sh" 21 hours ago Up 4 minutes 0.0.0.0:3000->3000/tcp prometheus_grafana_1
2b2817fc0bd9 prom/prometheus "/bin/prometheus -..." 21 hours ago Up 4 minutes 0.0.0.0:9090->9090/tcp prometheus
d2c6849d3bd9 google/cadvisor "/usr/bin/cadvisor..." 21 hours ago Up 4 minutes 8080/tcp prometheus_cadvisor_1
d4a3c3ceb97d prom/node-exporter "/bin/node_exporte..." 21 hours ago Up 4 minutes 9100/tcp node-exporter
75eb08791ea9 prom/alertmanager "/bin/alertmanager..." 21 hours ago Up 4 minutes 0.0.0.0:9093->9093/tcp prometheus_alertmanager_1
DEMO PROJECT ON GITHUB
https://github.com/shelleg/monlog-compose-stack
‣ All containers - monitored by prometheus + graphed in a small nice project.
ROLLOUT [ LLD ]
PLACEMENT OPTIONS
‣ 1 main prometheus server vs. 1 Prometheus server per team
‣ 1 Alert-manager [ with pre-defined “receivers” ] vs. 1 per team / concern
DEPLOYMENT OPTIONS
‣ Automate deployment of Prometheus server(s) / Alertmanager [ pre-defined “receivers” ]
‣ Ansible, puppet etc
‣ Jenkins
‣ The combination of the 2 ;)
‣ Automation helps solve the “one 2 Many” dilemma IMHO …
DEVELOPER STACK
‣ Options:
‣ Personal Docker / Docker-compose[ private fork if desired ]
‣ A small startup.cmd / startup.sh starting the Go applications of Prometheus & Alertmanager
‣ A centralized Grafana / Alertmanager with only Prometheus on the dev machine
‣ Toolkit for
‣ developing metrics, alarms, graphs
‣ adding exporters to configuration [ tendency :: as common as developing new services ]
‣ SDLC -> Git pull/merge request mechanism
DEVELOPER STACK(S) - EXAMPLE
ALERTS IN SCM MASTER -> STG -> PRD
POPULATE ALERTS | METRICS | DASHBOARDS VIA SCM
1. Use “ready made” || good starting point graphs from the Grafana dashboard exchange, or build your own
2. Customize
3. Add / push to git master branch
4. “ci” server -> listen on Git hook -> push to staging
5. “ci” server -> wait for manual trigger -> push to production
CONTINUOUS DELIVERY OPTIONS [ ADDING AN ALERT SAMPLE WORKFLOW ]
master (dev)
staging
production
DEVELOP
DEPLOY TO STAGE
DEPLOY TO PROD
1 centralized repo
branch per env /
prometheus instance
CONTINUOUS DELIVERY OPTIONS [ ADDING GRAPHS ]
master (dev)
staging
production
DEVELOP
DEPLOY TO STAGE
DEPLOY TO PROD
“Grafana Dashboard hub”
- separate repo ?
- part of monitoring repo ?
CI PIPELINE - DATA ORIGINS & PRESENTATION
[ Diagram labels: Exporters (App Metrics, OS Metrics); REGION, POD, INSTANCE, *; Filter Tags & Alerts ]
CI PIPELINE
[ Diagram labels: DEV, STAGING, PRODUCTION; STACK / APP NAME; ALERTMANAGER (x2); GRAFANA (x2); Web-hook (PR-builder); OPS “CLEANUP” ROUTINE(S) ]
BUILDING THE PIPELINE
‣ Routine on submit / push builds to dev/stg
‣ Run daily / weekly deployments of Alerts (Prometheus) | Dashboards (Grafana)
‣ Avoid / roll back any manual changes of Alerts / Graphs etc
‣ Help make automation a common practice
‣ Scheduled task which syncs and re-configures the desired state from SCM
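The scheduled sync in the last bullet comes down to comparing what is in SCM with what is currently deployed and acting on the difference. A minimal sketch of that comparison, assuming both sides can be represented as a mapping from file name to content (or to a content hash); the function name and the example file names are hypothetical.

```python
def plan_sync(desired, deployed):
    """Compare desired state (from SCM) with deployed state and list the actions needed."""
    desired_names, deployed_names = set(desired), set(deployed)
    return {
        # Present in SCM but missing on the server.
        "create": sorted(desired_names - deployed_names),
        # Present on both sides but changed (for example by a manual edit).
        "update": sorted(
            name for name in desired_names & deployed_names if desired[name] != deployed[name]
        ),
        # Present on the server but not in SCM.
        "remove": sorted(deployed_names - desired_names),
    }


if __name__ == "__main__":
    # Hypothetical alert rule files and dashboards.
    in_scm = {"high_load.rules": "v2", "disk.rules": "v1", "overview.json": "v1"}
    on_server = {"high_load.rules": "v1", "disk.rules": "v1", "manual_test.rules": "v1"}
    print(plan_sync(in_scm, on_server))
```

Because manual changes show up as `update` or `remove` entries, running this on a schedule is what rolls them back and keeps SCM the single source of truth.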
MEASURE THE PIPELINE
‣ Pipeline steps are monitored
‣ Expose metrics such as:
‣ deployment time & status [ in env | stack etc ]
‣ count (# of alerts, new vs old last week, month etc)
‣ Metric counters [ application metrics ] …
‣ [ Jenkins exporter || push gateway TBD ]
FEEDBACK / QUESTIONS ? I’M HERE …
HAGZAG@TIKALK.COM, 0545302525
Haggai Philip Zagury - Tikal Knowledge
MONITORING HLD
FullStack Developers Israel

Modern Monitoring [ with Prometheus ]

  • 1. Tikal KnowledgeTikal Knowledge Haggai Philip Zagury - DevOps Group Lead - Tikal Knowledge
  • 2. FullStack Developers Israel INTRO - WHO WE ARE WHO WE ARE ? ▸ Tikal helps ISV’s in Israel & abroad in their technological challenges. ▸ Our Engineers are Fullstack Developers with expertise in Android, DevOps, Java, JS, Ruby & Python ▸ We are passionate about technology and specialize in OpenSource technologies. ▸ Our Tech and Group leaders help establish & enhance existing software teams with innovative & creative thinking. https://www.meetup.com/full-stack-developer-il/
  • 3. INTRODUCTION TO MODERN MONITORING CURRENT STATUS [ INFRASTRUCTURE ] ▸ AWS, Cloud, Hybrid / Multi Cloud ▸ Define metrics and system health based on experience and application specific behaviors. ▸ Many False Positives ▸ Scaling is hard [ semi-auto, manual ] Tikal Knowledge
  • 4. INTRODUCTION TO MODERN MONITORING COMMON MONITORING STATUS ▸ OPS own monitoring domain ▸ Define metrics and system health based on experience and application specific behaviours. ▸ Many False Positives ▸ Scaling is hard [ semi-auto, manual ] Tikal Knowledge
  • 5. INTRODUCTION TO MODERN MONITORING COMMON MONITORING SOLUTIONS ▸ cloud watch ▸ new relic ▸ Nagios ▸ App Dynamics ▸ Data Dog ▸ Many more …. Tikal Knowledge
  • 6. INTRODUCTION TO MODERN MONITORING GOALS ▸ Improve existing monitoring and RCA indicators ▸ Reduce false positives & ‘customer driven alerting’ ▸ Proactively identify data anomalies / diversions ▸ Provide meaningful / intelligent notifications [ severity, SLA compliance etc ] ▸ Proactively remediate commonly known issues, or set the foundation of a robust substitute ▸ Provide KPI integration policy & methodology for both DevOps & R&D teams Tikal Knowledge
  • 7. INTRODUCTION TO MODERN MONITORING CHALLENGES ▸ Preserve the knowledge and insights in the existing Monitoring system ▸ Cultural changes: ▸ APM is part of the development process ▸ Monitoring tools are part of the developer stack (or he will wake up on any issue with his code/app) ▸ On-call isn’t only for OPS … Everybody’s accountable ▸ breakdown the “wall of confusion” between dev and ops Tikal Knowledge
  • 9. The Gap of Traditional Monitoring - We know what we want to know … Tikal Knowledge
  • 10. System Metrics Not enough || Too much a little too late Tikal Knowledge
  • 11. We do not always know what we are looking @ / 4 … Tikal Knowledge
  • 12. Is this OK ?! || Normal What happened at 4AM Tikal Knowledge
  • 13. If your’e lucky + = No action needed Tikal Knowledge
  • 14. Go back to sleep ( you still work up ! ) Tikal Knowledge
  • 16. Stop using Nagios (so it can die peacefully), Feb 13, 2014 [ slideshare ]
  • 17. In 2 words: Configuration files … In a few more:
    - resources
    - services
    - dependencies
    - …
  • 18. Traditional Monitoring
    • Reliable
    • Durable
    • Scalable
    Conclusion … system monitoring does not suffice, enter APM
  • 19. HOW DID WE GET HERE
  • 20. TRADITIONAL MONITORING WAS (IS) ALL ABOUT THE “BLACK BOX” | “OS” METRICS
    ▸ All we care about is that the system is OK …
    [ Diagram: APPLICATION FRONTEND, APPLICATION BACKEND, APPLICATION DATABASE ]
  • 21. OPS ARE WORKING ON OPTIMIZING INFRASTRUCTURE …
    ▸ Throw more RAM & “Reports”
    ▸ Add another node to the “FE cluster”
    ▸ Add another shard to the DB …
    ▸ …
    [ Diagram: APPLICATION … ]
  • 22. IN THE PAST ~10 YEARS
    ▸ Developers have started to implement METRICS
    ▸ Organizations are adopting standards
    ▸ Common metrics have become a commodity
  • 25. Multiple Dimensions
    • [ Stability ]
    • Ops dimension
    • [ Innovation ]
    • Dev dimension
    • Product dimension
  • 26. Even More
    • Environment [ stg, uat, prod ]
    • Application stack(s) || tags || types
    • Business metrics
  • 27. TEAMS | SCOPES | METRICS - COME TOGETHER
  • 30. MONITORING CRITERIA
    ▸ Server (OS) level monitoring
    ▸ Application monitoring (APM)
    ▸ Perimeter (external website) monitoring
    ▸ Event-driven remediation
    ▸ Alerting and escalation
    ▸ Associated log data & anomaly detection
  • 31. REQUIRED FEATURES
    ▸ Accessibility
    ▸ Scheduling
    ▸ SLAs assured
    ▸ Auth & Authorization
    ▸ Escalation
    ▸ Durable & Resilient
    ▸ Forensics
    ▸ Automatic
    ▸ Flexible & Elastic
    ▸ Accountable
  • 32. IT’S AN ITERATIVE PROCESS
    ▸ How quickly did we recover ?
    ▸ What worked / didn’t work ?
    ▸ Iterative improvements [ Chaos Monkey, 10 story test ]
    ▸ RCA -> Remediation [ a.k.a. false positive lifecycle ]
  • 34. HOW TO DEFINE A METRIC OR ALERT VS. HOW TO STORE DATA
    ▸ A metric’s lifecycle & design
    ▸ Time series data stream(s) || source(s)
    ▸ Common tagging
    ▸ Metric naming conventions and implications
    ▸ Microservices, integration of traditional and new-generation solutions
    ▸ Choose short, mid & long term tools / services
  • 35. A METRIC’S LIFECYCLE
    [ Diagram: NEW (A) METRIC ]
    ▸ INFRASTRUCTURE (OS) | APPLICATION | EXTERNAL (DEPENDENCY / ENDPOINT)
    ▸ REMEDIABLE ? | ALERTABLE ? | LOG CORRELATION | SCOPE OF IMPACT
    ▸ LEARN IN DEV | STG -> DEFINE IN DEV | STG -> SHIP TO PROD
  • 36. A METRIC’S LIFECYCLE - “TAG-ABLE” == FILTERABLE | MEASURABLE | QUANTIFIABLE
    [ Diagram: NEW (A) METRIC ]
    ▸ INFRASTRUCTURE (OS) | APPLICATION | EXTERNAL (DEPENDENCY / ENDPOINT)
    ▸ REMEDIABLE ? | ALERTABLE ? | LOG CORRELATION | SCOPE OF IMPACT
    ▸ LEARN IN DEV | STG -> DEFINE IN DEV | STG -> SHIP TO PROD
    ▸ ENVIRONMENT: DEVELOPMENT | STAGING | PRODUCTION
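One way to picture “tag-able == filterable | measurable | quantifiable” is a tiny in-memory selector. This is only an illustrative sketch: the `Sample` type, the `select` and `sum_by` helpers, and the metric and tag names below are hypothetical and do not belong to any monitoring tool's API.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    """One data point: a metric name, its tags, and a numeric value."""
    name: str
    tags: Dict[str, str]
    value: float


def select(samples: List[Sample], name: str, **wanted: str) -> List[Sample]:
    """Keep samples of metric `name` whose tags match every wanted key/value."""
    return [
        s for s in samples
        if s.name == name and all(s.tags.get(k) == v for k, v in wanted.items())
    ]


def sum_by(samples: List[Sample], tag: str) -> Dict[str, float]:
    """Aggregate values per distinct value of `tag` (a missing tag counts as "")."""
    totals: Dict[str, float] = defaultdict(float)
    for s in samples:
        totals[s.tags.get(tag, "")] += s.value
    return dict(totals)


# Hypothetical data: the same metric reported from three environments.
SAMPLES = [
    Sample("http_requests_total", {"environment": "development", "stack": "shop"}, 12.0),
    Sample("http_requests_total", {"environment": "staging", "stack": "shop"}, 40.0),
    Sample("http_requests_total", {"environment": "production", "stack": "shop"}, 900.0),
    Sample("http_requests_total", {"environment": "production", "stack": "billing"}, 100.0),
]
```

Because every sample carries the same tag keys, `select(SAMPLES, "http_requests_total", environment="production")` filters and `sum_by(SAMPLES, "environment")` measures without any per-environment metric names.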
  • 37. A METRIC’S LIFECYCLE
    [ Diagram: NEW (A) METRIC ]
    ▸ INFRASTRUCTURE (OS) | APPLICATION | EXTERNAL (DEPENDENCY / ENDPOINT)
    ▸ REMEDIABLE ? | ALERTABLE ? | LOG CORRELATION | SCOPE OF IMPACT
    ▸ LEARN IN DEV | STG -> DEFINE IN DEV | STG -> SHIP TO PROD
    - QUANTIFIABLE METRICS: SEVERITY, CRITICAL STATE
    - EXPOSING A SERVICE
    - CONSUMING A SERVICE
    - WHY DOES MY SERVICE HAVE AN OS IMPACT ?
    - IS IT BY DESIGN ?
    - FALLBACK METHODS ?
    - ALTERNATE ENDPOINTS / RETRY ?
    - FEATURE TOGGLE
    - DEFINE SEVERITY
  • 38. TSD PRINCIPLES
    Credit -> http://opentsdb.net/overview.html
  • 39. DATAPOINTS
    Credit -> https://www.datadoghq.com/blog/the-power-of-tagged-metrics/
    In tools like Prometheus you don't need the timestamp; it just uses the collection timestamp.
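A data point is a metric name, a set of tags, a value and (optionally) a timestamp. The sketch below renders one in the Prometheus text exposition format, where the trailing millisecond timestamp may be omitted so the server stamps the sample at collection time. `format_sample` and `_escape` are hypothetical helper names chosen for this illustration.

```python
from typing import Dict, Optional


def _escape(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: Dict[str, str], value: float,
                  timestamp_ms: Optional[int] = None) -> str:
    """Render one sample as a single exposition-format line."""
    label_part = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
    line = f"{name}{{{label_part}}}" if label_part else name
    line += f" {value}"
    if timestamp_ms is not None:
        # Optional: leave it out and the scraper uses its own collection time.
        line += f" {timestamp_ms}"
    return line
```

For example, `format_sample("http_requests_total", {"method": "post", "code": "200"}, 1027)` yields a line with no timestamp, which is the usual case when the data is pulled.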
  • 40. MIX ’N’ MATCH
  • 41. SHORT | MID | LONG TERM SOLUTIONS
  • 43. FEATURES [ PROMETHEUS ]
    ▸ Open-source systems monitoring and alerting toolkit
    ▸ A multi-dimensional data model (time series identified by metric name and key/value pairs)
    ▸ A flexible query language to leverage this dimensionality
    ▸ No reliance on distributed storage; single server nodes are autonomous
    ▸ Time series collection happens via a pull model over HTTP
    ▸ Pushing time series is supported via an intermediary gateway
    ▸ Targets are discovered via service discovery or static configuration
    ▸ Multiple modes of graphing and dashboarding support
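To make the “pull model over HTTP” concrete, here is a minimal sketch using only the Python standard library: the application exposes a `/metrics` page and the monitoring server fetches it on its own schedule. The counter name, the handler class and the helper functions are assumptions made for this example; a real service would normally use an official client library instead.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state that the endpoint reports.
COUNTERS = {"demo_requests_total": 0}


def render_metrics() -> str:
    """Render all counters in the text exposition format, one per line."""
    return "".join(f"{name} {value}\n" for name, value in sorted(COUNTERS.items()))


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:
        """Silence per-request logging."""


def serve_in_background(port: int = 0) -> HTTPServer:
    """Start the endpoint on 127.0.0.1; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape(server: HTTPServer) -> str:
    """Do what a pulling server does: GET /metrics and read the body."""
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/metrics")
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()
```

The application never needs to know where the monitoring server lives; it only answers when asked, which is why discovery of targets becomes the monitoring side's job.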
  • 44. PROMETHEUS ARCHITECTURE
    [ Diagram: Prometheus server ( Retrieval / Collection, Data Series Storage [DB], PromQL, web UI ), Prometheus exporters, Push Gateway, Service Discovery Providers, other Prometheus server(s), Alertmanager, Dashboarding ]
  • 45. UNTIL NOW
    ‣ Try providing this to each developer
    ‣ Sensu has a very similar approach to APM …
    ‣ Complexity is the barrier …
  • 46. UNTIL NOW
    ‣ Pull has become an advantage …
    ‣ Severity is implied [TSD]
    ‣ False positives reduction
    ‣ Docker makes it super simple
    ‣ Golang lightweight approach
  • 48. IMPLEMENTATION
    ‣ Review old system metrics & capabilities and decide what’s good and what’s bad
      ‣ What can move
      ‣ What needs to stay | integrate to new system
    ‣ Prometheus deployment is automated from day 1
    ‣ Prometheus exporter services are tagged and labeled per application stack | layer
      ‣ Preferably Dockerized
    ‣ Metric design workshops | meetings | Slack group
    ‣ Alert design workshops | meetings | Slack group
    ‣ Teams’ metric tags and alerting & escalation
  • 49. STEP 1 - IMPLEMENT DISCOVERY
    AWS Discovery -> https://github.com/prometheus/prometheus/tree/master/discovery
    [ Diagram: NEW NODE DEPLOYMENT, SERVICE DISCOVERY, DEV | STAGING | PRODUCTION, STACK / APP NAME, Alertmanager ]
  • 50. STEP 2 - IMPLEMENT EXPORTERS
    https://prometheus.io/docs/instrumenting/exporters/
    Official node exporter -> https://github.com/prometheus/node_exporter
    Mssql Exporter -> https://hub.docker.com/r/awaragi/prometheus-mssql-exporter/
    Nagios Exporter -> https://github.com/m-lab/prometheus-nagios-exporter
  • 51. STEP 3 - IMPLEMENT CUSTOM APPLICATION METRICS
    https://prometheus.io/docs/instrumenting/exporters/
    Windows WMI -> https://github.com/martinlindhe/wmi_exporter
    Java -> https://github.com/prometheus/jmx_exporter
    node.js -> https://www.npmjs.com/browse/keyword/prometheus
    .Net -> https://github.com/andrasm/prometheus-net
  • 52. STEP 4 - ADAPT TO YOUR INFRA MONITORING [ FILTER || TAG || SELECTOR ]
    kubernetes_sd_config
  • 53. STEP 5 - METRIC DESIGN
    ‣ Review sample METRICS and GRAPHS
    ‣ Define | Reuse
    ‣ Naming conventions { https://prometheus.io/docs/practices/naming/ }
    ‣ Quantifiable [ numbers not strings … ]
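A metric design workshop can be backed by a small lint step. The sketch below checks a proposed name against the character set Prometheus accepts for metric names and against a few team rules. The suffix list and the rule wording are assumptions for illustration (plenty of valid metrics, such as `up`, would fail this stricter team rule), and `lint_metric_name` / `is_quantifiable` are hypothetical helpers.

```python
import re
from typing import List

# Character set Prometheus accepts for metric names.
VALID_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")

# Illustrative team rule: names end in a base unit or in _total for counters.
TEAM_SUFFIXES = ("_seconds", "_bytes", "_total")


def lint_metric_name(name: str) -> List[str]:
    """Return a list of problems found in a proposed metric name (empty = OK)."""
    if not VALID_NAME.match(name):
        return ["invalid characters"]
    problems = []
    if ":" in name:
        problems.append("colons are reserved for recording rules")
    if name != name.lower():
        problems.append("use lowercase snake_case")
    if not name.endswith(TEAM_SUFFIXES):
        problems.append("missing unit or _total suffix")
    return problems


def is_quantifiable(value: object) -> bool:
    """True if the value can be stored as a float sample (numbers, not 'OK')."""
    try:
        float(value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return False
    return True
```

Run against a pull request that adds metrics, a check like this keeps names reusable across teams and rejects status strings before they reach a dashboard.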
  • 55. DEVELOPER TOOL
  • 56. DEVELOPER TOOL - SIMPLE GRAPHS
  • 57. DEVELOPER TOOL - METRICS - USING PROMQL
    ▸ Simple queries:
      ▸ rate(http_requests_total[5m])
    ▸ Linear predictions:
      ▸ predict_linear(node_filesystem_free[1h], 4*3600)
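For intuition about what the two queries compute, here is a simplified Python sketch. It is not how Prometheus implements them: the real `rate()` also extrapolates to the edges of the window, and the function names `simple_rate` and `simple_predict_linear` are made up for this example. Samples are `(unix_seconds, value)` pairs.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter; any drop is treated as a reset to zero."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def simple_predict_linear(samples: List[Sample], seconds_ahead: float) -> float:
    """Fit a least-squares line and evaluate it seconds_ahead after the last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t_last = samples[-1][0]
    xs = [t - t_last for t, _ in samples]  # time relative to the last sample
    ys = [v for _, v in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("samples must not share a single timestamp")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    intercept = mean_y - slope * mean_x
    return intercept + slope * seconds_ahead
```

A negative prediction for free disk space four hours ahead is the signal that an alert such as “disk will fill within 4 hours” would act on.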
  • 58. GRAFANA - SIMILAR WORKING EXPERIENCE - MUCH NICER
  • 59. GRAFANA - SIMILAR WORKING EXPERIENCE - MUCH NICER
  • 60. STEP 6 - ALERT DESIGN
    ‣ Review new METRICS and GRAPHS, define | design thresholds
    ‣ Define severity
    ‣ Ownership
    ‣ Escalation ladder
  • 62. ALERT DESIGN
    ALERT <alert name>
      IF <expression>
      [ FOR <duration> ]
      [ LABELS <label set> ]
      [ ANNOTATIONS <label set> ]
  • 63. ALERT FOR ANY INSTANCE UNDER HIGH LOAD [ node_load1 > 0.5 ]
    ALERT high_load
      IF node_load1 > 0.5
      ANNOTATIONS {
        description = "{{ $labels.instance }} of job {{ $labels.job }} is under high load.",
        summary = "Instance {{ $labels.instance }} under high load"
      }
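The optional `FOR <duration>` clause is what keeps a short spike from paging anyone: the condition has to stay true for the whole duration before the alert fires. The sketch below models that behaviour as a small state machine; `alert_states` is a hypothetical function written for this illustration, not Prometheus code.

```python
from typing import Iterable, List, Optional, Tuple


def alert_states(evaluations: Iterable[Tuple[float, bool]],
                 for_seconds: float) -> List[str]:
    """Return the alert state after each (timestamp, condition_is_true) evaluation."""
    states: List[str] = []
    active_since: Optional[float] = None
    for timestamp, condition in evaluations:
        if not condition:
            # Condition cleared: the pending timer starts over.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held = timestamp - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states
```

With `for_seconds=0` (no `FOR` clause, as in the `high_load` example) the first true evaluation fires immediately, which is one common source of false positives.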
  • 64. STILL LOOKING FOR ONLINE EDITOR FOR EASE OF DEVELOPMENT
    https://github.com/alerta/prometheus-config
  • 65. SIMPLE YAML FILE
    route:                           # WHERE TO ROUTE TO
      receiver: 'slack'
    receivers:                       # ROUTER DETAILS
    - name: 'slack'
      slack_configs:
      - send_resolved: true
        username: '<username>'
        channel: '#<channel-name>'
        api_url: '<incoming-webhook-url>'
  • 66. ALERTING
    global:
      resolve_timeout: 5m
      smtp_require_tls: true
      pagerduty_url: https://events.pagerduty.com/generic/2010-04-15/create_event.json
      hipchat_url: https://api.hipchat.com/
      opsgenie_api_host: https://api.opsgenie.com/
      victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
    route:
      receiver: slack
    receivers:
    - name: slack
      slack_configs:
      - send_resolved: true
        api_url: <secret>
        channel: '#<channel-name>'
        username: <username>
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
        title: '{{ template "slack.default.title" . }}'
        title_link: '{{ template "slack.default.titlelink" . }}'
        pretext: '{{ template "slack.default.pretext" . }}'
        text: '{{ template "slack.default.text" . }}'
        fallback: '{{ template "slack.default.fallback" . }}'
        icon_emoji: '{{ template "slack.default.iconemoji" . }}'
        icon_url: '{{ template "slack.default.iconurl" . }}'
    templates: []
    [ Callouts: Channel Configuration; Variables | Global configuration ]
  • 67. ALERT TEMPLATING
    ▸ What | How to say …
    https://prometheus.io/blog/2016/03/03/custom-alertmanager-templates/
      - send_resolved: true
        api_url: <secret>
        channel: '#<channel-name>'
        username: <username>
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
        title: '{{ template "slack.default.title" . }}'
        title_link: '{{ template "slack.default.titlelink" . }}'
        pretext: '{{ template "slack.default.pretext" . }}'
        text: '{{ template "slack.default.text" . }}'
        fallback: '{{ template "slack.default.fallback" . }}'
        icon_emoji: '{{ template "slack.default.iconemoji" . }}'
        icon_url: '{{ template "slack.default.iconurl" . }}'
  • 68. SILENCING, VIA UI / API
  • 69. ANSWERS REQUIRED FEATURES
    ▸ Accessibility
    ▸ Scheduling
    ▸ SLAs assured
    ▸ Auth & Authorization
    ▸ Escalation
    ▸ Durable & Resilient
    ▸ Forensics
    ▸ Automatic
    ▸ Flexible & Elastic
    ▸ Accountable
  • 70. NEXT STEPS
    [ Diagram ]
    ▸ INFRASTRUCTURE (OS) | APPLICATION | EXTERNAL (DEPENDENCY / ENDPOINT)
    ▸ REMEDIABLE ? | ALERTABLE ? | LOG CORRELATION
    ▸ ALERT MANAGER | LEGACY
    ▸ IDENTIFY | CHOOSE
  • 71. DEMO TIME
    ‣ Docker-compose - ready for R&D to start using, to run and create custom application metrics.
    ‣ Prometheus, node_exporter, Alertmanager, cAdvisor, Grafana
  • 72. DOCKER SETTINGS - VOLUMES, NETWORKS
    version: '2'                     # Docker-compose version
    volumes:                         # Docker volumes for Prometheus and Grafana
      prometheus_data: {}
      grafana_data: {}
    networks:                        # Docker networks
      front-tier:
        driver: bridge
      back-tier:
        driver: bridge
  • 73. PROMETHEUS - OFFICIAL CONTAINER
    services:
      prometheus:                    # Docker service name
        image: prom/prometheus
        container_name: prometheus
        volumes:                     # Docker volumes
          - ./prometheus/:/etc/prometheus/
          - prometheus_data:/prometheus
        command:                     # Configuration
          - '-config.file=/etc/prometheus/prometheus.yml'
          - '-storage.local.path=/prometheus'
          - '-alertmanager.url=http://alertmanager:9093'
        expose:                      # Expose as service on specified port
          - 9090
        ports:                       # Ports to expose as service
          - 9090:9090
        links:                       # Link to cadvisor & alertmanager
          - cadvisor:cadvisor
          - alertmanager:alertmanager
        depends_on:
          - cadvisor
        networks:                    # Network placement 'back-tier'
          - back-tier
  • 74. NODE-EXPORTER [ NODE METRICS COLLECTOR ]
      node-exporter:
        container_name: node-exporter
        image: prom/node-exporter
        volumes:                     # Access to /proc, /sys: what to mount from OS to container for metric collection
          - /proc:/host/proc:ro
          - /sys:/host/sys:ro
          - /:/rootfs:ro
        command: '-collector.procfs=/host/proc -collector.sysfs=/host/sys -collector.filesystem.ignored-mount-points="^(/rootfs|/host|)/(sys|proc|dev|host|etc)($$|/)" -collector.filesystem.ignored-fs-types="^(sys|proc|auto|cgroup|devpts|ns|au|fuse.lxc|mqueue)(fs|)$$"'
        expose:
          - 9100
        networks:
          - back-tier
  • 75. ALERT MANAGER
      alertmanager:
        image: prom/alertmanager
        ports:
          - 9093:9093
        volumes:
          - ./alertmanager/:/etc/alertmanager/
        networks:
          - back-tier
        command:
          - '-config.file=/etc/alertmanager/config.yml'
          - '-storage.path=/alertmanager'
  • 76. CADVISOR
      cadvisor:
        image: google/cadvisor
        volumes:
          - /:/rootfs:ro
          - /var/run:/var/run:rw
          - /sys:/sys:ro
          - /var/lib/docker/:/var/lib/docker:ro
        expose:
          - 8080
        networks:
          - back-tier
  • 77. GRAFANA
      grafana:
        image: grafana/grafana
        depends_on:
          - prometheus
        ports:
          - 3000:3000
        volumes:
          - grafana_data:/var/lib/grafana
        env_file:
          - config.monitoring
        networks:
          - back-tier
          - front-tier
  • 78. DOCKER PS
    CONTAINER ID   IMAGE                COMMAND                  CREATED        STATUS         PORTS                    NAMES
    3dcfd7c289cb   grafana/grafana      "/run.sh"                21 hours ago   Up 4 minutes   0.0.0.0:3000->3000/tcp   prometheus_grafana_1
    2b2817fc0bd9   prom/prometheus      "/bin/prometheus -..."   21 hours ago   Up 4 minutes   0.0.0.0:9090->9090/tcp   prometheus
    d2c6849d3bd9   google/cadvisor      "/usr/bin/cadvisor..."   21 hours ago   Up 4 minutes   8080/tcp                 prometheus_cadvisor_1
    d4a3c3ceb97d   prom/node-exporter   "/bin/node_exporte..."   21 hours ago   Up 4 minutes   9100/tcp                 node-exporter
    75eb08791ea9   prom/alertmanager    "/bin/alertmanager..."   21 hours ago   Up 4 minutes   0.0.0.0:9093->9093/tcp   prometheus_alertmanager_1
  • 79. DEMO PROJECT ON GITHUB
    https://github.com/shelleg/monlog-compose-stack
  • 80. ‣ All containers - monitored by Prometheus + graphed in a small, nice project.
  • 81. ROLLOUT [ LLD ]
  • 82. PLACEMENT OPTIONS
    ‣ 1 main Prometheus server vs. 1 Prometheus server per team
    ‣ 1 Alertmanager [ with pre-defined “receivers” ] vs. 1 per team / concern
  • 83. DEPLOYMENT OPTIONS
    ‣ Automate deployment of Prometheus server(s) / Alertmanager [ pre-defined “receivers” ]
      ‣ Ansible, Puppet, etc.
      ‣ Jenkins
      ‣ The combination of the 2 ;)
    ‣ Automation helps solve the “one 2 many” dilemma IMHO …
  • 84. DEVELOPER STACK
    ‣ Options:
      ‣ Personal Docker / Docker-compose [ private fork if desired ]
      ‣ A small startup.cmd / startup.sh starting the Go applications of Prometheus & Alertmanager
      ‣ A centralized Grafana / Alertmanager with only Prometheus on the dev machine
    ‣ Toolkit to:
      ‣ Develop metrics, alarms, graphs
      ‣ Add exporters to configuration [ tendency :: as common as you develop new services ]
    ‣ SDLC -> Git pull / merge request mechanism
  • 85. DEVELOPER STACK(S) - EXAMPLE
  • 86. ALERTS IN SCM
    MASTER -> STG -> PRD
  • 87. POPULATE ALERTS | METRICS | DASHBOARDS VIA SCM
    1. Use “ready made” || good starting point graphs from the Grafana dashboard exchange, or build your own
    2. Customize
    3. Add / push to git master branch
    4. “ci” server -> listen on GitHook -> push to staging
    5. “ci” server -> wait for manual trigger -> push to production
  • 88. CONTINUOUS DELIVERY OPTIONS [ ADDING AN ALERT SAMPLE WORKFLOW ]
    [ Diagram: master (dev) -> staging -> production | DEVELOP -> DEPLOY TO STAGE -> DEPLOY TO PROD ]
    ▸ 1 centralized repo
    ▸ Branch per env / Prometheus instance
  • 89. CONTINUOUS DELIVERY OPTIONS [ ADDING GRAPHS ]
    [ Diagram: master (dev) -> staging -> production | DEVELOP -> DEPLOY TO STAGE -> DEPLOY TO PROD ]
    ▸ “Grafana Dashboard hub”
      - separate repo ?
      - part of monitoring repo ?
  • 90. CI PIPELINE - DATA ORIGINS & PRESENTATION
    [ Diagram: Exporters ( App Metrics, OS Metrics ), REGION, POD, INSTANCE *, Filter, Tags & Alerts ]
  • 91. CI PIPELINE
    [ Diagram: DEV | STAGING | PRODUCTION, STACK / APP NAME, ALERTMANAGER, ALERTMANAGER, Web-hook (PR-builder), GRAFANA, GRAFANA, OPS “CLEANUP” ROUTINE(S) ]
  • 92. BUILDING THE PIPELINE
    ‣ Routine on submit / push builds to dev / stg
    ‣ Run daily / weekly deployments of Alerts (Prometheus) | Dashboards (Grafana)
      ‣ Avoid / roll back any manual changes of Alerts / Graphs etc.
      ‣ Help make automation a common practice
    ‣ Scheduled task which syncs and re-configures the desired state from SCM
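The “scheduled task which syncs the desired state from SCM” boils down to a diff between what the repository says and what is deployed. Here is a minimal sketch of that planning step, with file contents passed in as plain dictionaries. The `plan_sync` function, the action names and the example paths are hypothetical; reading the git checkout and writing to the Prometheus / Grafana hosts is left out.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(content: str) -> str:
    """Content hash used to detect drift without comparing whole files."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_sync(desired: Dict[str, str], deployed: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return sorted (action, path) pairs that make `deployed` match `desired`."""
    actions: List[Tuple[str, str]] = []
    for path in sorted(set(desired) | set(deployed)):
        if path not in deployed:
            actions.append(("create", path))
        elif path not in desired:
            # Not in SCM: a manual addition that gets rolled back.
            actions.append(("delete", path))
        elif digest(desired[path]) != digest(deployed[path]):
            # Edited by hand on the server: SCM wins.
            actions.append(("overwrite", path))
    return actions
```

An empty plan means the environment already matches SCM, so the scheduled job can run as often as needed without side effects.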
  • 93. MEASURE THE PIPELINE
    ‣ Pipeline steps are monitored
    ‣ Expose metrics such as:
      ‣ Deployment time & status [ in env | stack etc. ]
      ‣ Count ( # of alerts, new vs. old, last week, month etc. )
      ‣ Metric counters [ application metrics ] …
    ‣ [ Jenkins exporter || push gateway TBD ]
  • 94. FEEDBACK / QUESTIONS ? I’M HERE … [email protected], 0545302525 Haggai Philip Zagury - Tikal Knowledge MONITORING HLD FullStack Developers Israel