Cloud Native Observability Insights
BRKCLD-2158
Cisco Webex App
Questions?
Use Cisco Webex App to chat
with the speaker after the session
How
1 Find this session in the Cisco Live Mobile App
2 Click “Join the Discussion”
3 Install the Webex App or go directly to the Webex space
BRKCLD-2158 © 2023 Cisco and/or its affiliates. All rights reserved. Cisco Public 2
Agenda
• What is Cloud Native Observability (CNO)?
• What is M.E.L.T?
• Metrics
• Events (and Alerts)
• Logs
• Traces
What is Cloud Native Observability?
What is Cloud Native?
• “Cloud native technologies empower organizations to build and run scalable
applications in modern, dynamic environments such as public, private, and hybrid
clouds. Containers, service meshes, microservices, immutable infrastructure, and
declarative APIs exemplify this approach.
• These techniques enable loosely coupled systems that are resilient, manageable,
and observable. Combined with robust automation, they allow engineers to make
high-impact changes frequently and predictably with minimal toil.” - CNCF
• [Link]
• Other Cloud Native criteria include:
• Elasticity/Horizontal Scaling of Live Services
• Leveraging Common Frameworks (Application service leverages a Service Mesh)
Persona/Role – Moving from Monitoring to Observability
• Platform Operator/Developer
• They may want to see the application through a Layer 7 (HTTP/gRPC) lens
• What is the latency/RPS/memory/CPU for each service component?
• Where is the bottleneck?
• Does each component adhere to an SLO?
• Data Scientist/Data Engineer
• They may want to see very specific parts of the streaming data pipeline that is a sub-component of the overall application
• CISO/Security Architect/DevSecOps
• They may want to see the same application view as the developer, but with a specific focus on CI/CD-centric security (image scanning, code scanning) and internal/external API security
[Diagram: Cloud Native Observability layers: Apps/Services, Data, Serverless, Pods, Containers, Kubernetes, L4-7 Networking, L2-3 Networking, Operating System, Virtualization, Compute, Storage, with M.E.L.T. and Security labeled alongside]
Let’s look at a topology
[Diagram: topology showing a Networking Service (VPC peering, Hybrid Cloud, etc.), DNS, and CA]
“Full-Stack Observability” adds to traditional monitoring to support seamless digital experiences for modern architectures and teams
[Diagram: Monitoring, Full-Stack Observability]
Monitoring and Observability
[Diagram: the Traditional Monitoring Chain (Detect, Diagnose, Fix) set against the New Observability Dominant Design. Caption: 3 connected trends forming new dominant design around observability. Labels: Full-Stack Observability, Multi-domain, Cloud + DevOps, AI/ML Analysis, Automated Actions. Timeline markers: Now, Future]
What is M.E.L.T.?
Metrics
Metrics
Collect/Measure Data at Regular Intervals
Expose Collect Store Query
• Expose
• Infrastructure:
• AWS Elastic Compute Cloud (EC2) VM hosting an Elastic Kubernetes Service (EKS) worker node: CPU, Memory, Storage, Network
• Application:
• NGINX, DB
• Collect
• Scrape from exposed sources
• Store
• Time-Series Database (TSDB)
• Query
• PromQL, MQL (monitoring query language), MetricsQL (VictoriaMetrics)
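The four stages above can be sketched end to end in a few lines of Python. This is a minimal illustration only: the `expose`, `collect`, and `TimeSeriesStore` names are hypothetical stand-ins invented for this sketch, the text format is a simplified "name value" line format, and the `rate` query is a plain first-to-last slope rather than PromQL's `rate()`.

```python
from collections import defaultdict


def expose(metrics: dict) -> str:
    """Expose: render current values as simple 'name value' lines (hypothetical format)."""
    return "\n".join(f"{name} {value}" for name, value in sorted(metrics.items()))


def collect(payload: str) -> dict:
    """Collect: parse a scraped payload back into name -> float value pairs."""
    samples = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


class TimeSeriesStore:
    """Store: keep (timestamp, value) samples per metric name, in arrival order."""

    def __init__(self):
        self._series = defaultdict(list)

    def append(self, timestamp: float, samples: dict) -> None:
        for name, value in samples.items():
            self._series[name].append((timestamp, value))

    def rate(self, name: str) -> float:
        """Query: per-second increase between the first and last stored sample."""
        points = self._series[name]
        if len(points) < 2 or points[-1][0] == points[0][0]:
            return 0.0
        (t0, v0), (t1, v1) = points[0], points[-1]
        return (v1 - v0) / (t1 - t0)


store = TimeSeriesStore()
# Two scrapes, 15 seconds apart, of a request counter that grew by 30.
store.append(0.0, collect(expose({"http_requests_total": 100})))
store.append(15.0, collect(expose({"http_requests_total": 130})))
print(store.rate("http_requests_total"))  # 2.0
```

Real systems add labels, retention, and counter-reset handling on top of this shape; the sketch only shows how the stages hand data to each other.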
Metrics
Common Architectural Components
[Diagram: Service Discovery, Collector, Server/Engine, Storage, Alerts, UI, API, CLI, 3rd Party UI]
Metrics
[Link]
Prometheus Example
Events and Alerts
Events
• Metrics and Events are two different data types:
• Metrics = regular/predictable data
• Events = irregular/unpredictable data
• Scheduled or unscheduled state changes
Events
• Events can be paired with other toolsets to provide a robust
Event<>Action framework
• Metrics, Pub/Sub, AI/ML, DevOps, etc.
• Event-Driven Architecture (EDA):
• KEDA – Kubernetes-Based Event-Driven Autoscaler: [Link]
• AWS EventBridge
• Many, many more
[Diagram: Metrics → Event → Rule → Target/Destination/Action]
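The Event → Rule → Target flow can be sketched as a small dispatcher. This is an illustration under stated assumptions: the `Rule`, `matches`, and `dispatch` names and the event field names are hypothetical, and the pattern style (each key maps to a list of allowed values) is only loosely inspired by rule-based event buses, not the syntax of any specific product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical rule: required field values plus a target to invoke on a match."""
    pattern: dict
    target: Callable[[dict], None]


def matches(pattern: dict, event: dict) -> bool:
    """An event matches when every pattern key is present with one of the allowed values."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


def dispatch(event: dict, rules: list) -> int:
    """Send the event to the target of every matching rule; return how many fired."""
    fired = 0
    for rule in rules:
        if matches(rule.pattern, event):
            rule.target(event)
            fired += 1
    return fired


actions = []
rules = [
    Rule(pattern={"source": ["node-monitor"], "state": ["NotReady"]},
         target=lambda e: actions.append(f"page on-call for {e['node']}")),
    Rule(pattern={"source": ["deployer"]},
         target=lambda e: actions.append("record deployment")),
]

# An irregular state change arrives; only the first rule matches it.
dispatch({"source": "node-monitor", "state": "NotReady", "node": "worker-1"}, rules)
# A healthy state report matches no rule, so nothing happens.
dispatch({"source": "node-monitor", "state": "Ready", "node": "worker-1"}, rules)
print(actions)  # ['page on-call for worker-1']
```

Keeping rules as data (pattern plus target) is what lets the same event stream drive autoscaling, notifications, or pipelines without changing the producers.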
Events – AWS EventBridge
Alerts
• Alerts – A predefined trigger based on a threshold or event
• Static alert example:
• HTTP requests per second (RPS) reaching 90% of a set limit triggers an alert to Slack
• Kubernetes Node isn’t ready for 1 minute (Prometheus Example)
- alert: KubernetesNodeReady
  expr: kube_node_status_condition{condition="Ready",status="true"} == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: Kubernetes Node ready (instance {{ $[Link] }})
    description: "Node {{ $[Link] }} unready \n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
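The `for: 1m` clause means the condition must hold continuously before the alert fires. A minimal Python sketch of that idea follows; the `ForDurationAlert` class and its state names are hypothetical and only mimic the inactive/pending/firing progression, not the actual rule evaluator of any monitoring system.

```python
from typing import Optional


class ForDurationAlert:
    """Hypothetical evaluator: fire only after the condition has held for hold_seconds."""

    def __init__(self, hold_seconds: float):
        self.hold_seconds = hold_seconds
        self._active_since: Optional[float] = None

    def evaluate(self, now: float, condition_true: bool) -> str:
        if not condition_true:
            # Any healthy sample resets the timer.
            self._active_since = None
            return "inactive"
        if self._active_since is None:
            self._active_since = now
        if now - self._active_since >= self.hold_seconds:
            return "firing"
        return "pending"


alert = ForDurationAlert(hold_seconds=60)
# (timestamp in seconds, is the node reporting Ready?)
samples = [(0, True), (15, False), (30, False), (45, True),
           (60, False), (90, False), (120, False)]
states = [alert.evaluate(t, not ready) for t, ready in samples]
print(states)
# ['inactive', 'pending', 'pending', 'inactive', 'pending', 'pending', 'firing']
```

The brief recovery at 45 seconds resets the timer, which is why the alert only fires at 120 seconds. This is the behavior that keeps short flaps from paging anyone.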
Alerts – AWS CloudWatch Alerts
Alerts – AWS CloudWatch Example
Logs
Logs
• Nearly everything in a Cloud Native (or other) environment produces logs in some form
• Logging has tremendous potential, but it is very complex to manage all the sources
and then derive value out of what the logs say
• Collection and data formatting should be simple, but it isn’t:
• Currently, K8s doesn’t enforce a uniform structure for log messages*
• You can’t safely assume all log formats are in JSON
• You may need to transform logs
• There are MANY gotchas on storage, forwarding, rotation – We don’t have time for
that today
*[Link]
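Because log lines may or may not be JSON, a collector usually has to normalize them into one shape before forwarding. The sketch below shows one way to do that in Python; the `normalize` function, the output field names (`source`, `level`, `message`, `attrs`), and the sample log lines are all assumptions made for illustration.

```python
import json


def normalize(line: str, source: str) -> dict:
    """Return a uniform record whether or not the input line is a JSON object."""
    try:
        parsed = json.loads(line)
    except json.JSONDecodeError:
        parsed = None

    if isinstance(parsed, dict):
        # Structured log: hoist common fields, keep everything else as attributes.
        message = parsed.pop("message", None)
        if message is None:
            message = parsed.pop("msg", "")
        level = str(parsed.pop("level", "unknown")).lower()
        return {"source": source, "level": level, "message": message, "attrs": parsed}

    # Unstructured log (or a bare JSON scalar): keep the raw text as the message.
    return {"source": source, "level": "unknown", "message": line.strip(), "attrs": {}}


structured = normalize('{"level": "ERROR", "msg": "db timeout", "retries": 3}', "api")
plain = normalize('10.0.0.7 - - "GET /health HTTP/1.1" 200', "nginx")
print(structured)
print(plain)
```

In practice this transform step lives in the log forwarder's filter or parser configuration rather than in application code, but the decision it makes is the same: try structured parsing first, then fall back to raw text.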
Logs
tl;dr: Logs suck
Logs – Common Architecture Components
Input Plugins: stdout > fluentd, http > fluentd
Output Plugins: fluentd > elasticsearch, fluentd > S3
Logs
• Do things the easier way:
• Kubernetes:
• Use Operators – Cisco (Banzai Cloud) Open-Source Logging Operator
• [Link]
operator
• Fluentd/Fluent Bit and source configuration
• Security (TLS, RBAC, etc.)
• Output configuration
• AWS CloudWatch, S3, Azure Storage, GCP Storage, Elasticsearch, Grafana Loki, Kafka, etc.
Logging Operator – Example – One source, multi-outputs
– NGINX to Elasticsearch/Kibana & Grafana Loki
Traces
Traces
• Distributed tracing helps with:
• Service mapping (topology)
• Bottlenecks/Latency/Drops in a distributed architecture (network, microservice, etc.)
• Example projects/solutions:
• OpenTelemetry - Combo of OpenCensus + OpenTracing – Library-based collection
• Service Meshes – Istio, Linkerd, etc. – Sidecar-based collection
• Jaeger – Visualize traces
• W3C Trace Context/B3 propagation – Bringing some sanity to the format of trace IDs and the headers that carry them
• AWS X-Ray, GCP Cloud Trace, Azure Monitor (Application Insights) – Tracing libraries and visualization service
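To make the trace ID format concrete, here is a small Python sketch that parses a W3C Trace Context `traceparent` header (version 00: `version-traceid-parentid-flags`, lowercase hex) and builds the header for a downstream call. The function names `parse_traceparent` and `child_traceparent` are hypothetical, the IDs are sample values, and the parser is deliberately strict about the version 00 layout rather than a full implementation of the specification.

```python
import re
from typing import Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str) -> Optional[dict]:
    """Return the trace context carried by the header, or None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def child_traceparent(context: dict, new_span_id: str) -> str:
    """Header for an outgoing call: same trace ID, the caller's own span as parent."""
    flags = "01" if context["sampled"] else "00"
    return f"00-{context['trace_id']}-{new_span_id}-{flags}"


incoming = "00-ac6ee4da7079ddc7ff9a7eb95d042655-ff9a7eb95d042655-01"
context = parse_traceparent(incoming)
outgoing = child_traceparent(context, "799a0a38bbd339d4")
print(context["trace_id"], context["sampled"])
print(outgoing)  # 00-ac6ee4da7079ddc7ff9a7eb95d042655-799a0a38bbd339d4-01
```

The trace ID stays constant across every hop while the parent ID changes at each service; that is the property every propagation format has to preserve.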
Traces
• Primer:
• A “span” is the foundational element of a distributed trace – it represents an individual unit of
work
• A “span” can reference another span and when assembled, you have a “trace”
• “Context propagation” – Correlate trace metadata across service boundaries
• Not using a standard method for trace context propagation can lead to VERY painful
deployments and VERY expensive workarounds
[Diagram: Trace ac6ee4da7079ddc7ff9a7eb95d042655 contains svc_1 span ff9a7eb95d042655 and svc_2 span 799a0a38bbd339d4; the svc_2 span is CHILD_OF the svc_1 span]
"traceID": "ac6ee4da7079ddc7ff9a7eb95d042655",
"spans": [
  {
    "traceID": "ac6ee4da7079ddc7ff9a7eb95d042655",
    "spanID": "ff9a7eb95d042655",
    "operationName": "[Link]/*",
    "references": [],
    "startTime": 1638308084933526,
    "duration": 803461,
  . . .
  {
    "traceID": "ac6ee4da7079ddc7ff9a7eb95d042655",
    "spanID": "799a0a38bbd339d4",
    "operationName": "[Link]/*",
    "references": [
      {
        "refType": "CHILD_OF",
        "traceID": "ac6ee4da7079ddc7ff9a7eb95d042655",
        "spanID": "ff9a7eb95d042655"
  . . .
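Assembling spans into a trace amounts to following the CHILD_OF references. The Python sketch below does that for the two spans from the excerpt. The `build_tree` and `render` helpers are hypothetical, and the `service` field is an assumption added for readability; it is not part of the JSON shown on the slide.

```python
from collections import defaultdict

# Span IDs and the CHILD_OF reference are taken from the excerpt; "service" is assumed.
spans = [
    {"spanID": "ff9a7eb95d042655", "service": "svc_1", "references": []},
    {"spanID": "799a0a38bbd339d4", "service": "svc_2",
     "references": [{"refType": "CHILD_OF", "spanID": "ff9a7eb95d042655"}]},
]


def build_tree(spans):
    """Return (root span IDs, mapping of parent span ID -> child span IDs)."""
    children = defaultdict(list)
    roots = []
    for span in spans:
        parents = [ref["spanID"] for ref in span["references"]
                   if ref["refType"] == "CHILD_OF"]
        if parents:
            children[parents[0]].append(span["spanID"])
        else:
            roots.append(span["spanID"])
    return roots, dict(children)


def render(spans):
    """Produce one indented line per span, parents before their children."""
    by_id = {span["spanID"]: span for span in spans}
    roots, children = build_tree(spans)
    lines = []

    def walk(span_id, depth):
        lines.append("  " * depth + f"{by_id[span_id]['service']} span {span_id}")
        for child_id in children.get(span_id, []):
            walk(child_id, depth + 1)

    for root_id in roots:
        walk(root_id, 0)
    return lines


for line in render(spans):
    print(line)
# svc_1 span ff9a7eb95d042655
#   svc_2 span 799a0a38bbd339d4
```

If context propagation breaks between two services, the downstream span arrives with no reference to its parent and shows up as a second root, which is how broken propagation typically appears in a trace viewer.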
OpenTelemetry (OTel)
[Link]
• Language-specific libraries
• Supports: Traces, Metrics, Logs
• The Collector recognizes multiple Trace Context formats
• Different form factors for the Collector
Example: OpenTelemetry Components in Action
[Diagram: application code for svc_1, svc_2, and svc_3 emits spans that together form a trace. Spans go to an OTel collector (agent) running as a sidecar or K8s daemonset, then, via an optional path, through an OTel collector (gateway), and on to visualization, storage, etc.]
Tracing Deployment Example
Example OpenTelemetry Deploy - 1
Deploy a test KinD cluster
# kind create cluster
Example OpenTelemetry Deploy - 2
Deploy the OTel Collector
# kubectl apply -f - <<EOF
apiVersion: [Link]/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
    exporters:
      logging:
      jaeger:
        endpoint: "simplest-collector:14250"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [jaeger]
EOF
Deploy the OTel Java Auto-instrumentation CRD
# kubectl apply -f - <<EOF
apiVersion: [Link]/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  exporter:
    endpoint: [Link]
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"
  java:
    image: [Link]/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
  nodejs:
    image: [Link]/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest
  python:
    image: [Link]/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
EOF
Example OpenTelemetry Deploy - 3
Deploy the Spring Pet Clinic service
# kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-petclinic
spec:
  selector:
    matchLabels:
      app: spring-petclinic
  replicas: 1
  template:
    metadata:
      labels:
        app: spring-petclinic
      annotations:
        [Link]/inject: "true"
        [Link]/inject-java: "true"
    spec:
      containers:
      - name: app
        image: [Link]/pavolloffay/spring-petclinic:latest
EOF
Example OpenTelemetry - Validation
# kubectl logs [Link]/otel-collector
. . . <output summarized>
builder/receivers_builder.go:73 Receiver started. {"kind": "receiver", "name": "otlp"}
. . .
jaegerexporter@v0.41.0/[Link] State of the connection with the Jaeger Collector backend {"kind": "exporter", "name": "jaeger", "state": "READY"}
Hybrid MELT Support
Application Instrumentation Options
• Library/SDK-based:
• OpenTelemetry
• Cisco/AppD
• Cisco/Epsagon
• AWS X-Ray, GCP Cloud Trace, Azure Monitor (Application Insights)
• Many others
• Sidecar-based:
• Service Meshes – Istio, Linkerd, Consul Connect, KongHQ, etc.
Service Mesh
What is a Service Mesh?
• Infrastructure layer for service-to-service communication
• Can use a mesh of sidecar proxies:
• Can inspect API transactions at Layer 7 and 4 (TCP)
• Intelligent routing rules can be applied between endpoints
• Allow for tracing and some application instrumentation without the need to add code/libraries/SDK to the application
[Diagram: a Service Mesh Control Plane above three pods. A User/Tool/Service enters through the Ingress/Gateway to the UI Service in podA, which calls svcB in podB and svcC in podC; each pod has a sidecar proxy]
Service Mesh Observability with Proxies
• Service Meshes provide observability via their sidecar proxies
• Observability info:
• Mesh-specific metrics
• Application-specific metrics
• Distributed traces: Layer 4-7: TCP, HTTP, gRPC
• Access logs (mesh and apps)
Cisco Solutions
[Link]
us/solutions/full-stack-
[Link]
Cisco Calisti Operationalize the Service Mesh
[Link]
Multi-cloud, multi-cluster connectivity and observability
Connect any on-prem and public cloud together
• Traffic management ensures smooth app updates
• Complete application and health observability
• Security at all layers between clusters and clouds
Demo
Cisco Full-Stack Observability
Differentiated Solution with Business Context
Full-Stack actions for the business
Full-Stack Observability
Builds on monitoring and visibility, and adds business context
• Full-Stack Observability (multiple domains and cross-functional teams)
• Business context and impact
• Real-time, distributed and hybrid apps
• Issues and incident remediation
• Cross-domain full MELT and security
• Cloud and Edge native
• DevOps and SRE
• KPI: SLO with business context, insights-driven actions/automation
• Visibility/Observability (per domain/team)
• Active and modern apps
• Root cause identification
• Telemetry based (MEL) subset
• Tools sprawl, some integrations
• KPI: performance, experience
• Monitoring (per domain/team)
Summary
• There are a lot of things to keep track of and a lot of tools to help you do so:
• Metrics, Events, Logs, Traces
• Proprietary solutions, open-source solutions
• Most solutions (vendor and OSS) do a handful of things well; most of the time it is up to you to ‘integrate’ them
• Next-gen solutions such as Cisco Full-Stack Observability aim to reduce or remove the burden of stitching together various tools to gain visibility into, derive value from, and take action on your data
• Check out Cisco FSO: [Link]
[Link]
• Start working with Cisco Calisti!
• [Link]
Related Sessions
Session Code | Title | Speaker
BRKCLD-2759 | Full-Stack Observability: The HOW! | Carlos Pereira
BRKETI-2005 | Simplifying Cloud Native Application Connectivity and Observability with Calisti | Ivan Padilla Ojeda
LTRAPP-2682 | Building Observability Solutions on the FSO Platform | Instructor-led Labs, labs 3; Renato Quedas
BRKAPP-4042 | What is Full-Stack Observability (FSO) and How It Can Help You Featuring customer EasyJet | Joe Byrne, Wei Wang
BRKAPP-2098 | Observe and Troubleshoot Cloud Native Applications for IT Ops and DevOps with AppDynamics Cloud | Vipul Shah
BRKAPP-2624 | Full-stack Observability (FSO) for App Security in the Cloud or Wherever | Randy Birdsall
PSOAPP-1775 | New AppDynamics Innovation in Cloud and Security | Randy Birdsall, Eugene Kim
BRKAPP-1154 | Do Tell About OTel: An Introduction to OpenTelemetry and How AppDynamics is Embracing It | Wayne Brown
BRKAPP-2322 | Observability Starts Here: Enhance and Add Value to your Cloud Native Capabilities | Pranav Kumar
BRKAPP-3503 | Custom Correlation on AppDynamics: The Secret Weapon for True Business Transaction Visibility | Ivo Santos
Complete your Session Survey
• Please complete your session survey
after each session. Your feedback
is very important.
• Complete a minimum of 4 session
surveys and the Overall Conference
survey (open from Thursday) to
receive your Cisco Live t-shirt.
• All surveys can be taken in the Cisco Events Mobile App or by logging in to the Session Catalog and clicking the “Attendee Dashboard” at
[Link]
[Link]
Continue Your Education
Thank you