© 2020 SPLUNK INC.
More Than Monitoring:
How Observability Takes
You From Firefighting
to Fire Prevention
Stephane Estevez
EMEA Product Marketing Director, IT Markets, Splunk
During the course of this presentation, we may make forward-looking statements
regarding future events or plans of the company. We caution you that such statements
reflect our current expectations and estimates based on factors currently known to us
and that actual events or results may differ materially. The forward-looking statements
made in this presentation are being made as of the time and date of its live
presentation. If reviewed after its live presentation, it may not contain current or
accurate information. We do not assume any obligation to update
any forward-looking statements made herein.
In addition, any information about our roadmap outlines our general product direction
and is subject to change at any time without notice. It is for informational purposes only,
and shall not be incorporated into any contract or other commitment. Splunk undertakes
no obligation either to develop the features or functionalities described or to include any
such feature or functionality in a future release.
Splunk, Splunk>, Turn Data Into Doing, The Engine for Machine Data, Splunk Cloud,
Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the
United States and other countries. All other brand names, product names, or
trademarks belong to their respective owners. © 2020 Splunk Inc. All rights reserved.
Forward-Looking Statements
• Observability in a nutshell
• Key Observability use cases
• Adding Observability to Monitoring
• Demo
• Adding AIOps
• About Splunk
Agenda
Observability in a
Nutshell
Distributed Services with High-Velocity Releases
= New Organizational Challenges
Investment in new observability and incident management tools becomes critical
Understanding Observability Mindset
Source: Wikipedia
Survivorship bias or survival bias is the logical error of concentrating on the people or things that made it
past some selection process, and overlooking those that did not, typically because of their lack of visibility.
This can lead to false conclusions in several different ways.
“Gentlemen, you need
to put more armour-plate
where the holes
aren’t, because that’s
where the holes
were on the airplanes
that didn’t return”
(Abraham Wald, 1942)
A shot down aircraft
doesn’t externalize
its state
Turning Observability into Action
Observability
A Noun
A thing you have, a property of a system
Monitoring
A Verb
Something you do to determine the state
of an application, a system, a service…
If you are observable, I can monitor you,
analyze (find patterns) and act (take actions)
Cloud-Native Journey Increases Operating Complexity
Retain & Optimize | Lift & Shift | Re-Factor | Re-Architect/Cloud-Native
DEV OPS DEV OPS DEV OPS DEV OPS
Cloud Managed e.g. RDS,
DynamoDB, SaaS
Cloud First Architecture
Tightly Coupled Apps,
Slow Deployment Cycles
Primarily using
Cloud IaaS
More Modular, but
Dependent App
Components
Loosely Coupled
Microservices, and
Serverless Functions
[Diagram: VMs across private and public clouds at each stage]
Adding Observability to Support Cloud-Native Environments
Observability helps detect, investigate and resolve the unknown unknowns
Monitoring
Keep an eye on things
we know can
go wrong
Observability
Find the unexpected
and explain why
it happened
“Focus on what you can’t see, the
unknowns. If the root cause of a
failure stays invisible (the bullet
holes) your IT-plane will be shot
down again”
So what is
Observability?
METRICS
TRACES
LOGS
Observability: The Three Pillars
METRICS: WHAT’S HAPPENING?
EVENTS / LOGS: WHY IS IT HAPPENING?
TRACES: WHERE IS IT HAPPENING?
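A minimal sketch of how the three signals can answer the three questions for one slow request. It is illustrative only: the `Span` and `LogLine` types, the field names, and the threshold are assumptions made for this example, not a Splunk data model or API.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One timed step of a request (hypothetical, simplified trace span)."""
    trace_id: str
    service: str
    duration_ms: float


@dataclass
class LogLine:
    """One log record tagged with the trace it belongs to (hypothetical)."""
    trace_id: str
    service: str
    message: str


def latency_breached(p99_ms: float, threshold_ms: float) -> bool:
    """METRICS: what is happening? A latency metric crossing a threshold."""
    return p99_ms > threshold_ms


def where_and_why(trace_id: str, spans: list, logs: list):
    """TRACES: where? The slowest span. LOGS: why? That service's log lines."""
    trace_spans = [s for s in spans if s.trace_id == trace_id]
    slowest = max(trace_spans, key=lambda s: s.duration_ms)
    reasons = [
        line.message
        for line in logs
        if line.trace_id == trace_id and line.service == slowest.service
    ]
    return slowest.service, reasons


spans = [
    Span("t1", "frontend", 40.0),
    Span("t1", "checkout", 35.0),
    Span("t1", "payments-db", 900.0),
]
logs = [
    LogLine("t1", "frontend", "GET /cart 200"),
    LogLine("t1", "payments-db", "connection pool exhausted"),
]

if latency_breached(p99_ms=975.0, threshold_ms=500.0):
    service, reasons = where_and_why("t1", spans, logs)
```

The design point is the shared `trace_id`: it is what lets a metric alert lead to a trace, and the trace lead to the relevant log lines.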
Enhancing Incident / Problem Management
Correlation / Investigation
Monitoring / Alerting
AIOps
Incident Response
Automation
VM VM VM VM VM VM
Private Public
LOGS METRICS TRACES
I monitor you
Observability
Private Public
I am observable
All the
data
Real-time and
scalable
Analytics / ML
What’s required for
Observability
Customer experience
Release quality and velocity
Developer efficiency
Business Adaptability
Key Use Cases
Frequent Use Cases
• Hybrid cloud monitoring
• Cloud cost management
• Cloud capacity planning
• Public cloud monitoring
• Kubernetes & container monitoring
• Serverless monitoring
• KPIs monitoring using custom metrics
• Observability-as-a-Service
• Application modernization
• Microservices monitoring & troubleshooting
• Business SLx monitoring
• DevOps application lifecycle monitoring
Cloud Migration
Multi-Cloud Monitoring
Application Performance Monitoring
• Reduce remediation time & improve on-call (“Incident Response”)
Incident Response
Observability
with Splunk
“Observability means that you have the
data that you need (logs, metrics and
traces) for every single unit of work that is of
interest to the business.”
Complexity is everywhere
even when you only have one public cloud
EVENTS, LOGS & REPORTS
• Elastic Load Balancing access logs
• Amazon CloudFront access logs
• AWS CloudTrail logs
• Billing reports
• Application logs
• Application S3 access logs
• Other service logs
• AWS Config snapshots & history files
METRICS
EMR Cluster Auto Scaling
EVENTS
LOGS
RULES/EVENTS
Events
Logs
Push path (via Splunk HEC)
Your IT team
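A minimal sketch of the push path: building one event for the Splunk HTTP Event Collector (HEC). The `/services/collector/event` path and the `Authorization: Splunk <token>` header follow the HEC interface; the host name, port, token, and event fields below are placeholders assumed for illustration. The request is only built here, not sent.

```python
import json
import time
import urllib.request


def build_hec_request(base_url, token, event, sourcetype="_json", host=None):
    """Build (but do not send) an HTTP POST carrying one event to HEC."""
    payload = {
        "time": time.time(),       # event timestamp, epoch seconds
        "sourcetype": sourcetype,  # how the indexer should parse the event
        "event": event,            # the event body itself
    }
    if host is not None:
        payload["host"] = host
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder endpoint and token: replace with values from your own deployment.
request = build_hec_request(
    base_url="https://splunk.example.com:8088",
    token="00000000-0000-0000-0000-000000000000",
    event={"service": "checkout", "level": "ERROR", "message": "payment timeout"},
    host="web-01",
)
# Sending would be: urllib.request.urlopen(request)
```

Pushing over HTTP suits sources that cannot run a collection agent, such as serverless functions or managed cloud services.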
CISO | Dev | SysAdmin | MKT | Storage Admin | DBA
GREEN OUTSIDE, RED INSIDE
SILOED TEAMS + SILOED TOOLS = WATERMELON EFFECT
CONSEQUENCE: THE WAR ROOM WATERMELON EFFECT
ENTERPRISE MANAGEMENT AND USABILITY
Infra Agent
Metrics for Host
Containers
VM, etc.
App Libraries
Custom
Metrics
Cloud Services
Integrations
Multi Region
Multi Cloud
Tracing / APM
APM Agent
Library
Event Collector
DATA COLLECTORS
DEPLOYMENT QUOTA / TEAMS SELF-SERVICE DATA ACCESS API
AGGREGATION
METRICS PIPELINE
TRACES PIPELINE
EVENTS PIPELINE
Metrics Dashboard
Grafana / Chronograf
Traces Dashboard
Alerts
DS / ML
SPARK
CI / CD
Automation
TRACES DB
TSDB
EVENTS DB
Replicated / Clustered
Replicated / Clustered
Replicated / Clustered
Long-Term Data Retention
CLOUD STORAGE
COLLECTION | PIPELINE | STORAGE | VISUALIZATION | ALERTING
The DIY Approach is Too Complex
Configs,
Tickets,
Changes…
DATA
VOLUME
FORMAT
LOCATION
Metrics
Logs
Clouds
WHAT’S HAPPENING?
WHY IS IT HAPPENING?
WHY IS IT HAPPENING?
Traces
Real User Monitoring (new)
WHO IS IMPACTED?
WHERE IS IT HAPPENING?
ANY:
On-call
WHO SHOULD I CALL?
Automation
RELAX
Sources
2000+ apps available on splunkbase.splunk.com
Logs
Industry-leading solution to consolidate and index any log and machine data (structured, unstructured, complex multi-line application logs…) regardless of volume, format or location
Metrics
Infrastructure Metrics: massively scalable streaming architecture
Traces
NoSample™ Full-Fidelity Tracing & Open Standards
Events
Unified operational console of all your events and service-impacting issues
RUM
Leveraging our NoSample™ Full-Fidelity Tracing that ingests ALL front-end traces and connects them with their corresponding backend traces
On-call
Mobile-first incident response using AI, ChatOps, virtual war rooms, incident timelines for blameless incident management
Orchestration & Automation
Codify your workflows into automated playbooks using our visual editor (no coding required) or the integrated Python development environment.
Observability Suite
Single, tightly integrated user experience
NoSample™ Full-Fidelity
Real-Time Streaming
Massively Scalable
AI/ML-Driven Analytics
OpenTelemetry
Logs | Metrics | Traces
Digital Experience Monitoring
Infrastructure Monitoring
Application Performance Monitoring
Log Investigation
Incident Response
DEMO
Adding
AIOps and
Business
insights
Keyword: visibility
Correlating business outcomes from all ‘altitudes’ is now a must-have
INFRASTRUCTURE
APP
Cloud
Networks
Security
API
WEB Smartphones
and Devices
Custom
Applications
Storage
Servers
DB
APM
Containers /
microservices
APP logs
Syslogs
Traditional ITOps Monitoring
BIZ / SERVICE
Call center
Revenue NPS
Customer
retention
Funnel
Exec
MBO’s
Business-value
Monitoring
Digital
Online
Business & IT service monitoring
See across silos
Deep dive when needed
Metrics, traces and logs in one place for you
AIOps
Machine Learning:
Overview
How to find a needle in multiple haystacks?
(choose your tool)
Network?
Database?
Middleware?
Hardware?
Wrong
command?
Connection?
Apache?
VM?
Mainframe?
Load balancer?
Wrong code released?
Collect ALL data
• Collect from all silos
• Data in original raw format
• Add open-source apps to
ingest data on the fly
• Schema on the fly
• Dynamic thresholding
• Realtime correlation
Clustering & aggregation
• Real time event
clustering/correlation
• Reduce alert noise
• Behavioural analytics
• Deduplication
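A minimal sketch of deduplication and time-window grouping to reduce alert noise. It is a generic illustration, not how any Splunk product groups events: the alert fields (`ts`, `service`, `signature`) and the 5-minute window are assumptions made for this example.

```python
def group_alerts(alerts, window_seconds=300):
    """Fold repeated alerts into episodes.

    Alerts with the same (service, signature) join the open episode for that
    key when they arrive within `window_seconds` of its latest alert;
    otherwise a new episode starts.
    """
    episodes = []
    open_by_key = {}
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = (alert["service"], alert["signature"])
        episode = open_by_key.get(key)
        if episode is not None and alert["ts"] - episode["last_ts"] <= window_seconds:
            episode["count"] += 1            # duplicate: fold into the episode
            episode["last_ts"] = alert["ts"]
        else:
            episode = {
                "service": alert["service"],
                "signature": alert["signature"],
                "first_ts": alert["ts"],
                "last_ts": alert["ts"],
                "count": 1,
            }
            episodes.append(episode)
            open_by_key[key] = episode
    return episodes


raw_alerts = [
    {"ts": 0, "service": "checkout", "signature": "db timeout"},
    {"ts": 60, "service": "checkout", "signature": "db timeout"},
    {"ts": 30, "service": "search", "signature": "high latency"},
    {"ts": 120, "service": "checkout", "signature": "db timeout"},
    {"ts": 1000, "service": "checkout", "signature": "db timeout"},
]
episodes = group_alerts(raw_alerts)
```

Here five raw alerts collapse into three episodes, so an operator triages three items instead of five.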
Add context
• Measure / report on
indicators that matter
• Add service / business
context
• Add actionable
information to detection
Sales | SSO | Claims
Anomaly detection
• Catch issues that thresholds
cannot
• Reduce event clutter
• Deviation from past
behaviour
• Deviation from peers
• Unusual change in features
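A minimal sketch of "deviation from past behaviour": flag a sample when it lies far from the mean of the samples just before it. The rolling z-score is a simple stand-in chosen for this example, not the algorithm used by any Splunk product; the window size and threshold are assumed values.

```python
import statistics


def find_anomalies(values, window=5, z_threshold=3.0):
    """Return indexes of samples that deviate strongly from recent history."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            continue  # flat history: the z-score is undefined, so skip
        if abs(values[i] - mean) / stdev > z_threshold:
            anomalies.append(i)
    return anomalies


# Response times in ms: steady around 100, with one spike at index 6.
response_ms = [100, 102, 98, 101, 99, 100, 180, 101]
spikes = find_anomalies(response_ms)
```

A fixed threshold of, say, 200 ms would never fire on this series. Comparing each sample with its own recent history catches the spike, which is the sense in which anomaly detection catches issues that thresholds cannot.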
Assisted deep dive
investigation
• Root cause analysis
• Powerful & easy to use
search & investigate
language
?
Predictive
Analytics
• Predict service health
• Predict events
• Trend forecasting
• Detect influencing
entities
• Early warning of
failure
70% to 90% reduction in investigation time
15% to 45% reduction in high priority incidents
67% to 82% reduction in business impact
Machine Learning:
Predictive Analytics
Predictive
Analytics
WHAT IT IS
Applying machine learning
to predict issues up
to 30 minutes before
they happen
WHY IT MATTERS
Find and fix issues
before they impact
your end users
KPI Predictions
Service Health Predictions
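A minimal sketch of the idea behind a KPI prediction: fit a straight line to recent samples and extrapolate 30 minutes ahead. Ordinary least squares is used here only as a simple stand-in. The slide does not say which model is applied, and the KPI, sample values, and threshold below are invented for illustration.

```python
def forecast_linear(samples, minutes_ahead):
    """Extrapolate a KPI with a least-squares line.

    `samples` is a list of (minute, value) pairs with at least two
    distinct minutes.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    last_minute = samples[-1][0]
    return intercept + slope * (last_minute + minutes_ahead)


# Disk usage (%) sampled every 10 minutes (invented numbers).
disk_usage = [(0, 50.0), (10, 55.0), (20, 60.0), (30, 65.0)]
predicted = forecast_linear(disk_usage, minutes_ahead=30)
will_breach = predicted >= 75.0  # assumed alert threshold
```

The current value (65%) is still under the assumed threshold, but the extrapolated trend crosses it within the 30-minute horizon, which is what gives time to act before end users are impacted.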
Machine Learning:
Event Analytics
Event Analytics
Applications
Servers
Databases
We can extend the grouping across siloed monitoring tools, and across layers of the stack. What if I told
you that all the events in orange were associated with machines that run the eCommerce store?
Silo views
Silo views
Silo views
War room
Fatigue + Noise
eCommerce
store incident
Mobile app
incident
Event Analytics
WHAT IT IS
Bring together events from Splunk or
any other tool to analyze events
together, reduce noise and
enhance triage
WHY IT MATTERS
A holistic view of your events can
provide better insights into the root
cause of issues and reduce
Operations Center workload
Working with episodes
Machine Learning supported investigation
SMART IMPACT EVALUATION
• Blast radius
• Impacted entities
• Impacted business services
• Impact on KPIs and service health
• Service topology context
• Related tickets in ServiceNow
ROOT CAUSE ANALYSIS
• Auto identification of probable root cause
• Use of future alert prediction to score
episodes
• Contextual access to advanced diagnostic
data and tools
KNOWLEDGE REUSE
• Auto identifies similar episodes
• Allows operator to jump into solved
episodes for faster resolution
• Contextual access to full diagnostic
data
• Access to past episodes’ resolution
activities and people
Your virtual War Room
Deep Dive Episode investigation
Deep dives are a powerful investigation tool
that lets users drill down into the
collective behavior of multiple elements
related to an episode.
• View KPIs, metrics, events… in context
• Direct access to raw data for full
investigation visibility
• Navigate through service trees to bring
additional elements to the investigation,
easily
• Compare observed episode with past
behavior and quickly find differences
• One-click creation of new multi-dimensional
alerts when a suspected
correlation of KPI behavior is identified
About Splunk
A Market Leader
Sources: IDC WW Security Information & Event Management
Share 2018, IDC worldwide IT Operations Management
Software Market Share 2019 (May 2020), IDC WW IT
Operations Analytics Software Market Shares 2017 and/or
Gartner 2018 & 2019, Research In Action AIOps top 15
global vendors 2019. Gartner, Market Share: Enterprise
Infrastructure Software, Worldwide, 2019 (April 2020).
ITOM
IT Operations Management : tools
to manage provisioning, capacity,
performance and availability of IT
OBSERVE
ITOA
IT Operations Analytics : the
practice of monitoring systems,
and gathering, processing,
analyzing & interpreting data from
ITOps sources to guide decisions
& predict issues
DECIDE
AIOps
Artificial Intelligence Operations :
AIOps platforms enhance IT
operations through greater insights
by combining big data, machine
learning and visualization.
>>
>>
ACCELERATE
SIEM
Security Information and Event Management
PROTECT
Splunk among AIOps
market leaders (top 5)
By Research in Action &
#1 - Gartner
Marketshare: Gartner's
Performance Analysis:
AIOps, ITIM and ITOM
APM
Application Performance
management : tools to monitor and
optimize applications
OBSERVE
Splunk named a Visionary in
our first-ever placement in
the Gartner MQ
Splunk #1 in Worldwide
+32.3% YoY
#2 IBM, #3 Microsoft
Splunk #1 in Worldwide
+32.6% YoY
#2 VMware, #3 IBM
Splunk #1 in Worldwide
+37.6% YoY, #2 IBM, #3 MicroFocus
A Market Leader
ITOM
IT Operations Management : tools
to manage provisioning, capacity,
performance and availability of IT
ITOA
IT Operations Analytics : the
practice of monitoring systems,
and gathering, processing,
analyzing & interpreting data from
ITOps sources to guide decisions
& predict issues
SIEM
Security Information and Event Management
AIOps
Artificial Intelligence Operations :
AIOps platforms enhance IT
operations through greater insights
by combining big data, machine
learning and visualization.
>>
>>
Sources: IDC ww Security Information & event management
Share 2018, IDC worldwide IT Operations Management
Software Market Share 2018, IDC WW IT Operations
Analytics Software Market Shares 2017 and/or Gartner 2018
& 2019, Research In Action AIOps top 15 global vendors
2019. Gartner, Market Share: Enterprise Infrastructure
Software, Worldwide, 2019 (April 2020).
Splunk #1 in Worldwide
+32.3% YoY
#2 IBM, #3 Microsoft
Splunk #1 in Worldwide
+32.6% YoY
#2 VMware, #3 IBM
Splunk #1 in Worldwide
+37.6% YoY, #2 IBM, #3 MicroFocus
OBSERVE
DECIDE
ACCELERATE
PROTECT
Splunk among AIOps
market leaders (top 5)
By Research in Action &
#1 - Gartner
Marketshare: Gartner's
Performance Analysis:
AIOps, ITIM and ITOM
APM
Application Performance
management : tools to monitor and
optimize applications
OBSERVE
Splunk named a Visionary in
our first-ever placement in
the Gartner MQ
Splunk ranked #1 in Gartner’s 2019
Market Share for Performance
Analysis: AIOps, ITIM and Other
Monitoring Tools category
#1 Splunk 16.5% market share (+30.4%)
#2 IBM : 13.2% (-6.5%)
#3 Microsoft : 8.4% (+9.1%)
Disjointed data sets
Siloed views
High MTTR
Negative customer experience
Zero downtime
Record sales
Record high customer satisfaction
“Best Black Friday ever”
Sr. Director of SRE, Dell EMC
Thank You