© 2017 Mesosphere, Inc. All Rights Reserved. 1
FlinkForward 2017 - San Francisco
Flink meet DC/OS
Deploying Apache Flink at Scale
Elizabeth K. Joseph, @pleia2
Ravi Yadav, @RaaveYadav
Talk Outline
Part 1: Introduction to Apache Mesos, Marathon, and DC/OS
Part 2: Demo data pipeline + installing Flink on DC/OS
Part 3: DC/OS 1.9 key features for data services and beyond
Apache Mesos: The datacenter kernel
http://mesos.apache.org/
Marathon
● Mesos can’t run applications on its own.
● A Mesos framework is a distributed system that has a scheduler.
● Schedulers like Marathon start your applications and keep them running, a bit like a distributed init system.
● Mesos mechanics are fair and highly available (HA).
● Learn more at https://mesosphere.github.io/marathon/
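The sketch below shows concretely what "start your applications and keep them running" means. It builds a minimal Marathon app definition. Marathon treats `instances` as desired state and relaunches tasks that exit or are lost. The fields `id`, `cmd`, `cpus`, `mem`, and `instances` are standard Marathon app-definition fields. The helper name, app ID, and command are hypothetical.

```python
import json


def marathon_app(app_id: str, cmd: str, cpus: float, mem: int, instances: int) -> dict:
    """Build a minimal Marathon app definition (hypothetical helper).

    Marathon treats `instances` as the desired state: if a task exits or its
    agent fails, Marathon launches a replacement on resources offered by Mesos.
    """
    if instances < 0:
        raise ValueError("instances must be non-negative")
    return {
        "id": app_id,            # path-like app ID, e.g. "/my-app"
        "cmd": cmd,              # shell command that Marathon keeps running
        "cpus": cpus,            # CPU shares per task
        "mem": mem,              # memory per task, in MiB
        "instances": instances,  # number of tasks to keep running
    }


# Hypothetical app: a loop that Marathon should keep running as 2 tasks.
app = marathon_app("/hello-loop", "while true; do echo hello; sleep 10; done", 0.1, 32, 2)
print(json.dumps(app, indent=2))
```

Saved to a file such as `app.json`, a definition like this can be submitted with `dcos marathon app add app.json`.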
Introducing DC/OS
Solves common problems
● Resource management
● Task scheduling
● Container orchestration
● Self-healing infrastructure
● Logging and metrics
● Network management
● “Universe” of pre-configured apps (including Flink, Kafka…)
● Learn more and contribute at https://dcos.io/
DC/OS Architecture Overview
(Diagram) Layers: Security & Governance; Container Orchestration; Monitoring & Operations; User Interface & Command Line
DC/OS Services & Containers: HDFS, Jenkins, Marathon, Cassandra, Flink, Spark, Docker, Kafka, MongoDB, +30 more
Runs on ANY INFRASTRUCTURE
Interact with DC/OS (1/2)
Web-based GUI
https://dcos.io/docs/latest/usage/webinterface/
Universe
Interact with DC/OS (2/2)
CLI tool
https://dcos.io/docs/latest/usage/cli/
API
https://dcos.io/docs/latest/api/
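The following sketch shows the API route. It prepares, but does not send, an authenticated request that creates a Marathon app through the DC/OS Admin Router.

It relies on two assumptions:
- the DC/OS `Authorization: token=<ACS token>` header convention;
- the `/service/marathon/v2/apps` route.

The cluster URL, token, and app are hypothetical placeholders. On a real cluster, the CLI's token can be read with `dcos config show core.dcos_acs_token`.

```python
import json
import urllib.request


def marathon_request(cluster_url: str, token: str, app: dict) -> urllib.request.Request:
    """Prepare a POST to Marathon's /v2/apps endpoint via DC/OS Admin Router.

    Nothing is sent here; pass the result to urllib.request.urlopen()
    against a real cluster to actually create the app.
    """
    url = cluster_url.rstrip("/") + "/service/marathon/v2/apps"
    return urllib.request.Request(
        url,
        data=json.dumps(app).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # DC/OS authentication header format: "token=<ACS token>"
            "Authorization": "token=" + token,
        },
        method="POST",
    )


# Hypothetical cluster URL, token, and app, for illustration only.
req = marathon_request(
    "https://dcos.example.com/",
    "EXAMPLE-TOKEN",
    {"id": "/sleeper", "cmd": "sleep 3600", "cpus": 0.1, "mem": 32, "instances": 1},
)
print(req.get_method(), req.full_url)
```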
Flink on Apache Mesos and DC/OS
You may already be using Apache Mesos!
According to the December 2016 Apache Flink user survey organized by data Artisans, just under 30% of respondents were running Flink on Apache Mesos.
https://dcos.io/blog/2017/apache-flink-on-dc-os-and-apache-mesos/
Version 1.2 of Flink includes support for Apache Mesos and DC/OS: “it is now possible to run an highly available Flink cluster on Mesos”
https://flink.apache.org/news/2017/02/06/release-1.2.0.html#run-flink-with-apache-mesos & https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/mesos.html
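The linked Flink 1.2 Mesos setup page configures a Mesos deployment through `flink-conf.yaml`. The sketch below renders a few such entries from Python.

The key names reflect our recollection of the Flink 1.2 Mesos and high-availability options, so verify them against the Flink documentation for your release:
- `mesos.master`
- `mesos.initial-tasks`
- `mesos.resourcemanager.tasks.cpus`
- `mesos.resourcemanager.tasks.mem`
- `high-availability`
- `high-availability.zookeeper.quorum`
- `high-availability.storageDir`

`master.mesos` is the usual DC/OS internal DNS name for the masters. The HDFS path is hypothetical.

```python
def render_flink_conf(settings: dict) -> str:
    """Render a dict as flink-conf.yaml lines ("key: value"), sorted by key."""
    lines = [f"{key}: {settings[key]}" for key in sorted(settings)]
    return "\n".join(lines) + "\n"


# Assumed Flink-on-Mesos settings for a DC/OS cluster (verify key names
# against the Flink docs for your release before use).
mesos_ha_settings = {
    "mesos.master": "zk://master.mesos:2181/mesos",        # Mesos master via ZooKeeper
    "mesos.initial-tasks": 2,                              # TaskManagers started at launch
    "mesos.resourcemanager.tasks.cpus": 1.0,               # CPUs per TaskManager
    "mesos.resourcemanager.tasks.mem": 1024,               # memory per TaskManager (MB)
    "high-availability": "zookeeper",                      # enable JobManager HA
    "high-availability.zookeeper.quorum": "master.mesos:2181",
    "high-availability.storageDir": "hdfs:///flink/ha/",   # hypothetical HA storage path
}

print(render_flink_conf(mesos_ha_settings))
```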
DEMOS: Demo data pipeline + installing Flink on DC/OS
DC/OS 1.9 - Data Services Ecosystem
WORKLOADS
● Pods
● GPU based scheduling
DATA SERVICES ECOSYSTEM
● Alluxio
● Couchbase
● Datastax DSE
● Elastic (ELK)
● Redis
● Apache Flink
OPERATIONS
● Remote Container Shell
● Unified Metrics
● Deployment failure analyzer
DC/OS 1.9 - Operations
WORKLOADS
● Pods
● GPU based scheduling
DATA SERVICES ECOSYSTEM
OPERATIONS
● Remote Container Shell
● Unified Metrics
● Unified Logging
● Deployment Failure Debugging
● Upgrades & Configuration updates
REMOTE CONTAINER SHELL
DC/OS: OPERATIONS
DC/OS 1.9
● Open an encrypted, interactive, remote session to your containers
● Remotely execute commands for real-time app troubleshooting
● Give developers access to their own applications, not the entire host or cluster
my-laptop$ dcos task exec my-task /bin/bash
Starting /bin/bash in my-task ...
Connecting to remote my-task …
UNIFIED LOGGING
DC/OS: OPERATIONS
DC/OS 1.9
● Access application, DC/OS, and OS logs
● Easily troubleshoot applications with critical metadata such as container ID and app ID
● Integrate easily with existing logging systems
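Metadata attached to each log line makes troubleshooting easier, and the sketch below illustrates why. It filters log records by app ID and container ID. The record layout is a hypothetical, simplified stand-in and is not the actual DC/OS logging API format. On a real cluster, `dcos task log` is the CLI entry point for task logs.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    """Simplified log record; field names are hypothetical, for illustration."""
    message: str
    metadata: dict = field(default_factory=dict)


def filter_logs(records, **wanted):
    """Return records whose metadata matches every key=value in `wanted`."""
    return [r for r in records
            if all(r.metadata.get(k) == v for k, v in wanted.items())]


# Hypothetical records from two apps running on the same cluster.
records = [
    LogRecord("jobmanager started", {"app_id": "/flink", "container_id": "c-1"}),
    LogRecord("taskmanager lost", {"app_id": "/flink", "container_id": "c-2"}),
    LogRecord("broker up", {"app_id": "/kafka", "container_id": "c-3"}),
]

for r in filter_logs(records, app_id="/flink", container_id="c-2"):
    print(r.message)
```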
UNIFIED METRICS
DC/OS: OPERATIONS
DC/OS 1.9
● Single API for system, container, and application metrics
● Metadata such as host ID and container ID is automatically added to assist in debugging
● Integrate easily with existing metrics systems
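The sketch below shows how automatically added metadata helps. Metric datapoints tagged with dimensions such as host and container IDs can be grouped per container.

The payload shape is a hypothetical simplification: `datapoints` entries with `name`/`value`/`unit`, plus a `dimensions` object. It is not a documented DC/OS metrics API response, so check the metrics API documentation before relying on specific fields. The values are made up.

```python
from collections import defaultdict


def group_by_container(payloads):
    """Map container_id -> {metric name: value}; node-level metrics go under "<host>"."""
    grouped = defaultdict(dict)
    for payload in payloads:
        container = payload["dimensions"].get("container_id", "<host>")
        for point in payload["datapoints"]:
            grouped[container][point["name"]] = point["value"]
    return dict(grouped)


# Hypothetical payloads: one node-level, one container-level.
payloads = [
    {"dimensions": {"hostname": "10.0.0.5"},
     "datapoints": [{"name": "load.1min", "value": 0.42, "unit": "count"}]},
    {"dimensions": {"hostname": "10.0.0.5", "container_id": "c-1"},
     "datapoints": [{"name": "cpus.user.time", "value": 12.5, "unit": "seconds"},
                    {"name": "mem.total", "value": 1073741824, "unit": "bytes"}]},
]

print(group_by_container(payloads))
```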
DEPLOYMENT FAILURE DEBUGGING
DC/OS: OPERATIONS
DC/OS 1.9
● Understand why your application is not deploying
● Understand which nodes in the cluster can accommodate the role, constraints, CPU, memory, disk, and port requirements for your app
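This feature answers one question: which nodes can satisfy an app's role, constraint, CPU, memory, disk, and port requirements? The sketch below models that as a per-node check that reports what is missing. It is a hypothetical simplification for illustration, not the DC/OS or Marathon implementation. Constraints are reduced to exact attribute matches, and the node and app values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    role: str
    cpus: float
    mem: int   # MiB
    disk: int  # MiB
    ports: set
    attributes: dict = field(default_factory=dict)


@dataclass
class AppRequirements:
    role: str
    cpus: float
    mem: int
    disk: int
    ports: set
    constraints: dict = field(default_factory=dict)  # attribute -> required value


def unmet_requirements(node: Node, app: AppRequirements) -> list:
    """Return the reasons `node` cannot host `app` (empty list means it fits)."""
    reasons = []
    if node.role != app.role:
        reasons.append("role")
    for attr, value in app.constraints.items():
        if node.attributes.get(attr) != value:
            reasons.append(f"constraint {attr}={value}")
    if node.cpus < app.cpus:
        reasons.append("cpus")
    if node.mem < app.mem:
        reasons.append("mem")
    if node.disk < app.disk:
        reasons.append("disk")
    if not app.ports <= node.ports:  # every requested port must be available
        reasons.append("ports")
    return reasons


def diagnose(nodes, app):
    """Map each node name to its list of unmet requirements."""
    return {n.name: unmet_requirements(n, app) for n in nodes}


nodes = [
    Node("agent-1", "*", 4.0, 8192, 20000, {8080, 8081}, {"rack": "a"}),
    Node("agent-2", "*", 0.5, 8192, 20000, {8080}, {"rack": "b"}),
]
app = AppRequirements("*", 1.0, 2048, 1000, {8081}, {"rack": "a"})
for name, reasons in diagnose(nodes, app).items():
    print(name, "OK" if not reasons else "missing: " + ", ".join(reasons))
```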
UPGRADES AND CONFIG UPDATES
DC/OS: OPERATIONS
DC/OS 1.9
● Generate new config for cluster nodes
$ dcos_generate_config.sh --generate-node-upgrade-script <installed_cluster_version>
● Single command upgrade script for individual nodes
$ curl -O <Node upgrade script URL>
$ sudo bash ./dcos_node_upgrade.sh
DC/OS 1.9 - Workloads
WORKLOADS
● Pods
● GPU based scheduling
DATA SERVICES ECOSYSTEM
OPERATIONS
● Remote Container Shell
● Unified Logging
● Unified Metrics
● Deployment failure analyzer
PODS
DC/OS: WORKLOADS
DC/OS 1.9
● Schedule, deploy, and scale multiple containers on the same host(s) while sharing IP address and storage volumes
● All containers in a pod instance run as if they were on a single host in the pre-container world
● Useful for migrating legacy applications or building advanced microservices (sidecar containers)
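The sketch below builds a pod definition, as a Python dict serialized to JSON, with an app container and a log-shipper sidecar that share a volume.

The field layout follows our understanding of the Marathon 1.4 pod format used by DC/OS 1.9; please check it against the DC/OS pods documentation. That layout covers:
- `scaling`
- `containers`
- `image`
- `exec.command.shell`
- `resources`
- `volumes`
- `volumeMounts`
- `networks`

The pod ID, container names, image, mount path, and commands are hypothetical.

```python
import json


def sidecar_pod(pod_id: str) -> dict:
    """Build a hypothetical two-container pod: an app plus a log-shipper sidecar.

    Both containers mount the same ephemeral volume and use the same network
    (host mode here), so the sidecar can read what the app writes.
    """
    shared = {"name": "shared-logs", "mountPath": "/shared"}
    return {
        "id": pod_id,
        "scaling": {"kind": "fixed", "instances": 1},
        "volumes": [{"name": "shared-logs"}],  # ephemeral, lives as long as the pod
        "networks": [{"mode": "host"}],
        "containers": [
            {
                "name": "app",
                "image": {"kind": "DOCKER", "id": "alpine"},
                "exec": {"command": {"shell": "while true; do date >> /shared/app.log; sleep 5; done"}},
                "resources": {"cpus": 0.1, "mem": 32},
                "volumeMounts": [dict(shared)],
            },
            {
                "name": "log-shipper",  # sidecar: follows the app's log file
                "image": {"kind": "DOCKER", "id": "alpine"},
                "exec": {"command": {"shell": "tail -F /shared/app.log"}},
                "resources": {"cpus": 0.1, "mem": 32},
                "volumeMounts": [dict(shared)],
            },
        ],
    }


pod = sidecar_pod("/app-with-sidecar")
print(json.dumps(pod, indent=2))
```

Saved as `pod.json`, such a definition could be submitted with `dcos marathon pod add pod.json`.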
PODS: MIGRATING LEGACY APPS TO CONTAINERS
DC/OS: WORKLOADS
DC/OS 1.9
● Traditional monolithic apps on VMs usually have support services such as log shippers and message-queuing clients
● Many support services assume co-location on the same host, and localhost access to networking and storage
● Pods simplify moving legacy monolithic apps to containers, reducing risk and accelerating migrations
PODS: SUPPORT SERVICES (SIDECAR CONTAINERS)
DC/OS: WORKLOADS
DC/OS 1.9
● Advanced microservice patterns require co-locating containers
● Examples of support services include:
○ Logging or monitoring agents
○ Backup tooling & proxies
○ Data change watchers & event publishers
● Pods simplify building and maintaining such complex microservices
GPU: WHY GPU?
DC/OS: WORKLOADS
DC/OS 1.9
● GPUs are needed for many machine learning and deep learning applications
● GPUs are essential for real-time or near-real-time machine learning models
● GPUs deliver 10X to 100X performance for some applications, resulting in lower $$$/IOPS and more productive data science teams
● GPU applications include real-time fraud detection, genome sequencing, cohort analysis, and many others
GPU BASED SCHEDULING
DC/OS: WORKLOADS
DC/OS 1.9
● Test locally with nvidia-docker, deploy to production with DC/OS
● Isolate GPU instances and schedule workloads just like CPU and memory, guaranteeing performance
● Efficiently share GPU resources across data science teams
● Simplify migrating machine learning models from dev to production, and across clouds
OTHER IMPROVEMENTS
DC/OS
DC/OS 1.9
● Mesos 1.2
● Marathon 1.4
● Docker 1.12 and 1.13 (17.03-ce) support
● CentOS 7.3 and CoreOS 1235.12.0 support
● Performance improvements across all networking features
● Support for third-party CNI plugins
● Hundreds of additional bug fixes and tests
Flink forward SF 2017: Elizabeth K. Joseph and Ravi Yadav - Flink meet DC/OS – Deploying Flink at Scale
