Lightbend Pipelines Overview
Craig Blitz, Senior Product Director
Disclaimer
• © Lightbend Inc 2019. All Rights Reserved
• The information contained in this presentation is provided for informational purposes only. This information is based on
Lightbend’s current product plans and strategy, which are subject to change by Lightbend without notice. Lightbend shall not
be responsible for any damages arising out of the use of, or otherwise related to, this presentation.
• References in this presentation to Lightbend products, programs, or services do not imply that they will be available in all
countries in which Lightbend operates.
• Product release dates and/or capabilities referenced in this presentation may change at any time at Lightbend’s sole discretion
based on market opportunities or other factors, and are not intended to be a commitment to future product or feature
availability in any way.
Kiki Carter
Principal Enterprise Architect
Craig Blitz
Senior Product Director
Digital transformation mega-trends
Digital transformation requires building modern,
data-centric applications to enable the real-time enterprise
“The next frontier of competitive advantage is the speed with which you can extract value from data.”
– Mike Gualtieri, VP, Principal Analyst, Forrester Research, Inc.
“Traditional application architectures and platforms are obsolete.”
– Anne Thomas, VP and Distinguished Analyst, Gartner, Inc.
Reactive Microservices
The enabler of these characteristics is a Cloud-Ready, Message-Driven Model.
Lightbend codified these principles into the Reactive Manifesto in 2013; 24,000+ signatories around the world so far.
• Responsive (react to users): low latency / high performance; real-time / near real-time (NRT)
• Resilient (react to failures): graceful, non-catastrophic recovery; self-healing
• Elastic (react to load variance): responsive in the face of changing loads
Big Data at Speed: “Fast Data”
Diagram: Fast Data applications sit at the overlap of Reactive Microservices and Big Data, reflecting increasing requirements for scalability and resilience, increasing requirements for streaming over batch, and streaming data pipelines increasingly served by microservices.
Lightbend Platform
The complete architecture
for building and running
applications optimized for a
hybrid cloud infrastructure
New Challenges for Old Personas
• App Dev: domain logic; agile, scalable, robust. New challenge: infusing the application with intelligence.
• Data Scientist: big data; assisting the business in decision making. New challenge: exporting models to assist applications in real-time decision making.
• Data Engineer: data preparation, cleansing, streaming, data pipelines. New challenge: real-time processing while meeting the same expectations (scalability, robustness) as App Dev.
Did I forget someone?
Age’s Dream
Imagine a world in which you can quickly
create an HTTP endpoint to ingest
streaming data and keep it safe (durable,
available, and scalable). And then, just as
easily, post-process that data using
Spark and then expose a projection of
the results as a microservice by
providing a simple flattening function.
You have now written an app that most enterprises spend months on; you've written it in one day and made it easily deployable and scalable out of the box.
Sample use case - hand-coded and operated
● Lots of Moving Parts
● Manually managed Kafka
topics, serialization
● Schema?
● Hand-crafted and deployed
Akka-based REST
microservice
● Manual deployment
(A sketch of this hand-coded plumbing follows below.)
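As a rough illustration of what "hand-crafted" means here, below is a minimal sketch (hypothetical names; it assumes Akka HTTP 10.2+ and Alpakka Kafka) of the kind of plumbing you would write yourself just to ingest records into a manually managed Kafka topic, before any processing or serving logic exists:

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.kafka.ProducerSettings
import akka.kafka.scaladsl.SendProducer
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.StringSerializer

object HandCodedIngress extends App {
  implicit val system: ActorSystem = ActorSystem("ingress")

  // Topic name, partitioning, and serialization are all managed by hand.
  val producer = SendProducer(
    ProducerSettings(system, new StringSerializer, new StringSerializer)
      .withBootstrapServers("localhost:9092"))

  val route =
    path("call-records") {
      post {
        entity(as[String]) { json =>
          // Any schema validation would also have to be hand-rolled here.
          onSuccess(producer.send(new ProducerRecord("call-records", json))) { _ =>
            complete("accepted")
          }
        }
      }
    }

  Http().newServerAt("0.0.0.0", 8080).bind(route)
}

Deployment, scaling, and topic management for a service like this remain entirely manual, which is the gap Pipelines is meant to close.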
Why Lightbend Pipelines?
Developing streaming applications involves a lot of complexity
• Managing and composing applications
• Dealing with multiple frameworks
• Managing state within pipelines
• Evolving applications
• Operationalizing Machine Learning
• Integrating streaming pipelines with applications
Concepts
Streamlets implement the actual stream processing logic. Streamlets have different shapes, including ingress, egress, fan-in, fan-out, processors, and views. Each Streamlet is strongly typed with inlet and outlet schemas.
A Blueprint comprises a set of composed streamlets; an Application Blueprint can then be deployed (see the conceptual sketch below).
Diagram: schema-typed streamlets composed from ingress to egress, deployed to a cluster.
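To make the streamlet and blueprint concepts concrete, here is a purely conceptual sketch (not the actual Pipelines API; all names are illustrative) of how schema-typed inlets and outlets and a blueprint-style wiring fit together:

object PipelineConcepts {
  // Conceptual sketch only, not the Pipelines API: each streamlet declares typed inlets
  // and outlets, and a blueprint wires outlets to inlets by name to form the application.
  case class Inlet[T](name: String)   // a streamlet consumes records of type T here
  case class Outlet[T](name: String)  // a streamlet produces records of type T here

  case class SensorReading(deviceId: String, value: Double)    // illustrative domain types
  case class SensorAverage(deviceId: String, average: Double)

  trait Streamlet
  class SensorIngress extends Streamlet {      // ingress shape: outlets only
    val out: Outlet[SensorReading] = Outlet("out")
  }
  class AverageProcessor extends Streamlet {   // processor shape: inlets and outlets
    val in: Inlet[SensorReading] = Inlet("in")
    val out: Outlet[SensorAverage] = Outlet("out")
  }
  class AverageEgress extends Streamlet {      // egress shape: inlets only
    val in: Inlet[SensorAverage] = Inlet("in")
  }

  // A blueprint composes streamlet instances by connecting outlets to inlets; in
  // Pipelines this lives in a configuration file rather than in Scala code.
  val blueprint = Map(
    "ingress.out"   -> "processor.in",
    "processor.out" -> "egress.in"
  )
}

The point of the strong typing is that an incompatible connection can be caught when the blueprint is checked, rather than failing at runtime.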
UX - Workflow
1. Code, Compose, Build (local): write the source code (streamlets), compose them in a Blueprint, and build a Docker image.
2. Push Images: push the built image to a registry.
3. Kubernetes Deploy: the image is deployed and runs as the application runtime.
Without Pipelines…
Spark Support in Pipelines
// Spark imports assumed by this snippet (not shown on the slide); SparkProcessor and
// ProcessorLogic come from the Pipelines SDK. The $-column syntax and .as[...] also
// require the streamlet's SparkSession implicits to be in scope.
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.functions.{avg, sum, window}
import org.apache.spark.sql.streaming.OutputMode
import org.apache.spark.sql.types.{LongType, TimestampType}

class CallStatsAggregator extends SparkProcessor[CallRecord, AggregatedCallStats] {
  override def createLogic = new ProcessorLogic[CallRecord, AggregatedCallStats](OutputMode.Update) {
    override def process(inDataset: Dataset[CallRecord]): Dataset[AggregatedCallStats] = {
      val query =
        inDataset
          // Treat the record timestamp as event time and tolerate up to 1 minute of lateness.
          .withColumn("ts", $"timestamp".cast(TimestampType))
          .withWatermark("ts", "1 minute")
          // Aggregate call durations over 1-minute event-time windows.
          .groupBy(window($"ts", "1 minute"))
          .agg(avg($"duration") as "avgCallDuration", sum($"duration") as "totalCallDuration")
          .withColumn("windowDuration", $"window.end".cast(LongType) - $"window.start".cast(LongType))
      // Flatten the window struct into plain columns and map back to the typed output schema.
      query.select($"window.start".cast(LongType) as "startTime", $"windowDuration", $"avgCallDuration", $"totalCallDuration")
        .as[AggregatedCallStats]
    }
  }
}
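The CallRecord and AggregatedCallStats types are not shown on the slide; a minimal sketch of what they could look like, with field names inferred from the columns the query reads and selects (the real types would normally be generated from the streamlet's inlet and outlet schemas):

// Hypothetical record types inferred from the query above; exact field types depend on the schema.
case class CallRecord(timestamp: Long, duration: Long)  // "timestamp" is cast to TimestampType above
case class AggregatedCallStats(
  startTime: Long,
  windowDuration: Long,
  avgCallDuration: Double,
  totalCallDuration: Long
)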
Scalability
• Key-based partitioning; partitions are stored in Kafka
• Clients consume from multiple Kafka partitions
• Scales up to the configured number of Kafka partitions
• Ordering is preserved per partition (illustrated in the sketch below)
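To make the per-partition ordering guarantee concrete, here is a simplified sketch (not Pipelines code) of key-based partitioning: every record with the same key lands in the same partition, so those records stay in order, while records with different keys can be spread across partitions for parallelism.

object PartitioningSketch {
  // Simplified key-based partitioning: the key alone decides the partition, so all
  // records sharing a key keep their relative order; throughput scales with the
  // number of partitions. (Kafka's default partitioner hashes the key similarly, using murmur2.)
  def partitionFor(key: String, numPartitions: Int): Int =
    java.lang.Math.floorMod(key.hashCode, numPartitions)

  // Every event for caller "alice" maps to the same partition and stays ordered;
  // "bob" and "carol" can be consumed in parallel from other partitions.
  val assignments = Seq("alice", "bob", "alice", "carol").map(k => k -> partitionFor(k, numPartitions = 3))
}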
Built on Kubernetes for Cloud Native Deployment
▪ Helm Chart-based install
▪ CLI plugin to Kubectl
▪ Pipelines and Kubernetes Deployment
• The Pipelines operator handles deployment for Pipelines Applications
• Pipelines Applications expressed as Custom Resource Definitions (CRDs)
• Pipelines Applications are submitted to Kubernetes as Custom Resources
• The Pipelines Operator deploys Streamlets as various types of Kubernetes
Pods
▪ Integrated Prometheus metrics for all components
• Deployed Pipelines Application
• Pipelines itself
• Core components like Kafka and Spark
Kubernetes Operators
▪ Pipelines relies on Operators to manage underlying frameworks
▪ What is a Kubernetes Operator?
• An application-specific custom controller
• The standard way to manage resource types on Kubernetes, for example:
  • Kafka clusters
  • Spark jobs
  • Pipelines Applications
• Encodes operational knowledge to automate lifecycle management (create, update, manage); a minimal sketch of this reconcile loop follows below
▪ Lightbend contributes to the Strimzi Kafka and Google Spark operators
▪ The Lightbend Pipelines operator manages Lightbend Pipelines software.
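To make "encodes operational knowledge to automate lifecycle management" concrete, here is a minimal, framework-free sketch of the reconcile loop at the heart of any operator (written in Scala to match the rest of the deck; a real operator would use the Kubernetes client and watch APIs instead of these illustrative types):

object ReconcileSketch {
  // Schematic reconcile step: compare the desired state declared in a custom resource
  // with the actual state of the cluster, and act to close the gap.
  case class DesiredState(replicas: Int)  // what the custom resource asks for
  case class ActualState(replicas: Int)   // what is currently running

  def reconcile(desired: DesiredState, actual: ActualState): Unit =
    if (actual.replicas < desired.replicas)
      println(s"scale up by ${desired.replicas - actual.replicas} pod(s)")
    else if (actual.replicas > desired.replicas)
      println(s"scale down by ${actual.replicas - desired.replicas} pod(s)")
    else
      () // converged: nothing to do

  // The operator runs this whenever the custom resource changes or the cluster drifts,
  // which is how create/update/manage gets automated.
}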
Demo
Lightbend Pipelines
Orchestrate and Operate Multi-Stage, Multi-Component Streaming Data Pipelines and Microservices
Features
1. Develop, compose, and operate streaming apps
2. Choose streaming engine that works for you
3. Automated provisioning and management
4. Expose streaming data pipelines as microservices
5. Support collaborative multi-team development
Benefits
1. Ingest, transform, analyze, serve data in real time
2. Focus on business logic - avoid boilerplate code
3. Transform legacy systems into fast data apps
For More Information
https://www.lightbend.com/lightbend-platform
Email: craig.blitz@lightbend.com
Free Ebooks on Fast Data Architectures, Reactive, and all things Lightbend:
https://www.lightbend.com/ebooks
Schedule a demo: https://www.lightbend.com/lightbend-platform-demo
