Multi-Tenant Flink-as-a-Service on YARN
Jim Dowling
Associate Prof @ KTH
Senior Researcher @ SICS
CEO @ Logical Clocks AB
Slides by Jim Dowling, Theofilos Kakantousis
Berlin, 13th September 2016
www.hops.io
@hopshadoop
A Polyglot
2
Polyglot Data Parallel Processing
•Stream Processing
- Beam/Flink, Spark
•ETL/Batch Processing
- Spark, MapReduce
•SQL-on-Hadoop
- Hive, Presto, SparkSQL
•Distributed ML
- SparkML, FlinkML
•Deep Learning
- Distributed TensorFlow
3
Flink Standalone good enough for some
•Enterprises are polyglot due to economies of scale
•Standalone Flink works great for enterprises
- Dedicate some servers
- Dedicate some SREs
4
Polyglot Data Parallel Processing In Context
5
Data Processing
Spark, MR, Flink, Presto, TensorFlow
Storage
HDFS, MapR, S3, WAS
Resource Management
YARN, Mesos, Kubernetes
Metadata
Hive, Parquet, Authorization, Search
Flink for the Little Guy
•Flink-as-a-Service on Hops Hadoop
- Fully UI Driven, Easy to Install
•Project-Based Multi-tenancy
6
Hops
Flink-as-a-Service running on hops.site
7
SICS ICE: A datacenter research and test environment
Purpose: Increase knowledge, strengthen universities, companies and researchers
HopsFS Architecture
8
NameNodes
NDB
Leader
HDFS Client
DataNodes
Hops-YARN Architecture
9
ResourceMgrs
NDB
Scheduler
YARN Client
NodeManagers
Resource Trackers
Heartbeats
(70-95%)
AM Reqs
(5-30%)
HopsFS Throughput (Spotify Workload)
10
NDB setup: 8 nodes with Xeon E5-2620 2.40 GHz processors and 10 GbE.
NameNodes: machines with Xeon E5-2620 2.40 GHz processors and 10 GbE.
HopsFS Metadata Scaleout
11
Assuming 256 MB block size, 100 GB JVM heap for Apache Hadoop
Hopsworks
12
Hopsworks – Project-Based Multi-Tenancy
•A project is a collection of
- Users with Roles
- HDFS DataSets
- Kafka Topics
- Notebooks, Jobs
•Per-Project quotas
- Storage in HDFS
- CPU in YARN
• Uber-style Pricing
•Sharing across Projects
- Datasets/Topics
13
project
dataset 1
dataset N
Topic 1
Topic N
Kafka
HDFS
Hopsworks – Dynamic Roles
14
Alice@gmail.com
NSA__Alice
Authenticate
Users__Alice
Glassfish
HopsFS
HopsYARN
Projects
Secure
Impersonation
Kafka
X.509
Certificates
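The diagram above shows one login (Alice@gmail.com) mapped to a separate identity in each project, such as NSA__Alice and Users__Alice. A minimal sketch of how such names could be composed and split is given below. The class name `ProjectUser`, the double-underscore separator constant, and the validation rules are assumptions made for illustration, inferred only from the names on the slide; this is not the Hopsworks implementation.

```java
import java.util.Objects;

/** Sketch: compose and split project-specific user names such as "NSA__Alice". */
final class ProjectUser {
    // Assumed separator, inferred from the names shown on the slide.
    static final String SEPARATOR = "__";

    final String project;
    final String user;

    ProjectUser(String project, String user) {
        this.project = Objects.requireNonNull(project, "project");
        this.user = Objects.requireNonNull(user, "user");
        // Reject inputs that would make the combined name ambiguous to split.
        if (project.isEmpty() || user.isEmpty() || project.contains(SEPARATOR)) {
            throw new IllegalArgumentException("invalid project or user name");
        }
    }

    /** Returns the per-project identity, e.g. "NSA__Alice". */
    String qualifiedName() {
        return project + SEPARATOR + user;
    }

    /** Splits a per-project identity at the first separator. */
    static ProjectUser parse(String qualifiedName) {
        int idx = qualifiedName.indexOf(SEPARATOR);
        if (idx <= 0 || idx + SEPARATOR.length() >= qualifiedName.length()) {
            throw new IllegalArgumentException("not a project-specific name: " + qualifiedName);
        }
        return new ProjectUser(
                qualifiedName.substring(0, idx),
                qualifiedName.substring(idx + SEPARATOR.length()));
    }

    public static void main(String[] args) {
        // The same person gets a distinct identity in each project.
        System.out.println(new ProjectUser("NSA", "Alice").qualifiedName());
        System.out.println(new ProjectUser("Users", "Alice").qualifiedName());
        System.out.println(parse("NSA__Alice").project);
    }
}
```

Because the identity is scoped to a project, a job running as NSA__Alice has no implicit access to data owned by Users__Alice, even though both belong to the same person.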
Look Ma, No Kerberos!
•For each project, a user is issued an X.509
certificate containing the project-specific userID.
•Services are also issued X.509 certificates.
- Both user and service certs are signed by the same CA.
- Services extract the userID from RPCs to identify the caller.
•Netflix's BLESS system follows a similar model, with
short-lived certificates.
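As a rough illustration of the "services extract the userID" step, the sketch below reads the Common Name from a certificate subject using only the JDK. It assumes the project-specific userID is stored in the subject CN; the slides do not say which certificate field is used, and the class name `CertUserId` and the organisation `ExampleOrg` are hypothetical.

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;
import javax.security.auth.x500.X500Principal;

/** Sketch: recover a project-specific userID from a certificate subject. */
final class CertUserId {

    /**
     * Returns the CN of the given subject, or null if it has none.
     * Assumption: the userID is carried in the CN attribute.
     */
    static String commonName(X500Principal subject) {
        try {
            LdapName dn = new LdapName(subject.getName(X500Principal.RFC2253));
            for (Rdn rdn : dn.getRdns()) {
                if ("CN".equalsIgnoreCase(rdn.getType())) {
                    return rdn.getValue().toString();
                }
            }
            return null;
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("unparseable subject DN", e);
        }
    }

    public static void main(String[] args) {
        // In a real service the subject would come from the peer certificate,
        // e.g. X509Certificate.getSubjectX500Principal(), after the TLS
        // handshake has verified the chain against the shared CA.
        X500Principal subject = new X500Principal("CN=NSA__Alice, O=ExampleOrg");
        System.out.println(commonName(subject));
    }
}
```

The extracted name is only trustworthy because the certificate chain was already validated against the common CA; the parsing step itself performs no authentication.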
X.509 Certificate Per Project-Specific User
16
Alice@gmail.com
Authenticate
Add/Del
Users
Distributed
Database
Insert/Remove Certs
Project Mgr
Root
CA
Services
Hadoop
Spark
Kafka
etc
Cert Signing
Requests
Flink on YARN
•Two modes: detached or blocking
•Hopsworks supports detached mode
- Client is started locally, then exits after the job is submitted
to YARN
- No accumulator results or exceptions are returned from
ExecutionEnvironment.execute()
- Can only kill the YARN job, not the Flink session, which causes cleanup issues
•New Architecture proposal for a Flink Dispatcher
A Flink/Kafka Job on YARN with Hopsworks
18
Alice@gmail.com
1. Launch Flink Job
Distributed
Database
2. Get certs,
service endpoints
YARN Private
LocalResources
Flink/Kafka Streaming App
4. Materialize certs
3. YARN Job + config
6. Get Schema
7. Consume
Produce
5. Read Certs
Hopsworks
KafkaUtil
Flink Stream Producer in Secure Kafka
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
String topic = parameterTool.get("topic");
1. Discover: Schema Registry and Kafka Broker Endpoints
2. Create: Kafka Properties file with certs and broker details
3. Create: producer using Kafka Properties
4. Distribute: X.509 certs to all hosts on the cluster
5. Download: the Schema for the Topic from the Schema Registry
6. Do this all securely
DataStream<…> messageStream = env.addSource(…);
messageStream.addSink(producer);
env.execute("Write to Kafka");
19
Developer
Operations
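Steps 2 and 3 above amount to building a Kafka client configuration that points at the brokers and at the materialized certificates. The sketch below shows what that could look like with plain `java.util.Properties`. The property keys are standard Kafka client SSL settings; the class name `SecureKafkaProps`, the broker address, the store file names, and the password are placeholders chosen for illustration, not values taken from Hopsworks.

```java
import java.util.Properties;

/** Sketch: Kafka client properties for a TLS connection with client certificates. */
final class SecureKafkaProps {

    static Properties build(String brokers, String keystorePath,
                            String truststorePath, String password) {
        Properties props = new Properties();
        // Where to find the Kafka brokers (step 1 would discover this).
        props.setProperty("bootstrap.servers", brokers);
        // Use TLS for both encryption and client authentication.
        props.setProperty("security.protocol", "SSL");
        // Keystore: holds the project-specific user's certificate and private key.
        props.setProperty("ssl.keystore.location", keystorePath);
        props.setProperty("ssl.keystore.password", password);
        props.setProperty("ssl.key.password", password);
        // Truststore: holds the CA certificate used to verify the brokers.
        props.setProperty("ssl.truststore.location", truststorePath);
        props.setProperty("ssl.truststore.password", password);
        return props;
    }

    public static void main(String[] args) {
        // All argument values here are hypothetical placeholders.
        Properties props = build("broker.example.com:9091",
                "kafka_k_certificate", "kafka_t_certificate", "changeit");
        props.forEach((k, v) -> {
            // Avoid printing secrets in logs.
            boolean secret = k.toString().endsWith("password");
            System.out.println(k + "=" + (secret ? "****" : v));
        });
    }
}
```

A producer or consumer would be constructed from these properties together with its serializer settings. The store paths only resolve if the certificate files have been distributed to every host that runs a task, which is step 4 in the list.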
Flink/Kafka Stream Producer in Hopsworks
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
String topic = parameterTool.get("topic");
FlinkProducer producer = KafkaUtil.getFlinkProducer(topic);
DataStream<…> messageStream = env.addSource(…);
messageStream.addSink(producer);
env.execute("Write to Kafka");
20
https://github.com/hopshadoop/hops-kafka-examples
Flink/Kafka Stream Consumer in Hopsworks
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
String topic = parameterTool.get("topic");
FlinkConsumer consumer = KafkaUtil.getFlinkConsumer(topic);
DataStream<…> messageStream = env.addSource(consumer);
RollingSink<String> rollingSink = ... // HDFS path
messageStream.addSink(rollingSink);
env.execute("Read from Kafka, write to HDFS");
21
https://github.com/hopshadoop/hops-kafka-examples
Zeppelin Support for Flink
22
Karamel/Chef for Automated Installation
23
Google Compute Engine BareMetal
Demo
24
Summary
•Hopsworks provides first-class support for
Flink-as-a-Service
- Streaming or Batch Jobs
- Zeppelin Notebooks
•Hopsworks simplifies secure use of Kafka in Flink on
YARN
•YARN support for Flink is still a work in progress
25
Hops Team
Active: Jim Dowling, Seif Haridi, Tor Björn Minde,
Gautier Berthou, Salman Niazi, Mahmoud Ismail,
Theofilos Kakantousis, Johan Svedlund Nordström,
Konstantin Popov, Antonios Kouzoupis,
Ermias Gebremeskel, Daniel Bekele.
Alumni: Vasileios Giannokostas, Misganu Dessalegn,
Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca,
K “Sri” Srijeyanthan, Steffen Grohsschmiedt,
Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems,
Stig Viaene, Hooman Peiro, Evangelos Savvidis,
Jude D’Souza, Qi Qi, Gayana Chandrasekara,
Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos,
Peter Buechler, Pushparaj Motamari, Hamid Afzali,
Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
Hops
[Hadoop For Humans]
Join us!
http://github.com/hopshadoop


Editor's Notes

  • #6: Stream processing: Flink. ETL workflow (batch) processing: Spark. SQL-on-Hadoop: Presto, SparkSQL, Hive. Deep learning: Distributed TensorFlow.
  • #14: Privileges: upload/download data, run analysis jobs. Like an RBAC solution. All access is via Hopsworks.
  • #18:
    - Netty dependency conflict with our app in blocking mode.
    - Impacts: application size, main class run on our multi-tenant application (System.exit()), logs are written locally.
    - No accumulator results or exceptions from the ExecutionEnvironment.execute() call.
    - Can only kill YARN job, not Flink session: cleanup issues.
    Flink Dispatcher:
    - The client directly starts the job in YARN, rather than bootstrapping a cluster and then submitting the job to that cluster. The client can hence disconnect immediately after the job is submitted.
    - All user code libraries and config files are directly in the application classpath, rather than in the dynamic user code class loader.
    - Containers are requested as needed and are released when no longer used.
    - The "as needed" allocation of containers allows different container profiles (CPU / memory) to be used for different operators.
  • #20:
    public class HopsKafkaUtil implements Serializable {
      KAFKA_BROKERADDR_ENV_VAR = "kafka.brokeraddress";
      KAFKA_RESTENDPOINT = "kafka.restendpoint";
      KAFKA_SESSIONID_ENV_VAR = "kafka.sessionid";
      KAFKA_PROJECTID_ENV_VAR = "kafka.projectid";
      KAFKA_K_CERTIFICATE_ENV_VAR = "kafka_k_certificate";
      KAFKA_T_CERTIFICATE_ENV_VAR = "kafka_t_certificate";
      String getHopsConsumer(String topic) {…}
      String getHopsProducer(String topic) {…}
      String getHopsFlinkKafkaConsumer(String topic) {…}
      String getHopsFlinkKafkaProducer(String topic) {…}
      String getSchema(String topicName, int versionId) {..}
      Map<String, String> getKafkaProps(String propsStr) {…}
    }
  • #21: HopsKafkaProperties.defaultProps()
  • #22: HopsKafkaProperties.defaultProps()
  • #23: https://gist.github.com/rawkintrevo/ad206879753733f5a536
  • #28: I need some sound-effects to go with that.