Apache Flink 
Fast and reliable big data processing 
Aljoscha Krettek 
aljoscha@apache.org
What is Apache Flink? 
• Project undergoing incubation in the Apache Software Foundation
• Originated from the Stratosphere research project, started at TU Berlin in 2009
• http://flink.incubator.apache.org
• 59 contributors (doubled in ~4 months)
• Has an awesome squirrel logo
What is Apache Flink? 
[Figure: diagram; only the label "Flink Client" survives]
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); 
… 
env.execute();
DataSet<String> input = env.readTextFile("/hello/there");
DataSet<String> input = env.readTextFile("hdfs:///hello/there");
DataSet<Tuple2<String, Integer>> input = env.readCsvFile("/hello/there")
    .fieldDelimiter('|')
    .lineDelimiter("\n")
    .ignoreFirstLine()
    .types(String.class, Integer.class);
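To make the reader configuration concrete, the following plain-Java approximation does the same thing to a small in-memory string: split the text into lines on "\n", skip the header line, split each line on '|', and convert the two fields to a String and an Integer. It is a single-machine sketch assuming well-formed two-field lines; `CsvSketch` and `parse` are hypothetical names, `Map.Entry` stands in for Flink's `Tuple2`, and the sketch does not model how Flink distributes a file across parallel readers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of Flink's API.
class CsvSketch {

    // Parses text shaped like the readCsvFile(...) example:
    // lines end with "\n", fields are separated by '|',
    // and the first line is a header that is skipped.
    static List<Map.Entry<String, Integer>> parse(String text) {
        List<Map.Entry<String, Integer>> rows = new ArrayList<>();
        String[] lines = text.split("\n");
        // Start at index 1 to mimic ignoreFirstLine().
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // tolerate blank lines
            }
            // Pattern.quote is required because '|' is a regex metacharacter.
            String[] fields = lines[i].split(Pattern.quote("|"));
            // Equivalent of .types(String.class, Integer.class).
            rows.add(Map.entry(fields[0], Integer.parseInt(fields[1].trim())));
        }
        return rows;
    }
}
```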
DataSet<Beer> beers = input.map(new MapFunction<Tuple2<String, Integer>, Beer>() {
    @Override
    public Beer map(Tuple2<String, Integer> in) {
        return new Beer(in.f0, in.f1);
    }
});
DataSet<Beer> beers = input.map( in -> new Beer(in.f0, in.f1) ); 
val beers = input.map( in => new Beer(in._1, in._2) ) 
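All three variants above (anonymous class, Java lambda, Scala) describe the same map step: each input record becomes exactly one `Beer`. The sketch below applies that per-record function with Java streams on a single machine. It assumes `Beer` has a String `name` and an int `rating` (the slides never show the class), uses `Map.Entry` in place of `Tuple2` (`getKey()`/`getValue()` play the role of `f0`/`f1`), and `MapSketch.toBeers` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Assumed shape of the Beer POJO from the slides: public fields name and rating.
class Beer {
    public final String name;
    public final int rating;

    Beer(String name, int rating) {
        this.name = name;
        this.rating = rating;
    }
}

// Hypothetical helper class; not part of Flink's API.
class MapSketch {
    // Same per-record function as input.map(in -> new Beer(in.f0, in.f1)),
    // but evaluated sequentially in one JVM rather than by a Flink job.
    static List<Beer> toBeers(List<Map.Entry<String, Integer>> input) {
        return input.stream()
                .map(in -> new Beer(in.getKey(), in.getValue()))
                .collect(Collectors.toList());
    }
}
```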
DataSet<Beer> filtered = beers.filter(new FilterFunction<Beer>() {
    @Override
    public boolean filter(Beer in) {
        return in.name.contains("augustiner");
    }
});
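A filter keeps a record only when the predicate returns true. Writing the same predicate in plain Java, under the same assumed `Beer` shape and with the hypothetical name `FilterSketch.augustinerOnly`, makes one detail easy to check: `String.contains` is case-sensitive, so a beer named "Augustiner Edelstoff" is dropped by this predicate.

```java
import java.util.List;
import java.util.stream.Collectors;

// Assumed shape of the Beer POJO from the slides: public fields name and rating.
class Beer {
    public final String name;
    public final int rating;

    Beer(String name, int rating) {
        this.name = name;
        this.rating = rating;
    }
}

// Hypothetical helper class; not part of Flink's API.
class FilterSketch {
    // Keeps only beers whose name contains the lowercase substring "augustiner",
    // matching the FilterFunction above. The match is case-sensitive.
    static List<Beer> augustinerOnly(List<Beer> beers) {
        return beers.stream()
                .filter(in -> in.name.contains("augustiner"))
                .collect(Collectors.toList());
    }
}
```

If a case-insensitive match is wanted, comparing against `in.name.toLowerCase()` would be one option.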
DataSet<Beer> grouped = beers
    .groupBy("name")
    .sortGroup("rating", Order.DESCENDING)
    .reduceGroup(new GroupReduceFunction<Beer, Beer>() {
        @Override
        public void reduce(Iterable<Beer> in, Collector<Beer> out) {
            // each group is sorted by rating, so the first element has the highest rating
            out.collect(in.iterator().next());
        }
    });
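The pipeline above keeps, for each beer name, the record with the highest rating: `sortGroup` orders every group by rating in descending order, so the first element the `GroupReduceFunction` sees is the best one. The single-machine sketch below reproduces that logic with explicit grouping and sorting, under the same assumed `Beer` class; `TopPerGroupSketch.bestPerName` is a hypothetical name. It returns groups in first-seen order only so the result is deterministic; a Flink job should not be relied on to emit groups in any particular order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Assumed shape of the Beer POJO from the slides: public fields name and rating.
class Beer {
    public final String name;
    public final int rating;

    Beer(String name, int rating) {
        this.name = name;
        this.rating = rating;
    }
}

// Hypothetical helper class; not part of Flink's API.
class TopPerGroupSketch {
    // Mirrors groupBy("name").sortGroup("rating", DESCENDING).reduceGroup(first element).
    static List<Beer> bestPerName(List<Beer> beers) {
        // groupBy("name"): collect records that share the same name.
        Map<String, List<Beer>> groups = new LinkedHashMap<>();
        for (Beer beer : beers) {
            groups.computeIfAbsent(beer.name, k -> new ArrayList<>()).add(beer);
        }
        List<Beer> result = new ArrayList<>();
        for (List<Beer> group : groups.values()) {
            // sortGroup("rating", DESCENDING): highest rating first.
            group.sort(Comparator.comparingInt((Beer b) -> b.rating).reversed());
            // reduceGroup: emit only the first element of each sorted group.
            result.add(group.get(0));
        }
        return result;
    }
}
```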
DataSet<Tuple2<String, Integer>> aggregated = input.groupBy(0).sum(1); 
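`groupBy(0).sum(1)` groups the tuples by their first field and adds up the second field within each group. The plain-Java sketch below computes the same totals with `Map.merge`; it again uses `Map.Entry` in place of `Tuple2`, `SumSketch.sumByKey` is a hypothetical name, and it returns a sorted map only to keep the output deterministic, whereas Flink produces the totals as a `DataSet` of tuples.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; not part of Flink's API.
class SumSketch {
    // Mirrors input.groupBy(0).sum(1): key on field 0 (the String)
    // and add up field 1 (the Integer) for every record with that key.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> input) {
        // TreeMap keeps keys sorted so the sketch's output is deterministic.
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> in : input) {
            sums.merge(in.getKey(), in.getValue(), Integer::sum);
        }
        return sums;
    }
}
```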
result.print();
result.writeAsText("/ciao/for/now");
github.com/aljoscha/beer-analysis 
www.filedropper.com/beerdatacsv 
www.filedropper.com/beerdata (large) 
flink.incubator.apache.org 
github.com/apache/incubator-flink 
meetup.com/Apache-Flink-Meetup


Apache Flink Hands-On

Editor's Notes

  • #4: A data processing engine that lets you write programs in a functional style and executes them automatically in parallel.