DIVIDE, DISTRIBUTE AND CONQUER:

STREAM V. BATCH
Who am I?
Solutions Architect
Developer Advocate
@gamussa in internetz
Hey you, yes, you, go follow me on Twitter
@gamussa @confluentinc @DataSciCon
BATCH PROCESSING
Data at rest
Data and Queries
Origin and processing
Data…
✓ … inherently immutable
✓ … time-based
CRUD -> CR
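One way to read "CRUD -> CR": if data is immutable and time-based, updates and deletes become new facts that are appended, and the current state is derived by reading. The sketch below is only an illustration under that assumption; the class `AppendOnlyLog` and its method names are hypothetical and not taken from the talk or from any library.

```python
class AppendOnlyLog:
    """Hypothetical store that supports only Create and Read."""

    def __init__(self):
        self._facts = []  # list of (timestamp, key, value), never modified in place

    def create(self, key, value, timestamp):
        # An "update" is just another fact with a later timestamp.
        self._facts.append((timestamp, key, value))

    def history(self, key):
        # Every fact ever recorded for the key, ordered by time.
        return sorted((f for f in self._facts if f[1] == key), key=lambda f: f[0])

    def read_latest(self, key):
        # Current state is derived: the value of the most recent fact.
        facts = self.history(key)
        return facts[-1][2] if facts else None


log = AppendOnlyLog()
log.create("seat", "12A", timestamp=1)
log.create("seat", "14C", timestamp=2)  # replaces nothing, both facts remain
```

Reading the latest value gives the newest fact, while the full history stays available for reprocessing.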
Processing is a query
Function on full data set
Projection
Aggregations
Joins
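The three query shapes listed above can be sketched as plain functions over a complete, in-memory data set. This is a minimal illustration, not the speaker's code; the `flights` and `carriers` sample data and all function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical "full data set": a batch of immutable facts at rest.
flights = [
    {"flight": "DL100", "carrier": "DL", "delay_min": 12},
    {"flight": "DL200", "carrier": "DL", "delay_min": 0},
    {"flight": "UA300", "carrier": "UA", "delay_min": 45},
]
carriers = {"DL": "Delta", "UA": "United"}


def projection(rows, fields):
    # Keep only the requested fields of every row.
    return [{f: row[f] for f in fields} for row in rows]


def average_delay_by_carrier(rows):
    # Aggregation: group rows by carrier and average the delay.
    totals = defaultdict(lambda: [0, 0])  # carrier -> [sum, count]
    for row in rows:
        totals[row["carrier"]][0] += row["delay_min"]
        totals[row["carrier"]][1] += 1
    return {carrier: s / n for carrier, (s, n) in totals.items()}


def join_carrier_name(rows, names):
    # Join: enrich every row with the matching entry of a second data set.
    return [{**row, "carrier_name": names[row["carrier"]]} for row in rows]
```

Each function needs the whole input before it can return, which is the defining property of the batch style described on this slide.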
Lambda architecture origins
http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
https://mapr.com/developercentral/lambda-architecture/
Lambda Architecture
TFW trying to explain the modern big data landscape
STREAM PROCESSING
Data in motion
Streaming Platform
Interesting cases
Before You Go
I FOUND YOUR LACK OF FAULT TOLERANCE DISTURBING
Data is too important to store on one computer
How to process «infinite» data?
Time model
Different use cases need different time semantics
Majority of use cases require event-time semantics
Other use cases may require processing-time or special variants like ingestion-time
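The three time semantics named above differ only in which clock supplies a record's timestamp. The sketch below shows that choice in one place; `Record`, `timestamp_of`, and the field names are assumptions made for this illustration and do not come from any streaming library.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    value: str
    event_time: float      # when the event happened at the source
    ingestion_time: float  # when the streaming platform stored the record


def timestamp_of(record, semantics, clock=time.time):
    """Pick the timestamp that a computation should use for this record."""
    if semantics == "event-time":
        return record.event_time
    if semantics == "ingestion-time":
        return record.ingestion_time
    if semantics == "processing-time":
        # Depends on when the processor happens to see the record.
        return clock()
    raise ValueError(f"unknown time semantics: {semantics}")


# An email written at t=100 but delivered ten hours (36000 s) later.
email = Record("hello", event_time=100.0, ingestion_time=36100.0)
```

With event-time semantics the email is attributed to the moment it was written; with the other two it is attributed to when the system received or processed it.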
Windowing
Input data, where colors represent different users' events
Rectangles denote different event-time windows
[Figure labels: processing-time, event-time, windowing; users: alice, bob, dave]
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
Windowing
Windowing is an operation that groups events
Most commonly needed: time windows, session windows
Examples:
• Real-time monitoring: 5-minute averages
• Reader behavior on a website: user browsing sessions
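Both examples above can be sketched in a few lines. The first function groups readings into fixed 5-minute (300-second) windows and averages them; the second groups a user's clicks into sessions separated by an inactivity gap. The function names and the 30-minute default gap are assumptions for this illustration only.

```python
from collections import defaultdict


def tumbling_averages(events, size_s=300):
    """events: iterable of (event_time_s, value). Returns {window_start: average}."""
    buckets = defaultdict(list)
    for ts, value in events:
        window_start = ts - ts % size_s  # align to the start of the window
        buckets[window_start].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}


def session_windows(timestamps, gap_s=1800):
    """Group event times into sessions; a new session starts after gap_s of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap_s:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions
```

Time windows have a fixed size chosen up front, while session windows have a data-dependent size: they grow for as long as the user stays active.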
Out-of-order and late data
Very common in practice, not a rare corner case
• Related to time model discussion
Out-of-order and late data
Users with mobile phones enter airplane, lose Internet connectivity
Emails are being written during the 10h flight
Internet connectivity is restored, phones will send queued emails now
Stream Processing: results
• Yes, it’s possible to get computation results in real time
• Windows – finite view of infinite data
• Based on temporal characteristics of the event
• Late event processing
• You choose how long to wait
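"You choose how long to wait" can be illustrated with a windowed counter that keeps each window open for a configurable grace period after its end. This is a simplified sketch, not the behavior of any specific framework; `WindowedCounter`, `grace_s`, and the use of the highest event time seen so far as "stream time" are assumptions of the example.

```python
class WindowedCounter:
    """Counts events per event-time window and tolerates bounded lateness."""

    def __init__(self, size_s, grace_s):
        self.size_s = size_s
        self.grace_s = grace_s   # how long to wait for late events
        self.counts = {}         # window_start -> number of events
        self.stream_time = 0     # highest event time observed so far
        self.dropped = 0         # events that arrived after their window closed

    def add(self, event_time_s):
        self.stream_time = max(self.stream_time, event_time_s)
        start = event_time_s - event_time_s % self.size_s
        window_closes_at = start + self.size_s + self.grace_s
        if window_closes_at <= self.stream_time:
            self.dropped += 1    # too late: the window is already final
            return False
        self.counts[start] = self.counts.get(start, 0) + 1
        return True


counter = WindowedCounter(size_s=60, grace_s=30)
for ts in [10, 70, 50, 200, 55]:  # 50 is out of order, 55 is too late
    counter.add(ts)
```

The event at 50 arrives out of order but within the grace period, so it still lands in the first window; the event at 55 arrives after stream time has passed 90 and is dropped. A longer grace period trades result latency for completeness.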
DEMO
Let’s analyze flights
https://www.confluent.io/blog/predicting-flight-arrivals-with-the-apache-kafka-streams-api/
Example: Training Flight Prediction Model
https://github.com/confluentinc/online-inferencing-blog-application
Thanks!
questions?
@gamussa
viktor@confluent.io