Building Modern ML/AI Pipelines
With the Latest Open Source Technologies
KubeFlow + TFX + Airflow + MLflow
Chris Fregly, Founder @ PipelineAI
(Note: Slides Available on Twitter @cfregly)
Who am I?
Founder @ PipelineAI
Real-time Machine Learning and AI in Production
Databricks
Machine Learning Engineer
O’Reilly Author
“High Performance TensorFlow in Production”
Meetup Organizer
Advanced Spark and TensorFlow Meetup (SF)
Netflix
Personalization Engineer
Silicon Valley Venture Capital + Open Source
4,000 Stars = $6,000,000 Seed
$1,500 per GitHub Star
(According to one prominent VC on Sand Hill Road)
Popular Open Source ML Frameworks
TensorFlow / Keras (Google)
PyTorch (Facebook)
MXNet (Amazon)
July 2019 Survey
What is an ML Pipeline?
Figure: an ML pipeline whose stages are spread across six separate systems (System 1 to System 6). Stages shown: Configure, Data Ingestion, Data Analysis, Data Transform, Data Validation, Data Splitting, Build Model, Ad-Hoc Training, Distributed Training (Training at Scale), Model Validation, Roll-out, Serving, Logging, Monitoring.
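The pipeline above is a dependency graph: each stage runs only after the stages it consumes have finished. The following minimal sketch models it with Python's standard `graphlib` module (Python 3.9+). The stage names come from the diagram, but the exact dependency edges are an assumption made for illustration. It produces one valid execution order and groups stages that could run at the same time.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency edges between the stages in the diagram:
# each stage maps to the set of stages it depends on.
PIPELINE = {
    "data_ingestion": set(),
    "data_analysis": {"data_ingestion"},
    "data_validation": {"data_analysis"},
    "data_transform": {"data_validation"},
    "data_splitting": {"data_transform"},
    "build_model": {"data_splitting"},
    "distributed_training": {"build_model"},
    "model_validation": {"distributed_training"},
    "roll_out": {"model_validation"},
    "serving": {"roll_out"},
    "logging": {"serving"},
    "monitoring": {"serving"},
}


def execution_order(dag):
    """Return one valid order in which the stages can run."""
    return list(TopologicalSorter(dag).static_order())


def parallel_batches(dag):
    """Group stages into batches whose members do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every stage whose inputs are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for batch in parallel_batches(PIPELINE):
        print(batch)
```

With these assumed edges, logging and monitoring end up in the same final batch because both depend only on serving. Orchestrators such as Airflow and KubeFlow Pipelines schedule their DAGs on the same principle.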
Complexity Brings Technical Debt
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
What is the Problem?
Infrastructure Complexity
Consistency Between Workflows and Tools
Fragmentation
One-Shot “Train and Forget” vs. Iterative
ML Infrastructure
Docker
Unit of deployment for model training and model serving
1 Docker image per batch job
1 Docker image per model server
Kubernetes
Manages and orchestrates Docker containers
The new “Linux” of the data center
First-class support for batch and cron jobs
Current State: Fun for DevOps, Clunky for Data Scientists
July 2019 Survey
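Kubernetes' first-class batch support means a training run can be described declaratively: one container image per batch job, optionally re-run on a schedule. The sketch below builds a `batch/v1` Job manifest and a CronJob manifest as Python dicts, serialized to JSON, which `kubectl apply -f` accepts. The image name, job name, command, and schedule are hypothetical placeholders.

```python
import json


def training_job(name, image, command, backoff_limit=2):
    """Build a Kubernetes batch/v1 Job manifest: one container image per batch job."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": backoff_limit,  # retries before the Job is marked failed
            "template": {
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                    # Pods created by a Job must use "Never" or "OnFailure".
                    "restartPolicy": "Never",
                }
            },
        },
    }


def nightly_retrain(name, image, command, schedule="0 2 * * *"):
    """Wrap the same Job spec in a CronJob so training re-runs on a cron schedule."""
    job = training_job(name, image, command)
    return {
        # CronJob is batch/v1 from Kubernetes 1.21; older clusters use batch/v1beta1.
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {"schedule": schedule, "jobTemplate": {"spec": job["spec"]}},
    }


if __name__ == "__main__":
    cron = nightly_retrain(
        "train-model", "registry.example.com/train:0.1", ["python", "train.py"]
    )
    print(json.dumps(cron, indent=2))
```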
Characteristics of a Modern ML Pipeline
Experiment Tracking
Compare different models or model hyper-parameters
Hyper-Parameter Optimization (HPO)
Find the best hyper-parameters for a given model
Distributed, Parallel Execution
Perform independent pipeline components in parallel
Metadata, Experiment Reproducibility, Data Versioning
Back-in-time analysis of models and hyper-params on exact dataset
Detect Model Bias / Decay / Drift
Over time, models become crusty and irrelevant
Feature Store
Catalog of pre-engineered machine learning features across the entire organization
July 2019 Survey
Experiment Tracking
Compare Different Models or Model Hyper-Parameters
Send all experiment results to a tracking server for comparison
Offline Experiment Tracking is the Most Common
Batch training jobs
Online Experiment Tracking is an Emerging Field
Compare model performance in live production
A/B tests
Multi-armed bandit tests
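Offline experiment tracking boils down to three steps: record every run's hyper-parameters and metrics in one place, then query across runs. The sketch below uses a hypothetical in-memory class as a stand-in for a tracking server. In MLflow, calls such as `mlflow.log_param` and `mlflow.log_metric` send the same kind of data to a real server. The metric values in the example are made up.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Run:
    run_id: str
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


class InMemoryTracker:
    """Hypothetical stand-in for a tracking server: keeps every run's params and metrics."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        run = Run(run_id=uuid.uuid4().hex, params=dict(params), metrics=dict(metrics))
        self.runs.append(run)
        return run.run_id

    def best_run(self, metric, higher_is_better=True):
        """Compare all runs that logged `metric` and return the best one."""
        scored = [r for r in self.runs if metric in r.metrics]
        if not scored:
            raise ValueError(f"no run logged metric {metric!r}")

        def key(r):
            return r.metrics[metric]

        return max(scored, key=key) if higher_is_better else min(scored, key=key)


if __name__ == "__main__":
    tracker = InMemoryTracker()
    tracker.log_run({"model": "logreg", "C": 1.0}, {"accuracy": 0.81})  # illustrative numbers
    tracker.log_run({"model": "gbt", "max_depth": 6}, {"accuracy": 0.86})
    print(tracker.best_run("accuracy").params)
```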
Demo: Experiment Tracking
Hyper-Parameter Optimization (HPO)
Search for Best Hyper-Parameters
Naturally Parallelizable and Distributed
Model Parameters != Model Hyper-Parameters
Parameters: What the model learns (e.g., neural network weights)
Hyper-Parameters: How the model learns
e.g., neural net architecture, depth of a decision tree, learning rate of gradient descent
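To make the parameter vs. hyper-parameter split concrete, the sketch below fits `y = w * x` by gradient descent on a hypothetical toy dataset. The weight `w` is the learned parameter. `learning_rate` and `epochs` are hyper-parameters fixed before training.

A grid search then tries every hyper-parameter combination. Because the trials are independent, they can run in parallel. `ThreadPoolExecutor` is used here only for simplicity: CPython's GIL keeps threads from speeding up CPU-bound pure-Python work, so a real HPO system would run each trial as its own process, container, or pod.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy dataset: y = 3x, so the ideal learned weight is 3.0.
XS = [1.0, 2.0, 3.0, 4.0]
YS = [3.0, 6.0, 9.0, 12.0]


def train(learning_rate, epochs):
    """Fit y = w * x by gradient descent on mean squared error.

    learning_rate and epochs are hyper-parameters (set before training);
    w is the parameter the model learns.
    """
    n = len(XS)
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(XS, YS)) / n
        w -= learning_rate * grad
    loss = sum((w * x - y) ** 2 for x, y in zip(XS, YS)) / n
    return w, loss


def grid_search(learning_rates, epoch_options, max_workers=4):
    """Run one independent trial per hyper-parameter combination, in parallel."""
    grid = list(itertools.product(learning_rates, epoch_options))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each result is ((learning_rate, epochs), learned_w, loss).
        results = list(pool.map(lambda hp: (hp, *train(*hp)), grid))
    return min(results, key=lambda r: r[2])  # keep the lowest-loss trial


if __name__ == "__main__":
    best_hp, best_w, best_loss = grid_search([0.01, 0.05, 0.2], [10, 50])
    print(best_hp, best_w, best_loss)
```

On this toy data, the update converges only for learning rates below 2/15. The grid search should therefore pick `(0.05, 50)`, and the `0.2` trials diverge. Picking a bad hyper-parameter does not fail loudly; it just produces a bad model.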
Demo: Hyper-Parameter Optimization (HPO)
Parallel Coordinate Plot
Distributed, Parallel Execution
Perform independent pipeline components in parallel
Reduce/aggregate when complete
TensorFlow Supports tf.distribute.HierarchicalCopyAllReduce
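The "perform in parallel, then reduce" pattern is easiest to see in data-parallel training. Each worker computes a gradient on its own data shard, and the partial results are aggregated into one averaged gradient that every worker applies. In TensorFlow, `tf.distribute.MirroredStrategy` performs this reduction across devices. `tf.distribute.HierarchicalCopyAllReduce` is one of the cross-device ops it can be configured with, via `cross_device_ops`.

The stdlib sketch below simulates that idea in a single process with threads. It is not TensorFlow's implementation, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(data, num_workers):
    """Split the dataset into num_workers independent, roughly equal shards."""
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w, pairs):
    """One worker: summed MSE gradient of y = w * x over its shard, plus the shard size."""
    return sum(2 * (w * x - y) * x for x, y in pairs), len(pairs)


def reduce_mean(partials):
    """Aggregate step: combine per-worker sums into one global average gradient."""
    total = sum(g for g, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


def data_parallel_step(w, data, num_workers=4, learning_rate=0.01):
    shards = shard(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda s: local_gradient(w, s), shards))
    # In an all-reduce, every worker receives this same averaged gradient.
    return w - learning_rate * reduce_mean(partials)
```

Returning (sum, count) pairs instead of per-shard averages keeps the result identical to a full-batch gradient even when shards have unequal sizes.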
Demo: Distributed, Parallel Execution
Detect Model Bias / Decay / Drift
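The deck names bias, decay, and drift without prescribing a detection method. As one possible input-drift check, the sketch below uses the Population Stability Index (PSI), a common drift statistic; this technique is swapped in here and is not taken from the talk. PSI compares the binned distribution of a feature at training time with its distribution in live traffic.

The bin edges and the 0.2 alert threshold are assumptions. The threshold is a widely used rule of thumb, not a standard. This checks input drift only and says nothing about bias.

```python
import bisect
import math


def bin_fractions(values, edges):
    """Fraction of values per bin; sorted `edges` define len(edges) + 1 bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(expected, actual, edges, eps=1e-6):
    """PSI = sum over bins of (a - e) * ln(a / e); eps avoids log(0) for empty bins."""
    psi = 0.0
    for e, a in zip(bin_fractions(expected, edges), bin_fractions(actual, edges)):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


def drift_alert(training_values, live_values, edges, threshold=0.2):
    """Flag drift when the live distribution has moved too far from training."""
    return population_stability_index(training_values, live_values, edges) > threshold
```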
Metadata, Reproducibility, Data Versioning
Back-in-Time Analysis of Model Experiments, Hyper-Parameters, and Datasets*
(*Requires Data to be Versioned for True Reproducibility)
MetadataStore Saves All Inputs/Outputs of all ML Pipeline Components
(Figure: Individual Component)
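A metadata store supports back-in-time analysis by recording, for every component execution, its configuration and the identities of its input and output artifacts. Content hashing gives a simple form of data versioning: identical data always gets the same ID.

The sketch below is a hypothetical in-memory illustration of that idea, not the API of TFX's ML Metadata library. It records executions and walks an artifact's lineage back to the raw data.

```python
import hashlib
import json
from dataclasses import dataclass


def fingerprint(obj):
    """Content hash of a JSON-serializable artifact; same content, same ID."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class Execution:
    component: str
    params: str     # fingerprint of the component's configuration
    inputs: tuple   # fingerprints of input artifacts
    outputs: tuple  # fingerprints of output artifacts


class MetadataStore:
    """Hypothetical in-memory store of every component run's inputs and outputs."""

    def __init__(self):
        self.executions = []
        self.artifacts = {}

    def record(self, component, params, inputs, outputs):
        for artifact in (*inputs, *outputs):
            self.artifacts[fingerprint(artifact)] = artifact
        ex = Execution(
            component=component,
            params=fingerprint(params),
            inputs=tuple(fingerprint(a) for a in inputs),
            outputs=tuple(fingerprint(a) for a in outputs),
        )
        self.executions.append(ex)
        return ex

    def lineage(self, artifact_id):
        """Walk back from an artifact to every execution that contributed to it."""
        history, frontier = [], [artifact_id]
        while frontier:
            target = frontier.pop()
            for ex in self.executions:
                if target in ex.outputs and ex not in history:
                    history.append(ex)
                    frontier.extend(ex.inputs)
        return history
```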
Feature Store
Store and Share Results of Feature-Engineering in Common Repository
Discover Pre-Engineered Features from Others in the Organization
Examples of Feature Engineering:
Remove rows with a `null` or `0` value in a column
Cross age and income to create a new feature
Gender-balanced samples
Current State: Multiple Startups are Actively Working on This Problem
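The three feature-engineering examples can each be written as a small function, and a feature store's core idea is publishing such functions in a shared, discoverable catalog. The sketch below shows only that catalog idea, using a plain dict. Real feature stores also handle storage, point-in-time lookups, and online serving.

The registry, the decorator, and the column names (`age`, `income`, `gender`) are hypothetical. The feature cross is implemented as a numeric product, which is one common choice.

```python
import random
from collections import defaultdict

# Hypothetical shared catalog: feature name -> (description, function).
FEATURE_REGISTRY = {}


def register(name, description):
    """Decorator that publishes a feature function to the shared catalog."""
    def wrap(fn):
        FEATURE_REGISTRY[name] = (description, fn)
        return fn
    return wrap


def drop_null_or_zero(rows, column):
    """Remove rows whose value in `column` is missing (None) or 0."""
    return [r for r in rows if r.get(column) not in (None, 0)]


@register("age_x_income", "Cross of age and income (numeric product)")
def age_x_income(row):
    return row["age"] * row["income"]


def gender_balanced_sample(rows, key="gender", seed=0):
    """Downsample every group to the size of the smallest group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    n = min(len(g) for g in groups.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return [r for g in groups.values() for r in rng.sample(g, n)]
```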
Popular ML Pipeline Tools
KubeFlow
Kubernetes + TensorFlow
Uses ModelDB open source project for experiment tracking and hyper-parameter tuning
Airflow
Initial focus is data engineering, emerging for feature engineering and ML / AI pipelines
MLflow
From-scratch implementation focused on experiment tracking & hyper-parameter tuning
TensorFlow Extended (TFX)
Set of TensorFlow-specific libraries to be combined with KubeFlow, Airflow, or MLflow
Pachyderm
Initial focus is data versioning, emerging for ML / AI pipelines
KubeFlow
Created by Google for “Kubernetes + TensorFlow”
Many Existing OSS Projects Cobbled Together into 1 Tool
Attempts to Create a Standard Language for ML Pipeline Definition
Currently 0.6 Release
Clunky For Now, but Getting Better
Demo: KubeFlow Pipelines
Airflow
Created by Airbnb
Originally Developed for Data Engineering
Re-Purposed for Feature Engineering and ML Pipelines
Demo: Airflow Pipelines
MLflow
Demo: MLflow Experiment Tracking
TensorFlow Extended (TFX)
Components: Feature Load, Feature Analyze, Feature Transform, Model Train, Model Evaluate, Model Deploy, Reproduce Training
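The Feature Analyze / Feature Transform split exists so that statistics computed once over the full training set are frozen into a transform. The same transform is then applied at training and at serving time, which avoids training/serving skew. tf.Transform follows this pattern; for example, `tft.scale_to_z_score` computes mean and variance in a full pass.

The stdlib sketch below illustrates the idea only and is not the tf.Transform API.

```python
import statistics


def analyze(values):
    """Full-pass analysis over the training data (run once, offline)."""
    return {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}


def make_transform(stats):
    """Freeze the analyzed constants into a z-score transform.

    The returned function is reused unchanged at training and serving time,
    so both paths scale features identically.
    """
    mean, std = stats["mean"], stats["std"]

    def transform(x):
        return 0.0 if std == 0 else (x - mean) / std

    return transform
```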
Demo: TensorFlow Extended (TFX) Libraries
Components: Feature Load, Feature Analyze, Feature Transform, Model Train, Model Evaluate, Model Deploy, Reproduce Training
Summary
ML Pipelines are a Hot Topic!
Multiple Tools and Multiple Standards Emerging
Multiple Companies Involved, Including Google, Facebook, Pinterest, Airbnb, and Databricks
We’re Still a Few Years from Mainstream Online Experiment Tracking and Optimization
Suggestion: Stay Simple and Use Only What You Need
Thank You!
Chris Fregly, Founder @ PipelineAI
Free Community Edition
https://community.pipeline.ai
Contact Me
chris@pipeline.ai
@cfregly

AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeFlow, Airflow, and MLflow
