This document discusses modern machine learning pipelines and the popular open-source tools used to build them. It defines key characteristics of ML pipelines, such as experiment tracking, hyperparameter optimization, distributed execution, and metadata and data versioning. The tools covered are Kubeflow for running TensorFlow on Kubernetes, Airflow for data and feature engineering, MLflow for experiment tracking, and the TensorFlow Extended (TFX) libraries. The document demonstrates these tools and argues that, while the field is still emerging, simplicity matters: use only the components of each tool that you actually need.