Optimizing ML Data Access with Alluxio
Preprocessing, Pretraining, & Inference at
Scale
Bin Fan
Founding Engineer, VP of Technology @ Alluxio
March 6, 2025
About Me
Bin Fan
○ Founding Engineer, VP of Technology @ Alluxio
○ Email: binfan@alluxio.com
○ LinkedIn: https://www.linkedin.com/in/bin-fan/
○ Previously worked at Google
○ PhD in CS at Carnegie Mellon University
Powered by Alluxio
[Customer logo wall spanning Telco & Media, E-commerce, Financial Services, Tech & Internet, and others, e.g. Zhihu]
Alluxio Data Platform
Accelerate data-intensive AI & Analytics workloads
Pretraining
DeepSeek: Redefining Open-Source LLMs
● Performance on par with SOTA models like GPT-4, at a fraction of the cost
● Disrupting the competitive landscape
○ Expanding accessibility to much broader audiences
○ Raising the bar for upcoming general-purpose LLMs
○ Opening more possibilities for LLMs with private-domain adaptation
● A key lesson: great LLMs can be built by small teams with extremely efficient resource utilization
Engineering/Resource Efficiency in Pre-training
[Diagram: training clusters in us-east-1 and us-west-1 each read through a Distributed Cache (Alluxio) in front of a shared Data Lake (all data); only hot data is cached, and data is retrieved on demand.]
● High and consistent I/O performance
→ I/O performance comparable to HPC storage
● Cloud agnostic
→ Easy to extend the production environment to multi-region / multi-cloud
● Transparent cache management
→ Avoids repeatedly preparing the same data, and the overhead of maintaining local storage
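The "only retrieve data on demand" behavior above amounts to a read-through cache. A minimal Python sketch of that pattern; the dict-backed `data_lake` is a hypothetical stand-in for remote storage such as S3, not an Alluxio API:

```python
# Read-through cache sketch: the first read of an object fetches it from
# the remote data lake; subsequent reads are served from the hot cache.

class ReadThroughCache:
    def __init__(self, data_lake):
        self.data_lake = data_lake   # backing store: path -> bytes
        self.cache = {}              # hot data cached locally
        self.misses = 0

    def read(self, path):
        if path not in self.cache:   # cold read: go to the data lake
            self.misses += 1
            self.cache[path] = self.data_lake[path]
        return self.cache[path]      # hot read: served from the cache

lake = {"s3://bucket/shard-0": b"training data"}
cache = ReadThroughCache(lake)
cache.read("s3://bucket/shard-0")    # miss: fetched from the lake
cache.read("s3://bucket/shard-0")    # hit: served locally
# cache.misses == 1
```

This is the property the bullets describe: repeated epochs over the same shards hit the cache, so the same data is never prepared twice.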
Inference
LLM Inference: Two Key Metrics
Throughput (System Perspective)
● Measured in tokens/sec
● Higher throughput → better resource utilization, lower system cost
Time to First Token (User Perspective)
● Measures the time from request submission to the first generated token
● < 100 ms → smooth user experience
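Both metrics can be measured from any streaming token iterator. A minimal Python sketch; `fake_generator` is a hypothetical stand-in for a real streaming LLM endpoint:

```python
import time

def measure_stream(token_iter):
    """Measure time-to-first-token (TTFT) and overall tokens/sec
    over any iterable that yields generated tokens."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_iter:
        if ttft is None:
            ttft = time.perf_counter() - start  # first token arrived
        count += 1
    total = time.perf_counter() - start
    return ttft, (count / total if total > 0 else 0.0)

def fake_generator(n=50):
    # Stand-in for a streaming endpoint; yields tokens with a tiny delay.
    for i in range(n):
        time.sleep(0.001)
        yield f"tok{i}"

ttft, tps = measure_stream(fake_generator())
# ttft: seconds until the first token; tps: tokens/sec over the stream
```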
GPU Memory Capacity: The Primary Bottleneck
● VRAM is needed for both model weights and the KV cache
● Example: inference with a typical 13B model on an A100
● GPT-3 (175B) requires 350 GB of GPU RAM just to load model weights (175B params × 2 bytes in fp16)
● Longer context windows require a larger KV cache
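To make the KV-cache pressure concrete, here is a back-of-the-envelope sizing sketch. The 13B-class model shape used below (40 layers, hidden size 5120) is an assumption modeled on Llama-13B, not a figure from the slides:

```python
def kv_cache_bytes(n_layers, hidden_size, seq_len, bytes_per_elem=2):
    """Approximate KV-cache size for one sequence: K and V each store
    hidden_size elements per layer per token (bytes_per_elem=2 for fp16)."""
    per_token = 2 * n_layers * hidden_size * bytes_per_elem  # K + V
    return per_token * seq_len

# Assumed 13B-class shape (40 layers, hidden 5120), fp16, 4K context:
size = kv_cache_bytes(n_layers=40, hidden_size=5120, seq_len=4096)
print(size / 2**30)  # → 3.125 (GiB for a single sequence)
```

At ~3 GiB per 4K-context sequence, a modest batch of concurrent requests already consumes a large fraction of an A100's VRAM on top of the weights, which is why the KV cache, not compute, is often the bottleneck.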
KV Cache Offloading
● A critical optimization for speeding up Transformer inference
○ Significantly speeds up text generation by reusing previous context instead of recomputing attention for all tokens at each step
○ Example KV cache systems:
■ LMCache (vLLM Production Stack), Mooncake, etc.
● Experimenting with Alluxio as a tiered KV cache
○ Talk to me if you are interested in this
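The speedup from KV caching comes from turning quadratic K/V recomputation into linear work. A toy counting sketch of that argument (not a real attention implementation):

```python
def decode_steps(num_tokens, use_cache):
    """Count K/V computations needed to autoregressively generate
    num_tokens tokens, with and without a KV cache."""
    kv_computations = 0
    for t in range(1, num_tokens + 1):
        if use_cache:
            kv_computations += 1  # compute K/V for the newest token only
        else:
            kv_computations += t  # recompute K/V for all t tokens so far
    return kv_computations

# Generating 1024 tokens:
# without a cache: 1 + 2 + ... + 1024 = 524,800 K/V computations
# with a cache:    1024
```

Offloading moves this cache from scarce VRAM to cheaper tiers (CPU RAM, NVMe, remote storage) so long contexts and many concurrent sessions still get the reuse benefit.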
DeepSeek 3FS
DeepSeek 3FS: High-Performance Parallel Filesystem
● Newly open-sourced parallel filesystem by DeepSeek
○ Purpose-built for RDMA + NVMe hardware
○ Scalable metadata powered by FoundationDB
○ Achieves 40 GB/s per-node throughput (8 TB/s with 180 nodes)
● Optimized for high-throughput workloads
○ Focused on large-file read/write performance (not general-purpose use)
○ Recommends the FFRecord format for efficient small-file aggregation
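The small-file aggregation idea can be sketched as packing many small files into one blob plus an offset index, so many small random reads become one large sequential read. This is a minimal illustration of the concept, not the actual FFRecord format:

```python
def pack(files):
    """Pack {name: bytes} into one blob plus an offset index."""
    blob, index, offset = b"", {}, 0
    for name, data in files.items():
        index[name] = (offset, len(data))  # where this file lives in the blob
        blob += data
        offset += len(data)
    return blob, index

def read_one(blob, index, name):
    """Random-access a single small file out of the packed blob."""
    offset, length = index[name]
    return blob[offset:offset + length]

files = {"a.json": b'{"x":1}', "b.json": b'{"y":2}'}
blob, index = pack(files)
# read_one(blob, index, "b.json") == b'{"y":2}'
```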
Complementary Technologies
● 3FS: modern parallel filesystem (similar to GPFS, Lustre)
○ Optimized for I/O-intensive workloads with RDMA + NVMe
● Alluxio: distributed caching & access layer
○ Bridges compute and data lakes, accelerating I/O-heavy workloads
○ Achieves RDMA-comparable read speeds with intelligent caching
○ Provides namespace abstraction & indirection for S3, HDFS, GCP, and more → cloud-agnostic I/O
● Alluxio can integrate with 3FS, just as it does with S3 or HDFS
○ Enables high/mid/low tiered I/O solutions, letting applications balance performance and cost
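A hedged sketch of the tiered-I/O idea: probe the fastest tier first and fall back toward the data lake. The `Tier` class and names below are hypothetical stand-ins for illustration, not Alluxio or 3FS APIs:

```python
class Tier:
    """Hypothetical storage tier backed by a dict (a stand-in for
    3FS / Alluxio cache / S3 in a real deployment)."""
    def __init__(self, name, store):
        self.name, self.store = name, store
    def get(self, path):
        return self.store.get(path)  # None on miss

def tiered_read(path, tiers):
    # Probe tiers fastest-first; return the data and the tier that served it.
    for tier in tiers:
        data = tier.get(path)
        if data is not None:
            return data, tier.name
    raise FileNotFoundError(path)

tiers = [
    Tier("3fs", {}),                       # high tier: empty → miss
    Tier("alluxio", {"model.bin": b"w"}),  # mid tier: hit
    Tier("s3", {"model.bin": b"w"}),       # low tier: data lake of record
]
# tiered_read("model.bin", tiers) -> (b"w", "alluxio")
```

The design choice this illustrates: applications see one read path, while placement across tiers (and thus the performance/cost trade-off) is decided underneath.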
