10 Lessons Learned from building ML systems 
Xavier Amatriain - Director Algorithms Engineering 
November 2014
Machine Learning 
@ Netflix
Netflix Scale 
▪ > 50M members 
▪ > 40 countries 
▪ > 1000 device types 
▪ > 7B hours in Q2 2014 
▪ Plays: > 70M/day 
▪ Searches: > 4M/day 
▪ Ratings: > 6M/day 
▪ Log 100B events/day 
▪ 31.62% of peak US downstream traffic
Smart Models 
■ Regression models (Logistic, Linear, Elastic nets) 
■ GBDT/RF 
■ SVD & other MF models 
■ Factorization Machines 
■ Restricted Boltzmann Machines 
■ Markov Chains & other graphical models 
■ Clustering (from k-means to HDP) 
■ Deep ANN 
■ LDA 
■ Association Rules 
■ …
10 Lessons
1. More data & Better Models
More data or better models? 
Really? 
Anand Rajaraman: Former Stanford Prof. & 
Senior VP at Walmart
More data or better models? 
Sometimes, it’s not 
about more data
More data or better models? 
[Banko and Brill, 2001] 
Norvig: “Google does not have better Algorithms, only more Data” 
Many features / low-bias models
More data or better models? 
Sometimes, it’s not 
about more data
2. You might not need all your Big Data
How useful is Big Data? 
■ “Everybody” has Big Data 
■ But not everybody needs it 
■ E.g. Do you need many millions of users if the goal is to compute a MF of, say, 100 factors? 
■ Many times, some kind of smart (e.g. stratified) sampling can produce results as good as, or even better than, using all the data
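The sampling point above can be sketched in a few lines. The activity buckets and the 10% fraction are hypothetical, chosen only to show the shape of stratified sampling:

```python
import random
from collections import defaultdict

def stratified_sample(users, stratum_of, fraction, seed=0):
    """Sample the same fraction from every stratum, so rare groups
    (e.g. low-activity users) stay represented in the subset."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for user in users:
        by_stratum[stratum_of(user)].append(user)
    sample = []
    for members in by_stratum.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 10,000 users bucketed into 4 activity strata
users = [{"id": i, "plays": i % 100} for i in range(10_000)]
stratum = lambda u: u["plays"] // 25
subset = stratified_sample(users, stratum, fraction=0.1)
```

A uniform 10% sample would usually do fine here too; stratifying matters when some strata are small enough that uniform sampling might nearly empty them.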
3. The fact that a more complex model does not improve things does not mean you don’t need one
Better Models and features that “don’t work” 
■ Imagine the following scenario: 
■ You have a linear model, and for some time you have been selecting and optimizing features for that model 
■ If you try a more complex (e.g. non-linear) model with the same features, you are not likely to see any improvement 
■ If you try to add more expressive features, the existing model is likely not to capture them, and you are not likely to see any improvement
Better Models and features that “don’t work” 
■ More complex features may require 
a more complex model 
■ A more complex model may not 
show improvements with a feature 
set that is too simple
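This interplay is easy to demonstrate on a toy example: with XOR-style data, no linear boundary exists on the raw features, but a single interaction feature makes the same linear model family work. The brute-force separability check below is purely for illustration:

```python
from itertools import product

def linearly_separable(points):
    """Brute-force check: does any hyperplane w.x + b split the labels?
    A tiny grid over weights is enough for this 4-point illustration."""
    grid = [g / 2 for g in range(-4, 5)]
    dim = len(points[0][0])
    for w in product(grid, repeat=dim):
        for b in grid:
            if all((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1)
                   for x, y in points):
                return True
    return False

# XOR-style data: no linear boundary exists on the raw features...
raw = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
# ...but adding the interaction feature x1*x2 makes the SAME model family work
crossed = [((x1, x2, x1 * x2), y) for (x1, x2), y in raw]

assert not linearly_separable(raw)   # the more expressive feature was needed
assert linearly_separable(crossed)   # the same (linear) model now improves
```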
4. Be thoughtful about your training data
Defining training/testing data 
■ Imagine you are training a simple binary 
classifier 
■ Defining positive and negative labels -> Non-trivial 
task 
■ E.g. Is this a positive or a negative? 
■ User watches a movie to completion and rates it 1 
star 
■ User watches the same movie again (maybe 
because she can’t find anything else) 
■ User abandons movie after 5 minutes, or 15 
minutes… or 1 hour 
■ User abandons a TV show after 2 episodes, or 10 
episodes… or 1 season 
■ User adds something to her list but never watches it
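A label function for such a classifier might look like the sketch below. Every threshold (the 3-star cutoff, the 15-minute abandon rule) is a hypothetical editorial choice, which is exactly the point: the data does not hand you the labels.

```python
# Hypothetical label rules for a "did the member enjoy this title?" classifier.
# All thresholds are illustrative, not anything Netflix has published.
def label(event):
    """Return 1 (positive), 0 (negative), or None (ambiguous: exclude)."""
    if event["completed"] and event["rating"] is not None:
        return 1 if event["rating"] >= 3 else 0  # watched it all, but hated it?
    if event["watched_minutes"] is not None:
        # Abandoning after 5 minutes is a clearer negative than after an hour
        return 0 if event["watched_minutes"] < 15 else None
    if event["added_to_list"] and not event["completed"]:
        return None  # intent without a watch: too ambiguous to train on
    return None

events = [
    {"completed": True, "rating": 1, "watched_minutes": 95, "added_to_list": False},
    {"completed": False, "rating": None, "watched_minutes": 5, "added_to_list": False},
    {"completed": False, "rating": None, "watched_minutes": None, "added_to_list": True},
]
labels = [label(e) for e in events]  # [0, 0, None]
```

Returning None (and dropping those rows) is often better than forcing an ambiguous interaction into one of the two classes.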
Other training data issues: Time traveling 
■ Time traveling: usage of features that originated after 
the event you are trying to predict 
■ E.g. Your rating of a movie is a pretty good predictor of you 
watching that movie, especially because most ratings happen 
AFTER you watch the movie 
■ It can get tricky when you have many features that relate to 
each other 
■ Whenever we see an offline experiment with huge wins, the 
first question we ask ourselves is: “Is there time traveling?”
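A cheap guard against time traveling is to timestamp every feature value and filter at training time; the feature names and timestamps below are invented for illustration:

```python
from datetime import datetime, timedelta

def features_as_of(feature_events, cutoff):
    """Keep only feature values that existed strictly before the event we
    are trying to predict; anything time-stamped later is time traveling."""
    return {name: value for name, value, ts in feature_events if ts < cutoff}

play_time = datetime(2014, 6, 1, 20, 0)  # the event we want to predict
raw = [
    ("days_since_signup", 400, play_time - timedelta(days=1)),
    ("rating_for_title", 5, play_time + timedelta(hours=2)),  # rated AFTER watching!
]
clean = features_as_of(raw, cutoff=play_time)
assert clean == {"days_since_signup": 400}  # the leaky feature is dropped
```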
5. Learn to deal with (The curse of) Presentation Bias
2D Navigational Modeling (figure: a heatmap of the page grid, with positions labeled from “more likely to see” to “less likely”)
The curse of presentation bias 
■ User can only click on what you decide to show 
■ But, what you decide to show is the result of what your model 
predicted is good 
■ Simply treating things you show as negatives is not 
likely to work 
■ Better options: 
■ Correcting for the probability a user will click on a position -> 
Attention models 
■ Explore/exploit approaches such as multi-armed bandits (MAB)
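The position-correction idea can be sketched with an inverse-propensity estimate. The examination probabilities below are invented for illustration; in practice they must themselves be estimated, e.g. from randomized data:

```python
# Examination probabilities per row position: made-up numbers standing in
# for an attention model estimated from (randomized) data.
examine_prob = {0: 0.9, 1: 0.6, 2: 0.3, 3: 0.1}

def debiased_ctr(impressions):
    """Inverse-propensity estimate of an item's click rate: each click is
    up-weighted by 1/P(position examined), so items buried in positions the
    user probably never saw are not silently treated as negatives."""
    return sum(c / examine_prob[pos] for pos, c in impressions) / len(impressions)

top = [(0, 1), (0, 0), (0, 1), (0, 0)]  # shown on row 0: raw CTR 0.50
low = [(3, 0), (3, 0), (3, 0), (3, 1)]  # shown on row 3: raw CTR 0.25

# After correction, the buried item no longer looks worse than the promoted one
assert debiased_ctr(low) > debiased_ctr(top)
```

An explore/exploit scheme (e.g. an epsilon-greedy or Thompson-sampling bandit) attacks the same problem from the other side, by occasionally showing items the model would not have chosen.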
6. The UI is the algorithm’s only communication channel with that which matters most: the users
UI->Algorithm->UI 
■ The UI generates the user 
feedback that we will input into the 
algorithms 
■ The UI is also where the results of 
our algorithms will be shown 
■ A change in the UI might require a change in the algorithms, and vice versa
7. Data and Models are great. You know what’s even better? The right evaluation approach
Offline/Online testing process
Executing A/B tests 
Measure differences in metrics across statistically 
identical populations that each experience a different 
algorithm. 
■ Product decisions are always data-driven 
■ Overall Evaluation Criteria (OEC) = member retention 
■ Use long-term metrics whenever possible 
■ Short-term metrics can be informative and allow faster 
decisions 
■ But, not always aligned with OEC
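Measuring the difference between two such cells reduces, in the simplest binary-metric case, to a two-proportion z-test. The cell sizes and retention numbers below are made up:

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for the difference between two proportions (e.g. the
    retention rate of control vs. treatment cells in an A/B test)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical cells of 50k members: control retains 80.0%, treatment 81.2%
z = two_proportion_z(40_000, 50_000, 40_600, 50_000)
significant = abs(z) > 1.96  # two-sided 5% level
```

Real OEC measurement is harder than this (long-horizon metrics, multiple cells, repeated looks), but the statistical skeleton is the same.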
Offline testing 
■ Measure model performance using offline (IR) metrics 
■ Offline performance is used as an indication to make informed decisions on follow-up A/B tests 
■ A critical (and mostly unsolved) issue is how well offline metrics correlate with A/B test results 
(Diagram labels: Problem, Data, Metrics, Algorithm, Model)
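As one concrete example of an offline (IR) metric, here is NDCG@k with binary relevance, scoring a hypothetical ranked list against what a member actually watched:

```python
import math

def ndcg_at_k(ranked, relevant, k):
    """NDCG@k with binary relevance: a standard offline (IR) ranking metric."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

# Hypothetical ranked row vs. the titles the member actually watched
ranked = ["t1", "t2", "t3", "t4", "t5"]
watched = {"t1", "t4"}
score = ndcg_at_k(ranked, watched, k=5)  # < 1.0: a relevant title sits at rank 4
```

The open question from the slide remains: a lift in this number does not guarantee a lift in the A/B test.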
8. Distributing algorithms is important, but knowing at what level to do it is even more important
The three levels of Distribution/Parallelization 
1. For each subset of the population (e.g. 
region) 
2. For each combination of the 
hyperparameters 
3. For each subset of the training data 
Each level has different requirements
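Level 2 is the easiest of the three to exploit, because hyperparameter combinations share nothing and fan out trivially; levels 1 and 3 instead require sharding the data. A sketch, with a fabricated scoring function standing in for a real training run:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def train_and_score(params):
    """Stand-in for one full training run; returns (params, offline metric).
    The quadratic 'metric' is fabricated so the sketch is self-contained."""
    lam, lr = params
    return params, -((lam - 0.1) ** 2 + (lr - 0.01) ** 2)

grid = list(product([0.0, 0.1, 1.0], [0.001, 0.01, 0.1]))

# Level 2 parallelism: each combination trains independently
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(train_and_score, grid))

best_params, _ = max(results, key=lambda r: r[1])
```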
ANN Training over GPUs and AWS
9. It pays off to be smart about choosing your hyperparameters
Hyperparameter optimization 
■ Automate hyperparameter 
optimization by choosing the right 
metric. 
■ But, is it enough to choose the value 
that maximizes the metric? 
■ E.g. Is a regularization lambda of 0 
better than a lambda = 1000 that 
decreases your metric by only 1%? 
■ Also, think about using Bayesian 
Optimization (Gaussian Processes) 
instead of grid search
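The lambda question above suggests not taking a bare argmax. One simple alternative, in the spirit of the one-standard-error rule, is to accept the strongest regularization whose metric stays within a tolerance of the best; the numbers here are illustrative:

```python
# Illustrative (lambda, offline metric) pairs: the argmax is lambda=1,
# but much stronger regularization costs almost nothing.
candidates = [(0, 0.840), (1, 0.841), (10, 0.839), (100, 0.838), (1000, 0.833)]

def pick_lambda(candidates, tolerance=0.01):
    """Among lambdas whose metric is within `tolerance` of the best,
    prefer the most regularized model (a one-standard-error-style rule)."""
    best = max(metric for _, metric in candidates)
    good_enough = [lam for lam, metric in candidates if metric >= best - tolerance]
    return max(good_enough)

chosen = pick_lambda(candidates)  # 1000, not the argmax lambda=1
```

The same "metric plus judgment" attitude applies whether the candidates come from grid search or from Bayesian optimization.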
10. There are things you can do offline and there are things you can’t… and there is nearline for everything in between
System Overview 
▪ Blueprint for multiple 
personalization algorithm 
services 
▪ Ranking 
▪ Row selection 
▪ Ratings 
▪ … 
▪ Recommendation involving 
multi-layered Machine 
Learning
Matrix Factorization Example
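The original slide shows a diagram; as a stand-in, here is a minimal SGD matrix-factorization sketch on toy ratings. Dimensions, learning rate, and regularization are illustrative, not the production setup:

```python
import random

random.seed(0)
n_users, n_items, k, lam, lr = 4, 5, 2, 0.05, 0.05
# Toy (user, item, rating) triples
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (2, 3, 1), (3, 2, 4), (3, 4, 5)]

# Latent user and item factors, small random init
P = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]

def predict(u, i):
    return sum(P[u][f] * Q[i][f] for f in range(k))

# SGD on the observed entries only, with L2 regularization
for _ in range(200):
    for u, i, r in ratings:
        err = r - predict(u, i)
        for f in range(k):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qi - lam * pu)
            Q[i][f] += lr * (err * pu - lam * qi)

rmse = (sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5
```

Training the factors is a batch (offline) job; computing a member's scores from already-trained factors is cheap enough for the nearline or online layers, which is the split the lesson is about.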
Conclusions 
1. Choose the right metric 
2. Be thoughtful about your data 
3. Understand dependencies 
between data and models 
4. Optimize only what matters
Xavier Amatriain (@xamat) 
xavier@netflix.com 
Thanks! 
(and yes, we are hiring)
