Lessons From Integrating Machine Learning Models Into Data Products
Sharath Rao
Engineering Manager/ML Engineer
Search and Discovery
About me
• Currently leading the full-stack search and discovery team at Instacart
• Built data products in search, recommendations and ad targeting over
the past 10 years
@sharathrao
Talk Outline
• Integrating machine learning models into customer workflows
• Economies of scope with data products
• Our use of a shared feature store for reusing features across models
The Instacart Value Proposition
Groceries from stores you love, delivered to your doorstep, in as little as an hour
Customer Experience
Select a Store → Shop for Groceries → Checkout → Select Delivery Time → Delivered to Doorstep
Shopper Experience
Accept Order → Find the Groceries (Scan Barcode) → Out for Delivery → Delivered to Doorstep
Four Sided Marketplace
Sides: Customers, Shoppers, Products (Advertisers), Stores (Retailers)
Connected through: Search, Advertising, Shopping, Delivery, Customer Service, Inventory, Picking, Loyalty
Integrating models into customer workflows
At first, there was probably just one rule or model making a decision for the customer
Probably training the first ever model
Production DB → Extract Training Examples → Feature Extraction → Model.fit(X) → Model File/Weights
and the simplest scoring pipeline
Production DB → Extract Test Examples → Feature Extraction → Model.predict(X) → Model Predictions
• Scoring pipeline identical to training pipeline
• Scoring decoupled from serving
• Low-risk strategy, if possible at all
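A minimal sketch of this train/score symmetry, assuming a scikit-learn model and a toy schema (column names such as past_conversion_rate are illustrative, not the actual Instacart features):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def extract_features(rows):
    """Feature extraction shared verbatim by the training and scoring pipelines."""
    return rows[["price", "past_conversion_rate"]]

# Training pipeline: Production DB -> extract train examples -> features -> Model.fit(X)
train_rows = pd.DataFrame({
    "price": [2.5, 4.0, 1.0, 3.5],
    "past_conversion_rate": [0.10, 0.40, 0.05, 0.30],
    "converted": [0, 1, 0, 1],
})
model = LogisticRegression().fit(extract_features(train_rows), train_rows["converted"])

# Scoring pipeline: identical up to the last step, which calls Model.predict(X)
test_rows = pd.DataFrame({"price": [3.0], "past_conversion_rate": [0.20]})
predictions = model.predict(extract_features(test_rows))
```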
Works OK (for a while) as long as …
• all you need is one model for one customer workflow, in perpetuity
• scale/big data isn’t a thing
• microservices/SOA isn’t a thing
• SLAs aren’t a thing
Most products eventually become more complex than just a model
Storefront Optimization
Context: Customer on the storefront
Action: Select and rank rows, and products within each row
Search Result Ranking
Context: Searching for “ice cream”
Action: Match products for the query and rank to maximize clicks/conversions
Next Item Recommendation
Context: Customer adds a product to the cart
Action: Recommend products that the user might buy next, e.g., after purchasing a pack of chips
Replacement Item Recommendation
Context: Customer considering a replacement for a product
Action: Recommend products that the user might accept as a substitute for the original choice
Entities and relationships to (statistically) model
Entities: items, products, brands, aisles, departments (catalog/taxonomy), customers, retailers, queries
Most of our data products are about modeling relationships between these entities.
Most Products Eventually Become More Complex
• Customer is seen in several contexts
• Product/market fit and scale increase the upside of predictive modeling
• Many models must share common inputs, e.g., user profiles, product taxonomy
• Model outputs may be input features into other models
These realities change how ML models are
operationalized and integrated into the product
Scoring Latency: How fast do they want it?
• Studies show slower response times are causally linked to lower engagement
• Customer expectations of latency are product dependent
‣ Web search > Travel search > Loan applications
• In general, there is more tolerance for high-leverage, one-off interactions
Context Sensitivity: How much does recent information matter?
• Does the model materially benefit from real-time/contextual data?
• What can be cached/pre-computed?
• Clever product changes can help you ‘buy time’ or set customer expectations
‣ e.g., get as much information upfront
Not everything is low latency and context sensitive
• Real-time scoring
‣ ~300 ms SLA
‣ short-term cache for head
• Score in background
‣ a few seconds
‣ best-effort response
• Batch jobs for model scoring
‣ near-identical pipeline for train and score
‣ cache and serve
• Simple models/rules
‣ long tail of inputs
‣ independent inputs
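As one concrete reading of “short-term cache for head”, a hedged sketch using cachetools; the TTL, cache size, and rank_fn hook are assumptions for illustration:

```python
from cachetools import TTLCache

# Cache recent scores for frequent (head) queries; the long tail misses and
# falls through to real-time scoring.
score_cache = TTLCache(maxsize=50_000, ttl=300)  # 5-minute freshness window

def score_query(query, rank_fn):
    """Return cached results for head queries, else score in real time."""
    if query in score_cache:
        return score_cache[query]
    results = rank_fn(query)      # real-time model scoring, ~300 ms budget
    score_cache[query] = results
    return results

# Usage with a stand-in ranker:
score_query("ice cream", lambda q: ["vanilla pint", "chocolate pint"])
```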
Not everything is low latency and context sensitive
• Re-rank search results
• Query expansion
• Product categorization
• Top-N recommendations
• Complementary products
• Recommendations at checkout
• Re-rank storefront based on cart
• New-arrival recommendations
• Storefront optimization
• BM25 matching
• Autocomplete
Questions worth asking for many problems
• Is Q3 → Q2 → Q1 a reasonable scaling strategy?
‣ typical for recommendations/personalization
• How about Q4 → Q1?
‣ typical for search
[2×2 quadrant diagram: Q1, Q2, Q3, Q4]
Economies of Scope - Sharing Features Across Models
As a team builds more data products …
• More models
• More features
• Many models with overlapping feature sets
‣ different teams compute very similar features for different tasks
• ‘On the fly’ features extracted repeatedly through bespoke code
Economies of Scope (n): Average total cost of production decreases
as a result of increasing the number of different goods produced
With related modeling problems, features can be shared

Is X a good replacement for Y for user U?
• Text match between X and Y
• Replacement acceptance rate of X|Y
• Price difference
• Affinity of X for user U

Would user U find product X relevant for query Q?
• Text match between X and Q
• Historical conversion rate of X for Q
• Affinity of X for user U
Types of Features

Historical aggregates
• Conversion rate (product, query)
• Conversion rate (product, brand)
• Replacement rate (product1, product2)
• Conversion rate (user, aisle)
• Popularity (product)
• …

Computed on the fly
• Seasonality, e.g., day of week
• State, e.g., is product on sale now?
• Scores from another real-time system, e.g., tf-idf/BM25

Learned representations
• word2vec (query)
• word2vec (product description)
• word2vec (user history)
• LDA (product description)
• ConvNet features (product image)
• …
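As a concrete sketch, a historical aggregate from the list above, such as conversion rate (product, query), could be produced by a periodic job roughly like this; the event-log schema here is an assumption:

```python
import pandas as pd

# Toy event log of (query, product) impressions with a conversion flag.
events = pd.DataFrame({
    "query":      ["ice cream", "ice cream", "chips"],
    "product_id": [42, 42, 7],
    "converted":  [1, 0, 1],
})

# Conversion rate keyed by (product, query): conversions / impressions per key.
conv_rate = (
    events.groupby(["product_id", "query"])["converted"]
          .mean()
          .rename("conversion_rate")
          .reset_index()
)
```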
Benefits of a Shared Feature Store (SFS)
• Share features across models
• For features that are expensive to compute, this also speeds up training
• Maintain feature integrity (‘sacred contract’) across training/development/scoring
Sacred Contract
Features extracted from an observation must be invariant to whether they were extracted in the training pipeline or the scoring pipeline.
Towards honoring the sacred contract
• Log features during scoring and reuse during training
‣ works well except for new features
‣ if storage is a concern, write all features for a sample of users
• Use the exact same code in both the training and scoring phases
‣ don’t rewrite or translate across languages
‣ write cache-able data once and read during both training and scoring
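A minimal sketch of the “exact same code” rule: a single extraction function imported by both pipelines, so train-time and score-time features cannot drift. The function body and order schema are illustrative:

```python
def extract_features(order):
    """Single source of truth: both pipelines import and call this function."""
    return {
        "basket_size": len(order["items"]),
        "is_weekend": order["day_of_week"] in (5, 6),
    }

# Training pipeline
historical_orders = [{"items": ["milk", "eggs"], "day_of_week": 2}]
train_X = [extract_features(o) for o in historical_orders]

# Scoring pipeline calls the very same function, never a reimplementation
incoming_order = {"items": ["chips"], "day_of_week": 6}
live_x = extract_features(incoming_order)
```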
Shared Features Store Requirements
• Need a low-latency, highly available key-value store (Cassandra/Dynamo, etc.)
• Make it easy for anyone to read/write features
• Maintain libraries that abstract out details of reading/writing features
Features Interface
• key: must be an n-tuple, e.g., (product_id) or (user, product_id)
• value: JSON payload (currently a vector of doubles)
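In code, the interface shape might look like the snippet below; the specific key and payload are illustrative:

```python
import json

key = ("user_1729", "product_42")          # n-tuple key
value = json.dumps([0.12, 3.4, 0.0, 7.1])  # JSON payload: vector of doubles
```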
Writing Features to KV Store
• Periodic batch jobs write features
• Version features so that updates can be handled
• Also write feature metadata and default values (if applicable)
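A hedged sketch of such a batch write, with a plain dict standing in for the KV store; the composite key layout and versioning scheme are assumptions, not the actual schema:

```python
import json
import time

kv = {}  # stand-in for the Cassandra/Dynamo-backed store

def write_feature(key, name, values, version, default=None):
    """Write a versioned feature payload plus its metadata and default value."""
    kv[(name, version) + key] = json.dumps(values)
    kv[("__meta__", name, version)] = json.dumps({
        "default": default,
        "written_at": time.time(),
    })

# A nightly batch job might upsert aggregates like this:
write_feature(key=("product_42",), name="popularity", values=[0.37], version=3, default=0.0)
```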
Reading Features
• Each model’s feature extractor specifies the list of all shared features it needs
• Shared library transparently* retrieves features for model training/scoring
Retrieve features from the KV store
Data in Feature Store + Input Dataframe → Retrieved and Appended Features → Final Model Score
What Reading Features Really Entails
• Each model’s feature extractor specifies the list of all shared features it needs
• Shared library transparently retrieves features for model training/scoring
‣ talk to the right feature store (dev/staging/production)
‣ retrieve features for all keys
‣ munge to incorporate default values
‣ return data frame of features easily join-able with rest of feature extraction
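Putting those steps together, a minimal sketch of the shared read path, with in-memory dicts standing in for the per-environment stores; the function names and defaults handling are assumptions, not the actual library:

```python
import os
import pandas as pd

STORES = {"development": {}, "staging": {}, "production": {}}  # stand-in KV stores

def feature_store_for_env(env):
    """Talk to the right feature store (dev/staging/production)."""
    return STORES[env]

def fetch_shared_features(df, feature_names, key_columns, defaults, env=None):
    """Retrieve features for all keys, fill defaults, and return a dataframe
    that joins cleanly with the rest of feature extraction."""
    env = env or os.environ.get("FEATURE_STORE_ENV", "development")
    store = feature_store_for_env(env)
    rows = []
    for key in df[key_columns].itertuples(index=False, name=None):
        payload = store.get(key, {})  # KV lookup by n-tuple key
        rows.append([payload.get(f, defaults[f]) for f in feature_names])
    feats = pd.DataFrame(rows, columns=feature_names, index=df.index)
    return df.join(feats)

# Usage: append two shared features keyed by (user_id, product_id)
STORES["development"][(1, 42)] = {"user_product_affinity": 0.8}
batch = pd.DataFrame({"user_id": [1, 1], "product_id": [42, 7]})
scored_inputs = fetch_shared_features(
    batch,
    feature_names=["user_product_affinity", "product_popularity"],
    key_columns=["user_id", "product_id"],
    defaults={"user_product_affinity": 0.0, "product_popularity": 0.0},
)
```

Keying the lookup on the dataframe’s own columns keeps the retrieved features trivially join-able with the rest of feature extraction.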
Currently these shared features are used across 3 different models, and growing …
Summary
• Think deeply about how model decisions integrate into the product
• If possible, consider product changes that maximize the impact of model predictions
• Look for economies of scope while going from 1 to N models
Data Products in Search and Discovery
Search
• Query autocorrection
• Query spell correction
• Query expansion
• Deep matching/document expansion
• Search ranking
• Search advertising
Discovery
• Substitute/replacement products
• Next item recommendations
• Next basket recommendations
• Guided discovery
• Interpretable recommendations
Thank you!
We are hiring!
Senior Machine Learning Engineer http://bit.ly/2kzHpcg
