Lessons From Integrating Machine Learning Models Into Data Products
Sharath Rao
Engineering Manager/ML Engineer
Search and Discovery
About me
• Currently leading the full-stack search and discovery team at Instacart
• Built data products in search, recommendations and ad targeting over
the past 10 years
@sharathrao
Talk Outline
• Integrating machine learning models into customer workflows
• Economies of scope with data products
• Our use of a shared feature store for reusing features across models
The Instacart Value Proposition
Groceries from stores you love, delivered to your doorstep, in as little as an hour
Customer Experience
Select a Store → Shop for Groceries → Checkout → Select Delivery Time → Delivered to Doorstep
Shopper Experience
Accept Order → Find the Groceries (Scan Barcode) → Out for Delivery → Delivered to Doorstep
Four Sided Marketplace
Sides: Customers, Shoppers, Products (Advertisers), Stores (Retailers)
Connected through: Search, Advertising, Shopping, Delivery, Customer Service, Inventory, Picking, Loyalty
Integrating models into customer workflows
At first, there was probably just one rule or model making a decision for the customer
Probably training the first ever model
Production DB → Extract Training Examples → Feature Extraction → Model.fit(X) → Model File/Weights
and the simplest scoring pipeline
Production DB → Extract Test Examples → Feature Extraction → Model.predict(X) → Model Predictions
• Scoring pipeline identical to training pipeline
• Scoring decoupled from serving
• Low-risk strategy, if possible at all
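A minimal sketch of this train/score symmetry, assuming a scikit-learn model and a toy schema (column names such as past_conversion_rate are illustrative, not the actual Instacart features):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def extract_features(rows):
    """Feature extraction shared verbatim by the training and scoring pipelines."""
    return rows[["price", "past_conversion_rate"]]

# Training pipeline: Production DB -> extract train examples -> features -> Model.fit(X)
train_rows = pd.DataFrame({
    "price": [2.5, 4.0, 1.0, 3.5],
    "past_conversion_rate": [0.10, 0.40, 0.05, 0.30],
    "converted": [0, 1, 0, 1],
})
model = LogisticRegression().fit(extract_features(train_rows), train_rows["converted"])

# Scoring pipeline: identical up to the last step, which calls Model.predict(X)
test_rows = pd.DataFrame({"price": [3.0], "past_conversion_rate": [0.20]})
predictions = model.predict(extract_features(test_rows))
```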
Works OK (for a while) as long as …
• all you need is one model for one customer workflow, in perpetuity
• scale/big data isn’t a thing
• microservices/SOA isn’t a thing
• SLAs aren’t a thing
Most products eventually become more complex than just a model
Storefront Optimization
Context: Customer on the storefront
Action: Select and rank rows, and products within each row
Search Result Ranking
Context: Searching for “ice cream”
Action: Match products for the query and rank to maximize clicks/conversions
Next Item Recommendation
Context: Customer adds a product to the cart
Action: Recommend products that the user might buy next, e.g., after purchasing a pack of chips
Replacement Item Recommendation
Context: Customer considering a replacement for a product
Action: Recommend products that the user might accept as a substitute for the original choice
Entities and relationships to (statistically) model
Entities: items, products, brands, aisles, departments (catalog/taxonomy), customers, retailers, queries
Most of our data products are about modeling relationships between these entities.
Most Products Eventually Become More Complex
• Customer is seen in several contexts
• Product/market fit and scale increase the upside of predictive modeling
• Many models must share common inputs, e.g., user profiles, product taxonomy
• Model outputs may be input features into other models
These realities change how ML models are
operationalized and integrated into the product
Scoring Latency: How fast do they want it?
• Studies show slower response times are causally linked to lower engagement
• Customer expectations of latency are product dependent
‣ Web search > Travel search > Loan applications
• In general, there is more tolerance for high-leverage, one-off interactions
Context Sensitivity: How much does recent information matter?
• Does the model materially benefit from real-time/contextual data?
• What can be cached/pre-computed?
• Clever product changes can help you ‘buy time’ or set customer expectations
‣ e.g., get as much information upfront
Not everything is low latency and context sensitive
• Real-time scoring
‣ ~300 ms SLA
‣ short-term cache for head
• Score in background
‣ a few seconds
‣ best-effort response
• Batch jobs for model scoring
‣ near-identical pipeline for train and score
‣ cache and serve
• Simple models/rules
‣ long tail of inputs
‣ independent inputs
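As one concrete reading of “short-term cache for head”, a hedged sketch using cachetools; the TTL, cache size, and rank_fn hook are assumptions for illustration:

```python
from cachetools import TTLCache

# Cache recent scores for frequent (head) queries; the long tail misses and
# falls through to real-time scoring.
score_cache = TTLCache(maxsize=50_000, ttl=300)  # 5-minute freshness window

def score_query(query, rank_fn):
    """Return cached results for head queries, else score in real time."""
    if query in score_cache:
        return score_cache[query]
    results = rank_fn(query)      # real-time model scoring, ~300 ms budget
    score_cache[query] = results
    return results

# Usage with a stand-in ranker:
score_query("ice cream", lambda q: ["vanilla pint", "chocolate pint"])
```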
Not everything is low latency and context sensitive
• Re-rank search results
• Query expansion
• Product categorization
• Top-N recommendations
• Complementary products
• Recommendations at checkout
• Re-rank storefront based on cart
• New-arrival recommendations
• Storefront optimization
• BM25 matching
• Autocomplete
Questions worth asking for many problems
• Is Q3 → Q2 → Q1 a reasonable scaling strategy?
‣ typical for recommendations/personalization
• How about Q4 → Q1?
‣ typical for search
[2×2 quadrant diagram: Q1, Q2, Q3, Q4]
Economies of Scope - Sharing Features Across Models
As a team builds more data products …
• More models
• More features
• Many models with overlapping feature sets
‣ different teams compute very similar features for different tasks
• ‘On the fly’ features extracted repeatedly through bespoke code
Economies of Scope (n): Average total cost of production decreases
as a result of increasing the number of different goods produced
With related modeling problems, features can be shared

Is X a good replacement for Y for user U?
• Text match between X and Y
• Replacement acceptance rate of X|Y
• Price difference
• Affinity of X for user U

Would user U find product X relevant for query Q?
• Text match between X and Q
• Historical conversion rate of X for Q
• Affinity of X for user U
Types of Features

Historical aggregates
• Conversion rate (product, query)
• Conversion rate (product, brand)
• Replacement rate (product1, product2)
• Conversion rate (user, aisle)
• Popularity (product)
• …

Computed on the fly
• Seasonality, e.g., day of week
• State, e.g., is product on sale now?
• Scores from another real-time system, e.g., tf-idf/BM25

Learned representations
• word2vec (query)
• word2vec (product description)
• word2vec (user history)
• LDA (product description)
• ConvNet features (product image)
• …
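As a concrete sketch, a historical aggregate from the list above, such as conversion rate (product, query), could be produced by a periodic job roughly like this; the event-log schema here is an assumption:

```python
import pandas as pd

# Toy event log of (query, product) impressions with a conversion flag.
events = pd.DataFrame({
    "query":      ["ice cream", "ice cream", "chips"],
    "product_id": [42, 42, 7],
    "converted":  [1, 0, 1],
})

# Conversion rate keyed by (product, query): conversions / impressions per key.
conv_rate = (
    events.groupby(["product_id", "query"])["converted"]
          .mean()
          .rename("conversion_rate")
          .reset_index()
)
```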
Benefits of a Shared Feature Store (SFS)
• Share features across models
• For features that are expensive to compute, this also speeds up training
• Maintain feature integrity (‘sacred contract’) across training/development/scoring
Sacred Contract
Features extracted from an observation must be invariant to whether they were extracted in the training pipeline or the scoring pipeline.
Towards honoring the sacred contract
• Log features during scoring and reuse during training
‣ works well except for new features
‣ if storage is a concern, write all features for a sample of users
• Use the exact same code in both the training and scoring phases
‣ don’t rewrite or translate across languages
‣ write cache-able data once and read during both training and scoring
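A minimal sketch of the “exact same code” rule: a single extraction function imported by both pipelines, so train-time and score-time features cannot drift. The function body and order schema are illustrative:

```python
def extract_features(order):
    """Single source of truth: both pipelines import and call this function."""
    return {
        "basket_size": len(order["items"]),
        "is_weekend": order["day_of_week"] in (5, 6),
    }

# Training pipeline
historical_orders = [{"items": ["milk", "eggs"], "day_of_week": 2}]
train_X = [extract_features(o) for o in historical_orders]

# Scoring pipeline calls the very same function, never a reimplementation
incoming_order = {"items": ["chips"], "day_of_week": 6}
live_x = extract_features(incoming_order)
```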
Shared Features Store Requirements
• Need a low-latency, highly available key-value store (Cassandra/Dynamo, etc.)
• Make it easy for anyone to read/write features
• Maintain libraries that abstract out details of reading/writing features
Features Interface
• key: must be an n-tuple, e.g., (product_id) or (user, product_id)
• value: JSON payload (currently a vector of doubles)
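In code, the interface shape might look like the snippet below; the specific key and payload are illustrative:

```python
import json

key = ("user_1729", "product_42")          # n-tuple key
value = json.dumps([0.12, 3.4, 0.0, 7.1])  # JSON payload: vector of doubles
```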
Writing Features to KV Store
• Periodic batch jobs write features
• Version features so that updates can be handled
• Also write feature metadata and default values (if applicable)
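A hedged sketch of such a batch write, with a plain dict standing in for the KV store; the composite key layout and versioning scheme are assumptions, not the actual schema:

```python
import json
import time

kv = {}  # stand-in for the Cassandra/Dynamo-backed store

def write_feature(key, name, values, version, default=None):
    """Write a versioned feature payload plus its metadata and default value."""
    kv[(name, version) + key] = json.dumps(values)
    kv[("__meta__", name, version)] = json.dumps({
        "default": default,
        "written_at": time.time(),
    })

# A nightly batch job might upsert aggregates like this:
write_feature(key=("product_42",), name="popularity", values=[0.37], version=3, default=0.0)
```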
Reading Features
• Each model’s feature extractor specifies the list of all shared features it needs
• Shared library transparently* retrieves features for model training/scoring
Retrieve features from the KV store
Data in Feature Store + Input Dataframe → Retrieved and Appended Features → Final Model Score
What Reading Features Really Entails
• Each model’s feature extractor specifies the list of all shared features it needs
• Shared library transparently retrieves features for model training/scoring
‣ talk to the right feature store (dev/staging/production)
‣ retrieve features for all keys
‣ munge to incorporate default values
‣ return data frame of features easily join-able with rest of feature extraction
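Putting those steps together, a minimal sketch of the shared read path, with in-memory dicts standing in for the per-environment stores; the function names and defaults handling are assumptions, not the actual library:

```python
import os
import pandas as pd

STORES = {"development": {}, "staging": {}, "production": {}}  # stand-in KV stores

def feature_store_for_env(env):
    """Talk to the right feature store (dev/staging/production)."""
    return STORES[env]

def fetch_shared_features(df, feature_names, key_columns, defaults, env=None):
    """Retrieve features for all keys, fill defaults, and return a dataframe
    that joins cleanly with the rest of feature extraction."""
    env = env or os.environ.get("FEATURE_STORE_ENV", "development")
    store = feature_store_for_env(env)
    rows = []
    for key in df[key_columns].itertuples(index=False, name=None):
        payload = store.get(key, {})  # KV lookup by n-tuple key
        rows.append([payload.get(f, defaults[f]) for f in feature_names])
    feats = pd.DataFrame(rows, columns=feature_names, index=df.index)
    return df.join(feats)

# Usage: append two shared features keyed by (user_id, product_id)
STORES["development"][(1, 42)] = {"user_product_affinity": 0.8}
batch = pd.DataFrame({"user_id": [1, 1], "product_id": [42, 7]})
scored_inputs = fetch_shared_features(
    batch,
    feature_names=["user_product_affinity", "product_popularity"],
    key_columns=["user_id", "product_id"],
    defaults={"user_product_affinity": 0.0, "product_popularity": 0.0},
)
```

Keying the lookup on the dataframe’s own columns keeps the retrieved features trivially join-able with the rest of feature extraction.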
Currently these shared features are used across 3 different models, and growing …
Summary
• Think deeply about how model decisions integrate into the product
• If possible, consider product changes that maximize the impact of model predictions
• Look for economies of scope while going from 1 to N models
Data Products in Search and Discovery
Search
• Query autocorrection
• Query spell correction
• Query expansion
• Deep matching/document expansion
• Search ranking
• Search advertising
Discovery
• Substitute/replacement products
• Next item recommendations
• Next basket recommendations
• Guided discovery
• Interpretable recommendations
Thank you!
We are hiring!
Senior Machine Learning Engineer http://bit.ly/2kzHpcg
