Data/AI driven product development: from video streaming to telehealth
Xavier Amatriain
Co-founder/CTO Curai
(with Anitha Kannan, Head of ML Research, Curai)
August 18, 2022
About me...
● Researcher in Recommender Systems
● Started and led ML Algorithms at Netflix
● Head of Engineering at Quora
● Currently co-founder/CTO at Curai
Outline
1. Data/AI driven product development: experiences in recommender
systems
2. Data/AI driven product development in healthcare: the Curai
experience
3. Principles for data/AI driven product development
Principles for data/AI driven product development (preview)
1. Make data trustworthy and accessible
2. Follow a hypothesis-driven offline/online
experimentation approach with clearly defined metrics
3. Start from the simplest approach, ensure AI improves
over time, with data/metrics driving improvement
4. More data only matters if it’s better data, and if the
model is complex enough to learn from it
5. AI affects UX and UX affects AI
1. Data/AI driven product development: Recsys
What we were interested in:
▪ Improving the product with data + AI
▪ Hypothesis: higher quality recommendations
will lead to higher member retention
Proxy (offline) question:
▪ Accuracy in predicted rating
▪ Improve by 10% = $1 million!
▪ Metric: RMSE
▪ Top 2 algorithms
▪ SVD - Prize RMSE: 0.8914
▪ RBM - Prize RMSE: 0.8990
▪ Linear blend Prize RMSE: 0.88
▪ Limitations
▪ Designed for 100M ratings, not XB ratings
▪ Not adaptable as users add ratings
▪ Performance issues
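The proxy metric above can be made concrete with a small sketch. RMSE is the square root of the mean squared difference between predicted and actual ratings. The ratings below are made up for illustration, and `relative_improvement` is a hypothetical helper name, not something from the Prize tooling.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse improves on baseline_rmse (0.10 means 10%)."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Illustrative (made-up) ratings on a 1-5 scale.
actual = [4, 3, 5, 2, 1]
predicted = [3.5, 3.0, 4.0, 2.5, 2.0]
score = rmse(predicted, actual)

# Gain of the linear blend over SVD alone, using the RMSE values on the slide.
blend_gain = relative_improvement(0.8914, 0.88)
```

A lower RMSE is better, so a positive `relative_improvement` means the new model predicts ratings more accurately than the baseline.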
What about the final prize ensembles?
● Offline studies showed they were
too computationally intensive to
scale
● Expected improvement not worth
engineering effort
● Plus…. we uncovered that the
proxy question (offline
experiment) did not correlate with
online product gains
https://amatriain.net/blog/
Evolution of the Recommender Problem
[Figure: stages of the problem: Rating (example value: 4.7), Ranking, Page Optimization, Context-aware Recommendations]
Example: Two features, linear model
[Figure: items plotted by Popularity and Predicted Rating (scale 1 to 5), with the Final Ranking given by the linear model]
Linear Model: $f_{rank}(u,v) = w_1\, p(v) + w_2\, r(u,v) + b$
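The two-feature linear model can be sketched in a few lines. This is a minimal illustration: the item names, feature values, and weights are made up, and popularity is assumed to be scaled to [0, 1], which the slide does not specify.

```python
from typing import NamedTuple

class Item(NamedTuple):
    title: str
    popularity: float        # p(v): hypothetical value scaled to [0, 1]
    predicted_rating: float  # r(u, v): predicted rating on the 1-5 scale

def f_rank(item, w1, w2, b):
    """Linear ranking score: w1 * p(v) + w2 * r(u, v) + b."""
    return w1 * item.popularity + w2 * item.predicted_rating + b

def rank(items, w1=1.0, w2=1.0, b=0.0):
    """Return items sorted by descending ranking score."""
    return sorted(items, key=lambda it: f_rank(it, w1, w2, b), reverse=True)

# Made-up catalog for one user.
catalog = [Item("A", 0.9, 3.0), Item("B", 0.2, 4.8), Item("C", 0.6, 4.0)]

# Weighting popularity heavily changes the order relative to rating alone.
ranked = [it.title for it in rank(catalog, w1=3.0, w2=1.0)]
by_rating_only = [it.title for it in rank(catalog, w1=0.0, w2=1.0)]
```

In practice the weights would be learned from data rather than set by hand; the point here is only that the final ranking depends on how the two features are traded off.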
Ranking - Quora Feed
Goal: Present most interesting stories for
a user at a given time
Interesting = topical relevance +
social relevance + timeliness
Stories = questions + answers
Model: Personalized learning-to-rank
approach
Relevance-ordered vs time-ordered =
big gains in engagement
From ranking to page composition and beyond
From “Modeling User Attention and
Interaction on the Web” 2014 - D. Lagun
2. Data/AI driven product development revisited: Healthcare
Healthcare access, quality, and scalability
● >50% of the world has no access to essential health services
○ ~30% of US adults are under-insured
● ~15 min. to capture information, diagnose, and recommend treatment
● 30% of the medical errors causing ~400k deaths a year are due to misdiagnosis
● Shortage of 120,000 physicians by 2030
Towards an AI powered learning health system
● Mobile-First Care, always
on, accessible, affordable
● AI + human providers in
the loop for quality care
● Always-Learning system
● AI to operate in-the-wild
(EHR)
[Figure: learning loop of FEEDBACK, DATA, and MODEL around AI-augmented medical conversations]
What does it mean for AI to be part of medical
practice?
Breakthroughs in AI & healthcare
Research areas at Curai
● Medical Reasoning and
Diagnosis
● NLP/Conversational AI
● Multimodal AI
Healthcare is knowledge intensive
● Medical terminologies/ontologies
○ SNOMED, UMLS, ICD 10
● Expert systems for clinical decision making
○ 1000s diseases and 3500+ findings
○ 30+ years of expert curation
● Electronic access to medical research
● Online reputed websites
Adding domain knowledge to modern AI
approaches is an active area of research
1. Differential Diagnosis
SOTA Medical reasoning and diagnosis
ML + Expert systems for Dx models
Inputs: female, middle aged, fever, cough
DDx with expert system:
  Influenza 16.9
  bacterial pneumonia 16.9
  acute sinusitis 10.9
  asthma 10.9
  common cold 10.9
DDx with ML model:
  influenza 0.753
  bacterial pneumonia 0.205
  asthma 0.017
  acute sinusitis 0.008
  pulmonary tuberculosis 0.007
[Figure: Expert system, Clinical case simulator, Clinical cases, DDx, ML model, plus Other data (e.g. EHR). Example case: Female, Middle-aged, Chronic cough, Nasal congestion. Example DDx: Common cold, UTI, Acute bronchitis]
COVID-aware modeling
[Figure: Expert system, Clinical case simulator, Clinical cases with DDx, ML model, plus COVID-19 assessment data. Example case: Female, Middle-aged, Chronic cough, Nasal congestion. Example DDx: Common cold, UTI, Acute bronchitis. Example COVID-19 case: female, middle-age, cough, headache, nose discharge, cigarette smoking, hospital personnel. Label: COVID-19]
Evaluation
Clinical cases from Semigran dataset. No clinical case corresponding to COVID.

Method                  top-1   top-3   top-5
Practitioners           72.1%   84.3%   -
Razzaki et al.          -       46.6%   64.7%
Expert system           66%     75%     86%
Ours - Baseline         67.6%   85.8%   92.9%
Ours - COVID as label   61.8%   84.4%   93.3%

Adding COVID does not adversely affect performance.
Previous best result based on inference on graphical model.
Semigran et al. Evaluation of symptom checkers for self diagnosis and triage: audit study, BMJ 2015
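The top-1/top-3/top-5 columns are top-k accuracies: the fraction of clinical cases whose true diagnosis appears among the first k entries of the ranked differential. A minimal sketch follows; the cases are made up for illustration and are not taken from the Semigran dataset.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases whose true label is within the first k predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need one ranked prediction list per true label")
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, true_labels)
        if truth in preds[:k]
    )
    return hits / len(true_labels)

# Made-up ranked differentials (most likely first) and their true diagnoses.
ranked_ddx = [
    ["influenza", "bacterial pneumonia", "asthma"],
    ["common cold", "acute sinusitis", "influenza"],
    ["asthma", "common cold", "acute bronchitis"],
    ["common cold", "UTI", "acute bronchitis"],
]
truths = ["influenza", "acute sinusitis", "UTI", "acute bronchitis"]

top1 = top_k_accuracy(ranked_ddx, truths, k=1)
top3 = top_k_accuracy(ranked_ddx, truths, k=3)
```

Top-k accuracy can only stay the same or grow as k increases, which matches the pattern in each row of the table.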
2. Conversational History Taking
Patient-provider dialogue
P: Right now my stomach hurts.
P: It feels like I do need to do a clean out. If you know what I mean
D: Sorry for the abdominal pain. When did you have last bowel movement?
P: It was yesterday
D: What was the consistency of stool. Was it soft well-formed or was it hard?
P: Right now I just want and its watery and very loosely
P: That was was causing with my stomach hurts
D: Any blood or mucus with stools? Was it foul smelling?
P: Nope for all three
D: Any fever
P: Nope
D: I asked as blood or mucus in stool can be due an underlying infection
D: Any nausea/vomiting?
P: Nope
P: Why does this happen to me?
P: Is it something I have ate?
D: Diarrhea can be often due to indigestion or infection. Did you eat any outside food or packaged food?
P: yes
*The conversation has been de-identified for privacy protection
Combining SOTA LLMs with knowledge
● LLMs are great at:
○ Ability to adapt to a broad range
of tasks and situations
○ Ability to engage with the
audience
○ Giving empathetic responses
○ Showing personality and
sounding natural
Thoppilan et al. LaMDA: Language Models for Dialog Applications, 2022
Roller et al. Recipes for building an open-domain chatbot, 2020
Adiwardana et al. Towards a human-like open-domain chatbot, 2020
● LLMs are not great at:
○ Staying truthful. I.e. they often
hallucinate knowledge
○ Dealing with long-range
dependencies and solving tasks
with large output space
○ Reasoning. They can “retrieve”
knowledge without deeper
understanding or reasoning
Conversational history taking
1. Natural language
understanding
a. What did the patient say?
2. Dialog management
a. What to ask when?
b. How to decide when to stop?
3. Natural language generation
a. How to ask?
3. Summarization
Medical summarization using LLMs
● Insight 1: LLMs (e.g. GPT-3) can be
prompted to produce good
summaries in a few-shot setting
● Insight 2: LLMs can be ensembled
and used as data generators to
improve quality of summarization
results
● Insight 3: Medical domain knowledge
can be injected into these models so
that they produce medically correct
and complete summaries
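Insights 2 and 3 can be sketched together as an ensemble that generates several candidate summaries and keeps the one that best covers the medical concepts in the snippet. This is only a sketch under stated assumptions: the selection rule (most snippet concepts covered), the tiny `MEDICAL_CONCEPTS` vocabulary, the chat snippet, and the `generate` callable are all hypothetical stand-ins; no real LLM API is called.

```python
import re

# Hypothetical, tiny stand-in for a medical concept vocabulary.
MEDICAL_CONCEPTS = {"hydrocortisone", "benadryl", "prediabetes", "pus", "fever"}

def concepts_in(text):
    """Return the known medical concepts mentioned in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return tokens & MEDICAL_CONCEPTS

def select_summary(snippet, candidates):
    """Pick the candidate covering the most concepts found in the snippet."""
    snippet_concepts = concepts_in(snippet)
    return max(candidates, key=lambda c: len(concepts_in(c) & snippet_concepts))

def ensemble_summarize(snippet, generate, priming_contexts):
    """Generate one candidate per priming context, then select the best one.

    `generate(context, snippet)` is an assumed interface for a few-shot
    prompted model; any callable with that shape works here.
    """
    candidates = [generate(context, snippet) for context in priming_contexts]
    return select_summary(snippet, candidates)

# Made-up snippet; canned outputs stand in for model generations.
snippet = "DR: Have you used anything to help? PT: Only Benadryl and hydrocortisone."
canned = iter([
    "Hasn't used any thing to help",
    "Hasn't used any thing to help other than hydrocortisone",
    "Used nothing else to help other than Benadryl and hydrocortisone.",
])

def fake_generate(context, snippet):
    return next(canned)

best = ensemble_summarize(snippet, fake_generate, ["ctx1", "ctx2", "ctx3"])
```

The selected summaries could then serve as synthetic labels for training a smaller model, which is the data-generator role described in Insight 2.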
[Figure: GPT-3-ENS pipeline]
Priming: 21 Labeled Examples per Priming Context, 10 Trials of GPT-3
Inference: Chat Snippet (DR: Thanks for ... PT: No that’s everything) passed to GPT-3
Candidate summaries:
  Hasn’t used any thing to help
  Hasn’t used any thing to help other than hydrocortisone
  Used nothing else to help other than Benadryl and hydrocortisone.
GPT-3-ENS Labeled Dataset + Doctor Labeled/Corrected Dataset → In-House Summarization Model
Confidentiality note: In accordance with our privacy policy, the illustrative examples included in this document do NOT correspond to real patients. They are either synthetic or fully anonymized.
Qualitative Results

Snippet 1:
  DR: Have you ever been tested for any underlying health conditions such as diabetes, hypothyroidism or polycystic ovarian syndrome?
  PT: No
  PT: I have been told I have prediabetes.
  Model trained on 6400 doctor-labeled: Has not been tested for any underlying health conditions.
  Model trained on 6400 GPT-3 Ensembled: Hasn’t tested for any underlying health conditions such as diabetes, hypothyroidism or polycystic ovarian syndrome
  Model trained on doctor-labeled + GPT-3 Ensembled: Has not been tested for any underlying health conditions. Has been told has prediabetes.

Snippet 2:
  DR: Do you have pus appearing discharge from the site?
  PT: Yes. If the bubbles pop it leaks out a watery substance
  Model trained on 6400 doctor-labeled: Has pus appearing from the site.
  Model trained on 6400 GPT-3 Ensembled: Pus appearing from the site
  Model trained on doctor-labeled + GPT-3 Ensembled: Pus discharge from the site. If bubbles pop it leaks out a substance.

*The conversation has been de-identified for privacy protection
Chintagunta et al. Medically aware GPT-3 as a data generator for medical dialog summarization, MLHC 2021
3. Principles
What do I need to be “really” data/AI driven?
Do I need more data, better data, better AI algorithms, or all of the above?
The case(s) for more/bigger data
Norvig:
“Google does not have
better Algorithms only
more Data”
The case(s) against more data
Is it about bigger models then?
The “Big data paradox” is not a paradox
● Not all data is good data (aka more data only matters if it is “better
data”)
● Only more complex models can benefit from more data -
bias/variance tradeoff
● We need to combine better data with better/more complex models
● And… all of this does not hold for highly parametrized deep learning
models where the bias/variance tradeoff breaks for still unknown
reasons (maybe related to double descent)
Better data leads to better models
Year | Breakthrough in AI | Datasets (First Available) | Algorithms (First Proposal)
1994 | Human-level spontaneous speech recognition | Spoken Wall Street Journal articles and other texts (1991) | Hidden Markov Model (1984)
1997 | IBM Deep Blue defeated Garry Kasparov | 700,000 Grandmaster chess games, aka “The Extended Book” (1991) | Negascout planning algorithm (1983)
2005 | Google’s Arabic- and Chinese-to-English translation | 1.8 trillion tokens from Google Web and News pages (collected in 2005) | Statistical machine translation algorithm (1988)
2011 | IBM Watson became the world Jeopardy! champion | 8.6 million documents from Wikipedia, Wiktionary, Wikiquote, and Project Gutenberg (updated in 2005) | Mixture-of-Experts algorithm (1991)
2014 | Google’s GoogLeNet object classification at near-human performance | ImageNet corpus of 1.5 million labeled images and 1,000 object categories (2010) | Convolutional neural network algorithm (1989)
2015 | Google’s DeepMind achieved human parity in playing 29 Atari games by learning general control from video | Arcade Learning Environment dataset of over 50 Atari games (2013) | Q-learning algorithm (1992)
Average No. of Years to Breakthrough | | 3 years | 18 years

The average elapsed time between key algorithm proposals and corresponding advances was about 18 years, whereas the average elapsed time between key dataset availabilities and corresponding advances was less than 3 years, or about 6 times faster.
WARNING! It is not
only about data +
models
Model learning depends on objective + metric
● Quora feed example:
○ Training data = implicit + explicit
○ Target function = Value of showing
a story to a user ~ weighted sum of
actions
■ Compute probability of each
action given a story, weight them
by their value to compute expected
value
○ Metric = Any ranking metric
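The target function in the Quora feed example can be sketched as an expected value over actions. The action names, their values, and the per-story probabilities below are hypothetical; in a real system the probabilities would come from trained per-action models.

```python
# Hypothetical value assigned to each user action.
ACTION_VALUES = {"upvote": 1.0, "share": 2.0, "click": 0.25, "downvote": -2.0}

def expected_value(action_probs, action_values=ACTION_VALUES):
    """Weighted sum of P(action | user, story) times the value of that action."""
    return sum(action_values[action] * prob for action, prob in action_probs.items())

# Made-up predicted action probabilities for two candidate stories.
stories = {
    "story_a": {"upvote": 0.10, "share": 0.01, "click": 0.40, "downvote": 0.01},
    "story_b": {"upvote": 0.05, "share": 0.03, "click": 0.20, "downvote": 0.00},
}

# Rank stories by expected value, highest first.
ranked = sorted(stories, key=lambda s: expected_value(stories[s]), reverse=True)
```

Changing the action values changes what the feed optimizes for, which is why the choice of objective matters as much as the data and the model.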
UI is key: e.g. explanations
The importance of the experimentation framework
● Offline
○ Measure model performance, using metrics
○ Offline performance = indication to make decisions
on follow-up A/B tests
○ A critical (and mostly unsolved) issue is how offline
metrics correlate with A/B test results.
● Online
○ Measure differences in metrics across statistically
identical populations that each experience a
different algorithm.
○ Overall Evaluation Criteria (OEC)
■ Use long-term metrics whenever possible
■ Short-term metrics can be informative and
allow faster decisions. But, not always
aligned with OEC
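One common way to measure the online difference between two populations is a two-proportion z-test on a binary metric such as retention. The slides do not name a specific test, so this is a swapped-in illustration, and the counts are made up.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value), where z > 0 means arm B has the higher rate.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up example: control retains 1000 of 10000 users, treatment 1100 of 10000.
z, p_value = two_proportion_z_test(1000, 10000, 1100, 10000)
```

A small p-value suggests the difference is unlikely to be noise, but it says nothing about whether a short-term metric is aligned with the long-term OEC.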
PRINCIPLES
Principles for data/AI driven product development
● Make Data Trustworthy
● Make Data Accessible
● Follow a Hypothesis-driven approach
● Define Clear Metrics
● Measure offline/online
Principles for data/AI driven product development (I)
● Data/metrics drive AI
● AI should improve over time
● More data only matters if it’s better data
● Start with simplest model
● Increase model complexity and data size in parallel
● Connect AI to UI
Principles for data/AI driven product development (summary)
1. Make data trustworthy and accessible
2. Follow a hypothesis-driven offline/online
experimentation approach with clearly defined metrics
3. Start from the simplest approach, ensure AI improves
over time, with data/metrics driving improvement
4. More data only matters if it’s better data, and if the
model is complex enough to learn from it
5. AI affects UX and UX affects AI
4. Further “reading”
4 hour lecture on recommendations
Carnegie Mellon (2014)
1 hour lecture on practical Deep Learning
UC Berkeley (2020)
10 minutes on AI for COVID
Stanford (2020)
1 hour podcast on AI for Healthcare
Gradient Dissent (2021)
