Project Proposals
▪ Due next Monday (Feb 13) at 11:59pm ET
▪ 2-page proposal (more details in the “Course Logistics” document on Canvas) + references
▪ Today by 5pm ET, we will post:
▪ Project topics and some concrete problems
▪ Sample proposals and final reports from past iterations
▪ LaTeX and Word templates which you will use to write the proposal
Office Hours and Paper Presentations
▪ Office hours switch this week
▪ Suraj and Jiaqi today
▪ Hima on Thursday
▪ Students signed up for presentations next week should see us in office hours this week and bring:
▪ A full slide deck (ideally!)
▪ An overview of what you plan to present
Rule-Based Approaches
Agenda
▪ Paper 1: Interpretable Rule Lists (Letham et al.)
▪ Paper 2: Interpretable Rule Sets (Lakkaraju et al.)
▪ Discussion
Interpretable Classifiers Using
Rules and Bayesian Analysis
Benjamin Letham, Cynthia Rudin, Tyler McCormick, David Madigan; 2015
Contributions
▪ Introducing a generative model called Bayesian Rule Lists (BRL)
▪ Goal is to output a decision list (if … then … else if …)
▪ Novel prior structure to encourage sparsity
▪ Predictive accuracy on par with top algorithms
Decision List: Example
This is “an” accurate and interpretable decision list, possibly one of many such lists.
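A hypothetical decision list of the kind BRL outputs (the conditions and risk levels here are made up for illustration, not taken from the slide's figure):

    if hemiplegia and age > 60 then stroke risk is high
    else if cerebrovascular disorder then stroke risk is medium
    else if age <= 50 then stroke risk is low
    else stroke risk is medium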
Introduction: BRL
▪ Produces a posterior distribution over permutations of if … then … else-if … rules from a large set of pre-mined rules
▪ Decision lists with high posterior probability tend to be both accurate and interpretable
▪ The prior favors concise lists with a small number of rules and fewer terms on the left-hand side
Introduction: BRL
▪ A new type of balance between accuracy, interpretability, and computation
▪ Why not other similar models?
▪ Decision trees (CART)
▪ They employ greedy construction methods
▪ Greedy construction is not particularly computationally demanding, but it hurts the quality of the solution, both its accuracy and its interpretability
Pre-mined Rules
▪ A major source of practical feasibility: pre-mined rules
▪ Reduces the model space
▪ The complexity of the problem depends on the number of pre-mined rules
▪ As long as the pre-mined set is expressive, an accurate decision list can be found; a smaller model space also means better generalization (Vapnik, 1995)
Pre-mined Rules: Intuition
[Figure: frequent itemset mining example with minimum support = 3]
This is the Apriori algorithm. FP-growth is a more efficient algorithm (it needs only two passes over the data rather than one per itemset size).
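A minimal Python sketch of the pre-mining step, assuming a toy transaction set and the slide's minimum support of 3 (the data and names are illustrative):

    from itertools import combinations

    # Toy transactions; each row is a set of feature conditions (items).
    transactions = [
        {"age>60", "hypertension", "smoker"},
        {"age>60", "hypertension"},
        {"age>60", "smoker"},
        {"hypertension", "smoker"},
    ]
    MIN_SUPPORT = 3  # minimum number of transactions an itemset must appear in

    def support(itemset):
        """Count transactions containing every item in the itemset."""
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]

    # Level 2: candidate pairs built only from frequent singletons (Apriori pruning).
    singles = {next(iter(s)) for s in frequent}
    pairs = [frozenset(p) for p in combinations(sorted(singles), 2)]
    frequent += [p for p in pairs if support(p) >= MIN_SUPPORT]

    for s in frequent:
        print(set(s), support(s))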
Preliminaries: Notation
▪ Training data: (x_i, y_i), i = 1, …, n, where x_i are the feature vectors and y_i the labels
▪ Two labels: stroke or no stroke
Bayesian Decision Lists
A decision list d = (a_1, …, a_m) assigns each observation the consequent distribution of the first antecedent it satisfies:
if x satisfies a_1 then y ~ Multinomial(θ_1)
else if x satisfies a_2 then y ~ Multinomial(θ_2)
…
else y ~ Multinomial(θ_{m+1}) (default rule)
where θ_j ~ Dirichlet(α) for each rule.
Preliminaries: Multinomial
▪ Sampling from a multinomial: draw counts over K categories, y ~ Multinomial(n, θ)
▪ The parameters θ = (θ_1, …, θ_K) are probability values summing to 1
Preliminaries: Dirichlet
▪ Dirichlet: sampling over a probability simplex
▪ E.g., (0.6, 0.4) is a sample from a Dirichlet distribution
▪ A K-dimensional Dirichlet has K parameters, each of which can be any positive number
▪ E.g., Dirichlet(60, 40)
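A quick NumPy sketch of both preliminaries, using the slide's example parameters:

    import numpy as np

    rng = np.random.default_rng(0)

    # Dirichlet(60, 40): a draw lives on the probability simplex, roughly (0.6, 0.4).
    theta = rng.dirichlet([60, 40])
    print(theta, theta.sum())        # two positive numbers summing to 1

    # Multinomial with parameters theta: distribute 100 draws across the 2 categories.
    counts = rng.multinomial(100, theta)
    print(counts)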
Preliminaries: Dirichlet Prior
▪ The Dirichlet is the conjugate prior for the multinomial distribution
▪ Conjugate prior: the posterior is in the same family as the prior
▪ Prior: θ ~ Dirichlet(α_1, …, α_K)
▪ Posterior after observing label counts (N_1, …, N_K): θ | y ~ Dirichlet(α_1 + N_1, …, α_K + N_K)
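A one-line illustration of the conjugate update (the pseudo-counts and observed counts are made up):

    import numpy as np

    alpha = np.array([1.0, 1.0])    # Dirichlet prior parameters (pseudo-counts)
    N = np.array([30, 10])          # observed label counts, e.g. stroke vs. no stroke
    posterior = alpha + N           # posterior is Dirichlet(alpha + N)
    print(posterior)                        # [31. 11.]
    print(posterior / posterior.sum())      # posterior mean of theta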
Bayesian Association Rules
A Bayesian association rule pairs an antecedent a with a Dirichlet-multinomial model over its consequent: θ ~ Dirichlet(α), and y ~ Multinomial(θ) for observations that satisfy a.
Generative Model
Our goal is to sample from the posterior distribution over antecedent lists:
p(d | x, y, A, α, λ, η) ∝ p(y | x, d, α) · p(d | A, λ, η)
where A is the complete collection of pre-mined antecedents.
Prior Probabilities
Truncated Poisson over the list length m:
p(m | A, λ) ∝ λ^m / m!, for 0 ≤ m ≤ |A|
Ensures that sampled values are within bounds!
Also ensures the expected value is close to λ when there is a large number of pre-mined rules.
Prior Probabilities
The cardinality c_j of each antecedent's left-hand side follows another truncated Poisson, with parameter η; a_j is then sampled uniformly from the available antecedents of the appropriate cardinality.
Likelihood
▪ The likelihood is the product of multinomial probability mass functions for the observed label counts at each rule
Marginalize over θ (integrate out the intermediate parameter):
p(y | x, d, α) = ∏_j [ Γ(Σ_k α_k) / Γ(N_j· + Σ_k α_k) ] · ∏_k [ Γ(N_jk + α_k) / Γ(α_k) ]
where N_jk is the number of training points captured by rule j that have label k, and N_j· = Σ_k N_jk.
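Because of conjugacy the integral has a closed form; each rule contributes a Dirichlet-multinomial term. A sketch of the computation (the per-rule label counts are made up):

    import numpy as np
    from scipy.special import gammaln

    def log_marginal(counts, alpha):
        """log p(counts | alpha) with theta integrated out (Dirichlet-multinomial)."""
        counts, alpha = np.asarray(counts, float), np.asarray(alpha, float)
        return (gammaln(alpha.sum()) - gammaln(counts.sum() + alpha.sum())
                + np.sum(gammaln(counts + alpha) - gammaln(alpha)))

    # Log likelihood of a decision list = sum of per-rule terms.
    rule_label_counts = [[40, 5], [3, 25], [10, 9]]   # labels captured by each rule
    print(sum(log_marginal(c, [1.0, 1.0]) for c in rule_label_counts))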
Markov Chain Monte Carlo
▪ Generate a chain of random samples until convergence
▪ Each random sample is a stepping stone for the next one (the “chain”)
▪ New samples do not depend on any samples before the previous one (the “Markov” property)
Markov Chain Monte Carlo
▪ How do we move toward (optimal) d* from the current d_t? Three proposal moves:
▪ Move an antecedent to a different position in the list
▪ Add an antecedent that is not currently in the list
▪ Remove an antecedent from the list
Metropolis Hastings
▪ Start with a random decision list
▪ Choose a move based on a “proposal distribution” Q
▪ After choosing the move, compute an acceptance probability A
▪ Generate a random number u ~ Uniform(0, 1)
▪ If u ≤ A, accept the move; otherwise reject it
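A hedged sketch of the loop; score() stands in for the log unnormalized posterior p(y | x, d, α) · p(d | A, λ, η) and propose() for the move/add/remove moves. Both are hypothetical helpers, not the paper's code:

    import math
    import random

    def metropolis_hastings(d0, score, propose, n_steps=10000):
        """Generic Metropolis-Hastings over decision lists."""
        d = best = d0
        for _ in range(n_steps):
            d_new, log_q_ratio = propose(d)   # log Q(d | d_new) - log Q(d_new | d)
            log_A = score(d_new) - score(d) + log_q_ratio
            if math.log(random.random()) <= min(0.0, log_A):  # accept w.p. min(1, A)
                d = d_new
            if score(d) > score(best):        # track the highest-posterior list seen
                best = d
        return best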
Metropolis Hastings
Acceptance probability:
A(d* | d_t) = min{ 1, [ p(d* | x, y, A, α, λ, η) · Q(d_t | d*) ] / [ p(d_t | x, y, A, α, λ, η) · Q(d* | d_t) ] }
Proposal Probabilities
▪ The move type is chosen uniformly
▪ Which antecedent to move/add/remove, and its new position, are also chosen uniformly
Estimating the Label of a New Observation
Match the new observation to the first antecedent it satisfies (based on its feature values) and predict with that rule's label distribution.
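A sketch of prediction over a fitted list (the rules, conditions, and label distributions are illustrative):

    def predict(x, decision_list, default_dist):
        """Return the label distribution of the first rule whose antecedent x satisfies."""
        for conditions, label_dist in decision_list:
            if all(cond(x) for cond in conditions):   # x matches every condition
                return label_dist
        return default_dist                           # no antecedent fired: default rule

    rules = [
        ([lambda x: x["age"] > 60, lambda x: x["hypertension"]],
         {"stroke": 0.6, "no stroke": 0.4}),
    ]
    print(predict({"age": 70, "hypertension": True}, rules,
                  {"stroke": 0.1, "no stroke": 0.9}))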
Tic-Tac-Toe
5-fold cross-validation; accuracy computed across the 5 folds.
Stroke Prediction
▪ N = 12,586; 14% had a stroke
▪ 6,000 times larger than the data used to develop the CHADS2 score
▪ Pre-mining: support 10% and max cardinality 2
▪ 5-fold evaluation
Stroke Prediction
[Figure: decision list learned for stroke prediction]
Stroke Prediction - AUC
[Figure: AUC comparison across methods]
Interpretable Decision Sets
Hima Lakkaraju, Stephen Bach, Jure Leskovec; 2016
Contributions
▪ A framework called Interpretable Decision Sets (IDS) for classification
▪ A novel objective function + a proof of submodularity
▪ An optimization procedure with optimality guarantees
▪ Detailed metrics for evaluating interpretability + user studies
Motivation
▪ Traditional classification models optimize for predictive accuracy
▪ Very little understanding of the model itself and its predictions
▪ A model being “readable” is not enough
▪ Humans should be able to reason about predictions and readily explain the functionality of the model
Decision Sets
[Figure: example of a decision set]
Criteria for Interpretability
▪ Parsimony: fewer rules with fewer conditions
▪ Cognitive limits of human understanding
▪ Distinctness: minimal overlap of rules w.r.t. the data points they cover
▪ No redundant or contradictory explanations of data points
▪ Class Coverage: explain all the classes in the data
▪ Rules explaining minority classes are important
Problem Formulation
Given candidate rules (pattern, class) pre-mined from the data, select a subset of them (a decision set) that is both accurate and interpretable.
Desiderata
▪ We need to optimize for the following criteria:
▪ Recall
▪ Precision
▪ Distinctness
▪ Parsimony
▪ Class Coverage
▪ Recall and Precision → accurate predictions
▪ Distinctness, Parsimony, and Class Coverage → interpretability
Objective Function
[Built up over several slides: one term each for parsimony (number of rules and rule width), distinctness (overlap penalties), class coverage, precision, and recall; the equations were shown as figures]
Submodularity
Diminishing returns characterization: for A ⊆ B and an element d ∉ B,
F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B)
(the gain of adding d to the small set A) ≥ (the gain of adding d to the large set B)
A non-negative linear combination of submodular functions is submodular.
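A quick numeric illustration with a coverage function, which is submodular (the rules and the points they cover are made up):

    # F(S) = number of data points covered by at least one rule in S.
    cover = {"r1": {1, 2, 3}, "r2": {3, 4}, "r3": {4, 5, 6}}

    def F(S):
        return len(set().union(*(cover[r] for r in S))) if S else 0

    A, B, d = {"r1"}, {"r1", "r2"}, "r3"   # A is a subset of B; d is outside B
    gain_small = F(A | {d}) - F(A)         # gain of adding d to the small set: 3
    gain_large = F(B | {d}) - F(B)         # gain of adding d to the large set: 2
    print(gain_small >= gain_large)        # diminishing returns holds: True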
Objective Function
The complete objective is non-negative, non-normal, non-monotone, and submodular.
Optimizing the Objective
▪ Maximizing a non-monotone submodular function is NP-hard
▪ The Smooth Local Search (SLS) algorithm provides a 2/5 approximation [Feige, Mirrokni, Vondrák; FOCS '07, SIAM J. Comput. '11]
▪ The returned solution is guaranteed to achieve at least 2/5 of the optimal value
Submodular Maximization: Local Search
[Illustrated over several slides: each node corresponds to a candidate rule = (pattern, class) tuple; S and S′ correspond to the intermediate solution sets]
Local Search
▪ Plain local search: ~1/3 approximation
▪ At least 1/3 of the optimal solution
▪ IDS uses a slightly different version of this algorithm:
▪ Smooth local search
▪ 2/5 approximation
Smooth Local Search
Initialization: start from an initial solution set
Estimate each element's marginal gain
If marginal gain > threshold, add the element
If marginal gain < -threshold, remove the element
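A hedged sketch of plain (non-smooth) local search with add/remove moves; smooth local search additionally estimates gains under random perturbations of the current set, which is omitted here. The objective and candidate rules are made up:

    # Toy objective: points covered minus a per-rule cost (non-monotone submodular).
    cover = {"r1": {1, 2, 3}, "r2": {2, 3}, "r3": {4, 5}}

    def F(S):
        covered = set().union(*(cover[r] for r in S)) if S else set()
        return len(covered) - 0.5 * len(S)

    def local_search(candidates, F):
        """Toggle single elements while any move strictly improves the objective."""
        S, improved = set(), True
        while improved:
            improved = False
            for r in candidates:
                T = S ^ {r}               # add r if absent, remove it if present
                if F(T) > F(S):
                    S, improved = T, True
        return S

    print(local_search(set(cover), F))    # e.g. {'r1', 'r3'}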
Evaluation: Datasets
Dataset | # of Datapoints | Features | Classes
Bail Outcomes | 86K | Gender, age, current offense details, past criminal record | No Risk, Failure to Appear, New Criminal Activity
Student Performance | 21K | Gender, age, grades, absence rates & tardiness behavior through grades 6 to 8, suspension/withdrawal history | Graduated on Time, Delayed Graduation, Dropped Out
Medical Diagnosis | 150K | Current ailments, age, BMI, gender, smoking habits, medical history, family history | Asthma, Diabetes, Depression, Lung Cancer, Rare Blood Cancer
Evaluating Predictive Performance
Method | AUC (Bail Data) | AUC (Student Data) | AUC (Medical Data)
Our Approach | 69.78 | 75.12 | 61.19
Bayesian Decision Lists (Letham et al.) | 67.18 | 72.54 | 59.18
Classification Based on Association (Liu et al.) | 70.68 | 76.02 | 63.03
CN2 | 71.02 | 76.36 | 64.78
Decision Trees | 70.08 | 75.31 | 63.28
Gradient Boosted Trees | 71.23 | 77.18 | 64.21
Random Forests | 70.87 | 77.12 | 63.92
Evaluating Goodness of Rules
▪ Results on Medical Diagnosis Data
Method | Fraction of Overlap | Fraction of Data Points Uncovered | Avg. Rule Width | Num. Rules | Fraction of Classes Covered
Our Approach | 0.09 | 0.13 | 3.17 | 12 | 1.00
Bayesian Decision Lists (Letham et al.) | 0.00 | 0.18 | 8.46 | 11 | 0.67
Classification Based on Association (Liu et al.) | 0.00 | 0.14 | 8.60 | 32 | 1.00
CN2 | 0.12 | 0.14 | 9.78 | 38 | 1.00
Ablation Study
▪ Results on Medical Diagnosis Data
Evaluating Interpretability: User Study
▪ Compared our interpretable decision sets to Bayesian Decision Lists (Letham et al.)
▪ Each user is randomly assigned one of the two models
▪ 10 objective and 2 descriptive questions per user
Interface for Objective Questions
Interface for Descriptive Questions
User Study Results
Task | Metric | Our Approach | Bayesian Decision Lists
Descriptive | Human Accuracy | 0.81 | 0.17
Descriptive | Avg. Time Spent (secs.) | 113.4 | 396.86
Descriptive | Avg. # of Words | 31.11 | 120.57
Objective | Human Accuracy | 0.97 | 0.82
Objective | Avg. Time Spent (secs.) | 28.18 | 36.34

Objective questions: 17% more accurate, 22% faster.
Descriptive questions: 74% fewer words, 71% faster.