
Classification:

A machine learning perspective


Emily Fox & Carlos Guestrin
Machine Learning Specialization
University of Washington
©2015-2016 Emily Fox & Carlos Guestrin Machine Learning Specialization
Part of a specialization



This course is a part of the Machine Learning Specialization:

1. Foundations
2. Regression
3. Classification
4. Clustering & Retrieval
5. Recommender Systems
6. Capstone


What is the course about?



What is classification?
From features to predictions

Data → ML Method (Classifier) → Intelligence

Input x: features derived from data
Learn the x → y relationship
Predict y: categorical "output", class or label
Sentiment classifier
Input x: sentence from a review, e.g., "Easily best sushi in Seattle."

Sentence → Sentiment Classifier → Output y: sentiment


Classifier

Input x: sentence from review → Classifier (MODEL) → Output y: predicted class


Example multiclass classifier
Output y has more than 2 categories

Input x: webpage → Output y: category (Education, Finance, Technology, …)
Spam filtering
Input x: text of email, sender, IP, … → Output y: spam or not spam
Image classification

Input x: image pixels → Output y: predicted object
Personalized medical diagnosis
Input x → Disease Classifier (MODEL) → Output y: healthy, cold, flu, pneumonia, …


Reading your mind
Input x: brain region intensities → Output y: "Hammer" or "House"
Impact of classification



Course overview



Course philosophy: Always use case studies & …

- Core concept
- Visual
- Algorithm
- Practical
- Implement
- Advanced topics (OPTIONAL)
Overview of content

Models               Algorithms            Core ML
Linear classifiers   Gradient              Alleviating overfitting
Logistic regression  Stochastic gradient   Handling missing data
Decision trees       Recursive greedy      Precision-recall
Ensembles            Boosting              Online learning


Course outline



Overview of modules

Models                                 Algorithms                       Core ML
Linear classifiers (Module 1)          Gradient (Modules 2 & 3)         Alleviating overfitting (Modules 3 & 5)
Logistic regression (Modules 1, 2, 3)  Stochastic gradient (Module 9)   Handling missing data (Module 6)
Decision trees (Modules 4 & 5)         Recursive greedy (Module 4)      Precision-recall (Module 8)
Ensembles (Module 7)                   Boosting (Module 8)              Online learning (Module 9)


Module 1: Linear classifiers
Word        Coefficient
#awesome    1.0
#awful      -1.5

Score(x) = 1.0 #awesome – 1.5 #awful

[Plot: decision boundary in the (#awesome, #awful) plane; Score(x) > 0 on one side, Score(x) < 0 on the other.]
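A minimal sketch of this score computation in Python, assuming the sentence has already been reduced to word counts (the word_counts dictionary and the example below are illustrative, not from the slides):

    # Coefficients from the slide: +1.0 per "awesome", -1.5 per "awful".
    coefficients = {"awesome": 1.0, "awful": -1.5}

    def score(word_counts):
        # Score(x) = 1.0 * #awesome - 1.5 * #awful
        return sum(coefficients.get(word, 0.0) * count
                   for word, count in word_counts.items())

    def predict_class(word_counts):
        # Predict +1 (positive) when Score(x) > 0, otherwise -1 (negative).
        return +1 if score(word_counts) > 0 else -1

    print(predict_class({"awesome": 3, "awful": 1}))   # Score = 1.5, so prediction is +1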
Module 1: Logistic regression represents probabilities

P(y = +1 | x, ŵ) = 1 / (1 + exp(-ŵᵀ h(x)))
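A small sketch of turning the score from the previous slide into a probability with the logistic (sigmoid) function; here the score argument stands in for ŵᵀ h(x):

    import math

    def probability_positive(score):
        # P(y = +1 | x, w) = 1 / (1 + exp(-score)), where score = w^T h(x)
        return 1.0 / (1.0 + math.exp(-score))

    print(probability_positive(1.5))   # about 0.82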


Module 2: Learning “best” classifier
Maximize likelihood over all possible w0, w1, w2:

ℓ(w0=0, w1=1,   w2=-1.5) = 10^-6
ℓ(w0=1, w1=1,   w2=-1.5) = 10^-5
ℓ(w0=1, w1=0.5, w2=-1.5) = 10^-4
…

Best model found with gradient ascent (highest likelihood ℓ(w)):
ŵ = (w0=1, w1=0.5, w2=-1.5)

[Plot: candidate decision boundaries in the (#awesome, #awful) plane, one per setting of w.]
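A rough sketch of maximizing the likelihood with gradient ascent for logistic regression, using NumPy; the toy feature matrix, labels, and step size below are invented for illustration and are not the course assignment code:

    import numpy as np

    def predict_probability(H, w):
        # P(y = +1 | x, w) for each row of the feature matrix H (columns = h(x)).
        return 1.0 / (1.0 + np.exp(-H.dot(w)))

    def gradient_ascent(H, y, step_size=0.1, n_iterations=500):
        # y holds +1/-1 labels; repeatedly step uphill on the log likelihood ell(w).
        w = np.zeros(H.shape[1])
        indicator = (y == +1).astype(float)
        for _ in range(n_iterations):
            errors = indicator - predict_probability(H, w)
            w = w + step_size * H.T.dot(errors)   # gradient of the log likelihood
        return w

    # Toy data: columns are [intercept, #awesome, #awful].
    H = np.array([[1, 3, 0], [1, 0, 2], [1, 2, 1], [1, 0, 1]], dtype=float)
    y = np.array([+1, -1, +1, -1])
    print(gradient_ascent(H, y))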
Module 3: Overfitting & regularization
[Plot: classification error vs. model complexity, showing the training error and true error curves.]

Use a regularization penalty to mitigate overfitting:

ℓ(w) - λ ||w||²
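A sketch of the corresponding L2-regularized update, reusing the setup from the previous sketch; l2_penalty plays the role of λ and trades data fit against ||w||²:

    import numpy as np

    def regularized_gradient_step(w, H, indicator, step_size, l2_penalty):
        # Step uphill on ell(w) - lambda * ||w||^2; the penalty adds -2 * lambda * w to the gradient.
        probabilities = 1.0 / (1.0 + np.exp(-H.dot(w)))
        gradient = H.T.dot(indicator - probabilities) - 2.0 * l2_penalty * w
        return w + step_size * gradient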
Module 4: Decision trees
Start: Credit?
- excellent → Safe
- fair → Term?
    - 3 years → Risky
    - 5 years → Safe
- poor → Income?
    - high → Term?
        - 3 years → Risky
        - 5 years → Safe
    - low → Risky
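The same tree written as nested conditionals, a direct transcription of the branches as laid out above (not learned code):

    def predict_loan(credit, term, income):
        # Follow the tree: Credit? then Term? or Income? down to a Safe/Risky leaf.
        if credit == "excellent":
            return "Safe"
        if credit == "fair":
            return "Risky" if term == "3 years" else "Safe"
        # credit == "poor"
        if income == "low":
            return "Risky"
        return "Risky" if term == "3 years" else "Safe"

    print(predict_loan("fair", "5 years", "high"))   # Safe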


Module 5: Overfitting in decision trees
[Decision boundaries: decision trees of depth 1, 3, and 10; logistic regression with degree 1, 2, and 6 features.]


Module 5: Alleviate overfitting by learning simpler trees
Occam's Razor: "Among competing hypotheses, the one with fewest assumptions
should be selected." (William of Occam, 13th century)

Complex tree → Simplify → Simpler tree


Module 6: Handling missing data
Credit     Term   Income  y
excellent  3 yrs  high    safe
fair       ?      low     risky
fair       3 yrs  high    safe
poor       5 yrs  high    risky
excellent  3 yrs  low     risky
fair       5 yrs  high    safe
poor       ?      high    risky
poor       5 yrs  low     safe
fair       ?      high    safe

[Decision tree from Module 4, with "or unknown" added to selected branches so that rows with missing values ("?") still reach a leaf.]
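A sketch of one simple strategy consistent with the "or unknown" branch labels: when a value is missing, send the example down a designated branch instead of failing. Which branches absorb unknowns is a modeling choice; the ones below are illustrative:

    def predict_with_missing(credit, term, income):
        # Missing values (None) follow a branch labeled "or unknown".
        if credit == "excellent":
            return "Safe"
        if credit == "fair" or credit is None:                 # fair or unknown
            return "Risky" if term == "3 years" else "Safe"    # unknown term follows the "5 years" branch
        # credit == "poor"
        if income == "low":
            return "Risky"
        return "Risky" if term == "3 years" else "Safe"        # high or unknown income

    print(predict_with_missing("fair", None, "low"))   # Safe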


Module 7: Boosting question
“Can a set of weak learners be combined to
create a stronger learner?” Kearns and Valiant (1988)

Yes! Schapire (1990)

Boosting

Amazing impact: simple approach, widely used in industry, wins most Kaggle competitions.
Module 7: Boosting using AdaBoost
Four simple classifiers (decision stumps):

f1: Income > $100K?     Yes → Safe,  No → Risky      f1(xi) = +1
f2: Credit history?     Bad → Risky, Good → Safe     f2(xi) = -1
f3: Savings > $100K?    Yes → Safe,  No → Risky      f3(xi) = -1
f4: Market conditions?  Bad → Risky, Good → Safe     f4(xi) = +1

Ensemble: combine votes from many simple classifiers to learn complex classifiers.
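A sketch of the ensemble vote: each simple classifier f_t returns +1 (safe) or -1 (risky), and the ensemble takes the sign of a weighted sum. The weights below are placeholders; in AdaBoost they would be learned from each classifier's weighted accuracy:

    def ensemble_predict(x, classifiers, weights):
        # Weighted vote: sign( sum_t w_t * f_t(x) ); +1 = safe, -1 = risky.
        total = sum(w * f(x) for f, w in zip(classifiers, weights))
        return +1 if total > 0 else -1

    # The four stumps above (feature names are illustrative).
    f1 = lambda x: +1 if x["income"] > 100_000 else -1
    f2 = lambda x: -1 if x["credit_history"] == "bad" else +1
    f3 = lambda x: +1 if x["savings"] > 100_000 else -1
    f4 = lambda x: -1 if x["market_conditions"] == "bad" else +1

    x = {"income": 120_000, "credit_history": "bad",
         "savings": 20_000, "market_conditions": "good"}
    print(ensemble_predict(x, [f1, f2, f3, f4], weights=[2.0, 1.5, 1.5, 0.5]))   # -1 (risky)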


Module 8: Precision-recall
Goal: increase # of guests by 30%.
Need an automated, "authentic" marketing campaign: turn reviews into
great quotes for spokespeople, e.g., "Easily best sushi in Seattle."

Accuracy is not the most important metric here:

PRECISION: Did I (mistakenly) show a negative sentence?
RECALL: Did I fail to show a (great) positive sentence?
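A sketch of the two metrics in code, treating "show the sentence in the campaign" as predicting +1; the toy labels and predictions are invented:

    def precision_recall(true_labels, predicted_labels):
        # Precision: of the sentences I showed (+1), how many were truly positive?
        # Recall: of the truly positive sentences, how many did I show?
        shown = [(t, p) for t, p in zip(true_labels, predicted_labels) if p == +1]
        true_positives = sum(1 for t, _ in shown if t == +1)
        actual_positives = sum(1 for t in true_labels if t == +1)
        precision = true_positives / len(shown) if shown else 1.0
        recall = true_positives / actual_positives if actual_positives else 1.0
        return precision, recall

    print(precision_recall([+1, +1, -1, +1, -1], [+1, -1, -1, +1, +1]))   # roughly (0.67, 0.67)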
Module 9: Scaling to huge datasets & online learning

Scale: 4.8B webpages, 500M tweets/day, 5B video views/day.

Stochastic gradient: a tiny modification to gradient ascent; a lot faster, but annoying in practice.

[Plot: avg. log likelihood for gradient vs. stochastic gradient ("better" = higher).]
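A sketch of that "tiny modification": update w from one data point at a time instead of summing over the whole dataset. It reuses the same toy setup as the Module 2 sketch; the step size and number of passes are illustrative:

    import numpy as np

    def stochastic_gradient_ascent(H, y, step_size=0.05, n_passes=10):
        # Same log-likelihood objective, but each update touches a single example.
        w = np.zeros(H.shape[1])
        indicator = (y == +1).astype(float)
        for _ in range(n_passes):
            for i in np.random.permutation(len(y)):      # shuffle examples each pass
                p_i = 1.0 / (1.0 + np.exp(-H[i].dot(w)))
                w = w + step_size * (indicator[i] - p_i) * H[i]
        return w

    H = np.array([[1, 3, 0], [1, 0, 2], [1, 2, 1], [1, 0, 1]], dtype=float)
    y = np.array([+1, -1, +1, -1])
    print(stochastic_gradient_ascent(H, y))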


Assumed background



Courses 1 & 2 in this ML Specialization
• Course 1: Foundations
- Overview of ML case studies
- Black-box view of ML tasks
- Programming & data manipulation skills

• Course 2: Regression
- Data representation (input, output, features)
- Linear regression model
- Basic ML concepts:
• ML algorithm
• Gradient descent
• Overfitting
• Validation set and cross-validation
• Bias-variance tradeoff
• Regularization



Math background
• Basic calculus
- Concept of derivatives
• Basic vectors
• Basic functions
- Exponentiation e^x
- Logarithm



Programming experience
• Basic Python used
- Can be picked up along the way if you know another programming language



Reliance on GraphLab Create
• SFrames will be used, though not required
- an open-source project of Dato (creators of GraphLab Create)
- you can use pandas and NumPy instead
• Assignments will:
1. Use GraphLab Create to explore high-level concepts
2. Ask you to implement all algorithms without GraphLab Create
• Net result: learn how to code methods in Python
Computing needs
• Basic 64-bit desktop or laptop
• Access to internet
• Ability to:
- Install and run Python (and GraphLab Create)
- Store a few GB of data



Let’s get started!

