Machine Learning
Kan Ouivirach
Research & Development Engineer



www.kanouivirach.com
Outline
• What is Machine Learning?
• Main Types of Learning
• Model Validation, Selection, and Evaluation
• Applied Machine Learning Process
• Cautions
What is Machine Learning?
http://www.bigdata-madesimple.com/
“Field of study that gives computers the ability
to learn without being explicitly programmed.”
–Arthur Samuel (1959)
“A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E.”
–Tom Mitchell (1997)
Statistics vs. Data Mining vs. Machine Learning vs. …?
Programming vs. Machine Learning?
Programming?
“Given a specification of a function f,
implement f that meets the specification.”
Machine Learning?
“Given example (x, y) pairs, induce f such
that y = f(x) for given pairs and generalizes
well for unseen x”
–Peter Norvig (2014)
Why is Machine Learning so hard?
http://veronicaforand.com/
http://www.thinkgeek.com/product/f0ba/
What do you see?
Dog and Cat?
http://thisvsthatshow.com/
Applications of Machine Learning
• Search Engines
• Medical Diagnosis
• Object Recognition
• Stock Market Analysis
• Credit Card Fraud Detection
• Speech Recognition
• etc.
Recommendation System on Amazon.com
Advertisement System on Facebook.com
Speech Recognition from Microsoft
Robot Localization
https://github.com/mjl/particle_filter_demo
Main Types of Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Supervised Learning
y = f(x)
Given x, y pairs, find a function f that will map
new x to a proper y.
Supervised Learning Problems
• Regression
• Classification
Regression
Linear Regression
y = wx + b
http://thisvsthatshow.com/
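A minimal sketch of fitting y = wx + b by ordinary least squares, in plain Python. The function name `fit_line` and the toy data are assumptions made for illustration; they are not from the slides.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w*x + b for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance(x, y) divided by variance(x).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    # The fitted line always passes through the point of means.
    b = mean_y - w * mean_x
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical example values).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(w, b)  # prints "2.0 1.0"
```

With noisy data the recovered w and b would only approximate the true values.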
Classification
k-Nearest Neighbors
http://bdewilde.github.io/blog/blogger/2012/10/26/classification-of-hand-written-digits-3/
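A small sketch of k-Nearest Neighbors classification: a new point takes the majority label of its k closest training points. The name `knn_predict`, the Euclidean distance, and the toy points are assumptions for illustration only.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train is a list of (point, label) pairs; returns the majority label
    among the k training points closest to query (Euclidean distance)."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Two hypothetical, well-separated groups of 2-D points.
train = [
    ((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.0), "a"),
    ((6.0, 6.0), "b"), ((7.0, 6.5), "b"), ((6.5, 7.0), "b"),
]
print(knn_predict(train, (1.2, 1.4)))  # prints "a"
print(knn_predict(train, (6.4, 6.2)))  # prints "b"
```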
Perceptron
[Figure: Input 0 and Input 1 feed a processor, which produces the output]
One or more inputs, a processor, and a single output
Perceptron
https://datasciencelab.wordpress.com/2014/01/10/machine-learning-classics-the-perceptron/
w0*x0 + w1*x1
Perceptron
https://datasciencelab.wordpress.com/2014/01/10/machine-learning-classics-the-perceptron/
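A hedged sketch of the classic perceptron learning rule for two inputs. The slides show only the weighted sum w0*x0 + w1*x1; the bias term, the learning rate, the +1/-1 labels, and the logical-AND toy data are assumptions added so the example runs end to end.

```python
def predict(weights, bias, x):
    """Return +1 if the weighted sum plus bias is non-negative, else -1."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s >= 0 else -1


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: on each mistake, nudge the weights toward the target."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            if predict(weights, bias, x) != target:
                weights = [w + lr * target * xi for w, xi in zip(weights, x)]
                bias += lr * target
                errors += 1
        if errors == 0:  # every sample classified correctly, stop early
            break
    return weights, bias


# Logical AND with labels +1 / -1 (linearly separable toy data).
samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
print([predict(weights, bias, x) for x, _ in samples])  # prints "[-1, -1, -1, 1]"
```

The rule only converges when the classes are linearly separable; on data such as XOR it would keep cycling until the epoch limit.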
Probability Theory
https://seisanshi.wordpress.com/tag/probability/
Naive Bayes
[Figure: class Ck with attributes A1, A2, A3, ..., An]
P(Ck | A1, …, An) = P(Ck) * P(A1, …, An | Ck) / P(A1, …, An)
With the independence assumption, we then have
P(Ck | A1, …, An) ∝ P(Ck) * Prod_i P(Ai | Ck)
Naive Bayes
No.  Content               Spam?
1    Party                 Yes
2    Sale Discount         Yes
3    Party Sale Discount   Yes
4    Python Party          No
5    Python Programming    No
Naive Bayes
We want to find out whether “Party Programming” is spam.
P(Spam | Party, Programming) ∝ P(Spam) * P(Party | Spam) * P(Programming | Spam)
P(NotSpam | Party, Programming) ∝ P(NotSpam) * P(Party | NotSpam) * P(Programming | NotSpam)
We need to know
P(Spam), P(NotSpam)
P(Party | Spam), P(Party | NotSpam)
P(Programming | Spam), P(Programming | NotSpam)
Naive Bayes
P(Spam) = ?                  P(NotSpam) = ?
P(Party | Spam) = ?          P(Party | NotSpam) = ?
P(Programming | Spam) = ?    P(Programming | NotSpam) = ?
Naive Bayes
P(Spam) = 3/5                P(NotSpam) = 2/5
P(Party | Spam) = 2/3        P(Party | NotSpam) = 1/2
P(Programming | Spam) = 0    P(Programming | NotSpam) = 1/2
Naive Bayes
P(Spam | Party, Programming) ∝ 3/5 * 2/3 * 0 = 0
P(NotSpam | Party, Programming) ∝ 2/5 * 1/2 * 1/2 = 0.1
P(NotSpam | Party, Programming) > P(Spam | Party, Programming)
“Party Programming” is NOT spam.
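The arithmetic above can be reproduced with a short script that counts directly from the five-message table. This is only a sketch: the names `emails` and `score` are made up for illustration, each message is treated as a set of words, and no smoothing is applied, so a word never seen in a class drives that class's score to zero (exactly what happens to Spam here).

```python
from fractions import Fraction

# The five training messages from the table: (set of words, is_spam).
emails = [
    ({"Party"}, True),
    ({"Sale", "Discount"}, True),
    ({"Party", "Sale", "Discount"}, True),
    ({"Python", "Party"}, False),
    ({"Python", "Programming"}, False),
]


def score(words, is_spam):
    """Unnormalized naive Bayes score: P(class) * product of P(word | class)."""
    docs = [content for content, spam in emails if spam == is_spam]
    result = Fraction(len(docs), len(emails))  # class prior
    for word in words:
        # Fraction of messages in this class that contain the word.
        result *= Fraction(sum(word in doc for doc in docs), len(docs))
    return result


spam_score = score(["Party", "Programming"], True)
not_spam_score = score(["Party", "Programming"], False)
print(spam_score, not_spam_score)  # prints "0 1/10"
print("spam" if spam_score > not_spam_score else "not spam")  # prints "not spam"
```

In practice a smoothing term (for example, adding one to every count) is commonly used so that a single unseen word cannot veto a class.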
Decision Tree
Play tennis?
Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes
Day  Outlook   Temp  Humidity  Wind    Play
D1   Sunny     Hot   High      Weak    No
D2   Sunny     Hot   High      Strong  No
D3   Overcast  Mild  High      Strong  Yes
D4   Rain      Cool  Normal    Strong  No
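A learned decision tree is just nested if/else tests. The sketch below hard-codes the tree from this slide (it does not learn it from data) and checks it against rows D1 to D4; the function name `play_tennis` is an assumption for illustration.

```python
def play_tennis(outlook, humidity, wind):
    """Walk the slide's decision tree: Outlook first, then Humidity or Wind."""
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"
    raise ValueError(f"unknown outlook: {outlook}")


# Rows D1-D4 from the table: (outlook, humidity, wind, expected answer).
# Temp is omitted because the tree never tests it.
rows = [
    ("Sunny", "High", "Weak", "No"),
    ("Sunny", "High", "Strong", "No"),
    ("Overcast", "High", "Strong", "Yes"),
    ("Rain", "Normal", "Strong", "No"),
]
for outlook, humidity, wind, expected in rows:
    print(play_tennis(outlook, humidity, wind) == expected)  # prints "True" four times
```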
Support Vector Machines
[Figure: data points plotted on axes x and y]
Support Vector Machines
[Figure: the “Kernel Trick” maps the current coordinate system (x, y) to a new coordinate system (x, z)]
Support Vector Machines
http://www.mblondel.org/journal/2010/09/19/support-vector-machines-in-python/
3 support vectors
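To illustrate the change of coordinate system, the sketch below maps points from (x, y) to (x, z) with z = x² + y². That particular mapping and the toy points are assumptions, not taken from the slides. Points that no straight line separates in (x, y) become separable by a single threshold on z. A real kernel SVM gets the same effect implicitly through kernel evaluations rather than by computing the new coordinates.

```python
def lift(point):
    """Map (x, y) to (x, z) with z = x^2 + y^2 (squared distance from origin)."""
    x, y = point
    return (x, x * x + y * y)


# Hypothetical data: one class near the origin, the other on a ring around it.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]

threshold = 1.0  # in the new coordinates, the horizontal line z = 1 separates the classes
inner_ok = all(lift(p)[1] < threshold for p in inner)
outer_ok = all(lift(p)[1] > threshold for p in outer)
print(inner_ok, outer_ok)  # prints "True True"
```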
Unsupervised Learning
f(x)
Given x, find a function f that gives a compact
description of x.
Unsupervised Learning
• k-Means Clustering
• Hierarchical Clustering
• Gaussian Mixture Models (GMMs)
k-Means Clustering
http://stackoverflow.com/questions/24645068/k-means-clustering-major-understanding-issue/24645894#24645894
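A compact sketch of k-means clustering: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. Using the first k points as initial centroids is a simplifying assumption (random initialization is the usual choice), and the names and data are illustrative only.

```python
import math


def nearest_index(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=10):
    centroids = list(points[:k])  # naive deterministic initialization (assumption)
    for _ in range(iterations):
        # Assignment step: group points by nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, labels = kmeans(points, k=2)
print(labels)  # prints "[0, 0, 0, 1, 1, 1]"
```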
Anomaly Detection
http://modernfarmer.com/2013/11/farm-pop-idioms/
http://boxesandarrows.com/designing-screens-using-cores-and-paths/
Reinforcement Learning
y = f(x)
Given x and z, find a function f that generates y.
Flappy Bird Hack using Reinforcement Learning
http://sarvagyavaish.github.io/FlappyBirdRL/
Machine Learning at Geeky Base
Model Validation
I’ve got a perfect classifier!
https://500px.com/photo/65907417/like-a-frog-trapped-inside-a-coconut-shell-by-ellena-susanti
http://blog.csdn.net/love_tea_cat/article/details/25972921
Overfitting (High Variance)
[Figure panels: Normal fit | Overfitting]
http://blog.csdn.net/love_tea_cat/article/details/25972921
Underfitting (High Bias)
[Figure panels: Normal fit | Underfitting]
How to Avoid Overfitting and Underfitting
• Using more data does NOT always help.
• Recommended practices:
• find a good number of features;
• perform cross validation;
• use regularization when overfitting is found.
Model Selection
Model Selection
• Use cross validation to find the best parameters for the model.
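As a rough sketch of parameter selection by cross validation, the code below uses 4-fold cross validation to choose k for a one-dimensional k-nearest-neighbor classifier. The data set (with one deliberately noisy label at x = 3), the fold scheme, and all names are assumptions for illustration.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (1-D data)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(data, k, folds=4):
    """Hold out every folds-th point in turn, train on the rest, and
    return the fraction of held-out points predicted correctly."""
    correct = 0
    for f in range(folds):
        held_out = data[f::folds]
        train = [d for i, d in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in held_out)
    return correct / len(data)


# Class 0 for small x, class 1 for large x; x = 3 carries a noisy label.
data = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0), (6, 0),
        (7, 1), (8, 1), (9, 1), (10, 1), (11, 1), (12, 1)]

scores = {k: cross_val_accuracy(data, k) for k in (1, 3)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On this toy data k = 3 scores higher than k = 1 because a single noisy neighbor can no longer decide a prediction on its own. The score for the chosen parameter is an optimistic estimate, so a separate test set is still needed for the final evaluation.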
Model Evaluation
Metrics
• Accuracy
• True Positive, False Positive, True Negative, False Negative
• Precision and Recall
• F1 Score
• etc.
Precision and Recall
http://en.wikipedia.org/wiki/Precision_and_recall
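The metrics listed above can all be derived from the four confusion counts. A minimal sketch, assuming binary labels with 1 as the positive class; the label lists are made-up example data.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, fp, tn, fn


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # of everything predicted positive, how much was right
recall = tp / (tp + fn)     # of everything actually positive, how much was found
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)  # prints "0.8 0.75 0.75 0.75"
```

The divisions assume at least one predicted positive and one actual positive; real code would need to guard against zero denominators.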
Applied Machine Learning Process
http://machinelearningmastery.com/process-for-working-through-machine-learning-problems/
Define the Problem
https://youmustdesireit.wordpress.com/2014/03/05/developing-and-nurturing-creative-problem-solving/
Prepare Data
http://vpnexpress.net/big-data-use-a-vpn-block-data-collection/
Spot Check Algorithms
https://www.flickr.com/photos/withassociates/4385364607/sizes/l/
If two models fit the data equally well,
choose the simpler one.
Improve Results
http://www.mobilemechanicprosaustin.com/
Present Results
http://www.langevin.com/blog/2013/04/25/5-tips-for-projecting-confidence/presentation-skills-2/
http://newventurist.com/
Some Cautions
• Curse of dimensionality
• Correlation does NOT imply causation.
• Learn many models, not just ONE.
• More data beats a clever algorithm.
• Data alone are not enough.
A Few Useful Things to Know about Machine Learning, Pedro Domingos (2012)
— Feature engineering is the key. —
Example of Feature Engineering
Width (m)  Length (m)  Cost (baht)
100        100         1,200,000
500        50          1,300,000
100        80          1,000,000
400        100         1,500,000
Are the data good for
modeling the cost of the area?
Area (m²)  Cost (baht)
10,000     1,200,000
25,000     1,300,000
8,000      1,000,000
40,000     1,500,000
Engineer features.
They look better here.
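The engineered feature here is simply width × length. A short sketch that derives it from the width, length, and cost rows of the first table; the variable names are assumptions for illustration.

```python
# (width in m, length in m, cost in baht), copied from the first table.
plots = [
    (100, 100, 1_200_000),
    (500, 50, 1_300_000),
    (100, 80, 1_000_000),
    (400, 100, 1_500_000),
]

# Replace the two raw features with a single engineered one: area in square meters.
engineered = [(width * length, cost) for width, length, cost in plots]

for area, cost in sorted(engineered):
    print(area, cost)
```

Sorted by area, the cost rises steadily, which is much easier for a simple model to pick up than the raw width and length columns.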
Deep Learning at Microsoft’s Speech Group
Let’s get our hands dirty!
https://github.com/zkan/intro-to-machine-learning
