Foundations of Data Science – L2
Prof. N. L. Bhanu Murthy
BITS Pilani
Hyderabad Campus
Data Science Pipeline

What is Learning?

“Gain knowledge or understanding of or skill in by study, instruction or experience” - Webster

What is Learning?

“Learning is any process by which a system improves performance from experience.” - Herbert Simon

Herbert Simon (1916 - 2001)
Researcher in: Artificial Intelligence, Cognitive psychology, Computer science, Economics, Political science
Professor at: Carnegie Mellon University; University of California, Berkeley; Illinois Institute of Technology

Awards:
Turing Award, 1975
Nobel Prize in Economics, 1978
National Medal of Science, 1986
von Neumann Theory Prize, 1988

What is Machine Learning?

Machine Learning is the study of algorithms that
improve their performance P
at some task T
with experience E
Tom Mitchell (1997)

Well-defined learning task: <P,T,E>

Example - Machine Learning
Handwriting Recognition
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words
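
The performance measure P above can be sketched in a few lines of Python. This is only an illustration: the function name `percent_correct` and the five word labels are hypothetical and do not come from the slides.

```python
def percent_correct(predicted, actual):
    """Performance measure P: percentage of items classified correctly."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labeled example")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


# Hypothetical human labels (experience E) and recognizer output for five images.
actual = ["cat", "dog", "sun", "map", "pen"]
predicted = ["cat", "dog", "son", "map", "pen"]

score = percent_correct(predicted, actual)
print(score)  # 80.0 (four of the five words match)
```

A learner improves at task T if this score rises as it sees more labeled examples.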

Example - Machine Learning
T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while
observing a human driver.

Learning to drive an autonomous vehicle (Pomerleau, 1989).


Examples of Successful Applications of Machine Learning
• Learning to recognize spoken words (Lee, 1989; Waibel, 1989).
• Learning to classify new astronomical structures (Fayyad et al., 1995).
• Learning to play world-class backgammon (Tesauro 1992, 1995).
• Learning to categorize email messages as spam or legitimate.

Machine Learning - Examples

Employability Prediction

Features / Attributes / Predictors:
• CGPA
• Communication Skills
• Aptitude
• Programming Skills

S.No.  CGPA  Communication Skills  Aptitude   Programming Skills  Job Offered?
1      9.1   Average               Good       Excellent           Yes
2      8.4   Good                  Good       Good                Yes
3      8.3   Poor                  Average    Average             No
4      7.1   Average               Good       Average             No
5      8.2   Good                  Excellent  Excellent           No

Machine Learning - Examples

Predicting price of a used car

Features / Attributes / Predictors:
• Brand
• Year (Mfg)
• Engine Capacity
• Mileage
• Distance travelled
• Cab?

S.No.  Brand          Year (Mfg)  Engine Capacity  Mileage  Distance travelled  Cab?  Price (in Rs.)
1      Honda City ZX  2008        1100             10.5     45000               N     3,50,000
2
3
4

Machine Learning - Examples
Market Segmentation Study

Features / Attributes / Predictors:
• Family income
• # of visits in a month
• Average money spent in a month
• Zip code

Customers for a retailer may fall into
• two groups, say big spenders and low spenders
• three groups, say big spenders, medium spenders and low spenders
• four groups, ….

S.No.  Zip Code  Family Income  # of visits in a month  Average Money Spent in a month
1      500078    11,50,000      4                       8,000

Supervised Learning
Feature tuple: (CGPA, Communication Skills, Aptitude, Programming Skills)
Response / Target: Job Offered
Supervised Learning: Fit a model that relates the response to the feature
tuples, with the aim of accurately predicting the response for future
observations or better understanding the relationship between response
and features.

S.No.  CGPA  Communication Skills  Aptitude   Programming Skills  Job Offered?
1      9.1   Average               Good       Excellent           Yes
2      8.4   Good                  Good       Good                Yes
3      8.3   Poor                  Average    Average             No
4      7.1   Average               Good       Average             No
5      8.2   Good                  Excellent  Excellent           No
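
To make "fit a model that relates the response to the feature tuples" concrete, here is a minimal sketch using a 1-nearest-neighbour rule on the five rows of the table. The slides do not prescribe this model; it is chosen only because it fits in a few lines. The ordinal encoding of the skill ratings (Poor=1 ... Excellent=4), the names `RATING`, `TRAIN`, `encode` and `predict`, and the new student are all assumptions made for illustration.

```python
# Ordinal encoding of the skill ratings (an assumption; the slides give no scale).
RATING = {"Poor": 1, "Average": 2, "Good": 3, "Excellent": 4}

# Rows of the table: (CGPA, Communication, Aptitude, Programming) -> Job Offered?
TRAIN = [
    ((9.1, "Average", "Good", "Excellent"), "Yes"),
    ((8.4, "Good", "Good", "Good"), "Yes"),
    ((8.3, "Poor", "Average", "Average"), "No"),
    ((7.1, "Average", "Good", "Average"), "No"),
    ((8.2, "Good", "Excellent", "Excellent"), "No"),
]


def encode(features):
    """Turn a feature tuple into a tuple of numbers."""
    cgpa, comm, apt, prog = features
    return (cgpa, RATING[comm], RATING[apt], RATING[prog])


def predict(features):
    """Return the response of the closest training tuple (1-nearest neighbour)."""
    query = encode(features)

    def sq_dist(row):
        # Squared Euclidean distance between a training tuple and the query.
        return sum((a - b) ** 2 for a, b in zip(encode(row[0]), query))

    return min(TRAIN, key=sq_dist)[1]


new_student = (8.5, "Good", "Good", "Good")  # hypothetical future observation
print(predict(new_student))  # Yes (closest row is S.No. 2)
```

With only five rows the prediction is not meaningful in practice; the point is the shape of the problem: labeled feature tuples go in, a rule for predicting the response comes out.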

Unsupervised Learning
Feature tuple: (Zip Code, Family Income, # of visits in a month,
Average Money spent in a month)
Response / Target: None
Unsupervised Learning: To discover groups of similar examples
within the data set

S.No.  Zip Code  Family Income  # of visits in a month  Average Money Spent in a month
1      500078    11,50,000      4                       8,000
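
One way to "discover groups of similar examples" is k-means clustering. The slides do not name an algorithm, so this is only an illustrative choice. The sketch below runs it on a single feature, average money spent in a month. Only the first value (8,000) comes from the table; the other five values and the function name `kmeans_1d` are hypothetical.

```python
def kmeans_1d(values, k=2, max_iter=100):
    """Lloyd's k-means algorithm in one dimension; returns (centers, labels)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    lo, hi = min(values), max(values)
    # Deterministic start: spread the k centers evenly between min and max.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center.
        new_labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = sum(members) / len(members)
    return centers, labels


# Average money spent in a month; 8000 is from the slide, the rest are made up.
spend = [8000, 1500, 2000, 9500, 1200, 7800]
centers, labels = kmeans_1d(spend, k=2)
print(labels)  # [1, 0, 0, 1, 0, 1]: 0 = low spenders, 1 = big spenders
```

No response column is used anywhere: the two groups emerge from the feature values alone. Calling `kmeans_1d(spend, k=3)` would look for three groups instead.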

Supervised Learning

Employability Prediction
Features:
• CGPA
• Communication Skills
• Aptitude
• Programming Skills
Response / Target:
• Job Offered?

Predicting price of a used car
Features:
• Brand
• Year (Mfg)
• Engine Capacity
• Mileage
• Distance travelled
• Cab?
Response / Target:
• Price (in Rs.)

Classification & Regression
Classification problems are supervised learning problems
where the target/response variable takes only
discrete (finite or countable) values.
Example: Employability prediction

Regression problems are supervised learning problems
where the target/response is a continuous variable
(equivalently, it can take any real value).
Example: Predicting price of a used car
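
As an illustration of a regression problem, the sketch below fits a straight line by ordinary least squares to predict a used car's price from the distance it has travelled. All numbers are made up for the example, the units (thousands of km, lakh Rs.) are assumptions, and the function name `fit_line` is hypothetical. A real model would use all the features listed on the used-car slide.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: distance travelled (thousand km) and price (lakh Rs.).
distance = [10, 20, 30, 40, 50]
price = [5.1, 4.4, 4.1, 3.4, 3.0]

intercept, slope = fit_line(distance, price)
predicted_price = intercept + slope * 45  # a car that has travelled 45,000 km
print(round(predicted_price, 2))  # about 3.22 lakh Rs. for this made-up data
```

The predicted response is a real number rather than one of a fixed set of labels, which is what separates regression from classification.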

Classification & Regression – Examples
Classification
• Predicting whether a patient has a particular disease or not
• Handwritten digit recognition
• Email spam detection

Regression
• Predicting house/property price
• Predicting stock market price
• Predicting sales of a product

Probability Foundations of Data Science
“Probability is a mathematical tool to model uncertainty”

• Frequentist vs Bayesian perspective of probability
• Probability distributions – Gaussian, Beta, Bernoulli, and Dirichlet
• Maximum likelihood and Bayesian inference for the Gaussian distribution
• Probabilistic perspective of polynomial curve fitting
• Bayesian curve fitting
• Mixture of Gaussians and probability bounds
• Nonparametric methods - kernel density estimators, nearest-neighbour methods

Decision & Information Theory Foundations
• Minimizing misclassification rate and expected loss
• Inference and decision
• Loss functions for regression
• Relative entropy and mutual information

Computational Foundations of Data Science
• Unconstrained/constrained optimization
• Equality/inequality constraints
• Convex optimization
• Lagrange multiplier
• Primal/dual concept
• Quadratic programming
• Kernel machines for regression

Thank You!!