Foundations of Data Science – L2
Prof. N. L. Bhanu Murthy
BITS Pilani
Hyderabad Campus
Data Science Pipeline
BITS Pilani, Hyderabad Campus
What is Learning?
“Gain knowledge or understanding of or skill in by
study, instruction or experience” - Webster
What is Learning?
“Learning is any process by which a system improves
performance from experience.” - Herbert Simon
Herbert Simon (1916 - 2001)
Researcher in: Artificial Intelligence, Cognitive psychology, Computer science, Economics, Political science
Professor at: Carnegie Mellon University; University of California, Berkeley; Illinois Institute of Technology
Awards:
Turing Award, 1975
Nobel Prize in Economics, 1978
National Medal of Science, 1986
von Neumann Theory Prize, 1988
What is Machine Learning?
Machine Learning is the study of
algorithms that
improve their performance P
at some task T
with experience E
Tom Mitchell (1990)
Well-defined learning task: <P,T,E>
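To make the <P,T,E> triple concrete, here is a minimal Python sketch of a deliberately tiny, made-up task (not from the lecture). T is labelling the integers 0..99 as 1 when they are at least a hidden threshold of 37; E is a growing prefix of a hypothetical stream of labeled examples; P is accuracy over all 100 integers. The learner, which places its threshold midway between the largest negative and the smallest positive example seen so far, is an illustrative choice; the point is only that P rises as E grows.

```python
# A toy, hypothetical instance of a well-defined learning task <P, T, E>.
# T: classify an integer x in 0..99 as 1 if x >= 37, else 0 (the hidden rule).
# E: a stream of labeled examples; "more experience" = a longer prefix of the stream.
# P: fraction of the integers 0..99 classified correctly.

HIDDEN_THRESHOLD = 37  # hypothetical rule the learner must discover


def label(x: int) -> int:
    """Ground-truth label for task T."""
    return 1 if x >= HIDDEN_THRESHOLD else 0


def learn(examples: list[tuple[int, int]]) -> float:
    """Place the threshold midway between the largest 0-example and the smallest 1-example."""
    largest_neg = max(x for x, y in examples if y == 0)
    smallest_pos = min(x for x, y in examples if y == 1)
    return (largest_neg + smallest_pos) / 2


def performance(threshold: float) -> float:
    """Measure P: accuracy of the learned rule over the whole test set 0..99."""
    test_set = range(100)
    correct = sum((1 if x >= threshold else 0) == label(x) for x in test_set)
    return correct / len(test_set)


# Experience E: examples arrive in this (hypothetical) order.
stream = [90, 5, 60, 20, 45, 30, 40, 35, 38, 36]
experience = [(x, label(x)) for x in stream]

for n in (2, 4, 10):
    t = learn(experience[:n])
    print(f"E = {n:2d} examples -> threshold {t:5.1f}, P = {performance(t):.2f}")
```

With this particular stream the printed P values are 0.89, 0.97 and 1.00 for 2, 4 and 10 examples; a different order of examples would give different intermediate numbers.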
Example - Machine Learning
Handwriting Recognition
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words
Example - Machine Learning
T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while
observing a human driver.
Example - Machine Learning
Learning to drive an autonomous
vehicle (Pomerleau, 1989).
Examples of Successful Applications of Machine Learning
Learning to recognize spoken words
(Lee, 1989; Waibel, 1989).
Learning to classify new astronomical structures
(Fayyad et al., 1995).
Learning to play world-class backgammon
(Tesauro, 1992, 1995).
Learning to categorize email messages as spam or legitimate.
Machine Learning - Examples
Employability Prediction
Features / Attributes / Predictors:
CGPA
Communication Skills
Aptitude
Programming Skills
S.No.  CGPA  Communication Skills  Aptitude   Programming Skills  Job Offered?
1      9.1   Average               Good       Excellent           Yes
2      8.4   Good                  Good       Good                Yes
3      8.3   Poor                  Average    Average             No
4      7.1   Average               Good       Average             No
5      8.2   Good                  Excellent  Excellent           No
Machine Learning - Examples
Predicting price of a used car
Features / Attributes / Predictors:
Brand
Year (Mfg)
Engine Capacity
Mileage
Distance travelled
Cab?
S.No.  Brand          Year (Mfg)  Engine Capacity  Mileage  Distance travelled  Cab?  Price (in Rs.)
1      Honda City ZX  2008        1100             10.5     45000               N     3,50,000
2
3
4
Machine Learning - Examples
Market Segmentation Study
Features / Attributes / Predictors:
Family income
# of visits in a month
Average money spent in a month
Zip code
Customers for a retailer may fall into:
two groups, say big spenders and low spenders
three groups, say big spenders, medium spenders and low spenders
four groups, …
S.No.  Zip Code  Family Income  # of visits in a month  Average Money Spent in a month
1      500078    11,50,000      4                       8,000
Supervised Learning
Feature tuple: (CGPA, Communication Skills, Aptitude, Programming Skills)
Response / Target: Job Offered
Supervised Learning: Fit a model that relates the response to the feature tuples, with the aim of accurately predicting the response for future observations or better understanding the relationship between the response and the features.
S.No.  CGPA  Communication Skills  Aptitude   Programming Skills  Job Offered?
1      9.1   Average               Good       Excellent           Yes
2      8.4   Good                  Good       Good                Yes
3      8.3   Poor                  Average    Average             No
4      7.1   Average               Good       Average             No
5      8.2   Good                  Excellent  Excellent           No
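One possible way to fit a model that relates the response to the feature tuples in this table is a k-nearest-neighbour classifier. The sketch below assumes an ordinal encoding Poor < Average < Good < Excellent (mapped to 0..3), unscaled Euclidean distance and k = 3; these are illustrative choices rather than the lecture's method, and the two query students are hypothetical.

```python
import math
from collections import Counter

# Assumed ordinal encoding of the skill ratings (not specified in the lecture).
LEVEL = {"Poor": 0, "Average": 1, "Good": 2, "Excellent": 3}

# Feature tuple (CGPA, Communication Skills, Aptitude, Programming Skills) -> Job Offered?
TRAINING = [
    ((9.1, "Average", "Good", "Excellent"), "Yes"),
    ((8.4, "Good", "Good", "Good"), "Yes"),
    ((8.3, "Poor", "Average", "Average"), "No"),
    ((7.1, "Average", "Good", "Average"), "No"),
    ((8.2, "Good", "Excellent", "Excellent"), "No"),
]


def encode(features):
    """Map a feature tuple to a numeric vector using the assumed ordinal scale."""
    cgpa, communication, aptitude, programming = features
    return (cgpa, LEVEL[communication], LEVEL[aptitude], LEVEL[programming])


def predict(features, k=3):
    """Majority vote among the k training rows closest in Euclidean distance."""
    query = encode(features)
    nearest = sorted(TRAINING, key=lambda row: math.dist(query, encode(row[0])))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(predict((8.9, "Good", "Good", "Excellent")))   # Yes
print(predict((7.5, "Average", "Good", "Average")))  # No
```

Because CGPA and the encoded ratings live on different scales, a more careful version would standardise the features before measuring distance.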
Unsupervised Learning
Feature tuple: (Zip Code, Family Income, # of visits in a month,
Average Money spent in a month)
Response / Target: None
Unsupervised Learning: To discover groups of similar examples
within the data set
S.No.  Zip Code  Family Income  # of visits in a month  Average Money Spent in a month
1      500078    11,50,000      4                       8,000
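As an illustration of discovering groups without any response variable, the sketch below runs plain k-means (Lloyd's algorithm) with k = 2 on two of the features. Only the first customer comes from the table; the other five are invented for the example. Money spent is expressed in thousands of rupees so both features have a comparable scale, and initialising the centres with the first k points is a simplification (practical implementations use better seeding and several restarts).

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm); initial centres are simply the first k points."""
    centres = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the group of its nearest centre.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        if new_labels == labels:  # no point changed group, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centre to the mean of its members (keep it if the group is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centres


# (visits per month, average spend per month in thousands of Rs.)
# First row is from the slide (4 visits, Rs. 8,000); the remaining rows are made up.
customers = [(4, 8.0), (5, 9.0), (6, 7.5), (1, 1.2), (2, 1.5), (1, 0.9)]
labels, centres = kmeans(customers, k=2)
print(labels)  # [1, 1, 1, 0, 0, 0]
```

The two groups that emerge correspond to the "big spenders" and "low spenders" of the market segmentation example; choosing k = 3 or 4 would ask for a finer split.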
Supervised Learning
Employability Prediction
Features: CGPA, Communication Skills, Aptitude, Programming Skills
Response / Target: Job Offered?
Predicting price of a used car
Features: Brand, Year (Mfg), Engine Capacity, Mileage, Distance travelled, Cab?
Response / Target: Price (in Rs.)
Classification & Regression
Classification problems are supervised learning problems where the target/response variable takes only discrete (finite/countable) values.
Example: Employability prediction
Regression problems are supervised learning problems where the target/response is a continuous variable, i.e., it can take any real value (possibly within some range).
Example: Predicting price of a used car
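To see what a continuous response means in practice, here is a minimal ordinary-least-squares sketch for a single feature. The data (car age in years against resale price in lakh Rs.) is made up for illustration, and a straight line is an assumed model; the point is that the fitted model returns a real number for any input, whereas a classifier returns one of a finite set of labels.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (w, b) minimising sum((y - (w*x + b))**2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    w = s_xy / s_xx
    return w, mean_y - w * mean_x


# Made-up data: car age in years vs. resale price in lakh Rs.
ages = [1, 2, 3, 4]
prices = [6.0, 5.0, 4.0, 3.0]
w, b = fit_line(ages, prices)
print(w, b)          # -1.0 7.0
print(w * 2.5 + b)   # 4.5, a real-valued prediction not among the training targets
```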
Classification & Regression – Examples
Classification
Predicting whether a patient has a particular disease or not.
Handwritten digit recognition
Email spam detection
Regression
Predicting house/property price
Predicting stock market price
Predicting sales of a product
Probability Foundations of Data Science
“Probability is a mathematical tool to model uncertainty”
Frequentist vs Bayesian perspectives on probability
Probability distributions – Gaussian, Beta, Bernoulli, and Dirichlet
Maximum likelihood and Bayesian inference for the Gaussian distribution
Probabilistic perspective on polynomial curve fitting
Bayesian Curve Fitting
Mixtures of Gaussians and Probability Bounds
Nonparametric Methods - Kernel density estimators, Nearest-neighbour
methods
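For the maximum-likelihood item, the standard closed-form result for a univariate Gaussian is that the ML estimate of the mean is the sample mean, and the ML estimate of the variance divides the sum of squared deviations by N (not N - 1), which makes it biased. A minimal Python sketch with made-up observations:

```python
import statistics


def gaussian_mle(samples):
    """Maximum-likelihood estimates (mu, sigma^2) of a univariate Gaussian."""
    n = len(samples)
    mu = sum(samples) / n                           # ML mean = sample mean
    var = sum((x - mu) ** 2 for x in samples) / n   # ML variance divides by N, not N - 1
    return mu, var


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
mu, var = gaussian_mle(data)
print(mu, var)  # 5.0 4.0

# statistics.pvariance also divides by N; statistics.variance divides by N - 1 (unbiased).
print(statistics.pvariance(data), statistics.variance(data))
```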
Decision & Information Theory Foundations
Minimizing misclassification rate & expected loss
Inference and decision
Loss functions for regression
Relative Entropy and Mutual Information
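The relative entropy (Kullback-Leibler divergence) of discrete distributions p and q is KL(p || q) = Σ p_i ln(p_i / q_i), and mutual information is the relative entropy between a joint distribution and the product of its marginals, I(X; Y) = KL(p(x, y) || p(x) p(y)). A small Python sketch, using natural logarithms (so results are in nats) and two toy joint tables chosen for illustration:

```python
import math


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * ln(p_i / q_i); terms with p_i = 0 contribute 0.
    Assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(joint):
    """I(X; Y) = KL(p(x, y) || p(x) p(y)) for a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    flat_joint = [joint[i][j] for i in range(len(px)) for j in range(len(py))]
    flat_indep = [px[i] * py[j] for i in range(len(px)) for j in range(len(py))]
    return kl_divergence(flat_joint, flat_indep)


independent = [[0.25, 0.25], [0.25, 0.25]]  # X and Y independent fair coins
identical = [[0.5, 0.0], [0.0, 0.5]]        # Y is an exact copy of a fair coin X
print(mutual_information(independent))     # 0.0
print(mutual_information(identical), math.log(2))
```

Independent variables share no information, while a perfect copy of a fair coin shares ln 2 nats (one bit). Note that KL is not symmetric: KL(p || q) and KL(q || p) generally differ.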
Computational Foundations of Data Science
Unconstrained/Constrained optimization
Equality/inequality constraints
Convex optimization
Lagrange multipliers
Primal/dual concept
Quadratic programming
Kernel Machines for Regression
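A small worked example of the Lagrange multiplier method, using an illustrative problem chosen for this note rather than one from the lecture: minimise x² + y² subject to the equality constraint x + y = 1.

```latex
\begin{aligned}
&\min_{x,y}\; x^2 + y^2 \quad \text{subject to} \quad x + y = 1,\\
&L(x, y, \lambda) = x^2 + y^2 + \lambda\,(x + y - 1),\\
&\frac{\partial L}{\partial x} = 2x + \lambda = 0,\qquad
 \frac{\partial L}{\partial y} = 2y + \lambda = 0,\qquad
 \frac{\partial L}{\partial \lambda} = x + y - 1 = 0,\\
&\Rightarrow\; x = y = -\tfrac{\lambda}{2},\quad -\lambda = 1,\quad \lambda = -1,\quad
 x^\star = y^\star = \tfrac{1}{2},\quad x^{\star 2} + y^{\star 2} = \tfrac{1}{2}.
\end{aligned}
```

Because the objective is convex and the constraint is linear, this stationary point of the Lagrangian is the global minimum.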
Thank You!!