Prerequisites
Jupyter Notebook
◦Used for data cleaning and transformation, numerical
simulation, statistical modeling, data visualization, machine
learning, and much more.
◦Easy to tinker with code and execute it in steps
◦Run in local browser: [Link]
◦Local installation: [Link]
◦Repository of Jupyter Notebooks:
[Link]
Colab
Google’s flavor of Jupyter notebooks tailored for machine learning
and data analysis
Runs entirely in the cloud.
Free access to hardware accelerators like GPUs and TPUs (with some
restrictions).
[Link]
Popular Datasets
UC Irvine Machine Learning Repository: [Link]
Kaggle datasets: [Link]
Amazon’s AWS datasets: [Link]
Install a virtual environment via Anaconda
Free Anaconda Python distribution:
[Link]
Download Python 3.* version
Prerequisites
Python and several scientific libraries
Python
◦high-level, dynamically typed multiparadigm programming
language.
◦almost like pseudocode, very readable
◦Platform: Linux/Windows/MacOS
◦Documentation: [Link]
◦Interactive Python tutorial: [Link]
◦Standalone installation (preferably Python 3.7 or higher): [Link]
scikit-learn
Open source library that supports supervised and unsupervised
learning.
Various model fitting, data preprocessing, model selection and
evaluation tools, and other utilities.
Platform: Linux/Windows/MacOS
User guide: [Link]
Standalone installation: [Link]
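As a taste of the library's workflow, here is a minimal sketch of scikit-learn's fit/predict pattern. The toy dataset and the choice of a k-nearest-neighbors classifier are illustrative, not from the slides:

```python
# Minimal sketch of scikit-learn's fit/predict workflow on made-up toy data.
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated groups of 1-D points.
X_train = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y_train = [0, 0, 0, 1, 1, 1]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)              # supervised learning: fit to labeled data
print(model.predict([[1.5], [10.5]]))    # -> [0 1]
```

Every estimator in the library follows this same fit/predict (or fit/transform) interface.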
Prerequisites
Important libraries
Numpy
◦Scientific computing library for Python
◦Provides a high-performance multidimensional array (e.g.,
vector, matrix), basic tools to compute with and manipulate
these arrays, linear algebra routines, and random number
generators
◦[Link]
◦[Link]
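A short sketch of the array operations mentioned above (the values are arbitrary):

```python
# NumPy sketch: multidimensional arrays, linear algebra, random numbers.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # 2x2 matrix
v = np.array([1.0, 1.0])        # vector

print(A @ v)                    # matrix-vector product -> [3. 7.]
print(A.T)                      # transpose
print(np.linalg.inv(A))         # linear algebra: matrix inverse

rng = np.random.default_rng(0)  # random number generator
print(rng.normal(size=3))       # three draws from a standard normal
```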
SciPy
Builds on numpy, provides functions that operate on numpy arrays
Useful for different types of scientific and engineering applications,
integration, image processing, …
[Link]
Standalone installation: [Link]
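As one example of SciPy operating on NumPy values, a sketch of numerical integration:

```python
# SciPy sketch: numerical integration of a function over an interval.
import numpy as np
from scipy import integrate

# The integral of sin(x) over [0, pi] is exactly 2.
value, abs_err = integrate.quad(np.sin, 0, np.pi)
print(value)  # -> approximately 2.0
```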
Matplotlib
Plotting library
[Link]
[Link]
[Link]
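A minimal plotting sketch; the Agg backend is used here so the figure renders to a file rather than a window:

```python
# Matplotlib sketch: plot a sine curve and save it to a PNG file.
import matplotlib
matplotlib.use("Agg")           # non-interactive backend (renders to file)
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("sine.png")
```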
ML is like gardening
Seeds – Algorithms
Nutrients – Data
Gardener – you
Plants – Programs / ML Models
A Few Quotes
“A breakthrough in machine learning would be worth
ten Microsofts” (Bill Gates, Chairman, Microsoft)
“Machine learning is the next Internet”
(Tony Tether, Director, DARPA)
“Web rankings today are mostly a matter of machine learning” (Prabhakar
Raghavan, Dir. Research, Yahoo)
“Machine learning is going to result in a real revolution” (Greg Papadopoulos, CTO,
Sun)
“Machine learning is today’s discontinuity”
(Jerry Yang, CEO, Yahoo)
Traditional Programming
Data + Program → Computer → Output
Machine Learning
Data + Output → Computer → Program
What is Machine Learning?
Definition by Tom Mitchell (1998):
"A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with
experience E." Example: playing checkers.
E = the experience of playing many games of checkers
T = the task of playing checkers
P = the probability that the program will win the next game
13 July 2025
What is Machine Learning?
To have a learning problem, we must identify
❖The class of tasks
❖The measure of performance to be improved
❖Source of experience
Examples of Learning Problems
A Checker Learning Problem
Task T: Playing Checkers
Performance Measure P: Percent of games won against opponents
Training Experience E: To be selected ==> Games Played against itself
A handwriting recognition learning problem
Task T: recognizing and classifying handwritten words within images
Performance measure P: percent of words correctly classified
Training Experience E: a database of handwritten words with given
classifications
A robot driving learning problem
Task T: driving on public four-lane highways using vision sensors
Performance measure P: average distance travelled before an error (as judged by
human)
Training experience E: a sequence of images and steering commands recorded
while observing a human driver
Where does ML fit in?
Slide credit: Dhruv Batra, Fei Sha
Why is Machine Learning Important?
Some tasks cannot be defined well, except by examples.
Relationships and correlations can be hidden within large amounts of data.
Machine Learning may be able to find these relationships.
Human designers often produce machines that do not work as well as desired
in the environments in which they are used.
Why is Machine Learning Important?
The amount of knowledge available about certain tasks might be too large for
explicit encoding by humans (e.g., medical diagnostics).
New knowledge about tasks is constantly being discovered by humans. It may
be difficult to continuously re-design systems “by hand”.
When Do We Use Machine Learning?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
Learning isn’t always useful:
• There is no need to “learn” to calculate payroll
Slide Credit: Eric Eaton
Application Domains
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Language Processing
Many more emerging…
Slide credit: Pedro Domingos
State of the Art Applications of
Machine Learning
Application Types
– Medical diagnosis
– Credit card applications or transactions
– Fraud detection in e-commerce
– Worm detection in network packets
– Spam filtering in email
– Recommended articles in a newspaper
– Recommended books, movies, music, or jokes
– Financial investments
– DNA sequences
– Spoken words
– Handwritten letters
– Astronomical images
Slide credit: Ray Mooney
Pattern recognition
It is very hard to say what makes a 2
Slide credit: Geoffrey Hinton
Autonomous Cars
• Nevada made it legal for
autonomous cars to drive
on roads in June 2011
• As of 2013, four states
(Nevada, Florida, California,
and Michigan) have legalized
autonomous cars
UPenn’s Autonomous Car →
Slide credit: Eric Eaton
Automatic Speech Recognition
ML is used to predict phoneme states from the sound spectrogram.
Deep-learning-based results:

# Hidden Layers   | 1    | 2    | 4    | 8    | 10   | 12
Word Error Rate % | 16.0 | 12.8 | 11.4 | 10.9 | 11.0 | 11.1

Baseline Gaussian mixture model word error rate = 15.4%
[Zeiler et al. “On rectified linear units for speech
recognition” ICASSP 2013]
Slide credit: Eric Eaton
Robotics
TYPES OF LEARNING
Case study 1 (House Prices)

Price of the house | Area     | Age of the house | Floor
80 Lakhs           | 1800 SFT | 5 years          | 2nd
60 Lakhs           | 1900 SFT | 10 years         | 3rd
85 Lakhs           | 1750 SFT | 2 years          | 2nd
40 Lakhs           | 1500 SFT | 15 years         | 3rd

If Area = 1700 SFT, Age = 8 years, Floor = 4th,
then what is the expected price?
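Using the four rows above as training data, a linear-regression sketch (scikit-learn is an assumed tool choice here) looks like this; note that four samples can be fit exactly by a four-parameter linear model, so this demonstrates the workflow, not a trustworthy prediction:

```python
# Sketch: linear regression on the house-price table above.
# Features: area (SFT), age (years), floor number; target: price (lakhs).
from sklearn.linear_model import LinearRegression

X = [[1800, 5, 2],
     [1900, 10, 3],
     [1750, 2, 2],
     [1500, 15, 3]]
y = [80, 60, 85, 40]

model = LinearRegression().fit(X, y)
predicted = model.predict([[1700, 8, 4]])[0]  # the query from the slide
print(round(predicted, 1))
```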
Case study 2 (Loan Data)

Amount taken | Period   | Credit Score | Defaulter
40 Lakhs     | 5 years  | 1000         | No
10 Lakhs     | 5 months | 550          | YES
80 Lakhs     | 3 years  | 950          | No
20 Lakhs     | 4 years  | 1500         | No

If Amount = 50 Lakhs, Period = 2 years, Credit
Score = 800, then is the borrower expected to default?
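The same loan table can be posed as a supervised classification problem. Below is a sketch with a decision tree (the model choice is illustrative, and periods are converted to months so the column has a single unit):

```python
# Sketch: classifying loan defaulters from the table above with a decision tree.
from sklearn.tree import DecisionTreeClassifier

# Features: amount (lakhs), period (months), credit score.
X = [[40, 60, 1000],
     [10, 5, 550],
     [80, 36, 950],
     [20, 48, 1500]]
y = ["No", "YES", "No", "No"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[50, 24, 800]])[0])  # query: 50 lakhs, 2 years, score 800
```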
Case study 3 (COVID Data)

Cough | Chronic Disease | Fever | Travel History | COVID or Not
YES   | NO              | YES   | USA            | YES
NO    | NO              | YES   | Delhi          | NO
YES   | YES             | YES   | NO             | YES
NO    | YES             | YES   | NO             | NO

If Cough = YES, Chronic Disease = NO, Fever = NO, and Travel
History = Wuhan, then is COVID expected?
Case study 4 (Cricket Data)

Player | Number of runs | Number of wickets
A      | 1500           | 0
B      | 100            | 40
C      | 50             | 20
D      | 100            | 45

Categorize each player as Batsman / Bowler /
All-rounder.
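One way to frame this categorization is clustering on (runs, wickets); here is a k-means sketch on the table's numbers, with k = 2, which can only separate batsman-like from bowler-like players:

```python
# Sketch: k-means clustering of players by (runs, wickets).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1500, 0],    # player A
              [100, 40],    # player B
              [50, 20],     # player C
              [100, 45]])   # player D

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # A ends up alone; B, C, D group together
```

Distinguishing an all-rounder would need k = 3 and feature scaling, since raw run counts dominate the Euclidean distance here.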
Case study 5 (Shopping Data)

Customer ID | BREAD | BUTTER | COOL DRINK
110         | YES   | YES    | NO
112         | NO    | YES    | YES
113         | YES   | YES    | YES
114         | YES   | NO     | YES
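This table is the classic setting for association-rule mining (e.g., "customers who buy bread also buy butter"). A minimal pure-Python sketch of the first step, counting how often item pairs co-occur across baskets:

```python
# Sketch: pairwise co-occurrence counts for the shopping table above.
from collections import Counter
from itertools import combinations

baskets = {
    110: {"BREAD", "BUTTER"},
    112: {"BUTTER", "COOL DRINK"},
    113: {"BREAD", "BUTTER", "COOL DRINK"},
    114: {"BREAD", "COOL DRINK"},
}

pair_counts = Counter()
for items in baskets.values():
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common():
    print(pair, count)  # each pair appears in exactly 2 of the 4 baskets
```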
Types of Learning
• Supervised (inductive) learning
– Given: training data, desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Given: rewards from sequence of actions
Slide Credit: Eric Eaton
Supervised Learning: Regression
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
[Plot: September Arctic Sea Ice Extent (1,000,000 sq km) vs. Year, 1970–2020]
Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013) Slide Credit: Eric Eaton
Regression
y = f(x), where x is the input features, f is the prediction
function, and y is the output (the prediction).
• Training: given a training set of labeled examples {(x1,y1), …,
(xN,yN)}, estimate the prediction function f by minimizing the
prediction error on the training set
• Testing: apply f to a never-before-seen test example x and output
the predicted value y = f(x)
Slide credit: L. Lazebnik
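The training/testing protocol above can be sketched end to end with ordinary least squares on synthetic data (the data-generating line y = 3x + 1 and the noise level are assumptions for illustration):

```python
# Sketch: estimate f by minimizing squared training error, then test on unseen x.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, size=50)
y_train = 3.0 * x_train + 1.0 + rng.normal(0.0, 0.1, size=50)  # noisy line

# Training: least-squares fit of slope and intercept.
A = np.vstack([x_train, np.ones_like(x_train)]).T
(slope, intercept), *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Testing: apply f to a never-before-seen example.
x_test = 4.2
print(slope * x_test + intercept)  # should be close to 3 * 4.2 + 1 = 13.6
```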
Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
– y is categorical == classification
[Plot: Breast cancer example (Malignant / Benign). Label 1 (malignant)
vs. 0 (benign) against tumor size; a threshold on tumor size separates
"predict benign" from "predict malignant".]
Slide Credit: Eric Eaton
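The threshold picture above can be reproduced with logistic regression on made-up 1-D tumor sizes (both the numbers and the model choice are illustrative):

```python
# Sketch: logistic regression learns a threshold on a single feature.
from sklearn.linear_model import LogisticRegression

sizes = [[1.0], [1.5], [2.0], [2.5], [4.0], [4.5], [5.0], [5.5]]  # made up
labels = [0, 0, 0, 0, 1, 1, 1, 1]   # 0 = benign, 1 = malignant

clf = LogisticRegression().fit(sizes, labels)
print(clf.predict([[1.2], [5.2]]))   # small size -> 0 (benign), large -> 1
print(clf.predict_proba([[3.25]]))   # near the boundary: probabilities near 0.5
```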
Self Check
• Facebook face recognition
• Netflix movie recommendation
• Fraud detection
Classification
[Pipeline diagram. Training: training images + training labels →
learned features → training → model. Testing: test image → learned
features → model → prediction.]
Slide credit: D. Hoiem and L. Lazebnik
Unsupervised Learning
• Given x1, x2, ..., xn (without labels)
• Output hidden structure behind the x’s
– E.g., clustering
Slide Credit: Eric Eaton
Open source ML programming tools

Tool         | Platform                | Language       | Algorithms or Features
Scikit-learn | Linux, Mac OS, Windows  | Python, C, C++ | Classification, regression, clustering, preprocessing, model selection, dimensionality reduction
PyTorch      | Linux, Mac OS, Windows  | Python, C++    | Autograd module, Optim module, NN module
TensorFlow   | Linux, Mac OS, Windows  | Python, C++    | Provides a library for dataflow programming
Weka         | Linux, Mac OS, Windows  | Java           | Data preparation, classification, regression, clustering, visualization, association rules mining
Open source ML programming tools

Tool          | Platform                     | Language    | Algorithms or Features
Colab         | Cloud service                | -           | Supports libraries of PyTorch, Keras, TensorFlow, and OpenCV
Apache Mahout | Cross-platform               | Java, Scala | Preprocessors, regression, clustering, recommenders, distributed linear algebra
[Link]        | Cross-platform               | C#          | Classification, regression, distribution, clustering, hypothesis tests & kernel methods; image, audio & signal; vision
Shogun        | Windows, Linux, UNIX, Mac OS | C++         | Regression, classification, clustering, support vector machines, dimensionality reduction, online learning, etc.
[Link]        | Cross-platform               | Python      | API for neural networks
Design a Learning System
Designing a Learning System
• Choose the training experience(data)
• Choose exactly what is to be learned
– i.e. the target function
• Choose how to represent the target function
• Choose a learning algorithm to infer the target
function from the experience
[Diagram: Environment/Experience → training data → Learner →
Knowledge → Performance Element; testing data is used to evaluate
the Performance Element.]
Slide Credit: Ray Mooney
Designing a Learning System: An Example
1. Problem Description (Ex: Playing checkers)
2. Choosing the Training Experience (data expressed as features)
3. Choosing the Target Function to be learnt (Ex: deciding the next board position)
4. Choosing a Representation for the Target Function (design a function, e.g., linear)
5. Choosing a Function Approximation Algorithm (parameters learning using loss
function)
6. Final Design
Choosing the training experience
In learning to play checkers, the system might learn from direct training examples
consisting of individual checkers board states and the correct move for each.
Alternatively, it might have available only indirect information consisting of the
move sequences and final outcomes of various games played.
Choosing the training experience
In indirect training, information about the correctness of specific
moves early in the game must be inferred indirectly from the fact
that the game was eventually won or lost.
The learner faces an additional problem of credit assignment, or
determining the degree to which each move in the sequence
deserves credit or blame for the final outcome.
Credit assignment can be a particularly difficult problem because
the game can be lost even when early moves are optimal, if these
are followed later by poor moves. Hence, learning from direct
training feedback is typically easier than learning from indirect
feedback.
Choosing the training experience
• A second important attribute of the training experience
is the degree to which the learner controls the sequence
of training examples.
– the learner might rely on the teacher to select informative
board states and to provide the correct move for each.
– the learner might itself propose board states that it finds
particularly confusing and ask the teacher for the correct
move.
Choosing the training experience
• A third important attribute of the training experience is
how well it represents the distribution of examples over
which the final system performance P must be
measured.
– learning is most reliable when the training examples follow a
distribution similar to that of future test examples.
– the performance metric P is the percent of games the system
wins in the world tournament. If its training experience E
consists only of games played against itself, there is an obvious
danger that this training experience might not be fully
representative of the distribution of situations over which it
will later be tested.
Choosing the training experience
– For example, the learner might never encounter
certain crucial board states that are very likely to be
played by the human checkers champion.
– it is often necessary to learn from a distribution of
examples that is somewhat different from those on
which the final system will be evaluated
– one distribution of examples will not necessarily lead
to strong performance over some other distribution.
Choosing a Representation for the
Target Function
Expressiveness versus Training set size
◦ More expressive the representation of the target function, the closer to the “truth” we can get.
◦ More expressive the representation, the more training examples are necessary to choose among
the large number of “representable” possibilities.
Example of a representation:
◦ x1 = # of red pieces on the board
◦ x2 = # of black pieces on the board
◦ x3 = # of black kings on the board
◦ x4 = # of red kings on the board
◦ x5 = # of black pieces threatened by red
◦ x6 = # of red pieces threatened by black

V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6

where the wi are adjustable or "learnable" coefficients.
IS ZC464, Machine Learning 13 July 2025
Choosing a Representation for the
Target Function
w0 through w6 are numerical coefficients, or weights, to be chosen by the learning
algorithm.
Learned values for the weights w1 through w6 will determine the relative importance
of the various board features in determining the value of the board, whereas the
weight w0 will provide an additive constant to the board value.
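Evaluating the linear target function V̂(b) is just a dot product; a sketch with made-up weights and board features:

```python
# Sketch: V_hat(b) = w0 + w1*x1 + ... + w6*x6 evaluated as a dot product.
import numpy as np

w = np.array([0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5])  # w0..w6 (made up)
x = np.array([1.0, 6, 7, 1, 0, 2, 1])  # x0 = 1 so that w0 acts as the bias

v_hat = float(w @ x)
print(v_hat)  # -> 1.0 for these made-up numbers
```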
Issues in Machine Learning
What algorithms are available for learning a concept? How well do they
perform?
How much training data is sufficient to learn a concept with high confidence?
When is it useful to use prior knowledge?
Are some training examples more useful than others?
What are the best tasks for a system to learn?
What is the best way for a system to represent its knowledge?
ML in a Nutshell
• Tens of thousands of machine learning
algorithms
– Hundreds new every year
• Every ML algorithm has three components
– Representation
– Optimization
– Evaluation
Slide credit: Pedro Domingos
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• etc.
Slide credit: Pedro Domingos