ML Intro Theory
● We do not know or are unable to express the exact rules transforming the input to output.
● There are clearly defined rules transforming the input to output.
Diagram (Model of a Model): Data → Machine Learning Algorithm → Model
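The contrast between clearly defined rules and rules that must be learned from data can be sketched in Python. This is only an illustration: the temperature data, the function names, and the choice of a straight-line model fitted by least squares are all assumptions, not something from the lectures.

```python
# Rule-based program: the input -> output rule is known and written by hand.
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


# Learning algorithm: takes data (input/output pairs) and returns a model.
# Assumption for this sketch: the hidden rule is a straight line y = w*x + b,
# fitted with ordinary least squares.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b = mean_y - w * mean_x

    def model(x):
        return w * x + b

    return model


# Made-up data: the learner only sees examples, never the rule itself.
data_x = [0, 10, 20, 30, 40]
data_y = [celsius_to_fahrenheit(x) for x in data_x]

learned_model = fit_line(data_x, data_y)
print(learned_model(25))  # close to 77.0, recovered from the examples alone
```

Here `fit_line` plays the role of the machine learning algorithm, `data_x` and `data_y` are the data, and `learned_model` is the resulting model.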
Categories of Machine Learning Algorithms
1. Supervised Learning
This is like seeing the answer key and finding which process gives the most accurate answer. We teach or train the machine using data that is well-labelled, which means that some data is already tagged with the correct answer.
Two types:
A. Regression
B. Classification

2. Unsupervised Learning
This is like finding the odd one out or grouping similar things. The machine learning model tries to find any similarities, differences, patterns, and structure in data by itself.
Main types:
A. Reducing dimensionality
B. Grouping similar data
C. Density Estimation
A. Regression
Regression is used when the output variable is continuous, like a number. Examples:
● Predict the price of a house based on its area and distance from the metro.
● Predict the stock price of a company on a given day based on revenue growth and profit after tax.
● Predict the height of a person based on their weight.

B. Classification
Classification is used when the output variable is discrete and categorical, like yes or no. Examples:
● Predict whether or not a house is closer than 5 km to the metro based on its price and area.
● Find the gender of a person by analyzing writing style.
● Predict whether or not there will be abnormally heavy rainfall tomorrow based on previous data.
Both regression and classification are predictive models; they aim to accurately predict the
answer.
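The difference between the two kinds of prediction can be sketched in Python. Everything here is an assumption made for illustration: the house data is made up, the regression is a least-squares line on area alone, and the classifier is a simple nearest-class-mean rule on price alone.

```python
def fit_regression(xs, ys):
    """Least-squares line through the data; returns a function x -> number."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_nearest_mean(xs, labels):
    """Returns a function x -> label of the class whose mean is closest to x."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    means = {label: sum(vals) / len(vals) for label, vals in groups.items()}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


# Made-up labelled data (supervised learning needs the "answer key").
areas = [50, 60, 80, 100, 120]                # input feature
prices = [150, 180, 240, 300, 360]            # continuous answers
near_metro = ["no", "no", "no", "yes", "yes"]  # categorical answers

# Regression: the output is a number.
predict_price = fit_regression(areas, prices)
print(predict_price(90))

# Classification: the output is one of a fixed set of categories.
is_near_metro = fit_nearest_mean(prices, near_metro)
print(is_near_metro(320), is_near_metro(170))
```

Both functions are trained on labelled examples; the only difference is whether the answer being predicted is a number or a category.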
Unsupervised Learning
A. Reducing dimensionality
It tries to simplify data by reducing the number of features (dimensions) while preserving as much important information as possible.
● Encoder: The encoder is a function that transforms input data into a compressed (lower-dimensional) representation. The number of features in the output is typically smaller than in the input.
● Decoder: The decoder is a function that tries to reconstruct the original data from the compressed representation produced by the encoder. The output has more features than the input.

B. Grouping similar data
It is exactly what the name suggests; however, not much is mentioned about it in the lectures.

C. Density Estimation
It involves determining the distribution of data points in a given space. The output of a density estimation process is a function that provides the probability density for any given point in the data space. It gives a probabilistic model.
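Density estimation, as described above, can be sketched with Python's standard library. The heights are made-up data, and the sketch assumes the data follow a single Gaussian (normal) distribution; real density estimators are not limited to that assumption.

```python
from statistics import NormalDist

# Made-up, unlabelled data: heights in cm (no "correct answers" attached).
heights = [158, 162, 165, 167, 170, 171, 173, 176, 180, 184]

# Assumption for this sketch: the data come from one normal distribution,
# so estimating the density means estimating its mean and standard deviation.
density_model = NormalDist.from_samples(heights)

# The fitted model is a function from any point to a probability density.
for point in (150, 170, 190):
    print(point, density_model.pdf(point))
```

Points near the bulk of the data (such as 170) get a higher density than points far from it (such as 150 or 190), which is what makes the result a probabilistic model of the data.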
Useful Terminologies:
Data
➔ Training data: The dataset used to teach a machine learning model how to make predictions or perform tasks. This is like Public Test Cases in your programming subjects.
➔ Test data: A separate dataset used to evaluate the performance of the trained model on unseen examples. This is like private test cases.
➔ Validation data: A portion of the dataset used to tune and optimize the model's hyperparameters and assess its generalization ability. This is used for model selection.

Loss
Loss is a measure of how wrong the model's predictions are compared to the actual outcomes.
It is important as Quiz 1 will usually have numerical questions to calculate loss. Easy to score.
I will make a PPT just for Loss functions and their calculation with PYQs and Solved Examples.
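How the three datasets and a loss value fit together can be sketched as follows. The dataset is made up, the 60/20/20 split, the model form y = w * x, and the use of mean squared error as the loss are all assumptions for illustration; the lectures may use other loss functions.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences; 0 means perfect predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Tiny worked example: errors are 1 and -2, so the loss is (1 + 4) / 2.
print(mean_squared_error([3, 5], [2, 7]))  # 2.5

# Made-up dataset of (input, actual output) pairs.
dataset = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1),
           (6, 12.2), (7, 13.8), (8, 16.1), (9, 18.0), (10, 19.9)]

# Simple 60/20/20 split by position (a real split would usually shuffle first).
train = dataset[:6]
validation = dataset[6:8]
test = dataset[8:]

# "Training": fit y = w * x on the training data only (least squares).
w = sum(x * y for x, y in train) / sum(x * x for x, _ in train)


def loss_on(split):
    actual = [y for _, y in split]
    predicted = [w * x for x, _ in split]
    return mean_squared_error(actual, predicted)


print("train loss:", loss_on(train))
print("validation loss:", loss_on(validation))  # compared across model choices
print("test loss:", loss_on(test))              # final check on unseen data
```

The model never sees the validation or test pairs while fitting `w`, which is why their losses indicate how well it generalizes.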