
ML Intro Theory

Uploaded by

sahanawaz9199
Copyright
© All Rights Reserved

Getting Started with Machine Learning: A Theoretical Introduction


Gir House Help Resource

Salutations to Śrī-Gaṇeśa and Devī Sarasvatī!


Credits :

Prepared by Vighnesh Mishra, Secretary of Gir House (September 23 - August 24)


Edited by Niraj Kumar (LinkedIn)

I would like to acknowledge the contribution of ChatGPT, GeeksForGeeks, Wikipedia, Kartik Sir’s Linear Algebra notes and, most importantly, the Activity Questions of MLF. A lot of the material is directly lifted from the above sources and the rest is paraphrased.
What is Machine Learning?

Machine learning is a type of artificial intelligence that enables computers to learn from and make decisions based on data. Instead of being explicitly programmed with specific instructions for every task, a machine learning system is trained using large amounts of data and algorithms that allow it to recognize patterns and make predictions or decisions.

Machine Learning:
★ It is a study of algorithms that improve through the use of data.
★ It is a subfield of AI that extracts patterns out of raw data.
★ It allows computers to learn from experience without explicit programming or human intervention.
★ It is used for tasks where programming/human labour fails.

Imagine you're teaching a child to recognize different types of fruit. You could show them many pictures of apples, bananas, and oranges, and over time, they would learn to identify each fruit based on its features like shape, color, and size. Similarly, in machine learning, we provide the computer with lots of examples (data) so it can learn to identify patterns and make decisions.

Let's also consider the problem of email spam detection. Traditionally, spam filters used a set of predefined rules created by experts to identify spam emails. For example, if an email contained certain words like "win" or "free," it might be flagged as spam.

With machine learning, instead of manually creating these rules, we can train a spam filter by providing it with a large dataset of emails labeled as "spam" or "not spam." The machine learning algorithm analyzes this data and learns to recognize the patterns that are more likely to appear in spam emails versus legitimate ones.

Once trained, the machine learning model can predict whether a new incoming email is spam or not based on the patterns it has learned. This approach is more flexible and can adapt to new types of spam that may not be covered by predefined rules.
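To make the spam example concrete, here is a minimal Python sketch that puts a hand-written rule next to a filter that learns from labelled emails. The six training emails, the function names and the simple word-counting scheme are all assumptions made up for illustration; real spam filters use far more data and more careful models.

```python
from collections import Counter

# Tiny, made-up labelled dataset (hypothetical, for illustration only).
TRAINING_EMAILS = [
    ("win a free prize now", "spam"),
    ("free entry win cash", "spam"),
    ("claim your free reward", "spam"),
    ("meeting notes for monday", "not spam"),
    ("lunch on monday", "not spam"),
    ("please review the project notes", "not spam"),
]


def rule_based_filter(text):
    """Traditional approach: an expert hard-codes the trigger words."""
    trigger_words = {"win", "free"}
    return "spam" if trigger_words & set(text.lower().split()) else "not spam"


def train_word_counts(emails):
    """Learning step: count how often each word occurs under each label."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in emails:
        counts[label].update(text.lower().split())
    return counts


def learned_filter(counts, text):
    """Flag an email when its words were seen more often in spam emails."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["not spam"][w] for w in words)
    return "spam" if spam_score > ham_score else "not spam"


counts = train_word_counts(TRAINING_EMAILS)
new_email = "claim your cash reward"
print(rule_based_filter(new_email))       # not spam (no "win" or "free" in it)
print(learned_filter(counts, new_email))  # spam (its words appeared in spam examples)
```

The fixed rule misses the new email because nobody wrote a rule for "claim" or "cash", while the learned filter picks those words up from the labelled examples.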
ML vs Programming:

Machine Learning

● We do not know or are unable to express the exact rules transforming the input to output.

Examples of Tasks and Problems:

➔ Weather prediction
➔ Face tagging
➔ Spam detection
➔ Recommendation system
➔ Social media marketing
➔ Speech recognition
➔ Image recognition
➔ Object detection

Traditional Programming

● There are clearly defined rules transforming the input to output.

Examples of Tasks and Problems:

➔ Password verification
➔ Array sorting
What is (and is not) a Model?
❖ A model is a mathematical representation of reality.
❖ A model is not an exact representation of a system.
❖ A model uses assumptions.

Data → Machine Learning Algorithm → Model
(Model of a Model)
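The Data → Machine Learning Algorithm → Model picture can be sketched in a few lines of Python. Here the "algorithm" is an ordinary least-squares line fit and the "model" is the function it returns. The straight-line form is an assumption chosen for illustration, and the numbers are made up.

```python
def fit_line(xs, ys):
    """Learning algorithm: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def model(x):
        # The model: a mathematical stand-in for reality that ASSUMES
        # the relationship between x and y is a straight line.
        return slope * x + intercept

    return model


# Data (made-up numbers): hours studied and marks scored.
hours = [1.0, 2.0, 3.0, 4.0]
marks = [2.0, 4.0, 5.0, 9.0]

model = fit_line(hours, marks)            # data goes in, a model comes out
errors = [y - model(x) for x, y in zip(hours, marks)]
print([round(e, 2) for e in errors])      # non-zero: the model is not exact
```

The non-zero errors are the point: the model uses an assumption (a straight line), so it represents the data only approximately.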
Categories of Machine Learning Algorithms

1. Supervised Learning

This is like seeing the answer key and finding which process gives the most accurate answer. We teach or train the machine using data that is well-labelled, which means that some data is already tagged with the correct answer.

Two types:
A. Regression
B. Classification

“Supervised Learning is essentially curve fitting”

2. Unsupervised Learning

This is like finding the odd one out or grouping similar things. The machine learning model tries to find any similarities, differences, patterns, and structure in data by itself.

Main types:
A. Reducing dimensionality
B. Grouping similar data
C. Density Estimation

“Unsupervised Learning is essentially understanding data”
Supervised Learning

A. Regression

Regression is used when the output variable is continuous, like a number. Examples:

● The price of a house based on its area and distance from metro.
● Predicting the stock price of a company on a given day based on revenue growth and profit after tax.
● Predict the height of a person based on their weight.

B. Classification

Classification is used when the output variable is discrete and categorical, like yes or no. Examples:

● Whether or not a house is closer than 5 km to metro based on its price and area.
● Find the gender of a person by analyzing writing style.
● Predict whether there will be abnormally heavy rainfall tomorrow or not based on previous data.

Both regression and classification are predictive models; they aim to accurately predict the answer.
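The difference between the two is easiest to see when the same idea produces a number in one case and a category in the other. The sketch below uses k-nearest neighbours, which is my choice for illustration and is not named in the notes. All house data, units and labels are made up.

```python
import math
from collections import Counter


def nearest_neighbours(train_x, query, k):
    """Indices of the k training points closest to the query point."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    return order[:k]


def knn_regress(train_x, train_y, query, k=3):
    """Regression: average the neighbours' numbers (continuous output)."""
    idx = nearest_neighbours(train_x, query, k)
    return sum(train_y[i] for i in idx) / k


def knn_classify(train_x, train_y, query, k=3):
    """Classification: majority vote among the neighbours (discrete output)."""
    idx = nearest_neighbours(train_x, query, k)
    return Counter(train_y[i] for i in idx).most_common(1)[0][0]


# Regression: (area in 1000 sq ft, distance from metro in km) -> price in lakhs.
houses = [(0.5, 1.0), (1.0, 2.0), (1.5, 1.5), (0.6, 8.0), (0.8, 7.0), (1.2, 9.0)]
prices = [50.0, 90.0, 130.0, 30.0, 40.0, 60.0]
print(knn_regress(houses, prices, (1.0, 1.5)))          # 90.0

# Classification: (price in crores, area in 1000 sq ft) -> within 5 km of metro?
listings = [(0.5, 0.5), (0.9, 1.0), (1.3, 1.5), (0.3, 0.6), (0.4, 0.8), (0.6, 1.2)]
near_metro = ["yes", "yes", "yes", "no", "no", "no"]
print(knn_classify(listings, near_metro, (0.35, 0.7)))  # no
```

Both functions look up the same kind of neighbours; only the last step differs: averaging gives a continuous answer, voting gives a categorical one.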
Unsupervised Learning

A. Dimensionality Reduction

It tries to simplify data by reducing the number of features (dimensions) while preserving as much important information as possible.

● Encoder: The encoder is a function that transforms input data into a compressed (lower-dimensional) representation. The number of features in the output is typically fewer than in the input.
● Decoder: The decoder is a function that tries to reconstruct the original data from the compressed representation encoded by the encoder. The output has more features than the input.

B. Density Estimation

It involves determining the distribution of data points in a given space.

The output of a density estimation process is a function that provides the probability density for any given point in the data space.

It gives a probabilistic model.

C. Grouping similar data

It is exactly what the name suggests; however, there isn’t much mentioned about it in the lectures.
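The encoder/decoder pair from dimensionality reduction can be sketched for the smallest possible case: 2 features compressed to 1 and then reconstructed. The approach below (project each point onto the direction of greatest spread) is an assumption picked for illustration, and the points are made up.

```python
import math


def fit_direction(points):
    """Find the mean and the direction of greatest spread of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Angle of the major axis of the 2x2 scatter matrix.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(angle), math.sin(angle))


def encode(point, mean, direction):
    """Encoder: 2 features in, 1 feature out (position along the direction)."""
    return ((point[0] - mean[0]) * direction[0]
            + (point[1] - mean[1]) * direction[1])


def decode(code, mean, direction):
    """Decoder: 1 feature in, 2 features out (approximate reconstruction)."""
    return (mean[0] + code * direction[0], mean[1] + code * direction[1])


# Made-up points that lie close to a straight line.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
mean, direction = fit_direction(points)

codes = [encode(p, mean, direction) for p in points]        # one number each
rebuilt = [decode(c, mean, direction) for c in codes]       # back to 2 numbers
worst = max(math.dist(p, r) for p, r in zip(points, rebuilt))
print(round(worst, 3))  # small but not zero: some information is lost
```

The reconstruction is close because the points nearly lie on a line, so one number per point preserves most of the information.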
Useful Terminologies:
Data

➔ Training data: The dataset used to teach a machine learning model how to make predictions or perform tasks. This is like Public Test Cases in your programming subjects.
➔ Test data: A separate dataset used to evaluate the performance of the trained model on unseen examples. This is like private test cases.
➔ Validation data: A portion of the dataset used to tune and optimize the model's hyperparameters and assess its generalization ability. This is used for model selection.

Loss

Loss is a measure of how wrong the model's predictions are compared to the actual outcomes.

It is important as Quiz 1 will usually have numerical questions to calculate loss. Easy to score.

I will make a PPT just for Loss functions and their calculation with PYQs and Solved Examples.
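The three kinds of data and the idea of loss fit together in one small Python sketch. The 60/20/20 split, the use of mean squared error as the loss, and the two candidate models are all assumptions for illustration; the notes do not fix any of them.

```python
import random


def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of the data and cut it into train / validation / test."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def mean_squared_error(predictions, targets):
    """Loss: average of the squared gaps between prediction and truth."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Worked loss calculation: squared errors are 1, 0, 4, 1, so the mean is 6 / 4.
print(mean_squared_error([3.0, 5.0, 4.0, 6.0], [2.0, 5.0, 2.0, 7.0]))  # 1.5

# Made-up dataset where the true rule is y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
train, val, test = split_dataset(data)

# Hypothetical candidates, standing in for models trained with
# different hyperparameters on the training data.
candidates = {
    "model A": lambda x: 2.0 * x + 1.0,
    "model B": lambda x: 3.0 * x,
}


def loss_on(model, examples):
    return mean_squared_error([model(x) for x, _ in examples],
                              [y for _, y in examples])


# Validation data is used for model selection...
best = min(candidates, key=lambda name: loss_on(candidates[name], val))
# ...and test data reports performance on unseen examples.
print(best, loss_on(candidates[best], test))
```

Keeping the test split out of model selection is what makes it behave like private test cases: the chosen model has never been tuned against it.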
