Aimlf Unit 3
1. Machine Learning
Machine learning is programming computers to optimize a performance criterion using example data
or past experience. We have a model defined up to some parameters, and learning is the execution of a
computer program to optimize the parameters of the model using the training data or past experience. The
model may be predictive to make predictions in the future, or descriptive to gain knowledge from data, or
both.
Definition of learning
A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks T, as measured by P, improves with experience E.
Examples
i) Handwriting recognition learning problem
• Task T: Recognising and classifying handwritten words within images
• Performance P: Percent of words correctly classified
• Training experience E: A dataset of handwritten words with given classifications
ii) A robot driving learning problem
• Task T: Driving on highways using vision sensors
• Performance measure P: Average distance traveled before an error
• Training experience E: A sequence of images and steering commands recorded while
observing a human driver
Definition
A computer program which learns from experience is called a machine learning program or simply a
learning program. Such a program is sometimes also referred to as a learner.
1.2 Components of Learning
Basic components of learning process
The learning process, whether by a human or a machine, can be divided into four components,
namely, data storage, abstraction, generalization and evaluation.
1. Data storage
Facilities for storing and retrieving huge amounts of data are an important component of the learning
process. Humans and computers alike utilize data storage as a foundation for advanced
reasoning.
• In a human being, the data is stored in the brain and data is retrieved using electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory and similar devices to store
data and use cables and other technology to retrieve data.
2. Abstraction
The second component of the learning process is known as abstraction.
Abstraction is the process of extracting knowledge about stored data. This involves creating general
concepts about the data as a whole. The creation of knowledge involves application of known models and
creation of new models.
3. Generalization
The third component of the learning process is known as generalisation.
The term generalization describes the process of turning the knowledge about stored data into a form
that can be utilized for future action. These actions are to be carried out on tasks that are similar, but not
identical, to those that have been seen before. In generalization, the goal is to discover those properties of
the data that will be most relevant to future tasks.
4. Evaluation
Evaluation is the last component of the learning process.
It is the process of giving feedback to the user to measure the utility of the learned knowledge. This
feedback is then utilised to effect improvements in the whole learning process.
Applications of machine learning
Application of machine learning methods to large databases is called data mining. In data mining, a large
volume of data is processed to construct a simple model with valuable use, for example, one having high
predictive accuracy.
The following is a list of some of the typical applications of machine learning.
Image Recognition
Speech Recognition
Recommender Systems
Fraud Detection
Self-Driving Cars
Medical Diagnosis
2. Classification
Classification is a supervised machine learning method where the model tries to predict the correct label
of a given input data. In classification, the model is fully trained using the training data, and then it is
evaluated on test data before being used to perform prediction on new unseen data.
For instance, an algorithm can learn to predict whether a given email is spam or ham (not spam).
Classification Types
The main classification types in machine learning are:
Binary Classification
In binary classification, the goal is to classify the input into one of two classes or categories. Example: on the basis of the given health conditions of a person, we have to determine whether the person has a certain disease or not.
Multiclass Classification
In multi-class classification, the goal is to classify the input into one of several classes or categories. Example: on the basis of data about different species of flowers, we have to determine which species our observation belongs to.
Multi-Label Classification
The goal is to predict which of several labels a new data point belongs to. This is different from multiclass classification, where each data point can only belong to one class. For example, a multi-label classification algorithm could be used to classify images of animals as belonging to one or more of the categories cat, dog, bird, or fish (a minimal sketch follows).
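The sketch below is a rough, assumed illustration of multi-label prediction using scikit-learn's OneVsRestClassifier; the tiny feature matrix and the meaning of its columns are invented for the example.

```python
# A minimal multi-label sketch with scikit-learn (assumed available).
# Each row of X is an invented feature vector for an image; each row of Y
# marks which of the labels [cat, dog] apply, and several may be 1 at once.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

X = np.array([[0.1, 0.9], [0.8, 0.2], [0.9, 0.9], [0.2, 0.1]])
Y = np.array([[1, 0],      # cat only
              [0, 1],      # dog only
              [1, 1],      # both cat and dog
              [0, 0]])     # neither

# One binary classifier is trained per label column.
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
print(clf.predict([[0.85, 0.85]]))   # likely [[1 1]], i.e. both labels apply
```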
Imbalanced Classification
In imbalanced classification, the examples are not evenly distributed across the classes: one class (for example, fraud or a rare disease) occurs far less often than the others. Plain accuracy can then be misleading, so techniques such as resampling or class weighting are often used.
Classification Algorithms
There are various types of classification algorithms. Some of them are:
Linear Classifiers
Linear models create a linear decision boundary between classes. They are simple and computationally
efficient. Some of the linear classification models are as follows:
Logistic Regression
Support Vector Machines having kernel = ‘linear’
Single-layer Perceptron
Stochastic Gradient Descent (SGD) Classifier
Non-linear Classifiers
Non-linear models create a non-linear decision boundary between classes. They can capture more complex
relationships between the input features and the target variable. Some of the non-
linear classification models are as follows:
K-Nearest Neighbours
Kernel SVM
Naive Bayes
Decision Tree Classification
How does Classification Machine Learning Work?
The basic idea behind classification is to train a model on a labeled dataset, where the input data are associated with their corresponding output labels, so that the model learns the patterns and relationships between the input data and output labels. Once the model is trained, it can be used to predict the output labels for new unseen data.
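A minimal sketch of this train/evaluate/predict workflow, assuming scikit-learn is available; the iris dataset and logistic regression are illustrative choices, not prescribed by these notes.

```python
# A minimal classification sketch: train on labeled data, evaluate on a
# held-out test set, then predict labels for unseen inputs.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)            # labeled dataset (inputs, labels)

# Split so the model is evaluated on data it was not trained on.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)                    # learn input-to-label patterns

y_pred = clf.predict(X_test)                 # predict labels for unseen data
print("Accuracy:", accuracy_score(y_test, y_pred))
```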
3. Regression
Regression is a statistical approach used to analyze the relationship between a dependent variable
(target variable) and one or more independent variables (predictor variables). The objective is to determine
the most suitable function that characterizes the connection between these variables.
Regression Types
The main types of regression are:
Simple Regression
o Used to predict a continuous dependent variable based on a single independent variable.
o Simple linear regression should be used when there is only a single independent variable.
Multiple Regression
o Used to predict a continuous dependent variable based on multiple independent variables.
o Multiple linear regression should be used when there are multiple independent variables.
Nonlinear Regression
o Relationship between the dependent variable and independent variable(s) follows a nonlinear
pattern.
o Provides flexibility in modeling a wide range of functional forms.
Regression Algorithms
There are many different types of regression algorithms, but some of the most common include:
Linear Regression
o Linear regression is one of the simplest and most widely used statistical models. It
assumes that there is a linear relationship between the independent and dependent variables,
meaning the change in the dependent variable is proportional to the change in the
independent variables (a fitting sketch follows this list).
Polynomial Regression
o Polynomial regression is used to model nonlinear relationships between the dependent
variable and the independent variables. It adds polynomial terms to the linear regression
model to capture more complex relationships.
Support Vector Regression (SVR)
o Support vector regression (SVR) is a regression algorithm based on the
support vector machine (SVM). SVM is primarily used for classification tasks, but the
same idea can be applied to regression. SVR fits a hyperplane (or function) such that as
many training points as possible lie within a margin of tolerance around it, penalizing only
the points that fall outside this margin.
Decision Tree Regression
o Decision tree regression is a type of regression algorithm that builds a decision tree to
predict the target value. A decision tree is a tree-like structure that consists of nodes and
branches. Each node represents a decision, and each branch represents the outcome of that
decision. The goal of decision tree regression is to build a tree that can accurately predict the
target value for new data points.
Random Forest Regression
o Random forest regression is an ensemble method that combines multiple decision trees to
predict the target value. Ensemble methods are a type of machine learning algorithm that
combines multiple models to improve the performance of the overall model. Random forest
regression works by building a large number of decision trees, each of which is trained on a
different subset of the training data. The final prediction is made by averaging the predictions
of all of the trees.
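As a minimal sketch of fitting a linear regression model, the snippet below uses scikit-learn and NumPy; the synthetic data (slope 2.5, intercept 1.0, plus noise) is invented for illustration.

```python
# A minimal linear-regression sketch: generate noisy linear data, fit a
# line, and inspect the recovered slope and intercept.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))                # single independent variable
y = 2.5 * X.ravel() + 1.0 + rng.normal(0, 1, 100)    # linear relation plus noise

model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("prediction at x = 4:", model.predict([[4.0]])[0])
```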
Characteristics of Regression
Continuous Target Variable: Regression deals with predicting continuous target variables that
represent numerical values. Examples include predicting house prices, forecasting sales figures, or
estimating patient recovery times.
Error Measurement: Regression models are evaluated based on their ability to minimize the error
between the predicted and actual values of the target variable. Common error metrics include mean
absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE); a short sketch computing these follows this list.
Model Complexity: Regression models range from simple linear models to more complex nonlinear
models. The choice of model complexity depends on the complexity of the relationship between the
input features and the target variable.
Overfitting and Underfitting: Regression models are susceptible to overfitting and underfitting.
Interpretability: The interpretability of regression models varies depending on the algorithm used.
Simple linear models are highly interpretable, while more complex models may be more difficult to
interpret.
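The sketch below computes the three error metrics named above with NumPy; the true and predicted values are made up for illustration.

```python
# Computing MAE, MSE, and RMSE from a set of predictions.
import numpy as np

y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.5, 5.5, 7.0, 11.0])

errors = y_true - y_pred
mae  = np.mean(np.abs(errors))     # mean absolute error
mse  = np.mean(errors ** 2)        # mean squared error
rmse = np.sqrt(mse)                # root mean squared error
print(mae, mse, rmse)              # 0.625  0.4375  0.661...
```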
Examples
Predicting the age of a person is a regression task, since age is a continuous value. By contrast, predicting the nationality of a person is a classification task, since nationality is a categorical label.
4. Types of Machine Learning
1. Supervised Machine Learning
Supervised machine learning trains a model on labelled data, where each input example is paired with its correct output label.
Example: Consider a scenario where you have to build an image classifier to differentiate between
cats and dogs. If you feed the datasets of dogs and cats labelled images to the algorithm, the machine
will learn to classify between a dog or a cat from these labeled images. When we input new dog or cat
images that it has never seen before, it will use the learned algorithms and predict whether it is a dog or a
cat. This is how supervised learning works, and this is particularly an image classification.
There are two main categories of supervised learning that are mentioned below:
Classification
Regression
Classification
Classification deals with predicting categorical target variables, which represent discrete classes or
labels. For instance, classifying emails as spam or not spam, or predicting whether a patient has a high
risk of heart disease. Classification algorithms learn to map the input features to one of the predefined
classes.
Here are some classification algorithms:
Logistic Regression
Support Vector Machine
Random Forest
Regression
Regression, on the other hand, deals with predicting continuous target variables, which
represent numerical values. For example, predicting the price of a house based on its size, location,
and amenities, or forecasting the sales of a product. Regression algorithms learn to map the input
features to a continuous numerical value.
Here are some regression algorithms:
Linear Regression
Polynomial Regression
Ridge Regression
Advantages of Supervised Machine Learning
Supervised Learning models can have high accuracy as they are trained on labelled data.
The process of decision-making in supervised learning models is often interpretable.
Disadvantages of Supervised Machine Learning
It may struggle with unseen or unexpected patterns that are not present in the training data.
It can be time-consuming and costly as it relies on labeled data only.
Applications of Supervised Learning
Image classification
Natural language processing
Speech recognition
2. Unsupervised Machine Learning
Unsupervised learning is a type of machine learning technique in which an algorithm discovers patterns and relationships using unlabeled data. Unlike supervised learning, unsupervised learning doesn't involve providing the algorithm with labeled target outputs. The primary goal of unsupervised learning is often to discover hidden patterns, similarities, or clusters within the data, which can then be used for various purposes, such as data exploration, visualization, dimensionality reduction, and more.
Example: Consider that you have a dataset that contains information about the purchases customers
made from a shop. Through clustering, the algorithm can group customers with similar purchasing
behaviour, revealing customer segments without predefined labels. This kind of information can help
businesses target customers as well as identify outliers.
There are two main categories of unsupervised learning that are mentioned below:
Clustering
Association
Clustering
Clustering is the process of grouping data points into clusters based on their similarity. This
technique is useful for identifying patterns and relationships in data without the need for labeled
examples.
Here are some clustering algorithms:
K-Means Clustering algorithm
Mean-shift algorithm
DBSCAN Algorithm
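A minimal clustering sketch using scikit-learn's KMeans follows; the three-blob synthetic dataset is an assumed, illustrative setup.

```python
# A minimal K-Means sketch: group unlabeled points into 3 clusters.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)   # unlabeled points
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.labels_[:10])        # cluster assignment for the first 10 points
print(kmeans.cluster_centers_)    # coordinates of the discovered cluster centers
```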
Association
Association rule learning is a technique for discovering relationships between items in a dataset. It
identifies rules that indicate the presence of one item implies the presence of another item with a specific
probability.
Here are some association rule learning algorithms:
Apriori Algorithm
Eclat
FP-growth Algorithm
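To make the support/confidence idea behind these algorithms concrete, here is a small pure-Python sketch on an invented set of market baskets; it is not a full Apriori implementation.

```python
# Support and confidence for a toy association rule {bread} -> {milk}.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# confidence({bread} -> {milk}) = support({bread, milk}) / support({bread})
conf = support({"bread", "milk"}) / support({"bread"})
print("support:", support({"bread", "milk"}), "confidence:", conf)  # 0.5, 0.667
```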
Advantages of Unsupervised Machine Learning
It helps to discover hidden patterns and various relationships between the data.
Used for tasks such as customer segmentation, anomaly detection, and data exploration.
Disadvantages of Unsupervised Machine Learning
Without using labels, it may be difficult to predict the quality of the model’s output.
Cluster Interpretability may not be clear and may not have meaningful interpretations.
Applications of Unsupervised Learning
Clustering
Anomaly detection
Dimensionality reduction
3. Semi-Supervised Learning
Semi-Supervised learning is a machine learning algorithm that works between the supervised and
unsupervised learning so it uses both labelled and unlabelled data. It’s particularly useful when
obtaining labeled data is costly, time-consuming, or resource-intensive. This approach is useful when the
dataset is expensive and time-consuming. Semi-supervised learning is chosen when labeled data requires
skills and relevant resources in order to train or learn from it.
We use these techniques when only a small portion of the data is labeled and the large remaining portion is unlabeled. We can use unsupervised techniques to predict labels for the unlabeled portion and then feed these labels to supervised techniques. This technique is mostly applicable to image datasets, where usually not all images are labeled.
Example: Consider that we are building a language translation model; having labeled translations for
every sentence pair can be resource-intensive. Semi-supervised learning allows the model to learn from both labeled and unlabeled
sentence pairs, making it more accurate. This technique has led to significant improvements in the
quality of machine translation services.
Types of Semi-Supervised Learning Methods
Graph-based semi-supervised learning: This approach uses a graph to represent the relationships
between the data points. The graph is then used to propagate labels from the labeled data points to
the unlabeled data points.
Label propagation: This approach iteratively propagates labels from the labeled data points to the
unlabeled data points, based on the similarities between the data points.
Co-training: This approach trains two different machine learning models on different views (feature
subsets) of the labeled data. Each model then labels unlabeled examples for the other model to train on.
Self-training: This approach trains a machine learning model on the labeled data and then uses the
model to predict labels for the unlabeled data. The model is then retrained on the labeled data together with the
predicted labels for the unlabeled data (a minimal sketch follows this list).
Generative adversarial networks (GANs): GANs are a type of deep learning algorithm that can be
used to generate synthetic data. GANs can be used to generate unlabeled data for semi-supervised
learning by training two neural networks, a generator and a discriminator.
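A minimal self-training sketch (referenced from the Self-training item above), assuming scikit-learn; hiding 70% of the iris labels is an arbitrary illustrative choice, and -1 is scikit-learn's marker for an unlabeled point.

```python
# Self-training: fit on the labeled points, pseudo-label the rest, retrain.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.7] = -1     # hide ~70% of the labels

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)                      # iteratively pseudo-labels the -1s
print("accuracy on all data:", model.score(X, y))
```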
Advantages of Semi- Supervised Machine Learning
It leads to better generalization as compared to supervised learning, as it takes both labeled and
unlabeled data.
Can be applied to a wide range of data.
Disadvantages of Semi- Supervised Machine Learning
Semi-supervised methods can be more complex to implement compared to other approaches.
It still requires some labeled data that might not always be available or easy to obtain.
Applications of Semi-Supervised Learning
Image Classification and Object Recognition
Natural Language Processing (NLP)
Speech Recognition
Recommendation Systems
5. Probability
Probability can be calculated as the number of times the event occurs divided by the total number of
possible outcomes. Suppose we toss a coin; then the probability of getting heads as the outcome can be
calculated with the formula below:
P(H) = Number of ways heads can occur / Total number of possible outcomes
P(H) = 1/2 = 0.5
where
P(H) = probability of getting heads as the outcome when tossing the coin.
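As an illustrative addition, the quick simulation below estimates this probability empirically from repeated tosses; the seed and the number of tosses are arbitrary choices.

```python
# Estimating P(H) empirically: the fraction of heads over many fair tosses
# should be close to the theoretical value 0.5.
import random

random.seed(42)
tosses = [random.choice(["H", "T"]) for _ in range(10_000)]
empirical_p = tosses.count("H") / len(tosses)
print(empirical_p)   # close to 0.5
```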
Types of Probability
For a better understanding, probability can be further categorized into the following types:
Empirical Probability: Empirical probability can be calculated as the number of times the event occurs
divided by the total number of incidents observed.
Theoretical Probability: Theoretical probability can be calculated as the number of ways the particular
event can occur divided by the total number of possible outcomes.
Joint Probability: It tells the probability of two random events occurring simultaneously. For independent events A and B:
P(A ∩ B) = P(A) · P(B)
where
P(A ∩ B) = probability of events A and B both occurring
P(A) = probability of event A
P(B) = probability of event B
Conditional Probability: It is the probability of event A given that event B has occurred.
The probability of an event A conditioned on an event B is denoted and defined as:
P(A|B) = P(A ∩ B) / P(B)
Similarly, P(B|A) = P(A ∩ B) / P(A). We can write the joint probability of A and B as P(A ∩ B) =
P(A) · P(B|A), which means: "the chance of both things happening is the chance that the first one happens,
times the chance that the second one happens given that the first has happened."
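A small worked example of these formulas, using two fair dice; the particular events chosen are illustrative.

```python
# Joint and conditional probability for two fair dice:
# A = "first die shows 6", B = "sum of the dice is 10".
from fractions import Fraction

p_A     = Fraction(1, 6)       # P(A)
p_AandB = Fraction(1, 36)      # only (6, 4) gives a six and a sum of 10
p_B     = Fraction(3, 36)      # (4,6), (5,5), (6,4)

p_B_given_A = p_AandB / p_A    # P(B|A) = P(A ∩ B) / P(A) = 1/6
p_A_given_B = p_AandB / p_B    # P(A|B) = P(A ∩ B) / P(B) = 1/3
print(p_B_given_A, p_A_given_B)
```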
6. Linear Algebra
Linear algebra is the branch of mathematics that deals with vector spaces and linear mappings between these
spaces. It encompasses the study of vectors, matrices, linear equations, and their properties.
B. Fundamental Concepts
1. Vectors
Vectors are quantities that have both magnitude and direction, often represented as arrows in space.
Example: v = [2, −1, 4]
2. Matrices
Matrices are rectangular arrays of numbers, arranged in rows and columns.
Matrices are used to represent linear transformations, systems of linear equations, and data
transformations in machine learning.
Example: a 3×3 matrix A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
3. Scalars
Scalars are single numerical values, without direction, magnitude only.
Scalars are used to scale vectors or matrices through operations like multiplication.
Example: Let's consider a scalar k = 3 and a vector v = [2, −1, 4].
Scalar multiplication involves multiplying each component of the vector by the scalar. So, if we
multiply the vector v by the scalar k = 3, we get:
k·v = 3·[2, −1, 4] = [3·2, 3·(−1), 3·4] = [6, −3, 12]
C. Operations in Linear Algebra
1. Addition and Subtraction
Addition and subtraction of vectors or matrices involve adding or subtracting corresponding
elements.
Example: u = [2, −1, 4], v = [3, 0, −2]
Addition: u + v = [2+3, −1+0, 4+(−2)] = [5, −1, 2]
Subtraction: u − v = [2−3, −1−0, 4−(−2)] = [−1, −1, 6]
2. Scalar Multiplication
Scalar multiplication involves multiplying each element of a vector or matrix by a scalar.
Example: Consider the scalar k = 3 and the vector v = [2, −1, 4].
Scalar multiplication involves multiplying each component of the vector by the scalar. So, if we
multiply the vector v by the scalar k = 3, we get:
k·v = 3·[2, −1, 4] = [3·2, 3·(−1), 3·4] = [6, −3, 12]
3. Dot Product (Vector Multiplication)
The dot product of two vectors measures the similarity of their directions.
It is computed by multiplying corresponding elements of two vectors and summing the results.
Example: given two vectors u = [u1, u2, u3] and v = [v1, v2, v3], their dot product is calculated as:
u·v = u1·v1 + u2·v2 + u3·v3
4. Cross Product (Vector Multiplication)
The cross product of two vectors in three-dimensional space produces a vector orthogonal to the
plane containing the original vectors.
It is used less frequently in machine learning compared to the dot product.
Example: given two vectors u = [u1, u2, u3] and v = [v1, v2, v3], their cross product is calculated as:
u × v = [u2v3 − u3v2, u3v1 − u1v3, u1v2 − u2v1]
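All of the operations in this section can be checked with NumPy (assumed available); the vectors below reuse the examples above.

```python
# Vector addition, scalar multiplication, dot product, and cross product.
import numpy as np

u = np.array([2, -1, 4])
v = np.array([3, 0, -2])

print(u + v)            # [ 5 -1  2]  element-wise addition
print(3 * u)            # [ 6 -3 12]  scalar multiplication
print(np.dot(u, v))     # 2*3 + (-1)*0 + 4*(-2) = -2
print(np.cross(u, v))   # [ 2 16  3]  vector orthogonal to both u and v
```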
7. Hypothesis
A hypothesis in machine learning is the model’s presumption regarding the connection between the
input features and the result. It is an illustration of the mapping function that the algorithm is attempting to
discover using the training set. To minimize the discrepancy between the expected and actual outputs, the
learning process involves modifying the weights that parameterize the hypothesis.
How does a Hypothesis work?
In most supervised machine learning algorithms, our main goal is to find a possible hypothesis from the
hypothesis space that could map the inputs to the proper outputs.
The way in which the coordinate plane is divided depends on the data, the algorithm, and the constraints.
All the legal possible ways in which we can divide the coordinate plane to predict the outcome of
the test data compose the hypothesis space.
Each individual possible way is known as a hypothesis.
Hypothesis Space and Representation in Machine Learning
The hypothesis space comprises all possible legal hypotheses that a machine learning algorithm can consider. Hypotheses are formulated based on various algorithms and techniques, including linear regression, decision trees, and neural networks. These hypotheses capture the mapping function transforming input data into predictions.
Hypothesis Formulation and Representation in Machine Learning
Hypotheses in machine learning are formulated based on various algorithms and techniques, each with its
representation. For example:
Linear Regression: h(X) = θ0 + θ1X1 + θ2X2 + … + θnXn
Decision Trees: h(X) = Tree(X)
Neural Networks: h(X) = NN(X)
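As a minimal sketch, the linear-regression hypothesis above can be written as a plain function; the parameter values are arbitrary illustrations.

```python
# A linear hypothesis h(X) parameterized by weights theta.
import numpy as np

theta = np.array([1.0, 0.5, -2.0])      # [theta_0, theta_1, theta_2]

def h(x):
    """Linear hypothesis: theta_0 + theta_1*x_1 + theta_2*x_2."""
    return theta[0] + theta[1] * x[0] + theta[2] * x[1]

print(h(np.array([3.0, 1.0])))          # 1.0 + 1.5 - 2.0 = 0.5
```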
Hypothesis Evaluation:
The process of machine learning involves not only formulating hypotheses but also evaluating their
performance. Common evaluation metrics include mean squared error (MSE), accuracy, precision, recall,
F1-score, and others. By comparing the predictions of the hypothesis with the actual outcomes on a
validation or test dataset, one can assess the effectiveness of the model.
Hypothesis Testing and Generalization:
Once a hypothesis is formulated and evaluated, the next step is to test its generalization capabilities.
Generalization refers to the ability of a model to make accurate predictions on unseen data.
8. Inductive Bias
Inductive bias can be defined as the set of assumptions or biases that a learning algorithm employs to
make predictions on unseen data based on its training data. These assumptions are inherent in the
algorithm’s design and serve as a foundation for learning and generalization.
The inductive bias of an algorithm influences how it selects a hypothesis (a possible explanation or model)
from the hypothesis space (the set of all possible hypotheses) that best fits the training data. It helps the
algorithm navigate the trade-off between fitting the training data too closely (overfitting) and failing to
capture its underlying structure (underfitting), so that it generalizes well to unseen data.
Types of Inductive Bias
Inductive bias can manifest in various forms, depending on the algorithm and its underlying assumptions.
Some common types of inductive bias include:
1. Bias towards simpler explanations: Many machine learning algorithms, such as decision trees and
linear models, have a bias towards simpler hypotheses. They prefer explanations that are more
parsimonious and less complex, as these are often more likely to generalize well to unseen data.
2. Bias towards smoother functions: Algorithms like kernel methods or Gaussian processes have a
bias towards smoother functions. They assume that neighboring points in the input space should have
similar outputs, leading to smooth decision boundaries.
3. Bias towards specific types of functions: Neural networks, for example, have a bias towards
learning complex, nonlinear functions. This bias allows them to capture intricate patterns in the data
but can also lead to overfitting if not regularized properly.
4. Bias towards sparsity: Some algorithms, like Lasso regression, have a bias towards sparsity. They
prefer solutions where only a few features are relevant, which can improve interpretability and
generalization (a minimal sketch follows this list).
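A small sketch of the sparsity bias in item 4, assuming scikit-learn; the synthetic data is invented so that only two of the ten features actually matter.

```python
# Lasso's L1 penalty tends to drive irrelevant coefficients to exactly zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_.round(2))   # mostly zeros outside the two true features
```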
Importance of Inductive Bias
Inductive bias is crucial in machine learning as it helps algorithms generalize from limited training data to
unseen data. Without a well-defined inductive bias, algorithms may struggle to make accurate predictions or
may overfit the training data, leading to poor performance on new data.
Understanding the inductive bias of an algorithm is essential for model selection, as different biases may be
more suitable for different types of data or tasks. It also provides insights into how the algorithm is learning
and what assumptions it is making about the data, which can aid in interpreting its predictions and results.
Challenges and Considerations
While inductive bias is essential for learning, it can also introduce limitations and challenges. Biases that are
too strong or inappropriate for the data can lead to poor generalization or biased predictions. Balancing bias
with variance (the variability of predictions) is a key challenge in machine learning, requiring careful tuning
and model selection.
The cost function for linear regression is the Mean Squared Error (MSE) between the predicted and actual values:
MSE = (1/N) Σi (yi − (a1xi + a0))²
where
N = total number of observations
yi = actual value
(a1xi + a0) = predicted value
Residuals: The distance between the actual value and the predicted value is called the residual. If the observed
points are far from the regression line, the residuals will be large, and so the cost function will be high. If the
scatter points are close to the regression line, the residuals will be small, and hence the cost function will be low.
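A short sketch computing the residuals and the MSE cost for a candidate line a1x + a0; all numbers are invented for illustration.

```python
# Residuals and MSE cost for a candidate regression line.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
a0, a1 = 0.2, 1.9                      # intercept and slope of a candidate line

residuals = y - (a1 * x + a0)          # actual minus predicted
mse = np.mean(residuals ** 2)          # the cost being minimized
print(residuals, mse)                  # [ 0.  -0.1  0.3  0. ]  0.025
```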
14. Logistic Regression
o Logistic regression is one of the most popular Machine Learning algorithms, which comes under the
Supervised Learning technique. It is used for predicting the categorical dependent variable using a
given set of independent variables.
o Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome
must be a categorical or discrete value. It can be either Yes or No, 0 or 1, true or False, etc. but
instead of giving the exact value as 0 and 1, it gives the probabilistic values which lie between 0
and 1.
o Logistic regression is similar to linear regression except in how they are used. Linear
regression is used for solving regression problems, whereas logistic regression is used for solving
classification problems.
o In Logistic regression, instead of fitting a regression line, we fit an "S" shaped logistic function,
which predicts two maximum values (0 or 1).
o The curve from the logistic function indicates the likelihood of something such as whether the cells
are cancerous or not, a mouse is obese or not based on its weight, etc.
o Logistic Regression is a significant machine learning algorithm because it has the ability to provide
probabilities and classify new data using continuous and discrete datasets.
o Logistic Regression can be used to classify the observations using different types of data and can
easily determine the most effective variables used for the classification.
o The logistic model starts from the equation of a straight line:
y = b0 + b1x1 + b2x2 + … + bnxn
o In Logistic Regression y can be between 0 and 1 only, so for this let's divide the above target by
(1 − y): the ratio y/(1 − y) is 0 for y = 0 and infinity for y = 1.
o But we need a range between −∞ and +∞, so taking the logarithm of the equation, it becomes:
log[y / (1 − y)] = b0 + b1x1 + b2x2 + … + bnxn
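A small sketch of this transformation with NumPy: the sigmoid squashes any linear score into (0, 1), and the log-odds (logit) recovers the linear score.

```python
# Sigmoid maps linear scores to probabilities; logit inverts it.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])   # linear scores b0 + b1*x1 + ...
p = sigmoid(z)
print(p.round(3))                  # [0.007 0.269 0.5   0.731 0.993]
print(np.log(p / (1 - p)))         # log-odds recovers z: [-5. -1.  0.  1.  5.]
```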
Difference between Linear Regression and Logistic Regression

| Linear Regression | Logistic Regression |
| --- | --- |
| Linear regression is used to predict the continuous dependent variable using a given set of independent variables. | Logistic regression is used to predict the categorical dependent variable using a given set of independent variables. |
| Linear regression is used for solving regression problems. | Logistic regression is used for solving classification problems. |
| In linear regression, we predict the value of continuous variables. | In logistic regression, we predict the values of categorical variables. |
| In linear regression, we find the best fit line, by which we can easily predict the output. | In logistic regression, we find the S-curve by which we can classify the samples. |
| The least squares estimation method is used for estimation of accuracy. | The maximum likelihood estimation method is used for estimation of accuracy. |
| The output of linear regression must be a continuous value, such as price, age, etc. | The output of logistic regression must be a categorical value such as 0 or 1, Yes or No, etc. |
| In linear regression, the relationship between the dependent variable and the independent variables must be linear. | In logistic regression, a linear relationship between the dependent and independent variables is not required. |
| In linear regression, there may be collinearity between the independent variables. | In logistic regression, there should not be collinearity between the independent variables. |