
UNIT III LEARNING

Machine Learning: Definitions – Classification – Regression – Approaches of machine learning models – Types of learning – Probability – Basics – Linear Algebra – Hypothesis space and inductive bias – Evaluation – Training and test sets – Cross validation – Concept of overfitting, underfitting, Bias and Variance – Regression: Linear Regression – Logistic Regression

1. Machine Learning
Machine learning is programming computers to optimize a performance criterion using example data
or past experience. We have a model defined up to some parameters, and learning is the execution of a
computer program to optimize the parameters of the model using the training data or past experience. The
model may be predictive to make predictions in the future, or descriptive to gain knowledge from data, or
both.
Definition of learning
A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks T, as measured by P, improves with experience E.
Examples
i) Handwriting recognition learning problem
• Task T: Recognising and classifying handwritten words within images
• Performance P: Percent of words correctly classified
• Training experience E: A dataset of handwritten words with given classifications
ii) A robot driving learning problem
• Task T: Driving on highways using vision sensors
• Performance measure P: Average distance traveled before an error
• Training experience E: A sequence of images and steering commands recorded while observing a human driver
Definition
A computer program which learns from experience is called a machine learning program or simply a
learning program. Such a program is sometimes also referred to as a learner.
1.2 Components of Learning
Basic components of learning process
The learning process, whether by a human or a machine, can be divided into four components,
namely, data storage, abstraction, generalization and evaluation.

1. Data storage
Facilities for storing and retrieving huge amounts of data are an important component of the learning
process. Humans and computers alike utilize data storage as a foundation for advanced
reasoning.
• In a human being, the data is stored in the brain and data is retrieved using electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory and similar devices to store
data and use cables and other technology to retrieve data.
2. Abstraction
The second component of the learning process is known as abstraction.
Abstraction is the process of extracting knowledge about stored data. This involves creating general
concepts about the data as a whole. The creation of knowledge involves application of known models and
creation of new models.
3. Generalization
The third component of the learning process is known as generalisation.
The term generalization describes the process of turning the knowledge about stored data into a form
that can be utilized for future action. These actions are to be carried out on tasks that are similar, but not
identical, to those that have been seen before. In generalization, the goal is to discover those properties of
the data that will be most relevant to future tasks.
4. Evaluation
Evaluation is the last component of the learning process.
It is the process of giving feedback to the user to measure the utility of the learned knowledge. This
feedback is then utilised to effect improvements in the whole learning process.
Applications of machine learning
Application of machine learning methods to large databases is called data mining. In data mining, a large volume of data is processed to construct a simple model with valuable use, for example, having high predictive accuracy.
The following is a list of some of the typical applications of machine learning.
 Image Recognition
 Speech Recognition
 Recommender Systems
 Fraud Detection
 Self-Driving Cars
 Medical Diagnosis

2. Classification
Classification is a supervised machine learning method where the model tries to predict the correct label
of a given input data. In classification, the model is fully trained using the training data, and then it is
evaluated on test data before being used to perform prediction on new unseen data.
 For instance, an algorithm can learn to predict whether a given email is spam or ham (not spam).
Classification Types
There are four main classification types in machine learning:

Binary Classification
In binary classification, the goal is to classify the input into one of two classes or categories. Example: on the basis of the given health conditions of a person, we have to determine whether the person has a certain disease or not.

Multiclass Classification
In multiclass classification, the goal is to classify the input into one of several classes or categories. Example: on the basis of data about different species of flowers, we have to determine which species our observation belongs to.

Multi-Label Classification
The goal is to predict which of several labels a new data point belongs to. This is different from multiclass classification, where each data point can only belong to one class. For example, a multi-label classification algorithm could be used to classify images of animals as belonging to one or more of the categories cat, dog, bird, or fish.

Imbalanced Classification
The goal is to predict whether a new data point belongs to a minority class, even though there are many more examples of the majority class. For example, a medical diagnosis algorithm could be used to predict whether a patient has a rare disease, even though there are many more patients with common diseases.

Classification Algorithms
There are various types of classification algorithms. Some of them are:
Linear Classifiers
Linear models create a linear decision boundary between classes. They are simple and computationally
efficient. Some of the linear classification models are as follows:
 Logistic Regression
 Support Vector Machines having kernel = ‘linear’
 Single-layer Perceptron
 Stochastic Gradient Descent (SGD) Classifier
Non-linear Classifiers
Non-linear models create a non-linear decision boundary between classes. They can capture more complex
relationships between the input features and the target variable. Some of the non-
linear classification models are as follows:
 K-Nearest Neighbours
 Kernel SVM
 Naive Bayes
 Decision Tree Classification
How does Classification Machine Learning Work?
The basic idea behind classification is to train a model on a labeled dataset, where the input data is associated with its corresponding output labels, to learn the patterns and relationships between the input data and output labels. Once the model is trained, it can be used to predict the output labels for new unseen data.
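As a brief illustrative sketch of this train-then-predict workflow (assuming scikit-learn is available; the iris dataset and logistic-regression model are only demonstration choices, not prescribed by the text):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)   # labelled dataset: inputs X, output labels y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=500)   # a linear classifier
model.fit(X_train, y_train)                # learn patterns from the labelled data
print(model.predict(X_test[:5]))           # predicted labels for unseen inputs
print(model.score(X_test, y_test))         # accuracy on the held-out test data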
3. Regression
Regression is a statistical approach used to analyze the relationship between a dependent variable
(target variable) and one or more independent variables (predictor variables). The objective is to determine
the most suitable function that characterizes the connection between these variables.
Regression Types
The main types of regression are:
 Simple Regression
o Used to predict a continuous dependent variable based on a single independent variable.
o Simple linear regression should be used when there is only a single independent variable.
 Multiple Regression
o Used to predict a continuous dependent variable based on multiple independent variables.
o Multiple linear regression should be used when there are multiple independent variables.
 Nonlinear Regression
o Relationship between the dependent variable and independent variable(s) follows a nonlinear
pattern.
o Provides flexibility in modeling a wide range of functional forms.
Regression Algorithms
There are many different types of regression algorithms, but some of the most common include:
 Linear Regression
o Linear regression is one of the simplest and most widely used statistical models. This
assumes that there is a linear relationship between the independent and dependent variables.
This means that the change in the dependent variable is proportional to the change in the
independent variables.
 Polynomial Regression
o Polynomial regression is used to model nonlinear relationships between the dependent
variable and the independent variables. It adds polynomial terms to the linear regression
model to capture more complex relationships.
 Support Vector Regression (SVR)
o Support vector regression (SVR) is a regression algorithm based on the support vector machine (SVM). SVM is primarily used for classification tasks, but it can also be used for regression. SVR works by finding a function (a hyperplane in feature space) that fits the data within a specified error tolerance, penalizing predictions that fall outside it.
 Decision Tree Regression
o Decision tree regression is a type of regression algorithm that builds a decision tree to
predict the target value. A decision tree is a tree-like structure that consists of nodes and
branches. Each node represents a decision, and each branch represents the outcome of that
decision. The goal of decision tree regression is to build a tree that can accurately predict the
target value for new data points.
 Random Forest Regression
o Random forest regression is an ensemble method that combines multiple decision trees to
predict the target value. Ensemble methods are a type of machine learning algorithm that
combines multiple models to improve the performance of the overall model. Random forest
regression works by building a large number of decision trees, each of which is trained on a
different subset of the training data. The final prediction is made by averaging the predictions
of all of the trees.
Characteristics of Regression
 Continuous Target Variable: Regression deals with predicting continuous target variables that
represent numerical values. Examples include predicting house prices, forecasting sales figures, or
estimating patient recovery times.
 Error Measurement: Regression models are evaluated based on their ability to minimize the error
between the predicted and actual values of the target variable. Common error metrics include mean
absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).
 Model Complexity: Regression models range from simple linear models to more complex nonlinear
models. The choice of model complexity depends on the complexity of the relationship between the
input features and the target variable.
 Overfitting and Underfitting: Regression models are susceptible to overfitting and underfitting.
 Interpretability: The interpretability of regression models varies depending on the algorithm used.
Simple linear models are highly interpretable, while more complex models may be more difficult to
interpret.
Examples
 Predicting the age of a person
 Predicting the price of a house
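As a minimal sketch of simple linear regression (assuming scikit-learn and NumPy; the synthetic dataset and its coefficients are invented for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(50, 1))    # a single independent variable
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=1.0, size=50)   # noisy linear target

model = LinearRegression().fit(X, y)    # find the most suitable linear function
print(model.coef_, model.intercept_)    # recovered slope ~3 and intercept ~2
print(model.predict([[4.0]]))           # continuous prediction for a new input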

4. Approaches of machine learning models (Types of Machine Learning)


There are several types of machine learning, each with special characteristics and applications. Some
of the main types of machine learning algorithms are as follows:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
1. Supervised Machine Learning
Supervised learning is when a model gets trained on a “labelled dataset”. Labelled datasets have both input and output parameters. In supervised learning, algorithms learn to map inputs to correct outputs. Both the training and validation datasets are labelled.

Example: Consider a scenario where you have to build an image classifier to differentiate between cats and dogs. If you feed a dataset of labelled dog and cat images to the algorithm, the machine will learn to classify between a dog and a cat from these labelled images. When we input new dog or cat images that it has never seen before, it will use what it has learned and predict whether it is a dog or a cat. This is how supervised learning works, and this particular task is image classification.
There are two main categories of supervised learning that are mentioned below:
 Classification
 Regression
Classification
Classification deals with predicting categorical target variables, which represent discrete classes or
labels. For instance, classifying emails as spam or not spam, or predicting whether a patient has a high
risk of heart disease. Classification algorithms learn to map the input features to one of the predefined
classes.
Here are some classification algorithms:
 Logistic Regression
 Support Vector Machine
 Random Forest
Regression
Regression, on the other hand, deals with predicting continuous target variables, which
represent numerical values. For example, predicting the price of a house based on its size, location,
and amenities, or forecasting the sales of a product. Regression algorithms learn to map the input
features to a continuous numerical value.
Here are some regression algorithms:
 Linear Regression
 Polynomial Regression
 Ridge Regression
Advantages of Supervised Machine Learning
 Supervised Learning models can have high accuracy as they are trained on labelled data.
 The process of decision-making in supervised learning models is often interpretable.
Disadvantages of Supervised Machine Learning
 It has limitations in knowing patterns and may struggle with unseen or unexpected patterns that are
not present in the training data.
 It can be time-consuming and costly as it relies on labeled data only.
Applications of Supervised Learning
 Image classification
 Natural language processing
 Speech recognition
2. Unsupervised Machine Learning
Unsupervised learning is a type of machine learning technique in which an algorithm discovers patterns and relationships using unlabeled data. Unlike supervised learning, unsupervised learning doesn’t involve providing the algorithm with labeled target outputs. The primary goal of unsupervised learning is often to discover hidden patterns, similarities, or clusters within the data, which can then be used for various purposes, such as data exploration, visualization, dimensionality reduction, and more.
Example: Consider that you have a dataset that contains information about purchases made from a shop. Through clustering, the algorithm can group customers with similar purchasing behaviour, revealing customer segments without predefined labels. This type of information can help businesses target customers as well as identify outliers.
There are two main categories of unsupervised learning that are mentioned below:
 Clustering
 Association
Clustering
Clustering is the process of grouping data points into clusters based on their similarity. This
technique is useful for identifying patterns and relationships in data without the need for labeled
examples.
Here are some clustering algorithms:
 K-Means Clustering algorithm
 Mean-shift algorithm
 DBSCAN Algorithm
Association
Association rule learning is a technique for discovering relationships between items in a dataset. It
identifies rules that indicate the presence of one item implies the presence of another item with a specific
probability.
Here are some association rule learning algorithms:
 Apriori Algorithm
 Eclat
 FP-growth Algorithm
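As a minimal sketch of the clustering idea above (assuming scikit-learn; the toy purchase records are invented for illustration):

import numpy as np
from sklearn.cluster import KMeans

# toy, unlabelled purchase records: [amount spent, number of visits]
X = np.array([[10, 1], [12, 2], [11, 1],
              [95, 9], [90, 8], [99, 10]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # discovered groups, e.g. two customer segments
print(kmeans.cluster_centers_)   # the centre of each segment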
Advantages of Unsupervised Machine Learning
 It helps to discover hidden patterns and various relationships between the data.
 Used for tasks such as customer segmentation, anomaly detection, and data exploration.
Disadvantages of Unsupervised Machine Learning
 Without using labels, it may be difficult to predict the quality of the model’s output.
 Cluster Interpretability may not be clear and may not have meaningful interpretations.
Applications of Unsupervised Learning
 Clustering
 Anomaly detection
 Dimensionality reduction
3. Semi-Supervised Learning
Semi-supervised learning is a machine learning approach that sits between supervised and unsupervised learning, so it uses both labelled and unlabelled data. It’s particularly useful when obtaining labeled data is costly, time-consuming, or resource-intensive. Semi-supervised learning is chosen when labelling data requires skills and relevant resources in order to train or learn from it.
We use these techniques when only a small portion of the data is labelled and the large remaining portion is unlabelled. We can use unsupervised techniques to predict labels and then feed these labels to supervised techniques. This technique is mostly applicable to image datasets, where usually not all images are labelled.

Example: Consider that we are building a language translation model; having labeled translations for every sentence pair can be resource-intensive. Semi-supervised learning allows the model to learn from both labeled and unlabeled sentence pairs, making it more accurate. This technique has led to significant improvements in the quality of machine translation services.
Types of Semi-Supervised Learning Methods
 Graph-based semi-supervised learning: This approach uses a graph to represent the relationships
between the data points. The graph is then used to propagate labels from the labeled data points to
the unlabeled data points.
 Label propagation: This approach iteratively propagates labels from the labeled data points to the
unlabeled data points, based on the similarities between the data points.
 Co-training: This approach trains two different machine learning models on different views (feature subsets) of the data. Each model then labels unlabeled examples for the other.
 Self-training: This approach trains a machine learning model on the labeled data and then uses the
model to predict labels for the unlabeled data. The model is then retrained on the labeled data and the
predicted labels for the unlabeled data.
 Generative adversarial networks (GANs): GANs are a type of deep learning algorithm that can be
used to generate synthetic data. GANs can be used to generate unlabeled data for semi-supervised
learning by training two neural networks, a generator and a discriminator.
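As a minimal sketch of the label propagation method listed above (assuming scikit-learn, whose semi-supervised module marks unlabelled points with -1; the dataset and the 70% masking rate are illustrative choices):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelPropagation

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(0)

# hide 70% of the labels; -1 means "unlabelled" to scikit-learn
y_partial = np.copy(y)
y_partial[rng.rand(len(y)) < 0.7] = -1

model = LabelPropagation().fit(X, y_partial)   # spread labels along similarities
print((model.transduction_ == y).mean())       # fraction of labels recovered correctly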
Advantages of Semi- Supervised Machine Learning
 It leads to better generalization as compared to supervised learning, as it takes both labeled and
unlabeled data.
 Can be applied to a wide range of data.
Disadvantages of Semi- Supervised Machine Learning
 Semi-supervised methods can be more complex to implement compared to other approaches.
 It still requires some labeled data that might not always be available or easy to obtain.
Applications of Semi-Supervised Learning
 Image Classification and Object Recognition
 Natural Language Processing (NLP)
 Speech Recognition
 Recommendation Systems.

4. Reinforcement Machine Learning
Reinforcement learning is a learning method in which an agent interacts with the environment by producing actions and discovering errors. Trial and error and delayed reward are the most relevant characteristics of reinforcement learning. In this technique, the model keeps on improving its performance using reward feedback to learn the behavior or pattern. These algorithms are specific to a particular problem, e.g. the Google self-driving car.
Here are some of most common reinforcement learning algorithms:
 Q-learning: Q-learning is a model-free RL algorithm that learns a Q-function, which maps states to
actions. The Q-function estimates the expected reward of taking a particular action in a given state.
 SARSA (State-Action-Reward-State-Action): SARSA is another model-free RL algorithm that
learns a Q-function. However, unlike Q-learning, SARSA updates the Q-function for the action that
was actually taken, rather than the optimal action.
 Deep Q-learning: Deep Q-learning is a combination of Q-learning and deep learning. Deep Q-
learning uses a neural network to represent the Q-function, which allows it to learn complex
relationships between states and actions.
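As a sketch of the Q-learning update rule described above (the tiny 5-state chain world, reward scheme, and hyperparameters are invented for illustration; NumPy assumed):

import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))   # Q-function: value of each (state, action)
alpha, gamma = 0.1, 0.9               # learning rate and discount factor

rng = np.random.RandomState(0)
state = 0
for step in range(2000):
    # epsilon-greedy: mostly exploit the best known action, sometimes explore
    action = rng.randint(n_actions) if rng.rand() < 0.2 else int(Q[state].argmax())
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0   # reward only at the goal
    # core Q-learning update: nudge Q towards reward + discounted best future value
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = 0 if next_state == n_states - 1 else next_state  # restart at the goal

print(Q)   # learned action values; "move right" should dominate in every state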

Example: Consider that you are training an AI agent to play a game like chess. The agent explores different moves and receives positive or negative feedback based on the outcome. Reinforcement learning also finds applications in robotics, where agents learn to perform tasks by interacting with their surroundings.

Types of Reinforcement Machine Learning
There are two main types of reinforcement learning:
Positive reinforcement
 Rewards the agent for taking a desired action.
 Encourages the agent to repeat the behavior.
 Examples: Giving a treat to a dog for sitting, providing a point in a game for a correct answer.
Negative reinforcement
 Removes an undesirable stimulus when the agent takes a desired action.
 Encourages the agent to repeat the behavior that removed the unpleasant stimulus.
 Examples: Turning off a loud buzzer when a lever is pressed, avoiding a penalty by completing a
task.
Advantages of Reinforcement Machine Learning
 It supports autonomous decision-making and is well-suited for tasks that require learning a sequence of decisions, such as robotics and game-playing.
 This technique is preferred for achieving long-term results that are otherwise very difficult to obtain.
Disadvantages of Reinforcement Machine Learning
 Training Reinforcement Learning agents can be computationally expensive and time-consuming.
 Reinforcement learning is not preferable for solving simple problems.
Applications of Reinforcement Machine Learning
Here are some applications of reinforcement learning:
 Game Playing
 Robotics
 Autonomous Vehicles
 Recommendation Systems
 Healthcare
 Natural Language Processing (NLP)

5. Probability
Probability can be calculated as the number of times an event occurs divided by the total number of possible outcomes. Suppose we toss a coin; then the probability of getting a head as the outcome can be calculated with the formula below:
P(H) = Number of ways a head can occur / total number of possible outcomes
P(H) = 1/2
P(H) = 0.5
Where:
P(H) = Probability of a head occurring as the outcome while tossing a coin.
Types of Probability
For a better understanding of probability, it can be categorized into the following types:
Empirical Probability: Empirical Probability can be calculated as the number of times the event occurs
divided by the total number of incidents observed.
Theoretical Probability:Theoretical Probability can be calculated as the number of ways the particular
event can occur divided by the total number of possible outcomes.
Joint Probability: It tells the probability of two random events occurring simultaneously. For independent events:
P(A ∩ B) = P(A) · P(B)
Where:
P(A ∩ B) = Probability of events A and B both occurring
P(A) = Probability of event A
P(B) = Probability of event B
Conditional Probability: It is the probability of event A given that event B has occurred. The probability of an event A conditioned on an event B is denoted and defined as:
P(A|B) = P(A ∩ B)/P(B)
Similarly, P(B|A) = P(A ∩ B)/P(A). We can write the joint probability of A and B as P(A ∩ B) = P(A) · P(B|A), which means: "the chance of both things happening is the chance that the first one happens times the chance that the second happens given that the first has happened."
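A small simulation can verify the conditional-probability relation above (NumPy assumed; the dice events are invented for illustration):

import numpy as np

rng = np.random.RandomState(0)
rolls = rng.randint(1, 7, size=100_000)   # simulate a fair six-sided die

A = rolls >= 5        # event A: the roll is 5 or 6
B = rolls % 2 == 0    # event B: the roll is even

p_a_and_b = np.mean(A & B)   # P(A ∩ B): the roll is 6, true value 1/6
p_b = np.mean(B)             # P(B), true value 1/2
print(p_a_and_b / p_b)       # P(A|B) = P(A ∩ B)/P(B), close to 1/3
print(np.mean(A[B]))         # direct estimate of P(A|B), same value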

6. Linear Algebra
Linear algebra is the branch of mathematics that deals with vector spaces and linear mappings between these
spaces. It encompasses the study of vectors, matrices, linear equations, and their properties.
B. Fundamental Concepts
1. Vectors
 Vectors are quantities that have both magnitude and direction, often represented as arrows in space.
 Example: v = [2, −1, 4]
2. Matrices
 Matrices are rectangular arrays of numbers, arranged in rows and columns.
 Matrices are used to represent linear transformations, systems of linear equations, and data
transformations in machine learning.
 Example: a 3×3 matrix A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
3. Scalars
 Scalars are single numerical values, without direction, magnitude only.
 Scalars are used to scale vectors or matrices through operations like multiplication.
 Example: Let’s consider a scalar k = 3 and a vector v = [2, −1, 4].
Scalar multiplication involves multiplying each component of the vector by the scalar. So, if we multiply the vector v by the scalar k = 3, we get:
k·v = 3·[2, −1, 4] = [3·2, 3·(−1), 3·4] = [6, −3, 12]
C. Operations in Linear Algebra
1. Addition and Subtraction
 Addition and subtraction of vectors or matrices involve adding or subtracting corresponding
elements.
 Example: u = [2, −1, 4], v = [3, 0, −2]

addition: u + v = [2, −1, 4] + [3, 0, −2] = [2+3, −1+0, 4+(−2)] = [5, −1, 2]

subtraction: u − v = [2, −1, 4] − [3, 0, −2] = [2−3, −1−0, 4−(−2)] = [−1, −1, 6]

2. Scalar Multiplication
 Scalar multiplication involves multiplying each element of a vector or matrix by a scalar.
 Example: Consider the scalar k = 3 and a vector v = [2, −1, 4].
Scalar multiplication involves multiplying each component of the vector by the scalar. So, if we multiply the vector v by the scalar k = 3, we get:
k·v = 3·[2, −1, 4] = [3·2, 3·(−1), 3·4] = [6, −3, 12]
3. Dot Product (Vector Multiplication)
 The dot product of two vectors measures the similarity of their directions.
 It is computed by multiplying corresponding elements of two vectors and summing the results.
 Example: Given two vectors u = [u1, u2, u3] and v = [v1, v2, v3], their dot product is calculated as:
u·v = u1·v1 + u2·v2 + u3·v3
4. Cross Product (Vector Multiplication)
 The cross product of two vectors in three-dimensional space produces a vector orthogonal to the
plane containing the original vectors.
 It is used less frequently in machine learning compared to the dot product.
Example: Given two vectors u and v, their cross product u × v is calculated as:
u × v = [u1, u2, u3] × [v1, v2, v3] = [u2v3 − u3v2, u3v1 − u1v3, u1v2 − u2v1]
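The operations above can be checked numerically; a short sketch (assuming NumPy, with the example vectors from this section):

import numpy as np

u = np.array([2, -1, 4])
v = np.array([3, 0, -2])

print(u + v)           # addition       -> [ 5 -1  2]
print(u - v)           # subtraction    -> [-1 -1  6]
print(3 * u)           # scalar product -> [ 6 -3 12]
print(np.dot(u, v))    # dot product    -> 2*3 + (-1)*0 + 4*(-2) = -2
print(np.cross(u, v))  # cross product  -> [ 2 16  3]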

7. Hypothesis
A hypothesis in machine learning is the model’s presumption regarding the connection between the
input features and the result. It is an illustration of the mapping function that the algorithm is attempting to
discover using the training set. To minimize the discrepancy between the expected and actual outputs, the
learning process involves modifying the weights that parameterize the hypothesis.
How does a Hypothesis work?
In most supervised machine learning algorithms, our main goal is to find a possible hypothesis from the hypothesis space that could map the inputs to the proper outputs.

Hypothesis Space (H)
The hypothesis space is the set of all the possible legal hypotheses. This is the set from which the machine learning algorithm determines the single best hypothesis that describes the target function or the outputs.
Hypothesis (h)
A hypothesis is a function that best describes the target in supervised machine learning. The hypothesis that an algorithm comes up with depends upon the data and also upon the restrictions and bias that we have imposed on the data.
The Hypothesis can be calculated as:
y = mx + b
Where,
 y = range
 m = slope of the lines
 x = domain
 b = intercept
To better understand the hypothesis space and hypotheses, consider a coordinate plane showing the distribution of some labelled data, together with some test data points whose outputs we have to determine. We can predict the outcomes by dividing the coordinate plane into regions; the test data then yields results according to the region each point falls in. But note that we could have divided the coordinate plane in other ways as well. The way in which the plane is divided depends on the data, the algorithm and the constraints.
 All these legal possible ways in which we can divide the coordinate plane to predict the outcome of the test data together compose the hypothesis space.
 Each individual possible way is known as a hypothesis.
Hypothesis Space and Representation in Machine Learning
The hypothesis space comprises all possible legal hypotheses that a machine learning algorithm can consider. Hypotheses are formulated based on various algorithms and techniques, including linear regression, decision trees, and neural networks. These hypotheses capture the mapping function transforming input data into predictions.
Hypothesis Formulation and Representation in Machine Learning
Hypotheses in machine learning are formulated based on various algorithms and techniques, each with its
representation. For example:
 Linear Regression: h(X) = θ0 + θ1X1 + θ2X2 + … + θnXn
 Decision Trees: h(X) = Tree(X)
 Neural Networks: h(X) = NN(X)
Hypothesis Evaluation:
The process of machine learning involves not only formulating hypotheses but also evaluating their
performance. Common evaluation metrics include mean squared error (MSE), accuracy, precision, recall,
F1-score, and others. By comparing the predictions of the hypothesis with the actual outcomes on a
validation or test dataset, one can assess the effectiveness of the model.
Hypothesis Testing and Generalization:
Once a hypothesis is formulated and evaluated, the next step is to test its generalization capabilities.
Generalization refers to the ability of a model to make accurate predictions on unseen data.

8. Inductive bias
Inductive bias can be defined as the set of assumptions or biases that a learning algorithm employs to
make predictions on unseen data based on its training data. These assumptions are inherent in the
algorithm’s design and serve as a foundation for learning and generalization.
The inductive bias of an algorithm influences how it selects a hypothesis (a possible explanation or model) from the hypothesis space (the set of all possible hypotheses) that best fits the training data. It helps the algorithm navigate the trade-off between fitting the training data too closely (overfitting) and fitting it too loosely (underfitting), so that it generalizes well to unseen data.
Types of Inductive Bias
Inductive bias can manifest in various forms, depending on the algorithm and its underlying assumptions.
Some common types of inductive bias include:
1. Bias towards simpler explanations: Many machine learning algorithms, such as decision trees and
linear models, have a bias towards simpler hypotheses. They prefer explanations that are more
parsimonious and less complex, as these are often more likely to generalize well to unseen data.
2. Bias towards smoother functions: Algorithms like kernel methods or Gaussian processes have a
bias towards smoother functions. They assume that neighboring points in the input space should have
similar outputs, leading to smooth decision boundaries.
3. Bias towards specific types of functions: Neural networks, for example, have a bias towards
learning complex, nonlinear functions. This bias allows them to capture intricate patterns in the data
but can also lead to overfitting if not regularized properly.
4. Bias towards sparsity: Some algorithms, like Lasso regression, have a bias towards sparsity. They
prefer solutions where only a few features are relevant, which can improve interpretability and
generalization.
Importance of Inductive Bias
Inductive bias is crucial in machine learning as it helps algorithms generalize from limited training data to
unseen data. Without a well-defined inductive bias, algorithms may struggle to make accurate predictions or
may overfit the training data, leading to poor performance on new data.
Understanding the inductive bias of an algorithm is essential for model selection, as different biases may be
more suitable for different types of data or tasks. It also provides insights into how the algorithm is learning
and what assumptions it is making about the data, which can aid in interpreting its predictions and results.
Challenges and Considerations
While inductive bias is essential for learning, it can also introduce limitations and challenges. Biases that are
too strong or inappropriate for the data can lead to poor generalization or biased predictions. Balancing bias
with variance (the variability of predictions) is a key challenge in machine learning, requiring careful tuning
and model selection.

9. Training and test sets
What is Training data?
Training data is used to train the machine learning model, whereas testing data is used to determine the performance of the trained model. Training data is the fuel that powers the model in machine learning, and it is larger than the testing data, because more data leads to more effective predictive models. When a machine learning algorithm receives data from our records, it recognizes patterns and creates a decision-making model.
Algorithms allow a company’s past experience to be used to make decisions. They analyze all previous cases and their results and, using this data, create models to score and predict the outcome of current cases. The more data ML models have access to, the more reliable their predictions get over time.
What is Testing Data?
Testing data is unseen information used to test your machine learning model after it has been created using your training data. It may be used to assess the progress and efficiency of your algorithms’ training as well as to modify or optimize them for better results. Testing data should:
 Represent the original dataset.
 Be large enough to produce reliable projections.
This dataset needs to be “unseen” and recent, because the training data has already been “learned” by your model. By observing how the model performs on fresh test data, you can decide whether it is operating successfully or whether it needs more training data to fulfill your standards. Test data serves as a final, real check on whether an unknown dataset was correctly handled by the trained machine learning algorithm.

Difference between Training data and Testing data

Purpose:
o Training data: The machine-learning model is trained using training data. The more training data a model has, the more accurate predictions it can make.
o Testing data: Testing data is used to evaluate the model’s performance.

Exposure:
o Training data: By using the training data, the model can gain knowledge and become more accurate in its predictions.
o Testing data: Until evaluation, the testing data is not exposed to the model. This guarantees that the model cannot learn the testing data by heart and produce flawless forecasts.

Distribution:
o Training data: The training data distribution should be similar to the distribution of the actual data that the model will encounter in the real world.
o Testing data: The testing data distribution should likewise reflect real-world data, so that evaluation is realistic.

Use:
o Training data: Training data is utilized to fit the model (and, with a validation split, to stop overfitting).
o Testing data: The performance of the model is assessed by making predictions on the testing data and comparing them to the actual labels.

Size:
o Training data: Typically larger.
o Testing data: Typically smaller.

The training data collection procedure consists of three steps:
 Feed – Providing data to a model.
 Define – The model converts training data into feature vectors (numbers corresponding to data features).
 Test – Lastly, you put your model to the test by feeding it test data (unseen data).
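A common way to create the two sets (assuming scikit-learn; the 80/20 split ratio and iris dataset are just typical illustrative choices):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# hold out 20% of the data as the unseen test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)   # (120, 4) (30, 4): training set is larger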
10. Cross-Validation
Cross-validation is a technique for validating model efficiency by training the model on a subset of the input data and testing it on a previously unseen subset of the input data. We can also say that it is a technique to check how a statistical model generalizes to an independent dataset.
The basic steps of cross-validations are:
o Reserve a subset of the dataset as a validation set.
o Provide the training to the model using the training dataset.
o Now, evaluate model performance using the validation set. If the model performs well with the
validation set, perform the further step, else check for the issues.
There are some common methods that are used for cross-validation. These methods are given below:
1. Validation Set Approach
2. Leave-P-out cross-validation
3. Leave one out cross-validation
4. K-fold cross-validation
5. Stratified k-fold cross-validation
Validation Set Approach
In the validation set approach, we divide our input dataset into a training set and a test or validation set. Each subset is given 50% of the dataset.
But it has one big disadvantage: since we use only 50% of the dataset to train the model, the model may fail to capture important information in the dataset. It also tends to give an underfitted model.
Leave-P-out cross-validation
In this approach, p data points are left out of the training data. This means that if there are n data points in the original input dataset, then n−p data points are used as the training set and the p data points as the validation set. This complete process is repeated for all possible samples, and the average error is calculated to determine the effectiveness of the model.
This technique has a disadvantage: it can be computationally expensive for large p.
Leave one out cross-validation
This method is similar to leave-p-out cross-validation, but instead of p we take 1 data point out of the training set. This means that in this approach, for each learning set, only one data point is reserved, and the remaining dataset is used to train the model. This process repeats for each data point. Hence for n samples, we get n different training sets and n test sets. It has the following features:
o In this approach, the bias is minimal because all the data points are used.
o The process is executed n times; hence execution time is high.
o This approach leads to high variation in testing the effectiveness of the model as we iteratively check
against one data point.
K-Fold Cross-Validation
The k-fold cross-validation approach divides the input dataset into K groups of samples of equal size, called folds. For each learning set, the prediction function uses k−1 folds, and the remaining fold is used as the test set. This is a very popular CV approach because it is easy to understand, and the output is less biased than with other methods.
The steps for k-fold cross-validation are:
o Split the input dataset into K groups
o For each group:
o Take one group as the reserve or test data set.
o Use remaining groups as the training dataset
o Fit the model on the training set and evaluate the performance of the model using the test set.
Let's take an example of 5-fold cross-validation. The dataset is grouped into 5 folds. On the 1st iteration, the first fold is reserved for testing the model, and the rest are used to train the model. On the 2nd iteration, the second fold is used to test the model, and the rest are used to train it. This process continues until each fold has been used as the test fold.
Stratified k-fold cross-validation
This technique is similar to k-fold cross-validation with some small changes. This approach works on the stratification concept: it is a process of rearranging the data to ensure that each fold or group is a good representative of the complete dataset. It is one of the best approaches for dealing with bias and variance.
It can be understood with the example of housing prices, where the price of some houses can be much higher than that of others. To tackle such situations, a stratified k-fold cross-validation technique is useful.
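A sketch of 5-fold cross-validation (assuming scikit-learn; the model and dataset are placeholder choices):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=500)

# split the data into 5 folds; each fold serves once as the test set
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)

print(scores)          # one accuracy score per fold
print(scores.mean())   # average performance across the folds

Replacing KFold with scikit-learn's StratifiedKFold gives the stratified variant described above.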

11. Concept of overfitting and underfitting
Overfitting and underfitting are the two main problems that occur in machine learning and degrade the performance of machine learning models.
The main goal of each machine learning model is to generalize well. Here, generalization is the ability of an ML model to provide suitable output for a given set of unknown inputs. It means that after being trained on the dataset, the model can produce reliable and accurate output. Hence, underfitting and overfitting are the two terms that need to be checked to judge the performance of the model and whether it is generalizing well or not.
Before understanding overfitting and underfitting, let's understand some basic terms that will help to understand this topic well:
o Signal: It refers to the true underlying pattern of the data that helps the machine learning model to
learn from the data.
o Noise: Noise is unnecessary and irrelevant data that reduces the performance of the model.
o Bias: Bias is a prediction error that is introduced in the model due to oversimplifying the machine
learning algorithms. Or it is the difference between the predicted values and the actual values.
o Variance: If the machine learning model performs well with the training dataset, but does not
perform well with the test dataset, then variance occurs.
Overfitting
Overfitting occurs when our machine learning model tries to cover all the data points, or more data points than required, in the given dataset. Because of this, the model starts capturing the noise and inaccurate values present in the dataset, and all these factors reduce its efficiency and accuracy. An overfitted model has low bias and high variance.
The chance of overfitting increases the more we train our model.
Overfitting is the main problem that occurs in supervised learning.
Example: The concept of overfitting can be understood from the output of a linear regression model whose fitted curve passes through every data point in the scatter plot. It may look efficient, but in reality it is not, because the goal of a regression model is to find the best-fit trend; a curve that chases every point will generate prediction errors on new data.

How to avoid Overfitting in a Model
Both overfitting and underfitting degrade the performance of a machine learning model, but the more common problem is overfitting, so there are some ways by which we can reduce its occurrence in our model (regularization is sketched in the example after this list):
o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling
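As one example from the list above, regularization can be sketched as follows (assuming scikit-learn and NumPy; the synthetic data, polynomial degree, and alpha value are illustrative assumptions):

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=40)   # noisy nonlinear target

X_poly = PolynomialFeatures(degree=12).fit_transform(X)  # deliberately flexible model
X_tr, X_te, y_tr, y_te = train_test_split(X_poly, y, random_state=0)

plain = LinearRegression().fit(X_tr, y_tr)   # unregularized: prone to overfitting
ridge = Ridge(alpha=1.0).fit(X_tr, y_tr)     # L2 penalty shrinks the coefficients

# the regularized model typically scores better on the unseen test split
print(plain.score(X_te, y_te), ridge.score(X_te, y_te))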
Underfitting
Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data. To avoid overfitting, the feeding of training data can be stopped at an early stage, as a result of which the model may not learn enough from the training data and may fail to find the best fit for the dominant trend in the data.
In the case of underfitting, the model is not able to learn enough from the training data, and hence it has reduced accuracy and produces unreliable predictions.
An underfitted model has high bias and low variance.
Example: We can understand underfitting from the output of a linear regression model whose fitted line is nearly flat while the data shows a clear trend; the model is unable to capture the pattern in the data points.
How to avoid underfitting:
o By increasing the training time of the model.
o By increasing the number of features.

12. Bias and Variance
Bias is the inability of the model that causes a difference or error between the model’s predicted value and the actual value. These differences between the actual or expected values and the predicted values are known as bias error, or error due to bias. Bias is a systematic error that occurs due to wrong assumptions in the machine learning process.
Let Y be the true value of a parameter, and let Ŷ be an estimator of Y based on a sample of data. Then, the bias of the estimator Ŷ is given by:
Bias(Ŷ) = E(Ŷ) − Y
where E(Ŷ) is the expected value of the estimator Ŷ. Bias measures how well the model fits the data.
 Low Bias: Low bias value means fewer assumptions are taken to build the target function. In this
case, the model will closely match the training dataset.
 High Bias: High bias value means more assumptions are taken to build the target function. In this
case, the model will not match the training dataset closely.
Ways to reduce high bias in Machine Learning:
 Use a more complex model: One of the main reasons for high bias is an oversimplified model that cannot capture the complexity of the data. In such cases, we can make our model more complex by increasing the number of hidden layers in the case of a deep neural network, or we can use a more complex model such as polynomial regression for non-linear datasets, a CNN for image processing, or an RNN for sequence learning.
 Increase the number of features: Adding more features to the training dataset will increase the complexity of the model and improve its ability to capture the underlying patterns in the data.
 Reduce regularization of the model: Regularization techniques such as L1 or L2 regularization can help to prevent overfitting and improve the generalization ability of the model. If the model has a high bias, reducing the strength of regularization, or removing it altogether, can help to improve its performance.
 Increase the size of the training data: Increasing the size of the training data can help to reduce
bias by providing the model with more examples to learn from the dataset.
What is Variance?
Variance is the measure of spread in data from its mean position. In machine learning, variance is the amount by which the performance of a predictive model changes when it is trained on different subsets of the training data. More specifically, variance is the variability of the model: how sensitive it is to another subset of the training dataset, i.e. how much it adjusts when trained on a new subset of the training data.
Let Y be the actual values of the target variable, and Ŷ be the predicted values of the target variable. Then the variance of a model can be measured as the expected value of the square of the difference between the predicted values and the expected value of the predicted values:
Variance = E[(Ŷ − E[Ŷ])²]
where E[Ŷ] is the expected value of the predicted values. Here the expected value is averaged over all the training data.
Variance errors are either low or high-variance errors.
 Low variance: Low variance means that the model is less sensitive to changes in the training data
and can produce consistent estimates of the target function with different subsets of data from the
same distribution. This is the case of underfitting when the model fails to generalize on both training
and test data.
 High variance: High variance means that the model is very sensitive to changes in the training data and can result in significant changes in the estimate of the target function when trained on different subsets of data from the same distribution. This is the case of overfitting, when the model performs well on training data but fails to generalize to unseen test data.
Ways to reduce high variance in Machine Learning:
 Cross-validation: By splitting the data into training and testing sets multiple times, cross-validation
can help identify if a model is overfitting or underfitting and can be used to tune hyperparameters to
reduce variance.
 Feature selection: Choosing only the relevant features will decrease the model’s complexity and can reduce the variance error.
 Regularization: We can use L1 or L2 regularization to reduce variance in machine learning models.
 Ensemble methods: Ensemble methods combine multiple models to improve generalization performance. Bagging, boosting, and stacking are common ensemble methods that can help reduce variance and improve generalization performance.
 Simplifying the model: Reducing the complexity of the model, such as decreasing the number of
parameters or layers in a neural network, can also help reduce variance and improve generalization
performance.
 Early stopping: Early stopping is a technique used to prevent overfitting by stopping the training of
the deep learning model when the performance on the validation set stops improving.
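A small experiment makes bias and variance concrete by training the same model class on different training subsets and measuring the spread of its predictions (NumPy assumed; the quadratic target, noise level, and polynomial degrees are invented for illustration):

import numpy as np

rng = np.random.RandomState(0)

def sample_fit(degree):
    # fit a polynomial of the given degree to a fresh noisy sample of y = x^2
    x = rng.uniform(-1, 1, 30)
    y = x ** 2 + rng.normal(scale=0.1, size=30)
    return np.polyfit(x, y, degree)

x0, true_y = 0.5, 0.25   # probe point and its true value
for degree in (1, 2, 9):
    preds = [np.polyval(sample_fit(degree), x0) for _ in range(200)]
    bias = np.mean(preds) - true_y   # Bias(Ŷ) = E(Ŷ) − Y
    var = np.var(preds)              # Variance = E[(Ŷ − E[Ŷ])²]
    print(degree, round(bias, 3), round(var, 5))
# degree 1 shows high bias with low variance; degree 9 shows low bias with
# noticeably higher variance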

13. Regression: Linear Regression


Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a
statistical method that is used for predictive analysis. Linear regression makes predictions for
continuous/real or numeric variables such as sales, salary, age, product price, etc.
The linear regression algorithm shows a linear relationship between a dependent variable (y) and one or more independent variables (x), hence it is called linear regression. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship between the variables.
Mathematically, we can represent a linear regression as:
y = a0 + a1x + ε
Here,
y = Dependent Variable (Target Variable)
x = Independent Variable (Predictor Variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor applied to each input value)
ε = random error
The values of the x and y variables are the training dataset for the linear regression model representation.
Types of Linear Regression
Linear regression can be further divided into two types of the algorithm:
o Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical dependent variable, then
such a Linear Regression algorithm is called Simple Linear Regression.
o Multiple Linear regression:
If more than one independent variable is used to predict the value of a numerical dependent variable,
then such a Linear Regression algorithm is called Multiple Linear Regression.
Linear Regression Line
A linear line showing the relationship between the dependent and independent variables is called
a regression line. A regression line can show two types of relationship:
o Positive Linear Relationship:
If the dependent variable increases on the Y-axis and independent variable increases on X-axis, then
such a relationship is termed as a Positive linear relationship
o Negative Linear Relationship:
If the dependent variable decreases on the Y-axis and independent variable increases on the X-axis,
then such a relationship is called a negative linear relationship.

Finding the best fit line:


When working with linear regression, our main goal is to find the best-fit line, which means that the error between the predicted values and the actual values should be minimized. The best-fit line will have the least error.
Different values for the weights or coefficients of the line (a0, a1) give different regression lines, so we need to calculate the best values for a0 and a1 to find the best-fit line. To calculate this, we use a cost function.
Cost function-
o The different values for weights or coefficient of lines (a0, a1) gives the different line of regression,
and the cost function is used to estimate the values of the coefficient for the best fit line.
o Cost function optimizes the regression coefficients or weights. It measures how a linear regression
model is performing.
o We can use the cost function to find the accuracy of the mapping function, which maps the input
variable to the output variable. This mapping function is also known as Hypothesis function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. For the above linear equation, MSE can be calculated as:
MSE = (1/N) Σ (Yi − (a1xi + a0))²
Where,
N = Total number of observations
Yi = Actual value
(a1xi + a0) = Predicted value
Residuals: The distance between an actual value and the predicted value is called a residual. If the observed points are far from the regression line, the residuals will be high and so will the cost function. If the scatter points are close to the regression line, the residuals will be small and hence so will the cost function.
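The MSE cost and the residuals above can be computed directly; a short sketch (NumPy assumed; the synthetic data and its true coefficients are invented for illustration):

import numpy as np

rng = np.random.RandomState(0)
x = rng.uniform(0, 10, 40)
y = 2.5 * x + 1.0 + rng.normal(scale=2.0, size=40)   # true a1 = 2.5, a0 = 1.0

a1, a0 = np.polyfit(x, y, 1)       # least-squares best-fit line
y_pred = a1 * x + a0
residuals = y - y_pred             # distances from the regression line

mse = np.mean((y - y_pred) ** 2)   # MSE = (1/N) * sum(Yi - (a1*xi + a0))^2
print(a1, a0)                      # recovered slope and intercept
print(mse, np.mean(residuals ** 2))  # the two computations agree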
14. Logistic Regression
o Logistic regression is one of the most popular Machine Learning algorithms, which comes under the
Supervised Learning technique. It is used for predicting the categorical dependent variable using a
given set of independent variables.
o Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome
must be a categorical or discrete value. It can be either Yes or No, 0 or 1, true or False, etc. but
instead of giving the exact value as 0 and 1, it gives the probabilistic values which lie between 0
and 1.
o Logistic Regression is much like Linear Regression except in how they are used. Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.
o In Logistic regression, instead of fitting a regression line, we fit an "S" shaped logistic function,
which predicts two maximum values (0 or 1).
o The curve from the logistic function indicates the likelihood of something such as whether the cells
are cancerous or not, a mouse is obese or not based on its weight, etc.
o Logistic Regression is a significant machine learning algorithm because it has the ability to provide
probabilities and classify new data using continuous and discrete datasets.
o Logistic Regression can be used to classify observations using different types of data and can easily determine the most effective variables for the classification.

Logistic Function (Sigmoid Function):
The sigmoid function is a mathematical function used to map the predicted values to probabilities. It maps any real value into another value within a range of 0 and 1.
The value of the logistic regression must be between 0 and 1 and cannot go beyond this limit, so it forms an “S”-shaped curve. This S-shaped curve is called the sigmoid function or the logistic function.
o In logistic regression, we use the concept of a threshold value, which defines the cutoff between the two classes: values above the threshold tend towards 1, and values below the threshold tend towards 0.
Assumptions for Logistic Regression:
o The dependent variable must be categorical in nature.
o The independent variable should not have multi-collinearity.
Logistic Regression Equation:
The logistic regression equation can be obtained from the linear regression equation. The mathematical steps to get the logistic regression equation are given below:
o We know the equation of the straight line can be written as:
y = b0 + b1x1 + b2x2 + … + bnxn
o In logistic regression, y can be between 0 and 1 only, so let's divide the above equation by (1 − y):
y / (1 − y); this is 0 for y = 0 and infinity for y = 1
o But we need a range between −[infinity] and +[infinity]; taking the logarithm of the equation, it becomes:
log[y / (1 − y)] = b0 + b1x1 + b2x2 + … + bnxn
The above equation is the final equation for Logistic Regression.
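A short sketch of the sigmoid mapping and the threshold idea (NumPy assumed; the sample scores and the 0.5 threshold are the usual illustrative defaults):

import numpy as np

def sigmoid(z):
    # maps any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])   # linear scores b0 + b1*x1 + ...
p = sigmoid(z)                               # probabilistic outputs
print(np.round(p, 3))                        # [0.018 0.269 0.5 0.731 0.982]
print((p >= 0.5).astype(int))                # threshold at 0.5 -> class 0 or 1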


Type of Logistic Regression:
On the basis of the categories, Logistic Regression can be classified into three types:
o Binomial: In binomial Logistic regression, there can be only two possible types of the dependent
variables, such as 0 or 1, Pass or Fail, etc.
o Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as “cat”, “dog”, or “sheep”.
o Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of dependent
variables, such as "low", "Medium", or "High".

Difference between Linear Regression and Logistic Regression

o Linear Regression is used to predict the continuous dependent variable using a given set of independent variables; Logistic Regression is used to predict the categorical dependent variable using a given set of independent variables.
o Linear Regression is used for solving regression problems; Logistic Regression is used for solving classification problems.
o In Linear Regression, we predict the value of continuous variables; in Logistic Regression, we predict the values of categorical variables.
o In Linear Regression, we find the best-fit line, by which we can easily predict the output; in Logistic Regression, we find the S-curve, by which we can classify the samples.
o The least squares estimation method is used to estimate the coefficients in Linear Regression; the maximum likelihood estimation method is used in Logistic Regression.
o The output of Linear Regression must be a continuous value, such as price or age; the output of Logistic Regression must be a categorical value such as 0 or 1, Yes or No, etc.
o In Linear Regression, the relationship between the dependent and independent variables must be linear; in Logistic Regression, a linear relationship between the dependent and independent variables is not required.
o In Linear Regression, there may be collinearity between the independent variables; in Logistic Regression, there should not be collinearity between the independent variables.
