
Unit1 ML

Uploaded by bhuvana
Copyright © All Rights Reserved

UNIT-1 INTRODUCTION TO MACHINE LEARNING 9 8

Review of Linear Algebra for machine learning; Introduction and
motivation for machine learning; Examples of machine learning
applications, Vapnik-Chervonenkis (VC) dimension, Probably
Approximately Correct (PAC) learning, Hypothesis spaces,
Inductive bias, Generalization, Bias variance trade-off.
Explain in detail why one should learn linear algebra before machine learning.

1. Linear Algebra for Machine Learning

• Machine learning has a strong connection with mathematics. Every machine learning
algorithm is built on mathematical concepts, and mathematics also helps one choose the
right algorithm by considering training time, complexity, number of features, etc.
• Linear algebra is the branch of mathematics that studies vectors, matrices, lines,
planes, and the linear mappings (linear transformations) between them.
• Linear algebra is a key foundation of machine learning: expressing computations as
vector and matrix operations lets ML algorithms run efficiently on very large datasets.
• The concepts of linear algebra are widely used in developing machine learning
algorithms. Although linear algebra is used in almost every area of machine learning, it
is specifically used for the following tasks:

o Optimization of data.
o Loss functions, regularisation, covariance matrices, Singular Value Decomposition
(SVD), matrix operations, and support vector machine classification.
o Implementation of linear regression in machine learning.

Downloaded by E.BHUVANESWARI CIT ([email protected])



Besides the above uses, linear algebra is also used in neural networks and the data science
field.

• Basic mathematics principles and concepts like Linear algebra are the foundation of
Machine Learning and Deep Learning systems.
• To learn and understand Machine Learning or Data Science, one needs to be familiar
with linear algebra and optimization theory.

1.1 Why learn Linear Algebra before learning Machine Learning?

Linear algebra is to machine learning what flour is to a bakery. Just as a cake is based on
flour, every machine learning model is based on linear algebra. A cake also needs other
ingredients such as eggs, sugar, cream, and baking soda; similarly, machine learning also
requires concepts such as vector calculus, probability, and optimization theory. Machine
learning builds useful models with the help of all of these mathematical concepts.

Below are some benefits of learning linear algebra before machine learning:

o Better Graphics Experience
o Improved Statistics
o Creating better Machine Learning algorithms
o Estimating the forecast of Machine Learning
o Easy to Learn

Better Graphics Experience:

• Linear algebra enables better graphical processing in machine learning tasks such as
image, audio, and video processing and edge detection.
• These are among the kinds of graphical data that machine learning projects commonly
work with.
• Further, classifiers provided by machine learning algorithms are trained on parts of the
given dataset according to their categories.
• These classifiers also help reduce errors on the training data.

Moreover, linear algebra helps solve and compute with large and complex datasets through
techniques known as matrix decomposition. Two of the most popular matrix decomposition
techniques are:

o QR decomposition
o LU decomposition
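To make the two decompositions concrete, here is a small sketch in Python with NumPy (an assumption; the notes name no library). `np.linalg.qr` is NumPy's own QR routine. The LU part uses a hand-written helper, `lu_doolittle`, which is hypothetical and skips pivoting, so it only works when no zero pivot appears; library versions such as SciPy's `scipy.linalg.lu` pivot rows for numerical stability. The matrix is made up.

```python
import numpy as np

def lu_doolittle(A):
    """Hypothetical helper: Doolittle LU factorization without pivoting, A = L @ U."""
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for i in range(n):
        # Row i of U.
        for j in range(i, n):
            U[i, j] = A[i, j] - L[i, :i] @ U[:i, j]
        # Column i of L below the diagonal (assumes U[i, i] != 0).
        for j in range(i + 1, n):
            L[j, i] = (A[j, i] - L[j, :i] @ U[:i, i]) / U[i, i]
    return L, U

# A small made-up square matrix.
A = np.array([[4.0, 3.0, 2.0],
              [6.0, 3.0, 1.0],
              [2.0, 5.0, 7.0]])

Q, R = np.linalg.qr(A)     # Q orthogonal, R upper triangular
L, U = lu_doolittle(A)     # L unit lower triangular, U upper triangular

print(np.allclose(Q @ R, A))   # True
print(np.allclose(L @ U, A))   # True
```

Both factorizations turn one hard problem (working with A) into easier ones with triangular or orthogonal matrices, which is why they are used to solve large linear systems.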

Improved Statistics:

Statistics is an important tool for organizing and integrating data in machine learning,
and linear algebra helps one understand statistics better. Advanced statistical topics can
be expressed using the methods, operations, and notation of linear algebra.

Creating better Machine Learning algorithms:

Linear Algebra also helps to create better supervised as well as unsupervised Machine
Learning algorithms.

Some supervised learning algorithms that can be built using linear algebra are:

o Logistic Regression
o Linear Regression
o Decision Trees
o Support Vector Machines (SVM)

Further, below are some unsupervised learning techniques that can also be built with the
help of linear algebra:

o Singular Value Decomposition (SVD)
o Clustering
o Principal Component Analysis (PCA)

With the help of linear algebra concepts, you can also tune the various parameters of a
live project yourself and gain the in-depth understanding needed to deliver results with
more accuracy and precision.

Estimating the forecast of Machine Learning:

If you are working on a machine learning project, you need to be broad-minded and able to
bring in more perspectives, so you should build your awareness of and familiarity with
machine learning concepts. You can begin by setting up different graphs and visualizations,
trying various parameters for diverse machine learning algorithms, or taking up problems that
others around you might find difficult to understand.

Easy to Learn:

Linear algebra is an important branch of mathematics that is relatively easy to understand.
It is used whenever advanced mathematics and its applications are required.

1.2 Minimum Linear Algebra for Machine Learning

Notation:

Knowing linear algebra notation enables you to read algorithm descriptions in papers, books,
and websites and understand how the algorithms work. Even if you implement them with
for-loops rather than matrix operations, you will be able to piece things together.

Operations:

Working at a higher level of abstraction with vectors and matrices can make concepts
clearer, and it also helps with describing, coding, and even thinking about algorithms. In
linear algebra, you need to learn basic operations such as addition, multiplication,
inversion, and transposition of matrices and vectors.
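The basic operations listed above can be tried directly on NumPy arrays. This is a minimal sketch with made-up 2x2 matrices chosen so that each result can be checked by hand.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
v = np.array([1.0, 1.0])

S = A + B                  # addition: [[1, 3], [4, 4]]
P = A @ B                  # matrix multiplication: [[2, 1], [4, 3]]
Av = A @ v                 # matrix-vector product: [3, 7]
At = A.T                   # transpose: [[1, 3], [2, 4]]
A_inv = np.linalg.inv(A)   # inverse: [[-2, 1], [1.5, -0.5]]

# A matrix times its inverse gives the identity matrix.
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```

Note that `@` is matrix multiplication, while `*` on NumPy arrays multiplies element by element.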

Matrix Factorization:

One of the most recommended areas of linear algebra is matrix factorization, specifically
matrix decomposition methods such as SVD and QR.

Briefly describe some examples of linear algebra in machine learning.

1.3 Examples of Linear Algebra in Machine Learning

Below are some popular examples of linear algebra in Machine learning:

o Datasets and Data Files
o Linear Regression
o Recommender Systems
o One-hot encoding
o Regularization
o Principal Component Analysis
o Images and Photographs
o Singular-Value Decomposition
o Deep Learning
o Latent Semantic Analysis

1. Datasets and Data Files

• Every machine learning project works on a dataset, and we fit the machine learning
model using this dataset.
• A dataset resembles a table of rows and columns, where each row represents an
observation and each column represents a feature (variable). The dataset is therefore
handled as a matrix, which is a key data structure in linear algebra.
• Further, when the dataset is split into inputs and outputs for a supervised learning
model, the inputs form a matrix (X) and the outputs form a vector (y); the vector is also
an important concept of linear algebra.
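A minimal sketch of this matrix/vector split, assuming a small made-up numeric table whose last column is the target variable; NumPy slicing separates the input matrix X from the output vector y.

```python
import numpy as np

# Made-up dataset: each row is an observation, each column a feature;
# the last column is assumed to be the target we want to predict.
data = np.array([
    [1.0, 20.0, 3.0],
    [2.0, 21.0, 5.0],
    [3.0, 19.0, 7.0],
    [4.0, 22.0, 9.0],
])

X = data[:, :-1]   # input matrix, shape (n_samples, n_features)
y = data[:, -1]    # output vector, shape (n_samples,)
print(X.shape, y.shape)   # (4, 2) (4,)
```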

2. Images and Photographs

• In machine learning, images and photographs are used for computer vision applications.
Each image is an example of a matrix from linear algebra, because an image is a table of
pixel values with as many rows as its height and as many columns as its width.
• Moreover, operations on images, such as cropping, scaling, and resizing, are performed
using the notation and operations of linear algebra.
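Treating a grayscale image as a NumPy matrix makes cropping, simple scaling, and flipping a matter of array slicing. The 4x6 image below is synthetic, and the "scaling" here only keeps every second pixel; real resizing usually interpolates pixel values, for example with an image library.

```python
import numpy as np

# Synthetic 4x6 grayscale image: 4 rows (height) by 6 columns (width) of pixel intensities.
image = (np.arange(24).reshape(4, 6) * 10).astype(np.uint8)
height, width = image.shape          # 4, 6

crop = image[1:3, 2:5]               # rows 1-2, columns 2-4 -> shape (2, 3)
half = image[::2, ::2]               # keep every 2nd row and column -> shape (2, 3)
mirrored = image[:, ::-1]            # horizontal flip: reverse the column order

print(crop.shape, half.shape)        # (2, 3) (2, 3)
```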

3. One Hot Encoding

• In machine learning, we sometimes need to work with categorical data. Categorical
variables are encoded to make them simpler to work with, and a popular technique for
encoding them is one-hot encoding.
• In one-hot encoding, a table is created with one column for each category and one row
for each example in the dataset. Each row is encoded as a binary vector that contains a
one in the column of its category and zeros everywhere else. This is an example of a
sparse representation, a topic studied in linear algebra.
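A short sketch of one-hot encoding with NumPy, using a made-up colour column. Categories are ordered alphabetically here purely as a convention; libraries such as scikit-learn provide ready-made encoders, but this version shows the underlying matrix directly.

```python
import numpy as np

colors = ["red", "green", "blue", "green", "red"]   # made-up categorical column

categories = sorted(set(colors))                     # ['blue', 'green', 'red']
index = {c: i for i, c in enumerate(categories)}
codes = np.array([index[c] for c in colors])         # [2, 1, 0, 1, 2]

# Row r of the identity matrix is the one-hot vector for category r.
one_hot = np.eye(len(categories), dtype=int)[codes]
print(one_hot)   # one row per example, one column per category
```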

4. Linear Regression

• Linear regression is a popular machine learning technique borrowed from statistics.
• It describes the relationship between input and output variables and is used in
machine learning to predict numerical values.
• Linear regression problems are most commonly solved with least-squares optimization,
which in turn is solved with the help of matrix factorization methods.
• Commonly used matrix factorization methods include LU decomposition and
singular-value decomposition, both of which are concepts from linear algebra.
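The sketch below fits y = mx + c to a few made-up points lying exactly on y = 2x + 1, once with `np.linalg.lstsq` (which solves the least-squares problem with an SVD-based LAPACK routine) and once through a QR factorization of the design matrix. The data are an assumption chosen so the answer is easy to verify.

```python
import numpy as np

# Made-up data lying exactly on y = 2x + 1, so the fitted m and c are easy to check.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: one column for x and a column of ones for the intercept c.
X = np.column_stack([x, np.ones_like(x)])

# 1) Least squares via NumPy.
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
m, c = coef

# 2) The same problem via QR: X = Q R, then solve R @ coef = Q.T @ y.
Q, R = np.linalg.qr(X)
coef_qr = np.linalg.solve(R, Q.T @ y)

print(round(m, 6), round(c, 6))   # 2.0 1.0
```

Working through a factorization, rather than forming and inverting X.T @ X directly, is generally the more numerically stable choice.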

5. Regularization

• In machine learning, we usually look for the simplest possible model that achieves the
best outcome for the specific problem.
• Simpler models tend to generalize better from the specific training examples to unseen
data. Simpler models are often taken to be models with smaller coefficient values.
• A technique used to minimize the size of a model's coefficients while it is being fit
on data is known as regularization.
• Common regularization techniques are L1 and L2 regularization.
• Both of these forms of regularization are, in fact, a measure of the magnitude (length)
of the coefficient vector, and they are lifted directly from a linear algebra concept
called the vector norm.
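The L1 and L2 norms behind these penalties can be computed with `np.linalg.norm`. The coefficient vector is made up, and `penalized_loss` is a hypothetical helper showing how a lasso-style (L1) or ridge-style (squared L2) penalty, weighted by a strength `lam`, is added to a data loss.

```python
import numpy as np

w = np.array([3.0, -4.0, 0.0])        # made-up model coefficients

l1 = np.linalg.norm(w, ord=1)         # |3| + |-4| + |0| = 7.0
l2 = np.linalg.norm(w)                # sqrt(9 + 16 + 0) = 5.0 (L2 is the default)

def penalized_loss(data_loss, w, lam=0.1, kind="l2"):
    """Hypothetical helper: data loss plus an L1 (lasso) or squared-L2 (ridge) penalty."""
    if kind == "l1":
        return data_loss + lam * np.linalg.norm(w, ord=1)
    return data_loss + lam * np.linalg.norm(w) ** 2

print(l1, l2)   # 7.0 5.0
```

Larger coefficients make the penalty grow, so minimizing the penalized loss pushes the model towards smaller coefficient values.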

6. Principal Component Analysis

• Many datasets contain thousands of features, and fitting a model to such a
high-dimensional dataset is one of the most challenging tasks in machine learning.
• Moreover, a model built with irrelevant features is usually less accurate than a model
built with relevant features.
• Several methods in machine learning automatically reduce the number of columns of a
dataset; these methods are known as dimensionality reduction.
• The most commonly used dimensionality reduction method in machine learning is
Principal Component Analysis, or PCA.
• This technique projects high-dimensional data onto fewer dimensions, both for
visualization and for training models. PCA uses matrix factorization methods from
linear algebra.
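Below is a minimal PCA sketch that centres the data and computes the components from the eigen-decomposition of the covariance matrix with `np.linalg.eigh` (an equivalent route uses the SVD of the centred data). The data points are made up and lie close to the line y = x, so almost all of the variance ends up on the first component.

```python
import numpy as np

def pca(X, n_components):
    """Project X (rows = samples) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)        # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh: symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort by variance, largest first
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components, eigvals[order]

# Made-up 2-D data lying close to the line y = x.
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]])
Z, variances = pca(X, n_components=1)
print(Z.shape)          # (5, 1)
```

Each eigenvalue is the variance of the data along its principal direction, which is why keeping the directions with the largest eigenvalues keeps most of the information.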

7. Singular-Value Decomposition

• Singular-value decomposition, abbreviated SVD, is another popular dimensionality
reduction technique.
• It is a matrix factorization method from linear algebra, and it is widely used in
applications such as feature selection, visualization, noise reduction, and many more.
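A minimal sketch of SVD with NumPy on a made-up 2x3 matrix: `np.linalg.svd` returns U, the singular values s, and V transposed (as `Vt`). Keeping only the largest k singular values gives a rank-k approximation, and projecting the rows onto the top k right singular vectors gives the reduced k-dimensional representation.

```python
import numpy as np

# Made-up 2x3 matrix.
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# Thin SVD: A = U @ diag(s) @ Vt, singular values in s sorted largest first.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_full = U @ np.diag(s) @ Vt                    # reconstructs A (up to rounding)

# Keep only the largest k singular values.
k = 1
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]     # best rank-k approximation of A
Z = A @ Vt[:k, :].T                             # each row reduced to k dimensions

print(np.round(s ** 2, 6))                      # [12. 10.]
```

The approximation error (Frobenius norm of A - A_k) equals the discarded singular value, here the square root of 10.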

8. Latent Semantic Analysis

• Natural Language Processing, or NLP, is a subfield of machine learning that works with
text and spoken words.
• NLP often represents text documents as large matrices of word occurrences.
• For example, the matrix columns may correspond to the known vocabulary words and the
rows to sentences, paragraphs, pages, etc., with each cell holding the count or frequency
of the number of times the word occurred.
• This is a sparse matrix representation of text. Documents processed in this way are much
easier to compare, query, and use as the basis for a supervised machine learning
model.

• Factorizing such a matrix (typically with SVD) to uncover latent topics is called Latent
Semantic Analysis, or LSA for short, and is also known as Latent Semantic Indexing, or LSI.
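The sketch below builds a document-term count matrix for a tiny made-up corpus and applies a truncated SVD, which is the core step of LSA. Real systems usually weight the counts (for example with TF-IDF) and use sparse matrices; raw counts in a dense NumPy array keep the example small. The helper `cosine` is illustrative.

```python
import numpy as np

# Made-up toy corpus: two documents about pets, two about markets.
docs = [
    "cat sat on the mat",
    "dog sat on the log",
    "stocks fell as markets slid",
    "markets rallied and stocks rose",
]

vocab = sorted({w for d in docs for w in d.split()})
col = {w: j for j, w in enumerate(vocab)}

# Document-term count matrix: one row per document, one column per word.
counts = np.zeros((len(docs), len(vocab)))
for i, d in enumerate(docs):
    for w in d.split():
        counts[i, col[w]] += 1

# LSA: a truncated SVD keeps k latent "topics".
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
doc_topics = U[:, :k] * s[:k]   # each document as a k-dimensional vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(round(cosine(doc_topics[0], doc_topics[1]), 3))   # 1.0
```

In the reduced space the two pet documents point in the same direction, while the pet and market documents are nearly orthogonal.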

9. Recommender System

• Recommender systems are a sub-field of machine learning: a predictive modelling
problem that provides recommendations of products.
• Examples include online book recommendations based on a customer's previous purchase
history, and the movie and TV series recommendations we see on Amazon and Netflix.
• The development of recommender systems is mainly based on linear algebra methods.
• A simple example is calculating the similarity between sparse customer-behaviour
vectors using distance measures such as Euclidean distance or dot products.
• Matrix factorization methods such as singular-value decomposition are used in
recommender systems to query, search, and compare user data.
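A minimal user-based sketch, assuming a made-up ratings matrix in which 0 means "not rated": cosine similarity (a normalized dot product) finds the most similar user, and items that user rated but the target user has not are suggested. Production recommenders are far more elaborate; this only shows the vector arithmetic.

```python
import numpy as np

# Made-up user-item ratings: rows = users, columns = items, 0 = not rated.
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
], dtype=float)

def cosine_similarity(a, b):
    """Dot product of two vectors divided by the product of their lengths."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

target = 0
others = [u for u in range(len(ratings)) if u != target]
sims = {u: cosine_similarity(ratings[target], ratings[u]) for u in others}
most_similar = max(others, key=lambda u: sims[u])

# Suggest items the most similar user rated but the target user has not.
recommend = [i for i in range(ratings.shape[1])
             if ratings[target, i] == 0 and ratings[most_similar, i] > 0]
print(most_similar, recommend)   # 1 [3]
```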

10. Deep Learning

• Artificial Neural Networks, or ANNs, are non-linear ML algorithms inspired by the way
the brain processes information, passing information from one layer to the next.
• Deep learning studies these neural networks, using newer and faster hardware to train
larger networks on huge datasets.
• Deep learning methods achieve great results on many challenging tasks, such as
machine translation and speech recognition.
• At their core, neural networks process data using linear algebra data structures, which
are multiplied and added together.
• Deep learning algorithms also work with vectors, matrices, and tensors (arrays with more
than two dimensions) of inputs and coefficients.
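To show how a layer multiplies and adds, here is a forward pass through a tiny two-layer network in NumPy. The weights, biases, and the ReLU activation are assumptions chosen for illustration; a trained network would learn its weights from data.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dense_forward(x, W, b, activation=relu):
    """One fully connected layer: multiply by W, add b, apply a non-linearity."""
    return activation(W @ x + b)

x = np.array([1.0, 2.0])                  # input vector with 2 features
W1 = np.array([[0.5, -1.0],
               [1.0, 1.0],
               [-0.5, 0.25]])             # made-up weights, shape (3, 2)
b1 = np.array([0.0, -1.0, 0.5])
W2 = np.array([[1.0, 2.0, -1.0]])         # made-up weights, shape (1, 3)
b2 = np.array([0.5])

h = dense_forward(x, W1, b1)                             # hidden layer: [0, 2, 0.5]
out = dense_forward(h, W2, b2, activation=lambda z: z)   # linear output layer
print(out)   # [4.]
```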

Explain machine learning concepts in detail, covering the different types of learning.

Discuss with examples some useful applications of machine learning.

Differentiate supervised, unsupervised and reinforcement learning.

2. Introduction to Machine Learning

• Machine learning is the field of study that allows computers to learn without being
explicitly programmed.

• With machine learning, we don't need to give computers explicit instructions for
reacting to particular situations.
• Instead, we train computers to find solutions to specific problems.
• Chess is a famous example in which machine learning is used to play the game.
• The code lets the machine learn and optimize itself over repeated games.

Machine learning is broadly classified into two main categories; reinforcement learning,
covered in Section 2.3, is often treated as a third.

1. Supervised Machine Learning
2. Unsupervised Machine Learning

2.1 Supervised Learning

• Supervised learning is like having a trainer or teacher who supervises the machine's
responses and shows it step-by-step solutions to specific problems.
• It's a hand-holding way of teaching computers what to do.
• A real-world example of supervised learning is recognizing different types of images
using computers.
• Humans learn in a similar way: we are taught to recognize objects such as a car through
repeated exposure. Machines are taught in the same way.
• We feed a machine a set of images in which each image has a label identifying its type.
• The computer is trained so that whenever a particular combination of pixels appears, it
can recognize the type of image it learned from the dataset.
• Supervised learning lets the computer learn from previous exposures; for example, if a
computer has seen car images labelled as cars, it should next time be able to identify a
different image of a car by recognizing many features similar to those of the cars it saw
before.
• When we train a machine learning model for image recognition, we present many images,
each attached to a label, so that the data the model learns from is clearly labelled.
• Once training is complete, we present an image of an object that was not part of the
training data, and the machine should be able to identify it based on what it has
learned.
• This is the most fundamental type of supervised learning, called classification.
• Our machine learning model must be able to classify the different groups of images.
• It must be designed so that different objects can be recognized by their unique
characteristics.
• However, we can create a generic classifier that does not depend on one particular set
of training data, so we don't need to recode the entire model when the training data
changes.
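As a hedged illustration of classification, the sketch below replaces raw images with made-up two-number feature vectors and uses a 1-nearest-neighbour rule (a technique chosen here for simplicity, not one named in the notes): an unseen example gets the label of the closest labelled training example. Because `predict_1nn` takes the training data as arguments, the same code works unchanged when the training data changes.

```python
import numpy as np

# Made-up labelled training data: two numeric features per example.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.5]])
y_train = np.array(["car", "car", "truck", "truck"])

def predict_1nn(X_train, y_train, x):
    """Return the label of the training example closest to x (Euclidean distance)."""
    distances = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(distances)]

print(predict_1nn(X_train, y_train, np.array([1.1, 0.9])))   # car
print(predict_1nn(X_train, y_train, np.array([4.8, 5.2])))   # truck
```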

2.2 Unsupervised Learning

• In supervised learning, a specific labelled dataset is loaded into computers, which
learn through repeated exposure to it.
• In unsupervised learning, instead of providing training data in which every item is
clearly labelled, we provide unlabelled, unstructured training data.
• We want the model to explore the dataset so that it learns to find structure in the
unstructured data.
• In other words, in unsupervised learning we don't tell computers what kind of data they
are looking at.
• Instead, we want the computers to discover the structure of the data by observing how
the data is organized.
• One type of unsupervised learning is clustering, in which the computer looks at the
dataset and its features and figures out the separate clusters into which the data
falls.
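Clustering can be sketched with k-means, one common clustering algorithm (the notes do not name a specific one). The hypothetical `kmeans` function below alternates between assigning points to their nearest centroid and moving each centroid to the mean of its points; the six 2-D points are made up and form two obvious groups.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Hypothetical minimal k-means: returns a cluster label per row of X and the centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # k distinct starting points
    for _ in range(n_iters):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if a cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Made-up unlabelled 2-D points forming two obvious groups.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
labels, centroids = kmeans(X, k=2)
print(labels)
```

No labels are given to the algorithm; it discovers the two groups only from the structure of the data.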

2.3 Reinforcement Learning

• We have covered supervised learning, in which we load labelled training data into
machine learning models so that the computers can classify the data or perform
regression on it.
• We have also covered unsupervised learning, in which we load an unlabelled dataset that
falls into separate clusters, and we want computers to be smart enough to identify those
clusters.
• As humans, we have a lot of experience with reinforcement learning; we tend to learn
through reinforcement.
• For example, if we drive along a route that is full of traffic jams, we avoid that route
on other days.
• We generally encounter two kinds of reinforcement: 1. positive reinforcement and
2. negative reinforcement.
• Reinforcement learning algorithms let machines learn in the same way.
• A real-world example of reinforcement learning is a chess game in which the computer,
using a reinforcement learning algorithm, calculates its winning probability with every
move.
• The computer may receive positive as well as negative reinforcement with every single
move.
• Through many cycles of training and by playing more and more games, the computer learns
which moves in which situations increase its winning percentage.
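The traffic example can be turned into a tiny reinforcement-learning sketch. The rewards are assumptions (-1 for the jammed route as negative reinforcement, +1 for the clear route as positive reinforcement), and the epsilon-greedy rule, which mostly picks the best-known route but occasionally explores, is one simple strategy among many, not the method used by chess programs.

```python
import random

def reward(route):
    """Hypothetical reward: route 0 is jammed (negative), route 1 is clear (positive)."""
    return -1.0 if route == 0 else 1.0

def learn_route(n_days=100, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    values = [0.0, 0.0]   # average reward observed for each route
    counts = [0, 0]       # how many times each route was taken
    for day in range(n_days):
        if day < 2:
            route = day                                     # try each route once first
        elif rng.random() < epsilon:
            route = rng.randrange(2)                        # explore
        else:
            route = max(range(2), key=lambda r: values[r])  # exploit best-known route
        r = reward(route)
        counts[route] += 1
        values[route] += (r - values[route]) / counts[route]  # incremental average
    return values, counts

values, counts = learn_route()
best_route = max(range(2), key=lambda r: values[r])
print(values, best_route)   # [-1.0, 1.0] 1
```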
3. Applications of Machine learning

Machine learning is a buzzword in today's technology, and it is growing very rapidly. We use
machine learning in our daily lives, often without knowing it, in applications such as Google
Maps, Google Assistant, and Alexa. Below are some of the most popular real-world applications
of machine learning:

1. Image Recognition:

Image recognition is one of the most common applications of machine learning. It is used to
identify objects, persons, places, digital images, etc. A popular use case of image
recognition and face detection is automatic friend-tagging suggestions:

Facebook provides an automatic friend-tagging suggestion feature. Whenever we upload a photo
with our Facebook friends, we automatically get tagging suggestions with their names, and the
technology behind this is machine learning's face detection and recognition algorithm.

It is based on the Facebook project named "DeepFace," which is responsible for face
recognition and person identification in pictures.

2. Speech Recognition

While using Google, we get a "Search by voice" option; this falls under speech recognition,
a popular application of machine learning.

Speech recognition is the process of converting voice instructions into text, and it is also
known as "speech to text" or "computer speech recognition." At present, machine learning
algorithms are widely used in speech recognition applications. Google Assistant, Siri,
Cortana, and Alexa use speech recognition technology to follow voice instructions.

3. Traffic prediction:

If we want to visit a new place, we use Google Maps, which shows us the correct path with
the shortest route and predicts the traffic conditions.

It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily
congested, using two sources of information:

o Real-time location of vehicles from the Google Maps app and sensors
o Average time taken on past days at the same time of day

Everyone who uses Google Maps helps make the app better: it collects information from users
and sends it back to its database to improve performance.

4. Product recommendations:

Machine learning is widely used by e-commerce and entertainment companies such as Amazon and
Netflix to recommend products to users. Whenever we search for a product on Amazon, we start
seeing advertisements for the same product while browsing the internet in the same browser,
and this is because of machine learning.

Google understands user interests using various machine learning algorithms and suggests
products accordingly.

Similarly, when we use Netflix, we see recommendations for series, movies, etc., and this is
also done with the help of machine learning.

5. Self-driving cars:

One of the most exciting applications of machine learning is self-driving cars, in which
machine learning plays a significant role. Tesla, one of the best-known car manufacturers, is
working on self-driving cars. It is using an unsupervised learning method to train its car
models to detect people and objects while driving.

6. Email Spam and Malware Filtering:

Whenever we receive a new email, it is automatically filtered as important, normal, or spam.
Important mail arrives in our inbox marked with the important symbol, and spam emails land in
our spam box; the technology behind this is machine learning. Below are some spam filters
used by Gmail:

o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters

Machine learning algorithms such as the Multi-Layer Perceptron, decision trees, and the
Naïve Bayes classifier are used for email spam filtering and malware detection.

7. Virtual Personal Assistant:

There are various virtual personal assistants, such as Google Assistant, Alexa, Cortana, and
Siri. As the name suggests, they help us find information using voice instructions. These
assistants can help us in various ways just through voice instructions, such as playing
music, calling someone, opening an email, or scheduling an appointment.

Machine learning algorithms are an important part of these virtual assistants.

These assistants record our voice instructions, send them to a server in the cloud, decode
them using ML algorithms, and act accordingly.

8. Online Fraud Detection:

Machine learning makes our online transactions safe and secure by detecting fraudulent
transactions. Whenever we perform an online transaction, fraud can take place in various
ways, such as fake accounts, fake IDs, or money being stolen in the middle of a transaction.
To detect this, a feed-forward neural network can check whether a transaction is genuine or
fraudulent.

For each genuine transaction, the output is converted into hash values, and these values
become the input for the next round. Genuine transactions follow a specific pattern that
changes for a fraudulent transaction; the network detects this change, which makes our online
transactions more secure.

9. Stock Market trading:

Machine learning is widely used in stock market trading. In the stock market there is always
a risk of ups and downs in share prices, so machine learning models such as long short-term
memory (LSTM) neural networks are used to predict stock market trends.

10. Medical Diagnosis:

In medical science, machine learning is used for disease diagnosis. With it, medical
technology is growing very fast and can build 3D models that predict the exact position of
lesions in the brain.

It helps in finding brain tumors and other brain-related diseases easily.

11. Automatic Language Translation:

Nowadays, if we visit a new place and do not know the language, it is not a problem at all,
because machine learning helps us by converting text into languages we know. Google's GNMT
(Google Neural Machine Translation) provides this feature; it is a neural machine translation
system that translates text into our familiar language, which is called automatic
translation.

The technology behind automatic translation is a sequence-to-sequence learning algorithm,
which is used together with image recognition to translate text from one language to
another.

Give a detailed note on hypotheses in machine learning and statistics.

4. Hypothesis in Machine Learning

• The hypothesis is a common term in Machine Learning and data science projects.
• As we know, machine learning is one of the most powerful technologies across the
world, which helps us to predict results based on past experiences.
• Moreover, data scientists and ML professionals conduct experiments that aim to solve
a problem.
• These ML professionals and data scientists make an initial assumption for the
solution of the problem.
• This assumption in Machine learning is known as Hypothesis.
• In machine learning, the terms hypothesis and model are often used
interchangeably.
• However, a Hypothesis is an assumption made by scientists, whereas a model is a
mathematical representation that is used to test the hypothesis.

What is Hypothesis?

A hypothesis is defined as a supposition or proposed explanation based on insufficient
evidence or assumptions. It is a guess based on some known facts that has not yet been
proven. A good hypothesis is testable, meaning it can turn out to be either true or false.

Example: Let's understand the hypothesis with a common example. A scientist claims that
ultraviolet (UV) light can damage the eyes, and we suppose that it may therefore also cause
blindness.

In this example, the scientist only claims that UV rays are harmful to the eyes, while we
assume they may cause blindness. This may or may not be true; assumptions of this kind are
called hypotheses.

4.1 Hypothesis in Machine Learning (ML)

The hypothesis is one of the commonly used concepts of statistics in Machine Learning. It is
specifically used in Supervised Machine learning, where an ML model learns a function that
best maps the input to corresponding outputs with the help of an available dataset.

In supervised learning techniques, the main aim is to determine, out of the hypothesis
space, the hypothesis that best maps inputs to the corresponding correct outputs.

There are some common terms used when finding a hypothesis in the hypothesis space, where
the hypothesis space is represented by uppercase H and a hypothesis by lowercase h. These
are defined as follows:

Hypothesis space (H):

The hypothesis space is defined as the set of all possible legal hypotheses; hence it is also
known as the hypothesis set. Supervised machine learning algorithms search it to determine
the hypothesis that best describes the target function, that is, the one that best maps
inputs to outputs.

It is often constrained by choice of the framing of the problem, the choice of model, and the
choice of model configuration.

Hypothesis (h):

It is defined as the approximate function that best describes the target in supervised machine
learning algorithms. It is primarily based on data as well as bias and restrictions applied to
data.

Hence hypothesis (h) can be concluded as a single hypothesis that maps input to proper
output and can be evaluated as well as used to make predictions.

The hypothesis (h) can be formulated in machine learning as follows:

y = mx + c

Where,

y: output (a value in the range)

m: slope of the line, that is, the change in y divided by the change in x

x: input (a value in the domain)

c: intercept (constant)
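To connect the formula with the idea of a hypothesis space, the sketch below uses a finite H made of lines y = mx + c (m the slope, c the intercept) whose m and c lie on a coarse grid, and picks the h in H with the lowest mean squared error on some made-up training points. Real learning algorithms search much larger, often continuous, hypothesis spaces, usually with optimization rather than brute force.

```python
import numpy as np

# Made-up training data that roughly follows y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# A finite hypothesis space H: every line y = m*x + c with m and c on a coarse grid.
H = [(m, c) for m in np.arange(0.0, 4.01, 0.5) for c in np.arange(-2.0, 2.01, 0.5)]

def mse(h, x, y):
    """Mean squared error of hypothesis h = (m, c) on the data."""
    m, c = h
    return np.mean((m * x + c - y) ** 2)

# The learner selects the hypothesis h in H with the lowest training error.
best_h = min(H, key=lambda h: mse(h, x, y))
print(len(H), float(best_h[0]), float(best_h[1]))   # 81 2.0 1.0
```

Each (m, c) pair is one hypothesis h, the list H is the hypothesis space, and the grid limits are a restriction on H chosen by whoever frames the problem.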

Example: Let's understand the hypothesis (h) and hypothesis space (H) with a two-dimensional
coordinate plane showing the distribution of data as follows:

Now, assume we have some test data for which ML algorithms predict the outputs as
follows:

If we divide this coordinate plane in such a way that it helps predict the output or
result, we get the following:

Based on the given test data, the output will be as follows:

However, depending on the data, algorithm, and constraints, this coordinate plane can also
be divided in the following ways:

From the above example, we can conclude that:

The hypothesis space (H) is the set of all legal ways of dividing the coordinate plane so
that inputs are mapped to proper outputs.

Further, each individual way is called a hypothesis (h). Hence, the hypothesis and
hypothesis space would look like this:

4.2 Hypothesis in Statistics

As in machine learning, a hypothesis in statistics is an assumption about the outcome.
However, it is falsifiable, which means it can be rejected in the presence of sufficient
evidence.

Unlike in machine learning, a hypothesis in statistics is never simply accepted as proven,
because conclusions are based on probability: a test can only reject a hypothesis or fail
to reject it. Before starting an experiment, we must be aware of two important types of
hypotheses:

o Null Hypothesis: A null hypothesis is a statistical hypothesis stating that no
statistically significant effect exists in the given set of observations. It is also
known as a conjecture and is used in quantitative analysis to test theories about
markets, investment, and finance and to decide whether an idea is true or false.
o Alternative Hypothesis: An alternative hypothesis directly contradicts the null
hypothesis, which means that if one of the two hypotheses is true, the other must be
false. In other words, an alternative hypothesis is a statistical hypothesis stating that
some significant effect exists in the given set of observations.

Significance level

• The significance level (α) is the first thing that must be set before starting an
experiment.
• It defines the tolerated probability of error, that is, the level at which an effect is
considered significant.
• A common choice is a significance level of 0.05 (5%), which corresponds to a 95%
confidence level.
• The significance level also sets the critical or threshold value. For example, if an
experiment uses a 98% confidence level, the threshold for the p-value is 0.02 (2%).

P-value

• The p-value in statistics is a measure of the evidence against a null hypothesis.
• More precisely, the p-value is the probability, assuming the null hypothesis is true, of obtaining data at least as extreme as the data actually observed.
• The smaller the p-value, the stronger the evidence against the null hypothesis, and the more reason there is to reject it in testing.
• It is a probability, so it is always a number between 0 and 1, usually written in decimal form, such as 0.035.
• Whenever a statistical test is carried out on a sample to find the p-value, the decision always depends on comparing it with the significance level α (the critical value) chosen in advance.
• If the p-value is less than α, the effect is significant and the null hypothesis can be rejected.
• Further, if it is greater than α, there is no significant effect, and hence we fail to reject the null hypothesis.
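A minimal sketch of the decision rule above, assuming a two-sided one-sample z-test with a known population standard deviation (one common test; the notes do not specify one). The observed means, `mu0`, `sigma` and `n` are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided one-sample z-test; assumes the population std. dev. sigma is known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Probability, under H0, of a |z| at least as extreme as the observed one.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def decide(p_value, alpha=0.05):
    """Compare the p-value with the significance level chosen before the test."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical data: H0 says the mean is 100; sigma = 10; n = 100 observations.
for observed_mean in (103.0, 101.0):
    z, p = z_test_p_value(observed_mean, mu0=100.0, sigma=10.0, n=100)
    print(f"mean={observed_mean} z={z:.2f} p={p:.4f} -> {decide(p)}")
# mean=103.0 z=3.00 p=0.0027 -> reject H0
# mean=101.0 z=1.00 p=0.3173 -> fail to reject H0
```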

What Is Inductive Bias in Machine Learning?

5. Inductive Bias:

Definition

Every machine learning model requires some type of architecture design and possibly
some initial assumptions about the data we want to analyze. Generally, every building
block and every belief that we make about the data is a form of inductive bias.
• Inductive biases play an important role in the ability of machine learning models to generalize to unseen data.
• A strong inductive bias that matches the true structure of the problem can guide the model toward a good solution, ideally close to the global optimum.
• On the other hand, a weak inductive bias can let the model settle in a poor local optimum and make it strongly affected by random changes in the initial state.
We can categorize inductive biases into two different groups called relational and non-
relational. The former represents the relationship between entities in the network, while
the latter is a set of techniques that further constrain the learning algorithm.

5.1 Inductive Biases in Machine Learning

In traditional machine learning, every algorithm has its own inductive biases. In this
section, we mention some of these algorithms.
5.1.1 Bayesian Models
• Inductive bias in Bayesian models shows itself in the form of the prior distributions
that we choose for the variables.


• Consequently, the prior shapes the posterior distribution; when there is little data, the posterior can turn out to be very similar to the prior.
• In addition, we assume conditional independence: given the parents of a node in the network, the node is independent of its non-descendants (which include its other ancestors).
• As a result, we can make use of conditional probabilities to perform inference.
• Also, the structure of the Bayesian network can facilitate the analysis of causal relationships between entities.
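The effect of the prior can be shown with the conjugate Beta-Binomial model, a standard example not taken from the notes: with a Beta(a, b) prior on a coin's probability of heads, observing h heads and t tails gives a Beta(a + h, b + t) posterior. The coin-flip counts below are hypothetical. A strong prior (large a + b) keeps the posterior close to the prior, which is exactly the inductive bias described above.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Posterior parameters of a Beta(prior_a, prior_b) prior after Bernoulli data."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    return a / (a + b)

heads, tails = 3, 1  # hypothetical coin-flip data
priors = {"weak prior Beta(1,1)": (1, 1), "strong prior Beta(50,50)": (50, 50)}
for name, (a, b) in priors.items():
    post_a, post_b = beta_binomial_update(a, b, heads, tails)
    print(f"{name}: posterior mean = {beta_mean(post_a, post_b):.3f}")
# weak prior Beta(1,1): posterior mean = 0.667
# strong prior Beta(50,50): posterior mean = 0.510
```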

5.1.2. k-Nearest Neighbors (k-NN) Algorithm

The k-Nearest Neighbors (k-NN) algorithm assumes that entities belonging to a particular category should appear near each other, and those that are part of different groups should be distant. In other words, we assume that similar data points are clustered near each other, away from the dissimilar ones.
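A minimal k-NN classifier sketch (illustrative, with made-up 2-D points) shows this assumption in action: a query point takes the majority label among its k closest training points, measured here by Euclidean distance via `math.dist` (Python 3.8+).

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (point, label) pairs; points are tuples of floats."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), "A"), ((0.5, 1.0), "A"), ((1.0, 0.0), "A"),
         ((5.0, 5.0), "B"), ((5.5, 4.0), "B"), ((6.0, 5.5), "B")]
print(knn_predict(train, (0.8, 0.5)))  # A
print(knn_predict(train, (5.2, 5.0)))  # B
```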

5.1.3. Linear Regression

Given the (X, Y) data points, in linear regression we assume that the variable Y is linearly dependent on the explanatory variables X. Therefore, the resulting model fits a linear function to the training data. However, this assumption limits the model's capacity to learn non-linear functions.
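For a single explanatory variable, the least-squares line can be computed in closed form: slope m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept b = ȳ - m·x̄. The sketch below (hypothetical data) recovers an exactly linear relation, but on the non-linear data y = x² it returns a flat line, illustrating the limitation described above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b with one explanatory variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))            # (2.0, 1.0): y = 2x + 1
print(fit_line([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))    # (0.0, 2.0): a flat line for y = x^2
```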

5.1.4. Logistic Regression

In logistic regression, we assume that the log-odds of the positive class are a linear function of the features, so the two classes are separated by a hyperplane (the decision boundary). This simplifies the problem, but one can imagine that if the assumption is not valid, we won't have a good model.
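A minimal logistic regression sketch trained by batch gradient descent on the mean log-loss (the learning rate, epoch count and data are illustrative choices). It classifies linearly separable points perfectly, but on XOR-style data no hyperplane separates the classes, so the model cannot succeed; in this symmetric case the gradient is exactly zero at the start and the weights never move.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns weights w and bias b."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [gj + err * xj for gj, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # The decision boundary w.x + b = 0 is a hyperplane.
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0

def accuracy(X, y):
    w, b = train_logistic(X, y)
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)

separable = ([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]], [0, 0, 0, 1, 1, 1])
xor = ([[0, 0], [1, 1], [0, 1], [1, 0]], [0, 0, 1, 1])
print(accuracy(*separable))  # 1.0
print(accuracy(*xor))        # 0.5: no single line separates XOR
```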

5.2 Relational Inductive Biases in Deep Learning

Relational inductive biases define the structure of the relationships between different entities or parts in our model. These relations can be arbitrary, sequential, local, and so on.

5.2.1. Weak Relation

Sometimes the relationship between the neural units is weak, meaning that they're somewhat independent of each other. Including a fully connected layer in the network represents this kind of relationship, since it connects every unit to every input without assuming any particular structure among them.


5.2.2. Locality

In order to process an image, we start by capturing the local information. One way to do that is to use a convolutional layer, which captures the local relationships between the pixels of an image. Then, as we go deeper in the model, the local feature extractors help to extract the global features.
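A minimal 1-D sketch of the locality bias (an illustration, not the notes' own code): each output of a convolution depends only on a small window of neighbouring inputs, and the same kernel weights are reused at every position. The kernel `[-1, 1]` is a hypothetical edge detector; like most deep-learning libraries, the function actually computes cross-correlation.

```python
def conv1d(signal, kernel):
    """'Valid' 1-D convolution (cross-correlation): each output sees len(kernel) inputs."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

row = [0, 0, 0, 1, 1, 1]   # a step edge in one row of "pixels"
edge = [-1, 1]             # responds only where neighbouring values differ
print(conv1d(row, edge))   # [0, 0, 1, 0, 0]
```

Stacking a second layer with kernel width k2 on top of one with width k1 lets each output see k1 + k2 - 1 inputs, which is how local features grow into global ones in deeper models.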

5.2.3. Sequential Relation

Sometimes our data has a sequential characteristic. For instance, time series and sentences consist of sequential elements that appear one after another. To model this pattern, we can introduce a recurrent layer to our network.
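The sequential bias can be seen in a minimal scalar recurrent cell, sketched here with hypothetical weights `w_x`, `w_h` and `b`: the hidden state is updated one element at a time, so the same elements presented in a different order produce a different final state.

```python
import math

def rnn_final_state(inputs, w_x=0.5, w_h=0.9, b=0.0):
    """Minimal scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), starting from h_0 = 0."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
    return h

forward = rnn_final_state([1.0, 0.0, -1.0])
backward = rnn_final_state([-1.0, 0.0, 1.0])  # same elements, reversed order
print(forward, backward)  # opposite signs: order matters
```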


5.2.4. Arbitrary Relation

To solve problems related to a group of things or people, it might be more informative to see them as a graph. The graph structure imposes arbitrary relationships between the entities, which is ideal when there's no clear sequential or local relation in the model.
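A minimal sketch of how a graph imposes arbitrary relations, assuming a simple mean-aggregation message-passing step (one common choice in graph neural networks; the node names, features and edges below are made up): each node's new feature is the average over itself and its neighbours, so information flows only along the edges of the graph.

```python
def message_passing_step(features, edges):
    """One round of mean aggregation over neighbours (self-loop included)."""
    neighbours = {node: [node] for node in features}
    for u, v in edges:              # undirected edges
        neighbours[u].append(v)
        neighbours[v].append(u)
    return {node: sum(features[n] for n in nbrs) / len(nbrs)
            for node, nbrs in neighbours.items()}

features = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 3.0}
edges = [("A", "B"), ("B", "C")]    # D is isolated, so it receives no messages
print(message_passing_step(features, edges))
# A: 0.5, B: 0.333..., C: 0.0, D: 3.0
```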

5.3 Non-Relational Inductive Biases in Deep Learning

Other than relational inductive biases, there are also some concepts that impose
additional constraints on our model. In this section, we list some of these concepts.

5.3.1. Non-linear Activation Functions

Non-linear activation functions allow the model to capture the non-linearity hidden in the
data. Without them, a deep neural network wouldn’t be able to work better than a single-
layer network. The reason is that the combination of several linear layers would still be a
linear layer.
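The argument that stacked linear layers collapse into a single linear layer can be checked with small matrices. This sketch uses hypothetical 2×2 weights `W1` and `W2` and no biases: without an activation, the two-layer output equals the output of the single matrix `W2·W1`; with ReLU in between, the network is no longer linear (a linear map would satisfy f(-x) = -f(x)).

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    # zip(*B) iterates over the columns of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def relu(v):
    return [max(0, vi) for vi in v]

W1 = [[1, 2], [3, 4]]
W2 = [[0, 1], [1, -1]]

def linear_net(x):
    return matvec(W2, matvec(W1, x))

def relu_net(x):
    return matvec(W2, relu(matvec(W1, x)))

x, neg_x = [1, -1], [-1, 1]
print(linear_net(x), matvec(matmul(W2, W1), x))  # [-1, 0] [-1, 0]: same as one layer
print(relu_net(x), relu_net(neg_x))              # [0, 0] [1, 0]: f(-x) != -f(x)
```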

5.3.2. Dropout


Dropout is a regularization technique that helps the network avoid memorizing the data by randomly deactivating units during training, which forces different random subsets of the network to each learn the data pattern. As a result, the final model is able to generalize better and avoid overfitting.
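A minimal sketch of dropout applied to one layer's activations, assuming the common "inverted dropout" variant: during training each unit is zeroed with probability `p` and the survivors are scaled by 1/(1 - p), so nothing needs to change at inference time. The function and its parameters are illustrative, not a specific library's API.

```python
import random

def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
h = [1.0] * 10
print(dropout(h, p=0.5, rng=rng))                  # roughly half 0.0, the rest 2.0
print(dropout(h, p=0.5, rng=rng, training=False))  # unchanged at inference time
```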

5.3.3. Weight Decay

Weight decay is another regularization method that puts constraints on the model's weights. There are several versions of weight decay, but the common ones are L1 and L2 regularization techniques. Weight decay doesn't let the weights grow very large, which helps prevent the model from overfitting.
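As a sketch of L2 weight decay (names and values are hypothetical): adding (λ/2)·||w||² to the loss adds λ·w to the gradient, so each step also shrinks every weight toward zero. With a zero data gradient, the weights simply decay by a factor (1 - lr·λ) per step.

```python
def sgd_step(w, grad, lr=0.1, l2=0.0):
    """One gradient step on loss + (l2/2)*||w||^2; the penalty adds l2*w to the gradient."""
    return [wi - lr * (gi + l2 * wi) for wi, gi in zip(w, grad)]

w = [4.0, -2.0]
zero_grad = [0.0, 0.0]
for _ in range(10):
    w = sgd_step(w, zero_grad, lr=0.1, l2=0.5)
print(w)  # about [2.395, -1.197]: each step multiplies the weights by 0.95
```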

5.3.4. Normalization

Normalization techniques can help our model in several ways, such as making the training faster and regularizing. One commonly cited benefit is that it reduces the change in the distribution of the network's activations during training, which is called internal covariate shift. There are different normalization techniques such as batch normalization, instance normalization, and layer normalization.
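A minimal sketch of the normalization step, shown here as layer normalization over one example's features (batch normalization applies the same formula per feature across a mini-batch instead). `gamma` and `beta` stand in for the learnable scale and shift, and `eps` avoids division by zero.

```python
import math

def layer_norm(x, eps=1e-5, gamma=1.0, beta=0.0):
    """Normalize a feature vector to zero mean and (almost) unit variance."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [gamma * (xi - mean) / math.sqrt(var + eps) + beta for xi in x]

out = layer_norm([2.0, 4.0, 6.0, 8.0])
print([round(v, 4) for v in out])  # approximately [-1.3416, -0.4472, 0.4472, 1.3416]
```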

5.3.5. Data Augmentation

We can think of data augmentation as another regularization method. What it imposes on the model depends on the augmentation algorithm. For instance, adding noise and substituting words in sentences are two types of data augmentation. They assume that adding the noise or substituting the words should not change the category of a sequence of words in a classification task.
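A minimal sketch of word-substitution augmentation. The synonym table and the `augment` function are hypothetical; the key point is the inductive bias described above, namely that the label is copied unchanged to each augmented sentence.

```python
import random

# Hypothetical synonym table; a real system might use a thesaurus or word embeddings.
SYNONYMS = {"good": ["great", "fine"], "movie": ["film"], "really": ["truly"]}

def augment(sentence, label, rng, p=0.5):
    """Randomly swap known words for synonyms; the label is assumed not to change."""
    words = []
    for word in sentence.split():
        if word in SYNONYMS and rng.random() < p:
            words.append(rng.choice(SYNONYMS[word]))
        else:
            words.append(word)
    return " ".join(words), label

rng = random.Random(42)
for _ in range(3):
    print(augment("a really good movie", "positive", rng))
```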

5.3.6. Optimization Algorithm

The optimization algorithm plays a key role in the model we end up learning. For example, different versions of the gradient descent algorithm can lead to different optima. Consequently, the resulting models can have different generalization properties. Moreover, each optimization algorithm has its own parameters that can greatly influence the convergence and optimality of the model.
Briefly explain the two main problems that degrade the performance of machine
learning models?
6. Generalization (Overfitting and Underfitting) in Machine Learning:

Overfitting and Underfitting are the two main problems that occur in machine learning and
degrade the performance of the machine learning models.


• The main goal of every machine learning model is to generalize well.
• Here, generalization is the ability of an ML model to produce suitable outputs for new, previously unseen inputs.
• That is, after being trained on the dataset, the model produces reliable and accurate output on data it has not seen before.
• Hence, underfitting and overfitting are the two things that need to be checked to judge the performance of the model and whether it is generalizing well or not.

Before understanding overfitting and underfitting, let's understand some basic terms that will help to understand this topic well:

o Signal: It refers to the true underlying pattern of the data that helps the machine learning model to learn from the data.
o Noise: Noise is unnecessary and irrelevant data that reduces the performance of the model.
o Bias: Bias is a prediction error that is introduced in the model due to oversimplifying the machine learning algorithm. It can also be described as the difference between the predicted values and the actual values.
o Variance: Variance is the sensitivity of the model to the particular training data. If the machine learning model performs well on the training dataset but does not perform well on the test dataset, the model has high variance.

Overfitting

• Overfitting occurs when our machine learning model tries to cover all the data points, or more data points than required, in the given dataset.
• Because of this, the model starts capturing the noise and inaccurate values present in the dataset, and all these factors reduce the efficiency and accuracy of the model.
• The overfitted model has low bias and high variance.
• The chance of overfitting increases the longer we train our model: the more we train it, the more likely it is to become overfitted.

Overfitting is the main problem that occurs in supervised learning.

Example: The concept of the overfitting can be understood by the below graph of the linear
regression output:


As we can see from the above graph, the model tries to cover all the data points present in the scatter plot. It may look efficient, but in reality it is not. The goal of the regression model is to find the best-fit line; here we have not obtained a best fit, so the model will generate prediction errors on new data.

How to avoid the Overfitting in Model

Both overfitting and underfitting degrade the performance of the machine learning model. But overfitting is the main cause, so here are some ways by which we can reduce the occurrence of overfitting in our model:

o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling
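Cross-validation, the first technique in the list above, can be sketched by splitting sample indices into k folds; each fold serves once as the validation set while the model is trained on the rest. This is a minimal illustration with a hypothetical helper `k_fold_indices`; it does not shuffle the data, which real pipelines usually do first.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds; yield (train, validation)."""
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

for train, val in k_fold_indices(7, 3):
    print("validation:", val)  # [0, 1, 2], then [3, 4], then [5, 6]
```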

Underfitting

• Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data.
• For example, to avoid overfitting, the feeding of training data may be stopped at too early a stage, so the model does not learn enough from the training data.
• As a result, it may fail to find the best fit of the dominant trend in the data.
• In the case of underfitting, the model is not able to learn enough from the training data, and hence it has reduced accuracy and produces unreliable predictions.

An underfitted model has high bias and low variance.


Example: We can understand the underfitting using below output of the linear regression
model:

As we can see from the above diagram, the model is unable to capture the data points present
in the plot.

How to avoid underfitting:

o By increasing the training time of the model.
o By increasing the number of features.

Goodness of Fit

• The term "goodness of fit" is taken from statistics, and the goal of machine learning models is to achieve a good fit.
• In statistical modeling, it describes how closely the predicted values match the true values of the dataset.
• A model with a good fit lies between the underfitted and the overfitted model. Ideally, it would make predictions with zero error, but in practice this is difficult to achieve.
• As we train the model, the error on the training data goes down, and at first the same happens with the test data.
• But if we train the model for too long, its performance may decrease due to overfitting, as the model also learns the noise present in the dataset.
• The error on the test dataset then starts increasing. The point just before the test error starts to rise is the good point, and we can stop training there to obtain a good model.

There are two other methods by which we can find a good point for our model: resampling methods to estimate model accuracy, and a validation dataset.
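The stopping rule described above can be sketched in Python. This is a minimal illustration, assuming the validation error is recorded once per epoch; the function name `early_stopping_epoch`, the `patience` parameter and the error values are hypothetical, not taken from these notes.

```python
def early_stopping_epoch(val_errors, patience=2):
    """Return the epoch with the lowest validation error, stopping once the error
    has not improved for `patience` consecutive epochs."""
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Hypothetical validation errors: falling, then rising as the model overfits.
val_errors = [0.50, 0.35, 0.25, 0.22, 0.24, 0.27, 0.31]
print(early_stopping_epoch(val_errors))  # 3
```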

What is bias and variance? Explain its trade off?


7. Bias and Variance in Machine Learning

• Machine learning is a branch of Artificial Intelligence that allows machines to perform data analysis and make predictions.
• However, if the machine learning model is not accurate, it makes prediction errors, and the reducible part of these errors is usually described in terms of bias and variance.
• In machine learning, some error is always present, as there is always a slight difference between the model's predictions and the actual values.
• The main aim of ML/data science analysts is to reduce these errors in order to get more accurate results.

Errors in Machine Learning

In machine learning, an error is a measure of how accurately an algorithm can make predictions for a previously unseen dataset. On the basis of these errors, we select the machine learning model that performs best on the particular dataset. There are mainly two types of errors in machine learning:

o Reducible errors: These errors can be reduced to improve the model accuracy. They can be further classified into bias and variance.


o Irreducible errors: These errors will always be present in the model, regardless of which algorithm is used. They are caused by noise and by unknown variables whose effect cannot be reduced by any model.
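For squared-error loss this split can be written exactly. Assuming the data are generated as y = f(x) + ε with noise ε of mean 0 and variance σ², and writing \hat{f} for the model trained on a random training set, the expected test error at a point x decomposes as follows (a standard result, stated here under those assumptions):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The first two terms are the reducible errors; the last term does not depend on the model at all.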

What is Bias?

While making predictions, a difference occurs between the values predicted by the model and the actual/expected values; this difference is known as bias error, or error due to bias.

A model has either:

o Low Bias: A low bias model will make fewer assumptions about the form of the target
function.
o High Bias: A model with a high bias makes more assumptions, and the model
becomes unable to capture the important features of our dataset. A high bias model
also cannot perform well on new data.

Some examples of machine learning algorithms with low bias are Decision Trees, k-Nearest Neighbours and Support Vector Machines.

On the other hand, algorithms with high bias include Linear Regression, Linear Discriminant Analysis and Logistic Regression.

Ways to reduce High Bias:

High bias mainly occurs because the model is too simple. Below are some ways to reduce high bias:

o Increase the input features, as the model is underfitted.
o Decrease the regularization term.
o Use more complex models, for example by including some polynomial features.

What is a Variance Error?

Variance tells how much a random variable differs from its expected value. In machine learning, it describes how much the model's predictions change when the model is trained on different training datasets. Ideally, a model should not vary too much from one training dataset to another, which means the algorithm should be good at understanding the hidden mapping between the input and output variables.

A model can have either low variance or high variance.


Low variance means there is a small variation in the prediction of the target function with
changes in the training data set.

At the same time, High variance shows a large variation in the prediction of the target
function with changes in the training dataset.

A model that shows high variance learns a lot and performs well on the training dataset, but does not generalize well to the unseen dataset. As a result, such a model gives good results on the training dataset but shows high error rates on the test dataset.

Since, with high variance, the model learns too much from the dataset, it leads to overfitting of the model. A model with high variance has the following problems:

o A high variance model leads to overfitting.
o It increases model complexity.

Usually, nonlinear algorithms, which have a lot of flexibility to fit the model, have high variance.

Some examples of machine learning algorithms with low variance are Linear Regression, Logistic Regression, and Linear Discriminant Analysis.

On the other hand, algorithms with high variance include Decision Trees, Support Vector Machines, and k-Nearest Neighbours.

Ways to Reduce High Variance:

o Reduce the input features or the number of parameters, as the model is overfitted.
o Avoid overly complex models.
o Increase the training data.
o Increase the regularization term.

Different Combinations of Bias-Variance


There are four possible combinations of bias and variance, which are represented by the below diagram:

1. Low-Bias, Low-Variance: The combination of low bias and low variance shows an ideal machine learning model. However, it is rarely achievable in practice.
2. Low-Bias, High-Variance: With low bias and high variance, model predictions are inconsistent but accurate on average. This case occurs when the model learns with a large number of parameters, and hence it leads to overfitting.
3. High-Bias, Low-Variance: With high bias and low variance, predictions are consistent but inaccurate on average. This case occurs when a model does not learn well from the training dataset or uses too few parameters. It leads to underfitting problems in the model.
4. High-Bias, High-Variance: With high bias and high variance, predictions are inconsistent and also inaccurate on average.

How to identify High variance or High Bias?

High variance can be identified if the model has:


o Low training error and high test error.

High Bias can be identified if the model has:

o High training error, with test error close to the training error.
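The two rules above can be written as a small helper. This is only a sketch: the thresholds `acceptable_error` and `max_gap` are arbitrary illustrative values, since what counts as "high" error depends on the problem.

```python
def diagnose(train_error, test_error, acceptable_error=0.10, max_gap=0.05):
    """Rough diagnosis following the rules above; thresholds are illustrative only.
    High bias is checked first, so a model with both problems is reported as high bias."""
    gap = test_error - train_error
    if train_error > acceptable_error:
        return "high bias (underfitting)"
    if gap > max_gap:
        return "high variance (overfitting)"
    return "reasonable fit"

print(diagnose(0.02, 0.25))  # high variance (overfitting)
print(diagnose(0.30, 0.32))  # high bias (underfitting)
print(diagnose(0.05, 0.07))  # reasonable fit
```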

Bias-Variance Trade-Off

• While building the machine learning model, it is really important to take care of bias
and variance in order to avoid overfitting and underfitting in the model.
• If the model is very simple with few parameters, it may have low variance and high bias.
• Whereas, if the model has a large number of parameters, it tends to have high variance and low bias.
• So, it is required to make a balance between bias and variance errors, and this balance between the bias error and variance error is known as the Bias-Variance trade-off.

For an accurate prediction, a model needs both low variance and low bias. But this is usually not possible, because bias and variance are related to each other:

o If we decrease the variance, the bias tends to increase.
o If we decrease the bias, the variance tends to increase.

Bias-Variance trade-off is a central issue in supervised learning.

o Ideally, we need a model that accurately captures the regularities in the training data and simultaneously generalizes well to the unseen dataset.
o Unfortunately, it is usually not possible to achieve both perfectly at the same time.
o A high variance algorithm may perform well on the training data, but it may overfit to noisy data.


o Whereas, a high bias algorithm generates an overly simple model that may not even capture important regularities in the data.
o So, we need to find a sweet spot between bias and variance to make an optimal model.

Hence, the Bias-Variance trade-off is about finding the sweet spot to make a balance
between bias and variance errors.
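The trade-off can be observed with a small Monte Carlo experiment (a hypothetical setup, not from the notes): estimate the mean of a normal distribution from 5 samples, either with the plain sample mean (unbiased, higher variance) or with a mean shrunk halfway toward zero (biased, lower variance). Repeating this over many simulated training sets estimates bias² and variance for each estimator.

```python
import random
import statistics

def simulate(estimator, true_mean=0.5, sigma=1.0, n=5, trials=20000, seed=0):
    """Monte Carlo estimate of bias^2 and variance of an estimator of true_mean."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    avg = statistics.fmean(estimates)
    bias_sq = (avg - true_mean) ** 2
    variance = statistics.pvariance(estimates, mu=avg)
    return bias_sq, variance

def sample_mean(xs):
    return statistics.fmean(xs)

def shrunk_mean(xs, factor=0.5):
    # Deliberately biased toward 0: a simpler, more constrained estimator.
    return factor * statistics.fmean(xs)

for name, est in [("sample mean", sample_mean), ("shrunk mean", shrunk_mean)]:
    b2, var = simulate(est)
    print(f"{name}: bias^2={b2:.3f} variance={var:.3f} bias^2+variance={b2 + var:.3f}")
```

With the true mean 0.5, the shrunk estimator's variance drops from about 0.2 to about 0.05 while its bias² rises to about 0.06, so its total error is lower; with a larger true mean, the added bias would outweigh the saved variance.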


PART -A

S.No Question and Answer CO,K

1. Define Machine Learning. CO1, K1
Machine learning is a subfield of artificial intelligence, which is broadly defined as the capability of a machine to imitate intelligent human behavior.
2. List out different types of learning methods. CO1, K1
Supervised Learning, Unsupervised Learning, Reinforcement Learning
3. What is meant by supervised learning? CO1, K1
Supervised learning, also known as supervised machine learning, is a subcategory of machine learning and artificial intelligence. It is defined by its use of labeled datasets to train algorithms to classify data or predict outcomes accurately.
4. What is Unsupervised Learning? CO1, K1
Unsupervised learning is a type of machine learning in which models
are trained using unlabeled dataset and are allowed to act on that data
without any supervision.
5. Differentiate supervised and unsupervised machine learning. CO1, K1
In supervised machine learning, the machine is trained using labeled data. Then a new dataset is given to the learning model so that the algorithm produces outcomes based on what it learned from the labeled data. For example, when performing classification, we first need labeled data to train the model.
In unsupervised machine learning, the machine is not trained using labeled data, and the algorithms make decisions without any corresponding output variables.

6. Define Reinforcement Learning? CO1, K1


Reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones. In general, a reinforcement learning agent is able to perceive and interpret its environment, take actions, and learn through trial and error.

7. Where is supervised learning used? CO1, K1


For example, linear regression is a supervised learning technique typically used in predicting, forecasting, and finding relationships between quantitative data.


8. Give Example for Unsupervised Learning. CO1, K1


Some examples of unsupervised learning algorithms include K-Means Clustering, Principal Component Analysis and Hierarchical Clustering.

9. Give Example for Reinforcement Learning. CO1, K1


Reinforcement learning can be used in different fields such as healthcare, finance, recommendation systems, etc. For example, in playing games like Go: Google has reinforcement learning agents that learn to solve problems by playing games like Go, which is a game of strategy.

10. List out real-time applications of ML. CO1, K1

Image recognition
Speech recognition
Medical diagnosis
Statistical arbitrage
Predictive analytics
11. Define Data Science. CO1, K1
Data science is the domain of study that deals with vast volumes of data, using modern tools and techniques to find unseen patterns, derive meaningful information, and make business decisions.

12. What is PCA? CO1, K1


Principal component analysis (PCA) is a technique for reducing the dimensionality of large datasets, increasing interpretability while at the same time minimizing information loss.

13. What is the use of PCA in machine learning? (or) Write applications of PCA in machine learning. CO1, K1

It is used to reduce the number of dimensions in healthcare data. PCA can help compress an image by representing it with fewer dimensions. It can be used in finance to analyze stock data and forecast returns. PCA helps to find patterns in high-dimensional datasets.


14. What is Hypothesis? CO1, K1

The hypothesis is defined as a supposition or proposed explanation based on insufficient evidence or assumptions.

Example: A scientist claims that ultraviolet (UV) light can damage the eyes, and that it may then also cause blindness. In this example, the scientist only claims that UV rays are harmful to the eyes, but we assume they may cause blindness. However, this may or may not be true. Hence, such assumptions are called a hypothesis.
15. Define hypothesis set. CO1, K1
Hypothesis space is defined as a set of all possible legal hypotheses; hence
it is also known as a hypothesis set.

16. Write the hypothesis formula in machine learning? CO1, K2

The hypothesis (h) can be formulated in machine learning as follows:

y = mx + c

Where, y: range (output), m: slope of the line, i.e., the change in y divided by the change in x, x: domain (input), c: intercept (constant).

17. Define inductive bias? CO1, K1


Every machine learning model requires some type of architecture design and
possibly some initial assumptions about the data we want to
analyze. Generally, every building block and every belief that we make about
the data is a form of inductive bias.


18. What is k-NN algorithm? CO1, K1


The k-Nearest Neighbors (k-NN) algorithm assumes that entities belonging to a particular category should appear near each other, and those that are part of different groups should be distant. In other words, we assume that similar data points are clustered near each other, away from the dissimilar ones.

19. Define bias and variance? CO1, K1

Bias: Bias is a prediction error that is introduced in the model due to oversimplifying the machine learning algorithm. It can also be described as the difference between the predicted values and the actual values.

Variance: Variance is the sensitivity of the model to the particular training data. If the machine learning model performs well on the training dataset but does not perform well on the test dataset, the model has high variance.
20. How to avoid the Overfitting in Model? CO1, K2

Both overfitting and underfitting degrade the performance of the machine learning model. But overfitting is the main cause, so here are some ways by which we can reduce the occurrence of overfitting in our model:

o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling

21. How to avoid underfitting in model? CO1, K2

o By increasing the training time of the model.
o By increasing the number of features.

22. Define VC dimension? CO1, K1

The Vapnik-Chervonenkis (VC) dimension is a measure of the capacity of a classification algorithm and is defined as the maximum number of points that the algorithm is able to shatter, i.e., classify correctly under every possible assignment of labels.
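To make "shatter" concrete, here is a brute-force sketch for a hypothetical hypothesis class not discussed in the notes: 1-D threshold classifiers h_t(x) = 1 if x ≥ t, else 0. It checks whether every labelling of a point set can be produced by some threshold.

```python
from itertools import product

def threshold_labels(points, t):
    """Hypothesis h_t(x) = 1 if x >= t else 0, applied to every point."""
    return tuple(1 if x >= t else 0 for x in points)

def can_shatter(points):
    # Placing t at each point, or above all points, covers every distinct labelling.
    candidates = sorted(points) + [max(points) + 1]
    achievable = {threshold_labels(points, t) for t in candidates}
    return all(labels in achievable
               for labels in product((0, 1), repeat=len(points)))

print(can_shatter([0.5]))       # True
print(can_shatter([0.2, 0.8]))  # False: labelling (1, 0) is impossible
```

For any two points x1 < x2, a threshold can never label x1 as 1 and x2 as 0, so no two-point set is shattered while any single point is; the VC dimension of this class is therefore 1.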


23. What is PAC learning? CO1, K1

A good learner will, with high probability (at least 1 - δ, the "probably" part), select a hypothesis whose error with respect to the target concept is at most ε (the "approximately correct" part). Learning under these two parameters, ε and δ, is called probably approximately correct (PAC) learning.

24. Hypothesis h generated the following errors with respect to price and engine power on 5 samples, given ε = 0.05 and δ = 0.20. CO1, K3

s.no 1 2 3 4 5
Error(h) 0.001 0.07 0.045 0.065 0.036

Find whether h is PAC or not.
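A worked check, assuming the usual textbook criterion for such exercises: h is treated as PAC if the fraction of samples with error at most ε is at least 1 - δ (an empirical stand-in for P[error(h) ≤ ε] ≥ 1 - δ). The function name `is_pac` is illustrative.

```python
def is_pac(errors, epsilon, delta):
    """Fraction of samples with error <= epsilon, and whether it reaches 1 - delta."""
    within = sum(1 for e in errors if e <= epsilon)
    fraction = within / len(errors)
    return fraction, fraction >= 1 - delta

errors = [0.001, 0.07, 0.045, 0.065, 0.036]
fraction, pac = is_pac(errors, epsilon=0.05, delta=0.20)
print(fraction, pac)  # 0.6 False
```

Here only 3 of the 5 errors (0.001, 0.045, 0.036) are within ε = 0.05, giving 0.6, which is below 1 - δ = 0.8, so under this criterion h is not PAC.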

Part-B

S.No Question and Answer CO,K

1. Explain in detail about machine learning concepts with different learning types. CO1,K3
2. Discuss with examples some useful applications of machine learning. CO1,K3

3. Differentiate supervised, unsupervised and reinforcement learning. CO1,K3

4. Explain in detail why to learn linear algebra before machine learning CO1,K3

5. Describe briefly about some examples of linear algebra in machine learning? CO1,K3

6. Give a detailed note on hypothesis in machine learning and statistics? CO1,K3

7. What Is Inductive Bias in Machine Learning? CO1,K3

8. Briefly explain the two main problems that degrade the performance of CO1,K3
machine learning models?

9. What is bias and variance? Explain its trade off? CO1,K3

10. Give a short note on VC dimension and PAC learning? CO1,K3
