
Class Test1,2,3-Answer Key

Uploaded by

jayanbu05
Copyright © All Rights Reserved

KARPAGA VINAYAGA COLLEGE OF ENGINEERING AND TECHNOLOGY

DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE


CLASS TEST - 1

AD 3501 DEEP LEARNING


YEAR/SEM:III/V DATE: 22.08.2025
PART-A (5X2=10)

Answer the following questions:


1. What is Deep Learning?
2. Differentiate bias and variance.
3. What is Stochastic gradient descent?
4. Differentiate supervised and unsupervised deep learning.
5. Why do overfitting and underfitting occur in ML?
Part-B (1 X 15 =15)
6. Briefly explain estimators, bias and variance, which are useful for understanding generalization, underfitting and overfitting.
KARPAGA VINAYAGA COLLEGE OF ENGINEERING AND TECHNOLOGY
DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE
CLASS TEST – 1(Answer Key)

AD 3501 DEEP LEARNING


YEAR/SEM:III/V DATE: 22.08.2025
PART-A (5X2=10)

Answer the following questions:


1. What is Deep Learning?
Deep learning is a subfield of machine learning whose algorithms are inspired by the structure
and function of the brain; these algorithms are called artificial neural networks. In the mid-1960s,
Alexey Grigorevich Ivakhnenko published the first general, working learning algorithm for deep
(multilayer) networks. Deep learning is applied across a range of fields such as computer vision,
speech recognition and natural language processing.
2. Differentiate bias and variance.
Bias is an error from overly simplistic assumptions in the learning algorithm, while variance
is an error from being too sensitive to fluctuations in the training data.

3. What is stochastic gradient descent?


Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent algorithm
used for optimizing machine learning models. In this variant, only one random training
example is used to calculate the gradient and update the parameters at each iteration.
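The update rule can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a one-feature linear model, a squared-error loss and a hypothetical noise-free toy dataset generated from y = 3x + 2; the function name `sgd_linear_regression`, the learning rate and the epoch count are our own choices, not part of the original answer.

```python
import random

# Hypothetical toy data: y = 3x + 2 with no noise, for illustration only.
data = [(x / 10.0, 3.0 * (x / 10.0) + 2.0) for x in range(-10, 11)]

def sgd_linear_regression(data, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b by SGD on squared error, using ONE random example per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for _ in range(len(data)):
            x, y = rng.choice(data)      # one randomly chosen training example
            error = (w * x + b) - y      # prediction error on that single example
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= lr * error * x
            b -= lr * error
    return w, b

w, b = sgd_linear_regression(data)
print(round(w, 3), round(b, 3))  # should be very close to 3.0 and 2.0
```

Because each update looks at a single example, every step is cheap; on noisy data the parameters would keep fluctuating around the optimum unless the learning rate is reduced over time.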

4. Differentiate supervised and unsupervised deep learning.


5. Why do overfitting and underfitting occur in ML?
How well an ML algorithm performs is determined by its ability to:
1. Make the training error small
2. Make the gap between training and test errors small
These correspond to the two central challenges in ML:
Underfitting - the model is unable to obtain a sufficiently low error on the training set.
Overfitting - the gap between training error and test error is too large.
We can control whether a model is more likely to overfit or underfit by altering its capacity.
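To make the link between capacity and the two failure modes concrete, here is a small sketch (not from the original answer) that uses k-nearest-neighbour regression as a stand-in model on hypothetical noisy samples of y = x². A small k means high capacity: with k = 1 every training point is predicted exactly, so the training error is zero while the test error is not (overfitting). With k equal to the training-set size, the model predicts a single constant and even the training error is high (underfitting).

```python
import random

def knn_predict(train, x, k):
    """k-nearest-neighbour regression: average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def knn_mse(train, data, k):
    """Mean squared error of the k-NN model built on `train`, evaluated on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(1)

def sample(n):
    """Hypothetical data: y = x^2 plus Gaussian noise (std 0.1), x uniform in [-1, 1]."""
    points = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        points.append((x, x * x + rng.gauss(0, 0.1)))
    return points

train, test = sample(30), sample(30)
for k in (1, 5, 30):  # small k = high capacity; k = 30 predicts one constant
    print(f"k={k:2d}  train MSE={knn_mse(train, train, k):.4f}  test MSE={knn_mse(train, test, k):.4f}")
```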

Part-B (1 X 15 =15)
6. Briefly explain estimators, bias and variance, which are useful for understanding generalization,
underfitting and overfitting.
ESTIMATORS
The field of statistics provides many tools for achieving the ML goal of solving a task not only
on the training set but also on new, unseen data (generalization).
Foundational concepts include:
1. Parameter estimation
2. Bias
3. Variance
These concepts characterize the notions of generalization, overfitting and underfitting.
Point Estimation
Point Estimation is the attempt to provide the single best prediction of some quantity of
interest.
Quantity of interest can be:
a. A single parameter
b. A vector of parameters
E.g., weights in linear regression
Point estimator or Statistic
To distinguish estimates of parameters from their true value, a point estimate of a parameter
θ is represented by $\hat{\theta}$.
Let $\{x^{(1)}, \ldots, x^{(m)}\}$ be m independent and identically distributed (i.i.d.) data points.
Then a point estimator or statistic is any function of the data:
$$\hat{\theta}_m = g\left(x^{(1)}, \ldots, x^{(m)}\right)$$
The definition does not require that g return a value close to the true θ.
A good estimator is a function whose output is close to the true underlying θ that generated
the data.
Function Estimation
Point estimation can also refer to estimation of relationship between input and target variables
referred to as function estimation.
Here we predict a variable y given input x.
We assume f(x) is the relationship between x and y.
We may assume $y = f(x) + \epsilon$,
where ε stands for the part of y that is not predictable from x.
We are interested in approximating f with a model or estimate $\hat{f}$.
Function estimation is really the same as estimating a parameter θ: the function estimator $\hat{f}$
is simply a point estimator in function space.
Example: in polynomial regression, we can view the task either as estimating a parameter w or as
estimating a function mapping from x to y.
Properties of Point Estimators
Most commonly studied properties of point estimators are:
1. Bias
2. Variance
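1. Bias of an estimator
The bias of an estimator $\hat{\theta}_m$ for parameter θ is defined as
$$\mathrm{bias}(\hat{\theta}_m) = \mathbb{E}(\hat{\theta}_m) - \theta$$
where the expectation is over the data. An estimator is unbiased if $\mathrm{bias}(\hat{\theta}_m) = 0$.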
Examples of Estimator Bias


We look at common estimators of the following parameters to determine whether there is
bias:
Bernoulli distribution: mean θ
Gaussian distribution: mean μ
Gaussian distribution: variance σ²
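The standard results are that the sample mean is unbiased for both the Bernoulli mean θ and the Gaussian mean μ, while the sample variance that divides by m is biased (its expectation is (m - 1)σ²/m), and dividing by m - 1 instead removes the bias. The simulation below is a minimal sketch, assuming Gaussian data with hypothetical values μ = 0, σ² = 4 and m = 5, that checks these results empirically by averaging each estimator over many independent training sets.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def var_biased(xs):
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)        # divides by m

def var_unbiased(xs):
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)  # divides by m - 1

rng = random.Random(0)
true_mu, true_sigma, m, trials = 0.0, 2.0, 5, 20000  # hypothetical settings (sigma^2 = 4)
est_mu, est_b, est_u = [], [], []
for _ in range(trials):
    sample = [rng.gauss(true_mu, true_sigma) for _ in range(m)]  # one training set
    est_mu.append(mean(sample))
    est_b.append(var_biased(sample))
    est_u.append(var_unbiased(sample))

print("E[sample mean]    ~", round(mean(est_mu), 3))  # close to 0 (unbiased)
print("E[biased var]     ~", round(mean(est_b), 3))   # close to (m-1)/m * 4 = 3.2
print("E[unbiased var]   ~", round(mean(est_u), 3))   # close to 4
```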
Variance and Standard Error
Another property of an estimator is how much we expect it to vary as a function of the data sample.
Just as we computed the expectation of the estimator to determine its bias, we can compute
its variance.
The variance of an estimator is simply $\mathrm{Var}(\hat{\theta})$, where the random variable is the training set.
The square root of the variance is called the standard error, denoted $\mathrm{SE}(\hat{\theta})$.
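As a worked example (a standard result, shown here for illustration), for the sample mean $\hat{\mu}_m$ of m i.i.d. samples with variance σ², independence lets the variance of the sum split into a sum of variances:

```latex
\mathrm{Var}\left(\hat{\mu}_m\right)
  = \mathrm{Var}\left(\frac{1}{m}\sum_{i=1}^{m} x^{(i)}\right)
  = \frac{1}{m^2}\sum_{i=1}^{m}\mathrm{Var}\left(x^{(i)}\right)
  = \frac{\sigma^2}{m},
\qquad
\mathrm{SE}\left(\hat{\mu}_m\right) = \frac{\sigma}{\sqrt{m}}
```

So the standard error shrinks as 1/√m: quadrupling the number of examples halves the uncertainty of the estimate.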
Trading-off Bias and Variance
Bias and Variance measure two different sources of error of an estimator.
Bias measures the expected deviation from the true value of the function or parameter.
Variance provides a measure of the expected deviation that any particular sampling of the
data is likely to cause.
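The two sources of error combine in the mean squared error of an estimator: $\mathrm{MSE} = \mathbb{E}[(\hat{\theta}_m - \theta)^2] = \mathrm{Bias}(\hat{\theta}_m)^2 + \mathrm{Var}(\hat{\theta}_m)$. The sketch below checks this identity on simulated estimates under assumed settings (Gaussian data with σ² = 4, m = 5, and the biased variance estimator that divides by m); `mse_bias_variance` is a hypothetical helper. With empirical averages the identity holds exactly, up to floating-point rounding.

```python
import random

def mse_bias_variance(estimates, true_value):
    """Return (MSE, bias^2, variance) of a list of estimates of true_value."""
    n = len(estimates)
    avg = sum(estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    bias_sq = (avg - true_value) ** 2
    variance = sum((e - avg) ** 2 for e in estimates) / n
    return mse, bias_sq, variance

rng = random.Random(0)
sigma_sq, m = 4.0, 5  # hypothetical true variance and sample size
estimates = []
for _ in range(10000):
    xs = [rng.gauss(0.0, 2.0) for _ in range(m)]
    mu = sum(xs) / m
    estimates.append(sum((x - mu) ** 2 for x in xs) / m)  # biased variance estimator

mse, bias_sq, variance = mse_bias_variance(estimates, sigma_sq)
print(round(mse, 4), round(bias_sq + variance, 4))  # the two numbers agree
```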
Underfit-Overfit: Bias-Variance
The relationship of bias and variance to capacity is similar to the relationship of underfitting and
overfitting to capacity: increasing capacity tends to decrease bias and increase variance.

OVERFITTING
Overfitting occurs when our machine learning model tries to cover all the data points, or more
than the required data points, in the given dataset. Because of this, the model starts
capturing noise and inaccurate values present in the dataset, and these factors reduce the
efficiency and accuracy of the model. An overfitted model has low bias and high variance.
The chance of overfitting increases the longer we train our model: the more we train, the more
likely the model is to overfit.
Overfitting is the main problem that occurs in supervised learning.
Example: The concept of overfitting can be understood from a graph of a linear regression output:
(Figure: overfitted linear regression output.)
In such a graph, the model tries to cover all the data points present in the scatter plot. It may
look efficient, but in reality it is not: the goal of the regression model is to find the best-fit
line, and here no best fit has been found, so the model will generate prediction errors.
How to avoid the Overfitting in Model?
Both overfitting and underfitting degrade the performance of a machine learning model.
Overfitting is the more common cause, and there are several ways to reduce its occurrence
in our model:
o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling
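Of the techniques listed, early stopping is the easiest to sketch. The function below is a hypothetical helper, assuming the per-epoch validation losses are already available and a "patience" rule (stop after `patience` consecutive epochs without improvement, then keep the best epoch's weights); real frameworks provide their own callbacks with different options.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch whose weights would be kept under patience-based early stopping.

    val_losses[i] is the validation loss after epoch i. Training stops once the
    loss has failed to improve for `patience` consecutive epochs.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop training; roll back to best_epoch
    return best_epoch

# Hypothetical validation curve: improves, then rises as the model starts to overfit.
curve = [1.0, 0.7, 0.5, 0.42, 0.40, 0.41, 0.43, 0.47, 0.52, 0.60]
print(early_stopping_epoch(curve))  # 4
```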
UNDERFITTING
Underfitting occurs when our machine learning model is not able to capture the underlying
trend of the data. For example, to avoid overfitting, training may be stopped at an early
stage, and as a result the model may not learn enough from the training data. It may then
fail to find the best fit of the dominant trend in the data.
In the case of underfitting, the model is not able to learn enough from the training data, and
hence it reduces the accuracy and produces unreliable predictions. An underfitted model has
high bias and low variance.
Example: Underfitting can be understood from the output of a linear regression model:
(Figure: underfitted linear regression output.)
As the diagram shows, the model is unable to capture the data points present in the plot.
How to avoid underfitting:
o By increasing the training time of the model.
o By increasing the number of features.

KARPAGA VINAYAGA COLLEGE OF ENGINEERING AND TECHNOLOGY


DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE
CLASS TEST - 2

AD 3501 DEEP LEARNING


YEAR/SEM:III/V DATE: 28.08.2025
PART-A (5X2=10)

Answer the following questions:


1. What are the applications of deep learning?
2. Define pooling in the context of CNN.
3. List the importance of equivariance in convolutional neural networks.
4. What are sparse interactions in a convolutional neural network?
5. Define a loss function for image identification.
Part-B (1 X 15 =15)
6. Briefly explain an example of a fully functioning feed forward network on a simple
task.
KARPAGA VINAYAGA COLLEGE OF ENGINEERING AND TECHNOLOGY
DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE
CLASS TEST – 2(Answer Key)

AD 3501 DEEP LEARNING


YEAR/SEM:III/V DATE: 28.08.2025
PART-A (5X2=10)

Answer the following questions:


1. What are the applications of deep learning?
Deep learning is used across a wide range of industries, enabling applications like
self-driving cars, virtual assistants, personalized medicine, and financial fraud detection.
2. Define pooling in the context of CNN.
A pooling layer is another building block of a CNN. Its function is to progressively reduce
the spatial size of the representation, which reduces the network's complexity and
computational cost.
Two widely used types of pooling in CNNs are:
i) Max Pooling ii) Average Pooling
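A minimal sketch of both pooling types, assuming a 2 x 2 window with stride 2 on a hypothetical 4 x 4 feature map (plain Python lists rather than any particular framework's API):

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Slide a size x size window with the given stride and summarize each window."""
    rows, cols = len(feature_map), len(feature_map[0])
    out = []
    for i in range(0, rows - size + 1, stride):
        row = []
        for j in range(0, cols - size + 1, stride):
            window = [feature_map[i + di][j + dj] for di in range(size) for dj in range(size)]
            if mode == "max":
                row.append(max(window))                 # max pooling
            elif mode == "average":
                row.append(sum(window) / len(window))   # average pooling
            else:
                raise ValueError("mode must be 'max' or 'average'")
        out.append(row)
    return out

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 2],
        [7, 2, 8, 0],
        [1, 4, 3, 9]]
print(pool2d(fmap, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(fmap, mode="average"))  # [[3.75, 2.25], [3.5, 5.0]]
```

Either way, the 4 x 4 map shrinks to 2 x 2, which is the reduction in spatial size described above.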
3. List the importance of equivariance in convolutional neural networks.
Equivariant means varying in a similar or equivalent proportion. Convolution is equivariant to
translation: a translation of the input features results in an equivalent translation of the
outputs. (Convolution is not naturally equivariant to other transformations, such as rotation or
changes in scale.) Equivariance allows the network to detect edges, textures and shapes in
different locations of the input.
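Translation equivariance can be checked directly: shifting the input and then convolving gives the same result as convolving and then shifting the output. The sketch below assumes a 1-D signal, an edge-detecting kernel [1, 0, -1] and hypothetical helper names; the equality is exact here because the signal is zero near its borders, so nothing is cut off by the shift.

```python
def conv1d(x, kernel):
    """Valid 1-D cross-correlation (what CNN libraries call convolution)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]

def shift_right(x, n):
    """Translate a signal right by n positions, padding with zeros on the left."""
    return [0] * n + x[:len(x) - n]

signal = [0, 0, 1, 2, 3, 0, 0, 0, 0]
edge_kernel = [1, 0, -1]

a = conv1d(shift_right(signal, 2), edge_kernel)  # shift first, then convolve
b = shift_right(conv1d(signal, edge_kernel), 2)  # convolve first, then shift
print(a)       # [0, 0, -1, -2, -2, 2, 3]
print(a == b)  # True
```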
4. What are sparse interactions in a convolutional neural network?

A convolution layer defines a window (also called a filter or kernel) through which it examines a
subset of the data, and it scans the data by sliding this window across it.
This is what we call sparse connectivity, sparse interactions or sparse weights: each output unit
is connected to only a small region of the input, which limits the number of active connections
in each layer. For example, a 5x5 input convolved with a 2x2 filter produces a smaller 4x4
output. The first element of the feature map is calculated by convolving the top-left 2x2
region of the input with the filter.
5. Define a loss function for image identification.


A loss function for image identification, typically in the context of image
classification, quantifies the discrepancy between the predicted output of a model and the true
label of an image. The goal during training is to minimize this loss, thereby improving the
model's ability to correctly identify images.
For image classification tasks, the most common loss function is Cross-Entropy Loss,
also known as Log Loss.
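A minimal sketch of cross-entropy for a single image, assuming the model outputs raw scores (logits) for three hypothetical classes that are turned into probabilities with softmax. The loss is -log of the probability assigned to the true class, so it is small when the model is confidently right and large when it is confidently wrong.

```python
import math

def softmax(logits):
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """Categorical cross-entropy for one image: -log p(true class)."""
    return -math.log(softmax(logits)[true_class])

# Hypothetical scores for 3 classes (e.g. cat, dog, bird); the true label is class 0.
right = cross_entropy([4.0, 1.0, 0.5], 0)    # confident and correct
wrong = cross_entropy([0.5, 1.0, 4.0], 0)    # confident and wrong
uniform = cross_entropy([0.0, 0.0, 0.0], 0)  # no preference: log(3)
print(round(right, 4), round(wrong, 4), round(uniform, 4))
```

Averaging this quantity over a batch of images gives the training loss that is minimized.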
Part-B (1 X 15 =15)
6. Briefly explain an example of a fully functioning feed forward network on a simple
task.
What is a Neural Network?
The fundamental building block of deep learning, a neural network is a computational
model used to recognize patterns and make predictions or decisions based on data. Its main
inspiration is the way the human brain functions: it consists of layers of neurons
(also called nodes) joined by weighted connections, analogous to synapses. These neurons work
together to process data and learn from it in a way that allows the network to improve its
performance over time.
The structure of a neural network typically includes three main components:
Input Layer:
This is where the neural network receives data.
Each neuron in the input layer represents one feature or piece of information from the
data. For example, in an image classification task, the pixels of an image could be the
features input into the network.
Hidden Layers:
These layers sit between the input and output layers and do most of the computation.
Each neuron in a hidden layer takes input from the neurons of the previous layer,
processes the data using mathematical functions, and passes the result to the next layer.
Hidden layers allow the network to learn complex patterns and relationships in the
data. The more hidden layers there are, the deeper the network becomes, allowing it to
capture intricate features of the data.
Output Layer:
The final layer of the neural network is where the processed data is transformed into a
prediction or classification result.
For example, in a classification task, the output layer might give the probability of the
input data belonging to each class.
The learning process in a neural network involves adjusting the weights of the
connections between neurons in order to reduce the difference between the network’s output
and the actual result (the error or loss).
The working of the neural network:
Forward Propagation:
The data is passed through the network, starting from the input layer, moving through the
hidden layers, and finally reaching the output layer. This is called forward propagation.
In each layer, the neurons perform mathematical operations, often using a function called an
activation function to introduce non-linearity to the network. This helps the network learn
complex patterns that are not just linear combinations of the input.
Backpropagation:
After the output is generated, the network compares it to the correct output (the target) and
calculates the error.
Backpropagation is the process of sending the error back through the network to adjust the
weights of the connections between neurons. The goal is to reduce this error over time, which
is done using optimization algorithms like gradient descent.
This process is repeated many times during training, with the weights being adjusted slightly
each time, until the network is able to make predictions that are accurate enough for the task.
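Forward propagation and backpropagation can be seen working together on the classic XOR task. This is a minimal sketch, not the original author's code: it assumes a 2-3-1 network with tanh hidden units, a sigmoid output, cross-entropy loss, full-batch gradient descent, and a hypothetical learning rate, seed and epoch count. The gradients are derived by hand with the chain rule.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# XOR truth table: the target is 1 when exactly one input is 1.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
Y = [0.0, 1.0, 1.0, 0.0]

rng = random.Random(0)  # hypothetical seed
H = 3                   # hypothetical number of hidden units
W1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(H)]
b1 = [0.0] * H
W2 = [rng.uniform(-1, 1) for _ in range(H)]
b2 = 0.0
lr = 0.5                # hypothetical learning rate

def forward(x):
    """Forward propagation: input -> tanh hidden layer -> sigmoid output."""
    h = [math.tanh(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(H)]
    p = sigmoid(sum(W2[j] * h[j] for j in range(H)) + b2)
    return h, p

def mean_loss():
    """Average binary cross-entropy over the four XOR examples."""
    total = 0.0
    for x, y in zip(X, Y):
        _, p = forward(x)
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(X)

losses = [mean_loss()]
for _ in range(2000):
    gW1 = [[0.0, 0.0] for _ in range(H)]
    gb1 = [0.0] * H
    gW2 = [0.0] * H
    gb2 = 0.0
    for x, y in zip(X, Y):
        h, p = forward(x)
        d_out = p - y  # dLoss/dz at the output (sigmoid + cross-entropy)
        gb2 += d_out
        for j in range(H):
            gW2[j] += d_out * h[j]
            d_h = d_out * W2[j] * (1 - h[j] ** 2)  # backpropagate through tanh
            gW1[j][0] += d_h * x[0]
            gW1[j][1] += d_h * x[1]
            gb1[j] += d_h
    # Gradient-descent step using the batch-averaged gradients.
    n = len(X)
    for j in range(H):
        W2[j] -= lr * gW2[j] / n
        W1[j][0] -= lr * gW1[j][0] / n
        W1[j][1] -= lr * gW1[j][1] / n
        b1[j] -= lr * gb1[j] / n
    b2 -= lr * gb2 / n
    losses.append(mean_loss())

print("initial loss:", round(losses[0], 4), "final loss:", round(losses[-1], 4))
print("predictions:", [round(forward(x)[1], 2) for x in X])
```

The recorded loss should fall during training. Whether the network reaches a clean XOR solution can depend on the random initialization; more hidden units or more epochs usually help.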
Input
It is the collection of data (i.e., features) that is fed into the learning model. For
instance, an array of current atmospheric measurements can be used as the input for a
meteorological prediction model.
Weight
The primary purpose of weights is to give importance to the features that contribute most to
learning. By multiplying each input value by its corresponding weight, we can increase the
effect of some features while lowering it for others. For instance, the presence of a high-pitch
note would influence a music genre classification model's choice more than other average-pitch
notes that are common between genres.
Activation Function
To allow the output of a neuron to vary non-linearly with its inputs, the activation function
introduces non-linearity into the neuron's operation. Without it, the output would simply be
a linear combination of the input values, and the network could not model non-linear
relationships. Classic activation functions include the unit step, sigmoid, piecewise linear
and Gaussian functions; modern deep networks most commonly use ReLU.
Bias
The purpose of bias is to shift the value that the activation function produces. Its
role is comparable to the constant term in a linear function, so it acts as a shift for the
activation function's output.
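Inputs, weights, bias and activation function combine in a single neuron as activation(w·x + b). The sketch below assumes hypothetical input and weight values; the Gaussian form exp(-z²) and the piecewise-linear form (a ramp clipped to [0, 1]) are common choices, but other definitions exist. ReLU, which AlexNet uses later in this answer, is included for comparison.

```python
import math

# Activation functions (the piecewise-linear and Gaussian forms are assumed variants).
def unit_step(z):
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def piecewise_linear(z):
    return min(1.0, max(0.0, z + 0.5))  # linear between -0.5 and 0.5, clipped to [0, 1]

def gaussian(z):
    return math.exp(-z * z)

def relu(z):
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs, shifted by the bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

x = [0.5, -1.0, 2.0]  # hypothetical input features
w = [0.8, 0.2, 0.4]   # hypothetical weights (a larger weight gives a feature more influence)
for f in (unit_step, sigmoid, piecewise_linear, gaussian, relu):
    print(f.__name__, round(neuron(x, w, -0.6, f), 4))  # here w·x + b = 0.4
```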
Layers
An artificial neural network is made of multiple neural layers stacked on top of one
another. Each layer consists of several neurons stacked in a row. We distinguish three types of
layers: Input, hidden, and Output.
Input Layer
The input layer of the model receives the data that we introduce to it from external
sources, such as images or a numerical vector. It is the only layer of the network that is
directly exposed to the outside world, and it passes on all of the information it receives
without any processing.
Hidden Layers
The hidden layers are what make deep learning what it is today. They are intermediary
layers that perform the calculations and extract features from the data. Searching for hidden
features in data may involve many interlinked hidden layers. In image processing, for
example, the first hidden layers typically detect lower-level features such as edges, simple
shapes and boundaries, while the later hidden layers perform more sophisticated tasks, such as
classifying or segmenting entire objects.
Output Layer
The final prediction is made by the output layer using data from the preceding hidden
layers. It is the layer from which we obtain the final result, hence it is the most important.
Regression and binary classification models typically have a single output node, while
multi-class classifiers usually have one node per class. However, the design depends on the
nature of the problem at hand and how the model was developed. Some recent models have a
two-dimensional output layer; for example, Meta's Make-A-Scene model generates images
from text input.
How do these layers work together?
The input nodes receive data in a form that can be expressed numerically. Each node
is assigned a number; the higher the number, the greater the activation. The information is
represented as activation values, and the network then spreads this information outward. The
activation value is passed from node to node according to the connection strengths (weights),
which represent inhibition or excitation.
Each node sums the activation values it receives and then transforms the sum with its
activation function. The activation travels through the network's hidden layers before arriving
at the output nodes, which present the result to the outside world. The error, i.e. the
difference between the predicted value and the actual value, is then propagated backward, and
each weight is adjusted in proportion to its share of responsibility for the error.

Convolutional neural networks (CNNs) are one of the best-known variants of
the feedforward architecture. They offer a more scalable approach to image classification and
object recognition tasks by using concepts from linear algebra, specifically matrix
multiplication, to identify patterns within an image.
(Figure: example of a CNN architecture that classifies handwritten digits.)
In a feedforward network, signals can only move in one direction. These networks are
considered non-recurrent networks with inputs, outputs, and hidden layers. A layer of
processing units receives input data and executes calculations there. Based on a weighted
total of its inputs, each processing element performs its computation. The newly derived
values are subsequently used as the new input values for the subsequent layer. This process
continues until the output has been determined after going through all the layers.
Perceptron (linear and non-linear) and Radial Basis Function networks are examples
of feedforward networks. A single-layer perceptron network is the most basic type of neural
network. It has a single layer of output nodes, and the inputs are fed directly into the outputs
via a set of weights. Each node calculates the total of the products of the weights and the
inputs. This neural network structure was one of the first and most basic architectures to be
built.
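A single-layer perceptron cannot represent XOR, because the XOR classes are not linearly separable, but a feedforward network with one hidden layer can. The parameters below are the hand-set solution from the XOR example in Goodfellow, Bengio and Courville's Deep Learning (Section 6.1); the helper names are our own. Signals flow strictly forward: input, ReLU hidden layer, linear output.

```python
def relu(z):
    return max(0.0, z)

# Hand-set parameters for a 2-2-1 network.
W = [[1.0, 1.0],  # weights into hidden unit 1 from x1, x2
     [1.0, 1.0]]  # weights into hidden unit 2 from x1, x2
c = [0.0, -1.0]   # hidden biases
w = [1.0, -2.0]   # hidden-to-output weights
b = 0.0           # output bias

def forward(x):
    """Input -> ReLU hidden layer -> linear output; no loops back."""
    h = [relu(W[j][0] * x[0] + W[j][1] * x[1] + c[j]) for j in range(2)]
    return w[0] * h[0] + w[1] * h[1] + b

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, forward(x))  # outputs 0.0, 1.0, 1.0, 0.0
```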

Architecture examples: AlexNet


Alex Krizhevsky developed AlexNet, a significant Convolutional Neural Network (CNN)
architecture. This network comprised eight layers: five convolutional layers (some followed
by max-pooling) and three fully connected layers. AlexNet notably employed the non-
saturating ReLU activation function, which proved more efficient in training compared to
tanh and sigmoid. Widely regarded as a pivotal work in computer vision, the publication of
AlexNet spurred extensive subsequent research leveraging CNNs and GPUs for accelerated
deep learning. By 2022, the AlexNet paper had received over 69,000 citations.

Long short-term memory (LSTM)


LSTM networks are one of the prominent examples of RNNs. These architectures can
analyze complete data sequences in addition to single data points. For instance, LSTM can be
used to perform tasks like unsegmented handwriting identification, speech recognition,
language translation and robot control.
KARPAGA VINAYAGA COLLEGE OF ENGINEERING AND TECHNOLOGY
DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE
CLASS TEST - 3

AD 3501 DEEP LEARNING


YEAR/SEM:III/V DATE: 04.09.2025
PART-A (5X2=10)

1. What is a convolutional neural network?
2. Write the formula to find how many neurons fit in a network.
3. Why is parameter sharing used in CNNs?
4. Define the acceptor design pattern in RNNs.
5. Write the merits of using unfolding graphs for understanding the behaviour of RNNs.
Part-B 1*15=15
6. Discuss in detail strided, tiled, transposed and dilated convolutions with an
example.
KARPAGA VINAYAGA COLLEGE OF ENGINEERING AND TECHNOLOGY
DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE
CLASS TEST – 3(Answer Key)

AD 3501 DEEP LEARNING


YEAR/SEM:III/V DATE: 28.08.2025
PART-A (5X2=10)

1. What is a convolutional neural network?
A Convolutional Neural Network (CNN) is a type of deep learning neural network
architecture commonly used in computer vision. Computer vision is a field of Artificial
Intelligence that enables a computer to understand and interpret images or other visual data.
2. Write the formula to find how many neurons fit in a network.

The spatial size of the output volume can be computed as a function of the input volume
size (W), the receptive field size of the Conv layer neurons (F), the stride with which they
are applied (S), and the amount of zero padding used on the border (P). The number of
neurons that "fit" is given by
$$\frac{W - F + 2P}{S} + 1$$
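A small helper (hypothetical name `conv_output_size`) that applies (W - F + 2P)/S + 1 along one spatial dimension. Following a common convention, it rejects settings where the filter does not tile the input evenly instead of silently rounding; some frameworks round down instead.

```python
def conv_output_size(W, F, S=1, P=0):
    """Number of neurons that "fit" along one spatial dimension: (W - F + 2P) / S + 1."""
    span = W - F + 2 * P
    if span < 0 or span % S != 0:
        raise ValueError("these hyperparameters do not tile the input evenly")
    return span // S + 1

print(conv_output_size(7, 3, S=1, P=0))   # 5
print(conv_output_size(7, 3, S=2, P=0))   # 3
print(conv_output_size(32, 5, S=1, P=2))  # 32 (padding preserves the size)
```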

By introducing probability into a deep learning system, we introduce common sense into
the system; otherwise the system would be very brittle and not useful. In deep
learning, several models, such as Bayesian models, probabilistic graphical models and hidden
Markov models, are used; they depend entirely on probability concepts.
3. Why is parameter sharing used in CNNs?
Parameter sharing is used in the convolutional layers to reduce the number of
parameters in the network. For example, suppose the first convolutional layer produces an
output of 15x15x4, where 15x15 is the spatial size of the output and 4 is the number of filters
used in this layer. Every output node in a given depth slice is computed with the same filter, which
dramatically reduces the storage requirements of the model to the size of the filters.
For example, the same filter weights (1, 0, -1) are used across the whole layer.
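A quick parameter count shows the saving. The sketch assumes 3x3 filters on a single-channel input producing the 15x15x4 output from the example, plus one bias per filter; the "unshared" case is a hypothetical locally connected layer with a separate filter for every output node.

```python
def conv_params(filter_size, in_channels, num_filters, out_h, out_w, shared=True):
    """Count learnable parameters with and without parameter sharing."""
    per_filter = filter_size * filter_size * in_channels + 1  # weights + one bias
    if shared:
        return num_filters * per_filter                       # one filter reused everywhere
    return out_h * out_w * num_filters * per_filter           # separate filter per output node

print(conv_params(3, 1, 4, 15, 15, shared=True))   # 40
print(conv_params(3, 1, 4, 15, 15, shared=False))  # 9000
```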
4. Define the acceptor design pattern in RNNs.
The acceptor design pattern in Recurrent Neural Networks (RNNs) refers to a network
architecture that reads an entire input sequence and produces a single output at the very end.
It makes a single classification or prediction based on the final hidden state, which is meant
to summarize all the information from the input sequence.
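A minimal acceptor sketch, assuming a scalar (one-unit) Elman-style RNN with hypothetical hand-set weights: it reads the entire sequence and only then produces one output from the final hidden state. Because the loop is the unfolded graph, it also illustrates the merits listed in the next question: the same parameters are reused at every time step, and the same code handles sequences of any length.

```python
import math

def rnn_acceptor(sequence, w_x=0.5, w_h=0.9, b=0.0, w_out=2.0, b_out=0.0):
    """Read the whole sequence, then emit ONE output from the final hidden state.

    The same (w_x, w_h, b) are reused at every time step (parameter sharing).
    All parameter values are hypothetical.
    """
    h = 0.0
    states = []
    for x in sequence:  # unfolded over time: one step per input
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    score = 1.0 / (1.0 + math.exp(-(w_out * h + b_out)))  # single prediction at the end
    return score, states

score, states = rnn_acceptor([1.0, 0.0, -1.0, 1.0])
print(round(score, 4), len(states))
```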
5. Write the merits of using unfolding graphs for understanding the behaviour of RNNs.
o Visualizing the flow of information over time
o Explaining the learning process with Backpropagation Through Time (BPTT)
o Explaining parameter sharing
o Supporting variable-length sequences
Part-B 1*15=15
6. Discuss in detail strided, tiled, transposed and dilated convolutions with an
example.

Strided:
o Deep neural networks have proved a powerful learning technique in machine learning and
AI, revolutionizing applications from visual recognition and language generation
to autonomous driving and healthcare. A central component of their success is the
substantial learning capability of Convolutional Neural Networks (CNNs), which are
exceptionally effective in tasks involving images, graphs and sequences.
o Every CNN architecture contains a convolution operator that aims to exploit some
type of correlation (either temporal or spatial) to learn high-level features. At a high level, we
can imagine convolution as a small filter (also called the kernel) sliding over an
input image or sequence and capturing local features at each position. These local features are
then combined to generate a feature map that is given as input to the next layers of the
network.
Strided Convolution
o Here we focus on strided convolutions, a variant of the conventional convolution applied
in CNNs. Conventional convolution uses a step size (or stride) of 1, meaning that the sliding
filter moves 1 sample (e.g., one pixel in the case of images) at a time. In contrast, strided
convolution introduces a stride variable that controls the step of the filter as it moves over the
input. For example, when the stride is equal to 2, the filter skips one pixel every time it slides
over the input, resulting in a smaller output feature map.
o (Figure: a conventional convolution (top) and a strided convolution with stride=2 (bottom)
applied to a 2-dimensional input.)
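A 1-D sketch of strided convolution (hypothetical signal and kernel, cross-correlation as in most CNN libraries). With stride 2 the output is exactly every other value of the stride-1 output, so a strided convolution gives the same result as convolving and then downsampling, while computing only the values that are kept.

```python
def conv1d_strided(x, kernel, stride=1):
    """Valid 1-D cross-correlation that moves the filter `stride` samples at a time."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(x) - k + 1, stride)]

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 2, 1]
full = conv1d_strided(signal, kernel, stride=1)
strided = conv1d_strided(signal, kernel, stride=2)
print(full)     # [8, 12, 16, 20, 24]
print(strided)  # [8, 16, 24]
```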

Tiled Convolution
Convolutional neural networks (CNNs) [1] have been successfully applied to many
recognition tasks. These tasks include digit recognition (MNIST dataset [2]), object
recognition (NORB dataset [3]), and natural language processing [4]. CNNs take translated
versions of the same basis function, and “pool” over them to build translational invariant
features. By sharing the same basis function across different image locations (weight-tying),
CNNs have significantly fewer learnable parameters which makes it possible to train them
with fewer examples than if entirely different basis functions were learned at different
locations (untied weights). Furthermore, CNNs naturally enjoy translational invariance, since
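this is hard-coded into the network architecture. However, one disadvantage of this
hard-coding approach is that the pooling architecture captures only translational invariance; the
network does not, for example, pool across units that are rotations of each other or capture
more complex invariances, such as out-of-plane rotations. Is it better to hard-code
translational invariance, since this is a useful form of prior knowledge, or to let the network
learn its own invariances from unlabeled data?
Tiled convolution offers a middle ground between shared and untied weights: instead of a
single kernel used at every location, it learns a set of t kernels and cycles through them as it
moves across the input. Neighbouring output units therefore use different filters, while units
t positions apart share weights, so the number of parameters grows only by a factor of t.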
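A 1-D sketch of tiled convolution, assuming t = 2 hypothetical kernels that are cycled through as the filter moves along the input. With a single kernel (t = 1) it reduces to ordinary convolution; with as many kernels as output positions it would become a fully untied (locally connected) layer.

```python
def tiled_conv1d(x, kernels):
    """Tiled convolution: output position i uses kernels[i % t], so units t apart share weights."""
    t = len(kernels)
    k = len(kernels[0])
    return [sum(x[i + j] * kernels[i % t][j] for j in range(k))
            for i in range(len(x) - k + 1)]

signal = [1, 2, 3, 4, 5, 6]
kernels = [[1, 1], [1, -1]]  # hypothetical: t = 2 kernels cycled across positions
print(tiled_conv1d(signal, kernels))   # [3, -1, 7, -1, 11]
print(tiled_conv1d(signal, [[1, 1]]))  # t = 1 is ordinary convolution: [3, 5, 7, 9, 11]
```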
Transposed and dilated convolutions:
A transposed convolutional layer is an upsampling layer that generates an output feature map
larger than the input feature map. It is sometimes loosely called a deconvolutional layer, but the
two are not the same. A true deconvolution reverses a standard convolution: if the output of a
standard convolutional layer is deconvolved, the result is the same as the original values. A
transposed convolution does not recover the original values; it only restores the original
spatial dimensions. Transposed convolutional layers are used in a variety of
tasks, including image generation, image super-resolution and image segmentation. They are
particularly useful for tasks that involve upsampling the input data, such as converting a
low-resolution image to a high-resolution one or generating an image from a set of noise
vectors. The operation of a transposed convolutional layer is related to that of a normal
convolutional layer, but it works in the opposite direction. Instead of sliding the kernel over
the input and summing element-wise products into a single output value, a transposed
convolutional layer multiplies the whole kernel by each input element and adds the scaled
copies into the output at positions spaced by the stride. This results in an output
that is larger than the input, and the size of the output can be controlled by the stride and
padding parameters of the layer.
A dilated (atrous) convolution instead spaces the kernel taps apart by a dilation rate d,
leaving gaps between the sampled input positions. This enlarges the receptive field of each
output unit without adding any parameters.
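A 1-D sketch of dilated convolution, assuming a hypothetical 3-tap kernel. With dilation d the taps read inputs d samples apart, so the same three weights cover a receptive field of (3 - 1)·d + 1 samples.

```python
def dilated_conv1d(x, kernel, dilation=1):
    """Valid 1-D dilated cross-correlation: kernel taps are `dilation` samples apart."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # effective receptive field of one output unit
    return [sum(x[i + j * dilation] * kernel[j] for j in range(k))
            for i in range(len(x) - span + 1)]

signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 0, -1]
print(dilated_conv1d(signal, kernel, dilation=1))  # [-2, -2, -2, -2, -2, -2]
print(dilated_conv1d(signal, kernel, dilation=2))  # [-4, -4, -4, -4]
```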

Example 1:
Suppose we have a grayscale image of size 2 x 2, and we want to upsample it using a
transposed convolutional layer with a kernel size of 2 x 2, a stride of 1, and zero padding (no
padding). The input image and the kernel for the transposed convolutional layer would be
as follows:
(Figure: 2 x 2 input image and 2 x 2 kernel.)
The output will be a 3 x 3 feature map, since (2 - 1) x 1 + 2 = 3 along each dimension.

Method 1: Manually with TensorFlow

Code Explanations:
o Import the necessary libraries (TensorFlow and NumPy).
o Define the input tensor and a custom kernel.
o Apply transposed convolution with kernel size = 2, stride = 1.
o Write the custom functions for transposed convolution.
o Apply the custom transposed convolution to the input data.
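The original TensorFlow code is not reproduced here. As a stand-in, the sketch below implements the "custom function" step in plain Python, assuming hypothetical input and kernel values (the example's actual values are not given) and no padding. Each input element scales the whole kernel, and the scaled copies are summed into the output at stride-spaced positions, giving an output of size (in - 1)·stride + kernel along each dimension. For single-channel data it is expected to match `tf.keras.layers.Conv2DTranspose` with `padding="valid"`, but treat that equivalence as an assumption.

```python
def conv_transpose2d(x, kernel, stride=1):
    """Transposed convolution without padding: scatter a scaled copy of the kernel per input element."""
    in_h, in_w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h = (in_h - 1) * stride + kh
    out_w = (in_w - 1) * stride + kw
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(in_h):
        for j in range(in_w):
            for a in range(kh):
                for b in range(kw):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out

x = [[0, 1],
     [2, 3]]       # hypothetical 2 x 2 input
kernel = [[4, 1],
          [2, 3]]  # hypothetical 2 x 2 kernel
for row in conv_transpose2d(x, kernel, stride=1):
    print(row)
# [0, 4, 1]
# [8, 16, 6]
# [4, 12, 9]
```

With stride 2 the same input would produce a 4 x 4 output, showing how the stride controls the amount of upsampling.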
