PART A — (10 × 2 = 20 marks)
1. Differentiate supervised and unsupervised learning.
• Data: Supervised learning uses labeled data (input-output pairs); unsupervised learning uses unlabeled data (only inputs).
• Goal: Supervised learning learns a mapping from inputs to outputs for prediction or classification; unsupervised learning finds hidden patterns, structures, or groupings in the data.
• Examples: Supervised: Regression, Classification. Unsupervised: Clustering, Dimensionality Reduction.
2. What is stochastic gradient descent?
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm used to
minimize the loss function of a model. Unlike standard Gradient Descent, which uses
the entire dataset to compute the gradient, SGD calculates the gradient and updates the
model's parameters using a single, randomly selected training example (or a small mini-
batch) at a time. This makes it much faster and allows for online learning, though the
path to the minimum is noisier.
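A minimal NumPy sketch of one SGD loop for linear regression (the function names, learning rate, and toy data are illustrative, not part of the question):

```python
import numpy as np

def sgd_step(w, b, x_i, y_i, lr=0.01):
    """One SGD update for linear regression on a SINGLE example.

    Uses the squared-error loss L = (w.x_i + b - y_i)**2, whose
    gradients are 2*err*x_i (for w) and 2*err (for b).
    """
    err = np.dot(w, x_i) + b - y_i
    w = w - lr * 2 * err * x_i   # update from one sample, not the full dataset
    b = b - lr * 2 * err
    return w, b

# Toy data: y = 3x, visited in random order (the "stochastic" part)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3 * X[:, 0]
w, b = np.zeros(1), 0.0
for epoch in range(50):
    for i in rng.permutation(len(X)):
        w, b = sgd_step(w, b, X[i], y[i])
print(round(w[0], 2))  # close to 3.0
```

Each update uses a single example, so the loss fluctuates from step to step, but on average the parameters drift toward the minimum.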
3. What are sparse interactions in a convolutional neural network?
Sparse interactions (or sparse connectivity) refer to the concept in CNNs where a
neuron in a layer is connected to only a small, local region of the previous layer, rather
than every neuron. This is achieved using a small kernel/filter. It drastically reduces the
number of parameters and computations, makes the model more efficient, and helps in
detecting local features like edges and corners.
4. Present an outline of pooling layer in a convolutional neural network.
A pooling layer is used in CNNs to progressively reduce the spatial size (height, width)
of the representation, which reduces the computational load, memory usage, and
number of parameters. It also helps in making the detection of features somewhat
invariant to scale and orientation. The most common type is Max Pooling, which
outputs the maximum value from a region of the feature map covered by the filter.
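Max pooling can be sketched in a few lines of NumPy; the 4x4 feature map below is an illustrative example:

```python
import numpy as np

def max_pool2d(fmap, size=2, stride=2):
    """2x2 max pooling with stride 2 over a single-channel feature map."""
    h, w = fmap.shape
    out_h, out_w = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Take the maximum value in each size x size window
            out[i, j] = fmap[i*stride:i*stride+size,
                             j*stride:j*stride+size].max()
    return out

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 6, 7, 1],
                 [2, 2, 3, 4]], dtype=float)
pooled = max_pool2d(fmap)
print(pooled)  # [[4. 5.]
               #  [6. 7.]]
```

Note how the spatial size halves (4x4 to 2x2) while the strongest response in each region is kept.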
5. Define a recurrent neural network.
A Recurrent Neural Network (RNN) is a class of artificial neural networks designed
for sequential data. Unlike feedforward networks, RNNs have "memory" through
internal loops, allowing information to persist. The output from a previous step is fed
back as input to the current step, making them suitable for tasks like time series
prediction, speech recognition, and natural language processing.
6. What is LSTM? How does it differ from an RNN?
LSTM (Long Short-Term Memory) is a special kind of RNN architecture designed to
solve the vanishing gradient problem of standard RNNs.
• RNN: Has a simple repeating module (usually a single tanh layer). It struggles to
learn long-term dependencies due to vanishing gradients.
• LSTM: Has a complex repeating module with three gates (Input, Forget,
Output) and a cell state. This gating mechanism allows it to selectively remember
or forget information over long periods, making it highly effective for long-range
dependencies.
7. What is a baseline model in deep learning?
A baseline model is a simple model, often a non-deep-learning method or a very small
neural network, used as a reference point for evaluating the performance of more complex
models. It provides a minimum performance threshold that any new, sophisticated
model must significantly exceed to be considered useful. Examples include a logistic
regression model for classification or a simple feedforward network with one hidden
layer.
8. Define random search.
Random Search is a hyperparameter tuning technique where a fixed number of
hyperparameter settings are sampled from a defined search space randomly. The model
is trained and evaluated for each of these random combinations. It is often more
efficient than grid search because it has a better chance of finding good
hyperparameters by exploring the search space more broadly.
9. What is a regularized autoencoder?
A regularized autoencoder is an autoencoder that does not rely on an undercomplete
(bottleneck) hidden layer for dimensionality reduction. Instead, it uses a loss function with a
regularization term to prevent the network from simply learning the identity function,
even if the hidden layer is the same size or larger than the input. This forces the model
to learn useful properties of the data. Types include Sparse, Denoising, and Contractive
Autoencoders.
10. Define a stochastic encoder.
A stochastic encoder is an encoder, typically used in Variational Autoencoders (VAEs),
where the encoding process is probabilistic. Instead of encoding an input into a fixed
point in the latent space, it encodes it into a probability distribution (e.g., a Gaussian).
The latent vector is then sampled from this distribution. This introduces continuity and
completeness in the latent space, enabling generative capabilities.
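The sampling step can be sketched with NumPy using the reparameterization trick; the mu and log_var values below are hypothetical encoder outputs, not taken from the question:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_latent(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).

    The encoder outputs a distribution (mu, log_var) rather than a fixed
    point; sampling via eps keeps the path differentiable w.r.t. mu and sigma.
    """
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

# Hypothetical encoder outputs for one input
mu = np.array([0.5, -1.0])
log_var = np.array([-2.0, -2.0])   # small variance around each mean
z = sample_latent(mu, log_var)     # a different z on every call
print(z.shape)  # (2,)
```

Because z is drawn from a distribution centered on mu, nearby inputs map to overlapping regions of the latent space, which is what gives the VAE its generative behavior.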
PART B — (5 × 13 = 65 marks)
11. (a) (i) Discuss the Bias - Variance trade off.
The bias-variance trade-off is a fundamental concept in machine learning that describes the
tension between a model's simplicity and its ability to fit the training data.
• Bias: Error due to overly simplistic assumptions in the learning algorithm. A high-
bias model (e.g., linear regression for a complex problem) fails to capture the
underlying trends of the data, leading to underfitting.
• Variance: Error due to excessive sensitivity to small fluctuations in the training set. A
high-variance model (e.g., a very deep decision tree) learns the noise in the training
data as if it were a true pattern, leading to overfitting.
The Trade-off:
• As model complexity increases, bias decreases (it fits the training data better) but
variance increases (it becomes more sensitive to the specific training set).
• The goal is to find the optimal model complexity that minimizes the total error (the
sum of bias error, variance error, and irreducible error).
(ii) Discuss overfitting and underfitting with an example.
• Overfitting: Occurs when a model learns the training data too well, including its
noise and outliers. It performs excellently on training data but poorly on unseen test
data.
o Example: A student who memorizes a textbook word-for-word but fails to
answer application-based questions in the exam.
o In ML: A deep neural network achieving 99% accuracy on the training set but
only 60% on the test set.
• Underfitting: Occurs when a model is too simple to capture the underlying structure
of the data. It performs poorly on both training and test data.
o Example: A student who has only read the chapter titles and fails to answer
both direct and application-based questions.
o In ML: Using a linear model to fit a complex, non-linear dataset, resulting in
high error on both sets.
(b) Explain the operations of a deep feedforward network with a diagram.
A Deep Feedforward Network (DFN) or Multi-Layer Perceptron (MLP) is the quintessential
deep learning model. Information flows from input to output without any feedback loops.
Operations:
1. Input Layer: Receives the feature vector.
2. Hidden Layers: Multiple layers between input and output. Each layer consists of
neurons.
3. Forward Propagation: For each neuron in a hidden/output layer:
a. Weighted Sum: z = (w1*x1 + w2*x2 + ... + wn*xn) + b where w are
weights, x are inputs, and b is bias.
b. Activation Function: An activation function f (e.g., ReLU, Sigmoid) is applied
to z to introduce non-linearity: a = f(z).
c. This output a becomes the input for the next layer.
4. Output Layer: The final layer produces the network's prediction.
5. Loss Calculation: The difference between the prediction and the actual target is
calculated using a loss function (e.g., Mean Squared Error, Cross-Entropy).
6. Backpropagation & Optimization: The loss is propagated backward through the
network using the chain rule to calculate the gradient of the loss with respect to each
weight. An optimizer (e.g., SGD, Adam) then updates the weights to minimize the
loss.
[Diagram Description: A diagram would show an input layer on the left with 3 nodes, two
hidden layers in the middle with 4 nodes each, and an output layer on the right with 1 node.
All nodes in one layer would be fully connected to all nodes in the next layer, representing a
fully connected network. Arrows would indicate the forward flow of data.]
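The forward-propagation steps above can be sketched in NumPy; the layer sizes follow the diagram (3 inputs, two hidden layers of 4, one output), and the random weights are purely illustrative:

```python
import numpy as np

def relu(z):
    """Activation function introducing non-linearity: a = max(0, z)."""
    return np.maximum(0, z)

def forward(x, layers):
    """Forward pass through a fully connected network.

    layers is a list of (W, b) pairs; each hidden layer computes
    a = f(Wx + b), and the final layer is left linear.
    """
    a = x
    for W, b in layers[:-1]:
        a = relu(W @ a + b)          # weighted sum, then activation
    W, b = layers[-1]
    return W @ a + b                 # output layer prediction

# 3 inputs -> 4 hidden -> 4 hidden -> 1 output, as in the diagram
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), np.zeros(4)),
          (rng.standard_normal((4, 4)), np.zeros(4)),
          (rng.standard_normal((1, 4)), np.zeros(1))]
y_pred = forward(np.array([1.0, 0.5, -0.2]), layers)
print(y_pred.shape)  # (1,)
```

Loss calculation and backpropagation (steps 5 and 6) would then compare y_pred with the target and push gradients back through these same matrices.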
12. (a) What is a convolutional neural network? Outline transposed and dilated
convolutions with an example.
A Convolutional Neural Network (CNN) is a specialized neural network for processing grid-
like data such as images. Its core components are convolutional layers, pooling layers, and
fully connected layers. It uses shared weights and sparse connectivity to efficiently capture
spatial hierarchies of features.
• Transposed Convolution (Deconvolution): It is essentially a reverse convolution
used for upsampling a feature map to a higher resolution. It applies a filter but
increases the spatial dimensions.
o Example: In a segmentation task, the initial layers downsample the image.
Transposed convolutions in the decoder part of the network are used to
upsample the feature maps back to the original image size to predict a label for
each pixel.
• Dilated Convolution (Atrous Convolution): A convolution in which the kernel is
applied over an area larger than its own size by inserting gaps (zeros) between the
kernel elements. The dilation rate defines the spacing.
o Example: In image segmentation (e.g., DeepLab), dilated convolutions are
used to exponentially increase the receptive field without increasing the
number of parameters or losing resolution (avoiding pooling), thus capturing
more context from the image.
(b) How to introduce non-linearity in a convolutional neural network? Explain with an
example.
Non-linearity is introduced in a CNN through activation functions applied element-wise to
the output of a convolutional layer. Without it, the entire network would be a linear
transformation, no matter how many layers, severely limiting its ability to learn complex
patterns.
• Mechanism: After the convolution operation produces a feature map, each value in
that map is passed through a non-linear activation function like ReLU (Rectified
Linear Unit): f(x) = max(0, x).
• Example: Consider a CNN for cat vs. dog classification. The first layer might detect
simple edges. If we only had linear activations, subsequent layers would only be able
to combine these edges in a linear way (e.g., weighted sums). With ReLU, the
network can learn to "ignore" negative activations (like weak edges) and "activate"
only strong, positive features. This allows it to build up a hierarchy: edges -> textures
-> patterns -> parts of a cat's face -> the concept of a "cat". This complex, non-linear
mapping is only possible due to the activation functions.
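Both points can be shown with a short NumPy sketch; the 2x2 feature map values are illustrative:

```python
import numpy as np

def relu(x):
    """ReLU applied element-wise: negative responses are zeroed out."""
    return np.maximum(0, x)

# A hypothetical feature map: positive values = strong edge responses,
# negative values = weak/opposite responses the network can "ignore".
feature_map = np.array([[-1.2,  0.8],
                        [ 2.5, -0.3]])
print(relu(feature_map))   # -1.2 and -0.3 become 0; 0.8 and 2.5 pass through

# Without non-linearity, stacking two linear layers is still linear:
W1, W2 = np.array([[2.0]]), np.array([[3.0]])
x = np.array([[5.0]])
# W2 @ (W1 @ x) == (W2 @ W1) @ x, so one layer would suffice
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)
```

The final assertion is the key point: any stack of purely linear layers collapses into a single matrix, so depth only helps once a non-linearity sits between the layers.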
13. (a) What is a bi-directional recurrent neural network? Explain the architecture with
a diagram.
A Bidirectional Recurrent Neural Network (Bi-RNN) is an RNN architecture that processes
sequential data in both forward and backward directions. This allows the network to have
information about both past and future context for any point in the sequence, which is not
possible in a standard unidirectional RNN.
Architecture:
1. It consists of two separate hidden layers:
o A forward hidden layer that processes the sequence from start to end (t=1 to
t=T).
o A backward hidden layer that processes the sequence from end to start (t=T
to t=1).
2. At each time step t, the output is computed based on the concatenation or combination
of the hidden states from both the forward and backward layers (h_t = [h_forward_t,
h_backward_t]).
3. This combined context is then passed to the output layer.
[Diagram Description: A diagram would show a sequence of inputs (x1, x2, x3). Each input
is connected to two hidden layers: one processing left-to-right (forward RNN) and another
processing right-to-left (backward RNN). The hidden states from both directions at each time
step are combined (e.g., concatenated) and fed to an output layer (y1, y2, y3).]
(b) What is long short term memory? Compare and contrast LSTM and gated
recurrent units.
LSTM (Long Short-Term Memory) is a type of RNN with a gating mechanism to control
the flow of information. It solves the vanishing gradient problem and can learn long-term
dependencies.
LSTM vs. GRU (Gated Recurrent Unit):
• Gates: LSTM has three gates (Input, Forget, Output); GRU has two (Update, Reset).
• Internal State: LSTM keeps two state vectors, the Cell State (c_t) and the Hidden State (h_t); GRU keeps one, the Hidden State (h_t).
• Functionality: In an LSTM, the cell state acts as a "conveyor belt" for long-term memory, carefully regulated by the gates; in a GRU, the hidden state captures both long- and short-term dependencies without a separate cell state.
• Complexity: LSTM is more complex and has more parameters; GRU is simpler with fewer parameters.
• Training Speed: LSTM can be slower to train; GRU is often faster due to its simplicity.
• Performance: LSTM can model very long sequences effectively; GRU often performs comparably on many tasks with less data.
Conclusion: GRU is a simpler, more efficient alternative to LSTM and often performs just as
well, especially on smaller datasets. LSTM might still be preferred for tasks requiring
modeling of very long-term dependencies.
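The parameter-count difference can be made concrete with a quick calculation; the layer sizes 128 and 256 below are arbitrary choices, and bias handling varies slightly between frameworks:

```python
def lstm_params(input_size, hidden_size):
    """One LSTM layer: 4 weight blocks (input, forget, output gates +
    cell candidate), each with input weights, recurrent weights, a bias."""
    return 4 * (hidden_size * input_size + hidden_size * hidden_size + hidden_size)

def gru_params(input_size, hidden_size):
    """One GRU layer: only 3 weight blocks (update, reset, candidate)."""
    return 3 * (hidden_size * input_size + hidden_size * hidden_size + hidden_size)

n_lstm = lstm_params(128, 256)
n_gru = gru_params(128, 256)
print(n_lstm, n_gru)  # the GRU layer needs 25% fewer parameters
```

The 4-vs-3 ratio holds for any layer size, which is why GRUs are consistently cheaper to train.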
14. (a) Discuss the various performance metrics to evaluate a deep learning model with
an example.
Performance metrics quantify a model's effectiveness. The choice depends on the task.
• For Classification:
o Accuracy: (TP+TN)/(TP+TN+FP+FN). Proportion of correct predictions.
Good for balanced classes.
▪ Example: A cat/dog classifier with 95% accuracy means it's correct
95% of the time.
o Precision: TP/(TP+FP). How many of the predicted positives are actual
positives.
▪ Example: A spam detector with high precision means when it says
"spam," it's very likely correct (low false positives).
o Recall (Sensitivity): TP/(TP+FN). How many of the actual positives were
correctly predicted.
▪ Example: A cancer detection model with high recall misses very few
actual cancer cases (low false negatives).
o F1-Score: Harmonic mean of precision and recall. Balances the two.
o Confusion Matrix: A table showing correct and incorrect predictions for each
class.
• For Regression:
o Mean Absolute Error (MAE): Average of absolute differences between
predicted and actual values. Robust to outliers.
o Mean Squared Error (MSE): Average of squared differences. Punishes
larger errors more heavily.
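The classification metrics above can be computed from scratch; the toy labels below are illustrative (1 = positive class):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Toy spam-detector predictions (1 = spam): TP=3, TN=3, FP=1, FN=1
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
print(acc, prec, rec, f1)  # 0.75 0.75 0.75 0.75
```

The guard clauses matter in practice: with no predicted positives, precision would otherwise divide by zero.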
(b) What are hyperparameters? Discuss the steps to perform hyperparameter tuning.
Hyperparameters are configuration parameters external to the model whose values cannot
be estimated from the data. They are set before the training process begins. Examples:
Learning rate, number of layers, number of neurons per layer, batch size, dropout rate.
Steps for Hyperparameter Tuning:
1. Define a Search Space: Identify the hyperparameters to tune and their possible value
ranges (e.g., learning rate: [0.1, 0.01, 0.001]).
2. Choose a Tuning Method:
o Manual Search: Manually tweaking based on intuition and experience.
o Grid Search: Exhaustively searching over a specified set of values. It's
thorough but computationally expensive.
o Random Search: Randomly sampling combinations from the search space.
Often more efficient than grid search.
o Bayesian Optimization: A probabilistic model that uses past evaluation
results to choose the next hyperparameters to evaluate. It's very efficient for
expensive models.
3. Select a Performance Metric: Choose a metric to evaluate the models (e.g.,
validation accuracy, F1-score).
4. Execute the Search: Run the training and evaluation process for each
hyperparameter combination using cross-validation.
5. Select the Best Model: Identify the hyperparameter set that yielded the best
performance on the validation metric.
6. Evaluate on Test Set: Finally, assess the performance of the best-tuned model on the
held-out test set to get an unbiased estimate of its generalization error.
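The tuning steps can be sketched with random search; the search space and the stand-in evaluate_model scorer below are hypothetical, and a real run would train and cross-validate the network instead:

```python
import random

random.seed(0)

# Step 1: define the search space (values here are illustrative)
search_space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [16, 32, 64],
    "dropout": [0.0, 0.25, 0.5],
}

def evaluate_model(config):
    """Stand-in for 'train + cross-validate'; returns a fake score.
    In practice this would train the model and return, e.g.,
    mean validation accuracy."""
    return random.random()

# Steps 2-5: sample configurations, evaluate each, keep the best
best_score, best_config = -1.0, None
for _ in range(10):                              # fixed budget of 10 trials
    config = {k: random.choice(v) for k, v in search_space.items()}
    score = evaluate_model(config)
    if score > best_score:
        best_score, best_config = score, config

print(best_config)  # retrain with these settings, then test (step 6)
```

Swapping the random sampling for an exhaustive loop over all combinations turns this into grid search.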
15. (a) Justify how autoencoders are more suitable than Principal Component Analysis
(PCA) for dimensionality reduction.
While both Autoencoders and PCA are used for dimensionality reduction, autoencoders are
generally more powerful and suitable for complex data due to their non-linear nature.
• PCA: Is a linear technique. It performs a linear transformation of the data to find the
orthogonal directions (principal components) of maximum variance. It is simple, fast,
and deterministic.
• Autoencoder: Is a non-linear technique. It uses a neural network with an encoder (to
compress) and a decoder (to reconstruct). The bottleneck layer acts as the low-
dimensional representation.
Justification for Autoencoders:
1. Non-Linearity: Autoencoders can learn complex, non-linear manifolds in the data,
whereas PCA is limited to linear subspaces. For real-world data like images (which lie
on non-linear manifolds), autoencoders can capture the structure much more
effectively.
2. Representation Power: The hidden layers in an autoencoder can learn hierarchical
features, leading to a more powerful and meaningful latent space representation.
3. Flexibility: The architecture is highly flexible. We can use convolutional layers for
images, different types of regularizations (sparse, denoising), and different loss
functions tailored to the data.
Conclusion: For simple, linear data, PCA is sufficient and efficient. However, for complex,
high-dimensional data like images, audio, and text, autoencoders are far more suitable as they
can learn a more efficient and powerful non-linear reduced representation.
(b) What is a generative adversarial network? Explain the architecture with a diagram.
A Generative Adversarial Network (GAN) is a class of deep learning frameworks designed
for generative modeling. It consists of two neural networks, a Generator and
a Discriminator, that are trained simultaneously in a competitive game.
Architecture:
1. Generator (G): Takes random noise as input and tries to generate fake data that is
indistinguishable from real data. Its goal is to fool the Discriminator.
2. Discriminator (D): Takes both real data and fake data from the generator as input. It
tries to correctly classify whether the input is real or fake. Its goal is to become a
perfect classifier.
The Training Process (The Adversarial Game):
• The generator and discriminator are pitted against each other.
• The generator tries to produce more realistic data to fool the discriminator.
• The discriminator gets better at distinguishing real from fake.
• This competition drives both networks to improve until the generator produces highly
realistic data.
[Diagram Description: A diagram would show a "Noise Vector" input going into the
"Generator" network, which produces "Fake Data." Both "Real Data" from the dataset and
"Fake Data" from the generator are fed into the "Discriminator" network. The Discriminator
outputs a probability ("Real" or "Fake"). Arrows would indicate the flow of data and the
adversarial feedback loop, where the discriminator's output is used to update both networks.]
PART C — (1 × 15 = 15 marks)
16. (a) Discuss the various loss functions in neural networks.
A loss function (or cost function) measures the discrepancy between the model's prediction
and the actual target value. It is the objective that the model aims to minimize during training.
1. For Regression Tasks:
• Mean Squared Error (MSE): MSE = (1/n) * Σ(y_true - y_pred)²
o Pros: Easily differentiable, convex.
o Cons: Sensitive to outliers (due to squaring).
• Mean Absolute Error (MAE): MAE = (1/n) * Σ|y_true - y_pred|
o Pros: Robust to outliers.
o Cons: Gradients are not smooth around zero, which can slow down
convergence.
2. For Classification Tasks:
• Binary Cross-Entropy: Used for binary classification (2 classes).
o L = -[y_true * log(y_pred) + (1 - y_true) * log(1 - y_pred)]
o Heavily penalizes confident but wrong predictions.
• Categorical Cross-Entropy: Used for multi-class classification (>2 classes) with
one-hot encoded labels.
o L = - Σ y_true_i * log(y_pred_i)
o Measures the difference between the predicted probability distribution and the
true distribution.
• Sparse Categorical Cross-Entropy: Same as above, but used when the labels are
integers (not one-hot encoded), which is more memory efficient.
3. Other Specialized Loss Functions:
• Huber Loss: Combines MSE and MAE. It is less sensitive to outliers than MSE and
is smooth around zero.
• Hinge Loss: Used for "maximum-margin" classification, notably in Support Vector
Machines (SVMs).
The choice of loss function is critical as it directly guides the learning process of the neural
network.
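A minimal sketch of three of these losses in plain Python; the toy targets and predictions are illustrative:

```python
import math

def mse(y_true, y_pred):
    """Mean Squared Error: squaring punishes large errors heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean Absolute Error: more robust to outliers."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred):
    """BCE for binary labels; y_pred must be in (0, 1)."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_pred)) / len(y_true)

# One outlier (error 3) dominates MSE but not MAE
print(mse([0, 0, 3], [0, 0, 0]))  # 3.0
print(mae([0, 0, 3], [0, 0, 0]))  # 1.0

# A confident wrong prediction (p=0.9 for a true negative) costs far
# more than the nearly correct one (p=0.9 for a true positive)
print(round(binary_cross_entropy([1, 0], [0.9, 0.9]), 3))  # 1.204
```

Comparing the two regression losses on the same data makes the outlier-sensitivity point concrete: MSE triples the MAE value here purely because of the squaring.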
(b) Discuss the steps involved in grid search with an example.
Grid Search is a traditional method for hyperparameter tuning that performs an exhaustive
search over a manually specified subset of the hyperparameter space.
Steps:
1. Define the Model: Choose the algorithm (e.g., a Support Vector Classifier).
2. Define the Hyperparameter Grid: Specify the hyperparameters and the values you
want to try for each.
o Example Grid for an SVM:
▪ 'kernel': ['linear', 'rbf']
▪ 'C': [0.1, 1, 10, 100]
▪ 'gamma': [0.01, 0.1, 1]
3. Define the Evaluation Metric: Choose a scoring metric to evaluate performance
(e.g., 'accuracy').
4. Perform Cross-Validation: Typically, k-fold cross-validation (e.g., 5-fold) is used
for each hyperparameter combination to get a robust performance estimate and avoid
overfitting to a single train-validation split.
5. Execute the Search: The algorithm will train and evaluate a model for every single
combination of hyperparameters in the grid.
o For the example above, the number of combinations is 2 (kernels) × 4 (C
values) × 3 (gamma values) = 24 unique models.
o With 5-fold cross-validation, it will train 24 × 5 = 120 models in total.
6. Identify the Best Parameters: After all runs are complete, the combination of
hyperparameters that achieved the highest average cross-validation score is selected
as the best.
7. Train the Final Model: Finally, train a new model on the entire training set using
these best-found hyperparameters and evaluate it on the held-out test set.
Limitation: It can be computationally very expensive when the hyperparameter space is
large.
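The combination counting in step 5 can be verified with a short sketch; evaluate below is a stand-in scorer, not a real cross-validation:

```python
from itertools import product

# The example grid from the steps above
grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10, 100],
    "gamma": [0.01, 0.1, 1],
}

# Build every combination (the Cartesian product of all value lists)
keys = list(grid)
combos = [dict(zip(keys, values)) for values in product(*grid.values())]
print(len(combos))          # 2 * 4 * 3 = 24 unique models
print(len(combos) * 5)      # 120 training runs with 5-fold CV

def evaluate(config):
    """Stand-in scorer; in practice: mean cross-validation score of an
    SVC trained with this config. Here we pretend C=10 is best."""
    return -abs(config["C"] - 10)

best = max(combos, key=evaluate)
print(best["C"])  # 10
```

The exhaustive loop is what makes grid search both thorough and expensive: adding one more value to any list multiplies the total number of runs.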