Deep Learning

The document covers various deep learning techniques including localization, regression, embeddings, and inverse problems. It discusses algorithms like YOLO and Faster R-CNN for localization, and highlights the importance of embeddings in representing high-dimensional data. Additionally, it addresses recent trends in deep learning architectures such as transformers and residual networks, emphasizing their applications and advantages.


Deep learning

Unit - 7

Localization Techniques in Deep Learning

Concept of Localization

Definition

Localization involves identifying the exact position of one or more objects in an image by placing
bounding boxes or masks around them.

Purpose

While classification labels an entire image, localization focuses on identifying specific areas where
objects are present.

Types of Localization

1. Bounding Box Localization: Places rectangular boxes around objects.

2. Segmentation (Instance/Pixel-level): Identifies object boundaries at the pixel level.

Algorithms

YOLO (You Only Look Once)

 Description: Directly predicts bounding boxes and class probabilities for multiple objects in a
single pass.

 Advantages: Fast and efficient for real-time applications.

 Disadvantages: Struggles with detecting small objects in complex scenes.

Faster R-CNN

 Description: Two-stage process:

1. Region Proposal Network (RPN) identifies potential object regions.

2. A separate network refines these regions for precise classification and localization.

 Advantages: High accuracy, particularly in detecting small and overlapping objects.

 Disadvantages: Computationally slower than YOLO.
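Both algorithms score predicted boxes against ground truth using Intersection over Union (IoU), the ratio of overlap area to combined area. A minimal sketch in plain Python (the `[x1, y1, x2, y2]` corner format used here is an assumption for illustration, not something fixed by either algorithm):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as [x1, y1, x2, y2]."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou([0, 0, 10, 10], [0, 0, 10, 10]))  # identical boxes -> 1.0
print(iou([0, 0, 10, 10], [5, 5, 15, 15]))  # partial overlap
```

A prediction is typically counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.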

Regression in Deep Learning

Definition

Regression predicts continuous numerical outputs from input data by finding the best-fit line (or
curve) that maps inputs to outputs as accurately as possible. It is a statistical, supervised learning
approach used to analyze the relationship between a dependent variable and one or more
independent variables.

Example
Predicting house prices, temperatures, or stock market trends.

Key Characteristics

 Objective: A regression algorithm maps the input variables (x) to a continuous
output variable (y).

 Common Loss Functions:

o Mean Squared Error (MSE)

o Mean Absolute Error (MAE)
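Both loss functions can be sketched in a few lines of plain Python (the toy targets and predictions below are made up for illustration):

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.5]
y_pred = [2.5, 5.0, 3.5]
print(mse(y_true, y_pred))  # (0.25 + 0 + 1) / 3
print(mae(y_true, y_pred))  # (0.5 + 0 + 1) / 3
```

MSE penalizes large errors more heavily because of the squaring; MAE treats all errors proportionally.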

Applications

 Predicting sales trends over time.

 Forecasting weather conditions.

 Estimating real-world physical properties like weight or density.

Embeddings and DrLIM

Embeddings

Definition

Embeddings represent high-dimensional data in a lower-dimensional vector space while preserving
meaningful relationships. They enable machine learning models to understand and process
categorical or structured data.

Examples

1. Word Embeddings (e.g., Word2Vec, GloVe, BERT) capture relationships between words
based on their context in a corpus.

Example: "King - Man + Woman = Queen."

2. Image Embeddings: Map image features into vector spaces for comparison or clustering.
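The "King - Man + Woman = Queen" analogy is literally vector arithmetic followed by a nearest-neighbor search. A toy sketch with hypothetical 3-dimensional vectors (real Word2Vec/GloVe embeddings have hundreds of dimensions learned from a corpus; the numbers here are invented so the analogy works out exactly):

```python
import numpy as np

# Made-up 3-d embeddings, purely for illustration.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.1, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 for parallel vectors, 0.0 for orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = emb["king"] - emb["man"] + emb["woman"]
# The word whose embedding is closest to the analogy result.
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)  # queen
```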

DrLIM (Dimensionality Reduction by Learning an Invariant Mapping)

Purpose:
DrLIM creates embeddings that stay consistent (invariant) even if the data is rotated, scaled,
or translated.
It learns to group similar data points together, no matter how they are transformed, making it
easier for models to analyze and work with such data.

Process:

DrLIM uses a contrastive loss function to train a neural network:

 For similar pairs of inputs: Reduces the distance between their embeddings.
 For dissimilar pairs: Increases the distance beyond a margin.
 Learns relationships between data points, making it especially effective for structured
tasks.
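The contrastive loss described above can be sketched for a single pair of embeddings (a minimal NumPy version; the margin of 1.0, the 0.5 scaling, and the example vectors are illustrative choices, not fixed by DrLIM):

```python
import numpy as np

def contrastive_loss(e1, e2, similar, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    similar=True pulls the pair together (squared distance);
    similar=False pushes it apart until the distance exceeds `margin`.
    """
    d = np.linalg.norm(e1 - e2)
    if similar:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

a = np.array([0.0, 0.0])
b = np.array([0.3, 0.4])                       # distance 0.5
print(contrastive_loss(a, b, similar=True))    # 0.5 * 0.5**2 = 0.125
print(contrastive_loss(a, b, similar=False))   # 0.5 * (1 - 0.5)**2 = 0.125
```

Note that a dissimilar pair already farther apart than the margin contributes zero loss, so the network is not pushed to separate it further.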

Applications:

 Supervised Learning
 Clustering and Retrieval

Inverse Problems in Deep Learning

Definition

Inverse problems involve reconstructing inputs (causes) from given outputs (effects).

Ill-posedness of inverse problems:

 Non-uniqueness: Multiple inputs can produce the same output.

 Instability: Small changes in the output can lead to large variations in the
reconstructed input.

Applications

1. Image Reconstruction: Recovering high-quality images from noisy or incomplete data.

2. Medical Imaging: Enhancing CT or MRI images for diagnostic purposes.

3. Physics Simulations: Estimating initial conditions of physical systems.

Example Problems

 Image Deblurring: Remove blur caused by camera shake.

 Super-Resolution: Enhance the resolution of low-resolution images.

Extensions to Non-Euclidean Domains

Definition

In deep learning, traditional methods often work in Euclidean domains, where data lies in flat,
regular spaces like grids (e.g., images or time series). However, many real-world data types, such as
graphs, manifolds, or irregular networks, exist in non-Euclidean domains, which have complex
structures or geometries.

Non-Euclidean Domains focus on

1. Graphs: Represent relationships (e.g., social networks, molecular structures).

2. Manifolds: Curved spaces like the Earth's surface.

Graph Neural Networks (GNNs)

Graph Neural Networks (GNNs) process and analyze graph-structured data.


How GNNs Work

 Nodes: Represent entities.

 Edges: Represent relationships between entities.

GNNs update a node by using information from its neighbors through the edges.
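One round of this neighborhood update can be sketched with plain NumPy, using an unweighted mean as a stand-in for a learned GNN layer (the tiny graph and node features below are made up for illustration):

```python
import numpy as np

# Tiny undirected graph as adjacency lists (node -> neighbors).
neighbors = {0: [1, 2], 1: [0], 2: [0]}

# One 2-dimensional feature vector per node.
h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])

def message_passing_step(h, neighbors):
    """Each node averages its own features with its neighbors' features."""
    new_h = np.empty_like(h)
    for v, nbrs in neighbors.items():
        stacked = np.vstack([h[v]] + [h[u] for u in nbrs])
        new_h[v] = stacked.mean(axis=0)
    return new_h

print(message_passing_step(h, neighbors))
```

A real GNN layer would apply learned weight matrices before and after the aggregation; stacking several such layers lets information flow across multi-hop neighborhoods.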

Applications

 Social Networks: Friend recommendations, influencer detection.

 Drug Discovery: Predict molecular properties from graph structures.

Recurrent Neural Networks (RNNs)

RNNs are a type of neural network designed to process sequential data, where the current output
depends on previous computations. They are particularly useful for tasks involving time series,
language, and any data with temporal or sequential relationships. Feedback loops enable the
network to retain memory of previous computations.
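The feedback loop can be sketched as a single recurrence applied across a sequence (a minimal NumPy version; the weight matrices here are random rather than trained, and the dimensions are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# 3-d inputs, 4-d hidden state (arbitrary sizes for illustration).
W_xh = rng.normal(size=(4, 3)) * 0.1   # input -> hidden
W_hh = rng.normal(size=(4, 4)) * 0.1   # hidden -> hidden (the feedback loop)
b_h = np.zeros(4)

def rnn_step(x_t, h_prev):
    """One recurrence: the new state depends on the current input
    AND the previous state, which is what gives the network memory."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(4)                                     # initial hidden state
for x_t in [np.ones(3), np.zeros(3), np.ones(3)]:   # a length-3 sequence
    h = rnn_step(x_t, h)
print(h.shape)  # (4,)
```

Because each step reuses the same weights, gradients are multiplied through `W_hh` many times during backpropagation, which is exactly where the vanishing/exploding gradient problem below comes from.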

Challenges of RNNs

1. Vanishing and Exploding Gradients:


o During backpropagation, gradients can become very small (vanish) or very
large (explode), making learning difficult.
2. Limited Long-Term Memory:
o Standard RNNs struggle to capture dependencies in long sequences.

Types of RNNs

LSTM (Long Short-Term Memory)

 Introduces gates to control information flow, mitigating the vanishing gradient problem.

 Applications: Speech recognition, language modeling.

GRU (Gated Recurrent Unit)

 Simplified version of LSTM, with fewer gates but comparable performance.

Applications

1. Time-Series Forecasting: Stock prices, weather prediction.

2. Text Generation: Generating coherent sentences or paragraphs.

3. Speech Recognition: Converting speech to text.


Unit - 1

Introduction to Deep Learning

Deep Learning is a subfield of machine learning inspired by the structure and function of the human
brain, particularly artificial neural networks (ANNs). It uses multi-layered neural networks to model
complex patterns in data.

Key Features of Deep Learning:

1. Deep Neural Networks: Utilizes multiple hidden layers between input and output layers.

2. Feature Learning: Automatically extracts and learns features from raw data.

3. Scalability: Performs better with larger datasets and computational resources.

Example:

To classify images of cats and dogs, a convolutional neural network (CNN) learns features like edges,
textures, and higher-level patterns directly from the images without manual feature engineering.

Practical Applications:

1. Image Recognition: Used in facial recognition, medical imaging, and autonomous vehicles.

2. Natural Language Processing (NLP): Powers chatbots, translation tools, and sentiment
analysis.

3. Speech Recognition: Converts spoken words to text, as used in virtual assistants like Alexa
and Siri.

Bayesian Learning

Bayesian Learning is a probabilistic approach to learning based on Bayes' Theorem. It updates the
probability of a hypothesis as more evidence or data becomes available.

Bayes' Theorem: Bayes' Theorem calculates the probability of an event A occurring given that
another event B has occurred: P(A|B) = P(B|A) · P(A) / P(B).
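Applied to the spam-filtering example, the theorem looks like this (all probabilities below are invented purely for illustration):

```python
# Hypothetical numbers: prior P(spam), and how often the word "offer"
# appears in spam vs. legitimate (ham) email.
p_spam = 0.2
p_word_given_spam = 0.6
p_word_given_ham = 0.05

# Law of total probability for the evidence, then Bayes' theorem.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.12 / 0.16 = 0.75
```

Seeing the word raises the spam probability from the 20% prior to 75%, which is the "updating with evidence" the text describes.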

Practical Applications:

1. Spam Filtering: Classifying emails as spam or not.

2. Medical Diagnosis: Updating disease probabilities based on new symptoms.

3. Weather Prediction: Refining forecasts with incoming data.


Decision Surfaces

Decision Surfaces are boundaries that separate different classes in a feature space. They help in
visualizing and understanding how a model distinguishes between classes.

Characteristics:

 Linear Decision Surface: Formed by linear classifiers like Logistic Regression.

 Non-Linear Decision Surface: Formed by complex models like Neural Networks and SVMs
(with kernels).

Example:

In a 2D space, classifying points as red or blue may result in:

 A straight line (linear decision surface) for simpler models.

 A curved boundary (non-linear decision surface) for complex models.

Practical Applications:

1. Binary Classification: Identifying spam emails (spam vs. not spam).

2. Multi-Class Classification: Categorizing handwritten digits (0–9).

3. Cluster Analysis: Separating data clusters in unsupervised learning.

Unit - 2

Linear Classifiers: A Deep Explanation


A linear classifier tries to draw a straight line (or a plane/hyperplane) to separate data points
belonging to different classes. These classifiers assume that the decision boundary between classes
can be represented as a straight line (in 2D), a plane (in 3D), or a hyperplane (in higher dimensions).
Linear classifiers are widely used in machine learning because of their simplicity, efficiency, and
interpretability.

Types of Linear Classifiers:

1. Logistic Regression

Logistic regression is a fundamental statistical model used for binary classification. It is a linear
classifier that models the probability of a data point belonging to a particular class using the logistic
(sigmoid) function, which outputs probabilities between 0 and 1.
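The prediction step can be sketched in a few lines (the weights and inputs below are arbitrary illustrative numbers, not a trained model):

```python
import math

def sigmoid(z):
    """Squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Linear score w.x + b mapped to a probability by the sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

print(sigmoid(0.0))  # 0.5: a score of zero sits on the decision boundary
print(predict_proba([2.0, 1.0], [1.5, -0.5], -2.5))  # score = 0.0 -> 0.5
```

Training consists of adjusting `w` and `b` (typically by minimizing cross-entropy loss) so these probabilities match the observed labels.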

 Applications:

o Email spam detection

o Medical diagnosis (e.g., predicting if a patient has a particular disease)


2. Support Vector Machine (SVM)

Support Vector Machines (SVMs) are another popular linear classifier that aims to find the
maximum-margin hyperplane that best separates the data into classes.

 How it works:

o The SVM tries to find the hyperplane that maximizes the margin between the two
classes. The margin is the distance between the hyperplane and the nearest data
points from either class (called support vectors).

o The objective is to maximize this margin, as a larger margin implies better
generalization ability.

 Key Features:

o SVMs are used for both linear and non-linear classification tasks. For linear
classification, they directly apply a linear hyperplane; for non-linear classification, the
kernel trick maps the data into higher dimensions where a linear decision
boundary can be found.

 Applications:

o Text classification (e.g., sentiment analysis)

o Handwriting recognition

3. Perceptron

The Perceptron is one of the simplest types of neural networks and can be viewed as a linear
classifier. It is trained by supervised learning.

 Applications:

o Early models of neural networks

o Binary classification tasks (e.g., determining whether an email is spam or not)

Hinge Loss:

Hinge loss, also known as max-margin loss, is a loss function primarily used in Support Vector
Machines (SVMs) for classification tasks. Hinge loss is widely used for binary classification, though it
can be adapted to multi-class classification as well.

Understanding the Hinge Loss Formula:

For a true label y ∈ {−1, +1} and raw classifier score f(x), the hinge loss is
L(y, f(x)) = max(0, 1 − y · f(x)).

For a classifier to be "correct," the following conditions should hold:

1. Correct Classification: The predicted class label should be the same as the true label.

2. Margin: The classifier’s decision boundary should be sufficiently far from the data points of
both classes.

The hinge loss works by penalizing points that are on the wrong side of the margin or are too close to
the decision boundary. The max(0, .) part ensures that the loss is zero when the classifier is correct
and confidently far from the decision boundary.
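The behavior described above can be sketched directly (labels are assumed to be in {−1, +1}, with a margin of 1 as in the standard formulation):

```python
def hinge_loss(y, score):
    """y is the true label in {-1, +1}; score = w.x + b is the raw output."""
    return max(0.0, 1.0 - y * score)

print(hinge_loss(+1, 2.5))   # 0.0: correct and beyond the margin
print(hinge_loss(+1, 0.4))   # 0.6: correct but too close to the boundary
print(hinge_loss(+1, -1.0))  # 2.0: misclassified, penalized heavily
```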

Unit – 6

Recent trends in deep learning architectures:

1. Transformer Architectures: Used in NLP and vision tasks (e.g., BERT, GPT, ViT); they use self-
attention mechanisms to focus on important parts of the input.

2. Self-Supervised Learning (SSL): Learning representations without labeled data (e.g., SimCLR,
BYOL).

3. Graph Neural Networks (GNNs): Applied to graph-structured data (e.g., node classification,
recommendation systems).

4. Neural Architecture Search (NAS): Automates the design of neural networks; instead of
manually tuning architectures, it uses search algorithms to find an optimal model.

5. Federated Learning: Federated learning allows training a model on multiple devices without
sharing raw data, sending only updates to improve the model while maintaining privacy.

6. Neural Radiance Fields (NeRF): NeRF is a neural network-based method for generating 3D
scenes from 2D images.

7. Few-Shot and Zero-Shot Learning: Learning with minimal or no labeled data.

8. Energy-Efficient Deep Learning: Focuses on reducing computational cost (e.g., EfficientNet,
TinyML).

9. Generative Models and Diffusion Models: Generative and diffusion models create high-
quality content, with diffusion models often outperforming GANs.

10. Multimodal Learning: Combining different types of data (e.g., CLIP, DALL·E).

Residual Networks (ResNet)

Residual Networks, or ResNet, is a deep learning architecture introduced in 2015 to
overcome the challenges of training very deep neural networks, particularly the
vanishing/exploding gradient problem.

Problem:

Vanishing Gradient

When gradients become too small, earlier layers in the network stop learning, making it hard
for the model to improve.

Exploding Gradient

When gradients become too large, the training process becomes unstable and the model
doesn’t work properly.

Why Residual Networks?

 Traditional deep networks (e.g., VGG) show increased training and testing errors as
their depth increases beyond a certain point.
 ResNet solves this issue by introducing Residual Blocks that use skip connections to
bypass some layers, ensuring better gradient flow and learning.

Residual Block

 The input x goes through a series of convolution (conv) and ReLU activation,
which gives us F(x).
 Then, the result F(x) is added to the original input x. We call this H(x) = F(x) + x.
 In regular CNNs, H(x) would just be F(x) (no addition to the input).
Now, instead of learning the full mapping H(x), ResNet forces the network to learn the
residual function F(x):

F(x) = H(x) − x ⟹ H(x) = F(x) + x
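The identity H(x) = F(x) + x can be sketched with plain matrix multiplies standing in for the convolutions (an assumption made purely to keep the example short; the weights below are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """H(x) = F(x) + x, with F a small two-layer transform.
    The skip connection is the `+ x` term."""
    f_x = W2 @ relu(W1 @ x)
    return f_x + x          # skip connection

x = np.array([1.0, -2.0, 3.0])
zeros = np.zeros((3, 3))
# With F == 0 the block is exactly the identity mapping, so extra
# layers can never make the network worse than a shallower one.
print(residual_block(x, zeros, zeros))  # [ 1. -2.  3.]
```

This is why residual learning helps: the network only has to learn the *difference* from the identity, and gradients always have a direct path through the skip connection.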

Skip Connection:

o A skip connection (or shortcut) directly connects the input of a residual block
to its output, bypassing one or more layers.
o This bypass allows the gradient to flow directly through the skip connection
during backpropagation, mitigating the vanishing gradient problem.
ResNet Architecture

The ResNet architecture is inspired by earlier networks like VGG-19, but it includes shortcut
connections that transform it into a residual network.

Diagram of a Residual Block:

Input ---> [Conv Layer] ---> [Batch Norm] ---> [ReLU] ---> [Conv Layer] ---> [Batch Norm] ---> (+) ---> Output
  \_______________________________ Skip Connection _______________________________/

ResNet
 ResNet is built by stacking residual blocks.
 Each residual block has two 3x3 convolution layers.
 As we go deeper, the network:
o Doubles the number of filters to capture more details.
o Reduces image size using stride 2 (shrinks height and width).
 There’s an extra convolution layer at the start of the network.
 No fully connected (FC) layers except the last one, which gives the final output.

ResNet Versions
 Comes in depths like 34, 50, 101, or 152 layers for ImageNet tasks.
 For deeper networks (ResNet-50 and beyond), it uses a bottleneck layer for
efficiency (similar to GoogLeNet).

Advantages of Residual Networks

1. Deep Learning without Degradation: Supports training of very deep networks
(100+ layers).
2. Reduced Overfitting: Shortcut connections regularize the network.
3. Improved Accuracy: Better performance on tasks like image classification
(ImageNet), object detection, etc.
4. Scalability: Can handle very large datasets and complex models.

Skip Connection Network

What is Skip Connection?

In a traditional neural network, each layer is connected to the next one, with the output from
one layer serving as the input to the next. However, this deep architecture can cause problems
like the vanishing gradient problem and difficulty in training very deep networks.
Skip connections are a technique where outputs from earlier layers are passed directly to later
layers, bypassing intermediate layers. These connections help improve gradient flow and
reduce the vanishing gradient problem.

Variants of Skip Connections

1. Identity Skip Connections: This is the simplest form where the input X is directly
added to the output of the convolutional layers F(X) without any change.
2. Projection Shortcut: If the dimensions of the input and output don't match a
projection (such as a 1x1 convolution) is used to match the dimensions before adding
the skip connection.
3. Bottleneck Architecture: In very deep networks, a "bottleneck" structure is often
used where a 1x1 convolution is applied to reduce the number of features, followed
by a 3x3 convolution, and then another 1x1 convolution to restore the original number
of features.

Applications of Skip Connections

1. ResNet: Classification, object detection, and segmentation.


2. U-Net: Medical image segmentation.
3. DenseNet: Feature reuse through dense connections.

Diagram of a Skip Connection in ResNet:

Input ---> [Conv Layer] ---> [Conv Layer] ---> (+) ---> Output
  \______________ Skip Connection ______________/

Advantages of Skip Connections

1. Deeper Networks: Without skip connections, training a very deep network (e.g.,
100+ layers) is often impractical due to problems with vanishing gradients. Skip
connections allow the network to be trained effectively even with hundreds of layers.
2. Better Gradient Flow: The addition of the skip connection makes the network more
stable during backpropagation, as gradients can flow directly through the skip
connection.
3. Improved Performance: Skip connections help the network generalize better and
often lead to improved performance on tasks like image classification and
segmentation.

Fully Connected CNN:

A Fully Connected Convolutional Neural Network (CNN) is a type of neural network that combines
the principles of both Convolutional Neural Networks (CNNs) and Fully Connected (FC) layers. This
architecture is typically used for tasks like image classification, object detection, and segmentation.
Here’s a detailed breakdown of how it works:
1. Convolutional Layers (Feature Extraction)

 Convolution is the first step in a CNN where filters (kernels) are applied to the input image.
These filters slide across the image to detect patterns like edges, textures, or colors.

 The Convolutional Layer is responsible for extracting features like lines, shapes, or complex
structures by performing the convolution operation.

 It’s important to note that filter weights are shared across the entire image, making the
process more efficient than fully connected networks that learn separate weights for each
pixel.

2. Activation Functions (Non-Linearity)

 After applying convolution, the output is passed through an activation function, such as
ReLU (Rectified Linear Unit), to introduce non-linearity.

 The ReLU function is often used because it’s computationally efficient and helps the model
learn complex patterns.

3. Pooling (Dimensionality Reduction)

 Pooling layers are used to reduce the dimensionality of the data after each convolution,
which reduces the number of parameters and computation.

 The most common pooling method is Max Pooling, which selects the maximum value from a
patch of the image (usually 2x2 or 3x3).

 Pooling helps retain the most important features while discarding irrelevant details.
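Max pooling can be sketched in a few lines of NumPy (non-overlapping 2x2 windows, with height and width assumed even; the feature-map values are made up):

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling on a 2-D feature map."""
    h, w = x.shape
    # Reshape so each 2x2 patch occupies its own pair of axes,
    # then take the maximum over those axes.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 0, 5, 6],
                 [1, 2, 7, 8]], dtype=float)
print(max_pool_2x2(fmap))
# [[4. 2.]
#  [2. 8.]]
```

The 4x4 map shrinks to 2x2 while each output cell keeps the strongest activation from its patch.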

4. Fully Connected Layers (Dense Layers)

 After several convolutional and pooling layers, the data is flattened into a one-dimensional
vector. This step is necessary because the Fully Connected (FC) Layers require input in this
form.

 The fully connected layers are traditional dense layers found in any neural network. These
layers connect every neuron to every other neuron in the next layer.

 Each neuron in the fully connected layer computes a weighted sum of inputs and applies an
activation function (like ReLU or Softmax) to produce an output.
5. Final Layer and Output

 The output layer typically has a number of neurons equal to the number of classes in a
classification problem. For binary classification, one output neuron with a sigmoid activation
might be used. For multi-class classification, a Softmax activation function is common, which
converts the output into probabilities.

 In the case of regression tasks, the output layer might contain just one neuron with a linear
activation function.
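The Softmax step can be sketched as follows (the class scores are arbitrary illustrative numbers):

```python
import numpy as np

def softmax(z):
    """Converts raw scores into probabilities that sum to 1.
    Subtracting the max is the usual numerical-stability trick."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # one raw score per class
probs = softmax(scores)
print(probs, probs.sum())            # highest score -> highest probability
```

The predicted class is simply the index of the largest probability.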

Advantages of Fully Connected CNNs:

1. Feature Learning: Automatically extracts features, reducing manual effort.

2. Hierarchical Representation: Learns patterns progressively from simple to complex.

3. Fewer Parameters: Weight sharing in convolution layers reduces computational costs.

Disadvantages:

1. Overfitting: Too many layers or parameters can lead to overfitting.

2. High Training Time: Requires significant computational resources for large datasets.

Here’s a detailed comparison of Fully Connected CNN and CNN in a table format:

Aspect: Purpose
  CNN: Used for feature extraction and classification tasks, often in image recognition.
  FCCNN: Combines feature extraction with dense (fully connected) layers to refine decision-making.

Aspect: Architecture
  CNN: Composed of convolutional layers, pooling layers, and optionally fully connected layers at the end.
  FCCNN: Similar to a CNN but with a greater emphasis on fully connected layers for richer decision-making.

Aspect: Fully Connected Layers
  CNN: May or may not include fully connected layers, depending on the task.
  FCCNN: Always includes fully connected layers after the convolutional layers.

Aspect: Spatial Information
  CNN: May lose spatial relationships when flattening data for fully connected layers.
  FCCNN: Preserves spatial features to some extent but still flattens data before the dense layers.

Aspect: Input Size
  CNN: Fixed-size inputs, especially for classification tasks.
  FCCNN: Fixed-size inputs, similar to traditional CNNs.

Aspect: Output Type
  CNN: Single class label, probability vector, or feature representation.
  FCCNN: Single class label or output based on fully connected layer decisions.

Aspect: Common Applications
  CNN: Image classification, object detection, and feature extraction.
  FCCNN: Classification tasks requiring rich feature representation, or regression tasks.

Aspect: Advantages
  CNN: Efficient for tasks requiring hierarchical feature extraction.
  FCCNN: Combines the strengths of convolutional layers and fully connected layers for richer outputs.

Aspect: Disadvantages
  CNN: May discard spatial information when flattening for fully connected layers.
  FCCNN: More computationally expensive due to additional dense layers.

Unit – 5

Convolutional Neural Networks (CNNs) Simplified

A Convolutional Neural Network (CNN) is a special type of deep learning model designed
primarily for analyzing images. CNNs are used in tasks like image recognition, object
detection, and segmentation. They are also important in other fields like autonomous driving,
security systems, and even medical imaging.

1. What is a Convolutional Neural Network (CNN)?

CNNs are a form of neural network, but they are unique because they can automatically learn
features from raw image data. Unlike traditional machine learning models that need manual
feature extraction (like identifying edges, shapes, etc.), CNNs can do this automatically,
saving time and improving performance.

They are particularly useful for tasks that involve visual data like recognizing objects in
images. For example:

 In self-driving cars, CNNs help the car "see" and recognize objects like pedestrians,
other vehicles, and traffic signs.
 In security cameras, CNNs can identify unusual activity or people based on the
patterns they recognize.
2. Inspiration Behind CNN and How They Mimic the Human Brain

CNNs are inspired by the human visual system, specifically the way our brain processes
images. Just like how our brain looks for simple features like lines and curves and combines
them to recognize more complex objects, CNNs do the same through their layers.

 Hierarchical Structure: The human visual system works in layers—first identifying


simple features like edges and then combining them to recognize complex shapes.
CNNs work similarly by using layers to extract increasingly complex features.
 Local Connectivity: Just as we focus on parts of an image, CNNs only focus on small
parts at a time using filters (also called kernels).
 Translation Invariance: The brain can recognize an object even if it’s moved or
rotated. CNNs do the same through a pooling process that makes the network less
sensitive to the position of features in the image.

3. Key Components of a CNN

A CNN consists of four key components:

1. Convolutional Layers:
o These layers apply filters (small grids of numbers) to the input image, looking
for specific features like edges, shapes, or textures.
o For example, in recognizing a digit like '5', one filter might look for straight
lines, another for curves, etc.
o This process helps the network understand patterns in the image.
2. ReLU Activation Function:
o After convolution, the network uses an activation function called ReLU
(Rectified Linear Unit) to add non-linearity. This helps the network learn
more complex patterns. It also speeds up learning by avoiding problems like
the vanishing gradient (a problem where learning slows down significantly).
3. Pooling Layers:
o Pooling helps reduce the size of the image (feature map), making the network
more efficient and faster.
o Max pooling is commonly used, where the highest value in a grid of pixels is
selected to represent that part of the image. This step reduces the amount of
data the network needs to process.
4. Fully Connected Layers:
o After pooling, the data is flattened (converted into a one-dimensional vector)
and passed through fully connected layers. These layers make the final
prediction, such as classifying the image into a category (e.g., a '5' in digit
recognition).
o The final layer often uses a Softmax function to predict the probabilities of
each class.

4. Overfitting in CNNs

Overfitting happens when a model learns the training data too well, including noise or
random patterns that do not generalize to new data. This results in poor performance on
new, unseen data.

5. Practical Applications of CNNs


 Image Classification: CNNs classify images into categories. For example,
recognizing if an image contains a dog or a cat.
 Object Detection: CNNs identify multiple objects in an image and locate them. For
example, in autonomous driving, CNNs help detect pedestrians or other cars.
 Facial Recognition: CNNs are used in security systems for recognizing faces and
controlling access.
 Medical Imaging: Detecting diseases from X-rays, MRIs, or CT scans.

Invariance in CNNs

Invariance refers to the property of a model that allows it to recognize an object or pattern in an
image regardless of certain transformations, such as translation (movement), rotation, or scaling. In
other words, if an object in an image is shifted, rotated, or scaled, the CNN should still be able to
identify it correctly.

Types of Invariance in CNNs:

 Translation Invariance:

o Translation invariance means that if an object is shifted (translated) in the image, the
CNN can still detect it. For example, if a car is located at different positions in an
image, the CNN should still be able to identify it as a car, no matter where it appears
in the image.

 Rotation Invariance:

o Rotation invariance means that the CNN can recognize an object even if it is rotated
at different angles.

 Scale Invariance:

o Scale invariance refers to the ability of the CNN to detect objects regardless of their
size (whether the object is small or large in the image).

 Color Invariance:

o This refers to the network's ability to recognize objects regardless of changes in color
(e.g., a red car vs. a blue car).

Stability in CNNs

Stability in CNNs refers to how well the network can handle small changes or
disturbances in the input data. A stable model gives similar outputs when the
inputs are almost the same, even with slight changes or noise. This means the
network is robust and not easily affected by small alterations in the data.

Variability Models in CNNs


Variability models are used to represent and handle the differences (variations) in the data
that can occur due to various factors, such as changes in shape, noise, and randomness. Two
common types of variability models are the Deformation Model and the Stochastic Model.
These models help the network learn to be more flexible and recognize objects under
different conditions.
1. Deformation Model
The Deformation Model focuses on handling changes in the shape or structure of an object
due to transformations like rotation, scaling, and shifting. In this model, the main goal is to
understand how objects can deform (change their shape) while still being recognized as the
same object.
Key points of the Deformation Model:
1. Shape Variability: Objects can look different based on their position, but the model
still recognizes them.
Example: A car looks different from the front or side, but it’s still recognized as a car.
2. Geometric Transformation: The model recognizes objects even if they stretch,
shrink, or bend.
Example: A face can be recognized from any angle.
3. Application: Helps recognize objects, faces, or handwriting in various poses or
expressions.
Example: A smiling or frowning person is still recognized in facial recognition.
2. Stochastic Model
Stochastic models are powerful tools in machine learning, particularly when dealing
with uncertainty, randomness, and noise in data. These models introduce randomness
into the process, helping the model adapt to real-world situations where data can be
unpredictable.
Key Concepts of Stochastic Models
1. Randomness: Stochastic models use random elements to handle
uncertainties like noise or data errors.
2. Noise Handling: These models are designed to deal with noise (random variations,
sensor errors), making them more robust and preventing overfitting to specific details
in the data.
3. Probabilistic Learning: These models estimate probabilities instead of fixed outputs,
helping handle uncertainty in predictions.

Scattering Networks in CNN (Convolutional Neural Networks)


A scattering network is a type of deep learning model that is designed to capture
hierarchical patterns in data, such as images, while requiring minimal training. It is
commonly used as an initial step before moving to more complex and advanced
Convolutional Neural Networks (CNNs).
Basic Concept:
 Scattering refers to the process of spreading or distributing information across
various layers of a neural network.
 In a scattering network, the input data (such as an image) is passed through multiple
layers. Each layer applies wavelet transforms and non-linear operations to the data.
The main goal of this process is to preserve important features, such as edges and
textures, while simultaneously reducing unnecessary details or noise in the data.
Working of Scattering Networks in CNNs:
 Input Data:
o Scattering networks take in raw input data, such as an image or a signal.
 Wavelet Transform:
o The network applies a series of wavelet transforms to the input. These
transforms break down the data into different frequency bands (high, low, and
intermediate frequencies).
o This helps capture important details at different scales (like edges, textures).
 Non-linear Activations:
o After the wavelet transform, non-linear activations like ReLU are applied to
introduce non-linearity to the model, which allows the network to capture
complex relationships in the data.
 Pooling:
o Pooling layers are applied after each scattering operation to downsample the
data, reducing the size while retaining important features.
 Scattering Layers:
o The process is repeated in multiple layers. Each layer captures progressively
abstract features from the input, such as edges, textures, and shapes.
o The transformed image is then passed through multiple layers where different
operations are applied, capturing patterns from simple to more complex.
o Each layer focuses on different frequency patterns, so the network can
understand both large and fine details in the image.
 Output:
o Finally, the network produces an output that can be used for classification,
regression, or other tasks depending on the application.
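The filter → modulus → pooling pipeline above can be sketched in NumPy. This is a toy illustration, not a real scattering implementation: simple Haar-like difference filters stand in for a proper wavelet bank.

```python
import numpy as np

def conv2d(img, kern):
    """Naive 'valid' 2-D filtering (correlation; sufficient for this sketch)."""
    kh, kw = kern.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kern)
    return out

def avg_pool(x, k=2):
    """Local averaging: the downsampling step that discards fine detail."""
    h, w = x.shape
    return x[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

# Two Haar-like filters (horizontal/vertical differences) standing in
# for a real wavelet bank.
filters = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]

img = np.random.default_rng(0).random((8, 8))

# Zeroth-order coefficient: just the locally averaged image.
s0 = avg_pool(img)

# First-order coefficients: filter -> modulus (non-linearity) -> pooling.
s1 = [avg_pool(np.abs(conv2d(img, f))) for f in filters]

print(s0.shape, [c.shape for c in s1])
```

A real scattering network repeats this cascade over several layers with a full wavelet family; libraries such as Kymatio provide tested implementations.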

Group Formalism
Group formalism is a mathematical framework for understanding the symmetries and invariances of
data. It helps design neural networks that stay consistent under changes such as rotations or
translations.

Group: A set of elements (e.g., transformations like rotations, translations) with a defined operation
(e.g., composition) that satisfies certain properties (closure, associativity, identity, inverse).
Equivariance: When the input changes in a certain way (like shifting an image), the output changes in
the same predictable way (the output shifts too).
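Convolution is the standard example of translation equivariance. A small NumPy check (circular convolution, illustrative names) confirms that shifting the input and then convolving gives the same result as convolving and then shifting:

```python
import numpy as np

def circ_conv(x, kern):
    """Circular 1-D filtering: the prototypical translation-equivariant map."""
    n = len(x)
    return np.array([sum(x[(i + j) % n] * kern[j] for j in range(len(kern)))
                     for i in range(n)])

x = np.array([1.0, 2.0, 3.0, 4.0, 0.0])
kern = np.array([0.5, 0.5])

shifted_then_conv = circ_conv(np.roll(x, 1), kern)
conv_then_shifted = np.roll(circ_conv(x, kern), 1)

# Equivariance: shifting the input shifts the output in the same way.
print(np.allclose(shifted_then_conv, conv_then_shifted))
```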

Benefits of Using Group Formalism:

1. Better Generalization
Neural networks using group formalism are better at handling unseen data because
they understand symmetries in the input.
2. Fewer Data Requirements
Since the model already knows the transformations, it doesn’t need as much training
data to learn them.
3. Improved Interpretability
Group formalism helps reveal patterns in data and the structure of what the model
learns.

Unit – 4

Autoencoders
 Autoencoders are a type of unsupervised learning where neural networks are used for
representation learning.
 They create a "bottleneck" in the network that forces the data to be compressed,
capturing its most essential features.

 Data should have a high degree of correlation or structure.


 If the input data features are independent or uncorrelated, compression and
reconstruction become difficult.
 Be sensitive enough to the inputs to reconstruct them accurately.
 Avoid being so sensitive that the model overfits or simply memorizes the data.

Structure

1. Input Layer: Takes the input data.


2. Bottleneck (Hidden Layer): A compressed representation of the input data.
3. Output Layer: Reconstructs the input data as closely as possible.

 Encoder: Compresses the input into a smaller latent space representation:h = f(x).
 Decoder: Reconstructs the input from the latent space: r = g(f(x)).

Loss Function

 Measures the difference between the input x and the reconstructed output.
 Example: Mean Squared Error (MSE): L(x, r) = ||x - r||^2, where r = g(f(x)) is the reconstruction, averaged over the training samples.
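Putting the encoder, decoder, and MSE loss together, a minimal linear autoencoder can be trained by plain gradient descent. This is an illustrative NumPy sketch, not a recipe from the notes; the toy data are generated to lie near a 2-D subspace so a 2-unit bottleneck can reconstruct them well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data with structure: 4-D points lying in a 2-D subspace.
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 4))

# Encoder h = x W1, decoder r = h W2: a linear autoencoder
# with a 2-unit bottleneck.
W1 = rng.normal(scale=0.1, size=(4, 2))
W2 = rng.normal(scale=0.1, size=(2, 4))

lr = 0.01
for _ in range(500):
    H = X @ W1              # encode: h = f(x)
    R = H @ W2              # decode: r = g(f(x))
    E = R - X               # reconstruction error
    # Gradients of the mean-squared reconstruction loss.
    gW2 = H.T @ E / len(X)
    gW1 = X.T @ (E @ W2.T) / len(X)
    W1 -= lr * gW1
    W2 -= lr * gW2

mse = np.mean((X @ W1 @ W2 - X) ** 2)
print(mse < np.mean(X ** 2))  # better than predicting all zeros
```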
Properties of Autoencoders:

1. Data-Specific: Work well only on data like their training set (e.g., trained on cat
images, not tree images).
2. Lossy Compression: Reconstructed outputs may lose some quality, like MP3 or
JPEG.
3. Learned Representation: Automatically learn how to compress and reconstruct data.

Comparison with PCA:

 PCA simplifies data using linear transformations.


 Autoencoders can act like PCA with a linear decoder and MSE but handle complex
patterns using nonlinear activations.

Advantages of Autoencoders over PCA:

 Can learn nonlinear relationships in data.


 Can include advanced techniques like convolutional layers and transfer learning.

Applications of Autoencoders

1. Denoising:
o Train to remove noise from data. Example: Clean noisy images.
2. Image Colorization:
o Convert black-and-white images into colored ones.
3. Watermark Removal:
o Remove watermarks from images or videos.
4. Data Compression:
o Compress data efficiently for specific types, like images or audio.

Types of Autoencoders

1. Standard Autoencoder

Overview:

 A Standard Autoencoder is the simplest form of an autoencoder.


 It learns a compressed representation (encoding) of the input data by minimizing
reconstruction error.
 The encoder compresses the input, and the decoder reconstructs it.

Limitations:

 May overfit if the latent space is too large (memorizes data instead of generalizing).
 Struggles with noise in the input data, leading to poor performance on real-world data.

2. Denoising Autoencoder

 A Denoising Autoencoder (DAE) is designed to handle noisy data.


 It trains on corrupted inputs but learns to reconstruct the clean version of the input.
 Introduced to improve robustness and force the network to learn better features.

Benefits:

 Forces the autoencoder to focus on robust, generalizable features.


 Improves performance on noisy, real-world data.
 Useful for tasks like image denoising or recovering missing information.
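The training setup can be sketched as follows; the essential detail is that the loss compares the reconstruction of the corrupted input against the clean original (names and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

clean = rng.random((100, 16))                 # training targets: clean data
noise = rng.normal(0.0, 0.2, size=clean.shape)
corrupted = clean + noise                     # training inputs: corrupted copies

def denoising_loss(reconstruction, clean_target):
    """MSE against the CLEAN target, not the corrupted input."""
    return np.mean((reconstruction - clean_target) ** 2)

# Simply copying the corrupted input is penalized for keeping the noise,
# so the network is pushed to learn features that remove it.
print(denoising_loss(corrupted, clean) > 0)
```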

3. Contractive Autoencoder (CAE)

 Adds a penalty to ensure learned representations are robust to small changes in input
data.
 Focuses on stability by reducing sensitivity to minor input variations.

Benefits:

 Produces a stable latent space.


 Helps extract robust features useful for tasks like clustering or semi-supervised
learning.

Regularization Techniques to Prevent Overfitting

1. Bottleneck Layer:
o Reduces the network's capacity by limiting the latent space.
2. Denoising:
o Trains the model to reconstruct clean data from noisy input.
3. Contractive Penalty:
o Penalizes large changes in activations for small changes in input.

Advanced Autoencoder Techniques

1. Denoising vs. Contractive Autoencoders:


o Denoising:
 Simple to implement.
 Adds noise to input data during training to improve robustness.
o Contractive:
 Uses deterministic gradients for stability.
 Requires calculating the Jacobian matrix of the hidden layer, making it
computationally intensive.
2. Regularization Methods:
o L1 Regularization: Adds a penalty for large activations, encouraging
sparsity.
o KL Divergence: Penalizes the difference between actual and desired
distributions, often used in variational autoencoders.
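Both penalties can be written in a few lines of NumPy. The KL form below is the sparse-autoencoder version, which compares a desired average activation rho with the observed one (all names and constants are illustrative):

```python
import numpy as np

def l1_penalty(h, lam=1e-3):
    """L1 sparsity penalty on hidden activations: pushes many toward zero."""
    return lam * np.sum(np.abs(h))

def kl_sparsity(h, rho=0.05, eps=1e-8):
    """KL divergence between a desired average activation rho and the
    actual average activation of each hidden unit."""
    rho_hat = np.clip(h.mean(axis=0), eps, 1 - eps)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

# Sigmoid-like activations in (0, 1) for a batch of 32 samples, 10 units.
h = np.random.default_rng(0).random((32, 10))
print(l1_penalty(h) > 0, kl_sparsity(h) > 0)
```

Both terms are simply added to the reconstruction loss during training; the KL penalty vanishes when the average activations exactly match rho.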

Comparison Table

Feature | Standard Autoencoder | Denoising Autoencoder | Contractive Autoencoder
Input Data | Clean | Noisy | Clean
Noise Handling | Poor | Excellent | Not specifically for noise
Regularization | None | Implicit via noise | Explicit (contractive penalty)
Focus | Reconstruction | Robust feature extraction | Stability in encoding
Applications | Compression, Feature Extraction | Denoising, Robust Features | Clustering, Semi-Supervised Learning

Variational Autoencoders (VAEs)

 Variational Autoencoders (VAEs) are a type of generative model that combines the concepts
of probabilistic modeling and neural networks.

 They extend the standard autoencoder by introducing a probabilistic approach to latent


space representation.

 VAEs are used to generate new data samples similar to the training data, making them
widely applicable in tasks like image generation, anomaly detection, and drug discovery.

Key Differences from Standard Autoencoders

1. Standard Autoencoders aim to learn a deterministic mapping of input to a latent space and
back.

2. VAEs instead learn a probabilistic mapping, treating the latent space as a probability
distribution, enabling:

o Sampling from the latent space.

o Generating diverse outputs.

Structure of a VAE

1. Encoder:

o Maps input to a latent space by generating mean (μ) and variance (σ^2).

2. Latent Space:

o A probabilistic representation of the input data.

o Latent vector is sampled using the reparameterization trick.

3. Decoder:
o Reconstructs the input from the sampled latent vector .
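The structure above, including the reparameterization trick, can be sketched in NumPy (the mean and log-variance values are made up for illustration): z = mu + sigma * eps keeps the sampling step differentiable, and the KL term measures how far the latent distribution is from the standard normal prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Encoder outputs for one input: mean and log-variance of the latent
# distribution (illustrative values).
mu = np.array([0.5, -1.0])
log_var = np.array([0.0, 0.2])

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
# which keeps sampling differentiable with respect to mu and log_var.
eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# KL divergence between N(mu, sigma^2) and the standard normal prior N(0, I).
kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))
print(z.shape, kl > 0)
```

During training this KL term is added to the reconstruction loss, giving the combined VAE objective.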

Advantages of VAEs

1. Generative Capability:

o VAEs can generate new data by sampling from the latent space.

2. Smooth Latent Space:

o Similar inputs map to nearby points in the latent space, making interpolation and
exploration possible.

3. Probabilistic Framework:

o Provides a well-defined probabilistic interpretation of the latent space.

Limitations of VAEs

1. Blurred Outputs:

o Generated outputs may be less sharp compared to other generative models like
GANs.

2. KL Divergence Trade-off:

o Balancing reconstruction loss and KL divergence can be challenging.

3. Computational Complexity:

o Training VAEs can be computationally expensive due to the probabilistic framework.

Comparison with Regular Autoencoders


Feature | Autoencoder | Variational Autoencoder
Output | Deterministic | Probabilistic
Latent Space | Fixed representation | Probabilistic distribution
Generative | No | Yes
Loss Function | Reconstruction loss | Reconstruction + KL Divergence
Applications | Compression, denoising | Generative tasks

Generative Adversarial Networks (GANs)

 Generative Adversarial Networks (GANs) are a type of generative model designed to create
new data samples that resemble the training data.

 They consist of two neural networks: the Generator and the Discriminator, which are trained
simultaneously in an adversarial manner.

Key Components
1. Generator (G):

o Purpose: To generate data that looks similar to the training data.

o Input: Random noise vector (e.g., sampled from a Gaussian or uniform distribution).

o Output: A synthetic data sample .

2. Discriminator (D):

o Purpose: To distinguish between real data samples (from the dataset) and fake data
samples (produced by the generator).

o Input: Either a real data sample or a generated sample .

o Output: A probability indicating whether the input is real or fake.

3. Adversarial Training:

o The generator and discriminator compete in a zero-sum game:

 The Generator tries to create data that the Discriminator cannot distinguish
from real data.

 The Discriminator tries to correctly identify real vs. fake data.
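The zero-sum game can be made concrete with the usual binary cross-entropy losses. The discriminator outputs below are made-up numbers for illustration:

```python
import numpy as np

def bce(p, label):
    """Binary cross-entropy for a batch of probabilities against a 0/1 label."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -np.mean(label * np.log(p) + (1 - label) * np.log(1 - p))

# Suppose the discriminator outputs these probabilities of 'real':
d_real = np.array([0.9, 0.8, 0.95])   # on real samples
d_fake = np.array([0.1, 0.3, 0.2])    # on generated samples

# Discriminator objective: label real samples 1 and fake samples 0.
d_loss = bce(d_real, 1) + bce(d_fake, 0)

# Generator objective: make the discriminator call its samples real (label 1).
g_loss = bce(d_fake, 1)

# Here the discriminator is winning, so the generator's loss is larger.
print(d_loss < g_loss)
```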

Advantages of GANs

1. High-Quality Outputs:

o Can produce realistic and detailed samples.

2. Flexibility:

o Applicable to a wide range of generative tasks.

3. Unsupervised Learning:

o Can learn without explicit labels, making them useful for unstructured data.

Disadvantages of GANs

1. Training Complexity:

o Requires careful balancing of generator and discriminator.

2. Mode Collapse:

o Limited diversity in generated samples.

3. Sensitivity to Hyperparameters:

o GANs require careful tuning of parameters like learning rate, architecture, and loss
function.

Maximum Entropy Distributions


The principle of maximum entropy helps us choose the most "unbiased" probability distribution
when we only know a few facts (constraints) about a situation. It ensures that we don't assume
anything extra about the data that isn't given. This principle is particularly useful in cases where we
don't have enough information to determine the exact distribution of data.

Entropy measures the uncertainty or randomness of the distribution. The principle of maximum
entropy states that among all possible distributions that satisfy the given constraints, we should
choose the one with the highest entropy.
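A quick numerical check of the principle: among distributions over four outcomes constrained only to sum to 1, the uniform distribution has the highest Shannon entropy, because it assumes nothing beyond the constraint.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return -np.sum(p * np.log2(p))

# Three distributions over 4 outcomes satisfying the only constraint we
# have (probabilities sum to 1). The uniform one assumes the least.
uniform = np.array([0.25, 0.25, 0.25, 0.25])
skewed = np.array([0.7, 0.1, 0.1, 0.1])
certain = np.array([1.0, 0.0, 0.0, 0.0])

# The uniform distribution attains the maximum, log2(4) = 2 bits.
print(entropy(uniform), entropy(skewed), entropy(certain))
```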

You might also like