
Unit II

Introduction to Deep Learning


Deep Learning (DL) is a subset of Machine Learning (ML) that uses
artificial neural networks (ANNs) to model and solve complex
problems. It is particularly useful when dealing with large datasets
and high-dimensional data like images, audio, and text.
Key Points:
• Inspired by the human brain: DL models are made of layers of
neurons that mimic the brain’s neural network.
• Automatic feature extraction: Unlike traditional ML, DL can
learn features from raw data without manual feature
engineering.
• Applications:
o Image recognition (e.g., detecting cats/dogs in photos)
o Speech recognition (e.g., Siri, Alexa)
o Natural language processing (e.g., ChatGPT, translation)
o Self-driving cars (detecting lanes, obstacles)
Flow of Deep Learning:
Input Data → Neural Network (Layers) → Features automatically
learned → Prediction/Output

Why Deep Learning?


• Can handle complex and unstructured data (images, video,
audio).
• Scales well with large amounts of data.
• Higher accuracy compared to traditional ML for complex tasks.
• Learns hierarchical representations of data (low-level to high-
level features).

Deep Learning Architectures


Deep Learning architectures are different types of neural networks
designed for specific tasks.
A. Feedforward Neural Network (FNN) / Multi-Layer Perceptron
(MLP)
• Structure:
o Input Layer → Hidden Layers → Output Layer
o Neurons in each layer are fully connected to the next
layer.
• How it works: Data flows forward from input to output.
• Use cases: Tabular data, basic classification, regression.
• Limitations: Cannot capture spatial or sequential patterns well
(like images or time series).
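The forward flow described above (Input Layer → Hidden Layers → Output Layer, fully connected) can be sketched in a few lines of NumPy. This is a minimal illustration, not a full framework; the layer sizes, random weights, and function names are invented for the example.

```python
import numpy as np

def relu(x):
    # Element-wise ReLU: max(0, x)
    return np.maximum(0, x)

def mlp_forward(x, weights, biases):
    """Forward pass through a fully connected network.

    Each hidden layer applies activation(x @ W + b); the output
    layer is left linear here (e.g., for regression)."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)                  # hidden layer: linear step + non-linearity
    return a @ weights[-1] + biases[-1]      # output layer (no activation)

# Example: 4 inputs -> 8 hidden neurons -> 3 outputs
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 8)), rng.standard_normal((8, 3))]
biases = [np.zeros(8), np.zeros(3)]

x = rng.standard_normal((2, 4))              # batch of 2 samples
out = mlp_forward(x, weights, biases)
print(out.shape)                             # (2, 3)
```

Note how every neuron in one layer feeds every neuron in the next: that is exactly the matrix multiplication `a @ W`.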

B. Convolutional Neural Network (CNN)


• Specialized for images and spatial data.
• Key layers:
o Convolutional Layer: Extracts features like edges,
textures.
o Pooling Layer: Reduces dimensionality (max pooling,
average pooling).
o Fully Connected Layer: For classification or final output.
• Use cases: Image classification, object detection, face
recognition.
Example Flow:
Image → Conv Layer → ReLU → Pooling → Flatten → Fully Connected
→ Output
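The Conv → ReLU → Pooling part of the flow above can be sketched directly in NumPy. This is a toy illustration with a hypothetical 6×6 "image" and a hand-made 2×2 kernel; real CNNs use many learned kernels and channels.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: keeps the strongest response per window."""
    H, W = x.shape
    H2, W2 = H // size, W // size
    return x[:H2*size, :W2*size].reshape(H2, size, W2, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)    # toy 6x6 "image"
edge_kernel = np.array([[1., -1.], [1., -1.]])      # crude vertical-edge detector

features = np.maximum(0, conv2d(image, edge_kernel))  # Conv layer -> ReLU
pooled = max_pool(features)                           # Pooling layer
print(features.shape, pooled.shape)                   # (5, 5) (2, 2)
```

Pooling shrinks the feature map (here from 5×5 to 2×2), which is exactly the dimensionality reduction mentioned above.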

C. Recurrent Neural Network (RNN)


• Specialized for sequential data (time series, text).
• Characteristic: Has memory — it uses previous information to
influence current output.
• Problems: Vanishing gradient for long sequences.
• Variants:
o LSTM (Long Short-Term Memory) → Solves long-term
dependency issues.
o GRU (Gated Recurrent Unit) → Simpler and faster than
LSTM.
• Use cases: Language modeling, speech recognition, stock
prediction.
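The "memory" of an RNN is just a hidden state carried from one time step to the next. A minimal NumPy sketch of a vanilla RNN forward pass (weights and sizes invented for illustration):

```python
import numpy as np

def rnn_forward(xs, Wx, Wh, b):
    """Vanilla RNN: the hidden state h carries information across steps.

    h_t = tanh(x_t @ Wx + h_{t-1} @ Wh + b)"""
    h = np.zeros(Wh.shape[0])
    history = []
    for x in xs:                          # process the sequence one step at a time
        h = np.tanh(x @ Wx + h @ Wh + b)  # current output depends on previous h
        history.append(h)
    return np.array(history)

rng = np.random.default_rng(1)
seq = rng.standard_normal((5, 3))         # sequence of 5 steps, 3 features each
Wx = rng.standard_normal((3, 4)) * 0.1    # input-to-hidden weights
Wh = rng.standard_normal((4, 4)) * 0.1    # hidden-to-hidden (recurrent) weights
b = np.zeros(4)

hs = rnn_forward(seq, Wx, Wh, b)
print(hs.shape)                           # (5, 4): one hidden state per time step
```

The repeated multiplication by `Wh` over long sequences is what causes the vanishing gradient problem that LSTM and GRU were designed to address.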

D. Autoencoder
• Purpose: Unsupervised learning for dimensionality reduction
or data compression.
• Structure: Input → Encoder → Bottleneck → Decoder →
Reconstructed Output
• Use cases: Image denoising, anomaly detection, feature
learning.
Machine Learning vs Deep Learning:

Feature             | Machine Learning (ML)                          | Deep Learning (DL)
Definition          | Algorithms learn from data to make predictions | Neural networks with many layers learn features automatically
Data Needed         | Small to medium datasets                       | Large datasets
Feature Engineering | Manual (you decide features)                   | Automatic (network finds features)
Best For            | Structured/tabular data                        | Images, videos, audio, text
Speed               | Faster to train                                | Slower, needs GPU
Complexity          | Simple models                                  | Very complex (deep neural networks)
Accuracy            | Good for simple tasks                          | High for complex tasks
Interpretation      | Easy to understand                             | Hard ("black box")

Representation Learning:


Definition:
Representation Learning is a technique in Machine Learning / Deep
Learning where the model automatically discovers useful features
or representations from raw data instead of relying on manually
crafted features.
What it Means
In simple terms, the computer learns to understand the data by itself:
it discovers which features are important instead of being told.
Example:
If you give a lot of cat and dog photos —
• You don’t tell the computer “look at ears or tails.”
• It learns by itself what makes a cat or dog different.
That’s representation learning — learning how to represent data
automatically.

Why It’s Needed


In traditional Machine Learning:
You had to do feature engineering — manually find features like
“color”, “shape”, etc.
In Deep Learning:
The model itself learns features — from simple to complex, layer by
layer.

How It Works (Step by Step)


Imagine an image going through a Deep Neural Network (like CNN):
First layer: learns edges and corners
Next layer: learns shapes like eyes or wheels
Final layers: learn full objects like faces or cars
So, each layer builds better and smarter representations of data.

Types of Representation Learning


Type            | Description                  | Example
Supervised      | Learns from labeled data     | CNN classifying cats/dogs
Unsupervised    | Learns from unlabeled data   | Autoencoder compressing images
Self-Supervised | Learns using its own signals | BERT predicting missing words

Advantages
• No need to manually design features
• Works on images, text, speech
• Learns useful patterns automatically
• Improves accuracy and generalization
• Helps in Transfer Learning (using knowledge from one task on another)

Width vs Depth of Neural Networks (Simple Version)

Width
• Means how many neurons are in a single layer
• Wide network = more neurons in a layer
• Learns more features at once
• Too wide → might overfit
Example:
• 1 hidden layer with 100 neurons → wide
• 1 hidden layer with 10 neurons → narrow

Depth
• Means how many hidden layers the network has
• Deep network = more layers
• Learns complex patterns step by step
• Too deep → hard to train (vanishing gradient)
Example:
• 3 hidden layers → deep
• 1 hidden layer → shallow

Quick Comparison Table

Feature    | Width                        | Depth
What it is | Neurons per layer            | Number of layers
Strength   | Learns many features at once | Learns complex patterns
Weakness   | Can overfit                  | Harder to train
Best For   | Simple data                  | Complex data (images, text)

Memory Tip:
• Width = fat layer (more neurons)
• Depth = tall network (more layers)
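Width and depth trade off differently in parameter count. A short sketch (the layer sizes are just the examples from above; `mlp_param_count` is a helper invented for this illustration):

```python
def mlp_param_count(layer_sizes):
    """Number of trainable parameters (weights + biases) in a fully
    connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Wide: 10 inputs, one hidden layer of 100 neurons, 1 output
wide = mlp_param_count([10, 100, 1])
# Deep: 10 inputs, three hidden layers of 10 neurons each, 1 output
deep = mlp_param_count([10, 10, 10, 10, 1])

print(wide, deep)  # 1201 vs 341
```

The wide network has far more parameters in this case (more capacity, more risk of overfitting), while the deep one stacks smaller transformations that compose into complex patterns.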
Activation Functions in Neural Networks:
Definition:
An activation function decides whether a neuron should be
activated or not, introducing non-linearity into the network. Without
it, neural networks would just be linear models, no matter how many
layers they have.

ReLU (Rectified Linear Unit)


Formula:
f(x) = max(0, x)

Meaning:
• If input > 0 → output = input
• If input ≤ 0 → output = 0
Graph:
• Straight line for x > 0
• Flat at 0 for x ≤ 0
Pros:
• Simple and fast to compute
• Helps avoid vanishing gradient problem in deep networks
• Works well in practice for CNNs and many deep networks
Cons:
• “Dying ReLU” problem: Neurons can get stuck at 0 and stop
learning if inputs are always negative
Leaky ReLU (LReLU)
Formula:
f(x) = x     if x > 0
f(x) = αx    if x ≤ 0

• Typically, α = 0.01
Meaning:
• Positive inputs → output = input
• Negative inputs → output = small negative value (not zero)
Graph:
• Slight slope for negative inputs (instead of flat)
Pros:
• Solves dying ReLU problem
• Allows gradient to flow even for negative inputs
Cons:
• Slightly more complex than ReLU
• α is a hyperparameter that needs tuning

ELU (Exponential Linear Unit)


Formula:
f(x) = x             if x > 0
f(x) = α(e^x − 1)    if x ≤ 0

• α is usually 1.0
Meaning:
• Positive inputs → output = input
• Negative inputs → output = smooth exponential curve
approaching -α
Graph:
• Smooth, continuous curve for negative values
• Linear for positive values
Pros:
• Helps vanishing gradient problem
• Smooth output for negative inputs → better learning
• Can converge faster than ReLU
Cons:
• More computationally expensive than ReLU/LReLU
• Slightly more complex to implement
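All three activation functions follow directly from their formulas. A minimal NumPy sketch for comparison (the test values are arbitrary):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): negatives are clipped to zero
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Negative inputs get a small slope alpha*x instead of a hard zero
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Negative inputs follow a smooth curve alpha*(e^x - 1), approaching -alpha
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # negatives become 0
print(leaky_relu(x))  # negatives become small negative values
print(elu(x))         # negatives follow the exponential curve
```

Notice that all three agree exactly for positive inputs; they differ only in how they treat the negative side, which is precisely the "dying ReLU" issue discussed above.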

Unsupervised Training of Neural Networks

What is Unsupervised Training?


Definition:
Unsupervised training means training a neural network without
labeled data.
• The network tries to find patterns, structures, or
representations in the input data by itself.
• There is no “correct output” provided.
Key Idea:
The network learns relationships, clusters, or features directly from
the data.

Why Use Unsupervised Training?


• Labeled data is expensive or hard to get
• Helps the network discover hidden structures
• Useful for feature learning, clustering, dimensionality
reduction
Applications:
• Autoencoders: Compress and reconstruct data → feature
extraction
• Clustering Networks: Organize similar data points together
• Generative Models (GANs): Learn to generate new data similar
to training data

Step-by-Step Working
Let’s break down how it actually happens
Step 1: Input Data
You give the network raw data (like images, sounds, or text) — but
no labels.
Example:
• Images of cats and dogs are given
• The network does not know which is which

Step 2: Feature Extraction / Encoding


The neural network tries to capture patterns in the data — for
example:
• Which pixels are similar
• What shapes or textures repeat
• What parts of data are common
This is usually done by an Encoder network (in Autoencoders) or
hidden layers that learn compressed information.

Step 3: Representation Learning


The network converts input into a latent representation — a
compact form that captures important features.
Think of it as a “summary” of the input.
Example:
Instead of remembering every pixel of a face image, it learns:
• Shape of face
• Eyes position
• Mouth curve

Step 4: Reconstruction or Similarity Task


The network then tries to recreate the input or find patterns from
the learned representation.
There are 3 main methods:
1. Autoencoders:
o Network encodes → decodes → compares output to input
o Learns by reducing reconstruction error
Loss = ||Input − Output||²

2. Clustering Networks (like SOMs):


o Neurons organize themselves into groups based on similar
inputs
3. Generative Models (like GANs):
o Network learns to generate new data similar to input data

Step 5: Weight Update (Learning Process)


The network still uses backpropagation, but instead of a “label-based
loss,” it uses:
• Reconstruction loss (for autoencoders)
• Distribution loss (for GANs)
• Similarity measure (for clustering)
Weights are updated to minimize these losses → so the network gets
better at representing the input structure.

Example: Autoencoder Working


Here’s how it works practically
1. Input: Image (say, a handwritten digit)
2. Encoder: Compresses image → smaller feature vector
3. Latent space: Stores key patterns of the digit
4. Decoder: Rebuilds the original image from the compressed
version
5. Loss: Difference between input and output (MSE)
6. Backpropagation: Updates weights to minimize this difference
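The six steps above can be sketched end to end with a tiny linear autoencoder in NumPy. This is a deliberately simplified illustration: the data is random, the encoder/decoder are single linear layers without activations, and the gradients are written out by hand (up to a constant factor absorbed into the learning rate).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))     # 200 samples, 8 features (stand-in for pixels)

# Linear autoencoder: 8 inputs -> 3-unit bottleneck -> 8 outputs
W_enc = rng.standard_normal((8, 3)) * 0.1
W_dec = rng.standard_normal((3, 8)) * 0.1
lr = 0.05
losses = []

for epoch in range(1000):
    Z = X @ W_enc                     # Step 2: encoder compresses to latent space
    X_hat = Z @ W_dec                 # Step 4: decoder rebuilds the input
    err = X_hat - X                   # Step 5: reconstruction error
    losses.append(np.mean(err ** 2))  # MSE loss
    # Step 6: gradient descent on both weight matrices
    grad_dec = (Z.T @ err) / len(X)
    grad_enc = (X.T @ (err @ W_dec.T)) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(round(losses[0], 3), round(losses[-1], 3))  # loss shrinks over training
```

The 3-unit bottleneck forces the network to keep only the most useful structure of the 8-dimensional input, which is the "summary" idea described above.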
Real-Life Applications
• Image compression and reconstruction
• Feature extraction for other models
• Anomaly detection
• Clustering and pattern discovery
• Pretraining models before supervised learning

Restricted Boltzmann Machines (RBMs)

What is an RBM?
Definition:
A Restricted Boltzmann Machine (RBM) is a type of unsupervised
neural network that learns patterns in data and can represent
complex probability distributions.
• It’s called “restricted” because connections only exist between
layers, not within a layer.
• It’s a stochastic neural network (neurons have probabilities,
not fixed outputs).
Use: Mainly for feature learning, dimensionality reduction, and
pretraining deep networks.

Structure of RBM
RBM has two layers:
1. Visible Layer (v):
o Represents the input data
o Example: Pixels of an image
2. Hidden Layer (h):
o Learns features or patterns from visible layer
Key Point:
• No connections between neurons within the same layer (this
is why it’s “restricted”)
• All visible neurons connect to all hidden neurons
Diagram (simple view):
Visible Layer: v1 v2 v3 ... vn
⬇ ⬇ ⬇
Hidden Layer: h1 h2 h3 ... hm

Step-by-Step Working
Step 1: Input Data → Visible Layer
• Feed the raw input data into the visible layer
• Example: An image of a handwritten digit

Step 2: Activate Hidden Layer Probabilistically


• Each hidden neuron computes a weighted sum of its inputs plus a bias:

p(h_j = 1 | v) = σ(Σ_i w_ij v_i + b_j)

• σ = sigmoid function → outputs the probability of the neuron being ON
• Hidden neurons turn ON or OFF stochastically based on this probability
Step 3: Reconstruct the Input
• Using the activated hidden neurons, the network reconstructs the visible layer:

p(v_i = 1 | h) = σ(Σ_j w_ij h_j + a_i)

• This gives a reconstructed input v′
• Idea: the network tries to reproduce the input from the hidden representation

Step 4: Compute Reconstruction Error


• Compare original input (v) with reconstructed input (v')
Error = v − v′

• This tells the network how well it has captured the features

Step 5: Update Weights (Learning)


• Weights and biases are updated to minimize reconstruction
error
• Common algorithm: Contrastive Divergence (CD)
o Approximate method → fast and works well
o Updates weights iteratively:
Δw_ij = η(⟨v_i h_j⟩_data − ⟨v_i h_j⟩_reconstruction)
• Repeat steps 1–5 for many epochs until the network learns the
patterns.
Mathematical Idea
• Energy Function: Measures how good a configuration of
neurons is

E(v, h) = − Σ_i a_i v_i − Σ_j b_j h_j − Σ_i Σ_j v_i w_ij h_j

Where:
• v_i = visible neuron
• h_j = hidden neuron
• w_ij = weight between visible and hidden neuron
• a_i, b_j = biases
• The network learns weights (w) that minimize energy → better
feature representation
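Steps 1–5 can be sketched as one-step Contrastive Divergence (CD-1) in NumPy. This is a toy illustration only: the binary data is invented (each sample is [x, x, x, y, y, y], so there is obvious structure for the 3 hidden units to capture), and the reconstruction check uses mean-field probabilities rather than sampled states.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Toy binary data with structure: first 3 visible units equal, last 3 equal
bits = rng.integers(0, 2, size=(40, 2)).astype(float)
v0 = np.repeat(bits, 3, axis=1)        # 40 samples, 6 visible units

n_visible, n_hidden = 6, 3
W = rng.standard_normal((n_visible, n_hidden)) * 0.1
a = np.zeros(n_visible)                # visible biases
b = np.zeros(n_hidden)                 # hidden biases
lr = 0.1

def reconstruct(v):
    ph = sigmoid(v @ W + b)            # p(h=1|v)
    return sigmoid(ph @ W.T + a)       # p(v=1|h), using probabilities directly

err_before = np.mean((v0 - reconstruct(v0)) ** 2)

for _ in range(300):
    # Positive phase: hidden probabilities and stochastic ON/OFF states
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: reconstruct v, then re-infer h (one Gibbs step = CD-1)
    pv1 = sigmoid(h0 @ W.T + a)
    ph1 = sigmoid(pv1 @ W + b)
    # Contrastive Divergence update: <v h>_data - <v h>_reconstruction
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    a += lr * np.mean(v0 - pv1, axis=0)
    b += lr * np.mean(ph0 - ph1, axis=0)

err_after = np.mean((v0 - reconstruct(v0)) ** 2)
print(round(err_before, 3), round(err_after, 3))  # reconstruction error drops
```

Note the "restricted" structure in the code: the only weight matrix `W` connects visible to hidden units; there are no visible-visible or hidden-hidden weights.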

Autoencoders (AEs)

What is an Autoencoder?
Definition:
An Autoencoder is a type of unsupervised neural network that
learns to compress data and then reconstruct it back as accurately as
possible.
• Input → Encoder → Latent space → Decoder → Output
• Goal: Output ≈ Input
Key Idea:
Autoencoders learn a compact representation (features) of the input
data automatically.

Structure of an Autoencoder
1. Input Layer: Raw data
2. Encoder: Compresses input into a smaller latent
representation
3. Latent Space / Bottleneck: Stores the compressed features
4. Decoder: Reconstructs the input from the latent representation
5. Output Layer: Reconstructed data
Diagram (simplified):
Input ---> [Encoder] ---> Latent Space ---> [Decoder] ---> Output

How Autoencoders Work (Step-by-Step)


Step 1: Feed Input
• Raw data (image, text, audio) is fed into the input layer
Step 2: Encode
• Encoder compresses input into latent features
• Reduces dimensionality while preserving important info
Step 3: Decode
• Decoder reconstructs the original input from latent features
Step 4: Calculate Loss
• Compare reconstructed output with original input
• Loss function: Mean Squared Error (MSE) or Binary Cross-
Entropy
Loss = ||Input − Output||²

Step 5: Update Weights


• Backpropagation adjusts weights in encoder + decoder to
minimize reconstruction error
Step 6: Repeat
• Repeat for many epochs → network learns best representation
of input data

Types of Autoencoders
1. Undercomplete AE
2. Sparse AE
3. Denoising AE
4. Variational AE (VAE)
