
Understanding Deep Learning Basics

Deep learning is a subset of artificial intelligence that utilizes neural networks to process information similarly to the human brain, enabling the identification of complex patterns in data. It involves multiple layers of artificial neurons, with processes like forward and backward propagation to train models, and employs various techniques such as activation functions, loss functions, and regularization to optimize performance. Key concepts include the architecture of neural networks, gradient descent methods, and evaluation metrics for assessing model accuracy and generalization.


DEEP LEARNING

Deep learning is a powerful branch of artificial intelligence that mimics the way the human brain
processes information. At its core, deep learning uses structures called neural networks, which are
inspired by the biological neurons in our brain. Just as our brain is made up of billions of
interconnected neurons that fire signals to recognize patterns—like a face or a voice—deep learning
models are made up of layers of artificial neurons that learn to identify patterns in data. These models
are capable of learning directly from raw inputs, gradually transforming them into meaningful
representations through a series of processing layers. Over time, as the model is trained on data, it
strengthens the connections between its artificial neurons, much like how the brain strengthens
synapses when we learn something new. This ability to learn hierarchical features makes deep learning
especially powerful in solving complex problems such as image recognition, language translation, and
speech understanding. The "deep" in deep learning refers to the multiple layers through which data
passes and gets refined, leading to highly accurate outcomes. This biologically inspired design has
allowed machines to learn and adapt in ways that were previously thought to be unique to humans.
1. Neural Network Architecture
 Input Layer:

o Purpose: This is the entry point for data into the network. It does not perform any
computation.

o Structure: It consists of one neuron for each feature in the input dataset. For example, a
dataset with 10 features would have an input layer with 10 neurons. Its role is to pass the
initial data to the first hidden layer.

 Hidden Layers:

o Purpose: These are the computational engines of the network, responsible for learning the
complex patterns in the data. They extract increasingly abstract features from the input as
data flows through them.

o Structure: A network can have one or more hidden layers. The number of layers (depth)
and the number of neurons per layer (width) define the model's capacity. Each neuron in a
hidden layer is connected to all neurons in the previous layer.

 Output Layer:

o Purpose: This layer produces the final prediction of the model.

o Structure: The number of neurons and the activation function in the output layer are
determined by the task. For regression, it's typically one neuron with a linear activation.
For binary classification, one neuron with a sigmoid activation. For multi-class
classification, it's one neuron per class with a softmax activation.

 Role of Weights and Biases:

o Weights: These are learnable parameters that represent the strength of the connection
between neurons. A higher weight means a stronger influence from the input neuron.
During training, the network adjusts these weights to minimize the prediction error.

o Biases: A bias is another learnable parameter associated with each neuron (except in the
input layer). It allows for shifting the activation function to the left or right, providing the
model with more flexibility to fit the data. It essentially provides a trainable constant to the
neuron's input.
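As a concrete sketch of this architecture, the layers above can be represented as plain NumPy arrays. The input size (10 features) follows the text's example; the hidden width of 4 and the single output neuron are illustrative choices, not prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes: 10 input features (as in the text's example); the hidden width (4)
# and the single output neuron are illustrative assumptions.
n_in, n_hidden, n_out = 10, 4, 1

# Each fully connected layer owns a learnable weight matrix and a bias vector.
W1, b1 = rng.normal(size=(n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_out)), np.zeros(n_out)

# Total learnable parameters: 10*4 weights + 4 biases in the hidden layer,
# plus 4*1 weights + 1 bias in the output layer.
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)  # 49
```

Counting parameters this way makes the "width times depth" notion of model capacity tangible.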
2. Forward Propagation
 Conceptual Flow: Forward propagation is the process of passing input data through the
network, layer by layer, to generate an output. It is a sequence of linear transformations followed
by non-linear activations.

 Matrix Operations:

o At each layer, the process starts with a matrix multiplication. The input vector (or the
output from the previous layer) is multiplied by the weight matrix of the current layer. This
operation scales and combines the inputs.

o This dot product effectively computes a weighted sum of the inputs for each neuron in the
current layer.

 Pre-activation and Activation Values:

o Pre-activation (Z): This is the intermediate value calculated for each neuron. It is the
result of the weighted sum of inputs from the previous layer plus the neuron's bias term. It
represents the linear part of the neuron's computation.

o Activation (A): This is the final output of the neuron. It is obtained by applying a non-
linear activation function to the pre-activation value (Z). This non-linearity is crucial, as it
allows the network to learn complex, non-linear relationships in the data. The activation
value (A) is then passed as input to the next layer.
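The flow above can be written out in a few lines of NumPy. The layer sizes and the random input batch are illustrative assumptions; the structure (matrix multiply, add bias, apply non-linearity, pass on) is exactly the pre-activation/activation pattern described:

```python
import numpy as np

def relu(z):
    # Elementwise non-linearity applied to the pre-activation.
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 10))                     # batch of 5 examples, 10 features each

W1, b1 = rng.normal(size=(10, 4)), np.zeros(4)   # hidden layer parameters
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)    # output layer parameters

# Hidden layer: linear step (pre-activation Z1), then non-linearity (activation A1).
Z1 = X @ W1 + b1
A1 = relu(Z1)

# Output layer: the previous layer's activation is this layer's input.
Z2 = A1 @ W2 + b2
A2 = sigmoid(Z2)                                 # outputs in (0, 1) for a binary task

print(A2.shape)  # (5, 1)
```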
3. Activation Functions
 Purpose: To introduce non-linearity into the model. Without them, a neural network would
just be a series of linear transformations, equivalent to a single linear model.

 Common Functions & Use Cases:

o Sigmoid: Compresses any input into a range between 0 and 1. Historically used in hidden
layers but now primarily used in the output layer for binary classification tasks. Its
derivative is small for high or low inputs, leading to the vanishing gradient problem.

o Tanh (Hyperbolic Tangent): Compresses input into a range between -1 and 1. It is
zero-centered, which can help in learning. Like Sigmoid, it suffers from the vanishing
gradient problem in its saturated regions.

o ReLU (Rectified Linear Unit): Outputs the input directly if it is positive, and zero
otherwise. It is the most common activation for hidden layers due to its
computational efficiency and its ability to mitigate the vanishing gradient problem for
positive inputs. Its derivative is either 0 or 1.

o Leaky ReLU: A variant of ReLU that allows a small, non-zero gradient when the unit is
not active (i.e., for negative inputs). This helps to prevent "dying ReLU" neurons, where
neurons get stuck in a state where they always output zero.

o Softmax: Used exclusively in the output layer for multi-class classification. It
converts a vector of raw scores (logits) into a probability distribution, where each value is
between 0 and 1, and the sum of all values is 1.
 Gradient Issues:

o Vanishing Gradients: Occurs when gradients become extremely small as they are
propagated backward through the network. This is common with saturating functions like
Sigmoid and Tanh, effectively halting learning in earlier layers.

o Exploding Gradients: The opposite problem, where gradients become excessively large,
leading to unstable training. This is less common but can occur with certain weight
initializations or architectures. ReLU helps with vanishing gradients but can contribute to
exploding gradients if not managed.
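For concreteness, each of the functions above fits in a line or two of NumPy. This is a minimal sketch; the max-subtraction trick in softmax is a standard numerical-stability detail, not something from the text:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))          # squashes input into (0, 1)

def tanh(z):
    return np.tanh(z)                    # squashes input into (-1, 1), zero-centered

def relu(z):
    return np.maximum(0, z)              # passes positives through, zeroes negatives

def leaky_relu(z, alpha=0.01):
    # Small slope alpha for negative inputs prevents "dying ReLU" neurons.
    return np.where(z > 0, z, alpha * z)

def softmax(z):
    e = np.exp(z - np.max(z))            # subtracting the max avoids overflow
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))                           # [0. 0. 3.]
print(round(softmax(z).sum(), 6))        # 1.0
```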

4. Loss Functions
 Purpose: A loss function (or cost function) quantifies the difference between the model's
predicted output and the actual target values. The goal of training is to minimize this function.

 Regression Loss Functions:

o Mean Squared Error (MSE): Calculates the average of the squared differences between
predicted and actual values. It penalizes large errors more heavily due to the
squaring operation. It is the default choice for many regression problems.

o Mean Absolute Error (MAE): Calculates the average of the absolute differences between
predicted and actual values. It is less sensitive to outliers than MSE and provides a
more direct interpretation of the average error magnitude.

 Classification Loss Functions:

o Binary Cross-Entropy: Used for binary (two-class) classification problems. It
measures the dissimilarity between the predicted probability and the true binary label. It
works best when the output layer has a single sigmoid neuron.

o Categorical Cross-Entropy: Used for multi-class classification problems. It compares
the model's predicted probability distribution (from a softmax output layer) with the true
distribution (which is typically one-hot encoded).
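The three most common losses above can be sketched directly in NumPy. The clipping inside binary cross-entropy is a standard guard against log(0), an implementation detail rather than part of the definition:

```python
import numpy as np

def mse(y_true, y_pred):
    # Squaring penalizes large errors more heavily.
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # Average error magnitude, in the same units as the target.
    return np.mean(np.abs(y_true - y_pred))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.8])
print(round(mse(y_true, y_pred), 4))   # 0.03
print(round(mae(y_true, y_pred), 4))   # 0.1667
```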
5. Backward Propagation
 Core Concept: This is the algorithm used to train the network by calculating the gradients of
the loss function with respect to each weight and bias. It propagates the error signal backward
from the output layer to the input layer.

 Chain Rule and Gradient Flow:

o Backward propagation relies on the chain rule from calculus to compute gradients
efficiently.

o It first calculates the gradient of the loss with respect to the output of the final layer.

o Then, it iteratively moves backward, layer by layer, calculating the gradient of the loss with
respect to each layer's outputs, pre-activations, weights, and biases. The chain rule allows
reusing previously computed gradients, making the process highly efficient.

 Weight Update Process:

o Once the gradients for all weights and biases are computed, they are used to update the
parameters.

o The update rule involves subtracting a fraction of the gradient (determined by the learning
rate) from the current parameter value. This moves the parameter in the direction that
most steeply decreases the loss.
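The chain rule and update step can be seen end-to-end on a single sigmoid neuron, the smallest case where backpropagation applies. The input values, learning rate, and iteration count here are illustrative assumptions; the key line is that for a sigmoid output with binary cross-entropy, the chain rule collapses to dL/dz = a - y:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([1.0, -2.0, 0.5])   # one training example (3 features, illustrative)
y = 1.0                          # its true label
w, b = np.zeros(3), 0.0          # start from zero weights and bias
lr = 0.1                         # learning rate

for _ in range(100):
    # Forward pass: pre-activation, then sigmoid activation.
    a = sigmoid(w @ x + b)
    # Backward pass: for sigmoid + binary cross-entropy, the chain rule
    # collapses to dL/dz = a - y, so dL/dw = (a - y) * x and dL/db = a - y.
    dz = a - y
    w -= lr * dz * x             # step against the gradient
    b -= lr * dz

print(sigmoid(w @ x + b))        # close to 1.0: the loss has been driven down
```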

6. Gradient Descent & Its Variants

Gradient Descent: Gradient descent is an optimization method that lets a deep learning model learn by
repeatedly adjusting its weights and biases in small steps, so that its prediction error shrinks over time.

 Batch Gradient Descent:

o Process: Computes the gradient of the loss function using the entire training dataset for a
single weight update.

o Pros: Produces a stable and direct convergence path.

o Cons: Extremely slow and memory-intensive for large datasets. Not practical for most deep
learning applications.

 Stochastic Gradient Descent (SGD):

o Process: Updates the model's weights after processing each single training example.
o Pros: Much faster computation per update. The noisy updates can help the model escape
shallow local minima.

o Cons: The convergence path is very erratic and noisy. It may never fully converge to the
absolute minimum.

 Mini-Batch Gradient Descent:

o Process: A compromise between the two extremes. It updates the weights after processing
a small batch (e.g., 32, 64, or 128 examples) of training data.

o Pros: Offers the best of both worlds: it's computationally efficient and provides a more
stable convergence than SGD. It's the standard method used in deep learning.

⚙️ How It Works (Step-by-Step):

1. Start with random weights
2. Make a prediction
3. Compare it with the actual value (calculate loss)
4. Compute the gradient (i.e., how much error changes with respect to weights)
5. Update the weights in the direction that reduces the error
6. Repeat until the error is very small (or we reach max iterations)
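The steps above, combined with mini-batch updates, can be sketched on a small synthetic regression problem. The data (y = 2x + 1 plus noise), learning rate, and batch size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic 1-D regression data: y = 2*x + 1 plus a little noise (illustrative).
X = rng.normal(size=(200, 1))
y = 2 * X[:, 0] + 1 + 0.1 * rng.normal(size=200)

w, b = 0.0, 0.0                 # step 1: start from initial weights
lr, batch_size = 0.1, 32

for epoch in range(50):
    idx = rng.permutation(len(X))              # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]  # one mini-batch
        xb, yb = X[batch, 0], y[batch]
        pred = w * xb + b                      # step 2: make a prediction
        err = pred - yb                        # step 3: compare with the target
        dw = 2 * np.mean(err * xb)             # step 4: gradient of MSE w.r.t. w
        db = 2 * np.mean(err)                  #         and w.r.t. b
        w -= lr * dw                           # step 5: update against the gradient
        b -= lr * db                           # step 6: repeat over epochs

print(round(w, 1), round(b, 1))  # recovers values near the true 2.0 and 1.0
```

Swapping the batch size to 1 gives SGD; swapping it to len(X) gives batch gradient descent.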
7. Training, Validation, and Testing Sets
 Dataset Splits Explained:

o Training Set: The largest portion of the data, used to train the model by adjusting its
weights and biases.

o Validation Set: A separate subset used to evaluate the model's performance during
training. It helps in tuning hyperparameters (like learning rate or model architecture) and
provides a check for overfitting. The model does not learn from this data.

o Test Set: A completely unseen subset of data that is used only once, after all training and
hyperparameter tuning is complete. It provides an unbiased estimate of the final model's
performance on new, real-world data.

 Typical Ratios: Common splits include 70% for training, 15% for validation, and 15% for
testing (70/15/15), or 80/10/10. For very large datasets, the validation and test sets can be a
smaller percentage (e.g., 98/1/1).
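A 70/15/15 split can be done by shuffling indices once and slicing, as a minimal sketch (dataset size is illustrative; in practice a library helper such as scikit-learn's train_test_split is often used instead):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
indices = rng.permutation(n)               # shuffle once before splitting

# 70/15/15 split by index ranges.
n_train, n_val = round(0.70 * n), round(0.15 * n)
train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]

print(len(train_idx), len(val_idx), len(test_idx))  # 700 150 150
```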
Step-by-Step Workflow of Training a Neural Network
8. Model Evaluation Metrics
 For Regression:

o R-squared (R²): Indicates the proportion of the variance in the dependent variable that
is predictable from the independent variables. A value closer to 1 is better.

o Mean Squared Error (MSE): The average of the squared errors. Useful for penalizing
larger errors. (Preferred)

o Root Mean Squared Error (RMSE): The square root of MSE. It is in the same units as
the target variable, making it more interpretable.

o Mean Absolute Error (MAE): The average of the absolute errors. It is robust to outliers
and also in the same units as the target variable.

 For Classification:

o Accuracy: The ratio of correctly predicted instances to the total instances. Can be
misleading for imbalanced datasets. (Preferred)

o Precision: Measures the accuracy of positive predictions. Answers the question: "Of all
instances predicted as positive, how many were actually positive?"

o Recall (Sensitivity): Measures the model's ability to find all the actual positive
instances. Answers the question: "Of all actual positive instances, how many did the model
correctly identify?"

o F1-Score: The harmonic mean of Precision and Recall. It provides a single score that
balances both metrics, which is useful when there is an uneven class distribution.
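The classification metrics above all derive from the four confusion-matrix counts, as this sketch shows (the example labels are made up for illustration):

```python
def classification_metrics(y_true, y_pred):
    # Confusion-matrix counts, treating label 1 as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many were found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return accuracy, precision, recall, f1

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75, 0.75)
```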
9. Underfitting & Overfitting

 Identifying Underfitting (High Bias):

o The model is too simple to capture the underlying patterns in the data.

o Symptoms: Both the training loss and the validation loss are high and plateau at a high
value. The model performs poorly on both the training set and the validation set.

 Identifying Overfitting (High Variance):

o The model has learned the training data too well, including its noise, and fails to generalize
to new, unseen data.

o Symptoms: The training loss continues to decrease to a very low value, while the validation
loss starts to increase after a certain point. There is a large and growing gap between the
training and validation loss curves.

 Diagnostic Plots:

o Plotting the training loss and validation loss over epochs is the primary way to diagnose
these issues.

o Good Fit: Both curves converge to a low value, and the gap between them is minimal.

o Overfitting: The training curve goes down, while the validation curve goes down and then
starts to go up.
o Underfitting: Both curves flatten out at a high loss value.
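The diagnostic logic above can be captured in a small rule-of-thumb function. The thresholds here are arbitrary illustrative choices, not values from the text; real diagnosis is done by inspecting the loss curves themselves:

```python
def diagnose_fit(train_losses, val_losses, gap_tol=0.1, high_loss=0.5):
    # Crude diagnosis from the final loss values (illustrative thresholds).
    final_train, final_val = train_losses[-1], val_losses[-1]
    if final_val - final_train > gap_tol:
        return "overfitting"       # large gap: low train loss, high val loss
    if final_train > high_loss:
        return "underfitting"      # both losses plateau at a high value
    return "good fit"

# Train loss keeps falling while validation loss turns back up.
print(diagnose_fit([0.9, 0.4, 0.1, 0.05], [0.8, 0.5, 0.45, 0.6]))  # overfitting
```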

10. Regularization Techniques

 Purpose: Techniques used to prevent overfitting by adding a penalty for model complexity to the
loss function.

 L1 Regularization (Lasso):

o Concept: Adds a penalty proportional to the absolute value of the weights.

o Effect: It can shrink some weights to exactly zero, effectively performing automatic feature
selection by removing irrelevant features from the model. This results in a "sparse" model.

 L2 Regularization (Ridge / Weight Decay):

o Concept: Adds a penalty proportional to the square of the value of the weights.

o Effect: It forces the weights to be small but rarely shrinks them to zero. It is the most
common form of regularization and is often referred to as "weight decay" because of how it
is implemented in optimizers.

 Dropout:

o Concept: During each training iteration, it randomly sets the activations of a fraction of
neurons in a layer to zero.

o Intuition: This forces the network to learn more robust features and prevents neurons
from co-adapting too much. It's like forcing the network to be redundant, so it doesn't rely
on any single neuron.

o Implementation: It is only active during training and is turned off during
evaluation/testing.
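The L2 penalty and dropout can be sketched in NumPy. The 1/(1-rate) rescaling is the standard "inverted dropout" implementation detail, an assumption here rather than something stated in the text:

```python
import numpy as np

rng = np.random.default_rng(5)

def l2_penalty(weights, lam=0.01):
    # Adds lam * sum(w^2) to the loss, pushing all weights toward small values.
    return lam * sum(np.sum(w ** 2) for w in weights)

def dropout(activations, rate=0.5, training=True):
    if not training:
        return activations           # dropout is disabled at evaluation time
    # Randomly zero a fraction `rate` of activations; scale the survivors by
    # 1/(1-rate) ("inverted dropout") so the expected activation is unchanged.
    mask = rng.random(activations.shape) > rate
    return activations * mask / (1 - rate)

a = np.ones((4, 8))
dropped = dropout(a, rate=0.5)               # entries are now 0.0 or 2.0
print(dropout(a, training=False).mean())     # 1.0 — unchanged at test time
```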
11. Training Optimization Techniques

 Learning Rate Scheduling:

o Concept: A strategy to adjust the learning rate during training. A common approach is to
start with a higher learning rate for faster initial progress and then gradually decrease it to
allow the model to settle into a good minimum. Examples include step decay, exponential
decay, or adaptive methods.

 Advanced Optimizers:

o Momentum: Helps accelerate SGD in the correct direction by adding a fraction of the
previous weight update to the current one. This helps to smooth out the noisy updates of
SGD.

o RMSProp: Maintains a moving average of the squared gradients for each weight and
divides the learning rate by this average. This effectively adapts the learning rate for each
parameter.

o Adam (Adaptive Moment Estimation): Combines the ideas of both Momentum and
RMSProp. It stores moving averages of both the past gradients and the past squared
gradients. It is the most widely used and recommended optimizer for deep learning.

 Early Stopping:

o Concept: A form of regularization that stops training when the model's performance on the
validation set stops improving.

o Logic: Monitor the validation loss at the end of each epoch. If the validation loss does not
improve for a specified number of consecutive epochs (the "patience"), stop the training
process and save the model from the epoch with the best validation loss.
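The patience logic described above reduces to a short loop. The validation-loss sequence here is made up to show the typical shape (improvement, then a sustained rise once overfitting sets in):

```python
def train_with_early_stopping(epoch_val_losses, patience=3):
    # Stop once the validation loss has failed to improve for `patience`
    # consecutive epochs; report the epoch with the best validation loss.
    best_loss, best_epoch, waited = float("inf"), 0, 0
    for epoch, val_loss in enumerate(epoch_val_losses):
        if val_loss < best_loss:
            best_loss, best_epoch, waited = val_loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break                 # patience exhausted: stop training
    return best_epoch, best_loss

# Validation loss improves through epoch 3, then starts rising.
losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.61, 0.65]
print(train_with_early_stopping(losses, patience=3))  # (3, 0.55)
```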
12. Model Training Lifecycle

 Key Terms:

o Epoch: One complete pass of the entire training dataset through the network.

o Batch: A small subset of the training dataset.

o Iteration: A single update of the model's weights. It corresponds to processing one batch of
data. The number of iterations in one epoch is the total number of training samples
divided by the batch size.
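The iteration count per epoch is a quick calculation (the sample and batch sizes here are illustrative; the ceiling accounts for a final, partially filled batch):

```python
import math

# Iterations per epoch = ceil(number of training samples / batch size).
n_samples, batch_size = 50_000, 32
iterations_per_epoch = math.ceil(n_samples / batch_size)
print(iterations_per_epoch)  # 1563 (the last batch holds only 16 samples)
```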

 Key Hyperparameters and Tuning:

o Hyperparameters: These are settings configured before training begins, such as the
learning rate, batch size, number of epochs, number of hidden layers, number of neurons
per layer, choice of activation function, and choice of optimizer.

o Tuning: The process of finding the optimal set of hyperparameters for a model. This is
typically an empirical process involving experimentation. Techniques like Grid Search,
Random Search, or more advanced methods like Bayesian Optimization are used, with
performance evaluated on the validation set.
TensorFlow and Keras: Frameworks for Deep Learning

📌 Introduction to TensorFlow

TensorFlow is an open-source deep learning framework developed by the Google Brain Team in 2015.
It is designed to facilitate the development, training, and deployment of machine learning and deep
learning models at scale.

At its core, TensorFlow uses a computational graph approach, where each operation is represented as
a node and data flows as tensors between them. This architecture enables it to run efficiently across
multiple CPUs, GPUs, and even TPUs (Tensor Processing Units), making it highly scalable for
production environments and research alike.

TensorFlow supports both low-level APIs (for maximum control and customization) and high-level
APIs (for rapid development). One of its most powerful features is automatic differentiation, which is
essential for backpropagation in neural networks.

📌 Introduction to Keras

Keras is a high-level neural network API that was initially developed by François Chollet. Since 2017, it
has been tightly integrated into TensorFlow as tf.keras. It simplifies the process of building and
training deep learning models by abstracting much of the complexity of TensorFlow’s lower-level
operations.

Keras follows the principle of modularity and user-friendliness, allowing developers to construct
neural networks layer by layer using intuitive building blocks such as Dense, Conv2D, LSTM, etc.

TensorFlow vs Keras (Before & After TensorFlow 2.0)

Feature           Keras (standalone, pre-TF2.0)    tf.keras (Keras inside TensorFlow)
Backend           Theano, CNTK, or TensorFlow      TensorFlow only
Performance       Moderate                         Highly optimized with XLA, GPU/TPU
Integration       External                         Native and seamless
Industry Usage    Prototyping and research         Research + Production
Why Use TensorFlow and Keras?

 Ease of Use: Keras provides simple syntax for building deep learning models.
 Powerful Back-End: TensorFlow manages low-level operations efficiently, even on large
datasets and clusters.
 Pretrained Models & Tools: TensorFlow Hub, tf.data, and tf.distribute offer ready-
made models, input pipelines, and distributed training.
 Visualization: Built-in integration with TensorBoard for visualizing metrics, model graphs,
and profiling.
 Deployment Ready: TensorFlow supports serving models via TensorFlow Serving, TFLite
for mobile, and TensorFlow.js for browser environments.
 Auto-differentiation: Essential for backpropagation and gradient-based optimization.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Step 1: Define the model
model = Sequential([
    Dense(64, activation='relu', input_shape=(10,)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])

# Step 2: Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Step 3: Train the model (X_train, y_train, X_val, y_val are the user's data arrays)
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10, batch_size=32)

# Step 4: Evaluate or predict
model.evaluate(X_test, y_test)
model.predict(new_data)
