Introduction To Deep Learning AI 2025

The AI Winter School program from January 20-24, 2025, covers an introduction to deep learning, including theoretical basics and hands-on sessions involving data, training, evaluation, and design techniques. It explores various disciplines of machine learning such as supervised, unsupervised, and reinforcement learning, along with their applications and challenges. The program emphasizes the importance of datasets, training methodologies, evaluation metrics, and common techniques in deep learning.


AI Winter School

January 20 - 24, 2025


Today’s Program

Part I: Introduction lecture (14:15 - 15:45)

● Overview
● Theoretical Basics
● Data
● Training
● Evaluation
● Design and Techniques

Part II: Hands-on (16:15 - 18:00)

● Questions
● Setup
● Some coding


Introduction to Deep Learning

Jan 20, 2025


Overview

Machine Learning as Artificial Intelligence

● Artificial Intelligence: any technique that enables computers to mimic human behaviour
● Machine Learning: learn to perform tasks from data without being explicitly programmed
● Deep Learning: extract patterns from data using deep neural networks
Disciplines of Machine Learning

Supervised Learning
● Labeled training data: each sample comes with a label (e.g., shapes labeled as triangle, square, circle, pentagon).
● Learning to label.
● New data: the model predicts the label (e.g., "Square!").
Disciplines of Machine Learning

Supervised Learning: example applications
● Face recognition (https://www.theguardian.com/technology/2019/jul/29/what-is-facial-recognition-and-how-sinister-is-it)
● Handwritten transcription (https://www.behance.net/gallery/71324093/The-Handwritten-A)
● Speech recognition (https://support.apple.com/de-de/HT208336)
● Medical diagnosis (https://www.wired.com/story/fmri-ai-suicide-ideation/)
Disciplines of Machine Learning

Unsupervised Learning
● Unlabeled training data.
● Learning meaningful representations.
● New data: the model maps it into the learned representation.
Disciplines of Machine Learning

Unsupervised Learning: example applications
● Gene clustering (https://ernest-bonat.medium.com/building-machine-learning-clustering-models-for-gene-expression-rna-seq-data-d0e5af10416d)
● Image clustering (https://neurohive.io/en/state-of-the-art/deep-clustering-approach/)
● Language processing (https://www.superannotate.com/blog/what-is-natural-language-processing)
● Generation tasks
Disciplines of Machine Learning

Reinforcement Learning
● Unlabeled training data.
● Learning to make decisions that yield the best reward.
● New task (e.g., "build a pyramid with suitable item"): the model chooses the actions with the best reward.
Disciplines of Machine Learning

Reinforcement Learning: example applications
● Game playing (https://deepmind.google/research/breakthroughs/alphago/)
● Algorithmic trading (https://www.mathworks.com/videos/reinforcement-learning-in-finance-1578033119150.html)
● Robotics (https://www.sciencenews.org/article/reinforcement-learn-ai-humanoid-robots)
● Goal-oriented chatbots (https://towardsdatascience.com/training-a-goal-oriented-chatbot-with-deep-reinforcement-learning-part-i-introduction-and-dce3af21d383)
Disciplines of Machine Learning: summary

● Supervised Learning: labeled training data; learning to label; new data → predicted label (e.g., "Square!").
● Unsupervised Learning: unlabeled training data; learning to cluster; new data → placed in the learned representation.
● Reinforcement Learning: unlabeled training data; learning to make decisions; new task (e.g., "build a pyramid with suitable item") → actions with the best reward.
Supervised Learning Tasks

● Classification
  ○ Training: learn to predict a label out of a discrete set.
  ○ Testing: accuracy as the number of correctly predicted labels.
● Regression
  ○ Training: predict a label as a continuous value directly.
  ○ Testing: distance/similarity to the actual outcomes.
Unsupervised Learning Tasks

● Clustering
  ○ Training: learn to identify groups.
● Generation
  ○ Training: create representations to sample realistic outputs.
● Testing? Depends on the availability of ground truth data / other measures of performance…

Figure modified from: https://towardsdatascience.com/training-a-goal-oriented-chatbot-with-deep-reinforcement-learning-part-i-introduction-and-dce3af21d383
Deep Learning

Deep Neural Networks: input layer → hidden layers → output layer

Why this?
● Hierarchical processing: several levels
● All-in-one model: human out of the loop (?!)
● Extremely expressive: can learn "anything"

Why now?
● Unprecedented amount of available data
● Parallelization of computations by GPUs
● Many available toolkits
Theoretical Basics

A Neural Network ("Multi-layer Perceptron")
● Built from neurons, organized in layers (input, hidden, output) and connected by weights.
● Its basic building block is the perceptron: a single neuron mapping an input to an output.
A Neural Network

Perceptron: the math
● The perceptron takes the linear combination of the input with the weights, adds a bias, and applies a non-linearity (the activation):
  output = g(w · x + b)
  with input x, weights w, bias b, and activation function g.
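A minimal sketch of this computation in NumPy; the input, weights, bias, and sigmoid activation below are illustrative placeholders, not values from the slides:

    import numpy as np

    def sigmoid(z):
        # Non-linearity ("activation"): squashes the weighted sum into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def perceptron(x, w, b):
        # Linear combination of the input with the weights, plus the bias ...
        z = np.dot(w, x) + b
        # ... passed through the non-linearity
        return sigmoid(z)

    x = np.array([0.5, -1.2, 3.0])   # example input
    w = np.array([0.1, 0.4, -0.2])   # example weights
    b = 0.3                          # example bias
    print(perceptron(x, w, b))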


A Neural Network

Single Layer Network: the math
● A layer applies several perceptrons in parallel to the same input: stacking their weight vectors into a matrix W and their biases into a vector b, the layer output can be written compactly as y = g(Wx + b), with the activation applied element-wise.
A Neural Network

Multi-layer Network
● Stack several layers: the output of one layer becomes the input of the next.
● The whole network is a function mapping the input x to the network output ("prediction"), parameterized by the network parameters (= weights) w: prediction = f(x; w).
● This function has two faces:
  ○ Evaluation function: the weights are held fixed and the input varies (making predictions).
  ○ Training function: the inputs (the training data) are held fixed and the weights vary (learning).
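To make the stacking concrete, here is a hedged NumPy sketch of a two-layer network's evaluation function; the layer sizes and random weights are purely illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Network parameters (= weights), here initialized at random:
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input (3) -> hidden (4)
    W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # hidden (4) -> output (2)

    def f(x):
        # Evaluation function: weights fixed, input varies
        h = sigmoid(W1 @ x + b1)      # hidden layer
        return sigmoid(W2 @ h + b2)   # network output ("prediction")

    x = np.array([0.5, -1.2, 3.0])
    print(f(x))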


Non-Linearities: Activation Functions

Biological motivation: activate the neuron only if a threshold b is exceeded.
● Heaviside (step) function: outputs 1 ("activate!") if the weighted sum exceeds the threshold, and 0 ("discard!") otherwise.
● Sigmoid function: the "default" smooth alternative, with output within [0, 1].
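As a small illustration (assuming the step and sigmoid forms described above), both activations fit in a few lines of NumPy:

    import numpy as np

    def heaviside_step(z, b=0.0):
        # Activate (1) if the threshold b is exceeded, otherwise discard (0)
        return (z >= b).astype(float)

    def sigmoid(z):
        # Smooth "default" activation with output within [0, 1]
        return 1.0 / (1.0 + np.exp(-z))

    z = np.linspace(-5, 5, 5)
    print(heaviside_step(z), sigmoid(z))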


Supervised Learning Tasks

● Classification: output a probability distribution over a discrete set of classes (e.g., cat, burger, tree, bed), obtained with a soft-max output layer.
● Regression: predict the value (e.g., °C, $, ...) directly as a continuous number.
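A hedged sketch of the soft-max output used for classification; the class names are just the slide's examples and the scores are made up:

    import numpy as np

    def softmax(scores):
        # Turn raw output scores into a probability distribution over classes
        e = np.exp(scores - np.max(scores))   # subtract the max for numerical stability
        return e / e.sum()

    classes = ["cat", "burger", "tree", "bed"]
    scores = np.array([2.0, 0.1, -1.0, 0.5])  # example network outputs
    probs = softmax(scores)
    print(dict(zip(classes, probs.round(3))))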


“Expressive Power”

What can a neural network learn? “Anything.”

Universal Approximation Theorem: “Neural networks with a non-polynomial activation function can approximate any continuous function arbitrarily well.”
Data

What is a dataset?
● An organized collection of data
  ○ One “unit” of data = an instance / data point
  ○ Information about a data point = features
  ○ Labels or other annotations are often included
    → required for supervised tasks but not (necessarily) for unsupervised ones
  ○ Normalize it (a common choice is standardizing each feature to zero mean and unit variance)
Properties of a (good) dataset

● What about dataset size?
  ○ Defined entirely by the task (from dozens/hundreds to millions)
  ○ The only certainty: “the more the merrier”, but also “the more representative the merrier”
● Do not forget the data split (~80/20%):
  ○ Training set: used during training
  ○ Test set: used to check the performance of the finished(!) model
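A minimal sketch of both steps (feature standardization and an ~80/20 split), assuming a plain NumPy feature matrix X and labels y generated here only for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))          # 100 data points, 3 features each
    y = rng.integers(0, 2, size=100)       # example labels

    # Normalize: zero mean, unit variance per feature (one common choice)
    X = (X - X.mean(axis=0)) / X.std(axis=0)

    # ~80/20 split into training set and test set
    idx = rng.permutation(len(X))
    n_train = int(0.8 * len(X))
    X_train, y_train = X[idx[:n_train]], y[idx[:n_train]]
    X_test,  y_test  = X[idx[n_train:]], y[idx[n_train:]]
    print(X_train.shape, X_test.shape)     # (80, 3) (20, 3)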
Training

Supervised learning: given samples of training data with corresponding labels
● Input: e.g., an image as a matrix of pixel values.
● Label: a binary vector with a 1 at the correct class (e.g., for the classes camel, cat, Pikachu).

Goal: optimize the weights such that f(cat image) = cat for all of the cat samples in the training data, but also f(cat image) = cat for samples outside the training data!


Training

How to achieve this goal?

Loss function (error, cost): how good the prediction is compared to the true label.
● Zero-one loss: is the prediction exactly the same as the label or not?
● Square loss (L2): Euclidean distance between prediction and label.
● Cross entropy loss: maximize the likelihood of the true label.

→ Minimizing the loss function will improve the prediction!
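Hedged NumPy sketches of the three losses, using their standard textbook forms (the exact formulas on the slides were figures, so these are assumptions):

    import numpy as np

    def zero_one_loss(y_pred, y_true):
        # Is it exactly the same or not?
        return float(y_pred != y_true)

    def square_loss(y_pred, y_true):
        # Squared Euclidean distance between prediction and label vector
        return float(np.sum((np.asarray(y_pred) - np.asarray(y_true)) ** 2))

    def cross_entropy_loss(p_pred, y_true_onehot):
        # Negative log-likelihood of the true class under the predicted distribution
        return float(-np.sum(np.asarray(y_true_onehot) * np.log(np.asarray(p_pred) + 1e-12)))

    print(zero_one_loss("cat", "Pikachu"))
    print(square_loss([0.2, 0.7, 0.1], [0, 1, 0]))
    print(cross_entropy_loss([0.2, 0.7, 0.1], [0, 1, 0]))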


Training

Idea: start with random weights.
1) Take a sample and measure how good/bad the prediction is, e.g., f(cat image) = Pikachu.
2) Update the weights so that the prediction improves (i.e., the loss decreases): f(cat image) = cat.
Repeat the process for every sample in the training data set.

Training loop: initialize the weights → evaluate the model on a sample → compute the loss against the true label → update the weights → go to the next sample (stop if the loss is good enough).
Training

GOAL: find a weight update rule that produces a sequence of weights that gradually decreases the loss: as training progresses, later weights should result in smaller losses.

And do it over the whole training set: find the weights which result in minimal loss over the whole training set.

→ A non-linear, non-convex optimization problem!


Special Case: Linear Perceptron

With a linear activation and the square loss, the problem reduces to linear regression: a least squares problem!


Weight Updates: A Simple Optimization Technique

Gradient Descent

Gradient of the loss: “how does the loss change, if a weight changes?”
● The gradient points in the direction of steepest ascent (i.e., the direction to change the weights so that there is maximal change in the loss).
● So go in the opposite direction of the steepest ascent.
● Caveat: following the gradient downhill can end up in a local minimum rather than the global minimum.

Algorithm
  Initialize the weights
  Until convergence:
    Compute the gradient of the loss
    Update the weights: take a step, scaled by the “learning rate”, in the direction opposite to the gradient
  Return the weights
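As a concrete sketch, gradient descent on the least-squares loss of a linear model (the “linear perceptron” special case mentioned above); the data, learning rate, and fixed step budget are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))                 # 50 samples, 2 features
    y = X @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=50)

    w = np.zeros(2)        # initialize weights
    lr = 0.1               # "learning rate"

    for step in range(200):                      # "until convergence" (fixed budget here)
        y_pred = X @ w
        grad = 2 * X.T @ (y_pred - y) / len(X)   # gradient of the mean squared loss
        w = w - lr * grad                        # go opposite to the steepest ascent
    print(w)               # should approach [1.5, -2.0]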
Training on Batches

Gradient descent is very expensive…

Example: a single step of gradient descent for AlexNet (a neural network with ~60M parameters) on ImageNet (a dataset of ~1.2M images) requires ~2*10^14 flops!

Train on small batches of the dataset instead!

“Training with large minibatches is bad for your health. More importantly, it’s bad for your test error. Friends don’t let friends use minibatches larger than 32.”
-Yann LeCun
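A minimal sketch of splitting the training set into small batches (batch size 32, echoing the quote) and taking one gradient step per batch; it reuses the illustrative linear least-squares setup from the previous sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))
    y = X @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=1000)

    w, lr, batch_size = np.zeros(2), 0.1, 32

    for epoch in range(5):                                # several passes over the data
        idx = rng.permutation(len(X))                     # shuffle each epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(Xb)     # gradient on the small batch only
            w = w - lr * grad
    print(w)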
Evaluation

Training-Test

● Training set: during training, monitor the loss/error.
● Test set: after training, check the performance by evaluating on unseen data.
Bias-Variance Tradeoff

Over- and underfitting

Example: learn a second-degree polynomial from noisy observations.
● Ground truth: degree = 2.
● Underfitting (degree too low). Simple model: high bias, captures the essentials, but a bad fit.
● Overfitting (degree too high). Complex model: high variance, a good fit to the data, but too specific.
● Trade-off between model assumptions (bias) and model complexity (variance).

Figure: https://shapeofdata.wordpress.com
Training-Validation-Test

● Training set: used during training.
● Validation set: used during training for intermediate performance checks.
● Test set: used after training to check the final performance.

Typical loss curves: the training loss keeps decreasing, while the validation loss first decreases (high-bias regime) and later rises again (high-variance regime) as training progresses / model complexity grows. Stop training where the validation loss is lowest (“Stop here!”).
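A hedged sketch of the “stop here” idea (early stopping on the validation loss); the train_one_epoch and validation_loss functions below are stand-ins invented for this toy example, not a real training loop:

    import numpy as np

    rng = np.random.default_rng(0)

    def train_one_epoch(w):
        # Stand-in for a real weight update: the "model" keeps moving every epoch
        return w + 1

    def validation_loss(w):
        # Stand-in validation loss that is best around epoch 10 and then rises again
        return (w - 10) ** 2 + rng.normal(scale=0.1)

    w, best_val, best_w, patience, bad = 0, float("inf"), 0, 3, 0
    for epoch in range(100):
        w = train_one_epoch(w)
        val = validation_loss(w)
        if val < best_val:
            best_val, best_w, bad = val, w, 0         # validation loss still improving
        else:
            bad += 1
            if bad >= patience:                       # high-variance regime: stop here!
                break
    print("stopped at epoch", epoch, "best weights from step", best_w)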
Metrics of performance

● Defined by the task: MSE, accuracy, mAP, etc.
● In the case of classification: e.g., accuracy, precision, and recall, typically read off the confusion matrix.
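A minimal sketch of these classification metrics computed from predictions and true labels; the example labels are made up:

    import numpy as np

    y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
    y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

    tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives

    accuracy  = np.mean(y_pred == y_true)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    print(accuracy, precision, recall)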
Interpretability

● XAI: steering away from the black box


● Crucial in high-responsibility decision making, e.g. medicine
● TOOLS: explainable architecture, post-hoc analysis, etc.

Wu et al., 2023: Discover and Cure - Concept-aware Mitigation of Spurious Correlation


Bias

● Mitigating bias
  ○ Especially important in decision making with a social effect (e.g., granting parole [1])
● TOOLS: metrics to assess group fairness (demographic parity, equalized odds, etc.), transparency about biases in the data collection process…

[1]: Angwin et al., 2016: Machine Bias


Design and Techniques
Common Techniques

Regularization: add a regularization term on the network weights to the loss (often an L2 penalty).

Dropout: randomly set units (activations) to zero during training.

Stochastic Gradient Descent (SGD): use the gradient computed on a randomly selected subset of the data.

Batch normalization: normalize the samples w.r.t. the other samples in the batch.
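A hedged PyTorch sketch showing where each of these techniques typically appears; the layer sizes, dropout rate, and learning rate are illustrative choices, not values from the slides:

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(10, 64),
        nn.BatchNorm1d(64),        # batch normalization: normalize w.r.t. the batch
        nn.ReLU(),
        nn.Dropout(p=0.5),         # dropout: zero out activations at random
        nn.Linear(64, 2),
    )

    # SGD on random subsets (batches), with L2 regularization via weight_decay
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

    x = torch.randn(32, 10)                        # one batch of 32 samples, 10 features
    targets = torch.randint(0, 2, (32,))           # dummy class labels
    loss = nn.CrossEntropyLoss()(model(x), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()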
Popular architectures

Convolutional neural networks: apply “filters” to extract spatial features, textures, patterns, etc.
● Popular choice in image processing.
● Examples: VGG-16, VGG-19, AlexNet, etc.

Autoencoders: learn a compact statistical representation of the data and sample from it.
● Useful in dimensionality reduction, data generation, denoising, etc.
● Example: Variational Autoencoders (VAE)

Residual neural networks: use shortcut connections to skip layers (helps with vanishing gradients).
● Useful in applications requiring large networks: image segmentation, object detection, etc.
● Example: ResNet

Transformers: capture relationships in sequential data by considering the whole context.
● Useful in applications with sequential data (e.g., text), but also otherwise (vision transformers).
● Examples: GPTs, BERT, ViT, DINOv2
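As one concrete illustration of the first family, a tiny convolutional network in PyTorch; the layer sizes and image shape are arbitrary and far smaller than VGG or AlexNet:

    import torch
    import torch.nn as nn

    tiny_cnn = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),   # "filters" extracting spatial features
        nn.ReLU(),
        nn.MaxPool2d(2),                              # downsample 32x32 -> 16x16
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),                              # 16x16 -> 8x8
        nn.Flatten(),
        nn.Linear(32 * 8 * 8, 10),                    # class scores for 10 classes
    )

    images = torch.randn(4, 3, 32, 32)                # a batch of 4 RGB images, 32x32
    print(tiny_cnn(images).shape)                     # torch.Size([4, 10])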
Today’s Program

Part I: Introduction lecture (14:15 - 15:45)

● Overview
● Theoretical Basics
● Data
● Training
● Evaluation
● Design and Techniques

Part II: Hands-on (16:15 - 18:00)

● Questions
● Setup
● Some coding

AI Winter School
January 20 - 24, 2025
Exercises

Using Google Colab and PyTorch.

Open the notebook Intro_WS_2025.ipynb.

Follow the instructions in the notebook.
