Neural networks and Backpropagation
Charles Ollion - Olivier Grisel
Neural Network for classification

Vector function with tunable parameters $\theta$:

$$f(\cdot; \theta): \mathbb{R}^N \rightarrow (0, 1)^K$$

Sample $s$ in dataset $S$:

- input: $x^s \in \mathbb{R}^N$
- expected output: $y^s \in \{0, \ldots, K-1\}$

Output is a conditional probability distribution:

$$f(x^s; \theta)_c = P(Y = c | X = x^s)$$
Artificial Neuron

$$z(x) = w^T x + b$$

$$f(x) = g(w^T x + b)$$

- $x$, $f(x)$: input and output
- $z(x)$: pre-activation
- $w$, $b$: weights and bias
- $g$: activation function
Layer of Neurons

$$f(x) = g(z(x)) = g(Wx + b)$$

$W$, $b$ are now a weight matrix and a bias vector.
One Hidden Layer Network

$$z^h(x) = W^h x + b^h$$

$$h(x) = g(z^h(x)) = g(W^h x + b^h)$$

$$z^o(x) = W^o h(x) + b^o$$

$$f(x) = \mathrm{softmax}(z^o) = \mathrm{softmax}(W^o h(x) + b^o)$$
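A minimal NumPy sketch of this forward pass (the sizes $N$, $H$, $K$, the tanh activation and the small random init are illustrative assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
N, H, K = 4, 8, 3                 # input, hidden and output dimensions
W_h, b_h = rng.normal(0, 0.01, (H, N)), np.zeros(H)
W_o, b_o = rng.normal(0, 0.01, (K, H)), np.zeros(K)

x = rng.normal(size=N)
z_h = W_h @ x + b_h               # hidden pre-activation z^h(x)
h = np.tanh(z_h)                  # hidden layer h(x)
z_o = W_o @ h + b_o               # output pre-activation z^o(x)
f_x = softmax(z_o)                # f(x): K probabilities summing to 1
```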
One Hidden Layer Network
Alternate representation
One Hidden Layer Network
Keras implementation
```python
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(H, input_dim=N))   # weight matrix of dim [N x H]
model.add(Activation("tanh"))      # hidden activation g
model.add(Dense(K))                # weight matrix of dim [H x K]
model.add(Activation("softmax"))   # output probabilities
```
Element-wise activation functions

[figure: common activation functions (blue) and their derivatives (green)]
Softmax function

$$\mathrm{softmax}(x) = \frac{1}{\sum_{i=1}^{n} e^{x_i}} \cdot
\begin{bmatrix} e^{x_1} \\ e^{x_2} \\ \vdots \\ e^{x_n} \end{bmatrix}$$

$$\frac{\partial\, \mathrm{softmax}(x)_i}{\partial x_j} =
\begin{cases}
\mathrm{softmax}(x)_i \cdot (1 - \mathrm{softmax}(x)_i) & i = j \\
-\mathrm{softmax}(x)_i \cdot \mathrm{softmax}(x)_j & i \neq j
\end{cases}$$

- vector of values in $(0, 1)$ that add up to 1
- $p(Y = c | X = x) = \mathrm{softmax}(z(x))_c$
- the pre-activation vector $z(x)$ is often called "the logits"
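The two cases of the derivative above have a compact matrix form; a small NumPy sketch (function names are mine):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtracting the max avoids overflow
    return e / e.sum()

def softmax_jacobian(x):
    # J[i, j] = softmax(x)_i * (1_{i=j} - softmax(x)_j),
    # which matches the i = j and i != j cases of the formula above
    s = softmax(x)
    return np.diag(s) - np.outer(s, s)
```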
Training the network

Find parameters $\theta = (W^h; b^h; W^o; b^o)$ that minimize the negative log likelihood (or cross entropy).

The loss function for a given sample $s \in S$:

$$l(f(x^s; \theta), y^s) = nll(x^s, y^s; \theta) = -\log f(x^s; \theta)_{y^s}$$

The cost function is the negative likelihood of the model computed on the full training set (for i.i.d. samples):

$$L_S(\theta) = -\frac{1}{|S|} \sum_{s \in S} \log f(x^s; \theta)_{y^s} + \lambda \Omega(\theta)$$

$\lambda \Omega(\theta) = \lambda (||W^h||^2 + ||W^o||^2)$ is an optional regularization term.
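As a concrete check, a NumPy sketch of the unregularized cost on a toy batch (the probability values are made up for illustration):

```python
import numpy as np

def nll_cost(probs, y):
    # probs[s, c] = f(x^s; theta)_c, y[s] = integer class label of sample s
    return -np.mean(np.log(probs[np.arange(len(y)), y]))

probs = np.array([[0.9, 0.1],   # sample 0: confident and correct
                  [0.3, 0.7]])  # sample 1: less confident, still correct
y = np.array([0, 1])
print(nll_cost(probs, y))       # -(log 0.9 + log 0.7) / 2 ~ 0.231
```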
Stochastic Gradient Descent

Initialize $\theta$ randomly.

For $E$ epochs perform:

- Randomly select a small batch of samples ($B \subset S$)
- Compute gradients: $\Delta = \nabla_\theta L_B(\theta)$
- Update parameters: $\theta \leftarrow \theta - \eta \Delta$, where $\eta > 0$ is called the learning rate
- Repeat until the epoch is completed (all of $S$ is covered)

Stop when reaching a stopping criterion, e.g. when the nll computed on a validation set stops decreasing (a schematic loop is sketched below).
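A schematic NumPy version of this loop (grad and samples are placeholders for a mini-batch gradient function and a NumPy array of training samples):

```python
import numpy as np

def sgd(theta, grad, samples, eta=0.1, n_epochs=10, batch_size=32):
    # grad(theta, batch) is assumed to return nabla_theta L_B(theta)
    for epoch in range(n_epochs):
        perm = np.random.permutation(len(samples))  # reshuffle every epoch
        for start in range(0, len(samples), batch_size):
            batch = samples[perm[start:start + batch_size]]
            theta = theta - eta * grad(theta, batch)  # theta <- theta - eta * Delta
    return theta
```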
Computing Gradients

- output weights: $\partial l(f(x), y) / \partial W^o_{i,j}$
- output bias: $\partial l(f(x), y) / \partial b^o_i$
- hidden weights: $\partial l(f(x), y) / \partial W^h_{i,j}$
- hidden bias: $\partial l(f(x), y) / \partial b^h_i$

The network is a composition of differentiable modules.

We can apply the "chain rule".
Chain rule

[figures: the chain rule unrolled through the network's composition of differentiable modules]
Backpropagation

Compute partial derivatives of the loss:

$$\frac{\partial l}{\partial f(x)_i} = \frac{\partial l(f(x), y)}{\partial f(x)_i} = \frac{\partial (-\log f(x)_y)}{\partial f(x)_i} = \frac{-\mathbf{1}_{y=i}}{f(x)_y}$$

$$\frac{\partial l}{\partial z^o(x)_i} = \,?$$
Chain rule!
$e(y)$: one-hot encoding of $y$
Backpropagation

Gradients:

$$\nabla_{z^o(x)} l = f(x) - e(y)$$

$$\nabla_{b^o} l = f(x) - e(y)$$

because $z^o(x) = W^o h(x) + b^o$ and then $\dfrac{\partial z^o(x)_i}{\partial b^o_j} = \mathbf{1}_{i=j}$
Backpropagation

Partial derivatives related to $W^o$:

$$\frac{\partial l}{\partial W^o_{i,j}} = \sum_k \frac{\partial l}{\partial z^o(x)_k} \cdot \frac{\partial z^o(x)_k}{\partial W^o_{i,j}}$$

$$\nabla_{W^o} l = (f(x) - e(y)) \cdot h(x)^\top$$
Backprop gradients

Compute activation gradients:

$$\nabla_{z^o(x)} l = f(x) - e(y)$$

Compute layer params gradients:

$$\nabla_{W^o} l = \nabla_{z^o(x)} l \cdot h(x)^\top$$

$$\nabla_{b^o} l = \nabla_{z^o(x)} l$$

Compute prev layer activation gradients:

$$\nabla_{h(x)} l = W^{o\top} \nabla_{z^o(x)} l$$

$$\nabla_{z^h(x)} l = \nabla_{h(x)} l \odot \sigma'(z^h(x))$$
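Putting the three steps together, a NumPy sketch of one backward pass (assuming the tanh hidden activation from the Keras example, so $\sigma'(z) = 1 - \tanh^2(z)$; the hidden-layer gradients follow the same pattern as the output layer):

```python
import numpy as np

def backward(x, y, z_h, h, f_x, W_o):
    # e(y): one-hot encoding of the true class
    e_y = np.zeros_like(f_x)
    e_y[y] = 1.0
    grad_z_o = f_x - e_y                         # activation gradient at the output
    grad_W_o = np.outer(grad_z_o, h)             # layer parameter gradients
    grad_b_o = grad_z_o
    grad_h = W_o.T @ grad_z_o                    # previous layer activation gradient
    grad_z_h = grad_h * (1 - np.tanh(z_h) ** 2)  # elementwise product with sigma'(z^h)
    grad_W_h = np.outer(grad_z_h, x)             # same pattern for the hidden layer
    grad_b_h = grad_z_h
    return grad_W_h, grad_b_h, grad_W_o, grad_b_o
```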
Loss, Initialization and Learning Tricks
Discrete output (classification)

Binary classification: $y \in \{0, 1\}$

- $Y|X=x \sim \mathrm{Bernoulli}(b = f(x; \theta))$
- output function: $\mathrm{logistic}(x) = \frac{1}{1 + e^{-x}}$
- loss function: binary cross-entropy (see the sketch after this list)

Multiclass classification: $y \in \{0, \ldots, K-1\}$

- $Y|X=x \sim \mathrm{Multinoulli}(p = f(x; \theta))$
- output function: $\mathrm{softmax}$
- loss function: categorical cross-entropy
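For the binary case, a short NumPy sketch of the output and loss functions (function names are mine):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_cross_entropy(p, y):
    # p = f(x; theta) in (0, 1), y in {0, 1}
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))
```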
Continuous output (regression)

Continuous output: $y \in \mathbb{R}^n$

- $Y|X=x \sim \mathcal{N}(\mu = f(x; \theta), \sigma^2 I)$
- output function: identity
- loss function: square loss
- heteroscedastic if $f(x; \theta)$ predicts both $\mu$ and $\sigma^2$

Mixture Density Network (multimodal output):

- $Y|X=x \sim GMM_x$
- $f(x; \theta)$ predicts all the parameters: the means, covariance matrices and mixture weights
Initialization and normalization

Input data should be normalized to have approximately the same range: standardization or quantile normalization.

Initializing $W^h$ and $W^o$:

- Zero is a saddle point: no gradient, no learning
- Constant init: hidden units collapse by symmetry
- Solution: random init, e.g. $w \sim \mathcal{N}(0, 0.01)$
- Better inits: Xavier Glorot, Kaiming He, and orthogonal (Glorot and He are sketched below)
- Biases can (should) be initialized to zero
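A NumPy sketch of these schemes (the fan sizes are illustrative; Keras exposes the last two as the glorot_uniform and he_normal initializers):

```python
import numpy as np

rng = np.random.default_rng(42)
fan_in, fan_out = 256, 128

# naive small zero-mean Gaussian init, as on this slide
W_naive = rng.normal(0.0, 0.01, size=(fan_out, fan_in))

# Glorot / Xavier uniform: Var(w) = 2 / (fan_in + fan_out)
limit = np.sqrt(6.0 / (fan_in + fan_out))
W_glorot = rng.uniform(-limit, limit, size=(fan_out, fan_in))

# He init (suited to ReLU): Var(w) = 2 / fan_in
W_he = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

b = np.zeros(fan_out)  # biases initialized to zero
```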
SGD learning rate

Very sensitive:

- too high → early plateau or even divergence
- too low → slow convergence

Try a large value first: $\eta = 0.1$ or even $\eta = 1$. Divide by 10 and retry in case of divergence.

A large constant LR prevents final convergence:

- multiply $\eta_t$ by $\beta < 1$ after each update,
- or monitor the validation loss and divide $\eta_t$ by 2 or 10 when it stops making progress.

See ReduceLROnPlateau in Keras (usage sketched below).
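For example (a sketch; model, X_train and y_train are assumed to be defined as in the earlier Keras example):

```python
from keras.callbacks import ReduceLROnPlateau

# divide eta by 10 whenever the validation loss stops improving
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=5)
model.fit(X_train, y_train, epochs=50, validation_split=0.1,
          callbacks=[reduce_lr])
```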
Momentum

Accumulate gradients across successive updates:

$$m_t = \gamma m_{t-1} + \eta \nabla_\theta L_{B_t}(\theta_{t-1})$$

$$\theta_t = \theta_{t-1} - m_t$$

$\gamma$ is typically set to 0.9.

Larger updates in directions where the gradient sign is constant, to accelerate in low curvature areas.

Nesterov accelerated gradient:

$$m_t = \gamma m_{t-1} + \eta \nabla_\theta L_{B_t}(\theta_{t-1} - \gamma m_{t-1})$$

$$\theta_t = \theta_{t-1} - m_t$$

Better at handling changes in gradient direction (both variants are sketched below).
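Both variants in a compact sketch (grad is a placeholder for the mini-batch gradient function):

```python
def momentum_step(theta, m, grad, eta=0.1, gamma=0.9, nesterov=False):
    # grad(theta) is assumed to return nabla_theta L_{B_t}(theta)
    lookahead = theta - gamma * m if nesterov else theta
    m = gamma * m + eta * grad(lookahead)  # accumulate the velocity m_t
    theta = theta - m                      # theta_t = theta_{t-1} - m_t
    return theta, m
```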
Why Momentum Really Works
Alternative optimizers

SGD (with Nesterov momentum):

- simple to implement
- very sensitive to the initial value of $\eta$
- needs learning rate scheduling

Adam: adaptive learning rate scale for each parameter:

- global $\eta$ set to 3e-4 often works well enough
- good default choice of optimizer (often)
- but well-tuned SGD with LR scheduling can generalize better than Adam (with naive $L^2$ regularization)...

Promising stochastic second-order methods: K-FAC and Shampoo can be used to accelerate the training of very large models. Both baseline setups are sketched below.
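In Keras, the two baseline choices look like this (a sketch reusing the model defined earlier; in older Keras releases the argument is named lr rather than learning_rate):

```python
from keras.optimizers import SGD, Adam

# well-tuned SGD with Nesterov momentum (pair it with LR scheduling)
model.compile(optimizer=SGD(learning_rate=0.1, momentum=0.9, nesterov=True),
              loss="categorical_crossentropy")

# or Adam with the common default scale
model.compile(optimizer=Adam(learning_rate=3e-4),
              loss="categorical_crossentropy")
```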
The Karpathy Constant for Adam

[figure: the tongue-in-cheek claim that $\eta$ = 3e-4 is the best learning rate for Adam]
Optimizers around a saddle point
Credits: Alec Radford