TP5-1.ipynb - Colab

This document outlines a practical session focused on using Koopman operator theory to describe the dynamics of a non-linear dynamical system, specifically the Duffing oscillator. It details the integration of ordinary differential equations (ODEs) using numerical solvers and the application of neural networks to learn dynamical systems. The document also introduces the Koopman operator and its properties, including the use of loss functions for training neural networks to model the dynamics effectively.


Koopman operator learning, a toy case: Duffing oscillator


-- PLEASE SUBMIT YOUR COMPLETED NOTEBOOK WITH CELL OUTPUTS --

The aim of this notebook is to describe the dynamics of a non-linear dynamical system by
means of the Koopman theory.

Introduction
We consider a quantity $x \in \mathbb{R}^n$ (a vector) which evolves with time, following a dynamical system. Think for example of the joint location of the planets in our solar system, which follows the law of gravitation.

Formally, given an initial state $x(t=0) \in \mathbb{R}^n$ at time $t = 0$, the time evolution of $x$ is governed by the following dynamical system:

$$\dot{x}(t) = f(x(t)) \qquad (1)$$

where $\dot{x}(t) := \frac{dx(t)}{dt}$ is the temporal derivative, and $f : \mathbb{R}^n \to \mathbb{R}^n$ is a given map describing the dynamics.

For a given $f$, it is not always possible to solve the differential equation (1) analytically. For this reason, numerical schemes are usually employed instead to integrate equation (1) in time $t$, so as to propagate the initial condition $x(0)$ up to a desired time $T$; think of $x(T) = x(0) + \int_{t=0}^{T} f(x(t))\,dt$. The discretization in time of eq (1), or of the integral, introduces numerical approximations and yields estimates of $x(T)$ of varying quality depending on the discretization scheme.

In the field of numerical simulations, discretization schemes have been studied for a long time, and numerical solvers already exist that provide good estimates of such integrals (far better than the naive discretization $x_{k+\delta} = x_k + \delta f(x_k)$ for a discrete time increment $\delta$, which induces an $O(\delta^2)$ error at each time step).
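As an illustration of the naive scheme just mentioned, here is a minimal sketch of a forward-Euler integrator (the right-hand side f used in the example is a hypothetical linear decay, not the Duffing system defined later):

import numpy as np

def euler_integrate(f, x0, delta, n_steps):
    """Naive forward-Euler integration x_{k+delta} = x_k + delta * f(x_k) (sketch)."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x]
    for _ in range(n_steps):
        x = x + delta * f(x)  # each step carries an O(delta^2) local error
        trajectory.append(x)
    return np.array(trajectory)

# Hypothetical example: linear decay f(x) = -x, integrated up to T = 10
array_trajectory = euler_integrate(lambda x: -x, x0=[1.0], delta=0.01, n_steps=1000)
print(array_trajectory[-1])  # should be close to exp(-10)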

The goal of this practical session is to make use of such numerical solvers to improve the
learning of dynamical systems with neural networks.

import numpy as np
from scipy.integrate import solve_ivp
import matplotlib.pyplot as plt
import sys
from tqdm import tqdm

from torch.utils.data import TensorDataset, DataLoader


import torch
import torch.nn as nn

https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 1/24
30/03/2025 05:13 TP5-1.ipynb - Colab

import torch.nn.functional as F
import torch.autograd as autograd

# arrange the dataset


from sklearn.model_selection import train_test_split

Duffing oscillator


As a toy example, we consider the Duffing oscillator, where the state $x = (x_1, x_2) \in \mathbb{R}^2$ follows the dynamical system described by the following ODEs:

$$\dot{x}_1 = x_2$$
$$\dot{x}_2 = x_1 - x_1^3$$

To integrate the ODEs in time, a 4th-order Runge-Kutta scheme can be used.

def duffing(array_x: np.ndarray) -> np.ndarray:


array_dx = np.zeros(array_x.shape)
array_dx[0] = array_x[1]
array_dx[1] = array_x[0] - array_x[0] ** 3
return array_dx
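For reference, a single step of the 4th-order Runge-Kutta scheme for this (autonomous) system could be written by hand as follows; this is only a sketch, since in practice we rely on scipy's solve_ivp with method='RK45' below:

def rk4_step(array_x: np.ndarray, delta: float) -> np.ndarray:
    """One hand-written RK4 step of size delta for the Duffing system (sketch)."""
    k1 = duffing(array_x)
    k2 = duffing(array_x + 0.5 * delta * k1)
    k3 = duffing(array_x + 0.5 * delta * k2)
    k4 = duffing(array_x + delta * k3)
    return array_x + (delta / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)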

t_max = 500 # Time-horizon integration


n_iter = 5000 # Number of time steps integration
n_initial_conditions = 60 # Number of initial conditions

dim_system = 2

# Generate initial conditions


matrix_x0 = (np.random.rand(n_initial_conditions, dim_system) - 0.5) * 4
array_t = np.linspace(0, t_max, n_iter)
array3d_xt = np.zeros((matrix_x0.shape[0], matrix_x0.shape[1], n_iter))

for i in tqdm(range(matrix_x0.shape[0])):
# Lambda function is used as solve_ivp requires a function of the form f(t, x)
ode_result = solve_ivp(lambda _t, array_x: duffing(array_x),
[0, t_max],
matrix_x0[i],
method='RK45',
t_eval=array_t)

array3d_xt[i, :] = ode_result.y

100%|██████████| 60/60 [00:17<00:00, 3.43it/s]

The following plot shows trajectories for different initial conditions:

fig = plt.figure(figsize=(20, 5))


ax = fig.add_subplot(131)
cm = plt.get_cmap("tab10")
https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 2/24
30/03/2025 05:13 TP5-1.ipynb - Colab

print(cm)
for i in range(10):
ax.plot(array3d_xt[i, 0, :], array3d_xt[i, 1, :], lw=0.5, color=cm(i))
ax.plot(array3d_xt[i, 0, 0], array3d_xt[i, 1, 0], 'o', lw=1.5, color=cm(i)) #initial
ax.set_xlabel('$x_1$', fontsize=20)
ax.set_ylabel('$x_2$', fontsize=20)
plt.show()

<matplotlib.colors.ListedColormap object at 0x789cb411f910>

The Koopman operator


Discrete in time case
Given the discrete non-linear dynamical system

$$x_{k+1} = F(x_k)$$

where $F$ might be the $\delta$-discretised flow map of the continuous dynamical system in eq (1), given by

$$x_{k+1} = F(x_k) := x_k + \int_{k}^{k+\delta} f(x(s))\,ds,$$

and $X = (x_k)_{k=0}^{N}$ the discrete time series of the system state.

The Koopman theory states that there exists an infinite-dimensional linear operator $K$ that advances in time all observable functions $(g_i)_{i=1}^{m}$, with $g_i : \mathbb{R}^n \to \mathbb{R}$:

https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 3/24
30/03/2025 05:13 TP5-1.ipynb - Colab

$$K g_i(x) = g_i \circ F(x)$$

This way, the non-linear dynamics of $x$, described by $F$, can be turned into a linear dynamical system, described by $K$, acting on another representation space, formed by the observable quantities $g_i(x)$.

Indeed, let $g_i$ be an observable function and denote $g_i^k := g_i(x_k)$; using the previous equation, the time evolution of the observables is given by

$$g_i^{k+1} = g_i(x_{k+1}) = g_i(F(x_k)) = g_i \circ F(x_k) = K g_i(x_k) = K g_i^k,$$

so the linearised dynamics of the observables is given by the following equation:

$$g_i^{k+1} = K g_i^k$$
It is then sufficient to find a function $g : \mathbb{R}^n \to \mathbb{R}^m$ with $m \gg n$ that embeds the state $x$ into a "large enough" $m$-dimensional space, such that the linear operator $K$ can be approximated by a matrix $K \in \mathbb{R}^{m \times m}$.

To project the dynamics back from the Koopman space ($\mathbb{R}^m$, where $g(x)$ lives) to the phase space ($\mathbb{R}^n$, where $x$ lives), a supplementary function $\varphi : \mathbb{R}^m \to \mathbb{R}^n$ is needed. Going from $x$ to the Koopman space and back yields $\varphi \circ g = \mathrm{Id}$.

Under this condition, the functions $g$, $\varphi$ and $K$ can be parametrized as $g_\theta$, $\varphi_\rho$ and $K_\phi$, and the parameters $\theta$, $\rho$ and $\phi$ can be learned by minimizing suitable loss functions.

For this purpose, given a time series $X = \{x_k \mid k = 1 \dots N\}$, the following conditions should hold:

1. Reconstruction error: $\|\varphi_\rho(g_\theta(x_k)) - x_k\| = 0$

2. Prediction error in Koopman space: $\|K_\phi\, g_\theta(x_k) - g_\theta(x_{k+1})\| = 0$

3. Prediction error in the phase space: $\|\varphi_\rho(K_\phi\, g_\theta(x_k)) - x_{k+1}\| = 0$

These three errors can be used as loss functions to train three different neural networks. These neural networks compose our architecture, which can be summarized by the following sketch:

[architecture sketch]

# Flatten the trajectories w.r.t. initial conditions


# and only keep data in the form of (dim_system, n_iter * n_initial_conditions)
matrix_x_data = array3d_xt[:, :, :-1].swapaxes(0, 1).reshape(2, -1).T
matrix_x_next_data = array3d_xt[:, :, 1:].swapaxes(0, 1).reshape(2, -1).T

(matrix_x_data_train,
matrix_x_data_test,
matrix_x_next_data_train,
matrix_x_next_data_test) = train_test_split(matrix_x_data,
https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 4/24
30/03/2025 05:13 TP5-1.ipynb - Colab

matrix_x_next_data,
test_size=0.2)

# Cast type to float32


matrix_x_data_train = matrix_x_data_train.astype(np.float32)
matrix_x_data_test = matrix_x_data_test.astype(np.float32)
matrix_x_next_data_train = matrix_x_next_data_train.astype(np.float32)
matrix_x_next_data_test = matrix_x_next_data_test.astype(np.float32)

print(matrix_x_data_train.shape,
matrix_x_data_test.shape,
matrix_x_next_data_train.shape,
matrix_x_next_data_test.shape)

(239952, 2) (59988, 2) (239952, 2) (59988, 2)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


print(device)

cuda

torch.set_default_dtype(torch.float32)
# torch.set_default_tensor_type('torch.DoubleTensor')

batch_size = 2000 # data per batch

tensor2d_x_data_train = torch.from_numpy(matrix_x_data_train).to(device)
tensor2d_x_next_data_train = torch.from_numpy(matrix_x_next_data_train).to(device)
tensor2d_x_data_test = torch.from_numpy(matrix_x_data_test).to(device)
tensor2d_x_next_data_test = torch.from_numpy(matrix_x_next_data_test).to(device)

torch_dataset_train = TensorDataset(tensor2d_x_data_train,
tensor2d_x_next_data_train)

torch_dataset_test = TensorDataset(tensor2d_x_data_test,
tensor2d_x_next_data_test)

train_dataloader = DataLoader(torch_dataset_train,
batch_size=batch_size,
shuffle=True)
test_dataloader = DataLoader(torch_dataset_test,
batch_size=batch_size,
shuffle=True) ######

# create the models


feature_dim = 2 # dimension of the Duffing oscillator
hidden_layer = 5 # number of hidden layers in g (ENCODER) and \varphi (DECODER)
output_dim = 30 # dimension in Koopman space

class Encoder(nn.Module):
def __init__(self, list_layer_dim: list):
super().__init__()
self.list_layer_dim = list_layer_dim
https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 5/24
30/03/2025 05:13 TP5-1.ipynb - Colab

self.list_FC = nn.ModuleList()
for i in range(len(self.list_layer_dim) - 1):
input_dim = self.list_layer_dim[i]
output_dim = self.list_layer_dim[i + 1]
self.list_FC.append(nn.Linear(input_dim, output_dim))

def forward(self, tensor2d_x):


for i in range(len(self.list_layer_dim) - 2):
tensor2d_x = F.elu(self.list_FC[i](tensor2d_x))
return self.list_FC[-1](tensor2d_x)

class Decoder(nn.Module):
def __init__(self, list_layer_dim: list):
super().__init__()
self.list_layer_dim = list_layer_dim
self.list_FC = nn.ModuleList()
for i in range(len(self.list_layer_dim) - 1, 0, -1):
input_dim = self.list_layer_dim[i]
output_dim = self.list_layer_dim[i - 1]
self.list_FC.append(nn.Linear(input_dim, output_dim))

def forward(self, tensor2d_x: torch.Tensor):


for i in range(len(self.list_layer_dim) - 2):
tensor2d_x = F.elu(self.list_FC[i](tensor2d_x))
return self.list_FC[-1](tensor2d_x)

class Autoencoder(nn.Module):
def __init__(self, feature_dim: int, hidden_layer: int, output_dim: int):
super().__init__()
list_layer_dim = \
[output_dim if i == hidden_layer
else feature_dim + i * (output_dim - feature_dim) // hidden_layer
for i in range(hidden_layer + 1)]
self.encoder = Encoder(list_layer_dim)
self.decoder = Decoder(list_layer_dim)

def forward(self, tensor2d_x: torch.Tensor):


        tensor2d_x = self.encoder(tensor2d_x)
        return self.decoder(tensor2d_x)

The Koopman operator $K$ (which is linear, and thus a matrix) must have a spectral radius $\rho(K) \le 1$. Such a condition provides a stable, or at least marginally stable, Koopman operator. To fulfill this requirement, we might leverage the Perron-Frobenius theorem.

The Perron-Frobenius theorem states: if $K$ is an $m \times m$ positive matrix, i.e. $k_{ij} > 0$ for $1 \le i, j \le m$, then the following inequality holds:

$$\min_i \sum_j k_{ij} \;\le\; \rho(K) \;\le\; \max_i \sum_j k_{ij}$$
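As a quick numerical illustration of this bound (a sketch, not part of the original notebook): for a random positive matrix, the spectral radius indeed lies between the smallest and the largest row sum, so dividing the matrix by slightly more than its largest row sum enforces $\rho(K) \le 1$, which is exactly the trick used in the initialization below.

# Sketch: check the Perron-Frobenius row-sum bounds on a random positive matrix
m = 5
matrix_K = np.random.rand(m, m) + 0.1  # strictly positive entries
array_row_sums = matrix_K.sum(axis=1)
spectral_radius = np.max(np.abs(np.linalg.eigvals(matrix_K)))
print(array_row_sums.min() <= spectral_radius <= array_row_sums.max())  # expected: True

matrix_K_scaled = matrix_K / (1.01 * array_row_sums.max())
print(np.max(np.abs(np.linalg.eigvals(matrix_K_scaled))) <= 1)          # expected: True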

https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 6/24
30/03/2025 05:13 TP5-1.ipynb - Colab

Question 1. : Complete the KoopmanOperator class to enforce $\rho(K) \le 1$, using the Perron-Frobenius theorem. Check that the initialization fulfills this property.

class KoopmanOperator(nn.Module):
def __init__(self, koopman_operator_dim: int):
super().__init__()
self.koopman_operator_dim = koopman_operator_dim

# TODO: Complete the KoopmanOperator class


K_init = torch.rand(koopman_operator_dim, koopman_operator_dim)
K_init = K_init / (1.01*torch.max(K_init.sum(dim=1)))
self.K = nn.Parameter(K_init)

def forward(self, tensor2d_x: torch.Tensor):


# First dimension of tensor2d_x is the batch size
if tensor2d_x.shape[1] != self.koopman_operator_dim:
sys.exit(f'Wrong Input Features. Please use tensor'
f' with {self.koopman_operator_dim} Input Features')
# TODO: Implement the forward pass
        return tensor2d_x @ self.K  # apply the learned Koopman matrix to the batch of observables

def check_spectral_radius(self):
# Compute eigenvalues of K
eigenvalues = torch.linalg.eigvals(self.K.data)
# Compute the spectral radius (maximum absolute value of eigenvalues)
spectral_radius = torch.max(torch.abs(eigenvalues)).item()
print(f"Spectral Radius: {spectral_radius}")
return spectral_radius

dim_observable = 10
koopman_operator = KoopmanOperator(dim_observable)

# TODO: Check the spectrum initialisation


koopman_operator.check_spectral_radius()

Spectral Radius: 0.6749839782714844


0.6749839782714844

autoencoder = Autoencoder(feature_dim, hidden_layer, output_dim).to(device)
koopman_operator = KoopmanOperator(output_dim).to(device)
print(autoencoder)

Autoencoder(
(encoder): Encoder(
(list_FC): ModuleList(
(0): Linear(in_features=2, out_features=7, bias=True)
(1): Linear(in_features=7, out_features=13, bias=True)
(2): Linear(in_features=13, out_features=18, bias=True)
(3): Linear(in_features=18, out_features=24, bias=True)
(4): Linear(in_features=24, out_features=30, bias=True)
)
)
https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 7/24
30/03/2025 05:13 TP5-1.ipynb - Colab
(decoder): Decoder(
(list_FC): ModuleList(
(0): Linear(in_features=30, out_features=24, bias=True)
(1): Linear(in_features=24, out_features=18, bias=True)
(2): Linear(in_features=18, out_features=13, bias=True)
(3): Linear(in_features=13, out_features=7, bias=True)
(4): Linear(in_features=7, out_features=2, bias=True)
)
)
)

learning_rate_autoencoder = 0.0001
learning_rate_koopman = 0.00001

optimiser_autoencoder = torch.optim.Adam(autoencoder.parameters(),
lr=learning_rate_autoencoder)

optimiser_koopman = torch.optim.Adam(koopman_operator.parameters(),
lr=learning_rate_koopman)

Question 2. : Define a function to compute the loss to be minimized. It should at least include
the 3 terms listed above:

Reconstruction error
Prediction error in the Koopman space
Prediction error in the phase space

Because the different objectives outlined by these losses may compete, training can be difficult. You may try different variations on these losses and comment on your findings. To improve the training process, one can for instance:

- Add a multiplicative factor in front of each loss component, to balance their importance; how are the scales of the different losses related?
- Refine the loss acting on the latent space by using a variational-autoencoder approach. This is similar to the Gaussian likelihood used in the first practical (TD1). We want the prediction in the latent space (i.e. the Koopman space) to follow a normal distribution $\mathcal{N}(0, 1)$; a corresponding loss for the latent space, penalizing deviations from zero mean and unit standard deviation, must thus be included.
- Freeze the gradients of one part of the network (for instance the encoder) for one specific objective, using the requires_grad property. For instance:

criterion = nn.MSELoss()
...
# Compute one part loss_l of the total loss
# First deactivate gradient computation for irrelevant parts of the architecture
for p in autoencoder.encoder.parameters():
    p.requires_grad = False

loss_l = criterion(pred, target)

# Restore the gradient computation
for p in autoencoder.encoder.parameters():
    p.requires_grad = True
...
total_loss = loss_1 + ... + loss_l + ...

Implement the loss function here


def loss(X_, Y_, X_recon, gX_, gY_, gY_pred, Y_pred):

# To Be Implemented
return total_loss

# TODO: Implement the loss function here


# HINT: See the training process below to identify the different components of the loss
def loss_vae(tensor2d_observable_next):
mean = tensor2d_observable_next.mean(dim=0)
std = tensor2d_observable_next.std(dim=0)
# we add small regularisation term in the log to avoid computational errors
kl_divergence = 0.5 * torch.sum(std**2 + mean**2 - 1 - torch.log(std**2 + 1e-9))

return kl_divergence

def loss_reconstruction(tensor2d_decoded_x, tensor2d_x):


return F.mse_loss(tensor2d_decoded_x, tensor2d_x)

def loss_koopman_pred(tensor2d_koopman_observable_next, tensor2d_observable_next):


return F.mse_loss(tensor2d_koopman_observable_next, tensor2d_observable_next)

def loss_phase_pred(tensor2d_predict_x_next, tensor2d_x_next):


return F.mse_loss(tensor2d_predict_x_next, tensor2d_x_next)

def loss_koopman(tensor2d_x: torch.Tensor,


tensor2d_x_next: torch.Tensor,
tensor2d_decoded_x: torch.Tensor,
tensor2d_observable_next: torch.Tensor,
tensor2d_koopman_observable_next: torch.Tensor,
tensor2d_predict_x_next: torch.Tensor):

# TODO: Implement the loss function here


    total_loss = (loss_reconstruction(tensor2d_decoded_x, tensor2d_x)
                  + loss_koopman_pred(tensor2d_koopman_observable_next, tensor2d_observable_next)
                  + loss_phase_pred(tensor2d_predict_x_next, tensor2d_x_next))
    return total_loss

https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 9/24
30/03/2025 05:13 TP5-1.ipynb - Colab

1. Reconstruction error: $\|\varphi_\rho(g_\theta(x_k)) - x_k\| = 0$

2. Prediction error in Koopman space: $\|K_\phi\, g_\theta(x_k) - g_\theta(x_{k+1})\| = 0$

3. Prediction error in the phase space: $\|\varphi_\rho(K_\phi\, g_\theta(x_k)) - x_{k+1}\| = 0$

Question 3.: The following cell executes the training loop. You can modify it in order to display the different intermediate losses computed in the loss function above. How do they evolve over the course of training? Justify your final choice.

alpha = 1.0 #weight_reconstruction


beta = 1.0 #weight_koopman_pred
gamma = 1.0 #weight_phase_pred
lamda = 0.01 #weight_kl_divergence

n_batch = len(train_dataloader)
n_batch_test = len(test_dataloader)
n_epoch = 100 # To be tuned

n_epochs_for_eval = 10

def plot_loss_evolution(train_losses, val_losses, loss_name, n_epochs, n_epochs_for_eval, weight):

    epochs_train = range(1, n_epochs + 1)
    epochs_val = range(n_epochs_for_eval, n_epochs + 1, n_epochs_for_eval)

    plt.figure(figsize=(8, 4))
    plt.plot(epochs_train, train_losses[loss_name], label='Train', marker='.', linestyle='-')
    plt.plot(epochs_val, val_losses[loss_name], label='Validation', marker='o', linestyle='--')
    plt.title(f'{loss_name.capitalize()} Loss Evolution (Weight: {weight})')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)
    plt.show()

def plot_combined_losses_with_total(train_losses, val_losses, n_epochs_for_eval, n_epochs, weights):

    for loss_name, weight in weights.items():
        if loss_name in train_losses:
            plot_loss_evolution(train_losses, val_losses, loss_name, n_epochs, n_epochs_for_eval, weight)

    epochs_train = range(1, n_epochs + 1)
    epochs_val = range(n_epochs_for_eval, n_epochs + 1, n_epochs_for_eval)

    total_train_loss = [sum(weights[loss_name] * train_losses[loss_name][i] for loss_name in weights)
                        for i in range(n_epochs)]
    total_val_loss = [sum(weights[loss_name] * val_losses[loss_name][i] for loss_name in weights)
                      for i in range(len(epochs_val))]

    plt.figure(figsize=(8, 4))
    plt.plot(epochs_train, total_train_loss, label='Train Total Loss', marker='.', linestyle='-')
    plt.plot(epochs_val, total_val_loss, label='Validation Total Loss', marker='o', linestyle='--')
    plt.title('Total Weighted Loss Evolution')
    plt.xlabel('Epoch')
    plt.ylabel('Total Loss')
    plt.legend()
    plt.grid(True)
    plt.show()

def train_and_evaluate(n_epoch=100, n_epochs_for_eval=10, weight_reconstruction=1.0,
                       weight_koopman_pred=1.0, weight_phase_pred=1.0, weight_kl_divergence=0.01):

    train_losses = {'reconstruction': [], 'koopman_pred': [], 'phase_pred': [], 'kl_divergence': []}
    val_losses = {'reconstruction': [], 'koopman_pred': [], 'phase_pred': [], 'kl_divergence': []}
    weights = {'reconstruction': alpha, 'koopman_pred': beta, 'phase_pred': gamma, 'kl_divergence': lamda}

for epoch in range(n_epoch):


autoencoder.train()
koopman_operator.train()
        epoch_losses = {'reconstruction': 0, 'koopman_pred': 0, 'phase_pred': 0, 'kl_divergence': 0}
# total_train_loss = 0
# total_loss1, total_loss2, total_loss3, total_loss4 = 0, 0, 0, 0
for tensor2d_batch_x, tensor2d_batch_x_next in train_dataloader:
tensor2d_batch_x = tensor2d_batch_x.to(device)
tensor2d_batch_x_next = tensor2d_batch_x_next.to(device)

optimiser_autoencoder.zero_grad()
optimiser_koopman.zero_grad()

tensor2d_observable = autoencoder.encoder(tensor2d_batch_x)
tensor2d_observable_next = autoencoder.encoder(tensor2d_batch_x_next)

tensor2d_decoded_x = autoencoder.decoder(tensor2d_observable)

tensor2d_koopman_observable_next = koopman_operator(tensor2d_observable)

tensor2d_predict_x_next = autoencoder.decoder(tensor2d_koopman_observable_next)

# tensor_loss_val = loss_koopman(tensor2d_batch_x,
# tensor2d_batch_x_next,
# tensor2d_decoded_x,
# tensor2d_observable_next,
# tensor2d_koopman_observable_next,
# tensor2d_predict_x_next)

loss_recon = loss_reconstruction(tensor2d_decoded_x, tensor2d_batch_x)


            loss_koop_pred = loss_koopman_pred(tensor2d_koopman_observable_next, tensor2d_observable_next)
loss_phase = loss_phase_pred(tensor2d_predict_x_next, tensor2d_batch_x_next)
loss_kl = loss_vae(tensor2d_observable_next)

            total_loss = (alpha * loss_recon + beta * loss_koop_pred + gamma * loss_phase + lamda * loss_kl)

total_loss.backward()
optimiser_autoencoder.step()
optimiser_koopman.step()

https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 11/24
30/03/2025 05:13 TP5-1.ipynb - Colab

epoch_losses['reconstruction'] += loss_recon.item()
epoch_losses['koopman_pred'] += loss_koop_pred.item()
epoch_losses['phase_pred'] += loss_phase.item()
epoch_losses['kl_divergence'] += loss_kl.item()

for loss_name, loss_total in epoch_losses.items():


train_losses[loss_name].append(loss_total / n_batch)

if (epoch + 1) % n_epochs_for_eval == 0 or epoch == n_epoch - 1:


            # always use .eval() and torch.no_grad() together, as .eval() only switches dropout
            # and batch-norm layers to evaluation mode while no_grad() disables gradient tracking
autoencoder.eval()
koopman_operator.eval()
with torch.no_grad():
                val_epoch_losses = {'reconstruction': 0, 'koopman_pred': 0, 'phase_pred': 0, 'kl_divergence': 0}
for tensor2d_batch_x, tensor2d_batch_x_next in test_dataloader:
                    tensor2d_batch_x, tensor2d_batch_x_next = tensor2d_batch_x.to(device), tensor2d_batch_x_next.to(device)

tensor2d_observable = autoencoder.encoder(tensor2d_batch_x)
tensor2d_observable_next = autoencoder.encoder(tensor2d_batch_x_next)
tensor2d_decoded_x = autoencoder.decoder(tensor2d_observable)
                    tensor2d_koopman_observable_next = koopman_operator(tensor2d_observable)
                    tensor2d_predict_x_next = autoencoder.decoder(tensor2d_koopman_observable_next)

                    val_epoch_losses['reconstruction'] += loss_reconstruction(tensor2d_decoded_x, tensor2d_batch_x).item()
                    val_epoch_losses['koopman_pred'] += loss_koopman_pred(tensor2d_koopman_observable_next, tensor2d_observable_next).item()
                    val_epoch_losses['phase_pred'] += loss_phase_pred(tensor2d_predict_x_next, tensor2d_batch_x_next).item()
                    val_epoch_losses['kl_divergence'] += loss_vae(tensor2d_observable_next).item()

for loss_name, loss_total in val_epoch_losses.items():


val_losses[loss_name].append(loss_total / n_batch_test)

print(f"Epoch {epoch+1}/{n_epoch} - Validation Losses: Reconstruction: {val

    plot_combined_losses_with_total(train_losses, val_losses, n_epochs_for_eval=10, n_epochs=n_epoch, weights=weights)

train_and_evaluate(weight_reconstruction=alpha, weight_koopman_pred=beta, weight_phase_pred=gamma, weight_kl_divergence=lamda)

https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 12/24
30/03/2025 05:13 TP5-1.ipynb - Colab

Epoch 10/100 - Validation Losses: Reconstruction: 0.1298, Koopman Pred: 0.1082, Ph


Epoch 20/100 - Validation Losses: Reconstruction: 0.0492, Koopman Pred: 0.0435, Ph
Epoch 30/100 - Validation Losses: Reconstruction: 0.0066, Koopman Pred: 0.0178, Ph
Epoch 40/100 - Validation Losses: Reconstruction: 0.0014, Koopman Pred: 0.0139, Ph
Epoch 50/100 - Validation Losses: Reconstruction: 0.0007, Koopman Pred: 0.0128, Ph
Epoch 60/100 - Validation Losses: Reconstruction: 0.0004, Koopman Pred: 0.0121, Ph
Epoch 70/100 - Validation Losses: Reconstruction: 0.0003, Koopman Pred: 0.0116, Ph
Epoch 80/100 - Validation Losses: Reconstruction: 0.0002, Koopman Pred: 0.0111, Ph
Epoch 90/100 - Validation Losses: Reconstruction: 0.0002, Koopman Pred: 0.0107, Ph
Epoch 100/100 - Validation Losses: Reconstruction: 0.0002, Koopman Pred: 0.0101, P

https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 13/24
30/03/2025 05:13 TP5-1.ipynb - Colab

https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 14/24
30/03/2025 05:13 TP5-1.ipynb - Colab

By investigating the above curves, we can see that the close match between training and validation losses across all four terms suggests that the model generalizes well, with no signs of overfitting. However, we will also test the performance for the case where the KL-divergence weight is set to 0.

lamda = 0.0
n_epoch = 100

autoencoder = Autoencoder(feature_dim, hidden_layer, output_dim).to(device)


koopman_operator = KoopmanOperator(output_dim).to(device)
optimiser_autoencoder = torch.optim.Adam(autoencoder.parameters(),
lr=learning_rate_autoencoder)
optimiser_koopman = torch.optim.Adam(koopman_operator.parameters(),
lr=learning_rate_koopman)
train_and_evaluate(weight_reconstruction=alpha, weight_koopman_pred=beta, weight_phase_pred=gamma, weight_kl_divergence=lamda)

https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 15/24
30/03/2025 05:13 TP5-1.ipynb - Colab

Epoch 10/100 - Validation Losses: Reconstruction: 0.0659, Koopman Pred: 0.0517, Ph


Epoch 20/100 - Validation Losses: Reconstruction: 0.0186, Koopman Pred: 0.0083, Ph
Epoch 30/100 - Validation Losses: Reconstruction: 0.0039, Koopman Pred: 0.0034, Ph
Epoch 40/100 - Validation Losses: Reconstruction: 0.0022, Koopman Pred: 0.0017, Ph
Epoch 50/100 - Validation Losses: Reconstruction: 0.0014, Koopman Pred: 0.0010, Ph
Epoch 60/100 - Validation Losses: Reconstruction: 0.0005, Koopman Pred: 0.0007, Ph
Epoch 70/100 - Validation Losses: Reconstruction: 0.0002, Koopman Pred: 0.0005, Ph
Epoch 80/100 - Validation Losses: Reconstruction: 0.0002, Koopman Pred: 0.0004, Ph
Epoch 90/100 - Validation Losses: Reconstruction: 0.0001, Koopman Pred: 0.0003, Ph
Epoch 100/100 - Validation Losses: Reconstruction: 0.0001, Koopman Pred: 0.0002, P

https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 16/24
30/03/2025 05:13 TP5-1.ipynb - Colab

https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 17/24
30/03/2025 05:13 TP5-1.ipynb - Colab

As we can notice, in both cases we do not see overfitting on the training set, since the validation loss and the training loss remain close. However, when we remove the KL-divergence term from the loss, we obtain slightly better performance: the validation losses move from Reconstruction: 0.0002, Koopman Pred: 0.0101, Phase Pred: 0.0002 to Reconstruction: 0.0001, Koopman Pred: 0.0002, Phase Pred: 0.0001. This is partly because optimizing 3 losses is easier than optimizing 4. Furthermore, by removing the regularization term we no longer enforce a latent structure close to that of a normal distribution. Thus, using the regularization remains a better guarantee of good generalization than not using it, even though the raw validation losses are slightly higher.

Verification

Question 4. : We want to ensure the Koopman operator is stable. This can be verified by checking whether its spectral radius satisfies $\rho(K) \le 1$. Plot the eigenvalues of the Koopman operator in order to verify the bound on its spectral radius. You can use the numpy.linalg.eig function to retrieve the eigenvalues of a matrix.

# TODO: Check Koopman stability and plot the eigenvalues of the Koopman operator
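A possible way to answer this question (a sketch, assuming the trained koopman_operator from the cells above is in scope): extract the matrix K, compute its eigenvalues with numpy.linalg.eig, and plot them against the unit circle.

matrix_K = koopman_operator.K.detach().cpu().numpy()
array_eigenvalues, _ = np.linalg.eig(matrix_K)

fig, ax = plt.subplots(figsize=(5, 5))
theta = np.linspace(0, 2 * np.pi, 200)
ax.plot(np.cos(theta), np.sin(theta), 'k--', lw=1, label='unit circle')
ax.scatter(array_eigenvalues.real, array_eigenvalues.imag, label='eigenvalues of K')
ax.set_xlabel(r'$\Re(\lambda)$')
ax.set_ylabel(r'$\Im(\lambda)$')
ax.set_aspect('equal')
ax.legend()
plt.show()

print('Spectral radius:', np.max(np.abs(array_eigenvalues)))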

n_grid = 30
x1_min, x1_max = -2, 2
x2_min, x2_max = -2, 2

array_x1 = np.linspace(x1_min, x1_max, n_grid, dtype=np.float32)


array_x2 = np.linspace(x2_min, x2_max, n_grid, dtype=np.float32)
matrix_grid_x1, matrix_grid_x2 = np.meshgrid(array_x1, array_x2)

array3d_dynamics = np.zeros((n_grid, n_grid, 2), dtype=np.float32)

for i in range(n_grid):
for j in range(n_grid):
x1 = matrix_grid_x1[i, j]
x2 = matrix_grid_x2[i, j]
array3d_dynamics[i, j, :] = duffing(np.array([x1, x2]))

# Set evaluation mode


autoencoder.eval()
koopman_operator.eval()

array3d_dynamics_pred = np.zeros((n_grid, n_grid, 2), dtype=np.float32)

for i in range(n_grid):
for j in range(n_grid):
x1 = matrix_grid_x1[i, j]
        x2 = matrix_grid_x2[i, j]
tensor2d_x = torch.tensor([[x1, x2]], dtype=torch.float32).to(device)
tensor2d_observable = autoencoder.encoder(tensor2d_x)
tensor2d_koopman_observable_next = koopman_operator(tensor2d_observable)
        tensor2d_predict_x_next = autoencoder.decoder(tensor2d_koopman_observable_next)
array_x_next = tensor2d_predict_x_next.cpu().detach().numpy().ravel()

        # Here we compute a discretised version of the derivative thanks to the Koopman operator
        # and the learned encoder/decoder:
        # (x_{k+1} - x_k) / \delta_t = f(x_k) is approximated by (f is duffing here)
        # (Decod(K(Encod(x_k))) - x_k) / \delta_t

        delta_time = (t_max / n_iter)

        array3d_dynamics_pred[i, j, :] = (array_x_next - [x1, x2]) / delta_time

fig = plt.figure(figsize=(15, 5))


ax = fig.add_subplot(131)
ax.quiver(matrix_grid_x1,
matrix_grid_x2,
array3d_dynamics[:, :, 0],
array3d_dynamics[:, :, 1], scale=10)
ax.set_title('True')
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')

ax = fig.add_subplot(132)
ax.quiver(matrix_grid_x1,
matrix_grid_x2,
array3d_dynamics_pred[:, :, 0],
array3d_dynamics_pred[:, :, 1], scale=10)

ax.set_title('Prediction')
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')

# Compute the error


matrix_error = np.linalg.norm(array3d_dynamics - array3d_dynamics_pred, axis=2)
matrix_error_log = np.log10(matrix_error + 1e-10)

ax = fig.add_subplot(133)
cp = ax.contourf(matrix_grid_x1,
matrix_grid_x2,
matrix_error_log)

fig.colorbar(cp)
ax.set_title('Error in log scale')
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
plt.show()

Continuous in time case


https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 19/24
30/03/2025 05:13 TP5-1.ipynb - Colab

Considering $x_k$ as the observation of the state at time $t = k\delta$, and $x_{k+1}$ the state at time $t + \delta$, for $\delta \to 0$ it is also possible to define the continuous-time infinitesimal generator of the Koopman operator family as

$$L g(x_k) = \lim_{\delta \to 0} \frac{K g(x_k) - g(x_k)}{\delta} = \frac{g \circ F(x_k) - g(x_k)}{\delta}.$$

The previous expression defines the Lie derivative, and for this reason $L$ is known as the Lie operator. $L$ describes the continuous dynamics of the observables in the Koopman space:

$$\dot{g}(x) = L g(x).$$

The latter can be further expressed as:

$$\dot{g}(x(t)) = \frac{dg(x)}{dt} = \nabla_x g \cdot \frac{dx}{dt} = \nabla_x g \cdot f(x) = L g(x).$$

Given three parameterized functions $g_\theta$, $\varphi_\rho$ and $L_\phi$, the following conditions hold:

1. Reconstruction error: $\|\varphi_\rho(g_\theta(x)) - x\| = 0$

2. Prediction error in Koopman space: $\|L_\phi\, g_\theta(x) - \nabla g_\theta \cdot f(x)\| = 0$

3. Prediction error in the phase space: $\|\varphi_\rho(L_\phi\, g_\theta(x)) - f(x)\| = 0$

Important remark: as long as the system $f$ is known, the three errors can be computed without data belonging to trajectories.
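In practice, the term $\nabla g_\theta(x) \cdot f(x)$ appearing in the second error can be computed with a Jacobian-vector product. A minimal sketch (the linear layer here is only a stand-in for the encoder $g_\theta$; the same autograd.functional.jvp call is used in the training loop further down):

g_toy = nn.Linear(2, 30)                         # stand-in for the encoder g_theta
tensor2d_x = torch.randn(4, 2)                   # a small batch of states x
tensor2d_fx = torch.from_numpy(
    np.stack([duffing(x) for x in tensor2d_x.numpy()]).astype(np.float32))  # f(x) for each state

# jvp returns (g(x), \nabla_x g(x) . f(x)) evaluated for each sample of the batch
tensor2d_gx, tensor2d_jvp = autograd.functional.jvp(g_toy, tensor2d_x, tensor2d_fx)
print(tensor2d_gx.shape, tensor2d_jvp.shape)     # both torch.Size([4, 30])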

# Create a dataset for continuous Koopman


# with the same number of points as in the discrete-time Koopman case.
# But here there is no need to have continuous trajectories.
matrix_x0 = (np.random.rand(n_initial_conditions * (n_iter - 1), 2) - 0.5) * 4
matrix_system_derivative_data = np.zeros(matrix_x0.shape)
for i in tqdm(range(matrix_x0.shape[0])):
matrix_system_derivative_data[i, :] = duffing(matrix_x0[i, :])

fig = plt.figure(figsize=(5, 5))


ax = fig.add_subplot(111)
ax.quiver(matrix_x0[::50, 0],
matrix_x0[::50, 1],
matrix_system_derivative_data[::50, 0] * 0.2,
matrix_system_derivative_data[::50, 1] * 0.2, scale=10)
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
plt.show()

# create the models


feature_dim = 2 # dimension of the Duffing oscillator
hidden_layer = 5 # number of hidden layers in g (ENCODER) and \varphi (DECODER)
output_dim = 30 # dimension in Koopman space
batch_size = 2000  # data per batch

(matrix_x_data_train,
matrix_x_data_test,
matrix_x_next_data_train,
matrix_x_next_data_test) = train_test_split(matrix_x0,
matrix_system_derivative_data,
test_size=0.2)

# Cast type to float32


matrix_x_data_train = matrix_x_data_train.astype(np.float32)
matrix_x_data_test = matrix_x_data_test.astype(np.float32)
matrix_x_next_data_train = matrix_x_next_data_train.astype(np.float32)
matrix_x_next_data_test = matrix_x_next_data_test.astype(np.float32)

print(matrix_x_data_train.shape,
matrix_x_data_test.shape,
matrix_x_next_data_train.shape,
matrix_x_next_data_test.shape)

torch_dataset_train = TensorDataset(torch.from_numpy(matrix_x_data_train),
torch.from_numpy(matrix_x_next_data_train))
torch_dataset_test = TensorDataset(torch.from_numpy(matrix_x_data_test),
torch.from_numpy(matrix_x_next_data_test))

train_dataloader = DataLoader(torch_dataset_train,
batch_size=batch_size,
shuffle=True)
test_dataloader = DataLoader(torch_dataset_test,
batch_size=batch_size,
shuffle=True)

class Encoder(nn.Module):
def __init__(self, list_layer_dim):
super().__init__()
self.list_layer_dim = list_layer_dim
self.list_FC = nn.ModuleList()
for i in range(len(self.list_layer_dim) - 1):
dim_input = self.list_layer_dim[i]
dim_output = self.list_layer_dim[i + 1]
self.list_FC.append(nn.Linear(dim_input, dim_output))

def forward(self, tensor2d_x):


for i in range(len(self.list_layer_dim) - 2):
tensor2d_x = F.elu(self.list_FC[i](tensor2d_x))
return self.list_FC[-1](tensor2d_x)

class Decoder(nn.Module):
def __init__(self, list_layer_dim):
super().__init__()
self.list_layer_dim = list_layer_dim
self.list_FC = nn.ModuleList()
for i in range(len(self.list_layer_dim) - 1, 0, -1):

https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 21/24
30/03/2025 05:13 TP5-1.ipynb - Colab

dim_input = self.list_layer_dim[i]
dim_output = self.list_layer_dim[i - 1]
self.list_FC.append(nn.Linear(dim_input, dim_output))

def forward(self, tensor2d_x):


for i in range(len(self.list_layer_dim) - 2):
tensor2d_x = F.elu(self.list_FC[i](tensor2d_x))
return self.list_FC[-1](tensor2d_x)

class Autoencoder(nn.Module):
    def __init__(self, feature_dim, hidden_layer, output_dim):
super().__init__()

        list_layer_dim = \
            [output_dim if i == hidden_layer
             else feature_dim + i * (output_dim - feature_dim) // hidden_layer
             for i in range(hidden_layer + 1)]

self.encoder = Encoder(list_layer_dim)
self.decoder = Decoder(list_layer_dim)

def forward(self, tensor2d_x: torch.Tensor):


tensor2d_x = self.encoder(tensor2d_x)
return self.decoder(tensor2d_x)

The Lie operator must be defined such that it is always stable by construction. To do that, we consider a matrix of parameters $\Psi \in \mathbb{R}^{m \times m}$ and a vector of parameters $\Gamma \in \mathbb{R}^{m}$. The resulting Lie operator will be of the form:

$$L = (\Psi - \Psi^T) - \mathrm{diag}(|\Gamma|)$$

with eigenvalues whose real part satisfies $\Re(\lambda) \le 0$. See https://math.stackexchange.com/questions/952233/eigenvalues-of-the-sum-of-a-diagonal-matrix-and-a-skew-symmetric-matrix for the mathematical proof (identify the matrix). Moreover, if $\lambda \in \mathbb{C}$ is an eigenvalue of $L$, it turns out that its real part satisfies $\Re(\lambda) \propto \|\Gamma\|$, i.e. it only depends on $\Gamma$.

Remark: $-\mathrm{diag}(|\Gamma|)$ is always a diagonal matrix with non-positive elements.
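A quick numerical sanity check of this construction (a sketch, not in the original notebook): the eigenvalues of a matrix built this way have non-positive real part.

# Sketch: eigenvalues of (Psi - Psi^T) - diag(|Gamma|) have non-positive real part
m = 30
matrix_psi = np.random.randn(m, m)
array_gamma = np.random.randn(m)
matrix_L = (matrix_psi - matrix_psi.T) - np.diag(np.abs(array_gamma))
print(np.max(np.linalg.eigvals(matrix_L).real) <= 1e-10)  # expected: True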

Question 4. : As you did for the discrete case, you now have to implement the LieModule module. It should have the form indicated above to guarantee $\Re(\lambda) \le 0$. Check that the initialization fulfills this property.

class LieModule(nn.Module):
def __init__(self, lie_operator_dim: int):
super().__init__()
self.lie_operator_dim = lie_operator_dim
# TODO: Complete function

https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 22/24
30/03/2025 05:13 TP5-1.ipynb - Colab

def forward(self, tensor2d_x: torch.Tensor):


        if tensor2d_x.shape[1] != self.lie_operator_dim:
            sys.exit(f'Wrong Input Features. Please use tensor'
                     f' with {self.lie_operator_dim} Input Features')

# TODO: Implement forward
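One possible completion of this module is sketched below (it mirrors the KoopmanOperator above and is named LieModuleSketch so as not to overwrite the skeleton; the 0.01 initialisation scale is an arbitrary choice):

class LieModuleSketch(nn.Module):
    """A possible stable Lie operator, L = (Psi - Psi^T) - diag(|Gamma|) (sketch)."""
    def __init__(self, lie_operator_dim: int):
        super().__init__()
        self.lie_operator_dim = lie_operator_dim
        # Psi is a full matrix of parameters, Gamma a vector of parameters
        self.Psi = nn.Parameter(0.01 * torch.randn(lie_operator_dim, lie_operator_dim))
        self.Gamma = nn.Parameter(0.01 * torch.randn(lie_operator_dim))

    def lie_matrix(self):
        # Skew-symmetric part plus a non-positive diagonal: Re(lambda) <= 0 by construction
        return (self.Psi - self.Psi.T) - torch.diag(torch.abs(self.Gamma))

    def forward(self, tensor2d_x: torch.Tensor):
        if tensor2d_x.shape[1] != self.lie_operator_dim:
            sys.exit(f'Wrong Input Features. Please use tensor'
                     f' with {self.lie_operator_dim} Input Features')
        return tensor2d_x @ self.lie_matrix().T

    def check_eigenvalues(self):
        eigenvalues = torch.linalg.eigvals(self.lie_matrix().detach())
        max_real_part = torch.max(eigenvalues.real).item()
        print(f"Max real part of the eigenvalues: {max_real_part}")
        return max_real_part

At initialization, check_eigenvalues() should report a maximum real part that is non-positive (up to numerical precision).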

autoencoder = Autoencoder(feature_dim, hidden_layer, output_dim).to(device)
lie_operator = LieModule(output_dim).to(device)
print(autoencoder)

Some tricks are needed to train. If the autoencoder and the Lie model are learned at the same
speed, the training turns out to be highly unstable since the three loss functions have moving
targets. For this reason, the Lie learning rate has been chosen smaller than the autoencoder
one.

learning_rate_autoencoder = 0.0001
learning_rate_lie = 0.00001

optimiser_autoencoder = torch.optim.Adam(autoencoder.parameters(),
lr=learning_rate_autoencoder,
weight_decay=1e-3)
optimiser_lie = torch.optim.Adam(lie_operator.parameters(),
lr=learning_rate_lie,
weight_decay=1e-3)

A further loss is considered to stabilize the learning stage. The state $x$ belongs to a compact set, since it is the solution of a dissipative dynamical system. This is not true for $g(x)$ (we would need to choose appropriate activation functions to obtain appropriate Lipschitz guarantees). To avoid discrepancies in the magnitudes of the $g_i(x)$, a regularization loss is added, enforcing

$$\mu = \frac{1}{m} \sum_{i=1}^{m} g_i(x) = 0 \quad \text{and} \quad \sigma = \left(\frac{1}{m} \sum_{i=1}^{m} \big(g_i(x) - \mu\big)^2\right)^{1/2} = 1,$$

inspired by VAEs.

For the training to be smooth, the encoder parameters are not affected by the prediction loss in
phase space. This is based on an empirical observation and is motivated by the fact that the
encoder appears in the three losses and plays a competitive role against the decoder and the
Lie model. This should not affect the results since the encoder remains coupled with the
decoder in the reconstruction loss and with the Lie operator in the prediction loss in Koopman
space.

Question 5. : Implement the loss function similarly to what you did for Question 2. Note that here you should use the dynamics $f$ and its values for a set of points belonging to the domain $[-2, 2]^2$, while no data from proper trajectories are needed.
https://colab.research.google.com/drive/1gp5p4hXFuHAxodjDtOLC_v7iI02kncy6#scrollTo=2e3dedc5 23/24
30/03/2025 05:13 TP5-1.ipynb - Colab

# Implement the loss function here


# See the training process below to identify the different components of the loss
def loss(tensor2d_x: torch.Tensor,
tensor2d_x_next: torch.Tensor,
tensor2d_decoded_x: torch.Tensor,
tensor2d_observable: torch.Tensor,
tensor2d_lie_observable_next: torch.Tensor,
tensor2d_predict_x_next: torch.Tensor,
tensor2d_jvp: torch.Tensor):

# TODO: Implement loss


return total_loss
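One possible implementation is sketched below (the unit weights and the latent-regularization term are choices, not the notebook's official solution; here tensor2d_x_next actually stores f(x), and tensor2d_jvp stores the Jacobian-vector product $\nabla g_\theta(x) \cdot f(x)$ computed in the training loop below):

def loss_continuous_sketch(tensor2d_x, tensor2d_x_next, tensor2d_decoded_x,
                           tensor2d_observable, tensor2d_lie_observable_next,
                           tensor2d_predict_x_next, tensor2d_jvp):
    # 1. Reconstruction error: || phi(g(x)) - x ||
    loss_recon = F.mse_loss(tensor2d_decoded_x, tensor2d_x)
    # 2. Prediction error in Koopman space: || L g(x) - grad(g).f(x) ||
    loss_koop = F.mse_loss(tensor2d_lie_observable_next, tensor2d_jvp)
    # 3. Prediction error in the phase space: || phi(L g(x)) - f(x) ||
    #    (optionally freeze the encoder for this term, as discussed above)
    loss_phase = F.mse_loss(tensor2d_predict_x_next, tensor2d_x_next)
    # Latent regularization: push each g(x) towards zero mean and unit std over its m components
    mean = tensor2d_observable.mean(dim=1)
    std = tensor2d_observable.std(dim=1)
    loss_reg = torch.mean(mean ** 2) + torch.mean((std - 1.0) ** 2)
    return loss_recon + loss_koop + loss_phase + 0.01 * loss_reg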

Since trajectories are not needed, random states can be sampled from the system manifold $x_1 \in [-2, 2]$, $x_2 \in [-2, 2]$.

n_batch = len(train_dataloader)
n_epoch = 5 # To be tuned

for epoch in range(n_epoch):


autoencoder.train()
lie_operator.train()
total_train_loss = 0
total_loss1, total_loss2, total_loss3, total_loss4 = 0, 0, 0, 0
for tensor2d_batch_x, tensor2d_batch_x_next in train_dataloader:
tensor2d_batch_x = tensor2d_batch_x.to(device)
tensor2d_batch_x_next = tensor2d_batch_x_next.to(device)

optimiser_autoencoder.zero_grad()
optimiser_lie.zero_grad()

# dgX = lie_operator * gX
# jvp = \nabla_x g (x) * f(x) (jvp: jacobian vector product)
(tensor2d_observable, tensor2d_jvp) = \
autograd.functional.jvp(autoencoder.encoder,
tensor2d_batch_x,
tensor2d_batch_x_next,
create_graph=True)

tensor2d_decoded_x = autoencoder.decoder(tensor2d_observable)

tensor2d_lie_observable_next = lie_operator(tensor2d_observable)
tensor2d_predict_x_next = autoencoder.decoder(tensor2d_lie_observable_next)

tensor_loss_val = \
loss(tensor2d_x=tensor2d_batch_x,
tensor2d_x_next=tensor2d_batch_x_next,
tensor2d_decoded_x=tensor2d_decoded_x,
tensor2d_observable=tensor2d_observable,
                 tensor2d_lie_observable_next=tensor2d_lie_observable_next,
                 tensor2d_predict_x_next=tensor2d_predict_x_next,
                 tensor2d_jvp=tensor2d_jvp)

        tensor_loss_val.backward()
        optimiser_autoencoder.step()
        optimiser_lie.step()