Physics-informed diffusion models in spectral space

Davide Gallon¹,², Philippe von Wurstemberger³, Patrick Cheridito⁴, and Arnulf Jentzen⁵,⁶
¹Applied Mathematics: Institute for Analysis and Numerics,
University of Münster, Germany, e-mail: davide.gallon@uni-muenster.de
²Department of Mathematics and RiskLab, ETH Zurich,
Switzerland, e-mail: dgallon@ethz.ch
³School of Data Science, The Chinese University of Hong Kong,
Shenzhen (CUHK-Shenzhen), China, e-mail: philippevw@cuhk.edu.cn
⁴Department of Mathematics and RiskLab, ETH Zurich,
Switzerland, e-mail: patrickc@ethz.ch
⁵School of Data Science and School of Artificial Intelligence,
The Chinese University of Hong Kong, Shenzhen
(CUHK-Shenzhen), China, e-mail: ajentzen@cuhk.edu.cn
⁶Applied Mathematics: Institute for Analysis and Numerics,
University of Münster, Germany, e-mail: ajentzen@uni-muenster.de

Abstract

We propose a methodology that combines generative latent diffusion models with physics-informed machine learning to generate solutions of parametric partial differential equations (PDEs) conditioned on partial observations, which includes, in particular, forward and inverse PDE problems. We learn the joint distribution of PDE parameters and solutions via a diffusion process in a latent space of scaled spectral representations, where Gaussian noise corresponds to functions with controlled regularity. This spectral formulation enables significant dimensionality reduction compared to grid-based diffusion models and ensures that the induced process in function space remains within a class of functions for which the PDE operators are well defined. Building on diffusion posterior sampling, we enforce physics-informed constraints and measurement conditions during inference, applying Adam-based updates at each diffusion step. We evaluate the proposed approach on Poisson, Helmholtz, and incompressible Navier–Stokes equations, demonstrating improved accuracy and computational efficiency compared with existing diffusion-based PDE solvers, which are state of the art for sparse observations. Code is available at https://github.com/deeplearningmethods/PISD.

1 Introduction

Deep learning approaches for PDEs have progressed toward increasingly general representations of solution spaces: from physics-informed neural networks (PINNs) [29, 33], which approximate individual PDE instances via residual-based objectives, to neural operators [1, 16, 22, 18, 19], which learn solution maps for families of parametric PDEs, and more recently to generative models that learn distributions over PDE parameters and solutions [10, 4]. Through conditional sampling, the generative perspective naturally supports forward and inverse problems as well as reconstruction from sparse or noisy measurements, which are settings that are often ill-posed for classical solvers.

In this work, we develop such a generative framework, which we term physics-informed spectral diffusion (PISD). Our approach represents functions via scaled spectral coefficients and trains a diffusion model [8, 34, 12] in the resulting finite-dimensional latent space [30, 35]. The scaling is obtained from the data distribution and ensures that the diffusion process induced in function space remains within a class of functions for which the underlying PDE operators are well defined. In particular, we show that if the data distribution satisfies a Sobolev regularity condition, then the induced diffusion process in function space preserves this regularity. At inference time, we enforce physics-informed constraints and measurement conditions using a variant of diffusion posterior sampling (DPS) [3] with Adam-based updates [14]. We present a schematic of the PISD method in Figure 1 and describe it in detail in Section 3.

Figure 1: Overview of physics-informed spectral diffusion (PISD): The model learns to generate function-valued solutions by performing a diffusion process in a latent space of scaled spectral function representations. At inference, PDE and observation constraints are enforced via Adam-based guidance.

Contributions.

1. We introduce PISD, a generative framework that learns joint distributions of PDE parameters and solutions using diffusion models in a scaled spectral latent space with physics-informed guidance at inference.
2. We establish that appropriate scaling of spectral coefficients induces a diffusion process in function space with controlled Sobolev regularity, ensuring that PDE operators are well defined throughout the generative process.
3. We demonstrate that the spectral formulation enables substantial dimensionality reduction compared to grid-based diffusion models, leading to significant computational speedups.
4. We extend DPS with Adam-based inference-time updates and show improvements over standard gradient-based guidance.

We test our method on Poisson, Helmholtz, and incompressible Navier–Stokes equations. In our experiments, the PISD method reduces inference time by a factor of 3 to 15 relative to state-of-the-art baselines, while matching or improving accuracy by up to a factor of 10.

2 Related Work

Physics-informed diffusion models.

Several recent works combine diffusion models with physics-informed constraints. DiffusionPDE [10] and CoCoGen [11] enforce PDE constraints via residual gradients during sampling, but operate on grid-based discretizations with finite differences. In the infinite-resolution limit, the PDE residual becomes ill-defined since the Gaussian noise considered in the diffusion process converges to spatial white noise. Consequently, these methods apply PDE guidance only near the end of the reverse process when the function is already somewhat regular. We compare against DiffusionPDE in Section 4 and show that PISD achieves significantly lower PDE residuals. Physics-informed diffusion models in [2] similarly use grid-based finite differences but enforce PDE constraints during training rather than inference. Physics-informed diffusion models in [31] address flow field reconstruction with a method closely related to DiffusionPDE. Pi-Fusion [28] also uses a physics-informed guidance term during inference, but only considers forward problems. FunDiff [36] operates in a learned latent space with a continuous vision transformer (CViT) decoder that can be differentiated at arbitrary locations. However, PDE constraints are enforced only during encoder training, not at inference.

Diffusion models for PDEs.

Other approaches use diffusion models to generate PDE solutions without physics-informed losses. FunDPS [38, 23] extends DPS to function-space-valued diffusion processes, providing a rigorous theoretical foundation for the solution of inverse problems with diffusion models in function spaces. Several methods incorporate neural operators into the denoiser architecture [25, 9, 37]. The wavelet diffusion neural operator of Hu et al. [9] considers diffusions in the space of wavelet coefficients and is thus related to our spectral approach but targets functions with abrupt changes rather than ensuring smoothness. Additional diffusion-based PDE methods that do not employ physics-informed losses or function-space formulations include [17, 32].

Diffusion models in infinite-dimensional function spaces.

A growing body of work extends diffusion models to infinite-dimensional function spaces. Spectral diffusion processes [26] formulate diffusions in the space of spectral coefficients, similar to our approach but without the data-dependent scaling that ensures regularity throughout the diffusion process. [13] generalizes discrete-time diffusion models [8] to infinite-dimensional function spaces, explicitly considering diffusion processes over functions with prescribed Sobolev regularity. [27, 7, 21, 6, 20] develop continuous-time diffusion models in infinite-dimensional function spaces, with particular attention to consistency across discretization levels. [24] generalizes the probability-flow ODE to infinite-dimensional function spaces.

3 Method

3.1 Problem setting

We consider an abstract PDE problem given by a PDE residual functional $\mathcal{L}_{\mathrm{PDE}}\colon\mathcal{F}\to[0,\infty)$ defined on a Hilbert space $\mathcal{F}$ . Our goal is to generate samples $f\in\mathcal{F}$ satisfying $\mathcal{L}_{\mathrm{PDE}}(f)=0$ , typically conditioned on additional constraints such as boundary conditions or sparse measurements. To this end, we assume a prior distribution $\nu\colon\mathcal{B}(\mathcal{F})\to[0,1]$ supported on PDE solutions, meaning that

\begin{split}\nu(\{f\in\mathcal{F}\colon\mathcal{L}_{\mathrm{PDE}}(f)=0\})=1.\end{split}

(1)

In addition, we denote by $X_{\mathrm{data}}\in\mathcal{F}$ a random variable which is distributed according to $\nu$ .

Example 3.1 (Poisson equation).

To fix ideas, we briefly specify the problem setting in the case of a 2-dimensional Poisson equation. For this, let $\mathcal{U}=H^{2}_{0}([0,1]^{2})$ , $\mathcal{A}=L^{2}([0,1]^{2})$ , let $O\colon\mathcal{A}\to\mathcal{U}$ be the solution operator which assigns to all $a\in\mathcal{A}$ the solution $u\in\mathcal{U}$ of the Poisson equation

\Delta u=a\quad\text{on}\quad[0,1]^{2}

(2)

with zero boundary conditions, and let $A_{\mathrm{data}}\in\mathcal{A}$ be a random variable. We then choose $\mathcal{F}=\mathcal{U}\times\mathcal{A}$ , $X_{\mathrm{data}}=(O(A_{\mathrm{data}}),A_{\mathrm{data}})$ and define for all $(u,a)\in\mathcal{F}$ that

\mathcal{L}_{\mathrm{PDE}}((u,a))=\|\Delta u-a\|^{2}_{L^{2}}.

(3)

3.2 Diffusion model in spectral space

To approximately generate function samples from $\nu$ , we propose to use a diffusion model over a finite-dimensional spectral encoding $\mathcal{E}\colon\mathcal{F}\to\mathbb{R}^{l}$ of functions in $\mathcal{F}$ . In all our experiments, the function $\mathcal{E}$ corresponds to a suitably truncated and normalized Fourier transform (cf. Section 3.4 below). We denote by $\mathcal{I}\colon\mathbb{R}^{l}\to\mathcal{F}$ the inverse transform of $\mathcal{E}$ for which we have for all $f\in\mathcal{F}$ that

\mathcal{I}(\mathcal{E}(f))\approx f.

(4)

Following [12], we train a denoiser $D\colon\mathbb{R}^{\mathfrak{d}}\times\mathbb{R}^{l}\times(0,\sigma_{\mathrm{max}}]\to\mathbb{R}^{l}$ with $\mathfrak{d}$ trainable parameters to approximate for all noise levels $\sigma\in(0,\sigma_{\mathrm{max}}]$ that

\begin{split}D_{\theta^{\ast}}(\hat{X}_{\mathrm{data}}+\sigma N,\sigma)\approx\hat{X}_{\mathrm{data}},\end{split}

(5)

where $\hat{X}_{\mathrm{data}}=\mathcal{E}(X_{\mathrm{data}})$ is the spectral encoding of the data $X_{\mathrm{data}}$ and $N\sim\mathcal{N}(0,I_{l})$ is a Gaussian random variable independent of $X_{\mathrm{data}}$ . For a suitable noise schedule $\sigma\colon[0,T]\to(0,\sigma_{\mathrm{max}}]$ , we then expect reverse-time solutions of the ODE

\begin{split}\mathrm{d}\hat{x}_{t}=-\dot{\sigma_{t}}(\sigma_{t})^{-1}(D_{\theta^{\ast}}(\hat{x}_{t},\sigma_{t})-\hat{x}_{t})\,\mathrm{d}t\end{split}

(6)

for $t\in[0,T]$ with $\hat{x}_{T}\sim\mathcal{N}(0,\sigma_{\mathrm{max}}^{2}I_{l})$ to gradually remove noise so that the initial value $\hat{x}_{0}\overset{d}{\approx}\hat{X}_{\mathrm{data}}$ is approximately distributed like the spectral encoding of the data $\hat{X}_{\mathrm{data}}$ . With (4) it thus follows that $\mathcal{I}(\hat{x}_{0})$ is approximately distributed according to $\nu$ .
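To make the reverse-time ODE in (6) concrete, the following minimal sketch integrates it with Euler steps in the noise-level variable, where (6) reads $\mathrm{d}\hat{x}/\mathrm{d}\sigma=(\hat{x}-D(\hat{x},\sigma))/\sigma$. It is an illustration under stated assumptions rather than the implementation used in the experiments: the trained denoiser is replaced by the closed-form denoiser for standard Gaussian data, and the geometric noise schedule, the dimension, and all function names are hypothetical.

```python
import numpy as np

def gaussian_denoiser(x, sigma):
    """Exact denoiser E[X | X + sigma * N = x] for toy data X ~ N(0, I)."""
    return x / (1.0 + sigma**2)

def euler_sampler(denoiser, sigmas, dim, num_samples, rng):
    """Euler integration of (6) from sigmas[0] (largest) down to sigmas[-1]."""
    x = sigmas[0] * rng.standard_normal((num_samples, dim))
    for sigma_cur, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # In the noise-level variable the ODE is dx/dsigma = (x - D(x, sigma)) / sigma.
        drift = (x - denoiser(x, sigma_cur)) / sigma_cur
        x = x + drift * (sigma_next - sigma_cur)
    return x

rng = np.random.default_rng(0)
sigmas = np.geomspace(80.0, 1e-3, 201)  # assumed schedule: decreasing noise levels
samples = euler_sampler(gaussian_denoiser, sigmas, dim=4, num_samples=20000, rng=rng)
sample_std = float(samples.std())  # should be close to the data standard deviation 1
```

In this toy setting the samples start with standard deviation $\sigma_{\mathrm{max}}$ and end with a standard deviation close to that of the data, which is the behavior expected from (6).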

3.3 Physics-informed and measurement guidance

In the next step, we add two guidance terms to the ODE in (6) to, first, help the model enforce the PDE conditions, and, second, generate samples conditioned on partial measurements. More specifically, for a measurement operator $\mathcal{M}\colon\mathcal{F}\to\mathbb{R}^{m}$ and a given measurement $y\in\mathbb{R}^{m}$ , we want to ensure that

\begin{split}\mathcal{M}(\mathcal{I}(\hat{x}_{0}))=y\qquad\text{and}\qquad\mathcal{L}_{\mathrm{PDE}}(\mathcal{I}(\hat{x}_{0}))=0.\end{split}

(7)

For this we use the DPS technique developed in [3, 10], which enables weak enforcement of guidance conditions in diffusion models by adding forcing terms to the backward diffusion process in the direction of the gradient of the target quantities. Applying this to the reverse-time ODE in (6) with the conditions in (7), and introducing guidance weights $\lambda_{\mathrm{obs}},\lambda_{\mathrm{PDE}}\in[0,\infty)$ , we obtain the guided reverse-time ODE:

\begin{split}\mathrm{d}\hat{x}_{t}=&\Bigg(-\dot{\sigma_{t}}(\sigma_{t})^{-1}(D_{\theta^{\ast}}(\hat{x}_{t},\sigma_{t})-\hat{x}_{t})+\lambda_{\mathrm{obs}}\nabla_{\hat{x}_{t}}\Big[\lVert y-\mathcal{M}(\mathcal{I}(D_{\theta^{\ast}}(\hat{x}_{t},\sigma_{t})))\rVert^{2}\Big]\\ &+\lambda_{\mathrm{PDE}}\nabla_{\hat{x}_{t}}\Big[\mathcal{L}_{\mathrm{PDE}}(\mathcal{I}(D_{\theta^{\ast}}(\hat{x}_{t},\sigma_{t})))\Big]\Bigg)\mathrm{d}t\end{split}

(8)

for $t\in[0,T]$ with $\hat{x}_{T}\sim\mathcal{N}(0,\sigma_{\mathrm{max}}^{2}I_{l})$ . To obtain a concrete algorithm from (8), it remains to discretize the ODE in reverse time. We observe that, under a reverse-time Euler discretization of the ODE in (8), the contributions of the guidance terms correspond to gradient descent steps. Motivated by this, we propose replacing these gradient descent updates with a more advanced gradient-based optimizer such as Adam [14]. We show empirically that this leads to significantly better results than standard gradient descent (see Appendix C). We present the resulting PISD method in Algorithm 1.

Algorithm 1 Physics-informed diffusion model in spectral space

Training Phase

Input: PDE loss functional $\mathcal{L}_{\mathrm{PDE}}\colon\mathcal{F}\to[0,\infty)$ , data set $X_{1},\ldots,X_{M}\in\mathcal{F}$ with $\forall\,i\in\{1,\ldots,M\}\colon\mathcal{L}_{\mathrm{PDE}}(X_{i})=0$

Choose $\mathcal{E}\colon\mathcal{F}\to\mathbb{R}^{l}$ and $\mathcal{I}\colon\mathbb{R}^{l}\to\mathcal{F}$ based on $X_{1},\ldots,X_{M}$ (cf. Section 3.4)

Choose denoiser $D\colon\mathbb{R}^{\mathfrak{d}}\times\mathbb{R}^{l}\times(0,\sigma_{\mathrm{max}}]\to\mathbb{R}^{l}$

$\theta^{\ast}\leftarrow\underset{\theta\in\mathbb{R}^{\mathfrak{d}}}{\operatorname{argmin}}\,\mathbb{E}\Big[\lVert D_{\theta}(\mathcal{E}(X_{i})+\sigma N,\sigma)-\mathcal{E}(X_{i})\rVert^{2}\Big]$ with $(i,\sigma,N)\sim\mathcal{U}_{\{1,\ldots,M\}}\times\mathcal{U}_{[0,\sigma_{\mathrm{max}}]}\times\mathcal{N}(0,I_{l})$

Inference/Sampling Phase

Input: Number of steps $N\in\mathbb{N}$ , noise schedule $(\sigma_{n})_{n\in\{0,\ldots,N\}}\subseteq(0,\sigma_{\mathrm{max}}]$ , measurement operator $\mathcal{M}\colon\mathcal{F}\to\mathbb{R}^{m}$ , measurement $y\in\mathbb{R}^{m}$ , guidance weights $\lambda_{\mathrm{obs}},\lambda_{\mathrm{PDE}}\in[0,\infty)$

$\text{ADAM}\leftarrow$ initialize Adam optimizer on $\mathbb{R}^{l}$

$\hat{x}\sim\mathcal{N}(0,\sigma_{\mathrm{max}}^{2}I_{l})$

for $n=1$ to $N$ do

    $G_{\rm obs}\leftarrow\nabla_{\hat{x}}\Big[\lVert y-\mathcal{M}(\mathcal{I}(D_{\theta^{\ast}}(\hat{x},\sigma_{n})))\rVert^{2}\Big]$

    $G_{\rm pde}\leftarrow\nabla_{\hat{x}}\Big[\mathcal{L}_{\mathrm{PDE}}(\mathcal{I}(D_{\theta^{\ast}}(\hat{x},\sigma_{n})))\Big]$

    $\hat{x}\leftarrow\hat{x}-(\sigma_{n})^{-1}(D_{\theta^{\ast}}(\hat{x},\sigma_{n})-\hat{x})(\sigma_{n-1}-\sigma_{n})$

    $\hat{x}\leftarrow\hat{x}-\text{ADAM}(\lambda_{\mathrm{obs}}G_{\rm obs}+\lambda_{\mathrm{PDE}}G_{\rm pde})$

end for

Return $\mathcal{I}(\hat{x})$
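The inference phase of Algorithm 1 can be sketched on a toy problem as follows. Everything specific here is an assumption made for illustration: the denoiser is the closed-form denoiser for standard Gaussian data, the decoder $\mathcal{I}$ is the identity, the measurement operator reads off the first three latent coordinates, the "PDE" loss is the stand-in constraint that two further coordinates agree, and the gradients are written out by hand instead of using automatic differentiation. Each loop iteration steps from the current noise level to the next lower one. The point of the sketch is only to show how a single Adam state is carried across all diffusion steps.

```python
import numpy as np

class Adam:
    """Minimal Adam whose moment estimates persist across diffusion steps."""
    def __init__(self, shape, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = np.zeros(shape), np.zeros(shape), 0

    def step(self, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad**2
        m_hat = self.m / (1 - self.beta1**self.t)
        v_hat = self.v / (1 - self.beta2**self.t)
        return self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

def sample(y, sigmas, num_samples, dim, lam_obs, lam_pde, seed=0):
    rng = np.random.default_rng(seed)
    m = y.shape[0]
    adam = Adam((num_samples, dim))
    x = sigmas[0] * rng.standard_normal((num_samples, dim))
    for sigma_cur, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        shrink = 1.0 / (1.0 + sigma_cur**2)  # toy denoiser: D(x, sigma) = shrink * x
        d = shrink * x
        # Gradients w.r.t. x by the chain rule (the Jacobian of D is shrink * I).
        g_obs = np.zeros_like(x)
        g_obs[:, :m] = -2.0 * shrink * (y - d[:, :m])
        gap = d[:, m] - d[:, m + 1]          # stand-in constraint: two coordinates agree
        g_pde = np.zeros_like(x)
        g_pde[:, m] = 2.0 * shrink * gap
        g_pde[:, m + 1] = -2.0 * shrink * gap
        # Euler step of the unguided ODE, followed by the Adam guidance update.
        x = x - (d - x) / sigma_cur * (sigma_next - sigma_cur)
        x = x - adam.step(lam_obs * g_obs + lam_pde * g_pde)
    return x

def obs_error(x, y):
    return float(np.linalg.norm(x[:, : y.shape[0]] - y, axis=1).mean())

def constraint_gap(x, m):
    return float(np.abs(x[:, m] - x[:, m + 1]).mean())

y = np.array([3.0, -3.0, 3.0])
sigmas = np.geomspace(80.0, 1e-3, 201)  # assumed schedule: decreasing noise levels
guided = sample(y, sigmas, num_samples=16, dim=8, lam_obs=1.0, lam_pde=1.0)
unguided = sample(y, sigmas, num_samples=16, dim=8, lam_obs=0.0, lam_pde=0.0)
```

Comparing `obs_error` and `constraint_gap` for `guided` and `unguided` shows the effect of the guidance terms in this toy setting; with both weights set to zero the Adam update vanishes and the loop reduces to plain Euler sampling.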

3.4 Spectral encoding

In this section we specify the encoding $\mathcal{E}$ introduced in Section 3.2. Our approach is based on truncated spectral representations of functions in $\mathcal{F}$ , combined with a frequency-wise normalization determined from the data distribution. Spectral representations are natural in the PDE context, as they allow for explicit evaluation of differential operators.

The normalization plays a central role in our method and is therefore treated as part of the encoding rather than as a standard preprocessing step. Unlike conventional data normalization, our scaling is applied in latent space rather than in physical space, is performed independently for each spectral coefficient, and, because the latent variables correspond to Fourier modes, induces a form of regularization in function space. As we show below, this scaling ensures that Gaussian noise in the latent space corresponds to functions with controlled regularity, which is essential to ensure that PDE operators are well defined throughout the diffusion process.

For concreteness, we present an encoding for the case $\mathcal{F}=L^{2}(\mathbb{T}^{d})$ , where $\mathbb{T}=\mathbb{R}/\mathbb{Z}$ denotes the torus. When $\mathcal{F}$ consists of tuples of functions, as in Example 3.1, we apply our encoding approach component-wise and concatenate the results. We consider complex-valued functions and Fourier series here for ease of presentation; in our experiments we sometimes also work with real-valued functions using sine or cosine series.

Let $(\phi_{n})_{n\in\mathbb{Z}^{d}}\subseteq L^{2}(\mathbb{T}^{d})$ be the Fourier basis on $\mathbb{T}^{d}$ , given for all $n\in\mathbb{Z}^{d}$ by

\forall\,x\in\mathbb{T}^{d}\colon\quad\phi_{n}(x)=e^{2\pi i\langle n,x\rangle}

(9)

and for all $f\in L^{2}(\mathbb{T}^{d})$ , $n\in\mathbb{Z}^{d}$ let $\hat{f}(n)=\langle f,\phi_{n}\rangle_{L^{2}(\mathbb{T}^{d})}$ denote the $n$ -th Fourier coefficient of $f$ . To normalize the spectral representation, we define for each $n\in\mathbb{Z}^{d}$ the standard deviation $s_{n}\in[0,\infty)$ of the $n$ -th Fourier coefficient of the data through

\textstyle(s_{n})^{2}=\mathrm{Var}\big(\widehat{X_{\mathrm{data}}}(n)\big).

(10)

Given a truncation set $K\subset\mathbb{Z}^{d}$ with $|K|=l$ , we define the encoding $\mathcal{E}\colon\mathcal{F}\to\mathbb{C}^{K}\cong\mathbb{C}^{l}$ and its inverse $\mathcal{I}\colon\mathbb{C}^{K}\to\mathcal{F}$ for all $f\in\mathcal{F}$ , $k\in K$ , $\alpha\in\mathbb{C}^{K}$ by

\textstyle\mathcal{E}(f)(k)=\frac{\hat{f}(k)}{s_{k}}\quad\text{and}\quad\mathcal{I}(\alpha)=\sum_{n\in K}s_{n}\alpha(n)\phi_{n}.

(11)
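As an illustration of (10) and (11), the following sketch implements the scaled encoding in one space dimension for functions sampled on a uniform grid. It makes several assumptions not taken from the text: Fourier coefficients are approximated by the FFT, the scaling $s_{n}$ is estimated as an empirical standard deviation over a synthetic data set, and the helper names and the synthetic data are hypothetical.

```python
import numpy as np

def fourier_coeffs(f):
    """Approximate Fourier coefficients <f, phi_n> of grid samples on T = R/Z."""
    return np.fft.fft(f, axis=-1) / f.shape[-1]

def fit_scaling(data, modes):
    """Empirical per-mode standard deviation s_n over the data set, cf. (10)."""
    return fourier_coeffs(data)[:, modes].std(axis=0)

def encode(f, modes, s):
    """Scaled coefficients on the truncation set, cf. (11)."""
    return fourier_coeffs(f)[..., modes] / s

def decode(alpha, modes, s, n_grid):
    """Inverse map in (11): rescale, place the modes, and evaluate on the grid."""
    coeffs = np.zeros(alpha.shape[:-1] + (n_grid,), dtype=complex)
    coeffs[..., modes] = s * alpha
    return np.fft.ifft(coeffs, axis=-1) * n_grid

# Synthetic band-limited data with a decaying spectrum (an assumption for the demo).
rng = np.random.default_rng(0)
n_grid, c, num_data = 64, 8, 500
modes = np.arange(-c, c + 1)  # truncation set K = {-c, ..., c}; negative indices wrap
decay = 1.0 / (1.0 + np.abs(modes)) ** 2
a = decay * (rng.standard_normal((num_data, modes.size))
             + 1j * rng.standard_normal((num_data, modes.size)))
full = np.zeros((num_data, n_grid), dtype=complex)
full[:, modes] = a
data = np.fft.ifft(full, axis=-1) * n_grid

s = fit_scaling(data, modes)
encoded = encode(data, modes, s)
reconstructed = decode(encoded, modes, s, n_grid)
```

By construction every latent coordinate has unit empirical standard deviation over the data set, and because the synthetic data is band-limited to the truncation set, decoding recovers it up to floating-point error.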

The following lemma shows that this scaling ensures that Gaussian noise in the latent space corresponds to a suitably regular function in $\mathcal{F}$ , provided the data exhibits the same regularity.

Lemma 3.2.

Let $k\in\mathbb{N}$ , assume $\mathbb{E}\big[\lVert X_{\mathrm{data}}\rVert_{H^{k}(\mathbb{T}^{d})}^{2}\big]<\infty$ , let $(Z_{n})_{n\in\mathbb{Z}^{d}}$ be i.i.d. $\mathcal{N}(0,1)$ random variables, and let $N\in L^{2}(\mathbb{T}^{d})$ be the random function given by

\textstyle N=\sum_{n\in\mathbb{Z}^{d}}s_{n}Z_{n}\phi_{n}.

(12)

Then $\mathbb{E}\big[\lVert N\rVert_{H^{k}(\mathbb{T}^{d})}^{2}\big]<\infty$ .

Lemma 3.2 is proven in Appendix A. Lemma 3.2 implies that if the data distribution has finite second moments in the Sobolev space $H^{k}(\mathbb{T}^{d})$ , then Gaussian noise in the latent space induces a random function with the same Sobolev regularity. Consequently, the forward process in PISD, obtained by gradually adding Gaussian noise in latent space, remains within $H^{k}(\mathbb{T}^{d})$ . This ensures that the PDE operators used in the PISD guidance are well defined throughout the generative process.
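One way to see why Lemma 3.2 is plausible is the following short computation. It is a sketch under the assumption that the Sobolev norm is expressed through Fourier weights, $\lVert f\rVert_{H^{k}(\mathbb{T}^{d})}^{2}=\sum_{n\in\mathbb{Z}^{d}}(1+\lVert n\rVert_{2}^{2})^{k}\lvert\hat{f}(n)\rvert^{2}$ (an equivalent norm on $H^{k}(\mathbb{T}^{d})$ ); the argument in Appendix A may be organized differently. Using the orthonormality of the Fourier basis, $\mathbb{E}[Z_{n}^{2}]=1$ , and the fact that the variance of a random variable is bounded by its second moment, we obtain

\begin{split}\mathbb{E}\big[\lVert N\rVert_{H^{k}(\mathbb{T}^{d})}^{2}\big]&=\textstyle\sum\limits_{n\in\mathbb{Z}^{d}}(1+\lVert n\rVert_{2}^{2})^{k}(s_{n})^{2}\,\mathbb{E}\big[Z_{n}^{2}\big]=\textstyle\sum\limits_{n\in\mathbb{Z}^{d}}(1+\lVert n\rVert_{2}^{2})^{k}\,\mathrm{Var}\big(\widehat{X_{\mathrm{data}}}(n)\big)\\ &\leq\textstyle\sum\limits_{n\in\mathbb{Z}^{d}}(1+\lVert n\rVert_{2}^{2})^{k}\,\mathbb{E}\big[\lvert\widehat{X_{\mathrm{data}}}(n)\rvert^{2}\big]=\mathbb{E}\big[\lVert X_{\mathrm{data}}\rVert_{H^{k}(\mathbb{T}^{d})}^{2}\big]<\infty.\end{split}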

In contrast, the forward process of grid-based diffusion models in physical space converges, in the limit of infinite resolution, to spatial white noise, which is not differentiable. As a result, the methods proposed in [10, 31] apply PDE guidance only after or during the final $10\%$ of the reverse process, whereas PISD enforces PDE constraints throughout. Figure 2 illustrates this difference, and our numerical results show that enforcing PDE constraints throughout enables PISD to achieve much lower PDE residuals than DiffusionPDE (cf. Section 4.2).
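The contrast between grid-based white noise and spectrally scaled noise can be illustrated with a short calculation of the expected squared $L^{2}$ norm of the derivative in one space dimension. The sketch below is an illustration under stated assumptions: the scaling $s_{n}=(1+\lvert n\rvert)^{-2}$ is a hypothetical stand-in for a data-driven scaling, and the function names are not taken from the original.

```python
import numpy as np

def expected_h1_seminorm_sq(mode_variances, freqs):
    """E ||f'||_{L^2}^2 for a random Fourier series with independent zero-mean coefficients."""
    return float(np.sum((2.0 * np.pi * freqs) ** 2 * mode_variances))

def white_noise(n_grid):
    """Grid white noise: i.i.d. N(0,1) grid values give every mode variance 1/n_grid."""
    freqs = np.fft.fftfreq(n_grid, d=1.0 / n_grid)
    return expected_h1_seminorm_sq(np.full(n_grid, 1.0 / n_grid), freqs)

def scaled_noise(n_grid, decay=2.0):
    """Latent Gaussian noise decoded with a hypothetical scaling s_n = (1 + |n|)^(-decay)."""
    freqs = np.fft.fftfreq(n_grid, d=1.0 / n_grid)
    s = (1.0 + np.abs(freqs)) ** (-decay)
    return expected_h1_seminorm_sq(s**2, freqs)

white_ratio = white_noise(256) / white_noise(64)    # grows like the squared resolution
scaled_ratio = scaled_noise(256) / scaled_noise(64)  # stays close to 1
```

For white noise the expected derivative norm grows roughly quadratically with the number of grid points, whereas for the scaled noise it is essentially independent of the resolution.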

Remark 3.3 (Choice of truncation set).

The truncation set $K$ determines the dimension of the latent space and hence strongly influences inference time. The canonical choice is a cube $K=\{n\in\mathbb{Z}^{d}:\lVert n\rVert_{\infty}\leq c\}$ (cf. Figure 4) for some constant $c\in(0,\infty)$ . To reduce dimensionality, we also experiment with approximate hyperbolic truncation $K=\{n\in\mathbb{Z}^{d}:|n_{1}\cdots n_{d}|\leq c\}$ (cf. Figure 3).
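To give a feeling for the dimensionality reduction, the following sketch counts the modes in a cube and in a hyperbolic-cross-type set. The hyperbolic variant used here replaces each factor $\lvert n_{i}\rvert$ by $\max(1,\lvert n_{i}\rvert)$ so that the set is finite; this particular variant, the value $c=8$ , and the function names are assumptions for illustration and need not coincide with the truncation used in the experiments.

```python
import itertools
import math

def cube_set(c, d):
    """All integer frequency vectors n with max-norm at most c."""
    return list(itertools.product(range(-c, c + 1), repeat=d))

def hyperbolic_cross_set(c, d):
    """Frequency vectors with prod_i max(1, |n_i|) <= c (an assumed finite variant)."""
    return [n for n in cube_set(c, d)
            if math.prod(max(1, abs(ni)) for ni in n) <= c]

num_cube = len(cube_set(8, 2))
num_hyperbolic = len(hyperbolic_cross_set(8, 2))
```

For $d=2$ and $c=8$ the cube contains $17^{2}=289$ modes, while this hyperbolic-cross variant keeps 113 of them.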

4 Numerical results

In this section we showcase the performance of PISD on the Poisson, Helmholtz, and Navier–Stokes equations and compare it to the state-of-the-art methods DiffusionPDE [10] and FunDPS [38].

4.1 Implementation details

Dataset and training.

All datasets consist of solutions and coefficients generated from the target PDEs at a resolution of $128\times 128$ . As denoiser architecture we use a Vision Transformer (ViT) [15] that we train based on (5). We have also experimented with other architectures, all of which yield comparable performance.

Adam guidance.

During inference, we enforce PDE and observation constraints using DPS [3], replacing the standard gradient descent updates with Adam [14] (cf. Section 3.3). We find that maintaining optimizer state across diffusion steps is critical for stable and accurate PDE-constrained generation; a comparison with standard DPS in Appendix C shows that Adam-based guidance yields significantly better results, particularly for inverse problems.

Comparison with other methods.

We compare PISD to the state-of-the-art diffusion-based methods DiffusionPDE [10] and FunDPS [38]. Neural-operator-based methods (e.g., PINO, FNO, DeepONet) are not included in the quantitative comparisons, as prior studies, cf. [10, 38], have shown that they perform poorly in sparse observation regimes and are not designed to handle the settings considered here.

4.2 Poisson and Helmholtz equations

Following [10, 38], we first consider partial differential equations posed on a bounded domain with homogeneous Dirichlet boundary conditions. Let $\Omega=(0,1)^{2}$ and denote by $\partial\Omega$ its boundary. We consider the Poisson equation

\Delta u(x)=a(x),\quad x\in\Omega,\qquad u(x)=0,\quad x\in\partial\Omega

(13)

and the Helmholtz equation

\begin{gathered}\Delta u(x)+u(x)=a(x),\quad x\in\Omega,\qquad u(x)=0,\quad x\in\partial\Omega.\end{gathered}

(14)

As in Example 3.1, we want to generate functions from the space $\mathcal{F}=H^{2}_{0}(\Omega)\times L^{2}(\Omega)$ . As PDE residual for the Poisson equation we consider for all $(u,a)\in\mathcal{F}$ that

\mathcal{L}_{\mathrm{PDE}}(u,a)=\|\Delta u-a\|^{2}_{L^{2}(\Omega)},

(15)

and for the Helmholtz equation we consider for all $(u,a)\in\mathcal{F}$ that

\mathcal{L}_{\mathrm{PDE}}(u,a)=\|\Delta u+u-a\|^{2}_{L^{2}(\Omega)}.

(16)

To automatically enforce the Dirichlet boundary conditions, we base our encodings of $u\in H^{2}_{0}(\Omega)$ for the PISD method on the sine transform given for all $f\colon\Omega\to\mathbb{R}$ , $k=(k_{1},k_{2})\in\mathbb{N}^{2}$ by

\hat{f}(k)=\int_{\Omega}f(x)\sin(\pi k_{1}x_{1})\sin(\pi k_{2}x_{2})\mathrm{d}(x_{1},x_{2})

(17)

with the corresponding inverse transform $\mathcal{I}$ based on the sine series given for all $\alpha\in\mathbb{R}^{\mathbb{N}^{2}}$ by

\sum_{k\in\mathbb{N}^{2}}\alpha(k)\sin(\pi k_{1}x_{1})\sin(\pi k_{2}x_{2}).

(18)

The encoding for $a\in L^{2}(\Omega)$ is also based on a sine transform. Since $a$ does not satisfy zero boundary conditions, we first extend it smoothly to a larger domain on which the extended function vanishes at the boundary, then apply the sine transform. The inverse transform is obtained by evaluating the sine series and restricting to $\Omega$ . To compute the PDE residual at inference time, we use the formula

\begin{split}\forall\,k\in\mathbb{N}^{2}\colon\quad\widehat{\Delta u}(k)=-\pi^{2}\lVert k\rVert_{2}^{2}\hat{u}(k)\end{split}

(19)

which can be conveniently computed in terms of the latent coefficients corresponding to $u$ .
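As an illustration of how (19) turns the Poisson residual into a simple expression in coefficient space, consider the following minimal sketch. It assumes that `u_coef` and `a_coef` are square arrays holding the coefficients $\alpha(k)$ of sine series as in (18) for $k_{1},k_{2}\in\{1,\ldots,K\}$ , and that $a$ is represented directly by a sine series on $\Omega$ (the extension step for $a$ described above is omitted). The function names are hypothetical.

```python
import numpy as np

def laplacian_sine(u_coef):
    """Sine-series coefficients of Laplace(u): mode k is multiplied by -pi^2 |k|^2, cf. (19)."""
    k = np.arange(1, u_coef.shape[0] + 1)
    ksq = k[:, None] ** 2 + k[None, :] ** 2
    return -np.pi**2 * ksq * u_coef

def poisson_residual(u_coef, a_coef):
    """|| Laplace(u) - a ||_{L^2}^2 for sine series on the unit square."""
    diff = laplacian_sine(u_coef) - a_coef
    # Each product sin(pi k1 x1) sin(pi k2 x2) has squared L^2 norm 1/4 on (0,1)^2.
    return 0.25 * float(np.sum(diff**2))

# Manufactured example: u = sin(pi x1) sin(pi x2) solves Laplace(u) = -2 pi^2 u.
u_coef = np.zeros((4, 4))
u_coef[0, 0] = 1.0
a_coef = np.zeros((4, 4))
a_coef[0, 0] = -2.0 * np.pi**2

residual_exact = poisson_residual(u_coef, a_coef)
residual_zero_rhs = poisson_residual(u_coef, np.zeros((4, 4)))
```

The first residual vanishes because the pair solves the Poisson equation, while the second equals $\tfrac{1}{4}(2\pi^{2})^{2}=\pi^{4}$ . Since the residual is a quadratic function of the coefficients, its gradient with respect to the latent variables is inexpensive to evaluate.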

Table 1: Forward problem, sparse observations on $a$

PDE	Obs.	PISD (ours)		DiffusionPDE		FunDPS
		Rel. err	PDE res.	Rel. err	PDE res.	Rel. err	PDE res.
Poisson	500	3.08 $\pm$ 1.71 %	0.87	4.06 $\pm$ 1.51 %	237.49	2.23 $\pm$ 1.50 %	4619.34
	1000	1.47 $\pm$ 0.90 %	2.31	3.35 $\pm$ 0.99 %	207.28	1.54 $\pm$ 1.05 %	3807.58
	Full	0.05 $\pm$ 0.03 %	3.78	4.04 $\pm$ 1.50 %	190.70	0.87 $\pm$ 0.44 %	3338.32
Helmholtz	500	3.47 $\pm$ 1.70 %	0.37	9.55 $\pm$ 4.16 %	4852.36	2.08 $\pm$ 0.98 %	3316.68
	1000	1.59 $\pm$ 0.77 %	0.52	7.46 $\pm$ 2.76 %	4690.36	1.53 $\pm$ 0.88 %	3307.42
	Full	0.04 $\pm$ 0.01 %	3.97	8.25 $\pm$ 3.65 %	5600.25	1.14 $\pm$ 0.81 %	2714.46

Table 2: Inverse problem, sparse observations on $u$

PDE	Obs.	PISD (ours)		DiffusionPDE		FunDPS
		Rel. err	PDE res.	Rel. err	PDE res.	Rel. err	PDE res.
Poisson	500	13.81 $\pm$ 3.11 %	0.45	22.17 $\pm$ 6.61 %	178.46	21.09 $\pm$ 7.10 %	587.99
	1000	12.09 $\pm$ 2.68 %	0.44	18.14 $\pm$ 6.04 %	203.62	20.47 $\pm$ 6.79 %	460.62
	Full	7.95 $\pm$ 1.71 %	1.33	14.03 $\pm$ 4.31 %	190.10	19.84 $\pm$ 0.65 %	429.90
Helmholtz	500	12.76 $\pm$ 2.42 %	0.21	19.33 $\pm$ 5.82 %	8916.58	16.26 $\pm$ 4.46 %	1933.61
	1000	11.19 $\pm$ 1.97 %	0.20	17.03 $\pm$ 5.08 %	13303.46	14.93 $\pm$ 3.90 %	2036.62
	Full	9.03 $\pm$ 2.11 %	0.84	15.23 $\pm$ 4.73 %	19010.48	13.97 $\pm$ 3.60 %	664.21

Table 3: Sparse observations on both $a$ and $u$

PDE	Obs.		PISD (ours)		DiffusionPDE		FunDPS
			Rel. err	PDE res.	Rel. err	PDE res.	Rel. err	PDE res.
Poisson	100	$a$	18.10 $\pm$ 4.57 %	1.87	18.27 $\pm$ 5.25 %	227.30	23.04 $\pm$ 8.64 %	4684.91
	100	$u$	1.19 $\pm$ 0.58 %	1.87	1.28 $\pm$ 0.62 %	227.30	2.36 $\pm$ 1.35 %	4684.91
	200	$a$	13.35 $\pm$ 3.61 %	2.03	13.96 $\pm$ 4.27 %	249.15	16.00 $\pm$ 5.41 %	4085.06
	200	$u$	0.46 $\pm$ 0.22 %	2.03	0.62 $\pm$ 0.28 %	249.15	1.23 $\pm$ 0.63 %	4085.06
Helmholtz	100	$a$	20.07 $\pm$ 4.33 %	0.55	18.30 $\pm$ 5.65 %	12570.00	22.69 $\pm$ 7.62 %	4850.78
	100	$u$	1.18 $\pm$ 0.47 %	0.55	1.48 $\pm$ 0.61 %	12570.00	2.33 $\pm$ 0.94 %	4850.78
	200	$a$	16.05 $\pm$ 3.27 %	0.56	14.20 $\pm$ 4.08 %	11470.51	16.37 $\pm$ 4.64 %	4596.16
	200	$u$	0.50 $\pm$ 0.20 %	0.56	1.07 $\pm$ 0.30 %	11470.51	1.50 $\pm$ 0.56 %	4596.16

Results.

The trained models are applied to three problem classes. Table 1 reports results for forward problems, where the solution is inferred from sparse or full observations of the coefficient $a$ . Table 2 reports results for inverse problems, where the coefficient is inferred from sparse or full observations of the solution $u$ . Table 3 addresses joint reconstruction, where both the coefficient and the solution are recovered from sparse observations of both. The results for the forward and inverse problems are averaged over 100 independent runs and the results for the joint reconstruction problem are averaged over 50 independent runs. In the forward problem, PISD achieves performance comparable to DiffusionPDE and FunDPS across all observation levels, and becomes more accurate as the number of observations increases. In the inverse problem and joint reconstruction problem, PISD consistently outperforms DiffusionPDE and FunDPS across all observation regimes.

Beyond matching or improving accuracy, PISD has a significantly faster inference time compared to DiffusionPDE and FunDPS due to the reduced dimensionality of the spectral latent space. From a spatial resolution of $128\times 128$ , we retain only $44\times 44$ modes, reducing inference time on a GeForce RTX 2080 Ti from 802 seconds (DiffusionPDE) and 152 seconds (FunDPS) to approximately 52 seconds (Table 7).
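As a quick check of these figures, the following arithmetic uses only the numbers quoted above; counting one latent coordinate per retained mode and per field is an assumption about the bookkeeping. The latent dimension per field shrinks by a factor of roughly 8.5, and the inference times correspond to speedups of roughly 15 and 3:

\begin{split}\frac{128\cdot 128}{44\cdot 44}=\frac{16384}{1936}\approx 8.5,\qquad\frac{802}{52}\approx 15.4,\qquad\frac{152}{52}\approx 2.9.\end{split}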

Across all tasks, PISD yields significantly lower PDE residuals than the baselines (cf. Figure 5 for an illustration of the PDE residuals). This suggests that the remaining error in PISD is dominated by the inherent uncertainty of the ill-posed problem under sparse observations, rather than by its inability to satisfy the PDE. This interpretation is supported by the observation that PISD’s accuracy advantage over the baselines is most pronounced when more observations are available and the problem becomes less ill-posed.

4.3 Navier–Stokes equations (Unbounded Domain)

We consider the incompressible Navier–Stokes equations in vorticity form on $\Omega=\mathbb{T}^{2}=\mathbb{R}^{2}/\mathbb{Z}^{2}$ with periodic boundary conditions:

\begin{gathered}\partial_{t}w(x,\tau)+v(x,\tau)\cdot\nabla w(x,\tau)=\nu\,\Delta w(x,\tau)+q(x),\qquad\nabla\cdot v(x,\tau)=0,\quad x\in\Omega,\;\tau\in(0,T].\end{gathered}

(20)

Here $v=(v_{1},v_{2})\colon\Omega\to\mathbb{R}^{2}$ denotes the velocity field, $w=\nabla\times v\colon\Omega\to\mathbb{R}$ the vorticity, $\nu\in(0,1)$ the kinematic viscosity, and $q\colon\Omega\to\mathbb{R}$ , $q(x_{1},x_{2})=0.1(\sin(2\pi(x_{1}+x_{2}))+\cos(2\pi(x_{1}+x_{2})))$ , is a fixed source term.

We discretize the time domain into $N=10$ steps $0=t_{1}<t_{2}<\cdots<t_{N}=T$ and aim to generate the vorticity field $w$ on those time steps. As generation space for the PISD method we choose $\mathcal{F}=(H^{2}(\mathbb{T}^{2}))^{N}$ , representing the vorticity at each time step. The encoding $\mathcal{E}$ and its inverse $\mathcal{I}$ are based on the complex Fourier transform applied to each time step separately as described in Section 3.4 (see in particular (9), (10), and (11)).

The Biot–Savart law relates the velocity and vorticity fields via their Fourier coefficients: for all $\tau\in(0,T]$ , $k\in\mathbb{Z}^{2}\setminus\{0\}$ we have that

\begin{split}\widehat{v_{1}(\cdot,\tau)}(k)=i\,\frac{k_{2}}{2\pi\lVert k\rVert_{2}^{2}}\,\widehat{w(\cdot,\tau)}(k),\qquad\widehat{v_{2}(\cdot,\tau)}(k)=-i\,\frac{k_{1}}{2\pi\lVert k\rVert_{2}^{2}}\,\widehat{w(\cdot,\tau)}(k).\end{split}

(21)

We denote by $\mathcal{V}\colon H^{2}(\mathbb{T}^{2})\to H^{3}(\mathbb{T}^{2})$ the corresponding operator mapping vorticity to velocity.

The PDE residual is formulated using finite differences in time. For all $w=(w_{1},\ldots,w_{N})\in\mathcal{F}$ we define that

\begin{split}&\mathcal{L}_{\mathrm{PDE}}(w)=\textstyle\sum\limits_{i=2}^{N-1}\Big\lVert\frac{w_{i+1}-w_{i-1}}{t_{i+1}-t_{i-1}}+\mathcal{V}(w_{i})\cdot\nabla w_{i}-\nu\Delta w_{i}-q\Big\rVert_{L^{2}(\mathbb{T}^{2})}^{2}\\ &=\textstyle\sum\limits_{i=2}^{N-1}\textstyle\sum\limits_{k\in\mathbb{Z}^{2}\setminus\{0\}}\big\lvert\tfrac{\widehat{w_{i+1}}(k)-\widehat{w_{i-1}}(k)}{t_{i+1}-t_{i-1}}+\widehat{\mathcal{V}(w_{i})\cdot\nabla w_{i}}(k)+4\pi^{2}\nu\lVert k\rVert_{2}^{2}\widehat{w_{i}}(k)-\widehat{q}(k)\big\rvert^{2}.\end{split}

(22)

During inference, we evaluate the PDE residual using the latter expression, which is again conveniently expressed in terms of the latent coefficients, except for the nonlinear advection terms $\widehat{\mathcal{V}(w_{i})\cdot\nabla w_{i}}(k)$ , which are computed via a pseudo-spectral method: the gradient $\nabla w_{i}$ is computed explicitly in Fourier space and the Fourier coefficients of $\mathcal{V}(w_{i})$ are computed using the Biot–Savart law in (21); both $\mathcal{V}(w_{i})$ and $\nabla w_{i}$ are then transformed to physical space for pointwise multiplication, and the result is transformed back to Fourier space.
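The pseudo-spectral evaluation of the advection term can be sketched as follows for a vorticity field sampled on a uniform grid of the unit torus. This is a minimal illustration, not the implementation used in the experiments: the function name is hypothetical, the factors of $2\pi$ that arise when differentiating the basis functions in (9) are written out explicitly, the mean mode of the velocity is set to zero, and no dealiasing is applied.

```python
import numpy as np

def advection_term(w):
    """Grid values of v . grad(w), with the velocity v recovered from w by Biot-Savart."""
    n = w.shape[0]
    k = np.fft.fftfreq(n, d=1.0 / n)      # integer frequencies
    k1, k2 = k[:, None], k[None, :]
    ksq = k1**2 + k2**2
    ksq[0, 0] = 1.0                        # avoid division by zero; mode 0 is zeroed below
    w_hat = np.fft.fft2(w)
    # Velocity in Fourier space.
    v1_hat = 1j * k2 / (2.0 * np.pi * ksq) * w_hat
    v2_hat = -1j * k1 / (2.0 * np.pi * ksq) * w_hat
    v1_hat[0, 0] = 0.0
    v2_hat[0, 0] = 0.0
    # Gradient of w in Fourier space.
    dw1_hat = 2j * np.pi * k1 * w_hat
    dw2_hat = 2j * np.pi * k2 * w_hat
    # Transform to physical space and multiply pointwise.
    v1, v2 = np.fft.ifft2(v1_hat).real, np.fft.ifft2(v2_hat).real
    dw1, dw2 = np.fft.ifft2(dw1_hat).real, np.fft.ifft2(dw2_hat).real
    return v1 * dw1 + v2 * dw2

# Check against a case that can be worked out by hand.
x = np.arange(32) / 32
x1, x2 = np.meshgrid(x, x, indexing="ij")
w = np.cos(2 * np.pi * x1) + np.cos(4 * np.pi * x2)
computed = advection_term(w)
expected = -1.5 * np.sin(2 * np.pi * x1) * np.sin(4 * np.pi * x2)
```

Applying `np.fft.fft2(computed) / computed.size` then yields approximate Fourier coefficients of the advection term, which is the quantity that enters the residual in Fourier space.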

Results.

We apply the trained model to generate the full spatio-temporal evolution of the vorticity field at time steps $t_{1},\ldots,t_{10}$ , conditioned on sparse observations at various times. Results are reported in Table 4 and averaged over 50 independent runs. Unlike DiffusionPDE and FunDPS, which generate only initial or terminal states, our formulation naturally supports interpolation and conditioning on initial, final, or intermediate observations within a single unified framework. PISD is particularly well suited for such temporal conditioning due to its fast inference: from a spatial resolution of $128\times 128$ , we retain only $32\times 32$ modes per time step.

Table 4: Navier–Stokes with sparse-in-time observations.

Obs.	Time	Data	PISD (ours)		DiffusionPDE	FunDPS
			Rel. err	PDE res.	Rel. err	Rel. err
500	$t_{1}$	$\checkmark$	4.76 $\pm$ 0.65 %	–	5.31 $\pm$ 0.61 %	5.88 $\pm$ 1.09 %
	$t_{2}$	–	3.78 $\pm$ 0.53 %	0.21	–	–
	$t_{3}$	–	3.29 $\pm$ 0.37 %	0.11	–	–
	$t_{4}$	–	3.21 $\pm$ 0.40 %	0.08	–	–
	$t_{5}$	–	3.20 $\pm$ 0.47 %	0.06	–	–
	$t_{6}$	–	3.11 $\pm$ 0.52 %	0.07	–	–
	$t_{7}$	–	2.59 $\pm$ 0.44 %	0.09	–	–
	$t_{8}$	–	2.09 $\pm$ 0.31 %	0.12	–	–
	$t_{9}$	–	1.53 $\pm$ 0.30 %	0.26	–	–
	$t_{10}$	$\checkmark$	0.24 $\pm$ 0.07 %	–	0.51 $\pm$ 0.05 %	0.30 $\pm$ 0.07 %
500	$t_{1}$	–	4.03 $\pm$ 0.39 %	–	–	–
	$t_{2}$	–	2.46 $\pm$ 0.31 %	0.13	–	–
	$t_{3}$	–	1.99 $\pm$ 0.26 %	0.10	–	–
	$t_{4}$	$\checkmark$	0.84 $\pm$ 0.24 %	0.09	–	–
	$t_{5}$	–	1.22 $\pm$ 0.25 %	0.09	–	–
	$t_{6}$	–	1.07 $\pm$ 0.20 %	0.09	–	–
	$t_{7}$	$\checkmark$	0.38 $\pm$ 0.12 %	0.11	–	–
	$t_{8}$	–	1.64 $\pm$ 0.33 %	0.10	–	–
	$t_{9}$	–	1.55 $\pm$ 0.31 %	0.13	–	–
	$t_{10}$	–	2.39 $\pm$ 0.62 %	–	–	–
200	$t_{1}$	$\checkmark$	8.82 $\pm$ 1.36 %	–	10.09 $\pm$ 1.57 %	10.83 $\pm$ 1.83 %
	$t_{2}$	–	7.13 $\pm$ 1.07 %	0.12	–	–
	$t_{3}$	–	6.06 $\pm$ 0.89 %	0.20	–	–
	$t_{4}$	–	5.24 $\pm$ 0.80 %	0.14	–	–
	$t_{5}$	–	4.57 $\pm$ 0.69 %	0.10	–	–
	$t_{6}$	–	4.02 $\pm$ 0.66 %	0.11	–	–
	$t_{7}$	–	3.38 $\pm$ 0.58 %	0.15	–	–
	$t_{8}$	–	2.70 $\pm$ 0.52 %	0.18	–	–
	$t_{9}$	–	2.16 $\pm$ 0.45 %	0.10	–	–
	$t_{10}$	$\checkmark$	1.63 $\pm$ 0.45 %	–	1.77 $\pm$ 0.45 %	2.51 $\pm$ 0.57 %

5 Conclusion

We introduced physics-informed spectral diffusion (PISD), a generative framework for parametric PDEs that operates in a latent space of scaled spectral coefficients and enforces physics-informed constraints during inference using Adam-based updates. By normalizing spectral coefficients according to the data distribution, PISD ensures that the diffusion process remains within a class of functions with controlled Sobolev regularity, allowing PDE guidance throughout the sampling process and yielding significantly lower PDE residuals than existing methods.

Across forward and inverse problems for Poisson, Helmholtz, and Navier–Stokes equations, PISD matches or outperforms existing diffusion-based PDE solvers in reconstruction accuracy while significantly reducing inference time, achieving roughly a 3× speedup compared to FunDPS and a 15× speedup compared to DiffusionPDE.

Overall, PISD provides a physically grounded and computationally efficient approach for generative PDE modeling.

6 Limitations and future work

PISD relies on a spectral representation that must be specified for each PDE, and is naturally suited to regular domains with periodic, homogeneous Dirichlet, or Neumann boundary conditions. Extending the approach to irregular geometries or more general boundary conditions would require techniques such as domain decomposition or alternative function bases. A promising direction for future work is to replace hand-crafted spectral encodings with learned encoder-decoder pairs, for instance using neural operators as in FunDiff [36], which could enable automatic adaptation to diverse problem settings. Additionally, the current framework requires a dataset of solution fields for training. Extending to low-data regimes is an important direction for future work.

Acknowledgments

This work has been partially funded by the National Science Foundation of China (NSFC) under grant number W2531010. Calculations (or parts of them) for this publication were performed on the HPC cluster PALMA II of the University of Münster, subsidised by the DFG (INST 211/667-1). Financial support from Swiss National Science Foundation Grant 10003723 is gratefully acknowledged. Moreover, we gratefully acknowledge the Cluster of Excellence EXC 2044-390685587, Mathematics Münster: Dynamics-Geometry-Structure funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation).

References

[1] Anandkumar, A., Azizzadenesheli, K., Bhattacharya, K., Kovachki, N., Li, Z., Liu, B., and Stuart, A. Neural operator: Graph kernel network for partial differential equations. In ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations (2019).
[2] Bastek, J.-H., Sun, W., and Kochmann, D. M. Physics-informed diffusion models. arXiv:2403.14404 (2024).
[3] Chung, H., Kim, J., McCann, M. T., Klasky, M. L., and Ye, J. C. Diffusion posterior sampling for general noisy inverse problems. In International Conference on Learning Representations (2023).
[4] Ciftci, K., and Hackl, K. A physics-informed GAN framework based on model-free data-driven computational mechanics. Computer Methods in Applied Mechanics and Engineering 424 (2024), 116907.
[5] Einsiedler, M., and Ward, T. Functional Analysis, Spectral Theory, and Applications. Springer International Publishing, 2017.
[6] Franzese, G., Corallo, G., Rossi, S., Heinonen, M., Filippone, M., and Michiardi, P. Continuous-time functional diffusion processes. In Thirty-seventh Conference on Neural Information Processing Systems (2023).
[7] Hagemann, P., Mildenberger, S., Ruthotto, L., Steidl, G., and Yang, N. T. Multilevel diffusion: Infinite dimensional score-based diffusion models for image generation. arXiv:2303.04772 (Mar. 2023).
[8] Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (2020), H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33, Curran Associates, Inc., pp. 6840–6851.
[9] Hu, P., Wang, R., Zheng, X., Zhang, T., Feng, H., Feng, R., Wei, L., Wang, Y., Ma, Z.-M., and Wu, T. Wavelet diffusion neural operator. arXiv:2412.04833 (Dec. 2024).
[10] Huang, J., Yang, G., Wang, Z., and Park, J. J. DiffusionPDE: Generative PDE-solving under partial observation. Advances in Neural Information Processing Systems 37 (2024), 130291–130323.
[11] Jacobsen, C., Zhuang, Y., and Duraisamy, K. CoCoGen: Physically consistent and conditioned score-based generative models for forward and inverse problems. SIAM Journal on Scientific Computing 47, 2 (2025), C399–C425.
[12] Karras, T., Aittala, M., Aila, T., and Laine, S. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems (2022), S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35, Curran Associates, Inc., pp. 26565–26577.
[13] Kerrigan, G., Ley, J., and Smyth, P. Diffusion generative models in infinite dimensions. In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics (25–27 Apr 2023), F. Ruiz, J. Dy, and J.-W. van de Meent, Eds., vol. 206 of Proceedings of Machine Learning Research, PMLR, pp. 9538–9563.
[14] Kingma, D. P., and Ba, J. Adam: A method for stochastic optimization. arXiv:1412.6980 (Dec. 2014).
[15] Kolesnikov, A., Dosovitskiy, A., Weissenborn, D., Heigold, G., Uszkoreit, J., Beyer, L., Minderer, M., Dehghani, M., Houlsby, N., Gelly, S., Unterthiner, T., and Zhai, X. An image is worth 16x16 words: Transformers for image recognition at scale.
[16] Kovachki, N., Li, Z., Liu, B., Azizzadenesheli, K., Bhattacharya, K., Stuart, A., and Anandkumar, A. Neural operator: Learning maps between function spaces with applications to PDEs. Journal of Machine Learning Research 24, 89 (2023), 1–97.
[17] Li, Z., Han, W., Zhang, Y., Fu, Q., Li, J., Qin, L., Dong, R., Sun, H., Deng, Y., and Yang, L. Learning spatiotemporal dynamics with a pretrained generative model. Nature Machine Intelligence 6, 12 (Dec. 2024), 1566–1579.
[18] Li, Z., Kovachki, N. B., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., and Anandkumar, A. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations (2021).
[19] Li, Z., Zheng, H., Kovachki, N., Jin, D., Chen, H., Liu, B., Azizzadenesheli, K., and Anandkumar, A. Physics-informed neural operator for learning partial differential equations. ACM / IMS J. Data Sci. 1, 3 (May 2024).
[20] Lim, J. H., Kovachki, N. B., Baptista, R., Beckham, C., Azizzadenesheli, K., Kossaifi, J., Voleti, V., Song, J., Kreis, K., Kautz, J., Pal, C., Vahdat, A., and Anandkumar, A. Score-based diffusion models in function space. Journal of Machine Learning Research 26, 158 (2025), 1–62.
[21] Lim, S., Yoon, E. B., Byun, T., Kang, T., Kim, S., Lee, K., and Choi, S. Score-based generative modeling through stochastic evolution equations in Hilbert spaces. In Advances in Neural Information Processing Systems (2023), A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, Eds., vol. 36, Curran Associates, Inc., pp. 37799–37812.
[22] Lu, L., Jin, P., Pang, G., Zhang, Z., and Karniadakis, G. E. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence 3, 3 (Mar. 2021), 218–229.
[23] Mammadov, A., Berner, J., Azizzadenesheli, K., Ye, J. C., and Anandkumar, A. Diffusion-based inverse solver on function spaces with applications to PDEs. Machine Learning and the Physical Sciences Workshop at NeurIPS (2024).
[24] Na, K., Lee, J., Yun, S.-Y., and Lim, S. Probability-flow ODE in infinite-dimensional function spaces. arXiv:2503.10219 (Mar. 2025).
[25] Oommen, V., Bora, A., Zhang, Z., and Karniadakis, G. E. Integrating neural operators with diffusion models improves spectral representation in turbulence modeling. arXiv:2409.08477 (Sept. 2024).
[26] Phillips, A., Seror, T., Hutchinson, M. J., Bortoli, V. D., Doucet, A., and Mathieu, E. Spectral diffusion processes. In NeurIPS 2022 Workshop on Score-Based Methods (2022).
[27] Pidstrigach, J., Marzouk, Y., Reich, S., and Wang, S. Infinite-dimensional diffusion models. Journal of Machine Learning Research 25, 414 (2024), 1–52.
[28] Qiu, J., Huang, J., Zhang, X., Lin, Z., Pan, M., Liu, Z., and Miao, F. Pi-fusion: Physics-informed diffusion model for learning fluid dynamics. arXiv:2406.03711 (June 2024).
[29] Raissi, M., Perdikaris, P., and Karniadakis, G. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378 (2019), 686–707.
[30] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 10674–10685.
[31] Shu, D., Li, Z., and Farimani, A. B. A physics-informed diffusion model for high-fidelity flow field reconstruction. Journal of Computational Physics 478 (2023), 111972.
[32] Shysheya, A., Diaconu, C., Bergamin, F., Perdikaris, P., Hernández-Lobato, J. M., Turner, R. E., and Mathieu, E. On conditional diffusion models for PDE simulations. In Advances in Neural Information Processing Systems (2024), A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, Eds., vol. 37, Curran Associates, Inc., pp. 23246–23300.
[33] Sirignano, J., and Spiliopoulos, K. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375 (2018), 1339–1364.
[34] Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (2021).
[35] Vahdat, A., Kreis, K., and Kautz, J. Score-based generative modeling in latent space. In Advances in Neural Information Processing Systems (2021), M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, Eds., vol. 34, Curran Associates, Inc., pp. 11287–11302.
[36] Wang, S., Dou, Z., Liu, T.-R., and Lu, L. FunDiff: Diffusion models over function spaces for physics-informed generative modeling. arXiv:2506.07902 (2025).
[37] Yang, G., and Sommer, S. A denoising diffusion model for fluid field prediction. arXiv:2301.11661 (Jan. 2023).
[38] Yao, J., Mammadov, A., Berner, J., Kerrigan, G., Ye, J. C., Azizzadenesheli, K., and Anandkumar, A. Guided diffusion sampling on function spaces with applications to PDEs. In The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025).

Appendix A Proof of Lemma 3.2

Proof.

The proof of Lemma 3.2 relies on the following elementary connection between Sobolev spaces and Fourier coefficients (cf., e.g., Lemma 5.4 in [5]). We have that

\textstyle\begin{split}H^{k}(\mathbb{T}^{d})=\left\{f\in L^{2}(\mathbb{T}^{d})\colon\sum_{n\in\mathbb{Z}^{d}}\lvert\hat{f}(n)\rvert^{2}\lVert n\rVert_{2}^{2k}<\infty\right\}\end{split}

(23)

and for all $f\in H^{k}(\mathbb{T}^{d})$ the Sobolev norm is equivalent to

\begin{split}\lVert f\rVert_{H^{k}(\mathbb{T}^{d})}^{2}\simeq\sum_{n\in\mathbb{Z}^{d}}\lvert\hat{f}(n)\rvert^{2}\lVert n\rVert_{2}^{2k}.\end{split}

(24)

Note that (12) and the fact that $(\phi_{n})_{n\in\mathbb{Z}^{d}}$ is an orthonormal basis of $L^{2}(\mathbb{T}^{d})$ imply that for all $n\in\mathbb{Z}^{d}$ we have that

\begin{split}\hat{N}(n)=s_{n}Z_{n}.\end{split}

(25)

The assumption that $\mathbb{E}\big[\lVert X_{\mathrm{data}}\rVert_{H^{k}(\mathbb{T}^{d})}^{2}\big]<\infty$, (10), and (24) therefore imply that

\begin{split}\mathbb{E}\left[\lVert N\rVert_{H^{k}(\mathbb{T}^{d})}^{2}\right]&\simeq\mathbb{E}\left[\sum_{n\in\mathbb{Z}^{d}}\lvert\hat{N}(n)\rvert^{2}\lVert n\rVert_{2}^{2k}\right]\\ &=\mathbb{E}\left[\sum_{n\in\mathbb{Z}^{d}}\lvert s_{n}Z_{n}\rvert^{2}\lVert n\rVert_{2}^{2k}\right]\\ &=\sum_{n\in\mathbb{Z}^{d}}(s_{n})^{2}\lVert n\rVert_{2}^{2k}\,\mathbb{E}\left[\lvert Z_{n}\rvert^{2}\right]\\ &=\sum_{n\in\mathbb{Z}^{d}}\mathrm{Var}\big(\widehat{X_{\mathrm{data}}}(n)\big)\lVert n\rVert_{2}^{2k}\\ &\leq\sum_{n\in\mathbb{Z}^{d}}\mathbb{E}\left[\lvert\widehat{X_{\mathrm{data}}}(n)\rvert^{2}\right]\lVert n\rVert_{2}^{2k}\\ &=\mathbb{E}\left[\sum_{n\in\mathbb{Z}^{d}}\lvert\widehat{X_{\mathrm{data}}}(n)\rvert^{2}\lVert n\rVert_{2}^{2k}\right]\\ &\simeq\mathbb{E}\left[\lVert X_{\mathrm{data}}\rVert_{H^{k}(\mathbb{T}^{d})}^{2}\right]<\infty.\end{split}

(26)

∎
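The key inequality in (26), namely that the weighted variances of the Fourier coefficients are bounded by their weighted second moments, can be checked numerically on synthetic data. The following sketch is only an illustration in dimension $d=1$: the "data" distribution, the decay rate, and all variable names are hypothetical, and empirical averages replace the expectations.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_modes, k = 200, 64, 1   # k plays the role of the Sobolev index

# Hypothetical data: random Fourier coefficients with a decaying spectrum
# and a nonzero mean, so that variance and second moment differ.
freqs = np.fft.fftfreq(n_modes, d=1.0 / n_modes)
decay = 1.0 / (1.0 + np.abs(freqs)) ** 3
coeffs = decay * (rng.standard_normal((n_samples, n_modes))
                  + 1j * rng.standard_normal((n_samples, n_modes)))
coeffs = coeffs + decay

weights = np.abs(freqs) ** (2 * k)   # the factor ||n||^{2k}

# s_n^2 is taken as the empirical variance of the n-th coefficient.
s2 = np.var(coeffs, axis=0)

# Left-hand side: expected weighted energy of the scaled noise N.
noise_energy = float(np.sum(s2 * weights))
# Right-hand side: expected weighted energy of the data.
data_energy = float(np.mean(np.sum(np.abs(coeffs) ** 2 * weights, axis=1)))
```

Because the empirical variance never exceeds the empirical second moment, `noise_energy` is at most `data_energy`, mirroring the step from the fourth to the fifth line of (26).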

Appendix B Experiment description.

Dataset and spectral transform preprocessing.

All experiments are conducted using the same datasets introduced in the DiffusionPDE work [10]. The datasets consist of pairs of PDE coefficients and corresponding solutions generated numerically for the target equations at a spatial resolution of $128\times 128$ . For the Navier–Stokes equations, we consider trajectories consisting of 10 consecutive time steps, which correspond to one second of physical time in the underlying simulation. Each training sample therefore contains the full spatio-temporal evolution of the vorticity field over this interval. We use the original training and test splits provided in that work to ensure a fair and direct comparison with existing diffusion-based PDE solvers. Prior to training, all fields are transformed into the spectral domain. For the Poisson and Helmholtz equations, we employ a discrete sine transform, while for the Navier–Stokes equations we use a standard Fourier transform appropriate for the periodic boundary condition.

To reduce the effective dimensionality and focus learning on the dominant modes, we truncate the spectral coefficients. For Poisson and Helmholtz, we retain $44\times 44$ modes using a hyperbolic truncation strategy (see Figure 3), which prioritizes low-frequency modes while gradually reducing high-frequency components.
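One common way to realize a hyperbolic truncation is a hyperbolic-cross mask that keeps the mode $(i,j)$ whenever $(i+1)(j+1)$ stays below a budget. The sketch below illustrates this idea only; the exact rule behind Figure 3 is not reproduced here, and the function name `hyperbolic_mask` and the value of `budget` are hypothetical.

```python
import numpy as np

def hyperbolic_mask(size=44, budget=200):
    """Boolean mask keeping modes (i, j) with (i + 1) * (j + 1) <= budget.

    Both the rule and the budget are illustrative assumptions.
    """
    i, j = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    return (i + 1) * (j + 1) <= budget

mask = hyperbolic_mask()

# Apply the mask to a hypothetical 44 x 44 array of sine coefficients.
rng = np.random.default_rng(0)
coeffs = rng.standard_normal((44, 44))
truncated = np.where(mask, coeffs, 0.0)
kept_fraction = mask.mean()
```

Such a mask keeps all low-frequency modes and thins out modes that are high-frequency in both directions at once, which matches the qualitative description above.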

For Navier–Stokes, we apply fftshift and retain the central $32\times 32$ square of Fourier modes; after inverting the shift, this yields the mode layout shown in Figure 4.
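The shift, crop, and unshift procedure can be sketched as follows. This is a minimal NumPy illustration under the assumption of a square grid with even side length; the helper names `truncate_modes` and `reconstruct` are hypothetical and not taken from the released code.

```python
import numpy as np

def truncate_modes(field, m=32):
    """Keep the central m x m block of Fourier modes of a square 2D field."""
    n = field.shape[0]
    f_hat = np.fft.fftshift(np.fft.fft2(field))   # zero frequency moved to the center
    lo = n // 2 - m // 2
    core = f_hat[lo:lo + m, lo:lo + m]            # frequencies -m/2, ..., m/2 - 1
    return np.fft.ifftshift(core)                 # back to the standard FFT layout

def reconstruct(core, n=128):
    """Zero-pad the retained modes to an n x n grid and invert the transform."""
    m = core.shape[0]
    padded = np.zeros((n, n), dtype=complex)
    lo = n // 2 - m // 2
    padded[lo:lo + m, lo:lo + m] = np.fft.fftshift(core)
    return np.fft.ifft2(np.fft.ifftshift(padded)).real

# A band-limited test field is recovered up to round-off.
n = 128
x = 2 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
field = np.sin(3 * X) + np.cos(5 * Y) * np.sin(2 * X)
core = truncate_modes(field, m=32)
rel_err = np.linalg.norm(reconstruct(core, n) - field) / np.linalg.norm(field)
```

For fields with energy outside the retained block, `rel_err` measures the reconstruction error caused by the truncation.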

All subsequent training and inference operations are performed in this truncated spectral space. The training objective is a standard denoising diffusion loss applied to the truncated coefficients, ensuring that the model captures the dominant physical structures while remaining computationally efficient.

The truncation of spectral coefficients is motivated by two considerations. First, retaining only the dominant modes results in negligible reconstruction error: even when reducing the full $128\times 128$ resolution to a $44\times 44$ coefficient grid for Poisson and Helmholtz, or $32\times 32$ for Navier–Stokes, the reconstructed fields closely match the original solutions. Second, the diffusion model then operates in a much lower-dimensional space, so it can learn and generate high-fidelity solutions at reduced computational cost.

Note that for the Poisson and Helmholtz equations, the sine transform requires that the input satisfies homogeneous Dirichlet boundary conditions. The coefficient fields $a$, however, do not satisfy this constraint in general. To address this, we pad $a$ with four outer layers whose values decrease gradually to $0$ at the boundary, enforcing consistency with the sine transform. This step allows the truncated $44\times 44$ coefficient grid to encode the essential information of the solution while remaining compatible with the sine transform, ensuring accurate training and inference.
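A padding of this kind can be sketched with NumPy's `np.pad` in `linear_ramp` mode, which ramps the edge values linearly down to a prescribed end value. The linear profile is an assumption made for illustration; the text above only states that the values decrease gradually to $0$, and the function name `pad_to_zero_boundary` is hypothetical.

```python
import numpy as np

def pad_to_zero_boundary(a, layers=4):
    """Pad a 2D field with `layers` outer layers that ramp linearly to 0.

    The linear ramp is an illustrative choice, not necessarily the profile
    used in the experiments.
    """
    return np.pad(a, pad_width=layers, mode="linear_ramp", end_values=0.0)

a = np.full((128, 128), 2.0)          # hypothetical coefficient field
a_padded = pad_to_zero_boundary(a)    # outermost layer is exactly 0
```

After padding, the outermost layer vanishes, so the padded field is compatible with the homogeneous Dirichlet condition assumed by the sine transform.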

Training details.

All training runs were performed on 4 NVIDIA GeForce RTX 2080 GPUs. Training took approximately 3 hours for the Poisson and Helmholtz datasets and about 10 hours for the Navier–Stokes dataset, where we model the solution at all time steps and the truncated spectral representation is therefore larger.

The network architecture is based on a Vision Transformer (ViT), with adaptations for Navier–Stokes to handle the larger input size. We also experimented with a U-Net architecture, which provided comparable results in terms of accuracy. In all cases, our model has 2M parameters, compared with 54M parameters in the networks of DiffusionPDE and FunDPS.

Inference details.

For the Poisson and Helmholtz equations, we perform 100 independent simulations in each experimental setting to estimate variability, while for Navier–Stokes we use 50 simulations. The principal metric we report is the relative error, while we also evaluate the PDE residual to quantify how well the generated solutions satisfy the underlying differential equations.

Our method consistently produces lower PDE residuals compared to baseline approaches, thanks to the accurate computation of derivatives in Fourier space. For example, in Figure 5 we compare the Laplacian of the solution $u$ computed with our Fourier-based derivatives versus the finite-difference approximations used in other methods. The figure illustrates that our approach captures the differential structure more accurately, which directly contributes to improved PDE consistency.
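The difference between spectral and finite-difference derivatives can be illustrated on a toy example. The sketch below uses a periodic test function and the FFT for simplicity, whereas the Poisson and Helmholtz experiments rely on a sine transform; the grid size and test function are arbitrary choices made for illustration.

```python
import numpy as np

n = 64
x = 2 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
u = np.sin(3 * X) * np.sin(2 * Y)
lap_exact = -13.0 * u                 # Laplacian of u: -(3^2 + 2^2) u

# Spectral Laplacian: multiply the Fourier coefficients by -||k||^2.
k = np.fft.fftfreq(n, d=1.0 / n)
kx, ky = np.meshgrid(k, k, indexing="ij")
lap_spectral = np.fft.ifft2(-(kx**2 + ky**2) * np.fft.fft2(u)).real

# Second-order five-point finite-difference Laplacian with periodic wrap-around.
h = x[1] - x[0]
lap_fd = (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
          + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1) - 4.0 * u) / h**2

err_spectral = np.max(np.abs(lap_spectral - lap_exact))
err_fd = np.max(np.abs(lap_fd - lap_exact))
```

For a band-limited function the spectral Laplacian is exact up to round-off, while the finite-difference error decays only quadratically in the grid spacing.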

The guidance coefficients used during inference vary slightly depending on the task and number of observations. We report the values for the Poisson and Helmholtz equations in Table 5 and for the Navier–Stokes equations in Table 6, providing a complete reference for reproducibility. Additionally, Table 7 reports the time required to generate a single solution on a GeForce RTX 2080 Ti for Poisson and Helmholtz problems. A direct comparison of generation time for Navier–Stokes is not possible, as our method generates the full temporal trajectory rather than only the initial and final states, as done by the other approaches; generating one full solution trajectory requires approximately 7 minutes.

Table 5: Guidance coefficients used for Poisson and Helmholtz problems under different tasks and observations.

Case	Obs.	$\boldsymbol{\zeta_{u}}$	$\boldsymbol{\zeta_{a}}$	$\boldsymbol{\zeta_{\text{PDE}}}$
Forward	500	0	0.05	0.0005
	1000	0	0.05	0.0005
	Full	0	0.05	0.0001
Inverse	500	20	0	0.00005
	1000	20	0	0.00005
	Full	40	0	0.000005
Double	100	40	0.05	0.0002
Double	200	40	0.05	0.0002

Table 6: Guidance coefficients used for the Navier–Stokes equations with sparse observations.

Obs.	$\boldsymbol{\zeta_{\text{obs}}}$	$\boldsymbol{\zeta_{\text{PDE}}}$
200	0.0001	0.5
500	0.0001	0.5

Table 7: Time (in seconds) to generate one solution on a GeForce RTX 2080 Ti.

PDE	PISD (ours)	DiffusionPDE	FunDPS
Poisson	52	802	153
Helmholtz	52	802	171

Frequency-aware Adam guidance.

During inference, PDE and observation constraints are enforced via gradient-based guidance. Unlike standard diffusion guidance, which typically uses plain gradient descent, we employ a frequency-aware variant of the Adam optimizer. The first- and second-moment estimates are maintained across diffusion steps, and updates are modulated by frequency-dependent weights to prioritize physically meaningful low-frequency modes.

Specifically, let $\hat{X}_{k}$ denote the Fourier coefficient at mode $k$ , $g_{k}$ the gradient, and $w_{k}$ the frequency weight. The update at step $t$ is computed as:

$\displaystyle m_{k}^{(t)}$	$\displaystyle=\beta_{1}\,m_{k}^{(t-1)}+(1-\beta_{1})g_{k},$	(27)
$\displaystyle v_{k}^{(t)}$	$\displaystyle=\beta_{2}\,v_{k}^{(t-1)}+(1-\beta_{2})g_{k}^{2},$	(28)
$\displaystyle\hat{X}_{k}^{(t+1)}$	$\displaystyle=\hat{X}_{k}^{(t)}-\eta_{k}\frac{m_{k}^{(t)}}{\sqrt{v_{k}^{(t)}}+\epsilon},$	(29)

where the effective learning rate $\eta_{k}$ depends on the frequency mode:

\eta_{k}=\begin{cases}\text{lr}_{\text{low}}\cdot w_{k},&\text{for low frequencies}\\ \text{lr}_{\text{high}}\cdot w_{k},&\text{for high frequencies}.\end{cases}

(30)

The parameters used in our experiments are:

• $\beta_{1}=0.985$ and $\beta_{2}=0.98$ for all sparse or partial observation cases;
• $\beta_{1}=0.97$ and $\beta_{2}=0.98$ for fully observed scenarios;
• $\text{lr}_{\text{low}}=0.2$ and $\text{lr}_{\text{high}}=0.01$.

This formulation allows the low-frequency modes to be updated aggressively, capturing the main structure of the solution, while high-frequency modes are updated conservatively to reduce noise amplification. This design is important for stabilizing PDE-constrained diffusion and achieving low PDE residuals in all tested scenarios.
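The update rule (27)–(30) can be sketched in NumPy as follows. This is an illustration under several assumptions that are not specified above: the latent coefficients are treated as a real-valued array, a mode counts as low-frequency when both indices lie below a hypothetical `cutoff`, the weights $w_{k}$ default to one, and $\epsilon=10^{-8}$. Following (29), no bias correction is applied. The names `frequency_learning_rates` and `FrequencyAwareAdam` are hypothetical.

```python
import numpy as np

def frequency_learning_rates(shape, cutoff, lr_low=0.2, lr_high=0.01, weights=None):
    """Per-mode learning rate eta_k as in (30); cutoff and weights are assumptions."""
    i, j = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), indexing="ij")
    low = np.maximum(i, j) < cutoff
    w = np.ones(shape) if weights is None else weights
    return np.where(low, lr_low, lr_high) * w

class FrequencyAwareAdam:
    """Adam-style update (27)-(29) whose moments persist across diffusion steps."""

    def __init__(self, eta, beta1=0.985, beta2=0.98, eps=1e-8):
        self.eta, self.beta1, self.beta2, self.eps = eta, beta1, beta2, eps
        self.m = np.zeros_like(eta)   # first-moment estimate
        self.v = np.zeros_like(eta)   # second-moment estimate

    def step(self, x_hat, grad):
        self.m = self.beta1 * self.m + (1.0 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * grad**2
        return x_hat - self.eta * self.m / (np.sqrt(self.v) + self.eps)

# Toy usage: a few guidance steps on the loss sum(x^2), whose gradient is 2x.
eta = frequency_learning_rates((8, 8), cutoff=4)
opt = FrequencyAwareAdam(eta)
x_hat = np.ones((8, 8))
for _ in range(5):
    x_hat = opt.step(x_hat, grad=2.0 * x_hat)
```

In an actual sampler, `grad` would be the gradient of the weighted observation and PDE losses with respect to the latent coefficients at the current diffusion step, and the same optimizer state would be reused from one step to the next.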

Appendix C Additional results.

Comparison between gradient descent and Adam guidance.

We compare standard gradient descent guidance with Adam-based guidance for the Poisson equation under forward and inverse problem settings, using different numbers of observations. The results are summarized in Table 8. Across all configurations, Adam consistently achieves lower relative error and reduced PDE residuals compared to gradient descent. In addition, Adam exhibits substantially lower variance, indicating improved stability during inference.

The advantages of Adam guidance are particularly pronounced in the inverse problem, where gradient descent guidance often fails to converge to meaningful solutions, leading to large reconstruction errors and high variability. In contrast, Adam effectively balances the competing guidance terms and seems to act as a regularizer. These results support the use of adaptive optimization methods for diffusion guidance, rather than standard gradient descent, especially when enforcing PDE constraints.

Table 8: Comparison of Adam and gradient descent (SGD) guidance for the Poisson equation.

Case	Obs.	Adam			SGD
		Rel. err.	PDE res.	Obs. rel. err.	Rel. err.	PDE res.	Obs. rel. err.
Forward	500	3.08 $\pm$ 1.71 %	0.87	0.11 %	3.60 $\pm$ 3.04 %	12.35	10.99 %
	1000	1.47 $\pm$ 0.90 %	2.31	0.10 %	2.12 $\pm$ 1.34 %	12.66	12.30 %
	Full	0.05 $\pm$ 0.03 %	3.78	2.38 %	0.90 $\pm$ 0.43 %	13.26	13.55 %
Inverse	500	13.81 $\pm$ 3.11 %	0.45	0.12 %	49.25 $\pm$ 15.42 %	7.73	5.85 %
	1000	12.09 $\pm$ 2.68 %	0.44	0.12 %	50.52 $\pm$ 15.56 %	19.65	6.28 %
	Full	7.95 $\pm$ 1.71 %	1.33	0.10 %	50.41 $\pm$ 15.59 %	22.23	6.02 %

Inference process figures.

Here we compare the inference process of our method with that of DiffusionPDE [10].

Qualitative results.

We present qualitative examples of solutions generated by PISD, DiffusionPDE, and FunDPS under sparse observation regimes. Figures 7 and 8 illustrate forward and inverse problems for Poisson and Helmholtz equations with $500$ observations, showing reconstructions of both the solution $u$ and the coefficient $a$, together with the corresponding pointwise error maps.

Figure 9 shows a Navier–Stokes example conditioned only on sparse observations at the first and last time steps. The model successfully reconstructs the full spatio-temporal evolution of the flow, producing coherent intermediate dynamics that satisfy the governing equations. These qualitative results complement the quantitative comparisons and highlight the ability of PISD to enforce PDE constraints while maintaining global consistency under sparse supervision.