Real time filtering algorithms

Chang Qin (qinchang24@mails.ucas.ac.cn), State Key Laboratory of Optical Field Manipulation Science and Technology, Institute of Optics and Electronics, Chinese Academy of Sciences
Chengdu 610209, China
Key Laboratory of Optical Engineering, Chinese Academy of Sciences
Chengdu 610209, China
University of Chinese Academy of Sciences
Beijing 101408, China
Yikun Li (liyikun24@mails.ucas.ac.cn), State Key Laboratory of Optical Field Manipulation Science and Technology, Institute of Optics and Electronics, Chinese Academy of Sciences
Chengdu 610209, China
Key Laboratory of Optical Engineering, Chinese Academy of Sciences
Chengdu 610209, China
University of Chinese Academy of Sciences
Beijing 101408, China
Ru Qian (qianru25@mails.ucas.ac.cn), State Key Laboratory of Optical Field Manipulation Science and Technology, Institute of Optics and Electronics, Chinese Academy of Sciences
Chengdu 610209, China
Key Laboratory of Optical Engineering, Chinese Academy of Sciences
Chengdu 610209, China
University of Chinese Academy of Sciences
Beijing 101408, China
Jiayi Kang (kangjiayi@bimsa.cn), Hetao Institute of Mathematics and Interdisciplinary Sciences (HIMIS)
Shenzhen 518000, Guangdong, P. R. China
Beijing Institute of Mathematical Sciences and Applications (BIMSA)
Beijing 101408, P. R. China
Yao Mao (maoyao@ioe.ac.cn), State Key Laboratory of Optical Field Manipulation Science and Technology, Institute of Optics and Electronics, Chinese Academy of Sciences
Chengdu 610209, China
Key Laboratory of Optical Engineering, Chinese Academy of Sciences
Chengdu 610209, China
University of Chinese Academy of Sciences
Beijing 101408, China


Abstract

This paper presents a systematic review of recent advances in nonlinear filtering algorithms, structured into three principal categories: Kalman-type methods, Monte Carlo methods, and the Yau-Yau algorithm. For each category, we provide a comprehensive synthesis of theoretical developments, algorithmic variants, and practical applications that have emerged in recent years. Importantly, this review addresses both continuous-time and discrete-time system formulations, offering a unified review of filtering methodologies across different frameworks. Furthermore, our analysis reveals the transformative influence of artificial intelligence breakthroughs on the entire nonlinear filtering field, particularly in areas such as learning-based filters, neural network-augmented algorithms, and data-driven approaches.

Keywords: Nonlinear filtering, Kalman filtering, Yau-Yau algorithm, Deep learning



1 Introduction

Nonlinear filtering is widely recognized as a fundamental and challenging problem in modern signal processing, control theory, and machine learning. The objective is to estimate the hidden state of a dynamical system based on noisy observations, where either the system dynamics, observation model, or both exhibit nonlinear characteristics. This problem pervades numerous applications including robotics [robotics2011], autonomous vehicles [LIU2018605], financial modeling [wells2013kalman], biomedical signal processing [biomedical2015], and aerospace systems [saturno2025fbg].

The journey of nonlinear filtering began with the seminal work of Kalman in 1960, which provided an optimal solution for linear Gaussian systems. However, real-world systems rarely conform to these idealized assumptions. Most practical systems exhibit nonlinear dynamics [wan2000unscented], non-Gaussian noise distributions [11175567], and model uncertainties [bai2018extended], necessitating the development of more sophisticated filtering techniques.

In contrast to Kalman-type methods, a series of Monte Carlo methods have been developed for nonlinear filtering [rigatos2009particle]. These techniques approach the problem by generating random samples to approximate the posterior probability distribution of the system state. This sampling-based perspective offers a powerful and versatile framework for addressing complex nonlinear scenarios where traditional methods struggle.

Recent advances have shown that machine learning approaches, particularly neural networks, can significantly enhance traditional filtering methods, leading to hybrid frameworks that combine the theoretical rigor of classical methods with the flexibility and learning capabilities of modern data-driven approaches.

This survey provides a comprehensive review of the evolution from classical nonlinear filtering techniques to state-of-the-art neural network-enhanced methods. We organize our discussion into three main paradigms:

1. Kalman-type methods that extend the classical Kalman filter to handle nonlinearity through various approximation strategies (Section 2);
2. Monte Carlo methods that use particle-based representations to approximate complex probability distributions (Section 3);
3. The Yau-Yau algorithm and neural network methods (Section 4).

Finally, Section 5 presents conclusions and future prospects for the study of real-time filtering algorithms.

2 Kalman-Type Methods

The Kalman filter (KF) is widely applied to linear systems, where it provides optimal estimates under Gaussian assumptions [kalman]. However, practical systems often exhibit nonlinear dynamics and non-Gaussian uncertainties that violate the linear Gaussian assumption. As a result, the performance of the standard KF can deteriorate significantly, and the filter may even diverge, when it is applied to nonlinear systems. To address these limitations, several nonlinear variants of the KF have been developed. This section reviews these methodologies in three main categories: extended filtering approaches, optimization-driven filtering innovations, and hybrid multi-model filtering techniques.

2.1 Extended Filtering Approaches

The Extended Kalman Filter (EKF) was developed as an early solution to nonlinear filtering problems [ribeiro2004kalman]. It uses a first-order Taylor expansion to linearize system nonlinearities and employs the Jacobian matrix for covariance propagation. This approach maintains the recursive form and computational efficiency of the original Kalman filter, making it suitable for real-time applications. However, the EKF is limited by its dependence on first-order approximation accuracy, often introducing significant errors in highly nonlinear systems. Additionally, the requirement for analytical Jacobian computation can be impractical for complex models. Despite these shortcomings, the EKF remains widely adopted in navigation, robotics, and control systems owing to its simplicity and low computational cost [potokar2021invariant, reina2019vehicle].
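The predict/update cycle described above can be sketched as follows. This is a minimal illustration, assuming a discrete-time model $x_{k}=f(x_{k-1})+w_{k}$, $z_{k}=h(x_{k})+v_{k}$ with additive noise covariances `Q` and `R`; the function name `ekf_step` and its argument names are hypothetical and not taken from the cited works.

```python
import numpy as np

def ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R):
    """One EKF cycle for the assumed model x_k = f(x_{k-1}) + w, z_k = h(x_k) + v."""
    # Predict: the mean goes through f, the covariance through the Jacobian of f.
    F = F_jac(x)
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    # Update: linearize h around the predicted mean (first-order Taylor expansion).
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R                    # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T        # gain P_pred H^T S^{-1} (S, P_pred symmetric)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(x.size) - K @ H) @ P_pred
    return x_new, P_new
```

When `f` and `h` are linear, the Jacobians are constant and this step coincides with the standard KF recursion, which is why the EKF keeps the same recursive form and cost.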

In contrast to the EKF’s reliance on Taylor series linearization, the Unscented Kalman Filter (UKF) was developed to more accurately approximate the state probability distribution through the use of the Unscented Transformation (UT) [julier1997new]. The UKF strategically selects a set of sigma points based on the current state mean and covariance, each assigned with carefully chosen weights. These points are then propagated through the exact nonlinear functions, and the resulting transformed points are used to compute the posterior mean and covariance. This approach enables the UKF to avoid linearization errors and achieve estimation accuracy equivalent to a second-order Taylor expansion, while maintaining computational efficiency.
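A minimal sketch of the unscented transformation may help make the sigma-point idea concrete. It assumes the basic symmetric sigma-point set with a single scaling parameter `kappa` (one common choice among several); the function name `unscented_transform` and the use of a Cholesky factor as the matrix square root are illustrative assumptions.

```python
import numpy as np

def unscented_transform(mean, cov, g, kappa=0.0):
    """Propagate (mean, cov) through g using 2n+1 symmetric sigma points."""
    n = mean.size
    # Columns of the Cholesky factor of (n + kappa) * cov are the sigma-point offsets.
    L = np.linalg.cholesky((n + kappa) * cov)
    points = np.vstack([mean, mean + L.T, mean - L.T])      # shape (2n+1, n)
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    # Each point goes through the exact nonlinear function, with no Jacobian.
    Y = np.array([g(p) for p in points])
    y_mean = weights @ Y
    d = Y - y_mean
    y_cov = (weights[:, None] * d).T @ d
    return y_mean, y_cov
```

For a linear map `g(x) = A x + b` the returned moments equal `A @ mean + b` and `A @ cov @ A.T` exactly, which is a convenient sanity check before applying the transform to a nonlinear model.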

Building upon the sigma-point framework, the Cubature Kalman Filter (CKF) was proposed as a more systematically derived alternative with improved numerical stability [arasaratnam2009cubature]. Unlike the UKF’s heuristic sigma-point selection, the CKF derives its sampling points through the third-degree spherical-radial cubature rule, which integrates Gaussian-weighted polynomials exactly up to third degree and approximates Gaussian-weighted integrals of general nonlinear functions. It employs $2n$ symmetrically distributed points with equal weights, where $n$ is the state dimension, and propagates them through the nonlinear functions. Compared to the UKF, the CKF has a stronger theoretical foundation, avoids heuristic parameter tuning, and demonstrates better numerical stability, especially in high-dimensional state estimation. Its deterministic sampling and small number of points further support efficient implementation in embedded and real-time systems [sharma2017cubature, li2020application].
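As an illustration of the $2n$ equally weighted points, the sketch below generates the third-degree cubature point set for a Gaussian with given mean and covariance. The function name `cubature_points` is hypothetical, and the Cholesky factor is assumed as the matrix square root.

```python
import numpy as np

def cubature_points(mean, cov):
    """Return the 2n third-degree cubature points and their equal weights."""
    n = mean.size
    S = np.linalg.cholesky(cov)                                # cov = S S^T
    # Unit directions +/- sqrt(n) e_j, j = 1..n, stacked as columns.
    unit = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])     # shape (n, 2n)
    points = (mean[:, None] + S @ unit).T                      # shape (2n, n)
    weights = np.full(2 * n, 1.0 / (2 * n))
    return points, weights
```

The weighted sample mean and covariance of these points reproduce `mean` and `cov` exactly; there is no free scaling parameter to tune, in contrast to the sigma-point set of the UKF.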

Each of the three classical nonlinear filtering algorithms mentioned above exhibits unique advantages, leading to their development and adoption across diverse specialized fields, which are summarized in Table 1.

Niche fields	EKF	UKF	CKF
Outlier-robust	[qiu2023outlier, tao2024stochastic]	[liu2024convolutional, nakabayashi2019nonlinear]	[wang2022computationally, wang2022outlier]
Model uncertainties	[zhao2017dynamic, ghobadi2017robust]	[deng2019adaptive]	[tasooji2025cubature]
Non-Gaussian noise	[liu2019linear]	[hu2024robust, shen2023stochastic]	[lin2024adaptive, li2021robust]
Unknown noise	[huang2017new, he2020method]	[shen2023stochastic, hu2020unscented]	[yan2024variational]
Distributed system	[lu2019novel, li2017distributed]	[murata2020extended, yang2019dynamic]	[qu2025cooperative, zhou2023distributed]
Continuous-discrete system	[guihal2021efficient]	[knudsen2018new]	[arasaratnam2010cubature, wang2019new]
Communication constrained	[song2019variance, li2023event]	[shen2023stochastic]	[kooshkbaghi2019event, li2018stochastic]

Table 1: Comparison of classical Kalman-type methods

Besides, several nonlinear filtering methods beyond EKF, UKF, and CKF have been developed to address complex estimation problems. To address bearing-only measurement (BOM) challenges, the Pseudo-linear Kalman Filter (PLKF) was introduced as an efficient solution for target tracking within the Kalman filtering framework [lingren1978position]. By converting angular measurements into a pseudo-linear form using trigonometric transformations, PLKF eliminates the need for complex logical rules to manage quadrant ambiguity, significantly improving the stability and reliability of the estimation process. Due to the bias caused by statistical correlation between the pseudo-linear measurement matrix and angular measurements in PLKF, improved versions such as Bias-Compensated PLKF (BCPLKF) and Instrumental Variable-based PLKF (IVPLKF) have been developed [nguyen2017improved, nguyen2018instrumental]. Based on the concept of the Extended State Observer (ESO), an Extended State-Based Kalman Filter (ESKF) was proposed to handle nonlinear uncertain systems by augmenting the nonlinear term as part of the state vector and estimating it in real time [bai2018extended].

2.2 Optimization-Driven Filtering Innovations

The KF estimate can be equivalently interpreted as the optimal solution to a covariance minimization problem. In its standard form, the KF employs the minimum mean square error (MMSE) criterion as the cost function for estimates in linear Gaussian systems. Over the past decade, various alternative optimization criteria have been developed to extend the KF to more complex tasks. For instance, to address non-Gaussian noise, a family of robust correntropy-based cost functions has been introduced. These are used in nonlinear filtering methods within the framework of Information Theoretic Learning (ITL) to construct robust similarity measures [chen2017maximum, li2023generalized]. Furthermore, Huang et al. generalized this approach by replacing the Gaussian kernel function with a statistical similarity measure, which is admissible if the similarity function is continuous, monotonically decreasing, and has a non-negative second-order derivative [huang2020novel]. In contrast to robust cost function approaches, another line of research has focused on variational Bayesian methods, which model complex distributions, such as Student’s t-distribution, to better approximate real-world noise [bai2020novel, huang2017robust]. Within this framework, the variational distribution is fitted to the true posterior by minimizing the Kullback-Leibler (KL) divergence between the two. Besides, the KF can also be derived from the perspective of maximum a posteriori (MAP) estimation. Therefore, a series of iterated KFs has been proposed within the MAP framework. To address state-dependent multiplicative noise in observations, a Generalized Iterated Kalman Filter (GIKF) was developed [hu2015generalized]. It employs a Newton-type optimization framework along with explicit multiplicative noise modeling and is shown theoretically to attain the Cramér–Rao Bound. Furthermore, an extension called the Improved Iterated CKF (IICKF) employs a damped Newton method with adaptive step size control to ensure convergence while maintaining computational efficiency [CUI2017460].

Recent advances in distributed Kalman filtering have been dominated by optimization-theoretic approaches that reformulate the filtering problem as distributed optimization. Three key methodologies represent this trend: Ryu and Back proposed a consensus-based optimization framework that recovers centralized performance asymptotically through dual decomposition, relaxing traditional requirements for local observability [ryu2023consensus]. Building on this, Iqbal et al. developed an ADMM-based algorithm that eliminates dual variable exchanges and establishes tight stability bounds, significantly improving communication efficiency [iqbal2025communication]. Finally, Calvo-Fullana and How introduced a mission-aware censoring scheme using Value of Information criteria within a windowed MAP estimation framework, achieving substantial communication reduction while preserving estimation accuracy [calvo2022mission]. Together, these methods demonstrate how modern optimization techniques can simultaneously address estimation performance, communication efficiency, and resource constraints in distributed filtering systems.

2.3 Hybrid Approaches and Multi-Model Filtering

The current frontier in state estimation involves the integration of multiple filtering paradigms within unified frameworks. Recent innovations include adaptive model selection algorithms that automatically switch between different filtering approaches based on system conditions [PAL2024111301, gaoInteractingMultipleModel2017]. For example, the Adaptive High-Order Extended Kalman Filter (AHEKF) adjusts its order according to the innovation function to balance computational burden and estimation accuracy [CHEN2021105539]. Additionally, hybrid filtering strategies that combine different methods have been developed to handle complex systems, such as those with pre-existing or sudden sensor faults and systems exhibiting multiple degradation phases [aswalSwitchingKalmanFilter2022, 7358146, ZHAO201840]. These approaches focus on addressing the challenges of complex system filtering but still face issues such as determining optimal switching thresholds and mitigating the effects of transitions between models [10138793]. Some studies have also integrated neural networks with traditional filtering methods to achieve higher estimation accuracy [AHMADI2025137752, 9855832].

Hierarchical filtering architectures, which operate simultaneously across multiple time scales, have shown strong performance in complex and heterogeneous systems. These methods can estimate both model parameters and system states at different temporal resolutions, thereby improving model accuracy and stability [WEI20171264, 10286165]. Furthermore, they can be extended to handle heterogeneous sensor networks. For instance, a multi-rate Kalman filtering approach based on data fusion was proposed to integrate biased high-frequency acceleration measurements with low-frequency displacement data [zhengDataFusionBased2019]. Such methods effectively address the challenge of mismatched sampling rates among different sensors, which is of great practical importance in real-world applications [shenMultirateStrongTracking2022, zhaoDistributedRecursiveFiltering2022].

2.4 Summary

Despite these substantial algorithmic innovations and practical successes, a fundamental theoretical limitation persists across all Kalman-type methodologies: the absence of rigorous convergence analysis and stability guarantees. This theoretical gap necessitates continuous algorithmic refinements and ad-hoc modifications—ranging from outlier-robust formulations to adaptive parameter tuning schemes—to maintain performance across varying operational conditions. Consequently, while these methods provide powerful empirical solutions, their deployment in safety-critical and mission-critical systems remains constrained by the lack of formal convergence assurance and predictable stability bounds.

3 Monte Carlo Methods in Nonlinear Filtering

Many real-world systems are nonlinear and non-Gaussian [daum2005nonlinear], making them difficult to describe with linear models or normal distributions. Kalman-type methods often require linearization, which can reduce accuracy in complex systems. In contrast, Monte Carlo-based filters, such as Particle Filters (PFs), estimate the state without relying on linearization or Gaussian assumptions [wang2017survey]. PFs approximate the system’s posterior distribution using particles, which allows them to handle low- and medium-dimensional nonlinear systems well. However, PFs suffer from poor real-time performance in high-dimensional systems, and challenges such as particle degeneracy and low sampling efficiency persist despite continued advances. To overcome these, the Feedback Particle Filter (FPF) was developed, introducing a feedback mechanism based on innovation errors to improve particle effectiveness and algorithm stability [yang2011meanfield, yang2014cd_fpf]. However, the FPF has not fully resolved the poor real-time performance of PFs in high-dimensional systems. This section reviews classical PF methods and their variants and then highlights the FPF.

3.1 Particle Filtering Advances

The classic and most influential PF is the Bayesian bootstrap filter proposed in [gordon1993bootstrap]. Its update step is implemented with a weighted bootstrap, which gives the filter its name. Unlike Markov chain approximation methods or other standard discretization schemes for the Fokker-Planck Equation (FPE), PFs avoid defining grids in the state space, and samples naturally concentrate in regions of high probability density [budhiraja2007survey]. In fact, implementing a PF requires neither stochastic calculus nor the FPE, nor numerical methods for solving systems of partial differential equations. The essence of PFs is to use Monte Carlo sampling in place of these tools, representing the required Probability Density Function (PDF) as a set of random samples rather than as a function over the state space. As the number of samples increases, they provide an increasingly accurate representation of the desired PDF. Estimates of the various moments of the state vector’s PDF, as well as Highest Posterior Density (HPD) intervals or mode estimates, can be obtained from the samples [gordon1993bootstrap]. Furthermore, PFs can handle nonlinear system and measurement models with noise of arbitrary distribution. The development of PFs has a long history; see [doucet2001smc] for more details.

A basic form of the classic PF includes two steps:

• Prediction step: Based on the system’s dynamic model, each particle is propagated to simulate its state at the current time.
• Update step: Based on the observation data, the weight of each particle is updated according to the likelihood of the observation given the particle’s state. Finally, the posterior state is approximated by these weighted particles as follows:

$p(x_{k}|z_{1:k})\approx\sum_{i=1}^{N}w_{k}^{i}\delta(x_{k}-x_{k}^{i})$

where $w_{k}^{i}$ is the normalized weight of the $i$-th particle, $x_{k}^{i}$ is the position of the $i$-th particle at time $k$, and $\delta(x_{k}-x_{k}^{i})$ is the Dirac delta function centered at the position of each particle.
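The two steps can be sketched as one function. This is a minimal bootstrap-style illustration in which the dynamic model itself serves as the proposal, followed by multinomial resampling; `propagate` and `likelihood` are user-supplied callables, and all names here are hypothetical rather than taken from [gordon1993bootstrap].

```python
import numpy as np

def bootstrap_pf_step(particles, z, propagate, likelihood, rng):
    """One prediction/update cycle of a bootstrap-style particle filter."""
    # Prediction: push every particle through the (stochastic) dynamic model.
    particles = propagate(particles, rng)
    # Update: weight each particle by the likelihood of the observation z.
    w = likelihood(z, particles)
    w = w / w.sum()
    # Posterior mean from the weighted particle approximation.
    estimate = w @ particles
    # Weighted bootstrap: draw N particles with probability proportional to w.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], estimate
```

With a standard normal prior, no process noise, and a unit-variance Gaussian likelihood centered at the observation $z=1$, the returned estimate should be close to the analytic posterior mean $0.5$ for a large particle count.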

While PFs excel in handling nonlinear and non-Gaussian problems, they face challenges such as the curse of dimensionality [daum2005nonlinear], particle degeneracy [budhiraja2007survey], and poor real-time performance in high-dimensional systems. The curse of dimensionality, coined by Richard Bellman, refers to the exponential increase in computational complexity as the state space’s dimensionality grows. In high-dimensional spaces, a large number of particles is needed to accurately capture the state distribution, leading to a significant rise in computational cost. A detailed analysis of PF’s computational complexity for a given estimation accuracy can be found in [daum2003curse]. Additionally, particle degeneracy occurs when particle weights concentrate on only a few particles, reducing diversity and compromising estimation accuracy. To counter this, resampling techniques are used to restore the effectiveness of the particle set. There has now been a series of works aimed at reducing the effects of particle degeneracy and the curse of dimensionality, leading to a rich variety of PFs, rather than ‘the’ PF. These advances optimize steps like proposal density, sampling methods, and resampling strategies, aiming to improve PF performance in high-dimensional, nonlinear, and noisy environments. Table 2 offers a detailed review of the techniques and advancements in PFs that help tackle these challenges.
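Degeneracy is commonly monitored through the effective sample size, and resampling is triggered only when it falls below a threshold. The sketch below shows this diagnostic together with systematic resampling, one widely used low-variance scheme. The threshold policy and function names are illustrative assumptions, not prescriptions from the cited works.

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights; N if uniform, 1 if degenerate."""
    return 1.0 / np.sum(weights ** 2)

def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling."""
    n = len(weights)
    # One uniform draw, then n evenly spaced positions in [0, 1).
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                       # guard against round-off
    idx = np.searchsorted(cumulative, positions, side="right")
    return np.minimum(idx, n - 1)
```

A typical (assumed) usage is to resample only when `effective_sample_size(w) < 0.5 * len(w)`, which limits both the cost of resampling and the loss of particle diversity.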

Aspect	Impact	Improvement
Proposal Density	Determines the effectiveness of particle sampling in the state space. Gaussian proposals can leave Monte Carlo samples poorly distributed in the state space.	Use more complex proposal distributions, such as mixtures of Gaussians [raihan2018particleA, raihan2018particleB] and other exponential-family components.
Sampling Methods	Traditional sampling methods can lead to particle degeneracy, especially in high-dimensional spaces.	Use advanced sampling methods like Metropolis-Hastings [dahlin2019getting] and Gibbs sampling [sun2025metropolis].
Resampling	Frequent resampling increases computational costs and may lose useful information.	Implement sparse and adaptive resampling [aunsri2021adaptive] to reduce computational costs without sacrificing accuracy.
Variance Reduction	PFs are often subject to high variance, especially when data noise is large or the system is complex, leading to fluctuations in estimation results.	Apply variance reduction methods such as stratified sampling, control variates, and antithetic variables to effectively reduce variance [capriotti2008reducing, song2023monte, li2022stratification].

Table 2: Variants of the PF

Although these variants alleviate particle degeneracy and the curse of dimensionality to some extent, they do not resolve the poor real-time performance in high-dimensional systems. PFs converge asymptotically but should not be regarded as real-time solutions: dimensionality and sampling variance prevent bounded-latency computation of the exact posterior.

3.2 Feedback Particle Filter

Existing PFs have not fundamentally overcome these issues. Notably, these PFs lack the KF’s feedback structure based on innovation errors, a structure that is as important as the algorithm itself [yang2014feedback]. Without it, achieving scalable and cost-effective solutions is difficult. This section introduces the FPF, which retains the KF feedback structure and incorporates optimal control theory to optimize the state estimation of the particle system. This method generates a new particle system model with control inputs. Compared with other PFs, the FPF typically offers higher accuracy at lower computational cost. Beyond the FPF discussed here, other control-based nonlinear filtering methods are also gaining attention; see [crisan2009approximate, daum2010generalized, pequito2011nonlinear, ma2011generalizing] for related work.

In the FPF, the model of the $i$-th particle with control input is defined as follows:

$dX_{t}^{i}=a(X_{t}^{i})\,dt+\sigma(X_{t}^{i})\,dB_{t}^{i}+dU_{t}^{i},$

where $X_{t}^{i}\in\mathbb{R}^{d}$ represents the state of the $i$-th particle at time $t$, $U_{t}^{i}$ is the corresponding control input, and $\{B_{t}^{i}\}$ are independent standard Wiener processes. Additional assumptions are made about the permissible forms of the control input.

The conditional distribution of a particle $X_{t}^{i}$ given $\mathcal{F}_{t}$ is denoted by $p$. For any measurable set $A\subset\mathbb{R}^{d}$, we have

$\int_{x\in A}p(x,t)\,dx=\mathbb{P}\{X_{t}^{i}\in A\mid\mathcal{F}_{t}\}.$

The true posterior of the system state $X_{t}$ is represented by $p^{*}$ . The initial conditions $\{X_{0}^{i}\}_{i=1}^{N}$ are assumed to be i.i.d. and drawn from the initial distribution $p^{*}(x,0)$ of $X_{0}$ , thus $p(x,0)=p^{*}(x,0)$ .

The goal of the FPF is to choose the control input $U_{t}^{i}$ such that $p$ approximates $p^{*}$ and, consequently, the empirical distribution $p^{(N)}$ of the $N$ particles approximates $p^{*}$ for large $N$. The synthesis of the control input is formulated as a variational problem, with the Kullback-Leibler (KL) divergence serving as the cost function. The optimal control input is obtained by analyzing the first variation, leading to an explicit formula for the optimal control input, ensuring that $p=p^{*}$ under optimal control.
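To make the role of the control input concrete, the following sketch implements one Euler step of a scalar FPF in which the control is a gain times an innovation error. Several ingredients are assumptions not stated in the text above: the observation model $dZ_{t}=h(X_{t})\,dt+dW_{t}$ with unit noise intensity, the innovation error $dI_{t}^{i}=dZ_{t}-\tfrac{1}{2}(h(X_{t}^{i})+\hat{h}_{t})\,dt$, and the constant-gain approximation of the gain function, which replaces the exact gain by the empirical covariance between $h(X)$ and $X$. The function name is hypothetical.

```python
import numpy as np

def fpf_constant_gain_step(X, dZ, dt, a, sigma, h, rng):
    """One Euler step of a scalar FPF with the constant-gain approximation.

    Assumed observation model (not given in the text): dZ = h(X) dt + dW.
    """
    hX = h(X)
    h_hat = hX.mean()                          # particle estimate of E[h(X_t)]
    # Constant-gain approximation: empirical covariance of h(X) and X.
    K = np.mean((hX - h_hat) * X)
    # Innovation error of each particle, the feedback signal.
    dI = dZ - 0.5 * (hX + h_hat) * dt
    dB = rng.normal(0.0, np.sqrt(dt), size=X.shape)
    # Particle model dX = a dt + sigma dB + dU with dU = K dI.
    return X + a(X) * dt + sigma(X) * dB + K * dI
```

For a linear Gaussian problem the constant gain is exact in the large-$N$ limit, so the particle mean and variance should follow the Kalman-Bucy filter; for general nonlinear $h$ it is only an approximation of the optimal gain.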

In essence, FPF improves on traditional PFs by incorporating error-based feedback, making it more robust, efficient, and suitable for complex systems. Here is a brief summary of the advantages of FPF compared to PFs [yang2013feedback]:

• Innovation Error-Based Feedback: FPF uses an error-based feedback structure, similar to the KF, enhancing robustness and better handling uncertainties in nonlinear systems.
• No Resampling: FPF eliminates the need for resampling, avoiding particle degeneracy and improving stability and efficiency.
• Variance Reduction: The feedback structure reduces variance, improving accuracy and lowering computational costs.

In addition, there are two extensions of FPF. To address nonlinear filtering problems with data association uncertainty, the classic Probabilistic Data Association Filter (PDAF) is extended to obtain PDA-FPF [yang2012joint]. To handle nonlinear filtering problems with model association uncertainty, the classic KF-based Interacting Multiple Model Filter (IMM) is extended, resulting in IMM-FPF [yang2013interacting]. Subsequently, the FPF can also be combined with optimal transport theory to obtain related algorithms [TaghvaeiOT, kang2022optimal, kang2023finite, taghvaei2023survey, kang2025].

Although the FPF avoids resampling and restores innovation feedback, it remains an interacting-particle approximation requiring large ensembles and approximate gains. Thus, the FPF improves upon PFs but does not remove the real-time performance constraints intrinsic to particle-based methods. These limitations highlight the need for a fundamentally different approach, one that can deliver an exact posterior in real time. Section 4 presents the Yau-Yau filter, which uniquely satisfies this criterion.

3.3 Summary

From a real-time algorithmic perspective, while PF methods suffer from computational inefficiencies due to particle degeneracy and scaling limitations, FPF offers notable improvements through deterministic particle evolution and reduced computational overhead. However, FPF’s advantages are primarily realized in Gaussian and near-Gaussian systems where optimal feedback gains can be analytically determined. For general nonlinear systems, FPF still requires complex feedback function design and substantially large particle populations to maintain estimation accuracy, thereby negating its computational benefits and further highlighting the fundamental real-time performance constraints inherent to particle-based filtering methodologies.

4 Conditional Density Evolution Methods

The Duncan, Mortensen, and Zakai (DMZ) equation, independently derived by Duncan [Duncan1967], Mortensen [Mortensen1966], and Zakai [Zakai1969] in the late 1960s, represents a stochastic partial differential equation (PDE) that governs the evolution of the unnormalized conditional probability density function in continuous-time nonlinear filtering problems. Solving the DMZ equation numerically allows for the computation of optimal state estimates, such as conditional expectations, via normalization and integration. This equation has been pivotal in bridging PDE theory with filtering algorithms, enabling the application of numerical PDE methods to nonlinear filtering [florchinger1991time, baras1983existence, atar1999robustness, gobet2006discretization, crisan2022application].

Direct methods provide an alternative for solving the DMZ equation [yau1994new, yau1996direct, yau2001finite, hu2002finite], and they are particularly effective for Yau filtering systems, in which the drift term is affine with a smooth potential function. Generalizations transform the equation into time-varying Schrödinger forms for linear-growth cases or solve it via ODEs for Gaussian initial distributions [yau2003explicit, chen2017direct]. Gaussian approximation algorithms decompose arbitrary distributions into Gaussians, facilitating Kolmogorov equation solutions through ODEs [shi2018direct, chendirect2018].

Given the real-time demands in applications like aerospace engineering, the efficiency of solving the DMZ equation is crucial for continuous filtering algorithms. In general, the solution of the DMZ equation does not have a closed form. Yau and Yau [yau2000real, yau-Yau2008] made a significant advance in addressing this challenge by proposing a decomposed computational approach to the DMZ equation. Their method partitions the solution process into two distinct phases: an online stage requiring only straightforward exponential operations, and an offline stage handling the numerically intensive Kolmogorov forward equation (KFE). This decomposition strategy forms the foundation of what we term the Yau-Yau algorithmic framework. The Yau-Yau filtering algorithm enables the systematic resolution of nonlinear filtering problems through partial differential equation theory and algorithms. For instance, by leveraging the Yau-Yau filtering algorithm, the convex maximum principle from PDE theory can be extended to the concept of convex filters in nonlinear filtering [kang2023]. This framework enables the transformation of any nonlinear filtering problem into a PDE numerical computation problem, offering rigorous theoretical foundations. Recent convergence analyses can be found in [kang2025explicit, sun2025convergence]. However, traditional numerical PDE methods are fundamentally constrained by the curse of dimensionality, which prevents conventional PDE-based filtering algorithms from serving as universal solutions for high-dimensional, highly nonlinear filtering applications. After years of development, a series of real-time algorithms based on the Yau-Yau filter have emerged. An important distinction among the different Yau-Yau filtering algorithms lies in the numerical method used to solve the PDE, such as finite difference methods [yueh2014efficient] and spectral methods [luo2013complete, luo2013hermite, Dong2021].
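The offline/online split can be illustrated with a small one-dimensional sketch. It is not the authors’ implementation: it assumes the scalar model $dx_{t}=f(x_{t})\,dt+dv_{t}$, $dy_{t}=h(x_{t})\,dt+dw_{t}$ with unit noise intensities, takes the KFE in the form $u_{t}=\tfrac{1}{2}u_{xx}-(fu)_{x}-\tfrac{1}{2}h^{2}u$, and uses an implicit-Euler central finite-difference discretization with zero boundary values. All function names are hypothetical.

```python
import numpy as np

def build_offline_propagator(x, f, h, dtau):
    """Offline stage: one-step propagator of the discretized KFE on a uniform grid.

    Assumed KFE form: u_t = 0.5*u_xx - (f u)_x - 0.5*h^2*u (unit noise intensities).
    """
    n, dx = x.size, x[1] - x[0]
    fx, hx = f(x), h(x)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = -1.0 / dx**2 - 0.5 * hx[i] ** 2
        if i > 0:
            A[i, i - 1] = 0.5 / dx**2 + fx[i - 1] / (2.0 * dx)
        if i < n - 1:
            A[i, i + 1] = 0.5 / dx**2 - fx[i + 1] / (2.0 * dx)
    # Implicit Euler: u(tau + dtau) = (I - dtau*A)^{-1} u(tau); independent of the data.
    return np.linalg.inv(np.eye(n) - dtau * A)

def yau_yau_online_step(u, M, hx, dy, dx):
    """Online stage: apply the stored propagator, then the exponential correction."""
    u = M @ u                      # precomputed KFE solve, a matrix-vector product
    u = np.exp(hx * dy) * u        # exponential update with the observation increment dy
    return u / (u.sum() * dx)      # normalize to a probability density on the grid
```

Because the propagator does not depend on the observations, it can be computed once before filtering starts; each online step then costs one matrix-vector product and one pointwise exponential. The grid size still grows exponentially with the state dimension, which is the curse of dimensionality noted above.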

4.1 Neural Network Revolution in Nonlinear Filtering

In recent years, filtering has undergone a profound shift from traditional model-based filtering techniques to data-driven approaches [klushyn2021latent, sun2025recurrent]. Classical methods, such as the KF and its variants, offer strong interpretability, reliable uncertainty quantification through covariance matrices, and low computational complexity under linear Gaussian assumptions. However, their efficacy relies heavily on accurate state-space (SS) models, which are challenging to obtain in practice. Real-world systems frequently exhibit nonlinearity, non-Gaussian noise, and model-reality mismatches, resulting in key limitations:

• Inevitable modeling errors from approximated dynamics;
• Complex and often unknown noise distributions;
• Degraded performance in highly nonlinear systems;
• Increased computational latency in nonlinear variants.

These shortcomings have spurred the exploration of neural network-based methods, which can learn intricate mappings between states and observations directly from data. Deep learning excels in high-dimensional, nonlinear, and non-Gaussian settings, serving as a powerful complement to traditional KF-based approaches [revach2022kalmannet].

4.2 Deep Learning Integration with Classical Filtering

Machine learning, particularly deep learning, has revolutionized fields like computer vision, natural language processing, and speech recognition by promoting a data-driven paradigm. In this approach, complex neural networks supplant simplistic analytical models, enabling end-to-end training without explicit approximations. This paradigm is especially advantageous when system models are unknown or overly intricate [becker2019recurrent, krishnan2017structured]. In state estimation, adopting deep neural networks (DNNs) directly addresses the limitations of model-based methods. However, purely data-driven DNNs pose challenges in signal processing contexts:

• High resource demands: Overparameterized DNNs require substantial computational power and large datasets, limiting deployment on resource-constrained devices;
• Limited interpretability: The black-box nature of DNNs obscures the reasoning behind predictions;
• Poor generalization and uncertainty handling: DNNs falter under distribution shifts and often lack robust uncertainty estimates.

To mitigate these issues, research has evolved from replacing KFs with DNNs to hybrid “model-driven + data-driven” frameworks [Shlezinger2023Model]. Current integration strategies can be categorized as follows:

• DNNs preprocess raw data into features compatible with known SS models for subsequent KF application [klushyn2021latent, Coskun2017Long];
• DNNs learn SS models from data to inform KF operations [Imbiriba2024Augmented];
• KFs are reparameterized as trainable ML modules, enabling supervised [revach2022kalmannet] or unsupervised learning [Ghosh2024DANSE].

This fusion represents a major paradigm shift, allowing neural networks to infer system dynamics and observation models from data, often bypassing explicit mathematical formulations. Modern techniques incorporate transformers and attention mechanisms to manage variable-length sequences and multi-scale temporal dependencies, yielding breakthroughs in applications like financial time series analysis, where non-stationary patterns challenge classical methods. Hybrid models have found success in diverse domains, including brain-computer interfaces, acoustic echo cancellation, financial monitoring, wireless beam tracking, and UAV surveillance.
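The third integration strategy, in which the KF recursion is kept but part of it becomes trainable, can be illustrated with a deliberately small toy example. The sketch below is not KalmanNet [revach2022kalmannet], which uses a recurrent network to produce a time-varying gain. Here a single constant scalar gain stands in for the trainable module and is fitted by finite-difference gradient descent on labeled trajectories. The scalar model, its parameters, and all function names are assumptions made for illustration.

```python
import numpy as np


def simulate(a, q, r, steps, rng):
    """Scalar linear-Gaussian model: x_k = a*x_{k-1} + w_k, y_k = x_k + v_k."""
    x, y = np.zeros(steps), np.zeros(steps)
    state = 0.0
    for k in range(steps):
        state = a * state + np.sqrt(q) * rng.standard_normal()
        x[k] = state
        y[k] = state + np.sqrt(r) * rng.standard_normal()
    return x, y


def filter_with_gain(y, a, gain):
    """KF predict/update recursion with the gain treated as a free parameter."""
    est, x_hat = np.zeros_like(y), 0.0
    for k, obs in enumerate(y):
        x_pred = a * x_hat                      # model-based prediction
        x_hat = x_pred + gain * (obs - x_pred)  # update with the learned gain
        est[k] = x_hat
    return est


def train_gain(x, y, a, lr=0.05, epochs=100, eps=1e-4):
    """Supervised fit of the gain by finite-difference gradient descent on MSE."""
    def loss(g):
        return np.mean((filter_with_gain(y, a, g) - x) ** 2)

    gain = 0.5
    for _ in range(epochs):
        grad = (loss(gain + eps) - loss(gain - eps)) / (2.0 * eps)
        gain = float(np.clip(gain - lr * grad, 0.0, 1.0))
    return gain


def steady_state_gain(a, q, r, iters=500):
    """Model-based reference: iterate the scalar Riccati recursion."""
    p = q
    for _ in range(iters):
        p_pred = a * a * p + q
        gain = p_pred / (p_pred + r)
        p = (1.0 - gain) * p_pred
    return gain


def run_gain_demo(seed=0):
    """Train the gain without using q or r, then compare with the KF gain."""
    rng = np.random.default_rng(seed)
    a, q, r = 0.9, 1.0, 1.0
    x, y = simulate(a, q, r, 2000, rng)
    return train_gain(x, y, a), steady_state_gain(a, q, r), x, y
```

The training routine never sees the noise variances `q` and `r`; it only needs paired states and observations. In this linear-Gaussian toy setting the learned gain should land near the steady-state Kalman gain, which illustrates how a data-driven component can recover what the model-based filter would compute when the noise statistics are unknown.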

4.3 Deep Learning Integration with the Yau-Yau Algorithm

While high-dimensional problems remain challenging, advances in artificial intelligence and deep learning have shown promise in addressing them [chen2023, Tao2023, wang2021deep, Fu2023, Jiao2023, Wang2022, Yin2020]. By leveraging data-driven RNN frameworks, Chen et al. achieve a complete numerical implementation of the Yau-Yau algorithm for high-dimensional, highly nonlinear problems [chen2025]. The approach is supported by rigorous mathematical analysis that establishes its theoretical correspondence with the Yau-Yau algorithm and provides convergence guarantees. Furthermore, the analysis demonstrates that this filtering algorithm can theoretically overcome the curse of dimensionality. This result addresses a long-standing open problem in nonlinear filtering that had remained unresolved for decades.

Next, we summarize the Yau-Yau filtering algorithms integrated with neural networks from two perspectives: 1) Yau-Yau filtering enhanced by neural networks, and 2) Yau-Yau filtering implemented via neural networks. The core difference is that the former strengthens specific steps in the practical computation of the Yau-Yau algorithm with targeted neural network methods, whereas the latter realizes the entire computational process of Yau-Yau filtering with neural networks. An example of the first approach is the use of physics-informed neural networks to solve the PDE, thereby replacing the offline part of the Yau-Yau filtering algorithm. An example of the second approach is the use of recurrent neural networks, among other architectures, to construct a complete end-to-end training framework.

Comparing the two approaches, the first replaces only certain computational components and therefore offers better interpretability. The second, being built entirely on more mature neural network frameworks, excels in training convenience and practical effectiveness. More importantly, the robust theoretical framework of the Yau-Yau filtering algorithm makes it possible to explain the theoretical advantages of architectures based on recurrent neural networks. The work of [chen2025] is significant in this respect, as it bridges advanced filtering theory with cutting-edge neural network implementation methods.

4.4 Summary

Recent years have witnessed a paradigm shift from model-based filtering to data-driven approaches. Classical methods like the Kalman filter, despite their interpretability and theoretical rigor, suffer from modeling errors, unknown noise distributions, and degraded performance in nonlinear systems. Deep neural networks address these limitations by learning complex mappings directly from data, excelling in high-dimensional, nonlinear, and non-Gaussian settings. However, purely data-driven approaches face challenges including high computational demands, limited interpretability, and poor generalization under distribution shifts.

Contemporary research emphasizes hybrid “model-driven + data-driven” frameworks that integrate neural networks with classical filtering theory. Notably, Chen, Sun, and Yau achieved a breakthrough by implementing the high-dimensional Yau-Yau algorithm through data-driven RNN frameworks [chen2025], providing mathematical guarantees for convergence while theoretically overcoming the curse of dimensionality. This advancement bridges advanced filtering theory with modern neural network implementations, offering a definitive solution to longstanding challenges in high-dimensional nonlinear filtering.

5 Conclusion

This survey reveals fundamental trade-offs inherent in existing nonlinear filtering methodologies. Kalman-type methods generally satisfy real-time computational requirements, but they suffer from poor accuracy and lack theoretical guarantees in highly nonlinear systems, which necessitates extensive algorithmic extensions and ad hoc modifications. Monte Carlo approaches are effective for nonlinear problems but typically fail to meet real-time constraints in high-dimensional scenarios. While the FPF enhances particle utilization efficiency and overcomes the curse of dimensionality for linear models, it remains inadequate for strongly nonlinear systems. The Yau-Yau algorithm represents a significant breakthrough, providing the first complete theoretical convergence guarantees for the broadest class of nonlinear filtering systems. However, direct implementation through traditional numerical PDE methods remains constrained by the curse of dimensionality. Leveraging DNN-based techniques, [chen2025] presents a comprehensive RNN implementation of the Yau-Yau algorithm. This approach both retains theoretical convergence guarantees and overcomes the curse of dimensionality. The method demonstrates exceptional numerical performance, thereby establishing a unified solution that reconciles theoretical rigor with practical computational efficiency for high-dimensional nonlinear filtering.

6 Future Work

We conclude this review by highlighting several open challenges in nonlinear filtering that offer promising directions for future research:

1. Adaptive AI-enhanced Filtering for Time-Varying SS Models: Traditional AI-enhanced KFs assume fixed observation models that align between training and deployment stages. Future studies should develop flexible methods to handle mismatched or evolving state and measurement equations without clear patterns.
2. Non-Markovian State Dynamics: Current methods, as discussed in this review, rely on Markovian state transitions. A key opportunity lies in creating estimators that integrate short- and long-term dependencies to handle non-Markovian models.
3. Non-Gaussian and Manifold-Constrained Filtering: Standard algorithms generally assume Gaussian noise and Euclidean state spaces. Future work could focus on advanced techniques for general noise distributions and manifold-based state equations, improving accuracy in complex systems.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 42450242).