UNISafe: Uncertainty-aware Latent Safety Filters for Avoiding Out-of-Distribution Failures

Carnegie Mellon University
Under Review, 2025

Abstract

Recent advances in generative world models have enabled classical safe control methods, such as Hamilton-Jacobi (HJ) reachability, to generalize to complex robotic systems operating directly from high-dimensional sensor observations. However, obtaining comprehensive coverage of all safety-critical scenarios during world model training is extremely challenging. As a result, latent safety filters built on top of these models may miss novel hazards and even fail to prevent known ones, overconfidently misclassifying risky out-of-distribution (OOD) situations as safe. To address this, we introduce an uncertainty-aware latent safety filter that proactively steers robots away from both known and unseen failures. Our key idea is to use the world model's epistemic uncertainty as a proxy for identifying unseen potential hazards. We propose a principled method to detect OOD world model predictions by calibrating an uncertainty threshold via conformal prediction. By performing reachability analysis in an augmented state space--spanning both the latent representation and the epistemic uncertainty--we synthesize a latent safety filter that can reliably safeguard arbitrary policies from both known and unseen safety hazards. In simulation and hardware experiments on vision-based control tasks with a Franka manipulator, we show that our uncertainty-aware safety filter preemptively detects potential unsafe scenarios and reliably proposes safe, in-distribution actions.
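As background for the reachability analysis mentioned above, the safety value function in this setting can be written over an augmented state $s=(z,u)$ of latent state $z$ and epistemic uncertainty $u$. The following is a hedged sketch in our own notation of the standard discounted safety Bellman formulation from prior HJ reachability work, not necessarily the paper's exact objective:

$$ V(s) \;=\; (1-\gamma)\,\ell(s) \;+\; \gamma \min\Big\{\, \ell(s),\; \max_{a}\, V\big(f(s,a)\big) \Big\}, $$

where $\ell(s)$ is a failure margin that is negative both on known failure states and when $u$ exceeds the calibrated threshold, $f$ denotes the latent dynamics, and a state is treated as unsafe when $V(s) \le 0$. As $\gamma \to 1$, this recovers the undiscounted reachability value $V(s) = \min\{\ell(s), \max_a V(f(s,a))\}$.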


Challenge: Unreliable World Models Can Result in OOD Failures

Challenge illustration

While latent safety filters can compute control strategies that prevent hard-to-model failures, their training and runtime filtering rely on imagined futures generated by the latent dynamics model. However, a pretrained world model can hallucinate in uncertain scenarios where it lacks knowledge, leading to OOD failures.

Consider the simple example in the figure above, where a Dubins car must avoid two failure sets: a grey circular region and a purple rectangular region. The world model is trained on RGB images of the environment and angular-velocity actions, but its training data is limited and never shows the robot entering the purple failure set. When the world model imagines an action sequence in which the robot enters this region, it hallucinates as soon as the scenario goes out-of-distribution: the imagined robot teleports away from the failure region to a safe state. This phenomenon leads to latent safety filters that cannot prevent unseen failures, or even known ones, because they produce optimistic safety estimates for uncertain, out-of-distribution scenarios.

UNISafe: UNcertainty-aware Imagination for Safety Filtering

Static diagram

(Left): We quantify the world model’s epistemic uncertainty for detecting unseen failures in latent space and calibrate an uncertainty threshold via conformal prediction, resulting in an OOD failure set. (Center): Uncertainty-aware latent reachability analysis synthesizes a safety monitor and fallback policy that steers the system away from both known and OOD failures. (Right): Our safety filter reliably safeguards arbitrary task policies during hard-to-model vision-based tasks, like a teleoperator playing the game of Jenga.
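As a rough illustration of the left panel, epistemic uncertainty is commonly quantified as disagreement across an ensemble of one-step latent-dynamics predictors; the sketch below assumes a hypothetical list `ensemble` of such predictors and is only an illustrative proxy, not the paper's exact estimator.

import numpy as np

def epistemic_uncertainty(ensemble, z, a):
    """Disagreement among K hypothetical one-step latent predictors f_k(z, a).
    Illustrative proxy only, not the exact estimator used in the paper."""
    preds = np.stack([f_k(z, a) for f_k in ensemble])  # (K, latent_dim)
    # Mean per-dimension variance across ensemble members: high disagreement
    # flags state-action pairs the world model has not learned well.
    return preds.var(axis=0).mean()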


Benchmark Safe Control Task with a 3D Dubins Car

We first conduct experiments on a low-dimensional benchmark safe-navigation task where privileged information about the state, dynamics, safe set, and safety controller is available.

OOD failure visualization

UNISafe reliably identifies the OOD failure: To evaluate OOD detection, we first consider a setting where failure states are never observed in $\mathcal{D}_{\mathrm{train}}$. The ground-truth failure set is defined as $|p_y|>0.6$, while the offline dataset contains only 1,000 safe trajectories that never enter this region—making the failure set entirely OOD. As shown in Fig. X, our method reliably infers the OOD failure set from the quantified uncertainty, and the resulting safety value function accurately identifies the unsafe region.
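As a sketch of how an uncertainty threshold can be calibrated, the snippet below uses split conformal prediction over per-step uncertainty scores; the miscoverage level `alpha` and the helper names are illustrative assumptions, not the paper's exact procedure.

import numpy as np

def calibrate_ood_threshold(calib_scores, alpha=0.05):
    """Split-conformal threshold: for in-distribution data, a fresh prediction's
    uncertainty score falls below `delta` with probability at least 1 - alpha.
    `calib_scores` are uncertainty scores on a held-out in-distribution set."""
    n = len(calib_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n   # finite-sample conformal quantile level
    return np.quantile(calib_scores, min(q, 1.0), method="higher")

def in_ood_failure_set(uncertainty, delta):
    # Latent states whose predictive uncertainty exceeds the calibrated
    # threshold are treated as OOD failures during reachability analysis.
    return uncertainty > delta

Taking the union of this OOD failure set with the known failure set is what lets the reachability analysis steer the system away from both.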


Quantitative evaluation

UNISafe robustly learns safety filters despite high uncertainties in the world models: We evaluate whether our method can synthesize a robust safety filter under uncertainty due to limited data coverage. Here, the vehicle must avoid a circular obstacle of radius $0.5m$ at the center, with the failure set defined as $p_x^2 + p_y^2 < 0.5^2$, and $\mathcal{D}_{\mathrm{train}}$ consists of both safe and unsafe trajectories. We construct a dataset of 1,000 expert trajectories that never enter the ground-truth unsafe set and 50 random trajectories that may include failure states. Expert trajectories are generated using the ground-truth safety value—applying fallback actions near the unsafe boundary and random actions elsewhere—inducing high uncertainty around the boundary. Fig. Y shows that UNISafe robustly learns the safety monitor with higher balanced accuracy, whereas the baseline overconfidently misclassifies unsafe states as safe. In rollouts from 181 challenging safe initial states (all oriented toward failure), UNISafe also achieves higher safety rates.
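For reference, the balanced-accuracy metric used above can be computed as the mean of the monitor's recall on unsafe and on safe states; the small helper below is a sketch, and the label conventions are ours.

import numpy as np

def balanced_accuracy(pred_unsafe, true_unsafe):
    """Mean of sensitivity (recall on unsafe states) and specificity (recall on
    safe states); robust to the imbalance between safe and unsafe labels."""
    pred_unsafe = np.asarray(pred_unsafe, dtype=bool)
    true_unsafe = np.asarray(true_unsafe, dtype=bool)
    tpr = (pred_unsafe & true_unsafe).sum() / max(true_unsafe.sum(), 1)
    tnr = (~pred_unsafe & ~true_unsafe).sum() / max((~true_unsafe).sum(), 1)
    return 0.5 * (tpr + tnr)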

Simulation: Block Plucking

We scale our method to a visual manipulation task using IsaacLab, where a Franka manipulator must pluck the middle block from a stack of three while ensuring the top block remains stacked on the bottom one. Observations consist of images from a wrist-mounted camera and a tabletop camera, along with 7-D proprioceptive inputs. Actions consist of a 6-DoF end-effector delta pose and a discrete gripper command.
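Concretely, the observation and action interface described above might be organized as below; the key names and groupings are illustrative assumptions for exposition, not the released environment's API.

# Illustrative layout of the block-plucking task interface (assumed names).
observation = {
    "wrist_rgb": "RGB image from the wrist-mounted camera",
    "table_rgb": "RGB image from the tabletop camera",
    "proprio":   "7-D proprioceptive vector",
}
action = {
    "ee_delta_pose": "6-DoF end-effector delta pose (continuous)",
    "gripper":       "discrete open/close command",
}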


UNISafe minimizes failures by preventing safety overestimation: UNISafe, which incorporates both known and OOD failures, achieves the lowest failure rates and model errors. In contrast, LatentSafe, which does not incorporate OOD failures, overestimates the safety of OOD actions, leading to unsafe action proposals.


Hardware Experiments: Vision-based Jenga with a Robotic Manipulator

We evaluate our method on a real-world robotic manipulation task using a fixed-base Franka Research 3 arm equipped with a third-person camera and a wrist-mounted camera. The robot must extract a target block from the tower without collapsing it, and then place the block on top.

Jenga experiment result

Teleoperator Playing Jenga with Safety Filters. UNISafe enables non-conservative yet effective filtering of the teleoperator’s actions, ensuring the system remains within in-distribution regions. In contrast, the uncertainty-unaware safety filter (LatentSafe) optimistically treats uncertain actions as safe, leading to failure.
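A minimal sketch of this filtering logic (least-restrictive, value-based switching) is shown below; `wm.imagine`, `value_fn`, and `fallback_policy` are hypothetical handles, not the released API.

def filter_action(wm, value_fn, fallback_policy, z, a_teleop):
    """Execute the teleoperator's action only if the uncertainty-aware safety
    value of the imagined next latent state is non-negative; otherwise apply
    the learned fallback action. Illustrative sketch, not the exact implementation."""
    z_next = wm.imagine(z, a_teleop)   # one-step latent prediction of the proposed action
    if value_fn(z_next) >= 0.0:        # predicted to stay safe and in-distribution
        return a_teleop                # least restrictive: keep the human's action
    return fallback_policy(z)          # steer back toward safe, in-distribution states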

Video Results: Robot Jenga with Latent Safety Filters

Our latent safety filter (UNISafe) allows stable block removal that is safe and predictable.

UNISafe reliably corrects the teleoperator by proposing in-distribution safe actions.

The uncertainty-unaware latent safety filter (LatentSafe) fails due to its optimistic imagination of futures, which drives the system into high-uncertainty states.


System-level Failure Detection (Generalization with Visual OOD Inputs)

OOD Visual inputs. Although the block colors differ from those seen during training, such visual variations do not necessarily imply out-of-distribution inputs. Instead, the decision to halt is based on the reliability of the filtering system. If the color change falls within the model’s generalization capacity, the latent dynamics model remains accurate, and its predictive uncertainty stays below the safety threshold. In contrast, when the visual input significantly departs from the training distribution, the model’s predictions become unreliable. The resulting increase in uncertainty causes the safety filter to trigger a halt, preventing potentially unsafe actions.
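In code, this system-level halt amounts to checking imagined rollouts against the calibrated threshold before execution. The sketch below reuses the hypothetical `wm.imagine` and uncertainty helpers from the earlier sketches and is only illustrative.

def system_level_check(wm, uncertainty_fn, delta, z, plan):
    """Before executing a candidate action sequence, roll it out in imagination
    and halt if any step's epistemic uncertainty exceeds the calibrated
    threshold `delta`, i.e., if the filter's own predictions are unreliable."""
    for a in plan:
        if uncertainty_fn(z, a) > delta:
            return "halt"        # visual input is effectively OOD; stop instead of acting
        z = wm.imagine(z, a)     # advance the imagined latent state
    return "proceed"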

BibTeX

@article{seo2025uncertainty,
  title={Uncertainty-aware Latent Safety Filters for Avoiding Out-of-Distribution Failures},
  author={Seo, Junwon and Nakamura, Kensuke and Bajcsy, Andrea},
  journal={arXiv preprint arXiv:2505.00779},
  year={2025}
}