This repository contains the code for Softmax Policy Mirror Ascent (SPMA), a policy optimization algorithm based on mirror ascent in the space of logits with the log-sum-exp mirror map. The repository includes scripts that can be integrated into stable-baselines3 to reproduce the experiments from the AISTATS 2025 paper *Fast Convergence of Softmax Policy Mirror Ascent*.
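To illustrate the idea, here is a minimal sketch of one mirror-ascent step in logit space with the log-sum-exp mirror map, shown for a single state (a bandit). It is an illustrative derivation, not the repository's implementation: with Φ(z) = log-sum-exp(z) we have ∇Φ(z) = softmax(z), so the mirror step multiplies the current policy by (1 + η·A) and renormalizes. The function name `spma_step` and the clipping safeguard are assumptions for this sketch.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def spma_step(logits, advantages, eta):
    """One illustrative mirror-ascent step in logit space.

    With the log-sum-exp mirror map, grad Phi(z) = softmax(z), so the
    dual update maps pi_t to pi_{t+1} proportional to pi_t * (1 + eta * A).
    Assumes eta is small enough that 1 + eta * A stays positive; the clip
    below is a safeguard added for this sketch.
    """
    pi = softmax(logits)
    new_pi = pi * (1.0 + eta * advantages)
    new_pi = np.clip(new_pi, 1e-12, None)  # guard against non-positive mass
    new_pi /= new_pi.sum()                 # renormalize to a distribution
    return np.log(new_pi)                  # back to logits

# Example: a 3-armed bandit with a uniform initial policy.
logits = np.zeros(3)
advantages = np.array([1.0, 0.0, -1.0])
new_logits = spma_step(logits, advantages, eta=0.5)
```

After the step, probability mass shifts toward the action with the highest advantage while the policy remains a valid distribution.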
Installation is identical to that of stable-baselines3 (PyTorch version), so no additional steps are required.