As a guest user you are not logged in or recognized by your IP address. You have
access to the Front Matter, Abstracts, Author Index, Subject Index and the full
text of Open Access publications.
Incorporating auxiliary objectives into Reinforcement Learning allows agents to acquire additional knowledge, thereby increasing their search for the optimal policy. This article presents the Model-Predictor Proximal Policy Optimization (MP-PPO) algorithm, which merges the concepts of various PPO variants with a Transformer probabilistic prediction module. This model capitalizes on the time dependence inherent in energy management systems, predicting future state transitions by learning to predict certain state characteristics. Notably, our algorithm seamlessly integrates this predictive capability into the Actor-Critic architecture, avoiding the need for an external model. Through experiments on real data, we demonstrate that integrating predictive capabilities for partial state prediction improves both the sample effectiveness and efficiency of the original PPO approach without requiring exterior prior information.
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.