Design of Learning Algorithms based on Deep
Learning Techniques for a class of Non-Linear
Systems
Abstract
The control of non-linear systems is a significant challenge due to model parameter uncer-
tainties and external disturbances. Learning algorithms, particularly Deep Reinforcement
Learning (DRL), offer a promising data-driven solution. However, their application is of-
ten hindered by the manual, labour-intensive processes of designing reward functions
and hand-crafting state features. This project addresses these challenges by proposing a
framework to automate both processes. The core methodology involves first extracting
salient features from raw system data using deep learning models. These features are then
used to compute a reward function via Inverse Reinforcement Learning (IRL) from expert
demonstrations. Finally, this learned reward function is used to train a robust DRL agent.
The main objectives are fast convergence and robust performance, together with
algorithms that generalize to different classes of non-linear systems.
Objective
The primary objective of this project is to design, implement, and validate a novel learning
algorithm that overcomes key limitations in current DRL-based control methods. The
specific goals are:
1. To develop an integrated framework that automates both feature extraction and
reward function discovery for controlling non-linear systems.
2. To investigate deep learning techniques for unsupervised state representation learn-
ing to eliminate the need for manual feature engineering.
3. To employ Inverse Reinforcement Learning (IRL) to learn a reward function from
expert demonstrations, circumventing the reward engineering bottleneck.
4. To design the algorithm for online adaptation, allowing the agent to improve its
policy and reward model continuously from new interactions.
5. To ensure the final control policy exhibits fast convergence, robust performance
against uncertainties and disturbances, and the ability to generalize across different
non-linear systems.
Project Description and Methodology
1. Introduction to Problem
Real-world control systems are predominantly non-linear and are affected by model pa-
rameter uncertainties and external disturbances. Traditional control methods often strug-
gle in these conditions. Learning algorithms, specifically Reinforcement Learning (RL)
and Deep Reinforcement Learning (DRL), provide a powerful paradigm for developing
adaptive controllers that can learn optimal behaviour directly from interaction with the
environment.
2. Current Challenges and Proposed Approach
A major challenge in applying DRL is the need to manually specify a reward function
and a set of features that effectively represent the system’s state. This project aims to
automate this process. The proposed methodology is as follows:
• Automated Feature Extraction: The project will explore the use of deep learn-
ing models, such as encoder-decoder architectures, to automatically extract a low-
dimensional representation of salient features from high-dimensional state data.
• Reward Function Discovery via IRL: Using these automatically extracted fea-
tures, an Inverse Reinforcement Learning (IRL) algorithm will be used to compute
a reward function that explains the behaviour observed in expert demonstrations.
• Policy Optimisation: The reward function learned through IRL will then be used
to train a control policy using a suitable DRL algorithm.
• Online Adaptation: The framework will be extended to an online setting. As
the agent interacts with the environment, new state samples will be added to a
replay buffer. The feature representation and reward function can be periodically
re-computed, allowing the agent to continuously refine its policy. This is expected
to improve sample efficiency and adaptability.
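The feature-extraction step above can be illustrated with a toy example. The sketch below trains a minimal linear autoencoder, by plain gradient descent, to compress 4-D state samples (standing in for logged cart-pole states) into a 2-D latent feature vector; the dimensions, learning rate, and synthetic data are illustrative assumptions, not values fixed by the project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "raw" state data lying near a 2-D subspace, standing in for
# logged cart-pole states (position, velocity, angle, angular rate).
latent_true = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 4))
states = latent_true @ mixing + 0.01 * rng.normal(size=(500, 4))

# Encoder W_e: 4 -> 2 and decoder W_d: 2 -> 4, trained to minimise the
# mean squared reconstruction error ||x - dec(enc(x))||^2.
W_e = rng.normal(scale=0.1, size=(4, 2))
W_d = rng.normal(scale=0.1, size=(2, 4))
lr = 0.01

for _ in range(2000):
    z = states @ W_e              # latent features
    recon = z @ W_d               # reconstruction
    err = recon - states
    # Gradients of the mean squared reconstruction loss.
    grad_Wd = z.T @ err / len(states)
    grad_We = states.T @ (err @ W_d.T) / len(states)
    W_d -= lr * grad_Wd
    W_e -= lr * grad_We

mse = float(np.mean((states @ W_e @ W_d - states) ** 2))
print(f"final reconstruction MSE: {mse:.4f}")
```

In the proposed framework, a non-linear encoder-decoder would replace this linear toy, and the encoder output `states @ W_e` would serve as the learned state representation fed to the IRL and policy-learning stages.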
The main novelty lies in creating a synergistic pipeline that automates both feature and
reward engineering, with a primary objective of achieving fast convergence and robust
performance.
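The reward-discovery step can be sketched with a linear reward r(s) = w · φ(s) over the extracted features, with w chosen by feature matching so that expert behaviour scores higher than other behaviour. This is a one-shot simplification: in a full IRL method (e.g. maximum-entropy IRL) the non-expert samples would come from the policy induced by the current reward, and the update would be iterated. All features and data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 2  # latent feature dimension, as produced by the learned encoder

# Stand-in feature trajectories: the "expert" stays near the feature-space
# origin (e.g. pole upright), while random behaviour drifts away from it.
expert_feats = 0.1 * rng.normal(size=(200, dim))
random_feats = rng.normal(size=(200, dim)) + 1.0

def phi(f):
    # Augment raw features with negated squares so the linear reward
    # can penalise large deviations from the origin.
    return np.concatenate([f, -(f ** 2)], axis=-1)

# Feature-matching weight vector: reward the feature directions the
# expert exhibits more strongly than the sampled (here random) behaviour.
w = phi(expert_feats).mean(axis=0) - phi(random_feats).mean(axis=0)

def reward(f):
    return phi(f) @ w

print("mean reward (expert):", reward(expert_feats).mean())
print("mean reward (random):", reward(random_feats).mean())
```

The learned reward would then be handed to the DRL stage as the training signal, and periodically re-estimated as the replay buffer grows.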
3. Testing and Evaluation Procedure
The developed algorithms will be implemented and tested on the non-linear Cart-Pole
control problem. Performance will be evaluated on metrics such as convergence speed
and task success rate, with particular emphasis on fast convergence and robust
performance. The robustness of the final controller will be assessed by introducing
perturbations to the system's physical parameters, measuring the resulting degradation
in performance, and comparing the results against those of existing methods.
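The parameter-perturbation procedure can be illustrated with a simplified inverted pendulum standing in for the full cart-pole, and a hand-tuned PD controller standing in for the trained DRL policy; the dynamics, gains, and perturbation range are all illustrative assumptions.

```python
import numpy as np

G = 9.81             # gravity (m/s^2)
DT = 0.01            # semi-implicit Euler step (s)
STEPS = 500          # 5 s rollout
KP, KD = 50.0, 10.0  # hand-tuned PD gains (assumed, not project values)

def rollout(pole_length, theta0=0.2):
    """Simulate the pendulum under PD control; return (balanced, final |theta|)."""
    theta, theta_dot = theta0, 0.0
    for _ in range(STEPS):
        u = -KP * theta - KD * theta_dot                 # fixed controller
        theta_ddot = (G / pole_length) * np.sin(theta) + u
        theta_dot += DT * theta_ddot
        theta += DT * theta_dot
        if abs(theta) > np.pi / 2:                       # pole fell over
            return False, abs(theta)
    return True, abs(theta)

# Sweep the pole length around its nominal 1.0 m value and record
# whether the fixed controller still balances the pole.
results = {}
for length in (0.5, 0.75, 1.0, 1.25, 1.5):
    ok, final_err = rollout(length)
    results[length] = (ok, final_err)
    print(f"length={length:4.2f} m  balanced={ok}  final |theta|={final_err:.4f} rad")
```

In the actual evaluation, the rollout would use the full cart-pole dynamics and the trained policy, and success rate and convergence speed would be tabulated across the perturbation sweep for each compared method.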
Dr. Sudhansu Kumar Mishra Prem Kumar Lohani
Associate Professor and Head BTECH/10758/22
Electrical and Electronics Engineering Electrical and Electronics Engineering
(Guide)