ML-IRL is an algorithm for inverse reinforcement learning. It is discussed in the NeurIPS paper (link) and the AISTATS paper (link).
You can download our expert data from the google_drive link.
- PyTorch 1.5+
- OpenAI Gym
- MuJoCo
- ruamel.yaml (install with `pip install ruamel.yaml`; see the install sketch below)
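The other requirements can be installed in a similar way. Below is a minimal sketch, assuming the standard pip package names `torch`, `gym`, and `mujoco-py` (the MuJoCo binaries themselves still need to be installed separately, and no exact versions are pinned by this repo):

```
# Hypothetical dependency install; package names and versions are assumptions, not pinned by this repo.
pip install "torch>=1.5" gym mujoco-py ruamel.yaml
```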
- ML-IRL (our method): `ml/`
- SAC agent: `common/`
- Environments: `envs/`
- Configurations: `configs/`
- All the experiments are to be run under the root folder.
- Before starting experiments, please set the environment variable with `export PYTHONPATH=${PWD}:$PYTHONPATH`.
- We use YAML files in `configs/` for experimental configurations. Please change the `obj` value (in the first line) for each method; see the sketch after this list. Here is the list of `obj` values:
  - Our methods (ML-IRL): ML_S: `maxentirl`, ML_SA: `maxentirl_sa`
- After running, you will see the training logs in the `logs/` folder.
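For example, switching a configuration to the state-action objective (ML_SA) could look like the following. This is only a sketch: it assumes the `obj` key sits alone on the first line of the YAML file, as noted above, and uses `configs/samples/agents/hopper.yml` purely as an illustration.

```
# Set obj (first line of the config, per the notes above) to maxentirl_sa (GNU sed).
sed -i '1s/.*/obj: maxentirl_sa/' configs/samples/agents/hopper.yml
head -n 1 configs/samples/agents/hopper.yml   # should now print: obj: maxentirl_sa
```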
All the commands below are also provided in `run.sh`.
First, you can generate expert data by training the expert policy:
```
python common/train_gd.py configs/samples/experts/{env}.yml   # env is in {hopper, walker2d, halfcheetah, ant}
python common/collect.py configs/samples/experts/{env}.yml    # env is in {hopper, walker2d, halfcheetah, ant}
```
Then, train our method with the provided expert data (Policy Performance):
```
# you can vary obj in {`maxentirl_sa`, `maxentirl`}
python ml/irl_samples.py configs/samples/agents/{env}.yml
```
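Putting the two steps together, an end-to-end run for a single environment looks like the following (Hopper is used purely as an example; any environment from the list above works the same way, and the `obj` value should be set in the agent config beforehand):

```
# End-to-end example for one environment (hopper): the commands above with {env} substituted.
export PYTHONPATH=${PWD}:$PYTHONPATH
python common/train_gd.py configs/samples/experts/hopper.yml   # train the expert policy
python common/collect.py configs/samples/experts/hopper.yml    # collect expert data
python ml/irl_samples.py configs/samples/agents/hopper.yml     # train ML-IRL on the expert data
```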
For the transfer experiment, first generate expert data by training the expert policy. Make sure that the `env_name` parameter in `configs/samples/experts/ant_transfer.yml` is set to `CustomAnt-v0`.
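One quick way to verify this before training (a sketch that assumes the parameter appears literally as `env_name` in the YAML file):

```
# Check the environment name in the transfer config (assumes a literal env_name key).
grep env_name configs/samples/experts/ant_transfer.yml   # expected to show: CustomAnt-v0
```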
```
python common/train_gd.py configs/samples/experts/ant_transfer.yml
python common/collect.py configs/samples/experts/ant_transfer.yml
```
After the training is done, you can choose one of the saved reward models to train a policy from scratch (Recovering the Stationary Reward Function).
Transferring the reward to the disabled Ant:
```
python common/train_optimal.py configs/samples/experts/ant_transfer.yml
python ml/irl_samples.py configs/samples/agents/data_transfer.yml   # data transfer
```