Optimal Model Design for Reinforcement Learning

omd

JAX code for the paper “Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation”

Summary

Model-based reinforcement learning typically trains the dynamics and reward functions by minimizing prediction error.
This error is only a proxy for maximizing the sum of rewards, the agent's ultimate goal, which leads to an objective mismatch.
We propose an end-to-end algorithm called Optimal Model Design (OMD) that learns the model by directly optimizing the returns.
OMD leverages the implicit function theorem to optimize the model parameters, which yields the following computational graph:

[Figure: the OMD computational graph]
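To illustrate the implicit-differentiation idea, here is a minimal tabular sketch. It is not the code in this repository, and every name in it (soft_bellman, solve_inner, true_return, implicit_grad, omd_step) is hypothetical. It assumes that the inner problem is the fixed point Q*_θ = T(Q*_θ, θ) of a soft Bellman operator under a learned model θ = (transition logits, rewards), and that the outer objective J is the exact return of the softmax policy in the true MDP. By the implicit function theorem, dJ/dθ = (∂T/∂θ)^T u, where u solves u = ∇_Q J + (∂T/∂Q)^T u. The sketch solves for u by fixed-point (Neumann) iteration, using only vector-Jacobian products.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp

S, A = 4, 2            # toy numbers of states and actions (assumed)
GAMMA, TAU = 0.9, 1.0  # discount factor and softmax temperature (assumed)


def soft_bellman(q, theta):
    """Soft Bellman operator T(Q, theta) under the *learned* model theta."""
    logits, reward = theta
    p = jax.nn.softmax(logits, axis=-1)    # model P(s'|s,a), shape (S, A, S)
    v = TAU * logsumexp(q / TAU, axis=-1)  # soft state values, shape (S,)
    return reward + GAMMA * p @ v          # shape (S, A)


@jax.jit
def solve_inner(theta, iters=500):
    """Inner problem: Q*_theta = T(Q*_theta, theta), found by fixed-point iteration."""
    return jax.lax.fori_loop(0, iters, lambda _, q: soft_bellman(q, theta),
                             jnp.zeros((S, A)))


def true_return(q, env):
    """Outer objective: expected return of the softmax policy pi_Q in the TRUE MDP."""
    p_true, r_true, mu0 = env
    pi = jax.nn.softmax(q / TAU, axis=-1)        # (S, A)
    r_pi = jnp.sum(pi * r_true, axis=-1)         # (S,)
    p_pi = jnp.einsum("sa,sat->st", pi, p_true)  # (S, S)
    v_pi = jnp.linalg.solve(jnp.eye(S) - GAMMA * p_pi, r_pi)
    return mu0 @ v_pi


@jax.jit
def implicit_grad(theta, env, neumann_iters=500):
    """dJ/dtheta via the implicit function theorem (no backprop through the solver)."""
    q_star = solve_inner(theta)
    g = jax.grad(true_return)(q_star, env)                # dJ/dQ*, shape (S, A)
    _, vjp_fn = jax.vjp(soft_bellman, q_star, theta)      # VJPs of T at the fixed point
    # Solve u = g + (dT/dQ)^T u; converges because T is a gamma-contraction in Q.
    u = jax.lax.fori_loop(0, neumann_iters, lambda _, u: g + vjp_fn(u)[0], g)
    return vjp_fn(u)[1]                                   # (dT/dtheta)^T u, same structure as theta


def omd_step(theta, env, lr):
    """One gradient-ascent step on the model parameters, maximizing the true return."""
    grads = implicit_grad(theta, env)
    return jax.tree_util.tree_map(lambda p, g: p + lr * g, theta, grads)


# Toy problem; all values are assumed for illustration.
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
env = (jax.nn.softmax(jax.random.normal(k1, (S, A, S)), axis=-1),  # true P(s'|s,a)
       jax.random.normal(k2, (S, A)),                                # true reward
       jnp.ones(S) / S)                                              # initial-state distribution
theta0 = (jax.random.normal(k3, (S, A, S)), jnp.zeros((S, A)))       # model logits, model reward

trained = theta0
for _ in range(20):
    trained = omd_step(trained, env, lr=0.01)
print("return before:", true_return(solve_inner(theta0), env))
print("return after: ", true_return(solve_inner(trained), env))
```

A quick sanity check is to compare implicit_grad with jax.grad taken through the unrolled solve_inner. The two should agree, but the implicit version never stores the solver's iterates. It also needs only vector-Jacobian products, which is what makes this approach plausible for neural-network models, where forming (I - ∂T/∂Q) explicitly would be infeasible.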

Please cite our work if you find it useful in your research:

@article{nikishin2021control,
  title={Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation},
  author={Nikishin, Evgenii and Abachi, Romina and Agarwal, Rishabh and Bacon, Pierre-Luc},
  journal={arXiv preprint arXiv:2106.03273},
  year={2021}
}
