analog-rl/modular_rl

Reproduced MountainCar - https://gym.openai.com/evaluations/eval_8JaqvFVMSoKViViGPIR8Ig

This repository implements several algorithms:

  • Trust Region Policy Optimization [1]
  • Proximal Policy Optimization (i.e., TRPO, but using a penalty instead of a constraint on KL divergence), where each subproblem is solved with either SGD or L-BFGS
  • Cross Entropy Method
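As a rough illustration of the last item, the Cross Entropy Method can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the code in this repository: the function name, its parameters, and the toy objective are all hypothetical, and a diagonal Gaussian is assumed as the sampling distribution.

```python
import random
import statistics

def cross_entropy_method(score, dim, n_iter=20, batch_size=50,
                         elite_frac=0.2, init_std=2.0, seed=0):
    """Maximize score(theta) over real vectors of length dim (illustrative only)."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(batch_size * elite_frac))
    for _ in range(n_iter):
        # Sample candidate parameter vectors from a diagonal Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(batch_size)]
        # Keep the highest-scoring candidates (the "elite" set).
        elite = sorted(samples, key=score, reverse=True)[:n_elite]
        # Refit the Gaussian to the elite set; a small floor keeps std positive.
        mean = [statistics.mean(col) for col in zip(*elite)]
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elite)]
    return mean

def toy_score(theta):
    # Hypothetical objective with its maximum at (1, -2).
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)

best = cross_entropy_method(toy_score, dim=2)
```

In a reinforcement-learning setting, score would typically be the return of a policy parameterized by theta, estimated from one or more rollouts.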

TRPO and PPO are implemented with neural-network value functions and use GAE [2].
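For readers unfamiliar with GAE, the estimator described in [2] can be sketched as a backward recursion over one trajectory. The function below is a minimal sketch, not this library's implementation; the name gae_advantages and its argument layout (one extra bootstrap value at the end of values) are assumptions made for illustration.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for a single trajectory.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T), i.e. one more entry than rewards
    """
    T = len(rewards)
    advantages = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + gamma * lam * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

# With lam=0 the estimate reduces to the one-step TD residual;
# with lam=1 it becomes the discounted return minus the value baseline.
adv = gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=0.5, lam=1.0)
```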

This library is written in a modular way so that code can be shared between the TRPO and PPO variants and the same code can be reused across different kinds of action spaces.

Dependencies:

  • keras (1.0.1)
  • theano (0.8.2)
  • tabulate
  • numpy
  • scipy

To run the algorithms implemented here, you should put modular_rl on your PYTHONPATH, or run the scripts (e.g. run_pg.py) from this directory.
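If setting PYTHONPATH in the shell is inconvenient, the same effect can be approximated from inside a script by extending sys.path before importing. The sketch below assumes a hypothetical checkout location (REPO_DIR) and a hypothetical helper name (add_to_path); point it at whichever directory contains the modular_rl package on your machine.

```python
import os
import sys

# Hypothetical location of the checkout; replace with your own path.
REPO_DIR = os.path.expanduser("~/src/modular_rl")

def add_to_path(directory, path_list=sys.path):
    """Prepend directory to the import search path unless it is already there."""
    directory = os.path.abspath(directory)
    if directory not in path_list:
        path_list.insert(0, directory)
    return path_list

add_to_path(REPO_DIR)
# After this call, "import modular_rl" should resolve if REPO_DIR is correct.
```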

Good parameter settings can be found in the experiments directory.

You can learn about the various parameters by running one of the experiment scripts with the -h flag while still providing the (required) env and agent parameters. (Those parameters determine what other parameters are available.) For example, to see the parameters of TRPO,

./run_pg.py --env CartPole-v0 --agent modular_rl.agentzoo.TrpoAgent -h

To run CEM, for example with --n_iter=2,

./run_cem.py --env=Acrobot-v0 --agent=modular_rl.agentzoo.DeterministicAgent  --n_iter=2
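The remark above that the env and agent parameters determine what other parameters are available suggests a two-stage argument parser. The sketch below shows one way such a scheme could look with argparse. It is not taken from the scripts in this repository: the AGENT_OPTIONS table, the option names --max_kl and --kl_target, and the build_parser helper are all hypothetical.

```python
import argparse

# Hypothetical per-agent options; these names are illustrative only.
AGENT_OPTIONS = {
    "TrpoAgent": [("--max_kl", float, 0.01)],
    "PpoAgent": [("--kl_target", float, 0.01)],
}

def build_parser(argv):
    # Stage 1: read only the required arguments and ignore everything else.
    base = argparse.ArgumentParser(add_help=False)
    base.add_argument("--env", required=True)
    base.add_argument("--agent", required=True)
    known, _ = base.parse_known_args(argv)

    # Stage 2: extend the parser with options that depend on the chosen agent.
    parser = argparse.ArgumentParser(parents=[base])
    agent_name = known.agent.rsplit(".", 1)[-1]
    for flag, typ, default in AGENT_OPTIONS.get(agent_name, []):
        parser.add_argument(flag, type=typ, default=default)
    return parser

argv = ["--env", "CartPole-v0",
        "--agent", "modular_rl.agentzoo.TrpoAgent",
        "--max_kl", "0.02"]
args = build_parser(argv).parse_args(argv)
```

Because the first stage requires --env and --agent, a bare -h would fail in this sketch, whereas -h together with both required arguments would print help that includes the agent-specific options.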

[1] J Schulman, S Levine, P Moritz, M Jordan, P Abbeel, "Trust region policy optimization." arXiv preprint arXiv:1502.05477 (2015).

[2] J Schulman, P Moritz, S Levine, M Jordan, P Abbeel, "High-dimensional continuous control using generalized advantage estimation." arXiv preprint arXiv:1506.02438 (2015).
