Deep-Reinforcement-Learning

Deep reinforcement learning algorithms and code: explanations of research papers and their implementations. All algorithm implementations are written in PyTorch.

REINFORCE: Vanilla Policy Gradient
DQN: Deep Q-Learning, Mnih et al., 2013
A3C/A2C: Asynchronous Methods for Deep RL, Mnih et al., 2016
PPO: Proximal Policy Optimization, Schulman et al., 2017
DDPG: Deep Deterministic Policy Gradient, Lillicrap et al., 2015

(Folder General: general tips on deep reinforcement learning)

From OpenAI's "Spinning Up as a Deep RL Researcher (or Practitioner)", on how to start in deep RL, assuming you have a solid background in mathematics (1, 2), a general knowledge of deep learning, and familiarity with at least one deep learning library (such as PyTorch or TensorFlow):

Which algorithms? You should probably start with vanilla policy gradient (also called REINFORCE), DQN, A2C (the synchronous version of A3C), PPO (the variant with the clipped objective), and DDPG, approximately in that order. The simplest versions of all of these can be written in just a few hundred lines of code (ballpark 250-300), and some of them even less (for example, a no-frills version of VPG can be written in about 80 lines). Write single-threaded code before you try writing parallelized versions of these algorithms. (Do try to parallelize at least one.)
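To give a feel for how small the core pieces are, here is a minimal sketch of two building blocks mentioned above: the discounted reward-to-go used by vanilla policy gradient, and the per-sample clipped surrogate objective from PPO. It is deliberately written in plain Python rather than PyTorch so that it runs without dependencies. The function names `discounted_returns` and `ppo_clip_objective` and the default values for `gamma` and `eps` are illustrative assumptions, not code taken from this repository.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical three-step episode with a reward of 1 at every step.
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
    # A probability ratio above 1 + eps is clipped when the advantage is positive.
    print(ppo_clip_objective(ratio=1.5, advantage=1.0))
```

In a full implementation these quantities would typically be tensors: the returns weight the log-probabilities of the actions taken, and the clipped objective is averaged over a batch and maximized.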

Further algorithms to study (suggested at the OpenAI Hackathon):

How to study the RL Algorithms

Start with the simplest algorithm (REINFORCE). First, read the paper carefully. Then read the implementation and try to rewrite the code from scratch. Take care not to overfit on implementation details or on paper details.

Notes

My framework of choice is PyTorch, which is released under a free license (Modified BSD license).

The implementations were taken from various sources, including Udacity's repository for the Deep Reinforcement Learning Nanodegree, with a focus on simplicity and ease of understanding. Numerous implementations are available, including very good modular ones, but my purpose is to master RL theory and algorithms. Creating modular code is a secondary goal.

The implementations include minor corrections intended to make them consistent and easier to understand.

Folders and files

A3C-A2C
AWS DeepRacer Competition
DDPG
DQN
General
PPO
Practice/p1 - Banana Env - pixels
REINFORCE
Temporal Difference
assets
.gitignore
LICENSE
README.md

spirosrap/Deep-Reinforcement-Learning
