This repository implements a Reinforcement Learning (RL) agent that solves the optimal execution problem in algorithmic trading using Deep Q-Learning (DQN). Following the paper it replicates, it rigorously explores value-based, deep Q-learning reinforcement learning for optimal trade execution: finding the optimal "path" for executing a stock order so as to minimize the price impact on the market and maximize the resulting revenue from the execution.
The results show that in a simple environment, where the optimal execution strategy is known to be TWAP, the RL agent closely and accurately approximates that solution. This serves as a proof of concept that, in a richer environment, the agent has the potential to learn further patterns relevant to the problem, such as stock price prediction or external predatory trading, which could allow it to outperform TWAP in practical scenarios. The project replicates and builds upon the paper:
Optimal Execution with Reinforcement Learning
Yadh Hafsi, Edoardo Vittori (2024) — arXiv:2411.06389
Traders executing large orders must minimize implementation shortfall, i.e., the total cost of executing trades across time, while accounting for market impact.
The RL agent learns an execution strategy that is evaluated against the following fixed baselines:
- TWAP (Time-Weighted Average Price)
- Aggressive buying
- Passive execution
- Random actions
git clone https://github.com/yourusername/optimal-execution-rl.git
cd optimal-execution-rl
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
- Training stabilizes after ~100 episodes.
- Moving average flattens, suggesting convergence.
| Strategy   | Mean Reward | Std Dev |
|------------|-------------|---------|
| TWAP       | -2400.00    | ≈ 0.00  |
| Passive    | -2403.36    | 2.74    |
| Aggressive | -2407.20    | ≈ 0.00  |
| Random     | -2402.64    | 2.68    |
| RL Agent   | -2402.42    | 0.00    |
✅ The RL agent delivers the most consistent performance, closely matching TWAP and slightly outperforming the passive, aggressive, and random baselines.
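The table above can be reproduced with an evaluation loop of roughly the following shape. This is a minimal sketch; the repository's actual evaluation and plotting logic lives in `src/utils.py`, and the `evaluate` function name here is illustrative.

```python
import numpy as np


def evaluate(env, policy, episodes=50):
    """Roll out a policy for several episodes; report mean and std of total reward."""
    totals = []
    for _ in range(episodes):
        state, done, total = env.reset(), False, 0.0
        while not done:
            # `policy` is any callable mapping a state to a discrete action.
            state, reward, done, _ = env.step(policy(state))
            total += reward
        totals.append(total)
    return np.mean(totals), np.std(totals)
```

Running the same loop for TWAP, passive, aggressive, random, and the trained DQN policy yields mean/std figures of the kind reported above.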
- % Time left
- % Inventory left
- Limit order book imbalance (top 5 levels)
- Best bid/ask
- Discrete: {0, Qmin, 2×Qmin, 3×Qmin, 4×Qmin}
- Penalizes market impact (execution price - bid)
- Penalizes depth (larger orders incur cost)
- Final penalty for unexecuted inventory (see the environment sketch below)
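To make the state, action, and reward definitions above concrete, here is a minimal, self-contained sketch of how such a Gym environment could look. The real environment lives in `src/environment.py`; the class name `OptimalExecutionEnv`, the toy order-book dynamics, and the penalty coefficients below are illustrative assumptions, not the repository's exact code.

```python
import numpy as np
import gym
from gym import spaces


class OptimalExecutionEnv(gym.Env):
    """Toy sketch of the execution environment; dynamics and coefficients are illustrative."""

    def __init__(self, total_inventory=1000, horizon=60, q_min=10,
                 depth_coef=0.001, terminal_coef=0.1):
        super().__init__()
        self.total_inventory = total_inventory
        self.horizon = horizon
        self.q_min = q_min
        self.depth_coef = depth_coef
        self.terminal_coef = terminal_coef
        # Actions: execute 0, Qmin, 2*Qmin, 3*Qmin or 4*Qmin shares this step.
        self.action_space = spaces.Discrete(5)
        # State: % time left, % inventory left, LOB imbalance, best bid, best ask.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(5,), dtype=np.float32)

    def reset(self):
        self.t = 0
        self.inventory = self.total_inventory
        self.mid = 100.0
        return self._observe()

    def _observe(self):
        imbalance = np.random.uniform(-1.0, 1.0)  # placeholder for top-5-level book imbalance
        best_bid, best_ask = self.mid - 0.01, self.mid + 0.01
        return np.array([1.0 - self.t / self.horizon,
                         self.inventory / self.total_inventory,
                         imbalance, best_bid, best_ask], dtype=np.float32)

    def step(self, action):
        qty = min(int(action) * self.q_min, self.inventory)
        best_bid = self.mid - 0.01
        exec_price = self.mid + 0.01 + 0.0001 * qty        # crude impact: price worsens with size
        reward = -qty * (exec_price - best_bid)            # market-impact penalty
        reward -= self.depth_coef * qty ** 2               # depth penalty: larger orders cost more
        self.inventory -= qty
        self.t += 1
        self.mid += np.random.normal(0.0, 0.02)            # random-walk mid price
        done = self.t >= self.horizon
        if done and self.inventory > 0:
            reward -= self.terminal_coef * self.inventory  # penalty for unexecuted inventory
        return self._observe(), reward, done, {}
```

In the actual environment the book features come from limit order book data rather than the random placeholders used here.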
- Deep Q-Network (DQN)
- ε-greedy policy (decaying)
- Experience replay + target network
- Trained for 200 episodes (see the training-loop sketch below)
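The agent's moving parts (decaying ε-greedy exploration, experience replay, and a periodically synced target network) fit together roughly as in the sketch below. The full implementation is in `src/agents.py`; the network architecture and hyperparameters shown here are illustrative assumptions.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small MLP mapping the 5-dimensional state to Q-values for the 5 actions."""

    def __init__(self, state_dim=5, n_actions=5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, x):
        return self.net(x)


def train(env, episodes=200, gamma=0.99, eps=1.0, eps_min=0.05, eps_decay=0.995,
          batch_size=64, lr=1e-3, target_sync=100):
    q_net, target_net = QNetwork(), QNetwork()
    target_net.load_state_dict(q_net.state_dict())
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
    buffer, step = deque(maxlen=50_000), 0

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Decaying epsilon-greedy action selection.
            if random.random() < eps:
                action = env.action_space.sample()
            else:
                with torch.no_grad():
                    action = int(q_net(torch.as_tensor(state)).argmax())
            next_state, reward, done, _ = env.step(action)
            buffer.append((state, action, reward, next_state, done))
            state = next_state
            step += 1

            if len(buffer) >= batch_size:
                # Sample a minibatch from the experience replay buffer.
                s, a, r, s2, d = map(np.array, zip(*random.sample(buffer, batch_size)))
                s = torch.as_tensor(s, dtype=torch.float32)
                s2 = torch.as_tensor(s2, dtype=torch.float32)
                a = torch.as_tensor(a, dtype=torch.int64)
                r = torch.as_tensor(r, dtype=torch.float32)
                d = torch.as_tensor(d, dtype=torch.float32)
                q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                with torch.no_grad():
                    target = r + gamma * (1.0 - d) * target_net(s2).max(1).values
                loss = nn.functional.mse_loss(q, target)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            if step % target_sync == 0:
                # Periodically sync the target network with the online network.
                target_net.load_state_dict(q_net.state_dict())
        eps = max(eps_min, eps * eps_decay)
    return q_net
```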
| Strategy   | Description |
|------------|-------------|
| TWAP       | Fixed trade size at each time interval |
| Passive    | 60% chance to do nothing, else a random action |
| Aggressive | Always executes 2 × Qmin per timestep |
| Random     | Chooses a random action from the action space |
| RL Agent   | Learns the execution policy from the environment |
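Over the same discrete action space, the baselines above reduce to very simple policies. The sketch below is illustrative; the actual scripts live in `baseline/`, and the function names, horizon, order size, and Qmin values are assumptions.

```python
import random

N_ACTIONS = 5  # actions 0..4 correspond to trading 0, Qmin, 2*Qmin, 3*Qmin, 4*Qmin shares


def twap_action(state, total_shares=1000, horizon=60, q_min=10):
    """TWAP: the same fixed slice of the order at every time interval."""
    return min(N_ACTIONS - 1, round(total_shares / (horizon * q_min)))


def passive_action(state):
    """Passive: 60% chance to do nothing, otherwise a random action."""
    return 0 if random.random() < 0.6 else random.randrange(N_ACTIONS)


def aggressive_action(state):
    """Aggressive: always execute 2 x Qmin per timestep."""
    return 2


def random_action(state):
    """Random: uniformly random action from the action space."""
    return random.randrange(N_ACTIONS)
```

Each function, like the trained DQN policy, maps a state to one of the five discrete actions, so all strategies can be scored with the same evaluation loop.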
hft-optimal-execution-dqn
|____notebooks/
| |__optimal_execution_rl_implementation.ipynb ← Full implementation in Jupyter Notebook
| |__optimal_execution_rl_implementation.py
| |__aggressive.ipynb
| |__Passive.ipynb
| |__random.ipynb
| |__TWAP.ipynb
|
|____baseline/
| |__aggressive.py
| |__Passive.py
| |__random.py
| |__TWAP.py
|
|____docs/
| |__hft_interview_prep.md
| |__paper_summary.md
|
|____src/
| |__environment.py ← Custom Gym environment for optimal execution
| |__agents.py ← Deep Q-Network model, training loop, replay buffer
| |__utils.py ← Evaluation logic and plotting functions
|
|____images/
| |__reward_distribution.png ← Output charts and visualizations
| |__strategy_comparison.png
|
|____README.md ← Detailed project documentation
|____requirements.txt ← List of Python dependencies
|____.gitignore ← Files and folders to be excluded from Git
|____LICENSE ← MIT license file
If you use this code, please cite:
@misc{hafsi2024optimal,
  title={Optimal Execution with Reinforcement Learning},
  author={Yadh Hafsi and Edoardo Vittori},
  year={2024},
  eprint={2411.06389},
  archivePrefix={arXiv},
  primaryClass={q-fin.TR}
}
This project is open-sourced under the MIT License. See LICENSE for details.
Contributions to this project are wholeheartedly welcomed and encouraged. Whether it's proposing new features, improving the code structure, enhancing documentation, or recommending improvements to the methodology, your input is highly valued.
Please ensure that any contributions align with the goals of the project and maintain the clarity, reproducibility, and research integrity that underpin this work. All contributions, regardless of size, help strengthen the project and improve the robustness of the implementation.
If you wish to contribute, we ask for thoughtful collaboration, respectful discussion, and a commitment to shared learning. Thank you for helping make this project better for everyone.
For questions, open an issue or reach out via ✉️alqama043@gmail.com