# 🌹 Unifloral: Unified Offline Reinforcement Learning

Unified implementations and rigorous evaluation for offline reinforcement learning - built by Matthew Jackson, Uljad Berdica, and Jarek Liesen.

## 💡 Code Philosophy

- ⚛️ **Single-file**: We implement each algorithm as a standalone Python file.
- 🤏 **Minimal**: We edit only what is necessary between algorithms, making comparisons straightforward.
- ⚡️ **GPU-accelerated**: We use JAX and compile all training code end-to-end, enabling lightning-fast training (see the sketch below).
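
To give a flavour of what end-to-end compilation means in JAX, here is a minimal, self-contained sketch (illustrative only, not Unifloral's actual training code): the whole training loop is traced once by `jax.jit` and rolled up with `jax.lax.scan`, so no Python interpreter overhead is paid per step.

```python
# Illustrative sketch of end-to-end compilation, not Unifloral's actual code.
import jax
import jax.numpy as jnp


def train_step(params, _):
    # Hypothetical quadratic loss, standing in for an RL update.
    grads = jax.grad(lambda p: jnp.sum(p**2))(params)
    return params - 0.1 * grads, None


@jax.jit
def train(params):
    # The full 1000-step loop is compiled as a single XLA program.
    final_params, _ = jax.lax.scan(train_step, params, None, length=1000)
    return final_params


print(train(jnp.ones(4)))  # parameters shrink towards zero
```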

Inspired by CORL and CleanRL - check them out!

## 🤖 Algorithms

We provide two types of algorithm implementation:

1. **Standalone**: Each algorithm is implemented as a single file with minimal dependencies, making it easy to understand and modify.
2. **Unified**: Most algorithms are also available as configs for our unified implementation, `unifloral.py`.

After training, final evaluation results are saved to `.npz` files in `final_returns/` for analysis using our evaluation protocol.
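
For example, a saved results file can be inspected directly with NumPy (the file name below is hypothetical; the stored array names are listed rather than assumed):

```python
import numpy as np

# Hypothetical file name; any .npz written to final_returns/ works the same way.
data = np.load("final_returns/example_run.npz")
print(data.files)  # names of the arrays stored in this file
for name in data.files:
    print(name, data[name].shape)
```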

All scripts support D4RL and use Weights & Biases for logging, with configs provided as WandB sweep files.
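
If you prefer the WandB Python API to the CLI, a sweep file can be launched roughly like this (a sketch; the project name is an assumption, and `wandb sweep unifloral/iql.yaml` from the shell is the equivalent CLI route):

```python
import yaml
import wandb

# Load one of the provided sweep files (IQL here, as an example).
with open("unifloral/iql.yaml") as f:
    sweep_config = yaml.safe_load(f)

# The project name is an assumption, not fixed by the repository.
sweep_id = wandb.sweep(sweep_config, project="unifloral")
print(f"Run `wandb agent {sweep_id}` to start training agents.")
```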

### Model-free

| Algorithm | Standalone | Unified | Extras |
| --- | --- | --- | --- |
| BC | `bc.py` | `unifloral/bc.yaml` | - |
| SAC-N | `sac_n.py` | `unifloral/sac_n.yaml` | [ArXiv] |
| EDAC | `edac.py` | `unifloral/edac.yaml` | [ArXiv] |
| CQL | `cql.py` | - | [ArXiv] |
| IQL | `iql.py` | `unifloral/iql.yaml` | [ArXiv] |
| TD3-BC | `td3_bc.py` | `unifloral/td3_bc.yaml` | [ArXiv] |
| ReBRAC | `rebrac.py` | `unifloral/rebrac.yaml` | [ArXiv] |
| TD3-AWR | - | `unifloral/td3_awr.yaml` | [ArXiv] |

### Model-based

We implement a single script for dynamics model training, `dynamics.py`, with config `dynamics.yaml`.

| Algorithm | Standalone | Unified | Extras |
| --- | --- | --- | --- |
| MOPO | `mopo.py` | - | [ArXiv] |
| MOReL | `morel.py` | - | [ArXiv] |
| COMBO | `combo.py` | - | [ArXiv] |
| MoBRAC | - | `unifloral/mobrac.yaml` | [ArXiv] |

New ones coming soon 👀
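
To give a feel for how methods like MOPO use a trained dynamics model, here is a minimal sketch of the idea of penalising rewards with a model-uncertainty estimate; ensemble disagreement stands in for the uncertainty measure, and all names are ours, not the repository's:

```python
import jax.numpy as jnp


def penalised_reward(ensemble_next_obs, reward, penalty_coef=1.0):
    """MOPO-style pessimism sketch: subtract a penalty proportional to
    the dynamics ensemble's uncertainty about the next state.

    ensemble_next_obs: (num_ensemble, obs_dim) predictions for one transition.
    """
    # Disagreement measured as the largest per-dimension standard deviation.
    uncertainty = jnp.max(jnp.std(ensemble_next_obs, axis=0))
    return reward - penalty_coef * uncertainty
```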

## 📊 Evaluation

Our evaluation script (`evaluation.py`) implements the protocol described in our paper, analysing the performance of a UCB bandit over a range of policy evaluation budgets.

```python
from evaluation import load_results_dataframe, bootstrap_bandit_trials
import jax.numpy as jnp

# Load all results from the final_returns directory
df = load_results_dataframe("final_returns")

# Run bandit trials with bootstrapped confidence intervals.
# `policy_returns` is your stacked evaluation returns, e.g. assembled from df.
results = bootstrap_bandit_trials(
    returns_array=jnp.array(policy_returns),  # Shape: (num_policies, num_rollouts)
    num_subsample=8,     # Number of policies to subsample
    num_repeats=1000,    # Number of bandit trials
    max_pulls=200,       # Maximum pulls per trial
    ucb_alpha=2.0,       # UCB exploration coefficient
    n_bootstraps=1000,   # Bootstrap samples for confidence intervals
    confidence=0.95      # Confidence level
)

# Access results
pulls = results["pulls"]                      # Number of pulls at each step
means = results["estimated_bests_mean"]       # Mean score of estimated best policy
ci_low = results["estimated_bests_ci_low"]    # Lower confidence bound
ci_high = results["estimated_bests_ci_high"]  # Upper confidence bound
```
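
A natural follow-up is to plot the bandit's estimated-best score against the number of pulls, with the bootstrapped confidence band (a sketch, assuming matplotlib is installed):

```python
import matplotlib.pyplot as plt

# Mean score of the estimated best policy, with its bootstrapped CI band.
plt.plot(pulls, means, label="Estimated best policy")
plt.fill_between(pulls, ci_low, ci_high, alpha=0.3, label="95% CI")
plt.xlabel("Policy evaluations (pulls)")
plt.ylabel("Return of estimated best policy")
plt.legend()
plt.show()
```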

## 📝 Cite us!

```bibtex
@misc{jackson2025clean,
      title={A Clean Slate for Offline Reinforcement Learning},
      author={Matthew Thomas Jackson and Uljad Berdica and Jarek Liesen and Shimon Whiteson and Jakob Nicolaus Foerster},
      year={2025},
      eprint={2504.11453},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2504.11453},
}
```
