# 🌹 Unifloral: Unified Offline Reinforcement Learning

Unified implementations and rigorous evaluation for offline reinforcement learning - built by Matthew Jackson, Uljad Berdica, and Jarek Liesen.

## 💡 Code Philosophy

- ⚛️ **Single-file**: We implement each algorithm as a standalone Python file.
- 🤏 **Minimal**: We edit only what is necessary between algorithms, making comparisons straightforward.
- ⚡️ **GPU-accelerated**: We use JAX and compile all training code end-to-end, enabling lightning-fast training (see the sketch below).
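
To give a flavour of what end-to-end compilation means in JAX, here is a minimal, self-contained sketch (illustrative only, not Unifloral's actual training code): the whole training loop is traced once by `jax.jit` and rolled up with `jax.lax.scan`, so no Python interpreter overhead is paid per step.

```python
# Illustrative sketch of end-to-end compilation, not Unifloral's actual code.
import jax
import jax.numpy as jnp


def train_step(params, _):
    # Hypothetical quadratic loss, standing in for an RL update.
    grads = jax.grad(lambda p: jnp.sum(p**2))(params)
    return params - 0.1 * grads, None


@jax.jit
def train(params):
    # The full 1000-step loop is compiled as a single XLA program.
    final_params, _ = jax.lax.scan(train_step, params, None, length=1000)
    return final_params


print(train(jnp.ones(4)))  # parameters shrink towards zero
```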

Inspired by CORL and CleanRL - check them out!

## 🤖 Algorithms

We provide two types of algorithm implementation:

1. **Standalone**: Each algorithm is implemented as a single file with minimal dependencies, making it easy to understand and modify.
2. **Unified**: Most algorithms are also available as configs for our unified implementation, `unifloral.py`.

After training, final evaluation results are saved to `.npz` files in `final_returns/` for analysis using our evaluation protocol.
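
For example, a saved results file can be inspected directly with NumPy (the file name below is hypothetical; the stored array names are listed rather than assumed):

```python
import numpy as np

# Hypothetical file name; any .npz written to final_returns/ works the same way.
data = np.load("final_returns/example_run.npz")
print(data.files)  # names of the arrays stored in this file
for name in data.files:
    print(name, data[name].shape)
```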

All scripts support D4RL and use Weights & Biases for logging, with configs provided as WandB sweep files.
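
If you prefer the WandB Python API to the CLI, a sweep file can be launched roughly like this (a sketch; the project name is an assumption, and `wandb sweep unifloral/iql.yaml` from the shell is the equivalent CLI route):

```python
import yaml
import wandb

# Load one of the provided sweep files (IQL here, as an example).
with open("unifloral/iql.yaml") as f:
    sweep_config = yaml.safe_load(f)

# The project name is an assumption, not fixed by the repository.
sweep_id = wandb.sweep(sweep_config, project="unifloral")
print(f"Run `wandb agent {sweep_id}` to start training agents.")
```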

### Model-free

| Algorithm | Standalone | Unified | Extras |
| --- | --- | --- | --- |
| BC | `bc.py` | `unifloral/bc.yaml` | - |
| SAC-N | `sac_n.py` | `unifloral/sac_n.yaml` | [ArXiv] |
| EDAC | `edac.py` | `unifloral/edac.yaml` | [ArXiv] |
| CQL | `cql.py` | - | [ArXiv] |
| IQL | `iql.py` | `unifloral/iql.yaml` | [ArXiv] |
| TD3-BC | `td3_bc.py` | `unifloral/td3_bc.yaml` | [ArXiv] |
| ReBRAC | `rebrac.py` | `unifloral/rebrac.yaml` | [ArXiv] |
| TD3-AWR | - | `unifloral/td3_awr.yaml` | [ArXiv] |

### Model-based

We implement a single script for dynamics model training, `dynamics.py`, with config `dynamics.yaml`.

| Algorithm | Standalone | Unified | Extras |
| --- | --- | --- | --- |
| MOPO | `mopo.py` | - | [ArXiv] |
| MOReL | `morel.py` | - | [ArXiv] |
| COMBO | `combo.py` | - | [ArXiv] |
| MoBRAC | - | `unifloral/mobrac.yaml` | [ArXiv] |

New ones coming soon 👀
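
To give a feel for how methods like MOPO use a trained dynamics model, here is a minimal sketch of the idea of penalising rewards with a model-uncertainty estimate; ensemble disagreement stands in for the uncertainty measure, and all names are ours, not the repository's:

```python
import jax.numpy as jnp


def penalised_reward(ensemble_next_obs, reward, penalty_coef=1.0):
    """MOPO-style pessimism sketch: subtract a penalty proportional to
    the dynamics ensemble's uncertainty about the next state.

    ensemble_next_obs: (num_ensemble, obs_dim) predictions for one transition.
    """
    # Disagreement measured as the largest per-dimension standard deviation.
    uncertainty = jnp.max(jnp.std(ensemble_next_obs, axis=0))
    return reward - penalty_coef * uncertainty
```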

## 📊 Evaluation

Our evaluation script (`evaluation.py`) implements the protocol described in our paper, analysing the performance of a UCB bandit over a range of policy evaluation budgets.

```python
from evaluation import load_results_dataframe, bootstrap_bandit_trials
import jax.numpy as jnp

# Load all results from the final_returns directory
df = load_results_dataframe("final_returns")

# Run bandit trials with bootstrapped confidence intervals.
# `policy_returns` is your stacked evaluation returns, e.g. assembled from df.
results = bootstrap_bandit_trials(
    returns_array=jnp.array(policy_returns),  # Shape: (num_policies, num_rollouts)
    num_subsample=8,     # Number of policies to subsample
    num_repeats=1000,    # Number of bandit trials
    max_pulls=200,       # Maximum pulls per trial
    ucb_alpha=2.0,       # UCB exploration coefficient
    n_bootstraps=1000,   # Bootstrap samples for confidence intervals
    confidence=0.95      # Confidence level
)

# Access results
pulls = results["pulls"]                      # Number of pulls at each step
means = results["estimated_bests_mean"]       # Mean score of estimated best policy
ci_low = results["estimated_bests_ci_low"]    # Lower confidence bound
ci_high = results["estimated_bests_ci_high"]  # Upper confidence bound
```
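
A natural follow-up is to plot the bandit's estimated-best score against the number of pulls, with the bootstrapped confidence band (a sketch, assuming matplotlib is installed):

```python
import matplotlib.pyplot as plt

# Mean score of the estimated best policy, with its bootstrapped CI band.
plt.plot(pulls, means, label="Estimated best policy")
plt.fill_between(pulls, ci_low, ci_high, alpha=0.3, label="95% CI")
plt.xlabel("Policy evaluations (pulls)")
plt.ylabel("Return of estimated best policy")
plt.legend()
plt.show()
```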

## 📝 Cite us!

```bibtex
@misc{jackson2025clean,
      title={A Clean Slate for Offline Reinforcement Learning},
      author={Matthew Thomas Jackson and Uljad Berdica and Jarek Liesen and Shimon Whiteson and Jakob Nicolaus Foerster},
      year={2025},
      eprint={2504.11453},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2504.11453},
}
```
