This repository contains examples that show how to train LLMs with reinforcement learning and how to build agents. In these examples, you will learn to:
- Trace rollouts with Weave and view prompts, outputs, rewards, and more in one place
- Use Weave with popular frameworks like TRL, verl, OpenPipe, and verifiers
- Use the OpenPipe serverless API to train models without hosting your own stack
- Build and test agents with open models using RL, with repeatable logs and evals
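As a rough illustration of the first point, the sketch below wraps a reward function and a rollout record with Weave's `weave.op` decorator so that prompt, completion, and reward end up in one trace. The names `traced`, `exact_match_reward`, and `rollout` are hypothetical and do not come from this repository; the fallback for a missing `weave` install is also an assumption made so the snippet runs anywhere.

```python
# Minimal sketch, not code from this repository.
# `traced`, `exact_match_reward`, and `rollout` are hypothetical names.
try:
    import weave  # pip install weave
except ImportError:
    weave = None  # assumption: fall back to plain functions without Weave


def traced(fn):
    """Wrap fn with weave.op when Weave is installed; otherwise return it unchanged."""
    return weave.op()(fn) if weave is not None else fn


@traced
def exact_match_reward(completion: str, answer: str) -> float:
    """Return 1.0 when the last whitespace-separated token equals the answer."""
    tokens = completion.strip().split()
    return 1.0 if tokens and tokens[-1] == answer.strip() else 0.0


@traced
def rollout(prompt: str, completion: str, answer: str) -> dict:
    """Bundle one rollout so prompt, output, and reward are recorded together."""
    return {
        "prompt": prompt,
        "completion": completion,
        "reward": exact_match_reward(completion, answer),
    }
```

To send traces to a project, call `weave.init("<your-project>")` once before the first `rollout(...)` call; the project name here is a placeholder. For example, `rollout("What is 2 + 2?", "The answer is 4", "4")` produces a record with a reward of `1.0`.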

| No. | Framework | Code |
|---|---|---|
| 1 | TRL | Post-training Qwen2.5 on the NuminaMath dataset |
| 2 | TRL | Post-training with the GSPO algorithm |
| 3 | ART-Serverless RL | SQLFixer |