v0.2 release

@eric-haibin-lin released this 15 Feb 15:18 · 1304 commits to main since this release · 828df7e

Highlights

New algorithms and features (detailed in the Changelog below)

Performance optimization:

  • Remove padding tokens (i.e., sequence packing). A significant throughput increase is expected for Llama, Mistral, Gemma, and Qwen2 transformer models; see the sketch after this list. Documentation
actor_rollout_ref.model.use_remove_padding=True
critic.model.use_remove_padding=True
  • Dynamic batch size. A significant throughput increase for variable-length sequences; also covered in the sketch after this list. Documentation and example
actor_rollout_ref.actor.ppo_max_token_len_per_gpu
actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu
actor_rollout_ref.ref.log_prob_max_token_len_per_gpu
critic.ppo_max_token_len_per_gpu
critic.forward_micro_batch_size_per_gpu
reward_model.forward_micro_batch_size_per_gpu
  • Ulysses sequence parallelism for long sequences.
actor_rollout_ref.actor.ulysses_sequence_parallel_size
critic.ulysses_sequence_parallel_size
reward_model.ulysses_sequence_parallel_size
  • vLLM v0.7+ integration (preview). For the Qwen2 PPO example, rollout time drops by 25% compared to vLLM v0.6.3, and by 45% when CUDA graph is enabled; see the note after this list. Documentation
actor_rollout_ref.rollout.enforce_eager=False
actor_rollout_ref.rollout.free_cache_engine=False
  • Liger kernel integration.
model.use_liger=True
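
The remove-padding and dynamic-batch-size features above share one mechanism: operate on real tokens only, never on pad positions. Below is a minimal, illustrative sketch of the idea; the function names and tensor layout are assumptions for exposition, not verl's actual API:

```python
import torch

def remove_padding(input_ids: torch.Tensor, attention_mask: torch.Tensor):
    """Flatten a padded [batch, seq] batch into a single 1-D stream of real
    tokens, plus cumulative sequence lengths (the layout consumed by
    varlen/flash-attention style kernels)."""
    seqlens = attention_mask.sum(dim=1)            # real length of each row
    packed = input_ids[attention_mask.bool()]      # drop all pad positions
    cu_seqlens = torch.nn.functional.pad(torch.cumsum(seqlens, dim=0), (1, 0))
    return packed, cu_seqlens

def dynamic_micro_batches(seqlens, max_tokens_per_gpu):
    """Greedily group samples so each micro-batch stays under a token budget,
    the same idea behind the *_max_token_len_per_gpu knobs listed above."""
    batches, current, used = [], [], 0
    for idx, n in enumerate(seqlens):
        n = int(n)
        if current and used + n > max_tokens_per_gpu:
            batches.append(current)
            current, used = [], 0
        current.append(idx)
        used += n
    if current:
        batches.append(current)
    return batches
```

With padding removed, compute scales with the number of real tokens rather than the padded length, and the token-budget batching keeps every micro-batch near the budget no matter how sequence lengths vary.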
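On the vLLM flags: enforce_eager and CUDA graphs are vLLM's own concepts, which verl's rollout config passes through. A small standalone example of the underlying vLLM behavior (the model name is chosen arbitrarily for illustration):

```python
from vllm import LLM

# enforce_eager=False lets vLLM capture CUDA graphs for decoding, which is
# where the extra rollout speedup quoted above comes from. Keeping the
# KV-cache engine alive between rollouts (free_cache_engine=False in verl)
# is required for the CUDA-graph path, per the linked documentation.
llm = LLM(model="Qwen/Qwen2-7B-Instruct", enforce_eager=False)
print(llm.generate(["The capital of France is"])[0].outputs[0].text)
```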

Changelog

New Features

  1. Algorithm Support:

    • Added support for the GRPO algorithm (#124); see the sketch after this list.
    • Implemented REINFORCE++ algorithm (#228).
    • Added the ReMax algorithm (#234).
  2. Performance Improvements:

    • Enabled dynamic batch size support (#118).
    • Added meta device initialization and parallel load for FSDP to avoid OOMs during init (#123).
    • Improved gradient accumulation in sequence balance (#141).
    • Added ref/RM offload support (#121).
    • Added LoRA support for SFT (#127).
    • Added support for rmpad/data-packing in FSDP with transformers (#91).
    • Integrated the Liger kernel (#133).
  3. Experiment Tracking:

    • Integrated SwanLab for experiment tracking with online/offline mode and local dashboard support (#218).
    • Added MLflow support (#74).
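
For context on GRPO (item 1 above): its core departure from PPO is a critic-free, group-normalized advantage, where several responses are sampled per prompt and each reward is standardized within its group. A minimal sketch of that computation (shapes and names are illustrative, not verl's API):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-normalized advantages in the style of GRPO.

    rewards: [num_prompts, group_size], one scalar reward per sampled
    response. Standardizing within each group replaces PPO's learned critic.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)
```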

Bug Fixes

  1. Critical Fixes:

    • Fixed checkpoint save with existing directories (#174).
    • Fixed incorrect response_attention_mask in vLLM rollout (#213).
    • Fixed gradient accumulation loss value (#102).
    • Fixed reward model issues with TokenClassification models (#99).
  2. Code Fixes:

    • Fixed redundant non_zero_mask (#152).
    • Fixed validation dp_size (#90).
    • Fixed response_mask index (#60).

Improvements

  1. Performance:

    • Improved memory efficiency in logprobs_from_logits_v2 (#220); see the sketch after this list.
    • Enabled multiprocess dataloader in SFT trainer (#122).
    • Added MFU calculation support (#117).
  2. Miscellaneous:

    • Added option to log validation generations to wandb (#177).
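
Regarding the logprobs_from_logits_v2 change: the memory win in this kind of optimization comes from never materializing a full [batch, seq, vocab] log-softmax tensor just to read out one value per position. A minimal sketch of the idea (not verl's exact implementation):

```python
import torch

def logprobs_of_labels(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Log-probabilities of the chosen labels without allocating a full
    log-softmax output.

    logits: [batch, seq, vocab]; labels: [batch, seq].
    """
    # Pick out only the logit of each chosen token: [batch, seq].
    chosen = torch.gather(logits, dim=-1, index=labels.unsqueeze(-1)).squeeze(-1)
    # log p(label) = logit[label] - logsumexp(logits). The logsumexp reduces
    # over the vocab axis immediately instead of keeping a full
    # [batch, seq, vocab] intermediate around.
    return chosen - torch.logsumexp(logits, dim=-1)
```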

Deprecations and Breaking Changes

  1. Breaking Changes:
    • Changed micro_batch_size to micro_batch_size_per_gpu (#136); see the migration note after this list.
    • Removed @ray.remote on workers to allow inheritance (#61).
    • Refactored old_log_prob into a separate function (#129).
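
For the micro-batch rename: batch-size knobs are now specified per GPU rather than as a single global count, so the old global value corresponds to the new per-GPU value times the data-parallel world size. A hypothetical migration for an 8-GPU run (the exact key prefix depends on your config):

Before: actor_rollout_ref.actor.ppo_micro_batch_size=8
After: actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1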

Contributors

A big thank you to all the contributors who made this release possible:
@zhanluxianshen @xingyaoww @fzyzcjy @emergenz @openhands-agent @ZSL98 @YSLIU627 @ZefanW @corbt @jaysonfrancis @hiyouga @Jiayi-Pan @hongpeng-guo @eltociear @chujiezheng @PanAndy @zwhe99 @pcmoritz @huiyeruzhou @VPeterV @uygnef @zhiqi-0 @ExtremeViscent @liziniu @nch0w @Cppowboy @TonyLianLong @4332001876 @tyler-romero @ShaohonChen @kinman0224 @willem-bd @bebetterest @WeiXiongUST @dignfei


The PyPI package will be available soon! Please let us know on GitHub if there's a problem extending an RL training recipe based on the pip-installed version of verl.

Full Changelog: v0.1...v0.2
