No-ZeRO reshaping #289

Open
Muennighoff wants to merge 37 commits into bigscience-workshop/Megatron-DeepSpeed:main from bigscience-workshop/Megatron-DeepSpeed:nozero_reshape
Muennighoff (Collaborator) commented Jun 23, 2022

This PR should be merged first: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/239
The present PR only adds tools/convert_checkpoint/deepspeed_to_deepspeed_nozero.py

Our small models are trained without ZeRO. This script makes it possible to reshape their checkpoints to a different PP/TP layout.

Tests:

  • Loss continues where it left off after reshaping from
    • PP=4, TP=4 -> PP=2, TP=2 👍
    • PP=4, TP=4 -> PP=1, TP=1 👍
    • PP=2, TP=1 -> PP=1, TP=1 👍
  • Checkpoint size stays the same 👍
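
To illustrate what the two checks above mean, here is a minimal toy sketch in pure Python. It is not the code in deepspeed_to_deepspeed_nozero.py: every function name below is hypothetical, weights are plain nested lists, and all tensor-parallel shards are assumed to be split along rows. Real Megatron checkpoints partition different weights along different dimensions, so treat this only as a picture of the idea: merge the TP shards, re-split them for the new TP degree, regroup layers for the new PP degree, and confirm the total parameter count is unchanged.

```python
from typing import List

Matrix = List[List[float]]


def merge_tp_shards(shards: List[Matrix]) -> Matrix:
    """Concatenate row-partitioned shards back into one full matrix."""
    full: Matrix = []
    for shard in shards:
        full.extend(shard)
    return full


def split_tp_shards(full: Matrix, tp_degree: int) -> List[Matrix]:
    """Split a full matrix into `tp_degree` equal row blocks."""
    if tp_degree <= 0 or len(full) % tp_degree != 0:
        raise ValueError("row count must be divisible by the target TP degree")
    rows = len(full) // tp_degree
    return [full[i * rows:(i + 1) * rows] for i in range(tp_degree)]


def reshape_tp(shards: List[Matrix], new_tp: int) -> List[Matrix]:
    """Toy TP reshape (assumption: every shard is row-partitioned)."""
    return split_tp_shards(merge_tp_shards(shards), new_tp)


def regroup_layers(stages: List[List[str]], new_pp: int) -> List[List[str]]:
    """Toy PP reshape: flatten layers in order, then split evenly into stages."""
    layers = [layer for stage in stages for layer in stage]
    if new_pp <= 0 or len(layers) % new_pp != 0:
        raise ValueError("layer count must be divisible by the target PP degree")
    per_stage = len(layers) // new_pp
    return [layers[i * per_stage:(i + 1) * per_stage] for i in range(new_pp)]


def num_elements(shards: List[Matrix]) -> int:
    """Total number of scalar parameters across all shards."""
    return sum(len(row) for shard in shards for row in shard)


if __name__ == "__main__":
    # Four TP shards of one weight, two rows each (made-up values).
    tp4 = [[[float(8 * s + 2 * r + c) for c in range(2)] for r in range(2)]
           for s in range(4)]
    tp2 = reshape_tp(tp4, new_tp=2)
    # Reshaping only moves parameters around, so the count is unchanged.
    assert num_elements(tp2) == num_elements(tp4)
    print(regroup_layers([["l0"], ["l1"], ["l2"], ["l3"]], new_pp=2))
```

Because the toy reshape only moves values between shards and stages, the parameter count before and after is identical, which is the property the "checkpoint size stays the same" check relies on.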

Notes:

  • I haven't applied black formatting etc., as this is not a production codebase. Let me know if that's not okay and the code should be cleaner!

Muennighoff requested review from stas00 and tjruwase June 23, 2022 12:05
Muennighoff requested a review from thomasw21 July 4, 2022 08:32
Muennighoff changed the base branch from ds_ckpt_reshape to main July 4, 2022 08:44
adammoody pushed a commit to adammoody/Megatron-DeepSpeed that referenced this pull request Dec 18, 2023