[DSD] Unify the API signatures of set_model_state_dict and set_optimizer_state_dict #127384

fegin · May 29, 2024

Stack from ghstack (oldest at bottom):

[DSD] Remove the unused submodule feature #127604
[DSD] Make distributed state_dict support torch.distributed is not initialized case #127385
-> [DSD] Unify the API signatures of set_model_state_dict and set_optimizer_state_dict #127384
[DSD] Support flattening the optimizer state_dict when saving and unflattening when loading #127071
[DSD] Remove the support of Dict[nn.Module, Dict[str, Any]] state_dict #127070

Summary:
Allow the optim_state_dict argument to be passed positionally. This makes sense because it is a required argument, and it makes the function signature consistent with set_model_state_dict without causing BC issues.
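To illustrate why this kind of change should not break existing callers, here is a minimal sketch. It assumes, based only on the summary above, that optim_state_dict was previously keyword-only. The two functions are hypothetical stand-ins, not the real code in torch.distributed.checkpoint.state_dict.

```python
import inspect


# Hypothetical stand-ins for illustration only; the real functions live in
# torch.distributed.checkpoint.state_dict and perform the actual loading.
def set_optimizer_state_dict_before(model, optimizers, *, optim_state_dict, options=None):
    """Assumed old shape: optim_state_dict is keyword-only."""
    return optim_state_dict


def set_optimizer_state_dict_after(model, optimizers, optim_state_dict, *, options=None):
    """Assumed new shape: optim_state_dict may be positional or keyword."""
    return optim_state_dict


def accepts_positionally(func, name):
    """Return True if parameter `name` of `func` can be passed by position."""
    kind = inspect.signature(func).parameters[name].kind
    return kind in (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )


state = {"state": {}, "param_groups": []}

# Existing keyword-style call sites keep working with either signature.
kw_before = set_optimizer_state_dict_before(None, None, optim_state_dict=state)
kw_after = set_optimizer_state_dict_after(None, None, optim_state_dict=state)

# Only the relaxed signature also accepts the positional form.
pos_after = set_optimizer_state_dict_after(None, None, state)
```

Moving a parameter from keyword-only to positional-or-keyword only widens the set of accepted calls, which is why keyword-style callers are unaffected.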

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k @LucasLLC

[ghstack-poisoned]

pytorch-bot · May 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/127384

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Upgrade MacOS runner to 14

✅ No Failures

As of commit 7ff00e4 with merge base a60b06b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]

wz337

LGTM

fegin · May 31, 2024

@pytorchbot merge

pytorchmergebot · May 31, 2024

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

…itialized case (#127385) Fixes #124942 Summary: Allow DSD to load a regular optimizer state_dict and to be used when torch.distributed.is_initialized() is False. Pull Request resolved: #127385 Approved by: https://github.com/wz337 ghstack dependencies: #127070, #127071, #127384

…zer_state_dict (#127384) Summary: Allow the optim_state_dict argument to be passed positionally. This makes sense because it is a required argument, and it makes the function signature consistent with set_model_state_dict without causing BC issues. Pull Request resolved: #127384 Approved by: https://github.com/wz337 ghstack dependencies: #127070, #127071 (cherry picked from commit 8b4ad3a)

…itialized case (#127385) Fixes #124942 Summary: Allow DSD to load a regular optimizer state_dict and to be used when torch.distributed.is_initialized() is False. Pull Request resolved: #127385 Approved by: https://github.com/wz337 ghstack dependencies: #127070, #127071, #127384 (cherry picked from commit 64c581a)

Update

5fb3b6f

[ghstack-poisoned]

pytorch-bot bot added the module: distributed_checkpoint and oncall: distributed labels May 29, 2024

fegin added the release notes: distributed (checkpoint), ciflow/trunk, and ciflow/periodic labels May 29, 2024

fegin requested review from LucasLLC and wz337 May 29, 2024 08:06

fegin added the suppress-bc-linter label May 29, 2024

Update

7ff00e4

[ghstack-poisoned]

wz337 approved these changes May 30, 2024

fegin mentioned this pull request May 31, 2024

[DSD] Remove the unused submodule feature #127604

Closed

pytorchmergebot added the merging label May 31, 2024

pytorchmergebot added the Merged label May 31, 2024

pytorchmergebot closed this in 8b4ad3a May 31, 2024

pytorchmergebot removed the merging label May 31, 2024

github-actions bot deleted the gh/fegin/246/head branch July 1, 2024 02:01
