🐛 Describe the bug
`pytorch/torch/distributed/distributed_c10d.py`, line 2440 at commit `8c167f9`:
This line should pass in the group so that it returns the rank within the group rather than the global rank. Otherwise, the is_coordinator checks in sharded checkpoint loading will fail.
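
For context, `dist.get_rank()` without a group argument returns the caller's global rank, while `dist.get_rank(group=...)` returns the rank within that group; the two diverge for any rank that is not first in its subgroup. Below is a minimal sketch of the difference (the subgroup split and coordinator check are illustrative, not the actual code at the linked line):

```python
# Run with: torchrun --nproc_per_node=4 repro.py
import torch.distributed as dist

dist.init_process_group("gloo")
global_rank = dist.get_rank()

# new_group() is collective: every rank must create both groups.
group_a = dist.new_group([0, 1])
group_b = dist.new_group([2, 3])
my_group = group_a if global_rank < 2 else group_b

# Without the group argument, get_rank() returns the global rank;
# with it, the rank within the group. They differ, e.g. global
# rank 2 is rank 0 inside group_b.
group_rank = dist.get_rank(group=my_group)

# An is_coordinator-style check against rank 0 therefore gives the
# wrong answer on ranks 2 and 3 if the global rank is used instead.
assert (group_rank == 0) == (global_rank in (0, 2))

dist.destroy_process_group()
```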
Versions
Nightly
cc @ezyang @gchanan @zou3519 @kadeng @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @LucasLLC