@msaroufim msaroufim commented Aug 25, 2025

Replaced PyTorch's deprecated DistributedOptimizer with manual optimizer management using torch.compile to fix the TorchScript deprecation warning: optimizers are now created on each remote worker, and step() and zero_grad() are called manually through RPC. The remaining warnings (ProcessGroupGloo and NetworkX) I didn't know how to fix in user code. Impressed at Claude for figuring this out.
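For readers who have not seen the pattern, here is a minimal sketch of what "manual optimizer management" could look like. It is not the code from this PR: `OptimizerHolder` is a hypothetical helper name, and the sketch runs in a single process so it can be tried without an RPC setup. In an RPC program, one holder per worker would presumably be created with `rpc.remote(...)` and driven through `holder_rref.rpc_sync().zero_grad()` and `holder_rref.rpc_sync().step()`. With distributed autograd, gradients are accumulated in the autograd context rather than in `.grad`, so they would likely need to be copied over before `step()`.

```python
import torch
import torch.nn as nn


class OptimizerHolder:
    """Hypothetical helper: owns a plain local optimizer for one pipeline stage."""

    def __init__(self, module: nn.Module, lr: float = 0.05):
        self.module = module
        self.optimizer = torch.optim.SGD(module.parameters(), lr=lr)

    def zero_grad(self) -> None:
        # Clear gradients on the worker that owns the parameters.
        self.optimizer.zero_grad()

    def step(self) -> None:
        # Apply one update using whatever is currently stored in .grad.
        self.optimizer.step()


# Local stand-in for a single remote stage (assumption: a tiny Linear layer).
torch.manual_seed(0)
stage = nn.Linear(4, 2)
holder = OptimizerHolder(stage)
before = stage.weight.detach().clone()

# One training step, driven explicitly instead of via DistributedOptimizer.
holder.zero_grad()
loss = stage(torch.randn(8, 4)).sum()
loss.backward()
holder.step()

weights_changed = not torch.equal(before, stage.weight.detach())
print("weights changed:", weights_changed)
```

The point of the design is that each worker keeps an ordinary `torch.optim` optimizer next to its own parameters, and the driver only sends two small method calls per batch.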

Output is now clean

(create) ➜  pipeline git:(main) ✗ python main.py
[Gloo] Rank 0 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 2 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 1 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
Processing batch 0
Processing batch 1
Processing batch 2
number of splits = 1, execution time = 24.637782335281372
[Gloo] Rank 0 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 1 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 2 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
Processing batch 0
Processing batch 1
Processing batch 2
number of splits = 2, execution time = 19.14631748199463
[Gloo] Rank 0 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 2 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 1 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
Processing batch 0
Processing batch 1
Processing batch 2
number of splits = 4, execution time = 15.70963716506958
[Gloo] Rank 1 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 0 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 2 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
Processing batch 0
Processing batch 1
Processing batch 2
number of splits = 8, execution time = 11.398766994476318
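
To make the trend in the log above easier to read, the timings can be turned into speedups relative to the single-split run. This is only a convenience sketch; the numbers are copied from the output above, and the variable names are illustrative.

```python
# (number of splits, execution time in seconds), copied from the log above
timings = [
    (1, 24.637782335281372),
    (2, 19.14631748199463),
    (4, 15.70963716506958),
    (8, 11.398766994476318),
]

# Use the single-split run as the baseline.
baseline = timings[0][1]
speedups = {splits: baseline / seconds for splits, seconds in timings}

for splits, speedup in speedups.items():
    print(f"splits={splits}: {speedup:.2f}x vs. 1 split")
```

On these numbers, going from 1 to 8 splits gives a bit more than a 2x reduction in execution time.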

@meta-cla meta-cla bot added the cla signed label Aug 25, 2025

netlify bot commented Aug 25, 2025

Deploy Preview for pytorch-examples-preview canceled.

Name Link
🔨 Latest commit f979909
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-examples-preview/deploys/68acb887f1608700082c9ceb

@msaroufim msaroufim changed the title Modernize RPC example Modernize distributed/rpc/pipeline Aug 25, 2025
@msaroufim msaroufim merged commit 746c0a2 into main Aug 25, 2025
8 checks passed