thomasywang
Contributor

Summary:
ProcState::Running now stores two address fields. `addr` is any address that can be used to reach the proc, so it may be a proxy, while `local_addr` is the address the proc is actually running on. When building address books during ProcMesh initialization, each MeshAgent is passed a slightly modified address book: every peer that shares its forwarding proxy is mapped to that peer's direct address, while every other proc id stays bound to that proc's forwarder address, so inter-host communication still goes through a proxy.

That is, for proc 2a, instead of being passed:

{
  "1a" => "1_proxy",
  "1b" => "1_proxy",
  "1c" => "1_proxy",
  "1d" => "1_proxy",
  "2b" => "2_proxy",
  "2c" => "2_proxy",
  "2d" => "2_proxy"
}

it will receive:

{
  "1a" => "1_proxy",
  "1b" => "1_proxy",
  "1c" => "1_proxy",
  "1d" => "1_proxy",
  "2b" => "2b_addr",
  "2c" => "2c_addr",
  "2d" => "2d_addr"
}
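
The rewrite above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ProcAddrs`, `address_book_for`, and `sample_mesh` are hypothetical names, addresses are plain strings, and "shares the same forwarding proxy" is assumed to mean that two procs have an equal `addr`.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the two addresses kept in `ProcState::Running`.
#[derive(Clone, Debug)]
struct ProcAddrs {
    /// Address any peer can use to reach the proc; may be a forwarding proxy.
    addr: String,
    /// Address the proc is actually running on.
    local_addr: String,
}

/// Build the address book handed to the agent of proc `me`.
/// Peers behind the same proxy as `me` map to their direct address;
/// every other proc maps to its proxy address.
/// Returns `None` if `me` is not part of the mesh.
fn address_book_for(
    me: &str,
    procs: &BTreeMap<String, ProcAddrs>,
) -> Option<BTreeMap<String, String>> {
    let my_proxy = &procs.get(me)?.addr;
    let book: BTreeMap<String, String> = procs
        .iter()
        // A proc does not need an entry for itself.
        .filter(|(id, _)| id.as_str() != me)
        .map(|(id, p)| {
            let target = if &p.addr == my_proxy {
                &p.local_addr
            } else {
                &p.addr
            };
            (id.clone(), target.clone())
        })
        .collect();
    Some(book)
}

/// Two hosts ("1" and "2") with four procs each, mirroring the example above.
fn sample_mesh() -> BTreeMap<String, ProcAddrs> {
    let mut procs = BTreeMap::new();
    for host in ["1", "2"] {
        for slot in ["a", "b", "c", "d"] {
            let id = format!("{host}{slot}");
            let addrs = ProcAddrs {
                addr: format!("{host}_proxy"),
                local_addr: format!("{id}_addr"),
            };
            procs.insert(id, addrs);
        }
    }
    procs
}
```

Under these assumptions, `address_book_for("2a", &sample_mesh())` yields the second map shown above: procs 1a through 1d stay on "1_proxy", while 2b, 2c, and 2d resolve to their own direct addresses.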

We want this because, without it, the forwarder becomes a bottleneck within the host and forces all communication to be serial instead of parallel.

Some example data points for the perf improvement (before => after):

  • call 1 host x 8 gpu @ 1GB: 12.5s => 2.99s (about 4.2x faster)
  • call 8 host x 8 gpu @ 1GB: 20.0s => 5.84s (about 3.4x faster)
  • call 64 host x 8 gpu @ 1GB: 23.6s => 12.26s (about 1.9x faster)

Proof of parallelism (VPN needed): https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html#!/?url=https://interncache-all.fbcdn.net/manifold/perfetto_internal_traces%2Ftree%2Fshared_trace%2Fthomasywang_2070d1f3-3b42-48cd-bde1-20460b3850cf_tmpt17nblpf.json

Differential Revision: D84032776

meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Oct 7, 2025

meta-codesync bot commented Oct 7, 2025

@thomasywang has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84032776.