danielvegamyhre (Contributor) commented Oct 17, 2025

Stacked PRs:


[mxfp8 moe training] bench and profile mxfp8 a2a fwd and bwd separately

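Single-node benchmark on 4xB200 over NVLink: mxfp8 is not a win overall in this setting. It is slower than bf16 at the two smallest shapes, within roughly 6% of bf16 elsewhere, and clearly faster only in the forward pass at (8, 8192, 5120).

I expect speedups for multi-node EP (expert parallelism) though, where InfiniBand bandwidth is much more constrained than NVLink.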
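
The expectation above comes down to bytes on the wire. A rough sketch of the payload arithmetic, assuming the mxfp8 all-to-all ships 1-byte fp8 elements plus one 1-byte E8M0 scale per 32-element block (the standard MX layout; the PR description does not spell out the wire format, and `a2a_payload_bytes` is a hypothetical helper, not a torchao function):

```python
import math

BF16_BYTES = 2        # bfloat16: 2 bytes per element
FP8_BYTES = 1         # fp8 data: 1 byte per element
MX_BLOCK_SIZE = 32    # MX formats share one scale per 32-element block
E8M0_SCALE_BYTES = 1  # each shared scale is a 1-byte E8M0 exponent


def a2a_payload_bytes(shape, fmt):
    """Bytes exchanged for a tensor of `shape` (hypothetical helper).

    Assumes the whole tensor is sent and, for mxfp8, that the scales
    travel alongside the fp8 data.
    """
    numel = math.prod(shape)
    if fmt == "bf16":
        return numel * BF16_BYTES
    if fmt == "mxfp8":
        num_blocks = math.ceil(numel / MX_BLOCK_SIZE)
        return numel * FP8_BYTES + num_blocks * E8M0_SCALE_BYTES
    raise ValueError(f"unknown format: {fmt}")


shape = (1, 8192, 5120)
bf16_bytes = a2a_payload_bytes(shape, "bf16")
mxfp8_bytes = a2a_payload_bytes(shape, "mxfp8")
print(f"bf16:  {bf16_bytes / 2**20:.2f} MiB")
print(f"mxfp8: {mxfp8_bytes / 2**20:.2f} MiB")
print(f"payload reduction: {bf16_bytes / mxfp8_bytes:.3f}x")
```

Under these assumptions the payload shrinks by a little under 2x, so any end-to-end gain depends on the link being slow enough that the saved transfer time outweighs the quantize/dequantize overhead.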

input_shape         num_splits    fwd_bf16_ms    fwd_mxfp8_ms    bwd_bf16_ms    bwd_mxfp8_ms
----------------  ------------  -------------  --------------  -------------  --------------
(1, 8192, 5120)              4       0.269697        0.479122       0.695028        0.993789
(2, 8192, 5120)              4       0.347715        0.468697       0.791324        0.872646
(4, 8192, 5120)              4       0.593996        0.585684       1.28176         1.24674
(8, 8192, 5120)              4       1.53808         1.03233        2.38809         2.3224
(16, 8192, 5120)             4       1.77031         1.8789         4.36899         4.46669
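
To make the table easier to read, here is a small sketch that turns the timings above into bf16/mxfp8 speedup ratios (values above 1 mean mxfp8 is faster). The numbers are copied from the table; the variable names are only illustrative.

```python
# (input_shape, fwd_bf16_ms, fwd_mxfp8_ms, bwd_bf16_ms, bwd_mxfp8_ms),
# copied from the benchmark table above.
rows = [
    ("(1, 8192, 5120)", 0.269697, 0.479122, 0.695028, 0.993789),
    ("(2, 8192, 5120)", 0.347715, 0.468697, 0.791324, 0.872646),
    ("(4, 8192, 5120)", 0.593996, 0.585684, 1.28176, 1.24674),
    ("(8, 8192, 5120)", 1.53808, 1.03233, 2.38809, 2.3224),
    ("(16, 8192, 5120)", 1.77031, 1.8789, 4.36899, 4.46669),
]

# Speedup = bf16 time / mxfp8 time, so > 1 means mxfp8 is faster.
speedups = {
    shape: (fwd_bf16 / fwd_mxfp8, bwd_bf16 / bwd_mxfp8)
    for shape, fwd_bf16, fwd_mxfp8, bwd_bf16, bwd_mxfp8 in rows
}

for shape, (fwd, bwd) in speedups.items():
    print(f"{shape:<18} fwd {fwd:.2f}x   bwd {bwd:.2f}x")
```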

pytorch-bot bot commented Oct 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3203

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job, 1 Unrelated Failure

As of commit 19a4679 with merge base b644211:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

danielvegamyhre added a commit that referenced this pull request Oct 17, 2025
stack-info: PR: #3203, branch: danielvegamyhre/stack/81
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/81 branch from 2ab06e3 to da3608b October 17, 2025 17:23
@meta-cla meta-cla bot added the CLA Signed label Oct 17, 2025
@danielvegamyhre danielvegamyhre added the mx, topic: not user facing, and moe labels Oct 17, 2025
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/80 to main October 17, 2025 21:04
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/81 branch from da3608b to e6959d4 October 17, 2025 21:05
danielvegamyhre added a commit that referenced this pull request Oct 17, 2025
stack-info: PR: #3203, branch: danielvegamyhre/stack/81
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/80 October 17, 2025 21:05
stack-info: PR: #3203, branch: danielvegamyhre/stack/81
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/80 to main October 17, 2025 22:02
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/81 branch from e6959d4 to 19a4679 October 17, 2025 22:02
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/80 October 17, 2025 22:02
Labels: CLA Signed, moe, mx, topic: not user facing
