[hyperactor] channel-level ping-pong benchmarks #906

Open: wants to merge 5 commits into base: gh/mariusae/41/base

@mariusae mariusae commented Aug 18, 2025

Stack from ghstack (oldest at bottom):

This is an attempt at an apples-to-apples comparison with P1903314366 that eliminates any non-channel-related overheads.

The results replicate previous findings: our throughput is hampered by excess data copies (either outright or through growing buffers in the encoding stack) and by tokio-level network I/O overheads. Both are being addressed. These benchmarks should help validate this work as it lands.
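
For readers unfamiliar with the pattern, here is a minimal sketch of a ping-pong throughput measurement. It is not the benchmark added in this PR: it assumes plain `std::sync::mpsc` channels and OS threads rather than hyperactor channels and tokio, and the `ping_pong` function and its `copy_per_hop` flag are hypothetical names used only for illustration. The flag adds one payload clone per hop as a rough stand-in for an extra data copy in the send path.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Runs `rounds` ping-pong round trips with a `payload_size`-byte message.
/// When `copy_per_hop` is true, each side clones the payload before sending,
/// imitating one extra data copy per hop.
/// Returns the elapsed time and the total bytes sent in both directions.
fn ping_pong(rounds: usize, payload_size: usize, copy_per_hop: bool) -> (Duration, usize) {
    let (ping_tx, ping_rx) = mpsc::channel::<Vec<u8>>();
    let (pong_tx, pong_rx) = mpsc::channel::<Vec<u8>>();

    // Echo side: send back every message until the ping channel closes.
    let echo = thread::spawn(move || {
        for msg in ping_rx {
            let reply = if copy_per_hop { msg.clone() } else { msg };
            if pong_tx.send(reply).is_err() {
                break;
            }
        }
    });

    let mut msg = vec![0u8; payload_size];
    let mut bytes = 0usize;
    let start = Instant::now();
    for _ in 0..rounds {
        // Either move the buffer into the channel or send a fresh copy of it.
        let out = if copy_per_hop { msg.clone() } else { msg };
        bytes += out.len();
        ping_tx.send(out).expect("echo thread exited early");
        msg = pong_rx.recv().expect("echo thread exited early");
        bytes += msg.len();
    }
    let elapsed = start.elapsed();

    drop(ping_tx); // closing the channel lets the echo thread finish
    echo.join().expect("echo thread panicked");
    (elapsed, bytes)
}

fn main() {
    let rounds = 1_000;
    let payload_size = 64 * 1024;
    for copy in [false, true] {
        let (elapsed, bytes) = ping_pong(rounds, payload_size, copy);
        let mib_per_sec = bytes as f64 / (1024.0 * 1024.0) / elapsed.as_secs_f64();
        println!("copy_per_hop={copy}: {rounds} round trips in {elapsed:?} ({mib_per_sec:.1} MiB/s)");
    }
}
```

Comparing the two runs gives a rough sense of how much a single extra copy per hop costs at a given payload size. Absolute numbers depend on the machine and say nothing about the hyperactor results.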

Differential Revision: D80260732

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D80260732

shayne-fletcher pushed a commit to shayne-fletcher/monarch-1 that referenced this pull request Aug 18, 2025
Summary:

This is an attempt at an apples-to-apples comparison with P1903314366 that eliminates any non-channel-related overheads.

The results replicate previous findings: our throughput is hampered by excess data copies (either outright or through growing buffers in the encoding stack) and by tokio-level network I/O overheads. Both are being addressed. These benchmarks should help validate this work as it lands.
ghstack-source-id: 303773788
exported-using-ghexport

Reviewed By: highker, vidhyav

Differential Revision: D80260732
shayne-fletcher pushed a commit to shayne-fletcher/monarch-1 that referenced this pull request Aug 18, 2025
Summary:
Pull Request resolved: meta-pytorch#906

This is an attempt at an apples-to-apples comparison with P1903314366 that eliminates any non-channel-related overheads.

The results replicate previous findings: our throughput is hampered by excess data copies (either outright or through growing buffers in the encoding stack) and by tokio-level network I/O overheads. Both are being addressed. These benchmarks should help validate this work as it lands.
ghstack-source-id: 303699220
exported-using-ghexport

Differential Revision: D80260732

Reviewed By: highker, vidhyav
shayne-fletcher pushed a commit to shayne-fletcher/monarch-1 that referenced this pull request Aug 18, 2025
Summary:

This is an attempt at an apples-to-apples comparison with P1903314366 that eliminates any non-channel-related overheads.

The results replicate previous findings: our throughput is hampered by excess data copies (either outright or through growing buffers in the encoding stack) and by tokio-level network I/O overheads. Both are being addressed. These benchmarks should help validate this work as it lands.
ghstack-source-id: 303814001
exported-using-ghexport

Reviewed By: highker, vidhyav

Differential Revision: D80260732
Labels: CLA Signed (managed by the Meta Open Source bot), fb-exported