@huydhn huydhn commented Jul 24, 2025

Summary

We want to distribute jobs to multiple private device pools now that we have more private devices on AWS, mainly:

  • 5 Samsung Galaxy S22 5G
  • 4 Apple iPhone 15 Pro Max

A simple round-robin distribution algorithm will allocate different benchmark configs to devices of the same type from different pools. To achieve this, I refactored .ci/scripts/gather_benchmark_configs.py to introduce the concept of a device variant, in which the device name has the format DEVICE_NAME+VARIANT, for example samsung_galaxy_s22+private or apple_iphone_15+ios_18_public. Each device name can map to more than one device pool.
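As a rough illustration of the idea (not the actual code in .ci/scripts/gather_benchmark_configs.py), the round-robin allocation could be sketched as below. The pool names, the `DEVICE_POOLS` mapping, the config names in the test data, and the helpers `split_device` and `distribute` are all hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from DEVICE_NAME+VARIANT to its device pools.
# The pool names here are made up for illustration.
DEVICE_POOLS: Dict[str, List[str]] = {
    "samsung_galaxy_s22+private": ["s22_private_pool_a", "s22_private_pool_b"],
    "apple_iphone_15+ios_18_public": ["iphone_15_public_pool"],
}


def split_device(device: str) -> Tuple[str, str]:
    """Split 'DEVICE_NAME+VARIANT' into (DEVICE_NAME, VARIANT).

    The variant is an empty string when no '+' is present.
    """
    name, _, variant = device.partition("+")
    return name, variant


def distribute(configs: List[str], device: str) -> Dict[str, List[str]]:
    """Assign benchmark configs to the pools of one device, round-robin."""
    pools = DEVICE_POOLS[device]
    assignment: Dict[str, List[str]] = {pool: [] for pool in pools}
    for index, config in enumerate(configs):
        # The i-th config goes to pool (i mod number_of_pools).
        assignment[pools[index % len(pools)]].append(config)
    return assignment
```

With two pools, consecutive configs alternate between them, so no single pool receives every job for a given device type. A device with only one pool simply receives all configs.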

I also re-enable the benchmark jobs on private iOS devices now that we have more of them to use.

Test plan

pytorch-bot bot commented Jul 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12832

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit 7384b78 with merge base 0479dcd (image):

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Jul 24, 2025

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@huydhn huydhn changed the title from "Support distributing jobs to multiple private device pools" to "Distribute jobs to multiple private device pools" Jul 24, 2025
Signed-off-by: Huy Do <[email protected]>
@huydhn huydhn requested review from guangy10 and yangw-dev July 25, 2025 02:20
@huydhn huydhn marked this pull request as ready for review July 25, 2025 02:20

huydhn commented Jul 25, 2025

Let me take a look at the failures from https://github.com/pytorch/executorch/actions/runs/16513751841/job/46701693809 before landing this. I wasn't expecting to see them there.

@huydhn huydhn merged commit 70ca29e into main Jul 29, 2025
246 of 249 checks passed
@huydhn huydhn deleted the distribute-jobs-to-private-pools branch July 29, 2025 20:27

Labels

CLA Signed, topic: not user facing
