Use linux.4xlarge.memory instead of linux.12xlarge #6896

huydhn · 2024-11-15T20:44:01Z

linux.4xlarge.memory has 128 GB of memory and 16 CPU cores, while linux.12xlarge, a more expensive runner, has only 96 GB of memory despite a whopping 48 CPU cores.

Testing

Example runs for:

dl3 https://github.com/pytorch/executorch/actions/runs/11863238941/job/33064335597?pr=6896 (20m vs 19m in trunk)
emformer_predict https://github.com/pytorch/executorch/actions/runs/11863238941/job/33064336524?pr=6896 (1h30m vs 1h20m in trunk)
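For a rough sense of the trade-off, the numbers above can be put side by side. This is only a back-of-the-envelope sketch: the specs and durations are copied from this description, each duration comes from a single run, and the helper names (`memory_per_core`, `slowdown_pct`) are made up for illustration.

```python
# Runner specs as stated in the PR description.
RUNNERS = {
    "linux.4xlarge.memory": {"memory_gb": 128, "cpu_cores": 16},
    "linux.12xlarge": {"memory_gb": 96, "cpu_cores": 48},
}

# Job durations in minutes as (this PR, trunk), taken from the example runs.
# One run each, so treat the percentages as indicative only.
DURATIONS_MIN = {
    "dl3": (20, 19),
    "emformer_predict": (90, 80),
}


def memory_per_core(spec):
    """Return GB of memory available per CPU core (hypothetical helper)."""
    return spec["memory_gb"] / spec["cpu_cores"]


def slowdown_pct(pr_minutes, trunk_minutes):
    """Return the relative slowdown of the PR run vs trunk, in percent."""
    return (pr_minutes - trunk_minutes) / trunk_minutes * 100


if __name__ == "__main__":
    for name, spec in RUNNERS.items():
        print(f"{name}: {memory_per_core(spec):.1f} GB per core")
    for model, (pr, trunk) in DURATIONS_MIN.items():
        print(f"{model}: {slowdown_pct(pr, trunk):+.1f}% vs trunk")
```

On these figures the memory-optimized runner offers 8 GB per core against 2 GB per core, and the two sampled jobs ran roughly 5% and 12.5% slower than on trunk.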

pytorch-bot · 2024-11-15T20:44:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6896

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 8204737 with merge base ec68eb3:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

trunk / test-coreml-delegate / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/runner/work/_temp/exec_script failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangy10

@huydhn Before merging, can you link a comparison of job execution time between linux.4xlarge.memory and linux.12xlarge? I would like to understand the actual trade-off we are making. Ideally, the difference should be tiny.

guangy10 · 2024-11-18T21:21:39Z

.ci/scripts/gather_test_models.py

-        "resnet50": "linux.12xlarge",
-        "llava": "linux.12xlarge",
-        "llama3_2_vision_encoder": "linux.12xlarge",
-        # "llama3_2_text_decoder": "linux.12xlarge",  # TODO: re-enable test when Huy's change is in / model gets smaller.


Can you attach the job link for this model, since we are re-enabling it?

Good catch. I think I will comment it out and leave it for later. It doesn't OOM, but it takes forever to export (close to 6 hours so far). I don't have much context, so I will probably need help from @dvorjackz to figure this one out.
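To make the outcome of this thread concrete, here is a minimal sketch of a per-model runner mapping. It is not the actual content of `.ci/scripts/gather_test_models.py`: the names `CUSTOM_RUNNERS`, `DEFAULT_RUNNER`, and `pick_runner` are hypothetical, the default runner label is an assumption, and the three `linux.4xlarge.memory` entries are inferred from the PR title rather than from the merged diff.

```python
# Hypothetical default for models that need no special runner (assumption).
DEFAULT_RUNNER = "linux.2xlarge"

# Models that need more memory than the default runner provides.
# Values assume the swap described in the PR title; the merged diff may differ.
CUSTOM_RUNNERS = {
    "resnet50": "linux.4xlarge.memory",
    "llava": "linux.4xlarge.memory",
    "llama3_2_vision_encoder": "linux.4xlarge.memory",
    # "llama3_2_text_decoder" stays disabled: it does not OOM, but the
    # export took close to 6 hours according to the discussion above.
}


def pick_runner(model_name: str) -> str:
    """Return the runner label for a model, falling back to the default."""
    return CUSTOM_RUNNERS.get(model_name, DEFAULT_RUNNER)


if __name__ == "__main__":
    for model in ("resnet50", "llava", "some_small_model"):
        print(f"{model} -> {pick_runner(model)}")
```

Keeping the skipped model as a comment inside the mapping preserves the reminder to re-enable it once the export time is understood.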

Try linux.4xlarge.memory

3d3c8a6

facebook-github-bot added the CLA Signed label Nov 15, 2024

huydhn added 2 commits November 15, 2024 12:48

Testing

95a912e

Ready to land

6cc16d9

huydhn requested review from guangy10 and kirklandsign November 18, 2024 18:37

Merge branch 'main' into try-r5-instances

71ff610

huydhn marked this pull request as ready for review November 18, 2024 18:40

kirklandsign approved these changes Nov 18, 2024

guangy10 approved these changes Nov 18, 2024

guangy10 reviewed Nov 18, 2024

huydhn added 2 commits November 25, 2024 16:26

Merge branch 'main' into try-r5-instances

a41f4b4

More tests

2d84696

kirklandsign approved these changes Nov 26, 2024

huydhn added the topic: not user facing label Nov 26, 2024

huydhn added 5 commits November 25, 2024 17:09

Forget one line change

245b1fe

Use linux.8xlarge.memory for llama3_2_text_decoder

34ee8d6

Increase timeout value

1df29d1

Skip llama3_2_text_decoder because it takes too long to export

038265e

Ready to land

8204737

huydhn merged commit aadf2ee into main Nov 26, 2024
65 of 66 checks passed

huydhn deleted the try-r5-instances branch November 26, 2024 22:00
