
Conversation

@huydhn
Contributor

huydhn commented Oct 16, 2024

This extracts the iOS `Memory Peak Physical, kB` metric and uses it as the benchmark `peak_mem_usage(mb)`. I convert the value to MB because it feels friendlier to humans looking at the dashboard, but let me know if we need kB granularity.

Also fixes a small regex bug when parsing model names like `tinyllama_xnnpack+custom+qe_fp32`, where the backend name can include `+`.

This also fixes the missing-key error in https://github.com/pytorch/executorch/actions/runs/11337625751/job/31531760841#step:7:112, which happened because the script failed to parse `tinyllama_xnnpack+custom+qe_fp32` and got back no benchmark records.
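
A minimal sketch of the two fixes described above. The helper names and the exact regex here are assumptions for illustration, not the actual code in `.github/scripts/extract_benchmark_results.py`:

```python
import re

# The backend segment may itself contain "+" (e.g. xnnpack+custom+qe), so let
# it match greedily and anchor the dtype on the final "_" instead of stopping
# at the first separator.
MODEL_NAME_RE = re.compile(r"^(?P<model>[^_]+)_(?P<backend>.+)_(?P<dtype>\w+)$")

def parse_model_name(name: str) -> tuple[str, str, str]:
    m = MODEL_NAME_RE.match(name)
    if m is None:
        raise ValueError(f"Unrecognized model name: {name}")
    return m.group("model"), m.group("backend"), m.group("dtype")

def peak_mem_usage_mb(peak_physical_kb: float) -> float:
    # "Memory Peak Physical, kB" is reported in kB; divide by 1024 for MB
    # (binary units assumed here -- the real script may use 1000).
    return peak_physical_kb / 1024

print(parse_model_name("tinyllama_xnnpack+custom+qe_fp32"))
# -> ('tinyllama', 'xnnpack+custom+qe', 'fp32')
```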

@huydhn huydhn requested review from guangy10 and shoumikhin October 16, 2024 06:44
@pytorch-bot

pytorch-bot bot commented Oct 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6282

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 806a64f with merge base c242c4c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Oct 16, 2024
@huydhn
Contributor Author

huydhn commented Oct 16, 2024

Testing run: https://github.com/pytorch/executorch/actions/runs/11360762990. I'm also borrowing this run to test my other PR, pytorch/test-infra#5770, which mitigates the AWS rate-limit error.

@facebook-github-bot
Contributor

@huydhn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@huydhn
Contributor Author

huydhn commented Oct 16, 2024

Another thing I noticed is that XCTClockMetric returns a float value in seconds. And while the script converts it to ms by multiplying by 1000, the end value doesn't have the same level of precision as its Android counterpart. This explains @guangy10's comment on pytorch/test-infra#5758 (comment)

@huydhn huydhn temporarily deployed to upload-benchmark-results October 16, 2024 08:54 — with GitHub Actions Inactive
@huydhn
Contributor Author

huydhn commented Oct 16, 2024

@guangy10 This might answer your earlier question about how people can use the dashboard. Here are the results from my test run above, https://github.com/pytorch/executorch/actions/runs/11360762990; I can see the results on the dashboard and compare them with another branch, e.g. main:

https://hud.pytorch.org/benchmark/llms?startTime=Wed%2C%2009%20Oct%202024%2017%3A24%3A45%20GMT&stopTime=Wed%2C%2016%20Oct%202024%2017%3A24%3A45%20GMT&granularity=hour&lBranch=add-memory-metric-ios&lCommit=37a00ca41c98ebbd97e060d11987f00918b67037&rBranch=add-memory-metric-ios&rCommit=37a00ca41c98ebbd97e060d11987f00918b67037&repoName=pytorch%2Fexecutorch&modelName=All%20Models&dtypeName=All%20DType&deviceName=All%20Devices

The dashboard verifies that:

  • The results are uploaded successfully even if some models fail
  • peak_mem_usage(mb) shows up correctly
  • The model name `tinyllama xnnpack+custom+qe (fp32)` is now parsed correctly

@guangy10
Contributor

The app measures two peak_mem_usage values: one for load time (including loading the weights), and one for forward/execute time. We should probably display both. To me, the load-time memory usage seems more important, especially for large LLMs; if we choose to display one true peak_mem_usage, it should be the load-time one. If we display both, we need to clearly label the difference to avoid confusion. @shoumikhin could explain more.

@huydhn Not a blocker, but I'm curious whether there is a way to add/display an annotation when hovering over the field name, something like what we see in the Scuba table?

@guangy10
Contributor

And while the script converts it to ms by multiplying by 1000, the end value doesn't have the same level of precision as its Android counterpart.

Why is that? It shouldn't affect the accuracy.

@huydhn
Contributor Author

huydhn commented Oct 16, 2024

And while the script converts it to ms by multiplying by 1000, the end value doesn't have the same level of precision as its Android counterpart.

Why is that? It shouldn't affect the accuracy.

Ah, the number is usually something like 0.003 seconds, which is turned into 3 ms. I guess you're asking why they don't have the same precision as the numbers on Android.
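
A toy illustration of where the precision goes; the raw value below is made up, and the real numbers come from xcresult:

```python
# xctest prints the measured time in seconds with only a few decimal places,
# so a conversion based on the printed value cannot recover the resolution
# stored in the raw xcresult data.
raw_seconds = 0.0031337421337     # hypothetical full-precision xcresult value
printed = f"{raw_seconds:.3f}"    # what shows up in the job output: "0.003"
print(float(printed) * 1000)      # ~3.0 ms -- precision already lost
print(raw_seconds * 1000)         # ~3.1337 ms -- what we'd want to report
```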

@huydhn
Contributor Author

huydhn commented Oct 16, 2024

The app measures two peak_mem_usage values: one for load time (including loading the weights), and one for forward/execute time. We should probably display both. To me, the load-time memory usage seems more important, especially for large LLMs; if we choose to display one true peak_mem_usage, it should be the load-time one. If we display both, we need to clearly label the difference to avoid confusion. @shoumikhin could explain more.

I see. What do you think about showing the bigger of the two numbers? It's also easy to have both of them under two different names, e.g. peak_mem_usage (load) and peak_mem_usage (forward/generate).

I think I will go with the latter approach of showing both; then we can take another look at the dashboard to see how they are displayed.
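
A sketch of that latter approach, with assumed record and field names (the actual schema in the benchmark database may differ):

```python
def memory_records(load_peak_kb: float, forward_peak_kb: float) -> list[dict]:
    # Emit the two peaks as separate metrics instead of collapsing them into
    # one number, so the dashboard can chart them side by side.
    return [
        {"metric": "peak_mem_usage (load)", "actualValue": load_peak_kb / 1024},
        {"metric": "peak_mem_usage (forward/generate)", "actualValue": forward_peak_kb / 1024},
    ]

print(memory_records(512000.0, 384000.0))
# [{'metric': 'peak_mem_usage (load)', 'actualValue': 500.0},
#  {'metric': 'peak_mem_usage (forward/generate)', 'actualValue': 375.0}]
```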

@huydhn huydhn closed this Oct 16, 2024
@huydhn huydhn reopened this Oct 16, 2024
@huydhn
Contributor Author

huydhn commented Oct 16, 2024

@huydhn Not a blocker, but I'm curious whether there is a way to add/display an annotation when hovering over the field name, something like what we see in the Scuba table?

This should be doable. Ping me the descriptions that you have in mind for each of these fields, and I can add them to the database as a description field.

@guangy10
Contributor

And while the script converts it to ms by multiplying by 1000, the end value doesn't have the same level of precision as its Android counterpart.

Why is that? It shouldn't affect the accuracy.

Ah, the number is usually something like 0.003 seconds, which is turned into 3 ms. I guess you're asking why they don't have the same precision as the numbers on Android.

@huydhn Oh, it may be truncated somewhere when reading it out from the test, because I can see that the raw data shows more than 10 decimal places.
The job link where I got the data: https://github.com/pytorch/executorch/actions/runs/11360762990/job/31600157567

@guangy10
Contributor

I see. What do you think about showing the bigger of the two numbers? It's also easy to have both of them under two different names, e.g. peak_mem_usage (load) and peak_mem_usage (forward/generate).

I think I will go with the latter approach of showing both; then we can take another look at the dashboard to see how they are displayed.

@huydhn peak_mem_usage (load) and peak_mem_usage (forward/generate) look great to me. We can show both and let people decide whether the metric is helpful. One minor concern is that the Android side might not be able to do the same, because per @kirklandsign there is no equivalent Android API that can accurately measure the memory usage.

@huydhn
Contributor Author

huydhn commented Oct 16, 2024

Ohh, this is interesting: the numbers from xcresult have higher resolution than what is printed in the output (which is where I get them from). I could do another update later to see if I can get them from xcresult directly. I guess xctest does the "friendlier" printing of time in seconds for humans to understand lol

@guangy10
Contributor

guangy10 commented Oct 16, 2024

I see there are multiple data points for (mobilebert, coreml) from main in the past few days: https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=mobilebert%2C%20coreml

However, on the dashboard, I can only see one data point from your working branch: https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fexecutorch

@huydhn Do you have any idea why? Oh, actually the upload-benchmark-results job fails at the end: https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=apple-perf.

2024-10-16T02:13:26.5041722Z     benchmark_results = transform(
2024-10-16T02:13:26.5042810Z   File "/home/ec2-user/actions-runner/_work/executorch/executorch/.github/scripts/extract_benchmark_results.py", line 298, in transform
2024-10-16T02:13:26.5043807Z     r["deviceInfo"]["device"] = job_name
2024-10-16T02:13:26.5044296Z KeyError: 'deviceInfo'

@guangy10
Contributor

guangy10 commented Oct 16, 2024

@huydhn Not a blocker, but I'm curious whether there is a way to add/display an annotation when hovering over the field name, something like what we see in the Scuba table?

This should be doable. Ping me the descriptions that you have in mind for each of these fields, and I can add them to the database as a description field.

Thanks @huydhn! Here are the annotations you can add to the DB.

  • peak_mem_usage (load): "Peak memory usage during model loading, including loading the checkpoint."
  • peak_mem_usage (execute): "Peak memory usage during the execution of the forward() method, which does not include the memory consumed by the application outside."

@shoumikhin Do the descriptions sound accurate?

@huydhn
Contributor Author

huydhn commented Oct 16, 2024

@huydhn Do you have any idea why? Oh, actually the upload-benchmark-results job fails at the end: https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=apple-perf.

2024-10-16T02:13:26.5041722Z     benchmark_results = transform(
2024-10-16T02:13:26.5042810Z   File "/home/ec2-user/actions-runner/_work/executorch/executorch/.github/scripts/extract_benchmark_results.py", line 298, in transform
2024-10-16T02:13:26.5043807Z     r["deviceInfo"]["device"] = job_name
2024-10-16T02:13:26.5044296Z KeyError: 'deviceInfo'

Yeah, this is the regex parsing bug that I'm fixing as part of this PR. That's why you see the results showing up on my test branch but not on main.
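
For context, a hedged sketch of why the traceback above fired, with hypothetical names around the failing line (the actual fix in this PR is the regex correction, not this guard):

```python
def attach_device(benchmark_results: list[dict], job_name: str) -> list[dict]:
    kept = []
    for r in benchmark_results:
        # When the model name fails to parse, the record never gets a
        # deviceInfo entry, so r["deviceInfo"]["device"] raises KeyError.
        if "deviceInfo" not in r:
            continue
        r["deviceInfo"]["device"] = job_name
        kept.append(r)
    return kept
```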

@facebook-github-bot
Contributor

@huydhn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@huydhn
Contributor Author

huydhn commented Oct 16, 2024

Ty for the review! I will add the descriptions and figure out a way to get the higher-resolution results from xcresult in separate PRs later.

@guangy10
Contributor

Ty for the review! I will add the descriptions and figure out a way to get the higher-resolution results from xcresult in separate PRs later.

Sounds good to me. BTW, on the dashboard side, would you mind splitting the backend out from the model name (T204741729)? If it's non-trivial, it's totally okay to deal with it after the release.

@huydhn huydhn temporarily deployed to upload-benchmark-results October 16, 2024 21:57 — with GitHub Actions Inactive
@huydhn
Contributor Author

huydhn commented Oct 16, 2024

Oh, you need to use the copy snapshot link button.

@huydhn
Contributor Author

huydhn commented Oct 16, 2024

My test earlier completed, https://github.com/pytorch/executorch/actions/runs/11360762990, but due to the HUD outage earlier today, the workflow information was lost. I think I need to re-run it now. Just FYI, we have a data gap from about 11AM to 4PM today.

@huydhn huydhn temporarily deployed to upload-benchmark-results October 17, 2024 00:36 — with GitHub Actions Inactive
@facebook-github-bot
Contributor

@huydhn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary:
This extracts the iOS `Memory Peak Physical, kB` metric and uses it as the benchmark `peak_mem_usage(mb)`. I convert the value to MB because it feels friendlier to humans looking at the dashboard, but let me know if we need kB granularity.

Also fixes a small regex bug when parsing model names like `tinyllama_xnnpack+custom+qe_fp32`, where the backend name can include `+`.

This also fixes the missing-key error in https://github.com/pytorch/executorch/actions/runs/11337625751/job/31531760841#step:7:112, which happened because the script failed to parse `tinyllama_xnnpack+custom+qe_fp32` and got back no benchmark records.

Pull Request resolved: #6282

Reviewed By: guangy10

Differential Revision: D64453877

Pulled By: huydhn
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D64453877

@facebook-github-bot
Contributor

@huydhn merged this pull request in 5e44991.
