
Conversation

@GregoryComer (Member) commented Sep 25, 2025

Summary

Add a required max_context_len argument to the Llava example model export. When set to 768, this reduces the memory consumption (~6GiB -> ~4.8GiB RSS) at the cost of a smaller context length and thus fixes #14474.
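A minimal sketch of what a required export argument like this can look like, assuming standard argparse wiring; the flag spelling and help text here are hypothetical, and the real export_llava.py plumbing may differ:

```python
import argparse

# Hypothetical CLI wiring; names and help text are illustrative,
# not the actual export_llava.py implementation.
parser = argparse.ArgumentParser(description="Export the Llava example model")
parser.add_argument(
    "--max-context-len",
    type=int,
    required=True,  # the gist of this PR: callers must size the KV cache
    help="Total KV-cache capacity in tokens (768 in the test plan below)",
)
args = parser.parse_args()
```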

Test plan

Ran ./test_llava.sh and validated the reported memory consumption on an x86 Linux machine.

```
I 00:00:18.433471 executorch:main.cpp:172] Starting generation...
I 00:00:18.433500 executorch:multimodal_runner.cpp:95] RSS after loading model: 4746.726562 MiB (0 if unsupported)
I 00:00:18.433554 executorch:multimodal_runner.cpp:119] Prefilling input 0/3, type: text
I 00:00:19.484581 executorch:multimodal_runner.cpp:119] Prefilling input 1/3, type: image
I 00:00:19.484710 executorch:multimodal_prefiller.cpp:83] Image tensor dim: 3, dtype: Byte
I 00:00:30.442685 executorch:multimodal_runner.cpp:119] Prefilling input 2/3, type: text
I 00:00:30.951938 executorch:multimodal_runner.cpp:138] RSS after multimodal input processing: 4847.933594 MiB (0 if unsupported)
I 00:00:30.952000 executorch:multimodal_runner.cpp:148] Max new tokens resolved: 153, pos_ 615, max_context_len 768
```
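The savings line up with the KV cache scaling linearly in the context length. Below is a rough back-of-envelope estimate, not taken from this PR: it assumes a LLaMA-7B-class backbone (32 layers, hidden size 4096, no grouped-query attention), an fp32 cache, and a hypothetical prior default of 2048 tokens; the actual export may use a different dtype or default.

```python
# Back-of-envelope KV-cache sizing. All parameters are assumptions
# (LLaMA-7B-class backbone, fp32 cache); the actual Llava export may differ.
def kv_cache_bytes(context_len, n_layers=32, hidden=4096, dtype_bytes=4):
    # K and V each store one hidden-size vector per token per layer.
    return 2 * n_layers * hidden * dtype_bytes * context_len

for ctx in (2048, 768):  # 2048 is a hypothetical prior default
    print(f"context_len={ctx}: {kv_cache_bytes(ctx) / 2**20:.0f} MiB")
# context_len=2048: 2048 MiB
# context_len=768: 768 MiB
```

Under these assumptions the delta is ~1.25 GiB, in the same ballpark as the observed ~6 GiB -> ~4.8 GiB RSS drop.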

@pytorch-bot (bot) commented Sep 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14599

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 3 Pending, 2 Unrelated Failures

As of commit 315ea97 with merge base a1daab9:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label Sep 25, 2025
@github-actions

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
`@pytorchbot label "release notes: none"`

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@GregoryComer marked this pull request as ready for review September 25, 2025 17:32
@GregoryComer requested review from kimishpatel and removed request for mergennachin September 25, 2025 17:33
@kimishpatel (Contributor) left a comment


I would like to make max_context_len a required arg, and if it is not in export_llm, I think it should be. Or at least improve the documentation to include this arg in the export CLI example.

@GregoryComer (Member, Author) commented

> I would like to make max_context_len a required arg, and if it is not in export_llm, I think it should be. Or at least improve the documentation to include this arg in the export CLI example.

I've updated the arg to be required and made corresponding changes to the example README and test_llava.sh script. I'd recommend better documenting the difference between the user-facing max_context_len and max_seq_len args in the export_llava.py script, though I'm likely not the right owner for this.

@kimishpatel (Contributor) commented

> I would like to make max_context_len a required arg, and if it is not in export_llm, I think it should be. Or at least improve the documentation to include this arg in the export CLI example.

> I've updated the arg to be required and made corresponding changes to the example README and test_llava.sh script. I'd recommend better documenting the difference between the user-facing max_context_len and max_seq_len args in the export_llava.py script, though I'm likely not the right owner for this.

It is also a bit hard to explain the difference between the two unless the user understands how to use it to reduce the memory footprint. I would just opt for a better default for max_seq_len.
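The runner log above already shows how max_context_len behaves at runtime: with 615 prefill tokens and a 768-token context, the runner resolves 153 new tokens, i.e. max_new_tokens = max_context_len - pos. A minimal sketch of that arithmetic; the variable names are illustrative rather than the actual runner API, and the reading of max_seq_len as a per-prefill bound is an assumption, not something this thread confirms:

```python
# Illustrative only: names do not match the actual C++ runner API.
max_context_len = 768  # total KV-cache capacity, fixed at export time
pos = 615              # slots consumed by the text + image prefill
max_new_tokens = max_context_len - pos
assert max_new_tokens == 153  # matches "Max new tokens resolved: 153"

# Assumed distinction (not confirmed here): max_seq_len bounds a single
# prefill chunk, while max_context_len bounds prompt + generated tokens.
```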

@larryliu0820 (Contributor) left a comment


Thank you for the fix!

@GregoryComer (Member, Author) commented

Noting that CI failures appear to be pre-existing or flaky. Merging.

@GregoryComer merged commit bc755c6 into pytorch:main Sep 25, 2025
252 of 260 checks passed
@GregoryComer (Member, Author) commented

I'll submit a pick request once the trunk jobs complete.

@digantdesai (Contributor) commented

`@pytorchbot cherry-pick --onto release/1.0 -c regression`

pytorchbot pushed a commit that referenced this pull request Oct 14, 2025
(cherry picked from commit bc755c6)
@pytorchbot (Collaborator) commented

Cherry picking #14599

The cherry pick PR is at #15112 and it is recommended to link a regression cherry pick PR with an issue. The following tracker issues are updated:



Labels: CLA Signed

Successfully merging this pull request may close these issues: LLaVA allocates too much memory on iOS