[Llava] Add max_context_len CLI arg #14599
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14599
Note: Links to docs will display an error until the docs builds have been completed.
❌ 5 New Failures, 3 Pending, 2 Unrelated Failures as of commit 315ea97 with merge base a1daab9.
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
Force-pushed from 63c5534 to ffaa4f4.
kimishpatel left a comment:
I would like to make max_context_len a required arg, and if it is not required in export_llm, I think it should be. Or at least improve the documentation to include this arg in the export CLI example.
Force-pushed from ffaa4f4 to 315ea97.
I've updated the arg to be required and made corresponding changes to the example README and the test_llava.sh script. I'd recommend better documenting the difference between the user-facing max_context_len and max_seq_len args in the export_llava.py script, though I'm likely not the right owner for this.
It is also a bit hard to explain the difference between the two unless the user understands how to use it for a better memory footprint. I would just opt for a better default for max_seq_len.
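For illustration only, here is a minimal sketch of what a required max_context_len argument with a help string distinguishing it from max_seq_len could look like in an argparse-based export script. Everything here beyond the max_context_len name itself (the function, the default, the help text) is an assumption, not the actual export_llava.py interface.

```python
import argparse

# Hypothetical sketch of the CLI surface discussed above; the real
# export_llava.py argument parser may be structured differently.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Export the Llava example model.")
    parser.add_argument(
        "--max_context_len",
        type=int,
        required=True,  # made required, per the review request above
        help=(
            "Total KV-cache capacity of the exported model (prompt + generated "
            "tokens). Smaller values reduce the memory footprint, e.g. 768."
        ),
    )
    parser.add_argument(
        "--max_seq_len",
        type=int,
        default=768,  # assumed default, standing in for the "better default" discussed above
        help="Maximum number of tokens handled in a single prefill/forward pass.",
    )
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"max_context_len={args.max_context_len}, max_seq_len={args.max_seq_len}")
```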
larryliu0820 left a comment:
Thank you for the fix!
Noting that CI failures appear to be pre-existing or flaky. Merging.
I'll submit a cherry-pick request once the trunk jobs complete.
@pytorchbot cherry-pick --onto release/1.0 -c regression |
### Summary

Add a required max_context_len argument to the Llava example model export. When set to 768, this reduces memory consumption (~6 GiB -> ~4.8 GiB RSS) at the cost of a smaller context length, and thus fixes #14474.

### Test plan

Ran ./test_llava.sh and validated the reported memory consumption on an x86 Linux machine.

```
I 00:00:18.433471 executorch:main.cpp:172] Starting generation...
I 00:00:18.433500 executorch:multimodal_runner.cpp:95] RSS after loading model: 4746.726562 MiB (0 if unsupported)
I 00:00:18.433554 executorch:multimodal_runner.cpp:119] Prefilling input 0/3, type: text
I 00:00:19.484581 executorch:multimodal_runner.cpp:119] Prefilling input 1/3, type: image
I 00:00:19.484710 executorch:multimodal_prefiller.cpp:83] Image tensor dim: 3, dtype: Byte
I 00:00:30.442685 executorch:multimodal_runner.cpp:119] Prefilling input 2/3, type: text
I 00:00:30.951938 executorch:multimodal_runner.cpp:138] RSS after multimodal input processing: 4847.933594 MiB (0 if unsupported)
I 00:00:30.952000 executorch:multimodal_runner.cpp:148] Max new tokens resolved: 153, pos_ 615, max_context_len 768
```

(cherry picked from commit bc755c6)
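As a rough illustration of why a smaller context length trims resident memory (not a measurement of the actual Llava export), here is a back-of-envelope KV-cache estimate. The layer, head, and dtype numbers are hypothetical Llama-7B-style values, chosen only to show the linear scaling with context length.

```python
# Back-of-envelope KV-cache size as a function of context length.
# The dimensions below are hypothetical and do not describe the real
# Llava export configuration.
def kv_cache_bytes(context_len: int, n_layers: int = 32, n_kv_heads: int = 32,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    # 2x for keys and values, cached for every layer, head, and position.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

for ctx in (768, 2048, 4096):
    print(f"context_len={ctx:5d}: ~{kv_cache_bytes(ctx) / 2**20:.0f} MiB of KV cache")
```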
Cherry picking #14599

The cherry pick PR is at #15112 and it is recommended to link a regression cherry pick PR with an issue. The following tracker issues are updated:

Details for Dev Infra team: raised by workflow job.