
Conversation


@kirklandsign kirklandsign commented Sep 11, 2025

Summary

For voxtral and phi-3, we construct the cache_position_tensor as before; for llava, the model constructs the cache positions internally, so we pass in a size-1 tensor.
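
In other words, the runner inspects the text model's expected cache-position input and branches on its size. A rough sketch of the intended behavior, reusing the identifiers that appear in the review below (not the exact code that landed):

// Query the text model's second input, which carries the cache positions
// (or just start_pos).
auto method_meta = ET_UNWRAP(module_->method_meta(kTextModelMethod));
auto second_input_info = ET_UNWRAP(method_meta.input_tensor_meta(1));
auto numel = second_input_info.sizes()[0];

TensorPtr cache_position_tensor;
if (numel > 1) {
  // voxtral / phi-3: fill [start_pos, start_pos + seq_len) explicitly.
  for (int64_t i = 0; i < seq_len; ++i) {
    cache_positions[i] = start_pos + i;
  }
  cache_position_tensor = ::executorch::extension::from_blob(
      cache_positions.data(),
      {static_cast<int>(seq_len)},
      executorch::aten::ScalarType::Long);
} else {
  // llava: the model builds cache positions internally, so only a size-1
  // start_pos tensor is passed in.
  cache_position_tensor = ::executorch::extension::from_blob(
      &start_pos, {1}, executorch::aten::ScalarType::Long);
}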

Test plan

CI


pytorch-bot bot commented Sep 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14225

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Cancelled Jobs

As of commit 5af64f8 with merge base 10e93fb:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@kirklandsign kirklandsign marked this pull request as ready for review September 11, 2025 19:33
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 11, 2025

@jackzhxng jackzhxng left a comment


(Synced offline)

// e.g. if start_pos = 2 and encoder_output.size(1) = 5,
// cache_position_tensor should be [2, 3, 4, 5, 6].
auto method_meta = ET_UNWRAP(module_->method_meta(kTextModelMethod));
auto first_input_info = ET_UNWRAP(method_meta.input_tensor_meta(0));

Contributor

Change to second_input_info

cache_positions.data(),
{static_cast<int>(seq_len)},
executorch::aten::ScalarType::Long);
auto cache_position_tensor = (numel > 1)

Contributor


Can you do something like

if (numel > 1) {
    // `cache_position` goes from start_pos to start_pos + encoder_output.size(1).
    // e.g. if start_pos = 2 and encoder_output.size(1) = 5,
    // cache_position_tensor should be [2, 3, 4, 5, 6].
    for (int64_t i = 0; i < seq_len; ++i) {
      cache_positions[i] = start_pos + i;
    }
    auto cache_position_tensor = ::executorch::extension::from_blob(
            cache_positions.data(),
            {static_cast<int>(seq_len)},
            executorch::aten::ScalarType::Long);
} else {
    // Cache position is size 1.
    auto cache_position_tensor = ::executorch::extension::from_blob(
            &start_pos, {1}, executorch::aten::ScalarType::Long);
}

// `cache_position` goes from start_pos to start_pos + encoder_output.size(1).
// e.g. if start_pos = 2 and encoder_output.size(1) = 5,
// cache_position_tensor should be [2, 3, 4, 5, 6].
auto method_meta = ET_UNWRAP(module_->method_meta(kTextModelMethod));

Contributor


Add comment like

// Get expected shape of cache position tensor, which should be the second argument
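
In context, the requested comment would sit right above the metadata lookup, e.g. (a sketch; final wording up to the author):

// Get expected shape of cache position tensor, which should be the second argument.
auto method_meta = ET_UNWRAP(module_->method_meta(kTextModelMethod));
auto second_input_info = ET_UNWRAP(method_meta.input_tensor_meta(1));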

Comment on lines 98 to 101
auto second_input_info = ET_UNWRAP(method_meta.input_tensor_meta(1));
auto second_input_sizes = second_input_info.sizes();
auto numel = second_input_sizes[0];

Contributor Author

A bit later?

@mergennachin mergennachin added this to the 1.0.0 milestone Sep 12, 2025
{static_cast<int>(seq_len)},
executorch::aten::ScalarType::Long);
auto prefill_result = module_->execute(
kTextModelMethod, {cache_position_tensor, encoder_output});

Contributor


Swap these two
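
Presumably meaning the encoder output should come first, e.g. (a sketch):

auto prefill_result = module_->execute(
    kTextModelMethod, {encoder_output, cache_position_tensor});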

}

inline runtime::Result<TensorPtr>
populate_start_pos_tensor(Module* module, int64_t& start_pos, int seq_len) {

Contributor


Add some docstring please, like how we assume the second argument is the cache position / start_pos, and that we populate the tensor based on its shape

Contributor


Also the name should be populate_start_pos_or_cache_position
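
Taken together with the docstring request above, the renamed helper might look roughly like this (a sketch only; the exact signature that landed differs, see the later snippets below):

// Returns either a size-1 start_pos tensor (when the model populates the cache
// position tensor internally, so the method's second input has size 1), or a
// cache position tensor populated from the given start_pos and seq_len. Assumes
// the method's second argument is the cache position / start_pos.
inline runtime::Result<TensorPtr> populate_start_pos_or_cache_position(
    Module* module, int64_t& start_pos, int seq_len);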

@kirklandsign kirklandsign changed the title Update cache position size for llava Update cache position population and arg order for multimodal runner Sep 12, 2025

@jackzhxng jackzhxng left a comment


Thank you! I'll need to land #14238 first

@kirklandsign kirklandsign added the release notes: llm Changes to llm utilities label Sep 12, 2025
"The second input tensor is not 1D tensor. Got dimension (%zu)",
sizes.size());
auto numel = sizes[0];
std::vector<::executorch::aten::SizesType> sizes_vec = {numel};

Contributor


Seems like you can remove lines 35 ~ 54 since they're not being used anymore?

const char* method_name,
Module* module,
int64_t& start_pos,
std::vector<int64_t>& cache_positions_underlying_vector,

Contributor


Can be concise: cache_positions_vec

// size 1 because model will populate the cache position tensor underneath), or
// a populated tensor for cache position, for the given start_pos and seq_len.
inline runtime::Result<TensorPtr> populate_start_pos_or_cache_position(
const char* method_name,

Contributor


Give it a default value of "forward"
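
A minimal sketch of what that could look like (parameter names and order here are illustrative; since a defaulted parameter in C++ must come after all non-defaulted ones, the default implies either moving method_name to the end or defaulting the rest):

inline runtime::Result<TensorPtr> populate_start_pos_or_cache_position(
    Module* module,
    int64_t& start_pos,
    std::vector<int64_t>& cache_positions_vec,
    int seq_len,
    const char* method_name = "forward");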

@kirklandsign kirklandsign merged commit ea4f004 into main Sep 15, 2025
121 of 125 checks passed
@kirklandsign kirklandsign deleted the cache-position-llava branch September 15, 2025 23:40
@larryliu0820
Contributor

@pytorchbot cheery-pick onto release/1.0 -c "fixnewfeature"


pytorch-bot bot commented Sep 16, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'cheery-pick' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'cherry-pick')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick} ...

Try @pytorchbot --help for more info.

@larryliu0820
Contributor

@pytorchbot cherry-pick onto release/1.0 -c fixnewfeature


pytorch-bot bot commented Sep 16, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot cherry-pick: error: the following arguments are required: --onto/--into

usage: @pytorchbot cherry-pick --onto ONTO [--fixes FIXES] -c
                               {regression,critical,fixnewfeature,docs,release}

Try @pytorchbot --help for more info.

@larryliu0820
Contributor

@pytorchbot cherry-pick --onto release/1.0 -c fixnewfeature

pytorchbot pushed a commit that referenced this pull request Sep 16, 2025
…14225)

For voxtral and phi-3, we construct the cache_position_tensor like
before; for llava, it will construct underneath so we pass in size 1.

(cherry picked from commit ea4f004)
@pytorchbot
Collaborator

Cherry picking #14225

The cherry pick PR is at #14343 and it is recommended to link a fixnewfeature cherry pick PR with an issue. The following tracker issues are updated:


StrycekSimon pushed a commit to nxp-upstream/executorch that referenced this pull request Sep 23, 2025
…ytorch#14225)

For voxtral and phi-3, we construct the cache_position_tensor like
before; for llava, it will construct underneath so we pass in size 1.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
