
Conversation


@kirklandsign kirklandsign commented Sep 11, 2025

Summary

For voxtral and phi-3, we construct the cache_position_tensor as before; for llava, the model constructs the cache positions internally, so we pass in a size-1 tensor.
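
In other words, the runner inspects the text model's expected cache-position input and branches on its size. A rough sketch of the intended behavior, reusing the identifiers that appear in the review below (not the exact code that landed):

// Query the text model's second input, which carries the cache positions
// (or just start_pos).
auto method_meta = ET_UNWRAP(module_->method_meta(kTextModelMethod));
auto second_input_info = ET_UNWRAP(method_meta.input_tensor_meta(1));
auto numel = second_input_info.sizes()[0];

TensorPtr cache_position_tensor;
if (numel > 1) {
  // voxtral / phi-3: fill [start_pos, start_pos + seq_len) explicitly.
  for (int64_t i = 0; i < seq_len; ++i) {
    cache_positions[i] = start_pos + i;
  }
  cache_position_tensor = ::executorch::extension::from_blob(
      cache_positions.data(),
      {static_cast<int>(seq_len)},
      executorch::aten::ScalarType::Long);
} else {
  // llava: the model builds cache positions internally, so only a size-1
  // start_pos tensor is passed in.
  cache_position_tensor = ::executorch::extension::from_blob(
      &start_pos, {1}, executorch::aten::ScalarType::Long);
}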

Test plan

CI


pytorch-bot bot commented Sep 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14225

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Cancelled Jobs

As of commit 5af64f8 with merge base 10e93fb:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@kirklandsign kirklandsign marked this pull request as ready for review September 11, 2025 19:33
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 11, 2025

@jackzhxng jackzhxng left a comment


(Synced offline)

// e.g. if start_pos = 2 and encoder_output.size(1) = 5,
// cache_position_tensor should be [2, 3, 4, 5, 6].
auto method_meta = ET_UNWRAP(module_->method_meta(kTextModelMethod));
auto first_input_info = ET_UNWRAP(method_meta.input_tensor_meta(0));

Contributor

Change to second_input_info

cache_positions.data(),
{static_cast<int>(seq_len)},
executorch::aten::ScalarType::Long);
auto cache_position_tensor = (numel > 1)

Contributor


Can you do something like

if (numel > 1) {
    // `cache_position` goes from start_pos to start_pos + encoder_output.size(1).
    // e.g. if start_pos = 2 and encoder_output.size(1) = 5,
    // cache_position_tensor should be [2, 3, 4, 5, 6].
    for (int64_t i = 0; i < seq_len; ++i) {
      cache_positions[i] = start_pos + i;
    }
    auto cache_position_tensor = ::executorch::extension::from_blob(
            cache_positions.data(),
            {static_cast<int>(seq_len)},
            executorch::aten::ScalarType::Long);
} else {
    // Cache position is size 1.
    auto cache_position_tensor = ::executorch::extension::from_blob(
            &start_pos, {1}, executorch::aten::ScalarType::Long);
}

// `cache_position` goes from start_pos to start_pos + encoder_output.size(1).
// e.g. if start_pos = 2 and encoder_output.size(1) = 5,
// cache_position_tensor should be [2, 3, 4, 5, 6].
auto method_meta = ET_UNWRAP(module_->method_meta(kTextModelMethod));

Contributor


Add comment like

// Get expected shape of cache position tensor, which should be the second argument
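
In context, the requested comment would sit right above the metadata lookup, e.g. (a sketch; final wording up to the author):

// Get expected shape of cache position tensor, which should be the second argument.
auto method_meta = ET_UNWRAP(module_->method_meta(kTextModelMethod));
auto second_input_info = ET_UNWRAP(method_meta.input_tensor_meta(1));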

Comment on lines 98 to 101
auto second_input_info = ET_UNWRAP(method_meta.input_tensor_meta(1));
auto second_input_sizes = second_input_info.sizes();
auto numel = second_input_sizes[0];

Contributor Author

A bit later?

@mergennachin mergennachin added this to the 1.0.0 milestone Sep 12, 2025
{static_cast<int>(seq_len)},
executorch::aten::ScalarType::Long);
auto prefill_result = module_->execute(
kTextModelMethod, {cache_position_tensor, encoder_output});

Contributor


Swap these two
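
Presumably meaning the encoder output should come first, e.g. (a sketch):

auto prefill_result = module_->execute(
    kTextModelMethod, {encoder_output, cache_position_tensor});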

}

inline runtime::Result<TensorPtr>
populate_start_pos_tensor(Module* module, int64_t& start_pos, int seq_len) {

Contributor


Add some docstring please, like how we assume the second argument is the cache position / start_pos, and that we populate the tensor based on its shape

Contributor


Also the name should be populate_start_pos_or_cache_position
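
Taken together with the docstring request above, the renamed helper might look roughly like this (a sketch only; the exact signature that landed differs, see the later snippets below):

// Returns either a size-1 start_pos tensor (when the model populates the cache
// position tensor internally, so the method's second input has size 1), or a
// cache position tensor populated from the given start_pos and seq_len. Assumes
// the method's second argument is the cache position / start_pos.
inline runtime::Result<TensorPtr> populate_start_pos_or_cache_position(
    Module* module, int64_t& start_pos, int seq_len);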

@kirklandsign kirklandsign changed the title Update cache position size for llava Update cache position population and arg order for multimodal runner Sep 12, 2025

@jackzhxng jackzhxng left a comment


Thank you! I'll need to land #14238 first

@kirklandsign kirklandsign added the release notes: llm Changes to llm utilities label Sep 12, 2025
"The second input tensor is not 1D tensor. Got dimension (%zu)",
sizes.size());
auto numel = sizes[0];
std::vector<::executorch::aten::SizesType> sizes_vec = {numel};

Contributor


Seems like you can remove lines 35 ~ 54 since they're not being used anymore?

const char* method_name,
Module* module,
int64_t& start_pos,
std::vector<int64_t>& cache_positions_underlying_vector,

Contributor


Can be concise: cache_positions_vec

// size 1 because model will populate the cache position tensor underneath), or
// a populated tensor for cache position, for the given start_pos and seq_len.
inline runtime::Result<TensorPtr> populate_start_pos_or_cache_position(
const char* method_name,

Contributor


Give it a default value of "forward"
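
A minimal sketch of what that could look like (parameter names and order here are illustrative; since a defaulted parameter in C++ must come after all non-defaulted ones, the default implies either moving method_name to the end or defaulting the rest):

inline runtime::Result<TensorPtr> populate_start_pos_or_cache_position(
    Module* module,
    int64_t& start_pos,
    std::vector<int64_t>& cache_positions_vec,
    int seq_len,
    const char* method_name = "forward");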

@kirklandsign kirklandsign merged commit ea4f004 into main Sep 15, 2025
121 of 125 checks passed
@kirklandsign kirklandsign deleted the cache-position-llava branch September 15, 2025 23:40
@larryliu0820
Contributor

@pytorchbot cheery-pick onto release/1.0 -c "fixnewfeature"


pytorch-bot bot commented Sep 16, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'cheery-pick' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'cherry-pick')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick} ...

Try @pytorchbot --help for more info.

@larryliu0820
Contributor

@pytorchbot cherry-pick onto release/1.0 -c fixnewfeature


pytorch-bot bot commented Sep 16, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot cherry-pick: error: the following arguments are required: --onto/--into

usage: @pytorchbot cherry-pick --onto ONTO [--fixes FIXES] -c
                               {regression,critical,fixnewfeature,docs,release}

Try @pytorchbot --help for more info.

@larryliu0820
Contributor

@pytorchbot cherry-pick --onto release/1.0 -c fixnewfeature

pytorchbot pushed a commit that referenced this pull request Sep 16, 2025
…14225)

For voxtral and phi-3, we construct the cache_position_tensor like
before; for llava, it will construct underneath so we pass in size 1.

(cherry picked from commit ea4f004)
@pytorchbot
Collaborator

Cherry picking #14225

The cherry pick PR is at #14343 and it is recommended to link a fixnewfeature cherry pick PR with an issue. The following tracker issues are updated:


StrycekSimon pushed a commit to nxp-upstream/executorch that referenced this pull request Sep 23, 2025
…ytorch#14225)

For voxtral and phi-3, we construct the cache_position_tensor like
before; for llava, it will construct underneath so we pass in size 1.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
