
Conversation

@jackzhxng (Contributor) commented Aug 25, 2025

pytorch-bot bot commented Aug 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13662

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit fb87bbf with merge base 99e6349:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label Aug 25, 2025
jackzhxng added a commit that referenced this pull request Aug 26, 2025
ghstack-source-id: 75f5ea3
Pull Request resolved: #13662
Comment on lines 25 to 26
inline constexpr auto kTokenEmbeddingMethod = "token_embeddings";
inline constexpr auto kTextModelMethod = "decoder";
Contributor:

Make it backwards compatible...

Keep token_embedding, not token_embeddings

And keep text_model instead of decoder

@jackzhxng (Contributor Author) Aug 26, 2025

There's nothing that needs to be kept backwards compatible; this isn't used anywhere at the moment. I'd like to match this to Optimum.
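(For reference, if backwards compatibility were ever needed, one option would be to resolve the method name at load time. This is only a sketch, not code from this PR; the resolve_text_model_method helper and the legacy-name fallback are assumptions, using the Module::method_names() accessor.)

#include <executorch/extension/module/module.h>

// Hypothetical fallback: prefer the Optimum-style name, fall back to the
// legacy export name if only that method exists in the program.
inline constexpr auto kTextModelMethod = "decoder";
inline constexpr auto kLegacyTextModelMethod = "text_model";  // assumed legacy name

const char* resolve_text_model_method(::executorch::extension::Module& module) {
  auto names_res = module.method_names();
  if (names_res.ok() && names_res.get().count(kTextModelMethod) > 0) {
    return kTextModelMethod;
  }
  return kLegacyTextModelMethod;
}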

Comment on lines 93 to 110
// 2. Run decoder model for prefill.
// `cache_position` goes from start_pos to start_pos + encoder_output.size(1).
// e.g. if start_pos = 2 and encoder_output.size(1) = 5,
// cache_position_tensor should be [2, 3, 4, 5, 6].
int64_t seq_len = encoder_output.toTensor().size(1);
std::vector<int64_t> cache_positions(seq_len);
for (int64_t i = 0; i < seq_len; ++i) {
cache_positions[i] = start_pos + i;
}
auto cache_position_tensor = ::executorch::extension::from_blob(
cache_positions.data(), {seq_len}, executorch::aten::ScalarType::Long);
auto prefill_result = module_->execute(
kTextModelMethod, {cache_position_tensor, encoder_output});
if (prefill_result.error() != ::executorch::runtime::Error::Ok) {
return ::executorch::runtime::Error::Internal;
}
auto prefill_outputs = prefill_result.get();
auto outputs_res = prefill_outputs[0].toTensor();
Contributor:

cc @kimishpatel @JacobSzwejbka: is this the correct way to manage KV cache indices?
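(For what it's worth, the bookkeeping above matches the usual contiguous-KV-cache pattern: positions cover the half-open range [start_pos, start_pos + seq_len). A minimal standalone sketch of the same index computation, using std::iota in place of the manual loop; the helper name is hypothetical:)

#include <cstdint>
#include <numeric>
#include <vector>

// Build cache positions [start_pos, start_pos + seq_len), e.g.
// start_pos = 2, seq_len = 5 -> {2, 3, 4, 5, 6}.
std::vector<int64_t> make_cache_positions(int64_t start_pos, int64_t seq_len) {
  std::vector<int64_t> positions(seq_len);
  std::iota(positions.begin(), positions.end(), start_pos);
  return positions;
}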

return prefill_result.error();
}
auto prefill_outputs = prefill_result.get();
auto outputs_res = prefill_outputs[0].toTensor();
Contributor:

validate if outputs_res.numel() == 0

@jackzhxng (Contributor Author):

Why? I think adding so many validations for extremely unlikely outcomes makes things too long and hard to read. Letting this one naturally error out below and returning that error directly is good enough.
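(For context, the guard being discussed would be only a few lines; a sketch, where the choice of Error::InvalidState is an assumption, not what the PR settled on:)

auto outputs_res = prefill_outputs[0].toTensor();
if (outputs_res.numel() == 0) {
  // Empty logits tensor from prefill; error code assumed for this sketch.
  return ::executorch::runtime::Error::InvalidState;
}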

jackzhxng added a commit that referenced this pull request Aug 26, 2025
ghstack-source-id: 98599b7
Pull Request resolved: #13662
@jackzhxng added the release notes: llm label Aug 26, 2025
jackzhxng added a commit that referenced this pull request Aug 26, 2025
ghstack-source-id: 98599b7
Pull Request resolved: #13662
@jackzhxng jackzhxng requested a review from mergennachin August 26, 2025 19:53
@mergennachin (Contributor) commented Aug 26, 2025

Consider adding unit tests similar to what mengwei added in the vision-text version.

@mergennachin (Contributor):

Also, the CI failures look legit.

@jackzhxng (Contributor Author):

Yeah, I'm fixing the CI issue.

jackzhxng added a commit that referenced this pull request Aug 26, 2025
ghstack-source-id: db16778
Pull Request resolved: #13662
Comment on lines +177 to +179
Audio& get_audio() & {
return std::get<Audio>(data_);
}
Contributor:

Is this needed? Do we ever return a mutable Audio?

@jackzhxng (Contributor Author) Aug 26, 2025

Yeah, I was thinking the same. This follows an already established pattern; I'm thinking of getting rid of all of these get_ variants later.
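(The established pattern referenced here is the usual trio of ref-qualified accessors over a variant payload. A self-contained sketch; the MultimodalInput class name and the data_ member are assumed from context, not quoted from the PR:)

#include <utility>
#include <variant>

struct Audio { /* ... */ };
struct Image { /* ... */ };

class MultimodalInput {  // name assumed for this sketch
 public:
  const Audio& get_audio() const& { return std::get<Audio>(data_); }
  Audio& get_audio() & { return std::get<Audio>(data_); }  // mutable access
  Audio&& get_audio() && { return std::get<Audio>(std::move(data_)); }  // move out of rvalue

 private:
  std::variant<Audio, Image> data_;
};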

Comment on lines +93 to +97
// 2. Run decoder model for prefill.
// `cache_position` goes from start_pos to start_pos + encoder_output.size(1).
// e.g. if start_pos = 2 and encoder_output.size(1) = 5,
// cache_position_tensor should be [2, 3, 4, 5, 6].
int64_t seq_len = encoder_output.toTensor().size(1);
Contributor:

Didn't vision-based multimodal need exactly the same thing?

@jackzhxng (Contributor Author):

Vision also takes this path

@jackzhxng jackzhxng requested a review from lucylq as a code owner August 28, 2025 19:36
@kirklandsign kirklandsign self-requested a review August 28, 2025 20:21
@jackzhxng jackzhxng merged commit fb87bbf into gh/jackzhxng/30/base Sep 2, 2025
111 of 112 checks passed
@jackzhxng jackzhxng deleted the gh/jackzhxng/30/head branch September 2, 2025 02:54
jackzhxng added a commit that referenced this pull request Sep 3, 2025
(Messed up the merge for the original stack; this is a reland. Original PR with comments here - #13662)

Differential Revision: [D81498750](https://our.internmc.facebook.com/intern/diff/D81498750)

[ghstack-poisoned]
kirklandsign pushed a commit that referenced this pull request Sep 3, 2025
ghstack-source-id: 9f1fca5
Pull Request resolved: #13662
jackzhxng added a commit that referenced this pull request Sep 4, 2025
(Messed up the merge for the original stack; this is a reland. Original PR with comments here - #13662)

Differential Revision: [D81498750](https://our.internmc.facebook.com/intern/diff/D81498750)

[ghstack-poisoned]

Labels

CLA Signed · release notes: llm
