
Commit e1744fe

Update on "[llm] Support different shape of input_pos"
For Hugging Face models, `forward()` takes `tokens` as well as `cache_positions`, which is a list of cache indices. This differs from the .pte files that `export_llama` produces, which take `tokens` and `input_pos`, where `input_pos` is a scalar tensor. This PR adds support inside `text_decoder_runner.cpp` to handle both shapes of `input_pos`/`cache_positions`. To keep the logic generic without relying on extra metadata, it inspects the method meta and the input tensor info to decide whether to feed in `input_pos` or `cache_positions`.

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)

[ghstack-poisoned]
2 parents: 1ac0a14 + 11ff1a5

File tree

1 file changed: +14 −10 lines changed


extension/llm/runner/text_decoder_runner.cpp

14 additions & 10 deletions

```diff
@@ -43,25 +43,29 @@ ::executorch::runtime::Result<executorch::aten::Tensor> TextDecoderRunner::step(
   auto second_input_info = ET_UNWRAP(method_meta.input_tensor_meta(1));
   // For input_pos, numel is 1, for cache_positions, numel is max_seq_len
   auto sizes = second_input_info.sizes();
-  auto numel = 1;
-  std::vector<::executorch::aten::SizesType> sizes_vec;
-  for (const auto& size : sizes) {
-    sizes_vec.emplace_back(size);
-    numel *= size;
-  }
+  // Assuming 1D tensor
+  ET_CHECK_OR_RETURN_ERROR(
+      sizes.size() == 1,
+      InvalidProgram,
+      "The second input tensor is not 1D tensor. Got dimension (%zu)",
+      sizes.size());
+  auto numel = sizes[0];
+  std::vector<::executorch::aten::SizesType> sizes_vec = {numel};
   // Assuming the last dimension is the one with the variable token length,
   // for example [1, S] or [1, 1, S]
   sizes_vec[sizes_vec.size() - 1] = numel;
   TensorPtr start_pos_tensor;
   if (numel > 1) {
     // Assuming model is exported with cache_positions, create a tensor with
     // the same size as cache_positions
-    start_pos_tensor = empty({sizes_vec}, ::executorch::aten::ScalarType::Long);
-    torch::executor::native::arange_out_impl(start_pos, start_pos + numel, 1.0, *start_pos_tensor);
+    start_pos_tensor = empty(sizes_vec, ::executorch::aten::ScalarType::Long);
+    torch::executor::native::arange_out_impl(
+        start_pos, start_pos + numel, 1.0, *start_pos_tensor);
   } else {
     // Assuming model is exported with input_pos, create a tensor with size 1
-    start_pos_tensor =
-        from_blob(&start_pos, {1}, ::executorch::aten::ScalarType::Long);
+    start_pos_tensor = from_blob(
+        &start_pos, sizes_vec, ::executorch::aten::ScalarType::Long);
   }
   ET_LOG(Info, "Start pos tensor numel: %zu", start_pos_tensor->numel());
   auto outputs_res = module_->forward({tokens, start_pos_tensor});
```
