Commit 3b95ef7

Update on "[llm] Support different shape of input_pos"
For Hugging Face models, `forward()` takes `tokens` as well as `cache_positions`, a list of cache indices. This differs from the .pte files that `export_llama` produces, which take `tokens` and `input_pos`, where `input_pos` is a scalar tensor. This PR adds support inside `text_decoder_runner.cpp` for handling both shapes of `input_pos`/`cache_positions`. To keep the logic generic without relying on extra metadata, it inspects the method meta and the input tensor info to decide whether to feed `input_pos` or `cache_positions`.

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)

[ghstack-poisoned]
2 parents 6f07be3 + f2d4483 commit 3b95ef7
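The shape-inspection idea described in the commit message can be sketched as follows. This is a hypothetical, self-contained illustration, not the actual `text_decoder_runner.cpp` code: the real runner queries ExecuTorch's method meta for the second input's tensor info, while here the sizes are passed in directly, and `make_position_input` is an invented helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: decide what to feed as the model's second input
// based on the shape the method expects for it.
//
// - numel == 1  -> export_llama-style model: scalar `input_pos`.
// - numel  > 1  -> Hugging Face-style model: `cache_positions`,
//                  one cache index per token in the chunk.
std::vector<int64_t> make_position_input(
    const std::vector<int32_t>& pos_input_sizes, // sizes of input 1, from method meta
    int64_t start_pos,
    int64_t seq_len) {
  int64_t numel = 1;
  for (int32_t s : pos_input_sizes) {
    numel *= s;
  }
  if (numel == 1) {
    // Scalar tensor: just the start position.
    return {start_pos};
  }
  // One position per token: [start_pos, start_pos + seq_len).
  std::vector<int64_t> positions;
  positions.reserve(seq_len);
  for (int64_t i = 0; i < seq_len; ++i) {
    positions.push_back(start_pos + i);
  }
  return positions;
}
```

Because the decision is driven by the exported method's own input metadata, the same runner binary can serve both model flavors without extra flags or sidecar metadata.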

File tree

1 file changed

+2
-0
lines changed


kernels/portable/cpu/util/arange_util.cpp

Lines changed: 2 additions & 0 deletions
@@ -38,11 +38,13 @@ void arange_out_impl(
     double end,
     double step,
     Tensor& out) {
+  (void)ctx;
   Tensor::SizesType numel = compute_arange_out_size(start, end, step);
   ET_ARANGE_IMPL(ctx, start, numel, step, out, "arange.start_out");
 }

 void arange_out_impl(KernelRuntimeContext& ctx, double end, Tensor& out) {
+  (void)ctx;
   ET_ARANGE_IMPL(ctx, 0.0, end, 1.0, out, "arange.out");
 }

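For context on the patched helpers: a minimal sketch of the output-size rule an arange kernel uses, assuming `compute_arange_out_size` follows standard `torch.arange` semantics (`numel = ceil((end - start) / step)`). The function name `arange_numel` is illustrative, not from the source.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed arange sizing rule (torch.arange semantics):
// number of elements in [start, end) stepped by `step`.
int64_t arange_numel(double start, double end, double step) {
  return static_cast<int64_t>(std::ceil((end - start) / step));
}
```

For example, `arange_numel(0.0, 5.0, 1.0)` yields 5 elements (0 through 4), and `arange_numel(1.0, 4.0, 2.0)` yields 2 (1 and 3).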