Update base for Update on "[llm] Support different shape of input_pos"

larryliu0820 · larryliu0820 · commit 5525c1fdb49d · 2025-06-24T20:57:19.000-07:00
For Hugging Face models, `forward()` takes `tokens` and `cache_positions`, a list of cache indices. The .pte files produced by `export_llama` differ: they take `tokens` and `input_pos`, where `input_pos` is a scalar tensor. This PR adds support in `text_decoder_runner.cpp` for both shapes of `input_pos`/`cache_positions`. To keep the logic generic without relying on extra metadata, it inspects the method meta and the input tensor info to decide whether to feed `input_pos` or `cache_position`. Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/) [ghstack-poisoned]
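As a rough illustration of the decision described above, the following self-contained sketch picks between the two input shapes. It is not the code in `text_decoder_runner.cpp`: the helper `make_position_input` and its `declared_numel` parameter are hypothetical, and the rule "a declared size greater than 1 means a list of cache indices" is an assumption made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the actual runner code: builds the values for the
// second input of `forward()` from the element count its tensor metadata
// declares (assumed here to be read from the method meta beforehand).
std::vector<int64_t> make_position_input(
    int64_t start_pos,
    int64_t num_tokens,
    int64_t declared_numel) {
  if (declared_numel > 1) {
    // List-shaped `cache_positions`: one index per token being fed,
    // i.e. arange(start_pos, start_pos + num_tokens).
    std::vector<int64_t> positions;
    positions.reserve(static_cast<std::size_t>(num_tokens));
    for (int64_t i = 0; i < num_tokens; ++i) {
      positions.push_back(start_pos + i);
    }
    return positions;
  }
  // Scalar-shaped `input_pos`: only the starting position is passed.
  return {start_pos};
}
```

Under these assumptions, a model whose second input declares more than one element receives a contiguous range of positions, while a model exported with a scalar `input_pos` receives a single value.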
diff --git a/kernels/portable/cpu/util/arange_util.cpp b/kernels/portable/cpu/util/arange_util.cpp
@@ -19,11 +19,16 @@ namespace torch::executor::native {
 
 Tensor::SizesType
 compute_arange_out_size(double start, double end, double step) {
-  ET_CHECK_MSG(
-      end > start, "end (%f) must be greater than start (%f)", end, start);
-  ET_CHECK_MSG(step > 0, "step must be positive, got %f", step);
   Tensor::SizesType numel =
       static_cast<Tensor::SizesType>(std::ceil((end - start) / step));
+
+  ET_CHECK_MSG(
+      numel >= 0,
+      "numel should be non-negative, but got (%d). start (%f), end (%f), step (%f)",
+      numel,
+      start,
+      end,
+      step);
   return numel;
 }
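To make the effect of the relaxed check concrete, here is a minimal standalone sketch of the same size computation. The function name `arange_numel` is hypothetical, it throws instead of using `ET_CHECK_MSG`, and it assumes `step` is nonzero (a zero step is not handled here).

```cpp
#include <cmath>
#include <cstdint>
#include <stdexcept>

// Stand-in for the size computation in the hunk above: the element count is
// ceil((end - start) / step), and only a negative count is rejected.
// Assumes step != 0.
int64_t arange_numel(double start, double end, double step) {
  const auto numel =
      static_cast<int64_t>(std::ceil((end - start) / step));
  if (numel < 0) {
    throw std::invalid_argument("numel should be non-negative");
  }
  return numel;
}
```

With this check, an empty range (`start == end`) yields 0 and a descending range with a negative step yields a positive count, both of which the removed `end > start` and `step > 0` checks would have rejected. A range whose step points away from `end`, such as `arange_numel(0, 5, -1)`, still fails.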