Please note that the `prefill_{SEQ-LEN}` and `decode` method naming conventions are required for easy integration into the MediaPipe LLM Inference API.

To further optimize on-device execution, a model can be exported with more than one prefill signature. As such, we use `prefill_{SEQ-LENS}` to export models with multiple prefill sequence lengths. During inference, the signature whose sequence length is closest to the input sequence length is used to minimize throwaway results.
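The "closest sequence length" selection described above can be sketched in plain Python. `pick_prefill_signature` is a hypothetical helper written for illustration (it is not part of the library), assuming the runtime pads the input up to the chosen signature's static length:

```python
def pick_prefill_signature(input_len, exported_seq_lens):
    """Pick the smallest exported prefill sequence length that fits the input.

    Choosing the closest length >= input_len minimizes the number of padded
    positions whose outputs are computed and then thrown away.
    """
    candidates = [s for s in sorted(exported_seq_lens) if s >= input_len]
    if not candidates:
        # Input is longer than any exported signature: fall back to the
        # largest one (the input would then be prefilled in chunks).
        return max(exported_seq_lens)
    return candidates[0]

# A model exported with prefill_32, prefill_64, prefill_128, prefill_256:
seq_lens = [32, 64, 128, 256]
print(pick_prefill_signature(50, seq_lens))   # 64
print(pick_prefill_signature(128, seq_lens))  # 128
print(pick_prefill_signature(300, seq_lens))  # 256
```

For a 50-token prompt, running `prefill_64` wastes 14 padded positions, whereas `prefill_256` would waste 206, which is why the closest fitting signature is preferred.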
## What to expect
### Future Roadmap
* Expanded acceleration support on mobile and web GPUs, and mobile NPUs.
* Advanced quantization approaches suitable for LLMs.
* Expanded support of models, including Diffusion models.
In this step, we use `ai_edge_torch`'s standard multi-signature conversion API to convert the PyTorch `nn.Module` to a single TFLite flatbuffer for on-device execution. For example, in `tiny_llama/convert_to_tflite.py`, we use this Python code to convert the `TinyLlama` model to a multi-signature TFLite model:
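The full script lives in the repository; the multi-signature call can be sketched as follows. This is illustrative only: `ToyModel` is a hypothetical stand-in for the TinyLlama module built by the library's example code, and the export path is arbitrary.

```python
import torch
import ai_edge_torch


class ToyModel(torch.nn.Module):
    """Hypothetical stand-in for the TinyLlama nn.Module from the examples.

    It takes the same (tokens, input_pos) inputs; input_pos is ignored here.
    """

    def __init__(self):
        super().__init__()
        self.embedding = torch.nn.Embedding(32000, 64)
        self.lm_head = torch.nn.Linear(64, 32000)

    def forward(self, tokens, input_pos):
        return self.lm_head(self.embedding(tokens))


pytorch_model = ToyModel().eval()
prefill_seq_len = 512

# Dummy tensors used only to trace the model graph during conversion.
prefill_tokens = torch.full((1, prefill_seq_len), 0, dtype=torch.long)
prefill_input_pos = torch.arange(0, prefill_seq_len)
decode_token = torch.tensor([[0]], dtype=torch.long)
decode_input_pos = torch.tensor([0], dtype=torch.int64)

# Register one signature per entry point, then convert everything into a
# single multi-signature TFLite flatbuffer.
edge_model = (
    ai_edge_torch.signature(
        'prefill', pytorch_model, (prefill_tokens, prefill_input_pos)
    )
    .signature('decode', pytorch_model, (decode_token, decode_input_pos))
    .convert()
)
edge_model.export('/tmp/toy_model.tflite')
```

Chaining `.signature(...)` calls is what produces the multiple entry points in one flatbuffer, rather than exporting one model file per method.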
Once converted, you will get a `.tflite` model that is ready for on-device execution. Note that the generated `.tflite` model uses static shapes. Inside the generated `.tflite` model, there will be two signatures defined (two entry points to the model):

1) `prefill_*`: takes two tensor inputs, `prefill_tokens` and `prefill_input_pos`, with shapes `(BATCH_SIZE, PREFILL_SEQ_LEN)` and `(PREFILL_SEQ_LEN)`.
2) `decode`: takes two tensor inputs, `decode_token` and `decode_input_pos`, with shapes `(1, 1)` and `(1)`.
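To make those static shapes concrete, here is a small sketch in plain NumPy. The token values and right-padding-with-zeros scheme are illustrative assumptions, not necessarily the runtime's exact behavior:

```python
import numpy as np

BATCH_SIZE = 1
PREFILL_SEQ_LEN = 128

prompt = [101, 2023, 2003, 1037, 3231]  # 5 example token ids

# prefill signature inputs: the prompt is padded out to the signature's
# static sequence length, since the model graph uses fixed shapes.
prefill_tokens = np.zeros((BATCH_SIZE, PREFILL_SEQ_LEN), dtype=np.int64)
prefill_tokens[0, : len(prompt)] = prompt
prefill_input_pos = np.arange(PREFILL_SEQ_LEN, dtype=np.int64)

# decode signature inputs: one token at a time, at the next position.
decode_token = np.array([[42]], dtype=np.int64)
decode_input_pos = np.array([len(prompt)], dtype=np.int64)

print(prefill_tokens.shape)     # (1, 128)
print(prefill_input_pos.shape)  # (128,)
print(decode_token.shape)       # (1, 1)
print(decode_input_pos.shape)   # (1,)
```

Because the shapes are static, a 5-token prompt still fills a `(1, 128)` tensor here; exporting several `prefill_{SEQ-LEN}` signatures is what keeps that padding overhead small.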
To learn more about TFLite signatures, please refer to this [article](https://www.tensorflow.org/lite/guide/signatures).