Please note that the `prefill_{SEQ-LEN}` and `decode` method naming conventions are required for easy integration into the MediaPipe LLM Inference API.

To further optimize on-device execution, a model can be exported with more than one prefill signature. As such, we use `prefill_{SEQ-LENS}` to export models with multiple prefill sequence lengths. During inference, the signature whose sequence length is closest to the input sequence length is used to minimize throwaway results.
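The "closest sequence length" selection described above can be sketched in plain Python. `pick_prefill_signature` is a hypothetical helper written for illustration (it is not part of the library), assuming the runtime pads the input up to the chosen signature's static length:

```python
def pick_prefill_signature(input_len, exported_seq_lens):
    """Pick the smallest exported prefill sequence length that fits the input.

    Choosing the closest length >= input_len minimizes the number of padded
    positions whose outputs are computed and then thrown away.
    """
    candidates = [s for s in sorted(exported_seq_lens) if s >= input_len]
    if not candidates:
        # Input is longer than any exported signature: fall back to the
        # largest one (the input would then be prefilled in chunks).
        return max(exported_seq_lens)
    return candidates[0]

# A model exported with prefill_32, prefill_64, prefill_128, prefill_256:
seq_lens = [32, 64, 128, 256]
print(pick_prefill_signature(50, seq_lens))   # 64
print(pick_prefill_signature(128, seq_lens))  # 128
print(pick_prefill_signature(300, seq_lens))  # 256
```

For a 50-token prompt, running `prefill_64` wastes 14 padded positions, whereas `prefill_256` would waste 206, which is why the closest fitting signature is preferred.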
## What to expect
### Future Roadmap
* Expanded acceleration support on mobile and web GPUs, and mobile NPUs.
* Advanced quantization approaches suitable for LLMs.
* Expanded support of models, including Diffusion models.
In this step, we use `ai_edge_torch`'s standard multi-signature conversion API to convert the PyTorch `nn.Module` to a single TFLite flatbuffer for on-device execution. For example, in `tiny_llama/convert_to_tflite.py`, we use this Python code to convert the `TinyLlama` model to a multi-signature TFLite model:
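The full script lives in the repository; the multi-signature call can be sketched as follows. This is illustrative only: `ToyModel` is a hypothetical stand-in for the TinyLlama module built by the library's example code, and the export path is arbitrary.

```python
import torch
import ai_edge_torch


class ToyModel(torch.nn.Module):
    """Hypothetical stand-in for the TinyLlama nn.Module from the examples.

    It takes the same (tokens, input_pos) inputs; input_pos is ignored here.
    """

    def __init__(self):
        super().__init__()
        self.embedding = torch.nn.Embedding(32000, 64)
        self.lm_head = torch.nn.Linear(64, 32000)

    def forward(self, tokens, input_pos):
        return self.lm_head(self.embedding(tokens))


pytorch_model = ToyModel().eval()
prefill_seq_len = 512

# Dummy tensors used only to trace the model graph during conversion.
prefill_tokens = torch.full((1, prefill_seq_len), 0, dtype=torch.long)
prefill_input_pos = torch.arange(0, prefill_seq_len)
decode_token = torch.tensor([[0]], dtype=torch.long)
decode_input_pos = torch.tensor([0], dtype=torch.int64)

# Register one signature per entry point, then convert everything into a
# single multi-signature TFLite flatbuffer.
edge_model = (
    ai_edge_torch.signature(
        'prefill', pytorch_model, (prefill_tokens, prefill_input_pos)
    )
    .signature('decode', pytorch_model, (decode_token, decode_input_pos))
    .convert()
)
edge_model.export('/tmp/toy_model.tflite')
```

Chaining `.signature(...)` calls is what produces the multiple entry points in one flatbuffer, rather than exporting one model file per method.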
Once converted, you will get a `.tflite` model that is ready for on-device execution. Note that the generated `.tflite` model uses static shapes. Inside the generated `.tflite` model, there will be two signatures defined (two entry points to the model):

1) `prefill_*`: takes two tensor inputs, `prefill_tokens` and `prefill_input_pos`, with shapes `(BATCH_SIZE, PREFILL_SEQ_LEN)` and `(PREFILL_SEQ_LEN)`.
2) `decode`: takes two tensor inputs, `decode_token` and `decode_input_pos`, with shapes `(1, 1)` and `(1)`.
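To make those static shapes concrete, here is a small sketch in plain NumPy. The token values and right-padding-with-zeros scheme are illustrative assumptions, not necessarily the runtime's exact behavior:

```python
import numpy as np

BATCH_SIZE = 1
PREFILL_SEQ_LEN = 128

prompt = [101, 2023, 2003, 1037, 3231]  # 5 example token ids

# prefill signature inputs: the prompt is padded out to the signature's
# static sequence length, since the model graph uses fixed shapes.
prefill_tokens = np.zeros((BATCH_SIZE, PREFILL_SEQ_LEN), dtype=np.int64)
prefill_tokens[0, : len(prompt)] = prompt
prefill_input_pos = np.arange(PREFILL_SEQ_LEN, dtype=np.int64)

# decode signature inputs: one token at a time, at the next position.
decode_token = np.array([[42]], dtype=np.int64)
decode_input_pos = np.array([len(prompt)], dtype=np.int64)

print(prefill_tokens.shape)     # (1, 128)
print(prefill_input_pos.shape)  # (128,)
print(decode_token.shape)       # (1, 1)
print(decode_input_pos.shape)   # (1,)
```

Because the shapes are static, a 5-token prompt still fills a `(1, 128)` tensor here; exporting several `prefill_{SEQ-LEN}` signatures is what keeps that padding overhead small.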
To learn more about TFLite signatures, please refer to this [article](https://www.tensorflow.org/lite/guide/signatures).