Description
Hi Team,
I am looking for the exact steps to convert a PyTorch model to the LiteRT-LM format so I can execute it on the NPU backend of a Qualcomm chipset.
Experiments I tried with a Gemma 270m model (google/functiongemma-270m-it):
High-level flow: pytorch -> tflite -> tflite (NPU optimized) -> litertlm.
1. pytorch to tflite conversion (tried with 2 options for quantize: 'dynamic_int8', 'fp16')
```
import os

from huggingface_hub import hf_hub_download, snapshot_download

import ai_edge_torch
from ai_edge_torch.generative.examples.gemma3 import gemma3  # needed for build_model_270m below
from ai_edge_torch.generative.layers import kv_cache
from ai_edge_torch.generative.utilities import converter
from ai_edge_torch.generative.utilities.export_config import ExportConfig

MODEL_DIR = "google/functiongemma-270m-it"
print(f"Downloading {MODEL_DIR} from HuggingFace...")
snapshot_path = snapshot_download(MODEL_DIR, local_dir="./")
MODEL_DIR = snapshot_path

# Load model using ai-edge-torch (required for conversion)
print(f"Loading model from {MODEL_DIR} via ai-edge-torch...")
pytorch_model = gemma3.build_model_270m(MODEL_DIR)
pytorch_model.eval()
print("Model loaded!")

LITERTLM_OUTPUT_DIR = "TFLite_output"
os.makedirs(LITERTLM_OUTPUT_DIR, exist_ok=True)

export_config = ExportConfig()
export_config.kvcache_layout = kv_cache.KV_LAYOUT_TRANSPOSED
export_config.mask_as_input = True

# Find tokenizer
TOKENIZER_PATH = f"{MODEL_DIR}/tokenizer.model"
if not os.path.exists(TOKENIZER_PATH):
    TOKENIZER_PATH = hf_hub_download(
        repo_id="google/functiongemma-270m-it",
        filename="tokenizer.model",
    )
print(f"Tokenizer: {TOKENIZER_PATH}")

# LLM metadata, consumed later by litertlm_builder_cli
METADATA_PATH = f"{LITERTLM_OUTPUT_DIR}/base_llm_metadata.textproto"
metadata_content = r"""start_token: {
  token_ids: {
    ids: [ 2 ]
  }
}
stop_tokens: {
  token_str: "<end_of_turn>"
}
stop_tokens: {
  token_str: "<start_function_response>"
}
llm_model_type: {
  function_gemma: {}
}
"""
with open(METADATA_PATH, "w") as f:
    f.write(metadata_content)
print(f"Metadata created: {METADATA_PATH}")

print("Converting to .tflite (~5-15 min on an A100)...")
converter.convert_to_tflite(
    pytorch_model,
    output_path=LITERTLM_OUTPUT_DIR,
    output_name_prefix="functiongemma-flutter_prefill_128",
    prefill_seq_len=128,
    kv_cache_max_len=1024,
    quantize="dynamic_int8",
    export_config=export_config,
)
print(".tflite conversion complete")
```
2. tflite -> tflite (NPU optimized) [AOT compile]
```
import os

from ai_edge_litert.aot import aot_compile as aot_lib
from ai_edge_litert.aot.vendors.qualcomm import target as qnn_target

LITERTLM_OUTPUT_DIR = "TFLite_output"

# CHANGE THIS to your actual QNN SDK folder
os.environ["QNN_SDK_ROOT"] = "<path to sdk>/qairt/2.42.0.251225"
# Add all QNN backend libraries (choose the correct arch for your host machine)
os.environ["LD_LIBRARY_PATH"] = (
    "<path to sdk>/qairt/2.42.0.251225/lib/x86_64-linux-clang:"
    + os.environ.get("LD_LIBRARY_PATH", "")
)
print("QNN_SDK_ROOT =", os.environ["QNN_SDK_ROOT"])
print("LD_LIBRARY_PATH =", os.environ["LD_LIBRARY_PATH"])

# AOT-compile for the SM8750 (Snapdragon 8 Elite) NPU
qsm8750 = qnn_target.Target(qnn_target.SocModel.SM8750)
compiled = aot_lib.aot_compile(
    f"./{LITERTLM_OUTPUT_DIR}/functiongemma-flutter_prefill_128_q8_ekv1024.tflite",
    target=[qsm8750],
)

output_dir = LITERTLM_OUTPUT_DIR
os.makedirs(output_dir, exist_ok=True)
compiled.export(
    output_dir,
    model_name="functiongemma-flutter_q8_ekv1024_optimized_model_pf_128",
)
```
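Before kicking off the AOT step I also run a small pre-flight check of my own (not part of the documented flow) to confirm the host-side QNN HTP backend library is actually visible where LD_LIBRARY_PATH points:
```
import os

# Pre-flight check: the AOT compiler loads the QNN HTP backend from the SDK,
# so confirm the shared libraries exist in the host lib directory.
qnn_lib_dir = os.path.join(os.environ["QNN_SDK_ROOT"], "lib", "x86_64-linux-clang")
for name in ("libQnnHtp.so", "libQnnSystem.so"):
    path = os.path.join(qnn_lib_dir, name)
    print(f"{name}: {'found' if os.path.exists(path) else 'MISSING'} ({path})")
```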
3. tflite (NPU optimized) -> litertlm using litertlm_builder_cli
After some setup I could use the litertlm_builder_cli present in the LiteRT-LM repo to package a .litertlm model from the tflite model and the other components.
NOTE: I couldn't find the embedder and aux models for this Gemma model, so I had to extract them from another prebuilt .litertlm model that was running on NPU and use them for packaging (see the sanity check after the command below).
The command:
```
python -m litert_lm.schema.py.litertlm_builder_cli \
  system_metadata --str Authors "Amogh" \
  llm_metadata --path ./TFLite_output/base_llm_metadata.textproto \
  tflite_model --path ./extract_From_lm/model_2_TF_LITE_EMBEDDER.tflite --model_type embedder \
  tflite_model --path ./extract_From_lm/model_3_TF_LITE_AUX.tflite --model_type aux \
  tflite_model --path ./TFLite_output/functiongemma-flutter_q8_ekv1024_optimized_model_pf_128_Qualcomm_SM8750.tflite \
    --model_type prefill_decode \
    --backend_constraint npu \
  hf_tokenizer --path "<path to hf cache>/.cache/huggingface/hub/models--google--functiongemma-270m-it/snapshots/39eccb091651513a5dfb56892d3714c1b5b8276c/tokenizer.json" \
  output --path ./TFLite_output/functiongemma-flutter_q8_ekv1024_optimized_model_Qualcomm_SM8750_pf128.litertlm
```
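Since the embedder and aux models were lifted from a different prebuilt .litertlm, I added a quick shape check of my own (not an official step) to see whether they are at least dimensionally plausible for this model; if I read the config correctly, Gemma 3 270M has a hidden size of 640, so the embedder output should end in that dimension if they are compatible:
```
from ai_edge_litert.interpreter import Interpreter

# Print I/O shapes of the borrowed embedder/aux models to check whether they
# are dimensionally compatible with the converted 270m model.
for path in (
    "./extract_From_lm/model_2_TF_LITE_EMBEDDER.tflite",
    "./extract_From_lm/model_3_TF_LITE_AUX.tflite",
):
    interp = Interpreter(model_path=path)
    print(path)
    for d in interp.get_input_details():
        print("  in :", d["name"], d["shape"], d["dtype"])
    for d in interp.get_output_details():
        print("  out:", d["name"], d["shape"], d["dtype"])
```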
Error observed during execution
Facing a similar error for both the quantized and the FP16 model (see detailed_failure_logs.txt for the complete logs).
Build:
```
bazel build --config=android_arm64 //runtime/engine:litert_lm_main
bazel build --config=android_arm64 @litert//litert/vendors/qualcomm/dispatch:dispatch_api_so
```
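For reference, this is roughly how I stage everything on the device before running (illustrative and from memory; the dispatch library name and the Hexagon directory may differ in your build/SDK, and I believe SM8750 uses Hexagon v79):
```
# Stage the runtime, model, and QNN libs on the device (paths illustrative).
adb shell mkdir -p /data/local/tmp/lrt_lm
adb push bazel-bin/runtime/engine/litert_lm_main /data/local/tmp/lrt_lm/
# The dispatch .so name may differ; check bazel-bin for the actual output.
adb push bazel-bin/external/litert/litert/vendors/qualcomm/dispatch/libLiteRtDispatch_Qualcomm.so /data/local/tmp/lrt_lm/
adb push TFLite_output/functiongemma-flutter_q8_ekv1024_optimized_model_Qualcomm_SM8750_pf128.litertlm /data/local/tmp/lrt_lm/model.litertlm
# Device-side QNN runtime libs; the arch (aarch64-android) and Hexagon
# version (v79 for SM8750) must match the target SoC.
adb push "$QNN_SDK_ROOT/lib/aarch64-android/libQnnHtp.so" /data/local/tmp/lrt_lm/
adb push "$QNN_SDK_ROOT/lib/aarch64-android/libQnnSystem.so" /data/local/tmp/lrt_lm/
adb push "$QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV79Stub.so" /data/local/tmp/lrt_lm/
adb push "$QNN_SDK_ROOT/lib/hexagon-v79/unsigned/libQnnHtpV79Skel.so" /data/local/tmp/lrt_lm/
```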
Execution on device:
```
export LD_LIBRARY_PATH=/data/local/tmp/lrt_lm
export ADSP_LIBRARY_PATH=/data/local/tmp/lrt_lm
./litert_lm_main /data/local/tmp/lrt_lm/model.litertlm 'Explain the history of LiteRT in 3 bullet points' npu
```
Error message:
```
ERROR: [qnn_manager.cc:494] Failed to create QNN context: 1002
ERROR: [dispatch_api.cc:229] Failed to create context from context binary: Failed to create QNN context for function qnn_partition_7, base address: 0x73f12c1000, size: 5279744
ERROR: Failed to initialize kernel.
ERROR: Encountered unresolved custom op: DISPATCH_OP.
See instructions: https://www.tensorflow.org/lite/guide/ops_custom
ERROR: Node number 0 (DISPATCH_OP) failed to prepare.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1771511108.532611 30958 litert_lm_main.cc:175] Check failed: MainHelper(argc, argv) is OK (INTERNAL: ERROR: [runtime/executor/llm_litert_npu_compiled_model_executor.cc:1597]
└ ERROR: [external/litert/litert/cc/litert_compiled_model.h:847])
*** Check failure stack trace: ***
```
What I am looking for?
- Proven, working steps I can use to convert my custom model to the .litertlm format and execute it on the NPU of a Qualcomm chipset.
- What exactly the above error means, and any possible resolutions so I can continue with the steps above.
- Any documentation or resources covering LiteRT-LM conversion and execution steps for the NPU on a Qualcomm chipset.