
Exact steps to convert a PyTorch model to a LiteRT-LM model that can execute on the NPU backend of a Qualcomm chipset #960

@ALZ112

Description

Hi team,

I am looking for the exact steps to convert a PyTorch model to the LiteRT-LM format so that I can execute it on the NPU backend of a Qualcomm chipset.

Experiments I tried with a Gemma 270M model (google/functiongemma-270m-it):

High-level flow: PyTorch -> TFLite -> TFLite (NPU-optimized) -> .litertlm

1. PyTorch to TFLite conversion (tried two options for quantize: 'dynamic_int8' and 'fp16')

    ```
    import os

    import torch
    from huggingface_hub import hf_hub_download, snapshot_download

    import ai_edge_torch
    from ai_edge_torch._config import config
    from ai_edge_torch.generative.examples.gemma3 import gemma3  # provides build_model_270m
    from ai_edge_torch.generative.layers import kv_cache
    from ai_edge_torch.generative.utilities import converter
    from ai_edge_torch.generative.utilities.export_config import ExportConfig

    ai_edge_torch.config = config

    MODEL_DIR = "google/functiongemma-270m-it"
    print(f"Downloading {MODEL_DIR} from Hugging Face...")
    snapshot_path = snapshot_download(MODEL_DIR, local_dir="./")
    MODEL_DIR = snapshot_path

    # Load the model via ai-edge-torch's re-authored Gemma 3 (required for conversion).
    print(f"Loading model from {MODEL_DIR} via ai-edge-torch...")
    pytorch_model = gemma3.build_model_270m(MODEL_DIR)
    pytorch_model.eval()
    print("Model loaded!")

    LITERTLM_OUTPUT_DIR = "TFLite_output"
    os.makedirs(LITERTLM_OUTPUT_DIR, exist_ok=True)

    export_config = ExportConfig()
    export_config.kvcache_layout = kv_cache.KV_LAYOUT_TRANSPOSED
    export_config.mask_as_input = True

    # Find the tokenizer.
    TOKENIZER_PATH = f"{MODEL_DIR}/tokenizer.model"
    if not os.path.exists(TOKENIZER_PATH):
        TOKENIZER_PATH = hf_hub_download(
            repo_id="google/functiongemma-270m-it",
            filename="tokenizer.model",
        )
    print(f"Tokenizer: {TOKENIZER_PATH}")

    # Write the LLM metadata consumed later by litertlm_builder_cli.
    METADATA_PATH = f"{LITERTLM_OUTPUT_DIR}/base_llm_metadata.textproto"
    metadata_content = r"""start_token: {
        token_ids: {
            ids: [ 2 ]
        }
    }
    stop_tokens: {
        token_str: "<end_of_turn>"
    }
    stop_tokens: {
        token_str: "<start_function_response>"
    }
    llm_model_type: {
        function_gemma: {}
    }
    """
    with open(METADATA_PATH, "w") as f:
        f.write(metadata_content)
    print(f"Metadata created: {METADATA_PATH}")

    # Convert to .tflite (roughly 5-15 minutes on an A100).
    print("Converting to .tflite...")
    converter.convert_to_tflite(
        pytorch_model,
        output_path=LITERTLM_OUTPUT_DIR,
        output_name_prefix="functiongemma-flutter_prefill_128",
        prefill_seq_len=128,
        kv_cache_max_len=1024,
        quantize="dynamic_int8",  # also tried "fp16"
        export_config=export_config,
    )
    print(".tflite conversion complete")
    ```
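    Before moving to the AOT step, it can help to confirm what the converter actually produced. A minimal sketch, assuming the output filename generated by the dynamic_int8 run above and the ai-edge-litert Python interpreter:

    ```
    # Sanity-check the exported .tflite by listing its signatures.
    # The filename below assumes the dynamic_int8 conversion above; adjust if needed.
    from ai_edge_litert.interpreter import Interpreter

    TFLITE_PATH = "TFLite_output/functiongemma-flutter_prefill_128_q8_ekv1024.tflite"

    interpreter = Interpreter(model_path=TFLITE_PATH)
    # ai-edge-torch generative exports typically expose prefill and decode signatures.
    for name, spec in interpreter.get_signature_list().items():
        print(f"signature: {name}")
        print(f"  inputs : {spec['inputs']}")
        print(f"  outputs: {spec['outputs']}")
    ```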
  2. TFLite -> TFLite (NPU-optimized) [AOT compile]

    ```
    import os

    from ai_edge_litert.aot import aot_compile as aot_lib
    from ai_edge_litert.aot.vendors.qualcomm import target as qnn_target

    LITERTLM_OUTPUT_DIR = "TFLite_output"

    # CHANGE THIS to your actual QNN SDK folder.
    os.environ["QNN_SDK_ROOT"] = "<path to sdk>/qairt/2.42.0.251225"

    # Add the QNN backend libraries (choose the correct arch for your host machine).
    os.environ["LD_LIBRARY_PATH"] = (
        "<path to sdk>/qairt/2.42.0.251225/lib/x86_64-linux-clang:"
        + os.environ.get("LD_LIBRARY_PATH", "")
    )
    print("QNN_SDK_ROOT =", os.environ["QNN_SDK_ROOT"])
    print("LD_LIBRARY_PATH =", os.environ["LD_LIBRARY_PATH"])

    # AOT-compile the TFLite model for the SM8750 NPU.
    qsm8750 = qnn_target.Target(qnn_target.SocModel.SM8750)
    compiled = aot_lib.aot_compile(
        f"./{LITERTLM_OUTPUT_DIR}/functiongemma-flutter_prefill_128_q8_ekv1024.tflite",
        target=[qsm8750],
    )

    output_dir = LITERTLM_OUTPUT_DIR
    os.makedirs(output_dir, exist_ok=True)
    compiled.export(output_dir, model_name="functiongemma-flutter_q8_ekv1024_optimized_model_pf_128")
    ```
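    A quick pre-flight check on the SDK paths can save a failed compile run. A minimal sketch, assuming the standard QAIRT SDK layout (library names can differ across SDK versions):

    ```
    # Verify that the QNN host libraries LD_LIBRARY_PATH points at actually exist.
    import os

    sdk_root = os.environ["QNN_SDK_ROOT"]
    host_lib_dir = os.path.join(sdk_root, "lib", "x86_64-linux-clang")
    for lib in ("libQnnHtp.so", "libQnnSystem.so"):
        path = os.path.join(host_lib_dir, lib)
        print(f"{path}: {'OK' if os.path.exists(path) else 'MISSING'}")
    ```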
    
  3. TFLite (NPU-optimized) -> .litertlm using litertlm_builder_cli

After some setup, I was able to use the litertlm_builder_cli present in the LiteRT-LM repo to package a .litertlm model from the TFLite model and the other components.

NOTE: I couldn't find embedder and aux models for this Gemma model, so I extracted them from another prebuilt .litertlm model that was running on the NPU and used them for packaging.
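For reference, one crude way to locate embedded TFLite blobs inside a prebuilt .litertlm file is to scan for the TFLite FlatBuffer file identifier "TFL3", which sits at byte offset 4 of every .tflite file. This is only a heuristic sketch, not an official extraction tool, and the input path is a placeholder; section lengths come from the container's own metadata, so any carved blob still needs verification:

```
# Heuristic: scan a .litertlm container for embedded TFLite models by
# looking for the "TFL3" FlatBuffer identifier.
LITERTLM_PATH = "prebuilt_model.litertlm"  # placeholder path

with open(LITERTLM_PATH, "rb") as f:
    blob = f.read()

pos = blob.find(b"TFL3")
while pos != -1:
    # The identifier sits 4 bytes into the FlatBuffer, so the model starts at pos - 4.
    print(f"candidate TFLite model at offset {pos - 4}")
    pos = blob.find(b"TFL3", pos + 4)
```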

The builder command:

   python -m litert_lm.schema.py.litertlm_builder_cli \
     system_metadata --str Authors "Amogh" \
     llm_metadata --path ./TFLite_output/base_llm_metadata.textproto \
     tflite_model --path ./extract_From_lm/model_2_TF_LITE_EMBEDDER.tflite --model_type embedder \
     tflite_model --path ./extract_From_lm/model_3_TF_LITE_AUX.tflite --model_type aux \
     tflite_model --path ./TFLite_output/functiongemma-flutter_q8_ekv1024_optimized_model_pf_128_Qualcomm_SM8750.tflite --model_type prefill_decode --backend_constraint npu \
     hf_tokenizer --path "<path to hf cache>/.cache/huggingface/hub/models--google--functiongemma-270m-it/snapshots/39eccb091651513a5dfb56892d3714c1b5b8276c/tokenizer.json" \
     output --path ./TFLite_output/functiongemma-flutter_q8_ekv1024_optimized_model_Qualcomm_SM8750_pf128.litertlm
  4. Error observed during execution
    I see the same error below for both the quantized and FP16 models (see detailed_failure_logs.txt for the complete logs):

Build:

      bazel build --config=android_arm64 //runtime/engine:litert_lm_main
      bazel build --config=android_arm64 @litert//litert/vendors/qualcomm/dispatch:dispatch_api_so

Execution on device:

       export LD_LIBRARY_PATH=/data/local/tmp/lrt_lm
       export ADSP_LIBRARY_PATH=/data/local/tmp/lrt_lm
       ./litert_lm_main /data/local/tmp/lrt_lm/model.litertlm 'Explain the history of LiteRT in 3 bullet points' npu
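For context, the run above assumes the binary, the Qualcomm dispatch library, the QNN runtime libraries, and the model were all pushed to /data/local/tmp/lrt_lm beforehand. A sketch of that setup, where the library names and bazel output paths are assumptions based on the standard QAIRT SDK layout (SM8750's HTP generation, v79 here, determines which Stub/Skel pair the DSP side loads via ADSP_LIBRARY_PATH):

```
# Push the runtime pieces to the device directory used above. File names and
# bazel output paths are assumptions; adjust them to your build and SDK.
import os
import subprocess

SDK = os.environ["QNN_SDK_ROOT"]
DEVICE_DIR = "/data/local/tmp/lrt_lm"

files = [
    "bazel-bin/runtime/engine/litert_lm_main",
    # Output of the dispatch_api_so build target (name may vary by LiteRT version).
    "bazel-bin/external/litert/litert/vendors/qualcomm/dispatch/libLiteRtDispatch_Qualcomm.so",
    f"{SDK}/lib/aarch64-android/libQnnHtp.so",
    f"{SDK}/lib/aarch64-android/libQnnSystem.so",
    f"{SDK}/lib/aarch64-android/libQnnHtpV79Stub.so",
    f"{SDK}/lib/hexagon-v79/unsigned/libQnnHtpV79Skel.so",  # found via ADSP_LIBRARY_PATH
]
for f in files:
    subprocess.run(["adb", "push", f, DEVICE_DIR], check=True)
```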

Error Message:

       ERROR: [qnn_manager.cc:494] Failed to create QNN context: 1002
       ERROR: [dispatch_api.cc:229] Failed to create context from context binary: Failed to create QNN context for     function qnn_partition_7, base address: 0x73f12c1000, size: 5279744
       ERROR: Failed to initialize kernel.
       ERROR: Encountered unresolved custom op: DISPATCH_OP.
       See instructions: https://www.tensorflow.org/lite/guide/ops_custom
       ERROR: Node number 0 (DISPATCH_OP) failed to prepare.
       WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
       F0000 00:00:1771511108.532611   30958 litert_lm_main.cc:175] Check failed: MainHelper(argc, argv) is OK (INTERNAL: ERROR: [runtime/executor/llm_litert_npu_compiled_model_executor.cc:1597]
       └ ERROR: [external/litert/litert/cc/litert_compiled_model.h:847])
       *** Check failure stack trace: *** 

detailed_failure_logs.txt

  5. What I am looking for

    1. Proven, working steps that I can use to convert my custom model to the .litertlm format and execute it on the NPU of a Qualcomm chipset.
    2. What exactly does the above error mean, and are there possible resolutions that would let me continue with the above steps?
    3. Any documentation or resources covering LiteRT-LM conversion and execution steps on the NPU of a Qualcomm chipset.
