
Exact steps to convert a PyTorch model to a LiteRT-LM model that can execute on the NPU backend of a Qualcomm chipset #960

@ALZ112

Description

Hi team,

I am looking for the exact steps to convert a PyTorch model to the LiteRT-LM format so that I can execute it on the NPU backend of a Qualcomm chipset.

Experiments I tried with a Gemma 270M model (google/functiongemma-270m-it):

High-level flow: PyTorch -> TFLite -> TFLite (NPU-optimized) -> .litertlm

1. PyTorch to TFLite conversion (tried two options for quantize: 'dynamic_int8' and 'fp16')

    ```
    import os

    import torch
    from huggingface_hub import hf_hub_download, snapshot_download

    import ai_edge_torch
    from ai_edge_torch._config import config
    from ai_edge_torch.generative.examples.gemma3 import gemma3  # provides build_model_270m
    from ai_edge_torch.generative.layers import kv_cache
    from ai_edge_torch.generative.utilities import converter
    from ai_edge_torch.generative.utilities.export_config import ExportConfig

    ai_edge_torch.config = config

    MODEL_DIR = "google/functiongemma-270m-it"
    print(f"Downloading {MODEL_DIR} from Hugging Face...")
    snapshot_path = snapshot_download(MODEL_DIR, local_dir="./")
    MODEL_DIR = snapshot_path

    # Load the model via ai-edge-torch's re-authored Gemma 3 (required for conversion).
    print(f"Loading model from {MODEL_DIR} via ai-edge-torch...")
    pytorch_model = gemma3.build_model_270m(MODEL_DIR)
    pytorch_model.eval()
    print("Model loaded!")

    LITERTLM_OUTPUT_DIR = "TFLite_output"
    os.makedirs(LITERTLM_OUTPUT_DIR, exist_ok=True)

    export_config = ExportConfig()
    export_config.kvcache_layout = kv_cache.KV_LAYOUT_TRANSPOSED
    export_config.mask_as_input = True

    # Find the tokenizer.
    TOKENIZER_PATH = f"{MODEL_DIR}/tokenizer.model"
    if not os.path.exists(TOKENIZER_PATH):
        TOKENIZER_PATH = hf_hub_download(
            repo_id="google/functiongemma-270m-it",
            filename="tokenizer.model",
        )
    print(f"Tokenizer: {TOKENIZER_PATH}")

    # Write the LLM metadata consumed later by litertlm_builder_cli.
    METADATA_PATH = f"{LITERTLM_OUTPUT_DIR}/base_llm_metadata.textproto"
    metadata_content = r"""start_token: {
        token_ids: {
            ids: [ 2 ]
        }
    }
    stop_tokens: {
        token_str: "<end_of_turn>"
    }
    stop_tokens: {
        token_str: "<start_function_response>"
    }
    llm_model_type: {
        function_gemma: {}
    }
    """
    with open(METADATA_PATH, "w") as f:
        f.write(metadata_content)
    print(f"Metadata created: {METADATA_PATH}")

    # Convert to .tflite (roughly 5-15 minutes on an A100).
    print("Converting to .tflite...")
    converter.convert_to_tflite(
        pytorch_model,
        output_path=LITERTLM_OUTPUT_DIR,
        output_name_prefix="functiongemma-flutter_prefill_128",
        prefill_seq_len=128,
        kv_cache_max_len=1024,
        quantize="dynamic_int8",  # also tried "fp16"
        export_config=export_config,
    )
    print(".tflite conversion complete")
    ```
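    Before moving to the AOT step, it can help to confirm what the converter actually produced. A minimal sketch, assuming the output filename generated by the dynamic_int8 run above and the ai-edge-litert Python interpreter:

    ```
    # Sanity-check the exported .tflite by listing its signatures.
    # The filename below assumes the dynamic_int8 conversion above; adjust if needed.
    from ai_edge_litert.interpreter import Interpreter

    TFLITE_PATH = "TFLite_output/functiongemma-flutter_prefill_128_q8_ekv1024.tflite"

    interpreter = Interpreter(model_path=TFLITE_PATH)
    # ai-edge-torch generative exports typically expose prefill and decode signatures.
    for name, spec in interpreter.get_signature_list().items():
        print(f"signature: {name}")
        print(f"  inputs : {spec['inputs']}")
        print(f"  outputs: {spec['outputs']}")
    ```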
  2. TFLite -> TFLite (NPU-optimized) [AOT compile]

    ```
    import os

    from ai_edge_litert.aot import aot_compile as aot_lib
    from ai_edge_litert.aot.vendors.qualcomm import target as qnn_target

    LITERTLM_OUTPUT_DIR = "TFLite_output"

    # CHANGE THIS to your actual QNN SDK folder.
    os.environ["QNN_SDK_ROOT"] = "<path to sdk>/qairt/2.42.0.251225"

    # Add the QNN backend libraries (choose the correct arch for your host machine).
    os.environ["LD_LIBRARY_PATH"] = (
        "<path to sdk>/qairt/2.42.0.251225/lib/x86_64-linux-clang:"
        + os.environ.get("LD_LIBRARY_PATH", "")
    )
    print("QNN_SDK_ROOT =", os.environ["QNN_SDK_ROOT"])
    print("LD_LIBRARY_PATH =", os.environ["LD_LIBRARY_PATH"])

    # AOT-compile the TFLite model for the SM8750 NPU.
    qsm8750 = qnn_target.Target(qnn_target.SocModel.SM8750)
    compiled = aot_lib.aot_compile(
        f"./{LITERTLM_OUTPUT_DIR}/functiongemma-flutter_prefill_128_q8_ekv1024.tflite",
        target=[qsm8750],
    )

    output_dir = LITERTLM_OUTPUT_DIR
    os.makedirs(output_dir, exist_ok=True)
    compiled.export(output_dir, model_name="functiongemma-flutter_q8_ekv1024_optimized_model_pf_128")
    ```
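    A quick pre-flight check on the SDK paths can save a failed compile run. A minimal sketch, assuming the standard QAIRT SDK layout (library names can differ across SDK versions):

    ```
    # Verify that the QNN host libraries LD_LIBRARY_PATH points at actually exist.
    import os

    sdk_root = os.environ["QNN_SDK_ROOT"]
    host_lib_dir = os.path.join(sdk_root, "lib", "x86_64-linux-clang")
    for lib in ("libQnnHtp.so", "libQnnSystem.so"):
        path = os.path.join(host_lib_dir, lib)
        print(f"{path}: {'OK' if os.path.exists(path) else 'MISSING'}")
    ```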
    
  3. TFLite (NPU-optimized) -> .litertlm using litertlm_builder_cli

After some setup, I was able to use the litertlm_builder_cli present in the LiteRT-LM repo to package a .litertlm model from the TFLite model and the other components.

NOTE: I couldn't find embedder and aux models for this Gemma model, so I extracted them from another prebuilt .litertlm model that was running on the NPU and used them for packaging.
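For reference, one crude way to locate embedded TFLite blobs inside a prebuilt .litertlm file is to scan for the TFLite FlatBuffer file identifier "TFL3", which sits at byte offset 4 of every .tflite file. This is only a heuristic sketch, not an official extraction tool, and the input path is a placeholder; section lengths come from the container's own metadata, so any carved blob still needs verification:

```
# Heuristic: scan a .litertlm container for embedded TFLite models by
# looking for the "TFL3" FlatBuffer identifier.
LITERTLM_PATH = "prebuilt_model.litertlm"  # placeholder path

with open(LITERTLM_PATH, "rb") as f:
    blob = f.read()

pos = blob.find(b"TFL3")
while pos != -1:
    # The identifier sits 4 bytes into the FlatBuffer, so the model starts at pos - 4.
    print(f"candidate TFLite model at offset {pos - 4}")
    pos = blob.find(b"TFL3", pos + 4)
```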

The builder command:

   python -m litert_lm.schema.py.litertlm_builder_cli \
     system_metadata --str Authors "Amogh" \
     llm_metadata --path ./TFLite_output/base_llm_metadata.textproto \
     tflite_model --path ./extract_From_lm/model_2_TF_LITE_EMBEDDER.tflite --model_type embedder \
     tflite_model --path ./extract_From_lm/model_3_TF_LITE_AUX.tflite --model_type aux \
     tflite_model --path ./TFLite_output/functiongemma-flutter_q8_ekv1024_optimized_model_pf_128_Qualcomm_SM8750.tflite --model_type prefill_decode --backend_constraint npu \
     hf_tokenizer --path "<path to hf cache>/.cache/huggingface/hub/models--google--functiongemma-270m-it/snapshots/39eccb091651513a5dfb56892d3714c1b5b8276c/tokenizer.json" \
     output --path ./TFLite_output/functiongemma-flutter_q8_ekv1024_optimized_model_Qualcomm_SM8750_pf128.litertlm
  4. Error observed during execution
    I see the same error below for both the quantized and FP16 models (see detailed_failure_logs.txt for the complete logs):

Build:

      bazel build --config=android_arm64 //runtime/engine:litert_lm_main
      bazel build --config=android_arm64 @litert//litert/vendors/qualcomm/dispatch:dispatch_api_so

Execution on device:

       export LD_LIBRARY_PATH=/data/local/tmp/lrt_lm
       export ADSP_LIBRARY_PATH=/data/local/tmp/lrt_lm
       ./litert_lm_main /data/local/tmp/lrt_lm/model.litertlm 'Explain the history of LiteRT in 3 bullet points' npu
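For context, the run above assumes the binary, the Qualcomm dispatch library, the QNN runtime libraries, and the model were all pushed to /data/local/tmp/lrt_lm beforehand. A sketch of that setup, where the library names and bazel output paths are assumptions based on the standard QAIRT SDK layout (SM8750's HTP generation, v79 here, determines which Stub/Skel pair the DSP side loads via ADSP_LIBRARY_PATH):

```
# Push the runtime pieces to the device directory used above. File names and
# bazel output paths are assumptions; adjust them to your build and SDK.
import os
import subprocess

SDK = os.environ["QNN_SDK_ROOT"]
DEVICE_DIR = "/data/local/tmp/lrt_lm"

files = [
    "bazel-bin/runtime/engine/litert_lm_main",
    # Output of the dispatch_api_so build target (name may vary by LiteRT version).
    "bazel-bin/external/litert/litert/vendors/qualcomm/dispatch/libLiteRtDispatch_Qualcomm.so",
    f"{SDK}/lib/aarch64-android/libQnnHtp.so",
    f"{SDK}/lib/aarch64-android/libQnnSystem.so",
    f"{SDK}/lib/aarch64-android/libQnnHtpV79Stub.so",
    f"{SDK}/lib/hexagon-v79/unsigned/libQnnHtpV79Skel.so",  # found via ADSP_LIBRARY_PATH
]
for f in files:
    subprocess.run(["adb", "push", f, DEVICE_DIR], check=True)
```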

Error Message:

       ERROR: [qnn_manager.cc:494] Failed to create QNN context: 1002
       ERROR: [dispatch_api.cc:229] Failed to create context from context binary: Failed to create QNN context for     function qnn_partition_7, base address: 0x73f12c1000, size: 5279744
       ERROR: Failed to initialize kernel.
       ERROR: Encountered unresolved custom op: DISPATCH_OP.
       See instructions: https://www.tensorflow.org/lite/guide/ops_custom
       ERROR: Node number 0 (DISPATCH_OP) failed to prepare.
       WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
       F0000 00:00:1771511108.532611   30958 litert_lm_main.cc:175] Check failed: MainHelper(argc, argv) is OK (INTERNAL: ERROR: [runtime/executor/llm_litert_npu_compiled_model_executor.cc:1597]
       └ ERROR: [external/litert/litert/cc/litert_compiled_model.h:847])
       *** Check failure stack trace: *** 

detailed_failure_logs.txt

  5. What I am looking for

    1. Proven, working steps that I can use to convert my custom model to the .litertlm format and execute it on the NPU of a Qualcomm chipset.
    2. What exactly does the above error mean, and are there possible resolutions that would let me continue with the above steps?
    3. Any documentation or resources covering LiteRT-LM conversion and execution steps on the NPU of a Qualcomm chipset.
