qnn_llama_runner on SA8295 outputs repetitive “sp” with Qwen3-1.7B after ExecuTorch export #15954

@lansexinhu

🐛 Describe the bug

I am using main at commit b4d72f1.

I have tried several quantization settings. With the other settings, conversion fails with one of these errors:
"some op has incorrect Value 68, expected >= 73"
or
"[ERROR] [Qnn ExecuTorch]: fa_alloc.cc:2462::ERROR:graph requires estimated allocation of 2315388 KB, limit is 2097152 KB [ERROR] [Qnn ExecuTorch]: graph_prepare.cc:845::ERROR:error during serialize: memory usage too large"
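For reference, here is a quick check of how far the allocation error quoted above is over the limit. This is only arithmetic on the two numbers from the log; it assumes the log's "KB" means 1024 bytes, which I have not confirmed.

```python
# Figures copied from the fa_alloc.cc error message (both reported in KB).
required_kb = 2_315_388
limit_kb = 2_097_152

# Assumption: "KB" in the log means 1024 bytes, so 1 GiB = 1024 * 1024 KB.
KB_PER_GIB = 1024 * 1024

overshoot_kb = required_kb - limit_kb
overshoot_pct = 100 * overshoot_kb / limit_kb

print(f"limit     = {limit_kb / KB_PER_GIB:.2f} GiB")
print(f"required  = {required_kb / KB_PER_GIB:.2f} GiB")
print(f"overshoot = {overshoot_kb} KB ({overshoot_kb / 1024:.1f} MiB, {overshoot_pct:.1f}% over)")
```

Under that assumption the limit is exactly 2 GiB, and the graph's estimated allocation exceeds it by 218236 KB.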

When using default_quant_dtype = QuantDtype.use_8a8w and disabling the 16a4w_block quantization, the quantization/conversion completes successfully. This is the recipe I used:

```python
class Qwen3_1_7BQuantRecipe(StaticLLMQuantRecipe):
    default_quant_dtype = QuantDtype.use_8a8w

    def __init__(self, verbose: bool = False):
        super().__init__()

        self.recipe = (
            # Default for the whole model: 8a8w, per-tensor, min-max observer.
            QuantRecipe(
                self.default_quant_dtype,
                False,
                act_observer=MinMaxObserver,
                granularity=QuantGranularity.PER_TENSOR,
                verbose=verbose,
            )
            # Override for the output conv: 16a8w, per-channel.
            .add_regex(
                {
                    r"output\.conv",
                },
                QuantDtype.use_16a8w,
                False,
                act_observer=MinMaxObserver,
                granularity=QuantGranularity.PER_CHANNEL,
            )
        )
        # Keep the KV cache in 8 bits.
        self.recipe.custom_quant_annotations.append(annotate_kv_8bit)
```
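As a rough illustration of why the granularity and bit width in a recipe like this can matter, here is a toy min-max fake-quantization in plain Python. It is only a sketch under my own assumptions: `fake_quantize` and the synthetic two-channel data are hypothetical, it does not use or reproduce ExecuTorch's quantizer, and it is not a diagnosis of this issue.

```python
import random


def fake_quantize(values, bits, lo, hi):
    """Snap values onto a uniform `bits`-bit grid spanning [lo, hi] (min-max style)."""
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    if scale == 0:
        return list(values)
    return [lo + round((v - lo) / scale) * scale for v in values]


def mean_abs_error(reference, approx):
    return sum(abs(r - a) for r, a in zip(reference, approx)) / len(reference)


random.seed(0)
# Hypothetical tensor with two channels of very different ranges.
small = [random.uniform(-1, 1) for _ in range(1000)]
large = [random.uniform(-100, 100) for _ in range(1000)]

# Per-tensor: one [min, max] range shared by both channels.
flat = small + large
t_lo, t_hi = min(flat), max(flat)
err_8_tensor = mean_abs_error(small, fake_quantize(small, 8, t_lo, t_hi))
err_16_tensor = mean_abs_error(small, fake_quantize(small, 16, t_lo, t_hi))

# Per-channel: the small channel gets its own [min, max] range.
err_8_channel = mean_abs_error(small, fake_quantize(small, 8, min(small), max(small)))

print(f"8-bit  per-tensor  error on small channel: {err_8_tensor:.5f}")
print(f"8-bit  per-channel error on small channel: {err_8_channel:.5f}")
print(f"16-bit per-tensor  error on small channel: {err_16_tensor:.5f}")
```

In this toy setup the small-range channel loses most of its resolution under a shared 8-bit range, while a per-channel range or a 16-bit grid keeps the error much lower. Whether anything similar is happening in the real model would need to be checked against the actual layers.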

However, when I run qnn_llama_runner on a Qualcomm SA8295 device with Qwen3-1.7B converted via ExecuTorch (hybrid QNN .pte), the model generates a long sequence of repeated “sp”:

<|im_start|>user what is 1+1<|im_end|> <|im_start|>assistant.addHandlertoHaveBeenCalled sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp

I hope to get your help or suggestions. Thanks very much.

Versions

commit b4d72f1

cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin

Labels

module: qnn (Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/)
partner: qualcomm (For backend delegation, kernels, demo, etc. from the 3rd-party partner, Qualcomm)
