Description
🐛 Describe the bug
Using the main branch at commit b4d72f1.
I have tried several quantization settings. With any setting other than the one below, the conversion fails with one of the following errors:

- `some op has incorrect Value 68, expected >= 73`
- `[ERROR] [Qnn ExecuTorch]: fa_alloc.cc:2462::ERROR:graph requires estimated allocation of 2315388 KB, limit is 2097152 KB [ERROR] [Qnn ExecuTorch]: graph_prepare.cc:845::ERROR:error during serialize: memory usage too large`
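For scale, the memory error can be restated in familiar units. This sketch assumes the log's "KB" means 1024 bytes (the limit, 2097152 KB, is exactly 2^21 KB, i.e. 2 GiB); the constant and helper names are illustrative only.

```python
# Figures copied from the fa_alloc.cc error above.
# Assumption: "KB" in the QNN log means 1024 bytes (KiB).
REQUIRED_KB = 2_315_388
LIMIT_KB = 2_097_152  # == 2**21 KiB == 2 GiB


def kb_to_gib(kb: int) -> float:
    """Convert binary kilobytes (KiB) to GiB."""
    return kb / (1024 * 1024)


overshoot_kb = REQUIRED_KB - LIMIT_KB

print(f"required: {kb_to_gib(REQUIRED_KB):.2f} GiB")
print(f"limit:    {kb_to_gib(LIMIT_KB):.2f} GiB")
print(
    f"over by:  {overshoot_kb} KB (~{overshoot_kb / 1024:.0f} MiB, "
    f"{overshoot_kb / LIMIT_KB:.1%} of the limit)"
)
```

Under that assumption, the estimated allocation is about 2.21 GiB against a 2.00 GiB limit, roughly 213 MiB (about 10.4%) over.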
When I set `default_quant_dtype = QuantDtype.use_8a8w` and disable the 16a4w_block quantization, quantization and conversion complete successfully. This is the working setting:
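```python
class Qwen3_1_7BQuantRecipe(StaticLLMQuantRecipe):
    # Default for the whole model: 8-bit activations, 8-bit weights
    default_quant_dtype = QuantDtype.use_8a8w

    def __init__(self, verbose: bool = False):
        super().__init__()
        self.recipe = (
            # Base recipe: 8a8w, per-tensor granularity, MinMax activation observer
            QuantRecipe(
                self.default_quant_dtype,
                False,
                act_observer=MinMaxObserver,
                granularity=QuantGranularity.PER_TENSOR,
                verbose=verbose,
            )
            # Override: layers matching output.conv use 16a8w, per-channel granularity
            .add_regex(
                {
                    r"output\.conv",
                },
                QuantDtype.use_16a8w,
                False,
                act_observer=MinMaxObserver,
                granularity=QuantGranularity.PER_CHANNEL,
            )
        )
        # Extra annotation: quantize the KV cache to 8 bits
        self.recipe.custom_quant_annotations.append(annotate_kv_8bit)
```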
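The recipe above uses PER_TENSOR granularity for the default 8a8w configuration and PER_CHANNEL granularity for the `output\.conv` override. As a rough, generic illustration of that difference, here is a plain-Python sketch of min-max affine quantization. It is not the ExecuTorch or QNN observer implementation; the helper names (`minmax_qparams`, `fake_quant`, `per_row_max_error`) and the weight values are hypothetical.

```python
# Generic asymmetric min-max quantization onto an unsigned n-bit grid.
# Illustrative only: not ExecuTorch's or QNN's actual implementation.


def minmax_qparams(values, n_bits=8):
    """Return (scale, zero_point) so [min, max] (widened to include 0) maps onto [0, 2^n - 1]."""
    qmax = 2 ** n_bits - 1
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def fake_quant(values, scale, zero_point, n_bits=8):
    """Quantize then dequantize, so the rounding error can be measured."""
    qmax = 2 ** n_bits - 1
    return [
        (min(max(round(v / scale) + zero_point, 0), qmax) - zero_point) * scale
        for v in values
    ]


def per_row_max_error(rows, per_channel, n_bits=8):
    """Worst-case reconstruction error for each row (output channel) of a 2-D weight."""
    if per_channel:
        # PER_CHANNEL: every row gets its own (scale, zero_point).
        params = [minmax_qparams(row, n_bits) for row in rows]
    else:
        # PER_TENSOR: one (scale, zero_point) shared by all rows.
        flat = [v for row in rows for v in row]
        params = [minmax_qparams(flat, n_bits)] * len(rows)
    return [
        max(abs(a - b) for a, b in zip(row, fake_quant(row, s, zp, n_bits)))
        for row, (s, zp) in zip(rows, params)
    ]


# Hypothetical weight: one small-range channel and one large-range channel.
weights = [
    [0.001, -0.002, 0.003, -0.004],
    [5.0, -3.0, 1.5, -6.0],
]
print("per-tensor :", per_row_max_error(weights, per_channel=False))
print("per-channel:", per_row_max_error(weights, per_channel=True))
```

With one shared range, every value in the small-magnitude channel rounds to zero, while per-channel ranges preserve it; the large-magnitude channel is unaffected because its own range already matches the tensor-wide one.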
However, when I run `qnn_llama_runner` on a Qualcomm SA8295 device with the resulting Qwen3-1.7B model (a hybrid QNN `.pte` exported via ExecuTorch), it generates a long run of repeated "sp" tokens:
`<|im_start|>user what is 1+1<|im_end|> <|im_start|>assistant.addHandlertoHaveBeenCalled sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp ...` (the "sp" token keeps repeating for the rest of the output)
Any help or suggestions would be greatly appreciated. Thanks very much.
Versions
commit b4d72f1
cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin