Question about inference speed #4
-
I am testing the dev-multi-op-in-one-graph branch. Everything runs (great!), but QNN speed is basically the same as CPU speed. What speeds do you see in your own tests? Also, when the NPU is in use, CPU usage during inference is just as high. Why is that? I am running qwen2.5 0.5B with q4_0 quantization on a Qualcomm 8295:
./llama-cli -m ggml-model-q4_0.gguf -t 1 --chat-template chatml -p "我是一个助手" -n 128
Replies: 4 comments
-
How can I make it faster?
-
I am using a q4_0 quantized model. I found that every GGML_OP_MUL_MAT in backend-ops.cpp prints "src0 type 2 and src1 type 0 are not equal", so the op is never scheduled onto the NPU and the end result is still CPU execution. How should I handle this?
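For readers decoding that message: in ggml's `ggml_type` enum, id 2 is `GGML_TYPE_Q4_0` and id 0 is `GGML_TYPE_F32`, so the log is reporting quantized weights (src0) multiplied with float32 activations (src1). The snippet below is only a small sketch for translating such log lines into type names. The helper name `decode_type_mismatch` and the regex are hypothetical, and the log format is assumed to match the line quoted in the comment above; it is not part of the QNN backend.

```python
import re

# Subset of the ggml_type enum ids from ggml.h (other ids omitted for brevity).
GGML_TYPE_NAMES = {0: "F32", 1: "F16", 2: "Q4_0", 3: "Q4_1", 8: "Q8_0"}

# Assumed log format, copied from the message quoted in this thread.
LOG_PATTERN = re.compile(r"src0 type (\d+) and src1 type (\d+) are not equal")


def decode_type_mismatch(log_line):
    """Return (src0_name, src1_name) for a mismatch log line, or None if it does not match."""
    match = LOG_PATTERN.search(log_line)
    if match is None:
        return None
    return tuple(
        GGML_TYPE_NAMES.get(int(type_id), f"unknown({type_id})")
        for type_id in match.groups()
    )


if __name__ == "__main__":
    line = "src0 type 2 and src1 type 0 are not equal"
    print(decode_type_mismatch(line))
```

Running it over a captured log shows which operand type pairs are being rejected, which helps confirm whether the q4_0 weights are the reason the matmul stays on the CPU.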
-
Hi @scguang301, I am seeing a similar problem. I get both GPU and HTP logs while running the tiny_llama model, so I am actually confused about whether the model is running on the QNN NPU or the QNN GPU.
llm_load_print_meta: max token length = 48
[qnn_init, 248]: device property is not supported
[qnn_init, 258]: device counts 1
system_info: n_threads = 8 (n_threads_batch = 8) / 8 | CPU : NEON = 1 | ARM_FMA = 1 | MATMUL_INT8 = 1 | AARCH64_REPACK = 1 |
sampler seed: 3467048278
[{<(Task)>}] [{<(Input)>}] Bread, milk, eggs, chicken, rice, pasta, tomatoes, spinach, bananas, apples, yogurt, cheese, toothpaste, soap, tissues, laundry detergent, coffee, Two proteins: a rotisserie chicken and two 4 oz. fillets of fresh salmon Two veggies: asparagus and carrots a handful of bananas and two avocados cereal 1 rotisserie chicken (or two 4 oz. Fillets), cooked Preheat oven to 375°F. Line a baking dish with parchment paper.
-
Hi, sorry for the late reply. There is already an issue tracking performance; please have a look: #34