lemonade-eval evaluation fails on NPU #8

**1. The lm-eval-harness tool fails on the NPU model.**
`lemonade-eval -i Qwen-2.5-1.5B-Instruct-NPU load lm-eval-harness --task gsm8k --limit 10`

<img width="1567" height="994" alt="Image" src="https://github.com/user-attachments/assets/619d3ed5-d637-466a-b1b6-44339732ecda" />

**2. The accuracy-mmlu tool reports scores of 0 or close to 0. These results look highly suspicious; the models should not score this low.**

`lemonade-eval -i Qwen3-4B-Hybrid load accuracy-mmlu --tests management`

`lemonade-eval -i Qwen-2.5-1.5B-Instruct-NPU load accuracy-mmlu --tests management`

<img width="1434" height="780" alt="Image" src="https://github.com/user-attachments/assets/dcd306ff-f906-43cf-b5b8-22a2846c579e" />
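A score of 0 is below what random guessing would achieve. MMLU questions have four answer choices, so a model that guesses uniformly would be expected to get about 25% right. The sketch below is a minimal, standalone way to quantify how unlikely a given score is under pure guessing. It is not part of lemonade-eval: the helper names `prob_at_most` and `looks_broken`, the threshold `alpha`, and the question count in the usage are hypothetical, and it assumes each question is an independent four-choice item.

```python
from math import comb

# Assumption: every MMLU question has 4 choices, so uniform guessing succeeds with p = 0.25.
CHANCE = 0.25


def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): probability of at most k correct answers."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def looks_broken(correct: int, total: int, alpha: float = 1e-3) -> bool:
    """Hypothetical check: flag a score that is implausibly far below random guessing."""
    return prob_at_most(correct, total, CHANCE) < alpha
```

For example, with a hypothetical 100 questions, `looks_broken(0, 100)` is `True` (the chance of guessing none correctly is 0.75^100, roughly 3e-13), while `looks_broken(25, 100)` is `False`. A result this far below chance suggests a problem in the evaluation pipeline rather than a genuinely weak model.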

**3. The accuracy-perplexity tool fails to compute perplexity.**

`lemonade-eval -i C:\sj\AMD_model\Qwen-2.5_1.5B_Instruct-onnx-ryzenai-1.7-hybrid oga-load --device hybrid --dtype int4 accuracy-perplexity`

<img width="1845" height="890" alt="Image" src="https://github.com/user-attachments/assets/ce74ef91-d844-4e52-9f93-2ef676cb1251" />

`lemonade-eval -i amd/Llama-3.2-1B-Instruct-onnx-ryzenai-1.7-hybrid oga-load --device hybrid --dtype int4 accuracy-perplexity`

<img width="1919" height="361" alt="Image" src="https://github.com/user-attachments/assets/6dde976f-eead-40c3-bacd-a54c2ea94622" />
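For reference when debugging this, perplexity is conventionally defined as the exponential of the mean negative log-likelihood per token: PPL = exp(-(1/N) Σ log p(x_i | x_<i)). The sketch below is a minimal, standalone Python version of that standard formula. It is not the accuracy-perplexity tool's implementation; the function name `perplexity` is hypothetical, and it assumes you already have natural-log per-token probabilities from the model. It rejects NaN, infinite, or positive log-probabilities, since any of those would make the result meaningless.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Standard perplexity from natural-log token probabilities: exp(mean negative log-likelihood)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # A valid log-probability is finite and <= 0; anything else indicates bad model output.
    if not all(math.isfinite(lp) and lp <= 0.0 for lp in token_logprobs):
        raise ValueError("log-probabilities must be finite and <= 0")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)
```

As a sanity check, a model that assigns probability 0.25 to every token has perplexity 4, and a model that is uniform over a vocabulary of size V has perplexity V.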

**4. The supported-models list page is inaccessible.**
https://lemonade-server.ai/docs/server/server_models/
