MMLU evaluation fails with Qwen3 #3942

@shaafsalman

Description

I'm trying to evaluate Qwen/Qwen3-VL-30B-A3B-Instruct on MMLU. The model is deployed locally with vLLM.

When I run the task normally, like:

```yaml
{description: "mmlu:model=Qwen/Qwen3-VL-30B-A3B-Instruct,eval_split=test,subject=clinical_knowledge", groups: ["mmlu_clinical_knowledge"], priority: 1}
```

It either fails with:

```
  File "/helm/clients/openai_client.py", line 354, in _make_chat_request
    for raw_completion in response["choices"]:
KeyError: 'choices'
```

Or it completes the evaluation but shows nothing in the raw predicted text.
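For debugging, here is a minimal sketch of a helper (hypothetical, not part of HELM) that mirrors the failing line in `helm/clients/openai_client.py` but surfaces the server's error payload instead of a bare `KeyError`. It assumes vLLM's OpenAI-compatible server returns an `error` object when a request is rejected (e.g. context length exceeded):

```python
def get_choices(response: dict) -> list:
    """Return the 'choices' list, or raise with the server's error payload."""
    if "choices" in response:
        return response["choices"]
    # When the request is rejected, the OpenAI-compatible response carries
    # an 'error' object instead of 'choices'; include it in the exception
    # so the real failure reason is visible in the logs.
    error = response.get("error", response)
    raise RuntimeError(f"vLLM returned no choices; raw payload: {error}")

ok = {"choices": [{"message": {"content": "B"}}]}
print(get_choices(ok)[0]["message"]["content"])  # B
```

Wrapping the access like this would at least reveal whether vLLM is rejecting the request (which would also explain the empty raw predicted text).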

So I added a parameter:

```yaml
{description: "mmlu:model=Qwen/Qwen3-VL-30B-A3B-Instruct,eval_split=test,subject=clinical_knowledge,increase_max_tokens=2048", groups: ["mmlu_clinical_knowledge"], priority: 4}
```

This time the evaluations ran and the answer is present in the generated text, but HELM was not able to detect it.

Do you have a solution to make this work?

Below is the inference output with the latter config:

[Screenshots: raw predicted text from the increase_max_tokens=2048 run]
