MMLU evaluation fails with Qwen3 #3942

@shaafsalman

Description

I'm trying to evaluate Qwen/Qwen3-VL-30B-A3B-Instruct on MMLU. The model is deployed locally with vLLM.

When I run the task normally, like:

```yaml
{description: "mmlu:model=Qwen/Qwen3-VL-30B-A3B-Instruct,eval_split=test,subject=clinical_knowledge", groups: ["mmlu_clinical_knowledge"], priority: 1}
```

It either fails with:

```
  File "/helm/clients/openai_client.py", line 354, in _make_chat_request
    for raw_completion in response["choices"]:
KeyError: 'choices'
```

Or it completes the evaluation but shows nothing in the raw predicted text.
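For debugging, here is a minimal sketch of a helper (hypothetical, not part of HELM) that mirrors the failing line in `helm/clients/openai_client.py` but surfaces the server's error payload instead of a bare `KeyError`. It assumes vLLM's OpenAI-compatible server returns an `error` object when a request is rejected (e.g. context length exceeded):

```python
def get_choices(response: dict) -> list:
    """Return the 'choices' list, or raise with the server's error payload."""
    if "choices" in response:
        return response["choices"]
    # When the request is rejected, the OpenAI-compatible response carries
    # an 'error' object instead of 'choices'; include it in the exception
    # so the real failure reason is visible in the logs.
    error = response.get("error", response)
    raise RuntimeError(f"vLLM returned no choices; raw payload: {error}")

ok = {"choices": [{"message": {"content": "B"}}]}
print(get_choices(ok)[0]["message"]["content"])  # B
```

Wrapping the access like this would at least reveal whether vLLM is rejecting the request (which would also explain the empty raw predicted text).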

So I added a parameter:

```yaml
{description: "mmlu:model=Qwen/Qwen3-VL-30B-A3B-Instruct,eval_split=test,subject=clinical_knowledge,increase_max_tokens=2048", groups: ["mmlu_clinical_knowledge"], priority: 4}
```

This time the evaluations ran and the answer is present in the generated text, but HELM was not able to detect it.

Do you have a solution to make this work?

Below is the inference output with the latter config:

[Screenshots: raw predicted text from the increase_max_tokens=2048 run]
