qwen3.5-4b 评测结果对不齐,有人能复现吗 #73
Unanswered
coder-james
asked this question in
Q&A
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
评测结果:
基于 VLMEvalKit
MMStar 72.13 (report 78.3)
MMBench_en 85.6(report 89.4)
SimpleVQA 43.4 (report 44.29)
RealWorldQA 74.9 (report 79.5)
HallusionBench 60.13 (report 65)
评测参数:
Instruct (or non-thinking) mode for general tasks: temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
Beta Was this translation helpful? Give feedback.
All reactions