The eval script on the main branch may have a problem #221

@chesser-y

Description

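When I evaluate the Qwen3-4B-instruct-2507 model on math, the scores differ substantially (by more than 10%) from the scores Qwen officially reports, so the evaluation script may have a problem.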
