Open
Description
Hello, could you add some details (such as the code and evaluation frameworks used) on how you ran the evaluation for the datasets listed below?
- HumanEval
- MT-Bench
- LiveCodeBench
The paper states that you used Qwen 2.5 as the judge for MT-Bench and gives some details about the samples, but could you add more information about the methods, evaluation frameworks, and so on that you used?