Add gaia results for qwen-3-coder #536

all-hands-bot · 2026-02-09T23:29:54Z

Evaluation Results

Model: qwen-3-coder
Benchmark: gaia
Agent Version: v1.11.0

Results

Accuracy: 24.8%
Total Cost: $0.00
Average Instance Cost: $0.00
Total Duration: 113930s (1898.8m)
Average Instance Runtime: 690s

Report Summary

Total instances: 165
Submitted instances: 165
Resolved instances: 41
Unresolved instances: 124
Empty patch instances: 0
Error instances: 0

Additional Metadata

completed_instances: 165
incomplete_instances: 0
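The headline figures follow from the raw counts. Accuracy is resolved / total instances, the duration in minutes is seconds / 60, and the average runtime is total duration / total instances. The following minimal Python sketch recomputes them under those assumed definitions. The pipeline's own code is not shown in this PR, and the helper name `summarize` is hypothetical.

```python
def summarize(total: int, resolved: int, total_duration_s: int) -> dict:
    """Recompute the summary metrics from raw counts (hypothetical helper)."""
    accuracy_pct = 100 * resolved / total      # resolved / total, as a percentage
    duration_min = total_duration_s / 60       # seconds -> minutes
    avg_runtime_s = total_duration_s / total   # mean wall-clock time per instance
    return {
        "accuracy": f"{accuracy_pct:.1f}%",
        "total_duration": f"{total_duration_s}s ({duration_min:.1f}m)",
        "avg_runtime": f"{avg_runtime_s:.0f}s",
        "unresolved": total - resolved,
    }

print(summarize(total=165, resolved=41, total_duration_s=113930))
```

With 165 instances, 41 resolved, and 113930s total, this gives 24.8%, 1898.8m, 690s, and 124 unresolved, matching the figures reported above.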

This PR was automatically created by the evaluation pipeline.

github-actions · 2026-02-09T23:30:29Z

📊 Progress Report

============================================================
OpenHands Index Results - Progress Report
============================================================

Target: Complete all model × benchmark pairs
  11 models × 5 benchmarks = 55 pairs
  (each pair requires all 3 metrics: score, cost_per_instance, average_runtime)

============================================================
OVERALL PROGRESS: ⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛ 100.0%
  Complete: 55 / 55 pairs
============================================================
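The progress report counts a model × benchmark pair as complete only when all three metrics (score, cost_per_instance, average_runtime) are present. The following minimal sketch implements that rule. It assumes results are keyed by `(model, benchmark)` with a dict of metrics per pair. That layout and the model and benchmark names are illustrative, not the repository's actual format.

```python
from itertools import product

REQUIRED_METRICS = ("score", "cost_per_instance", "average_runtime")

def progress(models, benchmarks, results):
    """Return (complete, total) pairs; a pair needs every required metric present."""
    pairs = list(product(models, benchmarks))
    complete = sum(
        1
        for pair in pairs
        if all(m in results.get(pair, {}) for m in REQUIRED_METRICS)
    )
    return complete, len(pairs)

models = [f"model-{i}" for i in range(11)]     # hypothetical model names
benchmarks = [f"bench-{j}" for j in range(5)]  # hypothetical benchmark names
results = {
    pair: {"score": 0.5, "cost_per_instance": 0.1, "average_runtime": 600}
    for pair in product(models, benchmarks)
}
done, total = progress(models, benchmarks, results)
print(f"Complete: {done} / {total} pairs ({100 * done / total:.1f}%)")
```

Under this rule, a metric that is present but invalid (for example, a `cost_per_instance` of 0.0) still counts toward completion. That is consistent with the report showing 100% while schema validation fails.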

❌ Schema Validation

============================================================
Schema Validation Report
============================================================

Results directory: /home/runner/work/openhands-index-results/openhands-index-results/results
Files validated: 28
  Passed: 27
  Failed: 1

Errors:
  - /home/runner/work/openhands-index-results/openhands-index-results/results/qwen-3-coder/scores.json: Entry 0:
  • Field 'cost_per_instance': Input should be greater than 0 (got: 0.0)

============================================================
VALIDATION FAILED
============================================================

This report measures progress towards the 3D array goal (benchmarks × models × metrics) as described in #2.
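The failing check rejects a `cost_per_instance` that is not strictly positive. The error text appears to match Pydantic's "greater than" constraint message, which suggests a `gt=0`-style bound in the schema, but the schema itself is not shown here. The following dependency-free sketch applies the same rule. The function name `validate_scores` and the minimal entry shape are hypothetical.

```python
import json

def validate_scores(entries):
    """Return error strings for entries whose cost_per_instance is not > 0."""
    errors = []
    for i, entry in enumerate(entries):
        cost = entry.get("cost_per_instance")
        # Strict bound: 0.0 fails, just like "Input should be greater than 0".
        if not isinstance(cost, (int, float)) or cost <= 0:
            errors.append(
                f"Entry {i}: Field 'cost_per_instance': "
                f"Input should be greater than 0 (got: {cost!r})"
            )
    return errors

# Illustrative entry only; this run reported a total cost of $0.00.
scores = json.loads('[{"cost_per_instance": 0.0}]')
for err in validate_scores(scores):
    print(err)
```

Testing `cost <= 0` rather than `cost < 0` mirrors the strict "greater than 0" constraint in the report, so a zero cost fails validation in the same way.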

openhands-ai · 2026-02-09T23:30:42Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- Measure Progress

If you'd like me to help, just leave a comment like:

@OpenHands please fix the failing actions on PR #536 at branch `eval/qwen-3-coder/gaia-20260209-232952`

Feel free to include any additional details that might help me get this PR into a better state.


Add gaia results for qwen-3-coder (commit 6d2ab69)

all-hands-bot requested a review from juanmichelini February 9, 2026 23:29
