Add commit0 results for qwen-3-coder #540

all-hands-bot · 2026-02-09T23:40:28Z

Evaluation Results

Model: qwen-3-coder
Benchmark: commit0
Agent Version: v1.10.0

Results

Accuracy: 6.2%
Total Cost: $0.00
Average Instance Cost: $0.00
Total Duration: 16536s (275.6m)
Average Instance Runtime: 1033s

Report Summary

Total instances: 16
Submitted instances: 16
Resolved instances: 1
Unresolved instances: 15
Empty patch instances: 0
Error instances: 0

Additional Metadata

completed_instances: 16
total_passed_tests: 1510
total_tests: 2250
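
The summary figures above can be cross-checked against the raw counts. The sketch below is not the evaluation pipeline's code. It assumes accuracy is resolved instances divided by total instances and that average runtime is total duration divided by instance count. The variable names are illustrative, and the test pass rate is a derived figure that the report itself does not state.

```python
# Figures copied from the report above.
resolved_instances = 1
total_instances = 16
total_duration_s = 16536
total_passed_tests = 1510
total_tests = 2250

# Assumed definitions (not confirmed by the report).
accuracy_pct = 100 * resolved_instances / total_instances      # 6.25
avg_runtime_s = total_duration_s / total_instances             # 1033.5
duration_min = total_duration_s / 60                           # 275.6
test_pass_rate_pct = 100 * total_passed_tests / total_tests    # about 67.1

print(f"accuracy: {accuracy_pct}%")
print(f"average runtime: {avg_runtime_s}s")
print(f"duration: {duration_min:.1f}m")
print(f"test pass rate: {test_pass_rate_pct:.1f}%")
```

Under these assumptions the unrounded values are 6.25% and 1033.5s, which is consistent with the reported 6.2% and 1033s once display rounding or truncation is applied. Although only 1 of 16 instances is resolved, roughly two thirds of the individual tests pass.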

This PR was automatically created by the evaluation pipeline.

github-actions · 2026-02-09T23:40:42Z

📊 Progress Report

============================================================
OpenHands Index Results - Progress Report
============================================================

Target: Complete all model × benchmark pairs
  11 models × 5 benchmarks = 55 pairs
  (each pair requires all 3 metrics: score, cost_per_instance, average_runtime)

============================================================
OVERALL PROGRESS: ⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛ 100.0%
  Complete: 55 / 55 pairs
============================================================
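
A minimal sketch of how the completeness count above could be computed, assuming results are held in a dictionary keyed by (model, benchmark). The three metric names come from the report; the data layout, the function name, and the sample model and benchmark names are hypothetical.

```python
REQUIRED_METRICS = ("score", "cost_per_instance", "average_runtime")


def count_complete_pairs(results, models, benchmarks):
    """Return (complete, total) over all model x benchmark pairs.

    A pair counts as complete only if all three metrics are present.
    """
    complete = 0
    for model in models:
        for benchmark in benchmarks:
            entry = results.get((model, benchmark), {})
            if all(entry.get(name) is not None for name in REQUIRED_METRICS):
                complete += 1
    return complete, len(models) * len(benchmarks)


# Hypothetical sample data: one complete pair, one missing a metric.
models = ["model-a"]
benchmarks = ["bench-x", "bench-y"]
results = {
    ("model-a", "bench-x"): {
        "score": 10.0, "cost_per_instance": 0.5, "average_runtime": 100.0,
    },
    ("model-a", "bench-y"): {"score": 20.0, "average_runtime": 200.0},
}

complete, total = count_complete_pairs(results, models, benchmarks)
print(f"Complete: {complete} / {total} pairs ({100 * complete / total:.1f}%)")
```

A presence check like this one treats a metric of 0.0 as present, so a pair can count as complete here and still be rejected by a stricter value check.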

❌ Schema Validation

============================================================
Schema Validation Report
============================================================

Results directory: /home/runner/work/openhands-index-results/openhands-index-results/results
Files validated: 28
  Passed: 27
  Failed: 1

Errors:
  - /home/runner/work/openhands-index-results/openhands-index-results/results/qwen-3-coder/scores.json: Entry 2:
  • Field 'cost_per_instance': Input should be greater than 0 (got: 0.0)

============================================================
VALIDATION FAILED
============================================================
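
The failure above comes from a rule that `cost_per_instance` must be strictly greater than 0, which the reported cost of $0.00 violates. The sketch below reproduces that rule with the standard library only. It is not the repository's validator: the function name, the entry layout, the `benchmark` key, and the zero-based entry index are all assumptions made for illustration.

```python
def validate_scores(entries):
    """Return a list of error strings for entries with a non-positive cost."""
    errors = []
    for index, entry in enumerate(entries):  # zero-based index is an assumption
        cost = entry.get("cost_per_instance")
        if not isinstance(cost, (int, float)) or cost <= 0:
            errors.append(
                f"Entry {index}: Field 'cost_per_instance': "
                f"Input should be greater than 0 (got: {cost})"
            )
    return errors


# Hypothetical scores file contents; only the last entry mirrors this PR.
entries = [
    {"benchmark": "example-a", "cost_per_instance": 0.12},
    {"benchmark": "example-b", "cost_per_instance": 0.30},
    {"benchmark": "commit0", "cost_per_instance": 0.0},
]

for message in validate_scores(entries):
    print(message)
```

With a rule like this, a run whose cost was not recorded and defaulted to 0.0 fails validation until a positive cost is supplied.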

This report measures progress towards the 3D array goal (benchmarks × models × metrics) as described in #2.

openhands-ai · 2026-02-09T23:40:54Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- Measure Progress

If you'd like me to help, just leave a comment like this:

@OpenHands please fix the failing actions on PR #540 at branch `eval/qwen-3-coder/commit0-20260209-234025`

Feel free to include any additional details that might help me get this PR into a better state.


Add commit0 results for qwen-3-coder

d065a72

all-hands-bot requested a review from juanmichelini February 9, 2026 23:40

juanmichelini marked this pull request as draft February 9, 2026 23:49
