@all-hands-bot
Collaborator

Evaluation Results

Model: qwen-3-coder
Benchmark: commit0
Agent Version: v1.10.0

Results

  • Accuracy: 6.2%
  • Total Cost: $0.00
  • Average Instance Cost: $0.00
  • Total Duration: 16536s (275.6m)
  • Average Instance Runtime: 1033s

Report Summary

  • Total instances: 16
  • Submitted instances: 16
  • Resolved instances: 1
  • Unresolved instances: 15
  • Empty patch instances: 0
  • Error instances: 0

Additional Metadata

  • completed_instances: 16
  • total_passed_tests: 1510
  • total_tests: 2250

This PR was automatically created by the evaluation pipeline.

@github-actions

github-actions bot commented Feb 9, 2026

📊 Progress Report

============================================================
OpenHands Index Results - Progress Report
============================================================

Target: Complete all model × benchmark pairs
  11 models × 5 benchmarks = 55 pairs
  (each pair requires all 3 metrics: score, cost_per_instance, average_runtime)

============================================================
OVERALL PROGRESS: ⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛ 100.0%
  Complete: 55 / 55 pairs
============================================================

❌ Schema Validation

============================================================
Schema Validation Report
============================================================

Results directory: /home/runner/work/openhands-index-results/openhands-index-results/results
Files validated: 28
  Passed: 27
  Failed: 1

Errors:
  - /home/runner/work/openhands-index-results/openhands-index-results/results/qwen-3-coder/scores.json: Entry 2:
  • Field 'cost_per_instance': Input should be greater than 0 (got: 0.0)

============================================================
VALIDATION FAILED
============================================================
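
The failure above follows from the reported average instance cost of $0.00: the schema apparently requires `cost_per_instance` to be strictly positive. The sketch below reproduces that kind of check in plain Python. It is an illustration under stated assumptions, not the repository's validator: `validate_entry` is a hypothetical helper, the example entry is assembled from the figures in this report, and only the `cost_per_instance` constraint is confirmed by the error message.

```python
def validate_entry(entry: dict) -> list[str]:
    """Return error messages for one scores entry (hypothetical helper)."""
    errors = []
    value = entry.get("cost_per_instance")
    if value is None:
        errors.append("Field 'cost_per_instance': missing")
    elif not value > 0:
        # Mirrors the wording of the error in the report above.
        errors.append(
            "Field 'cost_per_instance': "
            f"Input should be greater than 0 (got: {value})"
        )
    return errors


# Hypothetical entry built from this report's figures; the real
# scores.json layout is not shown in the conversation.
entry = {"score": 6.2, "cost_per_instance": 0.0, "average_runtime": 1033}

for message in validate_entry(entry):
    print(message)
```

A zero cost most likely means cost tracking was unavailable for this run rather than that the run was free, so the fix probably lies in how the cost is recorded, not in relaxing the constraint. That is a judgment for the maintainers.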

This report measures progress towards the 3D array goal (benchmarks × models × metrics) as described in #2.

@openhands-ai

openhands-ai bot commented Feb 9, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Measure Progress

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #540 at branch `eval/qwen-3-coder/commit0-20260209-234025`

Feel free to include any additional details that might help me get this PR into a better state.


@juanmichelini juanmichelini marked this pull request as draft February 9, 2026 23:49