Add swt-bench results for glm-4.7 #532

all-hands-bot · 2026-02-09T17:50:43Z

Evaluation Results

Model: glm-4.7
Benchmark: swt-bench
Agent Version: v1.10.0

Results

Accuracy: 49.4%
Total Cost: $156.32
Average Instance Cost: $0.37
Total Duration: 314009s (5233.5m)
Average Instance Runtime: 744s
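
The per-instance averages above can be reproduced from the totals. The sketch below assumes the averages are taken over the 422 instances actually present in the run (the total_instances figure in this report) rather than the 433 expected ones. That assumption matches the reported $0.37 and 744s, but it is inferred from the numbers, not confirmed by the pipeline source.

```python
# Totals copied from the Results section of this report.
total_cost_usd = 156.32
total_duration_s = 314009

# Assumption: averages are computed over the instances present in the run.
instances = 422

avg_cost = round(total_cost_usd / instances, 2)      # dollars per instance
avg_runtime = round(total_duration_s / instances)    # seconds per instance
duration_min = round(total_duration_s / 60, 1)       # total duration in minutes

print(f"avg cost ${avg_cost}, avg runtime {avg_runtime}s, total {duration_min}m")
```

Dividing by 433 instead would give roughly $0.36 and 725s, which does not match the reported values.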

⚠️ REVIEWER NOTE

total_instances (422) does not match expected_instances (433); 11 instances are missing from the run.
Accuracy is calculated using expected_instances (433) as the denominator: 214 resolved / 433 ≈ 49.4%.
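
Using the expected count as the denominator effectively treats the missing instances as unresolved. A minimal sketch of that arithmetic with the counts from this report follows; the helper name `accuracy_pct` is hypothetical and not taken from the evaluation pipeline.

```python
def accuracy_pct(resolved: int, denominator: int) -> float:
    """Return resolved / denominator as a percentage rounded to one decimal."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100.0 * resolved / denominator, 1)


# Counts taken from this report.
resolved = 214
total_instances = 422      # instances actually present in the run
expected_instances = 433   # instances the benchmark is expected to contain

reported = accuracy_pct(resolved, expected_instances)
over_total = accuracy_pct(resolved, total_instances)
print(f"expected denominator: {reported}%  total denominator: {over_total}%")
```

The first value matches the reported accuracy; the second shows what the score would be if only the instances present in the run were counted.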

Report Summary

Total instances: 422
Submitted instances: 422
Resolved instances: 214
Unresolved instances: 206
Empty patch instances: 0
Error instances: 2
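
The counts above can be cross-checked against each other. The sketch below assumes that every instance falls into exactly one of the resolved, unresolved, empty patch, or error buckets, and that completed_instances (reported as 420 under Additional Metadata) means resolved plus unresolved. Both are inferences from the numbers, not documented definitions, and the dictionary keys are hypothetical.

```python
# Counts copied from the Report Summary; key names are illustrative only.
summary = {
    "total": 422,
    "submitted": 422,
    "resolved": 214,
    "unresolved": 206,
    "empty_patch": 0,
    "error": 2,
}

# Assumed definition: an instance is "completed" if it was resolved or unresolved.
completed = summary["resolved"] + summary["unresolved"]

# Assumed partition: every instance lands in exactly one outcome bucket.
accounted = completed + summary["empty_patch"] + summary["error"]

print(f"completed={completed}, accounted={accounted}, total={summary['total']}")
```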

Additional Metadata

Mean coverage: 0.7815494682208436
Mean coverage delta: 0.6100117958136516
completed_instances: 420
unstopped_instances: 0

This PR was automatically created by the evaluation pipeline.

github-actions · 2026-02-09T17:51:04Z

📊 Progress Report

============================================================
OpenHands Index Results - Progress Report
============================================================

Target: Complete all model × benchmark pairs
  11 models × 5 benchmarks = 55 pairs
  (each pair requires all 3 metrics: score, cost_per_instance, average_runtime)

============================================================
OVERALL PROGRESS: ⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛ 100.0%
  Complete: 55 / 55 pairs
============================================================
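
The completeness rule stated in the target (a pair counts only when all 3 metrics are present) can be sketched as follows. The in-memory layout, the function name `count_complete_pairs`, and the `example-bench` entry are hypothetical; the actual results directory format is not shown in this PR.

```python
REQUIRED_METRICS = ("score", "cost_per_instance", "average_runtime")


def count_complete_pairs(results, models, benchmarks):
    """Count model x benchmark pairs that have every required metric set."""
    complete = 0
    for model in models:
        for benchmark in benchmarks:
            metrics = results.get((model, benchmark), {})
            if all(metrics.get(name) is not None for name in REQUIRED_METRICS):
                complete += 1
    return complete, len(models) * len(benchmarks)


# Illustrative data: the swt-bench values come from this report,
# while "example-bench" is a made-up entry that lacks a runtime.
results = {
    ("glm-4.7", "swt-bench"): {
        "score": 49.4, "cost_per_instance": 0.37, "average_runtime": 744,
    },
    ("glm-4.7", "example-bench"): {
        "score": 10.0, "cost_per_instance": 0.50,
    },
}

done, total = count_complete_pairs(results, ["glm-4.7"], ["swt-bench", "example-bench"])
print(f"Complete: {done} / {total} pairs ({100.0 * done / total:.1f}%)")
```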

✅ Schema Validation

============================================================
Schema Validation Report
============================================================

Results directory: /home/runner/work/openhands-index-results/openhands-index-results/results
Files validated: 28
  Passed: 28
  Failed: 0

============================================================
VALIDATION PASSED
============================================================

This report measures progress towards the 3D array goal (benchmarks × models × metrics) as described in #2.

Add swt-bench results for glm-4.7

e891320

all-hands-bot requested a review from juanmichelini February 9, 2026 17:50

juanmichelini approved these changes Feb 9, 2026

Merge branch 'main' into eval/glm-4.7/swt-bench-20260209-175040

06b2096

juanmichelini merged commit ad75f38 into main Feb 9, 2026
1 check passed

juanmichelini deleted the eval/glm-4.7/swt-bench-20260209-175040 branch February 9, 2026 19:48
