Conversation

@all-hands-bot (Collaborator)

Evaluation Results

Model: glm-4.7
Benchmark: swe-bench
Agent Version: v1.10.0

Results

  • Accuracy: 73.4%
  • Total Cost: $255.45
  • Average Instance Cost: $0.51
  • Total Duration: 503521s (8392.0m)
  • Average Instance Runtime: 1007s

Report Summary

  • Total instances: 500
  • Submitted instances: 498
  • Resolved instances: 367
  • Unresolved instances: 129
  • Empty patch instances: 0
  • Error instances: 2

Additional Metadata

  • completed_instances: 496
  • schema_version: 2
  • unstopped_instances: 0
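
As a sanity check, the headline numbers above follow directly from the raw counts. A minimal sketch (variable names are illustrative, and it assumes the averages are taken over all 500 instances, which matches the reported values):

```python
# Sanity-check the reported aggregates from the raw counts above.
# Variable names are illustrative; the pipeline's actual schema may differ.
total_instances = 500
resolved_instances = 367
total_cost_usd = 255.45
total_duration_s = 503_521

accuracy = resolved_instances / total_instances            # 0.734 -> 73.4%
avg_instance_cost = total_cost_usd / total_instances       # ~$0.51
avg_instance_runtime = total_duration_s / total_instances  # ~1007 s

print(f"Accuracy: {accuracy:.1%}")                  # Accuracy: 73.4%
print(f"Avg cost: ${avg_instance_cost:.2f}")        # Avg cost: $0.51
print(f"Avg runtime: {avg_instance_runtime:.0f}s")  # Avg runtime: 1007s
```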

This PR was automatically created by the evaluation pipeline.

@github-actions bot commented Feb 9, 2026

📊 Progress Report

============================================================
OpenHands Index Results - Progress Report
============================================================

Target: Complete all model × benchmark pairs
  11 models × 5 benchmarks = 55 pairs
  (each pair requires all 3 metrics: score, cost_per_instance, average_runtime)

============================================================
OVERALL PROGRESS: ⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛ 100.0%
  Complete: 55 / 55 pairs
============================================================
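
The 100.0% figure is simple pair-completeness over the models × benchmarks grid, where a pair only counts once all three metrics are present. A hypothetical sketch of such a check; the in-memory results layout is assumed for illustration and is not the pipeline's actual data structure:

```python
# Hypothetical completeness check over the models x benchmarks x metrics grid.
# The results dict keyed by (model, benchmark) is assumed for illustration;
# the real data lives in JSON files under results/ with its own schema.
REQUIRED_METRICS = ("score", "cost_per_instance", "average_runtime")

def count_complete_pairs(results, models, benchmarks):
    """A (model, benchmark) pair is complete when all three metrics are present."""
    complete = 0
    for model in models:
        for bench in benchmarks:
            entry = results.get((model, bench), {})
            if all(entry.get(m) is not None for m in REQUIRED_METRICS):
                complete += 1
    return complete

# e.g. 11 models x 5 benchmarks = 55 pairs:
# done = count_complete_pairs(results, models, benchmarks)
# print(f"Complete: {done} / {len(models) * len(benchmarks)} pairs")
```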

✅ Schema Validation

============================================================
Schema Validation Report
============================================================

Results directory: /home/runner/work/openhands-index-results/openhands-index-results/results
Files validated: 28
  Passed: 28
  Failed: 0

============================================================
VALIDATION PASSED
============================================================
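
The validator itself isn't shown in this log. A minimal sketch of what per-file validation could look like, assuming one JSON file per result and the jsonschema package; the schema fields below are invented for illustration (only schema_version: 2 comes from the metadata above):

```python
# Minimal schema-validation sketch. The schema is invented for illustration;
# the real schema lives in the repo (note schema_version: 2 above).
import json
from pathlib import Path

from jsonschema import Draft7Validator

SCHEMA = {
    "type": "object",
    "required": ["schema_version", "model", "benchmark"],
    "properties": {
        "schema_version": {"const": 2},
        "model": {"type": "string"},
        "benchmark": {"type": "string"},
    },
}

def validate_results_dir(results_dir: str) -> tuple[int, int]:
    """Validate every *.json under results_dir; return (passed, failed) counts."""
    validator = Draft7Validator(SCHEMA)
    passed = failed = 0
    for path in sorted(Path(results_dir).rglob("*.json")):
        errors = list(validator.iter_errors(json.loads(path.read_text())))
        if errors:
            failed += 1
            print(f"FAILED {path}: {errors[0].message}")
        else:
            passed += 1
    return passed, failed

# passed, failed = validate_results_dir("results")  # e.g. 28 passed, 0 failed
```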

This report measures progress towards the 3D array goal (benchmarks × models × metrics) as described in #2.

@juanmichelini (Collaborator) left a comment

LGTM

@juanmichelini merged commit 949f99f into main Feb 9, 2026
1 check passed
@juanmichelini deleted the eval/glm-4.7/swe-bench-20260209-174604 branch February 9, 2026 19:49