Commit 0f426a8

Add swe-bench results for claude-4.6-opus (#529)
* Add swe-bench results for claude-4.6-opus

* Trigger CI workflow

Co-authored-by: openhands <[email protected]>

* Fix: Add swe-bench results to existing claude-4.6-opus directory

- Add swe-bench entry to results/claude-4.6-opus/scores.json
- Remove invalid v1.11.0_claude-4.6-opus directory that didn't match schema

The schema requires directory_name to match the model name, so results
should be added to the existing claude-4.6-opus directory.

Co-authored-by: openhands <[email protected]>

---------

Co-authored-by: openhands <[email protected]>
1 parent a8b914b commit 0f426a8

1 file changed, +14 -0 lines changed


results/claude-4.6-opus/scores.json

Lines changed: 14 additions & 0 deletions
@@ -1,4 +1,18 @@
 [
+  {
+    "benchmark": "swe-bench",
+    "score": 74.8,
+    "metric": "accuracy",
+    "cost_per_instance": 0.56,
+    "average_runtime": 178.0,
+    "full_archive": "https://results.eval.all-hands.dev/swebench/litellm_proxy-anthropic-claude-opus-4-6/21809491582/results.tar.gz",
+    "tags": [
+      "swe-bench"
+    ],
+    "agent_version": "v1.11.0",
+    "submission_time": "2026-02-09T07:37:23+00:00",
+    "eval_visualization_page": "https://laminar.sh/shared/evals/f2d3e96a-5265-44f5-88b2-434d0a81b893"
+  },
   {
     "benchmark": "swt-bench",
     "score": 78.8,
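
The commit message notes that an earlier attempt failed schema validation. The repository's actual schema is not shown here, so the following is only a minimal sketch of how an entry like the one added above might be sanity-checked before committing. The `OBSERVED_KEYS` set and the `check_scores` helper are hypothetical names; the key list is copied from the diff, and the "one entry per benchmark" rule is an assumption, not something the source confirms.

```python
import json

# Keys observed in the swe-bench entry added by this commit. Whether the
# real schema requires all of them is an assumption.
OBSERVED_KEYS = {
    "benchmark", "score", "metric", "cost_per_instance", "average_runtime",
    "full_archive", "tags", "agent_version", "submission_time",
    "eval_visualization_page",
}


def check_scores(entries):
    """Return a list of human-readable problems found in a scores.json list."""
    problems = []
    seen = set()
    for i, entry in enumerate(entries):
        missing = OBSERVED_KEYS - set(entry)
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
        bench = entry.get("benchmark")
        # Assumed rule: each benchmark appears at most once per model file.
        if bench in seen:
            problems.append(f"entry {i}: duplicate benchmark {bench!r}")
        seen.add(bench)
    return problems


# The entry added by this commit, copied from the diff.
new_entry = json.loads("""
{
  "benchmark": "swe-bench",
  "score": 74.8,
  "metric": "accuracy",
  "cost_per_instance": 0.56,
  "average_runtime": 178.0,
  "full_archive": "https://results.eval.all-hands.dev/swebench/litellm_proxy-anthropic-claude-opus-4-6/21809491582/results.tar.gz",
  "tags": ["swe-bench"],
  "agent_version": "v1.11.0",
  "submission_time": "2026-02-09T07:37:23+00:00",
  "eval_visualization_page": "https://laminar.sh/shared/evals/f2d3e96a-5265-44f5-88b2-434d0a81b893"
}
""")

print(check_scores([new_entry]))
```

In practice the list passed to `check_scores` would come from `json.load` on `results/claude-4.6-opus/scores.json`; the repository's own CI validation remains the authoritative check.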
