Add swe-bench results for qwen3-coder-next #539

all-hands-bot · 2026-02-09T23:40:07Z

Evaluation Results

Model: qwen3-coder-next
Benchmark: swe-bench
Agent Version: v1.11.1

Results

Accuracy: 66.6%
Total Cost: $680.26
Average Instance Cost: $1.36
Total Duration: 722734s (12045.6m)
Average Instance Runtime: 1445s

Report Summary

Total instances: 500
Submitted instances: 491
Resolved instances: 333
Unresolved instances: 156
Empty patch instances: 2
Error instances: 0

Additional Metadata

completed_instances: 489
schema_version: 2
unstopped_instances: 0
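
The headline figures above can be cross-checked against the instance counts. This is a minimal sketch, assuming accuracy is resolved instances over total instances and that the per-instance averages divide by all 500 instances; the pipeline's actual formulas are not shown in this PR, and the variable names are illustrative.

```python
# Figures copied from the report above.
total_instances = 500
submitted = 491
resolved = 333
unresolved = 156
empty_patch = 2
completed = 489
total_cost_usd = 680.26
total_duration_s = 722734

# Assumption: accuracy = resolved / total instances.
accuracy_pct = round(100 * resolved / total_instances, 1)

# Assumption: per-instance averages divide by total instances.
avg_cost_usd = round(total_cost_usd / total_instances, 2)
avg_runtime_s = round(total_duration_s / total_instances)

# Count consistency: completed = resolved + unresolved,
# and submitted = completed + empty patches.
counts_consistent = (
    resolved + unresolved == completed
    and completed + empty_patch == submitted
)

print(accuracy_pct, avg_cost_usd, avg_runtime_s, counts_consistent)
```

Under these assumptions the computed values match the reported accuracy, average instance cost, and average instance runtime.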

This PR was automatically created by the evaluation pipeline.

github-actions · 2026-02-09T23:40:37Z

📊 Progress Report

============================================================
OpenHands Index Results - Progress Report
============================================================

Target: Complete all model × benchmark pairs
  11 models × 5 benchmarks = 55 pairs
  (each pair requires all 3 metrics: score, cost_per_instance, average_runtime)

============================================================
OVERALL PROGRESS: ⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛ 100.0%
  Complete: 55 / 55 pairs
============================================================
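
The completeness rule above (a pair counts only when all three metrics are present) can be sketched as follows. The `results` layout, the function name `pair_progress`, and the toy model and benchmark names are hypothetical; the repository's actual storage format and progress script are not shown in this PR.

```python
from itertools import product

# Metric names taken from the progress report.
REQUIRED_METRICS = ("score", "cost_per_instance", "average_runtime")


def pair_progress(models, benchmarks, results):
    """Return (complete, total) over all model x benchmark pairs.

    `results` maps (model, benchmark) -> dict of metric values. This
    layout is an assumption made for illustration only.
    """
    pairs = list(product(models, benchmarks))
    complete = sum(
        1
        for pair in pairs
        if all(results.get(pair, {}).get(m) is not None for m in REQUIRED_METRICS)
    )
    return complete, len(pairs)


# Toy data: two models, two benchmarks; one pair lacks a metric, one is absent.
models = ["model-a", "model-b"]
benchmarks = ["swe-bench", "other-bench"]
results = {
    ("model-a", "swe-bench"): {"score": 66.6, "cost_per_instance": 1.36, "average_runtime": 1445},
    ("model-a", "other-bench"): {"score": 50.0, "cost_per_instance": 0.9},  # no runtime
    ("model-b", "swe-bench"): {"score": 40.0, "cost_per_instance": 2.0, "average_runtime": 900},
}

complete, total = pair_progress(models, benchmarks, results)
print(f"Complete: {complete} / {total} pairs ({100 * complete / total:.1f}%)")
```

With 11 models and 5 benchmarks the same function yields a total of 55 pairs, matching the target stated in the report.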

❌ Schema Validation

============================================================
Schema Validation Report
============================================================

Results directory: /home/runner/work/openhands-index-results/openhands-index-results/results
Files validated: 30
  Passed: 29
  Failed: 1

Errors:
  - /home/runner/work/openhands-index-results/openhands-index-results/results/qwen3-coder-next/metadata.json:   • Field 'model': Input should be 'claude-4.6-opus', 'claude-4.5-opus', 'claude-4.5-sonnet', 'gemini-3-pro', 'gemini-3-flash', 'glm-4.7', 'gpt-5.2', 'gpt-5.2-codex', 'kimi-k2-thinking', 'kimi-k2.5', 'minimax-m2.1', 'deepseek-v3.2-reasoner', 'qwen-3-coder' or 'nemotron-3-nano' (got: 'qwen3-coder-next')

============================================================
VALIDATION FAILED
============================================================
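
The failure above is an enumeration check on the `model` field. The sketch below reproduces that check with the standard library only. The allowed values are copied from the error message, while `validate_model_field` and the idea of fixing the failure by registering the new name are assumptions, not the repository's actual validator or the fix that was applied.

```python
# Allowed values copied from the validation error above.
ALLOWED_MODELS = {
    "claude-4.6-opus", "claude-4.5-opus", "claude-4.5-sonnet",
    "gemini-3-pro", "gemini-3-flash", "glm-4.7", "gpt-5.2",
    "gpt-5.2-codex", "kimi-k2-thinking", "kimi-k2.5", "minimax-m2.1",
    "deepseek-v3.2-reasoner", "qwen-3-coder", "nemotron-3-nano",
}


def validate_model_field(metadata, allowed=ALLOWED_MODELS):
    """Return a list of error strings for the 'model' field (empty if valid)."""
    model = metadata.get("model")
    if model in allowed:
        return []
    return [f"Field 'model': got {model!r}, which is not an allowed value"]


metadata = {"model": "qwen3-coder-next"}

# Fails: 'qwen3-coder-next' is not in the allowed set.
errors_before = validate_model_field(metadata)

# One possible fix (an assumption): register the new model name in the schema.
errors_after = validate_model_field(metadata, ALLOWED_MODELS | {"qwen3-coder-next"})

print(errors_before)
print(errors_after)
```

Note that the closest existing entry is `qwen-3-coder`, which differs from `qwen3-coder-next` in both hyphenation and suffix, so an exact-match check rejects it.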

This report measures progress towards the 3D array goal (benchmarks × models × metrics) as described in #2.

juanmichelini · 2026-02-09T23:51:11Z

@OpenHands search the web and add fields for "country", "release_date", "parameter_count_b", and "active_parameter_count_b". Show your work: add a comment in the PR with links proving each of the fields.

openhands-ai · 2026-02-09T23:51:19Z

I'm on it! juanmichelini can track my progress at all-hands.dev

…b to qwen3-coder-next metadata

- country: cn (Alibaba is a Chinese company)
- release_date: 2026-02-02 (from official Qwen blog)
- parameter_count_b: 80 (80B total parameters)
- active_parameter_count_b: 3 (3B active parameters per token)

Co-authored-by: openhands <[email protected]>

juanmichelini · 2026-02-09T23:52:38Z

Sources for Added Metadata Fields

I have added the following fields to qwen3-coder-next/metadata.json:

| Field | Value | Source |
| --- | --- | --- |
| `country` | `cn` | Alibaba is a Chinese company headquartered in Hangzhou, China |
| `release_date` | `2026-02-02` | Qwen Official Blog - Shows "2026/02/02" as the publication date |
| `parameter_count_b` | `80` | Hugging Face Model Card - "Model size: 80B params" |
| `active_parameter_count_b` | `3` | Hugging Face Model Card - "80B total parameters and only 3B activated per token" |

Additional Supporting Sources:

NewsBreak Article - "Released February 3 under an Apache 2.0 license", "80 billion parameters that activates just 3 billion per forward pass"
VentureBeat - "80 billion total parameters, it utilizes an ultra-sparse Mixture-of-Experts (MoE) architecture that activates only 3 billion parameters per forward pass"
Dev.to Guide - "Total Parameters: 80B, Activated Parameters: 3B (per inference)"
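
For reference, the four added fields can be sketched as they might appear in `metadata.json`. Only the fields named in this PR and the `model` value from the validation error are shown; any other keys in the real file are unknown here, and the active-fraction calculation is an illustrative addition.

```python
import json

# Only the fields discussed in this PR; other keys in the real
# metadata.json are not shown in this conversation.
metadata = {
    "model": "qwen3-coder-next",
    "country": "cn",
    "release_date": "2026-02-02",
    "parameter_count_b": 80,
    "active_parameter_count_b": 3,
}

# Share of parameters activated per token for this MoE model.
active_fraction = metadata["active_parameter_count_b"] / metadata["parameter_count_b"]

print(json.dumps(metadata, indent=2))
print(f"Active fraction: {active_fraction:.2%}")
```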

openhands-ai · 2026-02-09T23:52:56Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- Measure Progress

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #539 at branch `eval/qwen3-coder-next/swe-bench-20260209-234004`

Feel free to include any additional details that might help me get this PR into a better state.

You can manage your notification settings

openhands-ai · 2026-02-09T23:53:11Z

Summary

I have completed the requested task to add metadata fields for the qwen3-coder-next model. Here's what was done:

Changes Made

Updated results/qwen3-coder-next/metadata.json with the following fields:

| Field | Value | Source |
| --- | --- | --- |
| `country` | `cn` | Alibaba is a Chinese company headquartered in Hangzhou, China |
| `release_date` | `2026-02-02` | Qwen Official Blog |
| `parameter_count_b` | `80` | Hugging Face Model Card |
| `active_parameter_count_b` | `3` | Hugging Face Model Card |

Checklist

Added country field
Added release_date field
Added parameter_count_b field
Added active_parameter_count_b field
Committed and pushed changes to the PR branch
Added a comment on PR #539 (Add swe-bench results for qwen3-coder-next) with links proving each field's source

PR Comment

A detailed comment with source links has been added to PR #539: #539 (comment)

The comment includes:

Primary sources (Qwen Official Blog, Hugging Face Model Card)
Additional supporting sources (NewsBreak, VentureBeat, Dev.to)


juanmichelini

LGTM

juanmichelini · 2026-02-10T09:33:15Z

@OpenHands fix the validation errors!

openhands-ai · 2026-02-10T09:33:25Z

I'm on it! juanmichelini can track my progress at all-hands.dev

juanmichelini · 2026-02-10T11:15:03Z

@OpenHands fix the validation errors!

Add swe-bench results for qwen3-coder-next

0b7d003

all-hands-bot requested a review from juanmichelini February 9, 2026 23:40

juanmichelini approved these changes Feb 10, 2026

3 participants
