
Overview

Qwen2.5-Coder is Alibaba Cloud's family of code-specialized large language models, available in six sizes from 0.5B to 32B parameters. The 32B-Instruct variant is the current state-of-the-art open-source code model, with coding capabilities that match GPT-4o.

Model Variants

Available Sizes

  • Qwen2.5-Coder-0.5B: Smallest, efficient for edge deployment
  • Qwen2.5-Coder-1.5B: Balanced efficiency and capability
  • Qwen2.5-Coder-3B: Mid-range performance
  • Qwen2.5-Coder-7B: Strong general-purpose coding
  • Qwen2.5-Coder-14B: Enhanced reasoning capabilities
  • Qwen2.5-Coder-32B: Flagship model with SOTA performance

Each size is available in Base and Instruct variants.
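
The size list above maps directly onto the official Hugging Face repository IDs under the `Qwen` organization. A small sketch that enumerates them (the naming pattern matches the published repos):

```python
# Enumerate Hugging Face repo IDs for the Qwen2.5-Coder family.
# Sizes and the Base/Instruct split are from the list above; the
# "Qwen/" prefix is the official Hugging Face organization.
SIZES = ["0.5B", "1.5B", "3B", "7B", "14B", "32B"]

def repo_ids():
    ids = []
    for size in SIZES:
        ids.append(f"Qwen/Qwen2.5-Coder-{size}")           # Base
        ids.append(f"Qwen/Qwen2.5-Coder-{size}-Instruct")  # Instruct
    return ids

print(repo_ids()[-1])  # Qwen/Qwen2.5-Coder-32B-Instruct
```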

Architecture

Qwen2.5-Coder-32B Specifications

  • Parameters: ~32.5 billion
  • Transformer Layers: 64
  • Attention Heads: 40 (query), 8 (key-value)
  • Attention Mechanism: Grouped Query Attention (GQA)
  • Vocabulary Size: 151,646 tokens
  • Position Encoding: Rotary Position Embeddings (RoPE)
  • Activation: SwiGLU
  • Normalization: Root Mean Square Layer Normalization (RMSNorm)
  • QKV Bias: Enabled
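
The practical payoff of GQA (40 query heads sharing 8 key-value heads) is a much smaller KV cache at inference time. A back-of-envelope sketch, assuming a head dimension of 128 (a common choice; not stated in the spec list above):

```python
# Rough KV-cache sizing for the GQA configuration above.
# HEAD_DIM = 128 is an assumption, not from the spec list.
LAYERS, Q_HEADS, KV_HEADS, HEAD_DIM = 64, 40, 8, 128
BYTES = 2  # bf16/fp16

def kv_cache_bytes_per_token(kv_heads):
    # one K and one V tensor per layer
    return 2 * LAYERS * kv_heads * HEAD_DIM * BYTES

gqa = kv_cache_bytes_per_token(KV_HEADS)  # shared KV heads (GQA)
mha = kv_cache_bytes_per_token(Q_HEADS)   # hypothetical full MHA
print(gqa // 1024, "KiB/token; full MHA would be", mha // gqa, "x larger")
```

Under these assumptions the cache shrinks 5x versus full multi-head attention, which is what makes long contexts and large batches affordable on the 32B model.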

Training Scale

  • Training Tokens: 5.5 trillion tokens of code-related data
  • Programming Languages: 92 languages supported

Benchmark Performance

Code Completion (SOTA on 5 benchmarks)

  • HumanEval-Infilling: State-of-the-art
  • CrossCodeEval: State-of-the-art
  • CrossCodeLongEval: State-of-the-art
  • RepoEval: State-of-the-art
  • SAFIM: State-of-the-art

Code Generation

  • HumanEval: Strong pass@1 scores across variants
    • Qwen2.5-7B-Instruct: 84.8% pass@1
    • 4-stage data filtering raised average HumanEval/MBPP scores from 41.6% to 46.8%
  • MBPP: Competitive scores across variants
  • EvalPlus: Best performance among open-source models
  • LiveCodeBench: Top rankings
  • BigCodeBench: Leading scores

Multi-Language Code Repair

  • MdEval: 75.2 (Qwen2.5-Coder-32B-Instruct) - Ranked 1st among open-source
  • McEval: 65.9 - Strong multi-language reasoning

Real-World Performance

  • Aider Code Editing: 73.7% (4th place overall, among the top-ranked models)
  • Qwen3-Coder (successor family): 69.6% on SWE-Bench Verified

Key Capabilities

Code Understanding

  • Code review and analysis
  • Bug detection and identification
  • Code explanation and documentation
  • Repository-level understanding

Code Generation

  • Function and class generation from descriptions
  • Multi-language code translation
  • Test case generation
  • Code completion and infilling
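
Infilling uses the fill-in-the-middle (FIM) format from the Qwen2.5-Coder model card: the prompt wraps the code before and after the gap in special tokens, and the model generates the missing middle. A minimal prompt builder:

```python
# Build a fill-in-the-middle (FIM) prompt with Qwen2.5-Coder's
# special tokens; the model completes the code between prefix
# and suffix after the <|fim_middle|> token.
def fim_prompt(prefix: str, suffix: str) -> str:
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

prompt = fim_prompt(
    "def quicksort(arr):\n    if len(arr) <= 1:\n        return arr\n",
    "\n    return quicksort(left) + [pivot] + quicksort(right)\n",
)
```

Note that FIM prompts target the Base models; the Instruct variants are driven with the usual chat template instead.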

Code Reasoning

  • Algorithm design
  • Mathematical problem-solving through code
  • Complex logic implementation
  • Performance optimization suggestions

Supported Programming Languages

92 programming languages including:

  • Python, Java, JavaScript, TypeScript
  • C, C++, C#, Go, Rust
  • PHP, Ruby, Swift, Kotlin
  • SQL, Shell, PowerShell
  • And 76 more languages

Training Methodology

4-Stage Filtering Process

Qwen2.5-Coder uses advanced data filtering:

  1. Quality assessment and filtering
  2. Diversity enhancement
  3. Difficulty balancing
  4. Length optimization
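
The four stages above can be sketched as a simple pipeline of filters over training samples. Everything in this block is an illustrative placeholder (the stage functions, the `quality`/`difficulty` fields, and the thresholds are invented for the sketch; they are not the actual Qwen training code):

```python
# Hypothetical sketch of a 4-stage data-filtering pipeline.
# Field names and thresholds are illustrative, not Qwen's real values.
def quality_filter(samples):
    return [s for s in samples if s.get("quality", 0) >= 0.5]

def diversity_filter(samples):
    seen, out = set(), []
    for s in samples:
        key = s["text"][:64]  # crude near-duplicate key
        if key not in seen:
            seen.add(key)
            out.append(s)
    return out

def difficulty_balance(samples):
    # placeholder: order by difficulty so sampling can stratify
    return sorted(samples, key=lambda s: s.get("difficulty", 0))

def length_filter(samples, max_chars=8192):
    return [s for s in samples if len(s["text"]) <= max_chars]

def pipeline(samples):
    for stage in (quality_filter, diversity_filter,
                  difficulty_balance, length_filter):
        samples = stage(samples)
    return samples
```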

Result: Average HumanEval/MBPP scores increased from 41.6% to 46.8%

Training Data Composition

  • High-quality code repositories
  • Programming documentation
  • StackOverflow and developer forums
  • Technical tutorials and guides
  • Code competitions and challenges

Use Cases

  • Software Development: Autocomplete, code generation, refactoring
  • Code Review: Automated review and suggestion systems
  • Education: Programming tutors and learning assistants
  • DevOps: Script generation and automation
  • Documentation: Auto-generating code documentation
  • Testing: Test case generation and quality assurance

Deployment Options

Cloud Platforms

  • Alibaba Cloud ModelScope
  • Hugging Face Hub
  • NVIDIA NIM (Qwen2.5-Coder-32B-Instruct, 7B-Instruct)

Frameworks

  • vLLM for high-throughput serving
  • Text Generation Inference (TGI)
  • llama.cpp for CPU inference
  • Ollama for local deployment
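
With vLLM, the model is exposed through an OpenAI-compatible HTTP API. A minimal request sketch, assuming a server already launched locally with `vllm serve Qwen/Qwen2.5-Coder-32B-Instruct` on the default port:

```python
import json

# Build a chat request for a vLLM OpenAI-compatible endpoint.
# Assumes a local server started with:
#   vllm serve Qwen/Qwen2.5-Coder-32B-Instruct
URL = "http://localhost:8000/v1/chat/completions"  # default vLLM port

payload = {
    "model": "Qwen/Qwen2.5-Coder-32B-Instruct",
    "messages": [
        {"role": "user",
         "content": "Write a Python function that checks if a string is a palindrome."},
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}
body = json.dumps(payload)

# To actually send it (requires the running server):
# import urllib.request
# req = urllib.request.Request(
#     URL, data=body.encode(),
#     headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```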

Hardware Requirements

  • 32B models: Multi-GPU or high-memory GPUs recommended
  • 7B models: Single high-end GPU
  • Smaller variants: Consumer GPUs and edge devices
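
These tiers follow from a rough weight-memory estimate: parameter count times bytes per parameter, with KV cache and activations on top. A quick sketch (parameter counts from the sizes above, 32.5B for the flagship):

```python
# Rough weight-memory per variant; runtime overhead (KV cache,
# activations) is extra, so treat these as lower bounds.
PARAMS_B = {"0.5B": 0.5, "1.5B": 1.5, "3B": 3, "7B": 7, "14B": 14, "32B": 32.5}

def weight_gb(size: str, bytes_per_param: int = 2) -> float:
    # 2 bytes/param for bf16; 1 for int8 quantization
    return PARAMS_B[size] * 1e9 * bytes_per_param / 1e9

for size in PARAMS_B:
    print(f"{size}: ~{weight_gb(size):.0f} GB bf16, "
          f"~{weight_gb(size, 1):.0f} GB int8")
```

At bf16 the 32B model needs roughly 65 GB for weights alone, hence the multi-GPU recommendation, while the 7B fits on a single 24-48 GB card.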

Performance Comparison

vs. GPT-4o

  • Qwen2.5-Coder-32B-Instruct matches GPT-4o on coding tasks

vs. Other Open-Source Models

  • Outperforms CodeLlama across most benchmarks
  • Superior to StarCoder2 on multiple tasks
  • Leads DeepSeek-Coder on several metrics

Integration Examples

Compatible with:

  • VS Code extensions
  • JetBrains IDEs
  • GitHub Copilot alternatives
  • Custom IDE integrations

Licensing

Apache 2.0 License - permissive for research and commercial use.

Pricing

Free and open source.