Qwen2.5-Coder is Alibaba Cloud's family of code-specialized large language models, available in six sizes from 0.5B to 32B parameters. The 32B-Instruct variant is currently the state-of-the-art open-source code model, matching GPT-4o's coding capabilities.
- Qwen2.5-Coder-0.5B: Smallest, efficient for edge deployment
- Qwen2.5-Coder-1.5B: Balanced efficiency and capability
- Qwen2.5-Coder-3B: Mid-range performance
- Qwen2.5-Coder-7B: Strong general-purpose coding
- Qwen2.5-Coder-14B: Enhanced reasoning capabilities
- Qwen2.5-Coder-32B: Flagship model with SOTA performance
Each size is available in Base and Instruct variants.
Flagship (32B) architecture:
- Parameters: ~32.5 billion
- Transformer Layers: 64
- Attention Heads: 40 (query), 8 (key-value)
- Attention Mechanism: Grouped Query Attention (GQA)
- Vocabulary Size: 151,646 tokens
- Position Encoding: Rotary Position Embeddings (RoPE)
- Activation: SwiGLU
- Normalization: Root Mean Square Layer Normalization (RMSNorm)
- QKV Bias: enabled
- Training Tokens: 5.5 trillion tokens of code-related data
- Programming Languages: 92 languages supported
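The GQA layout above (40 query heads sharing 8 key-value heads) can be illustrated with a minimal NumPy sketch. Shapes here are toy-sized, and the real model additionally applies RoPE, causal masking, and learned projections:

```python
import numpy as np

def gqa_attention(q, k, v, n_q_heads=40, n_kv_heads=8):
    """Grouped Query Attention: each of the 8 KV heads is shared by
    40/8 = 5 query heads, shrinking the KV cache 5x versus full
    multi-head attention while keeping 40 query heads."""
    d = q.shape[-1]
    group = n_q_heads // n_kv_heads
    # k, v: (n_kv_heads, seq, d) -> repeat each KV head for its query group
    k = np.repeat(k, group, axis=0)          # (n_q_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v                        # (n_q_heads, seq, d)

# toy shapes: 40 query heads, 8 KV heads, sequence length 4, head_dim 8
rng = np.random.default_rng(0)
q = rng.normal(size=(40, 4, 8))
k = rng.normal(size=(8, 4, 8))
v = rng.normal(size=(8, 4, 8))
out = gqa_attention(q, k, v)
```

With 8 KV heads instead of 40, the KV cache at inference time is 5x smaller, which is the main practical payoff of GQA.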
Code completion and infilling benchmarks:
- HumanEval-Infilling: State-of-the-art
- CrossCodeEval: State-of-the-art
- CrossCodeLongEval: State-of-the-art
- RepoEval: State-of-the-art
- SAFIM: State-of-the-art
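Infilling with the base models uses dedicated fill-in-the-middle control tokens. A minimal sketch of the documented prompt layout (the generation call itself is not shown):

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt using Qwen2.5-Coder's FIM
    special tokens; the model generates the missing middle after the
    <|fim_middle|> marker."""
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

# ask the model to fill in the body between a def line and a return line
prompt = build_fim_prompt(
    prefix="def mean(xs):\n    ",
    suffix="\n    return total / len(xs)",
)
```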
- HumanEval: strong pass@1 across model sizes (Qwen2.5-Coder-7B-Instruct: 84.8%)
- MBPP: Competitive scores across variants
- EvalPlus: Best performance among open-source models
- LiveCodeBench: Top rankings
- BigCodeBench: Leading scores
- MdEval: 75.2 (Qwen2.5-Coder-32B-Instruct) - Ranked 1st among open-source
- McEval: 65.9 - Strong multi-language reasoning
- Aider Code Editing: 73.7% (Qwen2.5-Coder-32B-Instruct), 4th overall, behind only top proprietary models such as Claude 3.5 Sonnet
- SWE-Bench Verified: 69.6% for the newer Qwen3-Coder, the leading open-source result at its release
Core capabilities:
- Code review and analysis
- Bug detection and identification
- Code explanation and documentation
- Repository-level understanding
- Function and class generation from descriptions
- Multi-language code translation
- Test case generation
- Code completion and infilling
- Algorithm design
- Mathematical problem-solving through code
- Complex logic implementation
- Performance optimization suggestions
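For instruction-driven tasks such as generation from a description, the Instruct variants expect ChatML-formatted conversations. A minimal sketch of that format; in practice the tokenizer's `apply_chat_template` builds this for you:

```python
def to_chatml(messages: list[dict]) -> str:
    """Render a conversation in the ChatML format used by Qwen instruct
    models, ending with the assistant header so the model completes the
    reply. (Prefer tokenizer.apply_chat_template in real code.)"""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
             for m in messages]
    return "".join(parts) + "<|im_start|>assistant\n"

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a function that reverses a string."},
])
```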
92 programming languages including:
- Python, Java, JavaScript, TypeScript
- C, C++, C#, Go, Rust
- PHP, Ruby, Swift, Kotlin
- SQL, Shell, PowerShell
- plus 76 more
Qwen2.5-Coder uses advanced data filtering:
- Quality assessment and filtering
- Diversity enhancement
- Difficulty balancing
- Length optimization
Result: Average HumanEval/MBPP scores increased from 41.6% to 46.8%
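The exact filtering pipeline is not public. Purely as an illustration of the staged idea, the hypothetical sketch below applies length bounds, exact deduplication (for diversity), and a crude quality heuristic:

```python
import hashlib

def filter_samples(samples, min_len=32, max_len=20_000):
    """Hypothetical multi-stage code-data filter (illustrative only;
    the real Qwen2.5-Coder pipeline is not published)."""
    seen, kept = set(), []
    for code in samples:
        if not (min_len <= len(code) <= max_len):        # length bounds
            continue
        digest = hashlib.sha256(code.encode()).hexdigest()
        if digest in seen:                               # exact dedup
            continue
        seen.add(digest)
        printable = sum(ch.isprintable() or ch in "\n\t" for ch in code)
        if printable / len(code) < 0.95:                 # quality heuristic
            continue
        kept.append(code)
    return kept
```

A production pipeline would add model-based quality scoring and difficulty balancing, but the staged structure is the same.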
Training data sources:
- High-quality code repositories
- Programming documentation
- StackOverflow and developer forums
- Technical tutorials and guides
- Code competitions and challenges
Typical use cases:
- Software Development: Autocomplete, code generation, refactoring
- Code Review: Automated review and suggestion systems
- Education: Programming tutors and learning assistants
- DevOps: Script generation and automation
- Documentation: Auto-generating code documentation
- Testing: Test case generation and quality assurance
Available through:
- Alibaba Cloud ModelScope
- Hugging Face Hub
- NVIDIA NIM (Qwen2.5-Coder-32B-Instruct, 7B-Instruct)
- vLLM for high-throughput serving
- Text Generation Inference (TGI)
- llama.cpp for CPU inference
- Ollama for local deployment
Hardware requirements:
- 32B models: Multi-GPU or high-memory GPUs recommended
- 7B models: Single high-end GPU
- Smaller variants: Consumer GPUs and edge devices
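A back-of-the-envelope way to map model size to the hardware tiers above: weights need roughly parameters times bytes-per-parameter, plus headroom for the KV cache and activations. The 20% overhead factor here is an assumption, not a measured figure:

```python
def vram_estimate_gb(n_params_b: float, bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Rough VRAM needed to serve a model: parameter count (in billions)
    times precision in bytes, plus ~20% headroom for KV cache and
    activations. A rule of thumb, not an exact requirement."""
    return n_params_b * bytes_per_param * overhead

# 32.5B params in bf16 (2 bytes/param) -> ~78 GB: multi-GPU territory
full = vram_estimate_gb(32.5)
# the same model 4-bit quantized (~0.5 byte/param) -> ~20 GB
quant = vram_estimate_gb(32.5, bytes_per_param=0.5)
```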
- Qwen2.5-Coder-32B-Instruct matches GPT-4o on coding tasks
- Outperforms CodeLlama across most benchmarks
- Superior to StarCoder2 on multiple tasks
- Leads DeepSeek-Coder on several metrics
Compatible with:
- VS Code extensions
- JetBrains IDEs
- GitHub Copilot alternatives
- Custom IDE integrations
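Most custom IDE integrations talk to a self-hosted server over the OpenAI-compatible chat-completions protocol, which vLLM (among others) serves. A sketch of the request body only; the endpoint URL, port, and model name depend on your deployment:

```python
import json

def completion_request(model: str, user_msg: str) -> str:
    """JSON body for an OpenAI-compatible /v1/chat/completions endpoint,
    as exposed by self-hosted servers like vLLM. Sending the request
    (and the server address) is deployment-specific and omitted here."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "temperature": 0.2,
    })

body = completion_request("Qwen/Qwen2.5-Coder-32B-Instruct",
                          "Refactor this loop into a list comprehension.")
```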
Apache 2.0 License (the 3B variant instead uses the Qwen Research License) - permissive for research and commercial use.
Free and open source.