Overview
This first public release delivers the complete initial codebase for Speculators — a unified library for building, evaluating, converting, and serving speculative decoding algorithms for LLMs. It includes the core framework, CI/CD and developer workflows, model/config implementations (EAGLE v1, HASS, EAGLE‑3), CLIs for converting checkpoints from external research repositories, a Hugging Face–compatible model format with vLLM serving support, and prototype training code.
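At its core, speculative decoding has a cheap drafter propose several tokens ahead and a verifier check them in one pass, keeping the longest accepted prefix. The greedy variant can be sketched as below; the `draft` and `verify` callables are toy stand-ins, not Speculators' API, and production systems use rejection sampling over probabilities rather than exact greedy matching:

```python
from typing import Callable, List

def speculative_step(
    prefix: List[int],
    draft: Callable[[List[int], int], List[int]],
    verify: Callable[[List[int]], int],
    k: int = 4,
) -> List[int]:
    """One greedy speculative decoding step: the drafter proposes up to k
    tokens, the verifier checks them in order, and we keep the longest
    accepted prefix plus one verifier token (so at least one token is
    always produced per step)."""
    proposed = draft(prefix, k)
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        target = verify(ctx)           # verifier's greedy choice here
        if tok == target:
            accepted.append(tok)       # drafter matched: accept and continue
            ctx.append(tok)
        else:
            accepted.append(target)    # mismatch: take verifier's token, stop
            return accepted
    accepted.append(verify(ctx))       # all k accepted: free bonus token
    return accepted
```

With a drafter that is right for two tokens and then guesses wrong, one step still emits three verified tokens — the speedup comes from checking the proposals in a single verifier pass instead of three.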
What’s New (Highlights)
- Unified, extensible framework for speculator models (build, evaluate, convert, store)
- Hugging Face–compatible speculator format with serving support landed in vLLM
- Models/configs for EAGLE v1, HASS, and EAGLE‑3, with support for multiple transformer layer types
- Checkpoint converter CLIs (Eagle, Eagle‑3) from external research repositories
- Prototype training code and scripts (EAGLE‑1-style drafter, HASS) + requirements
- Production readiness: CI/CD, tests, style, docs, examples, and benchmarks
Use Cases Enabled
- Register and configure new speculator algorithms via a standardized configuration and registry system
- Convert external checkpoints (EAGLE/EAGLE‑3/HASS variants) into the Speculators format with CLI tools
- Serve Speculators models directly in vLLM for low‑latency inference
- Evaluate and benchmark speculators (e.g., with GuideLLM), including quantized verifier swaps
- Prototype‑train drafters using provided research code and scripts
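The registry-based flow in the first bullet above can be illustrated with a decorator pattern. All class and method names below are hypothetical stand-ins for illustration, not Speculators' actual API:

```python
from typing import Dict, Type

class SpeculatorRegistry:
    """Hypothetical name registry: maps algorithm names to config classes."""
    _registry: Dict[str, Type] = {}

    @classmethod
    def register(cls, name: str):
        # Decorator that records a config class under a string key
        def wrap(config_cls: Type) -> Type:
            cls._registry[name] = config_cls
            return config_cls
        return wrap

    @classmethod
    def get(cls, name: str) -> Type:
        return cls._registry[name]

@SpeculatorRegistry.register("eagle")
class EagleConfig:
    """Illustrative config for an EAGLE-style speculator."""
    def __init__(self, num_layers: int = 1):
        self.num_layers = num_layers

# Look up and instantiate a config by name
cfg = SpeculatorRegistry.get("eagle")(num_layers=2)
```

The pattern lets new algorithms plug in by registering a config class, without the core framework knowing about them ahead of time.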
Getting Started
- Install (Python 3.9–3.13 on Linux or macOS): `pip install git+https://github.com/neuralmagic/speculators.git`
- Serve with vLLM (requires the v1 API): `VLLM_USE_V1=1 vllm serve RedHatAI/Qwen3-8B-speculator.eagle3`
- Explore examples and research: `examples/`, `research/eagle3/`, `research/hass/`
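Once the serve command above is running, vLLM exposes an OpenAI-compatible HTTP API (on port 8000 by default). A minimal sketch of building a completions request, assuming the default port and the model name passed to `vllm serve`:

```python
import json
import urllib.request

def build_completion_request(
    model: str, prompt: str, max_tokens: int = 32
) -> urllib.request.Request:
    """Build a POST request for vLLM's OpenAI-compatible
    /v1/completions endpoint (default local port assumed)."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        "http://localhost:8000/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_completion_request(
    "RedHatAI/Qwen3-8B-speculator.eagle3",
    "Speculative decoding speeds up inference by",
)
# Sending with urllib.request.urlopen(req) returns a JSON body whose
# generated text is at choices[0]["text"].
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at `http://localhost:8000/v1`) works the same way.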
Compatibility Notes
- Python: 3.9–3.13
- OS: Linux and macOS
- `transformers` pinned to avoid mypy regressions (PR #73)
- vLLM v1 API required for serving (set `VLLM_USE_V1=1`)
Full Changelog (v0.1.0)
First public release of Speculators. This release publishes the complete initial codebase and enables the first set of core use cases for speculative decoding with LLMs.
Added
- Base configuration and registry system with tests: Speculator, Token Proposal, and Model Speculator configs; `EagleSpeculatorConfig` for EAGLE v1/HASS; config serialization/loading (PRs #26, #27, #28, #29, #34, #36)
- Eagle speculator model and support for multiple transformer layer types (PRs #37, #49)
- Eagle‑3 speculator model and Qwen support (PRs #50, #55)
- Checkpoint converter CLIs: Eagle and Eagle‑3; standardized converter interface (PRs #39, #53, #72)
- vLLM serving documentation and Qwen benchmark assets (PRs #77, #78, #82, #83)
- Examples directory and README for getting started (PR #81)
- Branding assets (icons, logos, user‑flow diagrams) (PR #87)
Changed
- Standardized converter CLI UX and flags (PR #72)
- Documentation/readme formatting and content updates (PRs #70, #75, #83, #85)
Fixed
- Missing embeddings in converted checkpoints/workflows (PR #65)
- CLI flags and `norm_before_residual` toggle (PRs #57, #58)
- Compatibility: pin `transformers` to resolve mypy/typing regressions (PR #73)
CI/CD and Tooling
- GitHub Actions: migrated link checks to lychee and updated workflows (PRs #3, #45)
- PR comment behavior refinements (PR #47)
Research and Training
- Training code for EAGLE‑1‑style drafter with multi‑step training (PR #35)
- HASS/EAGLE‑3 research updates, requirements, and DeepSpeed dependency (PRs #64, #67, #69)
Documentation
- vLLM serving instructions, Qwen benchmark results, examples README, and research readmes (PRs #64, #70, #77, #78, #81, #83, #85)
New Contributors
- @fynnsu made their first contribution in PR #47
- @shanjiaz made their first contribution in PR #53
- @MeganEFlynn made their first contribution in PR #55
Thanks also to continuing contributors: @markurtz, @rahul-tuli, @dsikka
Links
- Compare changes: v0.0.1...v0.1.0