Overview
This first public release delivers the complete initial codebase for Speculators — a unified library for building, evaluating, converting, and serving speculative decoding algorithms for LLMs. It includes the core framework, CI/CD and developer workflows, model/config implementations (EAGLE v1, HASS, EAGLE‑3), CLIs for converting checkpoints from external research repositories, a Hugging Face–compatible model format with vLLM serving support, and prototype training code.
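At its core, speculative decoding has a cheap drafter propose several tokens ahead and a verifier check them in one pass, keeping the longest accepted prefix. The greedy variant can be sketched as below; the `draft` and `verify` callables are toy stand-ins, not Speculators' API, and production systems use rejection sampling over probabilities rather than exact greedy matching:

```python
from typing import Callable, List

def speculative_step(
    prefix: List[int],
    draft: Callable[[List[int], int], List[int]],
    verify: Callable[[List[int]], int],
    k: int = 4,
) -> List[int]:
    """One greedy speculative decoding step: the drafter proposes up to k
    tokens, the verifier checks them in order, and we keep the longest
    accepted prefix plus one verifier token (so at least one token is
    always produced per step)."""
    proposed = draft(prefix, k)
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        target = verify(ctx)           # verifier's greedy choice here
        if tok == target:
            accepted.append(tok)       # drafter matched: accept and continue
            ctx.append(tok)
        else:
            accepted.append(target)    # mismatch: take verifier's token, stop
            return accepted
    accepted.append(verify(ctx))       # all k accepted: free bonus token
    return accepted
```

With a drafter that is right for two tokens and then guesses wrong, one step still emits three verified tokens — the speedup comes from checking the proposals in a single verifier pass instead of three.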
What’s New (Highlights)
- Unified, extensible framework for speculator models (build, evaluate, convert, store)
- Hugging Face–compatible speculator format with serving support landed in vLLM
- Models/configs for EAGLE v1, HASS, and EAGLE‑3, with support for multiple transformer layer types
- Checkpoint converter CLIs (Eagle, Eagle‑3) from external research repositories
- Prototype training code and scripts (EAGLE‑1-style drafter, HASS) + requirements
- Production readiness: CI/CD, tests, style, docs, examples, and benchmarks
Use Cases Enabled
- Register and configure new speculator algorithms via a standardized configuration and registry system
- Convert external checkpoints (EAGLE/EAGLE‑3/HASS variants) into the Speculators format with CLI tools
- Serve Speculators models directly in vLLM for low‑latency inference
- Evaluate and benchmark speculators (e.g., with GuideLLM), including quantized verifier swaps
- Prototype‑train drafters using provided research code and scripts
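The registry-based flow in the first bullet above can be illustrated with a decorator pattern. All class and method names below are hypothetical stand-ins for illustration, not Speculators' actual API:

```python
from typing import Dict, Type

class SpeculatorRegistry:
    """Hypothetical name registry: maps algorithm names to config classes."""
    _registry: Dict[str, Type] = {}

    @classmethod
    def register(cls, name: str):
        # Decorator that records a config class under a string key
        def wrap(config_cls: Type) -> Type:
            cls._registry[name] = config_cls
            return config_cls
        return wrap

    @classmethod
    def get(cls, name: str) -> Type:
        return cls._registry[name]

@SpeculatorRegistry.register("eagle")
class EagleConfig:
    """Illustrative config for an EAGLE-style speculator."""
    def __init__(self, num_layers: int = 1):
        self.num_layers = num_layers

# Look up and instantiate a config by name
cfg = SpeculatorRegistry.get("eagle")(num_layers=2)
```

The pattern lets new algorithms plug in by registering a config class, without the core framework knowing about them ahead of time.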
Getting Started
- Install (Python 3.9–3.13 on Linux or macOS): `pip install git+https://github.com/neuralmagic/speculators.git`
- Serve with vLLM (requires the v1 API): `VLLM_USE_V1=1 vllm serve RedHatAI/Qwen3-8B-speculator.eagle3`
- Explore examples and research: `examples/`, `research/eagle3/`, `research/hass/`
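Once the serve command above is running, vLLM exposes an OpenAI-compatible HTTP API (on port 8000 by default). A minimal sketch of building a completions request, assuming the default port and the model name passed to `vllm serve`:

```python
import json
import urllib.request

def build_completion_request(
    model: str, prompt: str, max_tokens: int = 32
) -> urllib.request.Request:
    """Build a POST request for vLLM's OpenAI-compatible
    /v1/completions endpoint (default local port assumed)."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        "http://localhost:8000/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_completion_request(
    "RedHatAI/Qwen3-8B-speculator.eagle3",
    "Speculative decoding speeds up inference by",
)
# Sending with urllib.request.urlopen(req) returns a JSON body whose
# generated text is at choices[0]["text"].
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at `http://localhost:8000/v1`) works the same way.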
Compatibility Notes
- Python: 3.9–3.13
- OS: Linux and macOS
- `transformers` pinned to avoid mypy regressions (PR #73)
- vLLM v1 API required for serving (set `VLLM_USE_V1=1`)
Full Changelog (v0.1.0)
First public release of Speculators. This release publishes the complete initial codebase and enables the first set of core use cases for speculative decoding with LLMs.
Added
- Base configuration and registry system with tests: Speculator, Token Proposal, and Model Speculator configs; `EagleSpeculatorConfig` for EAGLE v1/HASS; config serialization/loading (PRs #26, #27, #28, #29, #34, #36)
- Eagle speculator model and support for multiple transformer layer types (PRs #37, #49)
- Eagle‑3 speculator model and Qwen support (PRs #50, #55)
- Checkpoint converter CLIs: Eagle and Eagle‑3; standardized converter interface (PRs #39, #53, #72)
- vLLM serving documentation and Qwen benchmark assets (PRs #77, #78, #82, #83)
- Examples directory and README for getting started (PR #81)
- Branding assets (icons, logos, user‑flow diagrams) (PR #87)
Changed
- Standardized converter CLI UX and flags (PR #72)
- Documentation/readme formatting and content updates (PRs #70, #75, #83, #85)
Fixed
- Missing embeddings in converted checkpoints/workflows (PR #65)
- CLI flags and `norm_before_residual` toggle (PRs #57, #58)
- Compatibility: pin `transformers` to resolve mypy/typing regressions (PR #73)
CI/CD and Tooling
- GitHub Actions: migrated link checks to lychee and updated workflows (PRs #3, #45)
- PR comment behavior refinements (PR #47)
Research and Training
- Training code for EAGLE‑1‑style drafter with multi‑step training (PR #35)
- HASS/EAGLE‑3 research updates, requirements, and DeepSpeed dependency (PRs #64, #67, #69)
Documentation
- vLLM serving instructions, Qwen benchmark results, examples README, and research readmes (PRs #64, #70, #77, #78, #81, #83, #85)
New Contributors
- @fynnsu made their first contribution in PR #47
- @shanjiaz made their first contribution in PR #53
- @MeganEFlynn made their first contribution in PR #55
Thanks also to continuing contributors: @markurtz, @rahul-tuli, @dsikka
Links
- Compare changes: v0.0.1...v0.1.0