
Speculators v0.1.0 -- First Public Release

Released by @markurtz on 08 Aug 01:45 · commit 8a49095

Overview

This first public release delivers the complete initial codebase for Speculators, a unified library for building, evaluating, converting, and serving speculative decoding algorithms for LLMs. It includes the core framework, CI/CD and developer workflows, model and config implementations (EAGLE v1, HASS, EAGLE‑3), CLIs for converting checkpoints from external research repositories, a Hugging Face–compatible model format with vLLM serving support, and prototype training code.

What’s New (Highlights)

  • Unified, extensible framework for speculator models (build, evaluate, convert, store)
  • Hugging Face–compatible speculator format with serving support landed in vLLM
  • Models and configs for EAGLE v1, HASS, and EAGLE‑3, with support for multiple transformer layer types
  • Checkpoint converter CLIs (Eagle, Eagle‑3) from external research repositories
  • Prototype training code and scripts (EAGLE‑1-style drafter, HASS) + requirements
  • Production readiness: CI/CD, tests, style, docs, examples, and benchmarks

Use Cases Enabled

  • Register and configure new speculator algorithms via a standardized configuration and registry system
  • Convert external checkpoints (EAGLE/EAGLE‑3/HASS variants) into the Speculators format with CLI tools
  • Serve Speculators models directly in vLLM for low‑latency inference
  • Evaluate and benchmark speculators (e.g., with GuideLLM), including quantized verifier swaps
  • Prototype‑train drafters using provided research code and scripts
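The registry-and-config pattern behind the first use case can be sketched generically. The names below (`SPECULATOR_REGISTRY`, `register_speculator`, `EagleConfig`, `build_speculator`) are hypothetical stand-ins chosen for illustration, not the actual Speculators API:

```python
from dataclasses import dataclass

# Hypothetical registry, for illustration only -- not the real Speculators API.
SPECULATOR_REGISTRY: dict[str, type] = {}


def register_speculator(name: str):
    """Decorator that registers a speculator config class under a string key."""
    def wrap(cls):
        SPECULATOR_REGISTRY[name] = cls
        return cls
    return wrap


@register_speculator("eagle")
@dataclass
class EagleConfig:
    # Hypothetical stand-in for an EAGLE-style speculator config.
    verifier: str = "base-model-id"
    num_layers: int = 1


def build_speculator(name: str, **overrides):
    """Look up a registered config by name and instantiate it with overrides."""
    cls = SPECULATOR_REGISTRY[name]
    return cls(**overrides)


cfg = build_speculator("eagle", num_layers=2)
print(cfg.num_layers)  # 2
```

The decorator makes new algorithms discoverable by a string key, which is how a standardized configuration system typically lets converters and serving code resolve a speculator type without hard-coding classes.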

Getting Started

  • Install (Python 3.9–3.13 on Linux or macOS):
    pip install git+https://github.com/neuralmagic/speculators.git
  • Serve with vLLM (requires v1 API):
    VLLM_USE_V1=1 vllm serve RedHatAI/Qwen3-8B-speculator.eagle3
  • Explore examples and research: examples/, research/eagle3/, research/hass/
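Once served, the model is reachable through vLLM's OpenAI-compatible HTTP API, so speculative decoding is transparent to clients. A minimal request sketch (port 8000 is vLLM's default; the prompt and parameters are illustrative):

```python
import json
from urllib.request import Request, urlopen  # urlopen used once the server is up

# vLLM exposes an OpenAI-compatible API on port 8000 by default.
URL = "http://localhost:8000/v1/completions"

payload = {
    "model": "RedHatAI/Qwen3-8B-speculator.eagle3",
    "prompt": "Speculative decoding works by",
    "max_tokens": 64,
    "temperature": 0.0,
}


def build_request(url: str, body: dict) -> Request:
    """Build the HTTP request; the client needs no speculation-specific options."""
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


req = build_request(URL, payload)
# With the server running: response = json.load(urlopen(req))
```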

Compatibility Notes

  • Python: 3.9–3.13
  • OS: Linux and macOS
  • Transformers pinned to avoid mypy regressions (PR #73)
  • vLLM v1 API required for serving (set VLLM_USE_V1=1)

Full Changelog (v0.1.0)

First public release of Speculators. This release publishes the complete initial codebase and enables the first set of core use cases for speculative decoding with LLMs.

Added

  • Base configuration and registry system with tests: Speculator, Token Proposal, and Model Speculator configs; EagleSpeculatorConfig for EAGLE v1/HASS; config serialization/loading (PRs #26, #27, #28, #29, #34, #36)
  • Eagle speculator model and support for multiple transformer layer types (PRs #37, #49)
  • Eagle‑3 speculator model and Qwen support (PRs #50, #55)
  • Checkpoint converter CLIs: Eagle and Eagle‑3; standardized converter interface (PRs #39, #53, #72)
  • vLLM serving documentation and Qwen benchmark assets (PRs #77, #78, #82, #83)
  • Examples directory and README for getting started (PR #81)
  • Branding assets (icons, logos, user‑flow diagrams) (PR #87)

Changed

  • Standardized converter CLI UX and flags (PR #72)
  • Documentation/readme formatting and content updates (PRs #70, #75, #83, #85)

Fixed

  • Missing embeddings in converted checkpoints/workflows (PR #65)
  • CLI flags and norm_before_residual toggle (PRs #57, #58)
  • Compatibility: pin transformers to resolve mypy/typing regressions (PR #73)

CI/CD and Tooling

  • GitHub Actions: migrated link checks to lychee and updated workflows (PRs #3, #45)
  • PR comment behavior refinements (PR #47)

Research and Training

  • Training code for EAGLE‑1‑style drafter with multi‑step training (PR #35)
  • HASS/EAGLE‑3 research updates, requirements, and DeepSpeed dependency (PRs #64, #67, #69)

Documentation

  • vLLM serving instructions, Qwen benchmark results, examples README, and research readmes (PRs #64, #70, #77, #78, #81, #83, #85)

Contributors

Thanks to @markurtz, @rahul-tuli, and @dsikka for their contributions.
