Update ReadME feature content #109

dsikka · 2025-09-01T20:33:48Z

Updates and extends the README to include the following pieces:

- Supported features
- Benefits of Speculators
- Summary table of models that are supported or will be supported
- A note on future features (e.g. training)

github-actions · 2025-09-01T20:36:10Z

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/vllm-project/speculators/actions/runs/17409864819/artifacts/3908277154.
They will be retained for up to 30 days.
Commit: 593aff8

rahul-tuli · 2025-09-02T16:30:31Z

README.md

@@ -12,7 +12,7 @@

 ## Overview

-**Speculators** is a unified library for building, evaluating, and storing speculative decoding algorithms for large language model (LLM) inference, including in frameworks like vLLM. Speculative decoding is a lossless technique that speeds up LLM inference by using a smaller, faster speculator model to propose tokens, which are then verified by the larger base model, reducing latency without compromising output quality. Speculators standardizes this process with reusable formats and tools, enabling easier integration and deployment of speculative decoding in production-grade inference servers.
+**Speculators** is a unified library for building, evaluating, and storing speculative decoding algorithms for large language model (LLM) inference, including in frameworks like vLLM. Speculative decoding is a lossless technique that speeds up LLM inference by using a smaller, faster speculator model to propose tokens, which are then verified by the larger base model, reducing latency without compromising output quality. Speculators intelligently draft multiple tokens ahead of time, and the main model verifies them in a single step. This approach boosts performance without sacrificing output quality, as every accepted token is guaranteed to match what the main model would have generated on its own. Speculators standardizes this process with reusable formats and tools, enabling easier integration and deployment of speculative decoding in production-grade inference servers.


Suggested change

**Speculators** is a unified library for building, evaluating, and storing speculative decoding algorithms for large language model (LLM) inference, including in frameworks like vLLM. Speculative decoding is a lossless technique that speeds up LLM inference by using a smaller, faster speculator model to propose tokens, which are then verified by the larger base model, reducing latency without compromising output quality. Speculators intelligently draft multiple tokens ahead of time, and the main model verifies them in a single step. This approach boosts performance without sacrificing output quality, as every accepted token is guaranteed to match what the main model would have generated on its own. Speculators standardizes this process with reusable formats and tools, enabling easier integration and deployment of speculative decoding in production-grade inference servers.

**Speculators** is a unified library for building, evaluating, and storing speculative decoding algorithms for large language model (LLM) inference, including in frameworks like vLLM. Speculative decoding is a lossless technique that speeds up LLM inference by using a smaller, faster speculator model to propose tokens, which are then verified by the larger base model, reducing latency without compromising output quality. Speculators intelligently draft multiple tokens ahead of time, and the main model verifies them in a single forward pass. This approach boosts performance without sacrificing output quality, as every accepted token is guaranteed to match what the main model would have generated on its own. Speculators standardizes this process with reusable formats and tools, enabling easier integration and deployment of speculative decoding in production-grade inference servers.

This sentence uses "Speculators" for both the repo and the draft models. Do you think that might be confusing?
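
For readers of this thread, the draft-and-verify loop described in the suggested text can be illustrated with a minimal sketch under greedy decoding. Everything here is an assumption made for illustration: the toy models and the names `speculative_step`, `generate`, `draft_model`, `target_model`, and `greedy_reference` are hypothetical and are not part of the Speculators or vLLM APIs. A real system would score all drafted positions in one forward pass of the base model; the sketch calls the toy target once per position to keep the logic readable.

```python
from typing import Callable, List

# Hypothetical toy "model": maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """Draft k tokens, then keep the longest prefix the target agrees with."""
    # 1. The small draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))

    # 2. The target checks each proposed token in order.
    accepted: List[int] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: emit the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. All k tokens accepted: the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(prefix: List[int], draft: NextToken, target: NextToken,
             k: int, max_new: int) -> List[int]:
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + max_new]


def greedy_reference(prefix: List[int], target: NextToken,
                     max_new: int) -> List[int]:
    """Plain greedy decoding with the target only, for comparison."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(target(out))
    return out


def target_model(seq: List[int]) -> int:
    # Toy target: counts upward modulo 10.
    return (seq[-1] + 1) % 10


def draft_model(seq: List[int]) -> int:
    # Toy draft: agrees with the target except when the target would emit 5.
    nxt = (seq[-1] + 1) % 10
    return 0 if nxt == 5 else nxt


if __name__ == "__main__":
    spec = generate([1], draft_model, target_model, k=4, max_new=12)
    ref = greedy_reference([1], target_model, max_new=12)
    print(spec == ref)
```

In this toy setup the speculative output is identical to target-only greedy decoding even though the draft model sometimes disagrees, which is the "lossless" property the README text refers to. Sampling-based verification uses a different acceptance rule and is not covered by this sketch.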

rahul-tuli · 2025-09-02T16:31:36Z

README.md

@@ -12,7 +12,7 @@

 ## Overview

-**Speculators** is a unified library for building, evaluating, and storing speculative decoding algorithms for large language model (LLM) inference, including in frameworks like vLLM. Speculative decoding is a lossless technique that speeds up LLM inference by using a smaller, faster speculator model to propose tokens, which are then verified by the larger base model, reducing latency without compromising output quality. Speculators standardizes this process with reusable formats and tools, enabling easier integration and deployment of speculative decoding in production-grade inference servers.
+**Speculators** is a unified library for building, evaluating, and storing speculative decoding algorithms for large language model (LLM) inference, including in frameworks like vLLM. Speculative decoding is a lossless technique that speeds up LLM inference by using a smaller, faster speculator model to propose tokens, which are then verified by the larger base model, reducing latency without compromising output quality. Speculators intelligently draft multiple tokens ahead of time, and the main model verifies them in a single step. This approach boosts performance without sacrificing output quality, as every accepted token is guaranteed to match what the main model would have generated on its own. Speculators standardizes this process with reusable formats and tools, enabling easier integration and deployment of speculative decoding in production-grade inference servers.


This sentence uses "Speculators" for both the repo and the draft models. Do you think that might be confusing?

rahul-tuli · 2025-09-02T16:34:28Z

README.md

+For development with additional tools:
+
+```bash
+pip install -e .[dev]


Suggested change

pip install -e .[dev]

pip install -e ".[dev]"

Quoting the extras also makes the command work in zsh, which otherwise treats the unquoted square brackets as a glob pattern.

dsikka added 2 commits September 1, 2025 20:32

update readme

43a5e92

update

7cab032

move

18c0e14

dsikka changed the title ~~Update ReadMe~~ Update ReadME feature content Sep 1, 2025

dsikka requested review from markurtz, anmarques, rahul-tuli and shanjiaz September 2, 2025 13:52

Merge branch 'main' into update-readme

593aff8

rahul-tuli reviewed Sep 2, 2025
