Closed
Changes from all commits
69 commits
00edc17
added sglang arguments (#317)
FrankLeeeee Nov 23, 2025
72337ef
unified benchmark scripts (#319)
FrankLeeeee Nov 23, 2025
95cb2ae
fixed data regeneration script (#321)
FrankLeeeee Nov 24, 2025
f6ec513
fix ckpt dir check (#320)
justadogistaken Nov 24, 2025
d582d7d
support gen hidden states use fp8 (#318)
jiapingW Nov 24, 2025
34b5883
Add subset options for opc (#312)
jhinpan Nov 24, 2025
d960896
Fixed the installation command
FrankLeeeee Nov 25, 2025
1e3fb6e
organized unit tests (#324)
FrankLeeeee Nov 25, 2025
44409f6
fixed non-runnable examples (#322)
FrankLeeeee Nov 25, 2025
341abf5
merged data generation scripts (#323)
FrankLeeeee Nov 25, 2025
ed30525
Fix args type (#328)
mmdbhs Nov 25, 2025
04a6bcf
added autoflakes pre-commit hook (#327)
FrankLeeeee Nov 25, 2025
70f5187
fixed specforge imports (#332)
FrankLeeeee Nov 27, 2025
3df5b27
added tests for scripts (#331)
FrankLeeeee Nov 27, 2025
8dff2b7
bump to v0.1.1 (#330)
FrankLeeeee Nov 27, 2025
9b05770
Support more sampling params in data generation (#333)
yubofredwang Nov 27, 2025
b77e6f7
Add qwen3-coder-30B-A3B-Instruct Eagle3 Training Script (#329)
jhinpan Nov 28, 2025
e7b3716
Remove full hidden states capturing in custom backend (#337)
yubofredwang Nov 30, 2025
5c43694
fix mmstart benchmrk (#334)
jiapingW Nov 30, 2025
3e0cda0
updated benchmark docs (#340)
FrankLeeeee Dec 1, 2025
44d5c62
grouped args for better reference (#343)
FrankLeeeee Dec 2, 2025
3bca52c
added profiling (#344)
FrankLeeeee Dec 2, 2025
94de9f8
Feature/online train use hf backend optimize GPU usage (#346)
jiapingW Dec 2, 2025
5c355b8
added model-download-dir (#347)
FrankLeeeee Dec 2, 2025
a77b9de
add missing layers_to_output_hidden_states in qwen3 moe (#351)
yubofredwang Dec 5, 2025
c65a358
fix: is_running to get_run (#353)
Zeyi-Lin Dec 7, 2025
dc44caf
add default build_dataset_num_proc value (#354)
sleepcoo Dec 8, 2025
9639a52
fixed kv head replication in qwen3 moe (#357)
FrankLeeeee Dec 8, 2025
e0625b0
[Docs] add benchmark refer (#358)
jiapingW Dec 9, 2025
e012016
optimized sglang backend memory usage (#359)
FrankLeeeee Dec 10, 2025
381476b
update sglang && support qwen3 next (#355)
sleepcoo Dec 12, 2025
020a856
Add --is-preformatted flag to prepare_hidden_states.py (#350)
Ofir408 Dec 12, 2025
86c1749
remove unuse code (#367)
sleepcoo Dec 16, 2025
ef165ac
added more benchmarks (#369)
FrankLeeeee Dec 16, 2025
901c868
added deepwiki badge (#370)
FrankLeeeee Dec 16, 2025
19e84eb
fixed benchmarks (#372)
FrankLeeeee Dec 16, 2025
f656ae7
feat: add support for Qwen3-Coder-480B-A35B-Instruct-FP8 training (#371)
xiaomin-D Dec 18, 2025
1c17635
added specbundle doc (#383)
FrankLeeeee Dec 23, 2025
157745d
fixed doc build (#384)
FrankLeeeee Dec 23, 2025
d9952d1
Add a SpecBundle dashboard (#382)
sleepcoo Dec 23, 2025
106874d
added link to specbundle (#385)
FrankLeeeee Dec 23, 2025
73e6f80
bump version to v0.2.0 (#386)
FrankLeeeee Dec 23, 2025
e30518a
added dashboard link (#387)
FrankLeeeee Dec 23, 2025
4a1101c
feat: add training support for DeepSeek-V3 EAGLE-3 speculative decodi…
yefei12 Dec 25, 2025
280fab9
Support Qwen3,Qwen3-Next,Kimi-K2,Deepseek models template (#381)
jiapingW Dec 25, 2025
ee22b87
[feature] add Sequence Parallelism support for offline training (#366)
uygnef Dec 25, 2025
5660635
fixed templates (#389)
FrankLeeeee Dec 25, 2025
4ac6bb7
corrected llama3 examples (#391)
FrankLeeeee Dec 26, 2025
69b679d
[Feat] Make num_workers configurable and fix 0-worker crash (#376)
yeshihai Dec 26, 2025
a686e3d
added regenerated datasets (#395)
FrankLeeeee Dec 27, 2025
866ca44
fixed benchmark process termination (#394)
FrankLeeeee Dec 27, 2025
b7febe8
added regenerated data processing for llama series (#396)
FrankLeeeee Dec 28, 2025
886ab9c
added specbundle to readme (#397)
FrankLeeeee Dec 28, 2025
10004e7
Merge branch 'main' into modal-labs/flash_attn
yubofredwang Dec 29, 2025
6742725
fix deps
yubofredwang Dec 29, 2025
5f18a47
lint
yubofredwang Jan 1, 2026
d75ba86
bump flash-attn
yubofredwang Jan 4, 2026
080bd28
update ci image
sleepcoo Jan 13, 2026
b849a2a
test fa3
sleepcoo Jan 13, 2026
1d6bbe5
fix bug
sleepcoo Jan 13, 2026
569f375
fix bug
sleepcoo Jan 13, 2026
498994f
fix bug
sleepcoo Jan 13, 2026
3e6e827
Update Docker image version in test workflow
FrankLeeeee Jan 14, 2026
42fef31
Update pip install command in test workflow
FrankLeeeee Jan 14, 2026
79d9411
Update pyproject.toml
FrankLeeeee Jan 14, 2026
18918fc
Add setuptools installation to workflow
FrankLeeeee Jan 14, 2026
45cad19
Update test.yaml
FrankLeeeee Jan 14, 2026
3a95e87
Update test.yaml
FrankLeeeee Jan 14, 2026
021d8f2
Refactor test workflow to eliminate redundancy
FrankLeeeee Jan 14, 2026
29 changes: 27 additions & 2 deletions .github/workflows/publish_docs.yaml
@@ -16,7 +16,7 @@ concurrency:
jobs:
deploy-github-pages:
runs-on: ubuntu-latest
if: github.repository == 'sgl-project/specforge'
if: github.repository == 'sgl-project/specforge' || github.repository == 'sleepcoo/SpecForge'
permissions:
contents: write
steps:
@@ -28,17 +28,42 @@ jobs:
with:
python-version: '3.13'

- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
cache-dependency-path: docs/spec_bundle/package-lock.json

- name: Install dependencies
run: |
apt-get update && apt-get install -y pandoc parallel retry
sudo apt-get update && sudo apt-get install -y pandoc parallel retry
pip install -r docs/requirements.txt

- name: Build spec bundle dashboard
run: |
# Copy logos to public directory
cp assets/logo.png docs/spec_bundle/public/logo.png
cp docs/_static/imgs/specbundle-logo.png docs/spec_bundle/public/specbundle-logo.png
cd docs/spec_bundle
npm ci
npm run build
# Clean up node_modules to prevent Sphinx from processing them
rm -rf node_modules
cd ..

- name: Build documentation
run: |
cd docs
make compile
make html
# Copy SpecBundle to root of output directory
mkdir -p _build/html/SpecBundle
cp -r spec_bundle/dist/* _build/html/SpecBundle/

- name: Add .nojekyll file
run: |
touch ./docs/_build/html/.nojekyll

- name: Deploy
uses: peaceiris/actions-gh-pages@v4
13 changes: 9 additions & 4 deletions .github/workflows/test.yaml
@@ -26,8 +26,12 @@ jobs:

- name: Restore cache
run: |
if [ -d /github/home/cache ] && [ ! -z "$(ls -A /github/home/cache/)" ]; then
cp -p -r /github/home/cache ./
fi

if [ -d /github/home/sf ] && [ ! -z "$(ls -A /github/home/sf/)" ]; then
cp -p -r /github/home/sf/* ./
cp -p -r /github/home/sf ./
fi

- name: Remove flashinfer # needed to prevent flashinfer JIT compilation from hanging the program
@@ -42,16 +46,17 @@
uv venv sf -p 3.11
fi
source sf/bin/activate
uv pip install -v . --prerelease=allow
uv pip install setuptools
uv pip install -v . --prerelease=allow --no-build-isolation

- name: Run test
timeout-minutes: 30
shell: bash
run: |
source sf/bin/activate
export PYTHONPATH=$PWD
python -m unittest discover -s ./tests -p "test_*.py" -v
python tests/test_utils/test_flash_attention.py

- name: Save cache
run: |
cp -p -r sf /github/home/
cp -p -r cache /github/home/
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -1,6 +1,11 @@
default_stages: [pre-commit, pre-push, manual]

repos:
- repo: https://github.com/PyCQA/autoflake
rev: v2.3.1
hooks:
- id: autoflake
args: [--remove-all-unused-imports, --in-place]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
19 changes: 18 additions & 1 deletion README.md
@@ -2,9 +2,11 @@
<img src="./assets/logo.png" alt="logo" width="400" margin="10px"></img>

[![documentation](https://img.shields.io/badge/📖-Documentation-red.svg?style=flat)](https://docs.sglang.ai/SpecForge/)
[![SpecBundle](https://img.shields.io/badge/🤗%20SpecBundle-yellow.svg?style=flat)](https://huggingface.co/collections/lmsys/specbundle)
[![DeepWiki](https://img.shields.io/badge/DeepWiki-SpecForge-blue.svg?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACwAAAAyCAYAAAAnWDnqAAAAAXNSR0IArs4c6QAAA05JREFUaEPtmUtyEzEQhtWTQyQLHNak2AB7ZnyXZMEjXMGeK/AIi+QuHrMnbChYY7MIh8g01fJoopFb0uhhEqqcbWTp06/uv1saEDv4O3n3dV60RfP947Mm9/SQc0ICFQgzfc4CYZoTPAswgSJCCUJUnAAoRHOAUOcATwbmVLWdGoH//PB8mnKqScAhsD0kYP3j/Yt5LPQe2KvcXmGvRHcDnpxfL2zOYJ1mFwrryWTz0advv1Ut4CJgf5uhDuDj5eUcAUoahrdY/56ebRWeraTjMt/00Sh3UDtjgHtQNHwcRGOC98BJEAEymycmYcWwOprTgcB6VZ5JK5TAJ+fXGLBm3FDAmn6oPPjR4rKCAoJCal2eAiQp2x0vxTPB3ALO2CRkwmDy5WohzBDwSEFKRwPbknEggCPB/imwrycgxX2NzoMCHhPkDwqYMr9tRcP5qNrMZHkVnOjRMWwLCcr8ohBVb1OMjxLwGCvjTikrsBOiA6fNyCrm8V1rP93iVPpwaE+gO0SsWmPiXB+jikdf6SizrT5qKasx5j8ABbHpFTx+vFXp9EnYQmLx02h1QTTrl6eDqxLnGjporxl3NL3agEvXdT0WmEost648sQOYAeJS9Q7bfUVoMGnjo4AZdUMQku50McDcMWcBPvr0SzbTAFDfvJqwLzgxwATnCgnp4wDl6Aa+Ax283gghmj+vj7feE2KBBRMW3FzOpLOADl0Isb5587h/U4gGvkt5v60Z1VLG8BhYjbzRwyQZemwAd6cCR5/XFWLYZRIMpX39AR0tjaGGiGzLVyhse5C9RKC6ai42ppWPKiBagOvaYk8lO7DajerabOZP46Lby5wKjw1HCRx7p9sVMOWGzb/vA1hwiWc6jm3MvQDTogQkiqIhJV0nBQBTU+3okKCFDy9WwferkHjtxib7t3xIUQtHxnIwtx4mpg26/HfwVNVDb4oI9RHmx5WGelRVlrtiw43zboCLaxv46AZeB3IlTkwouebTr1y2NjSpHz68WNFjHvupy3q8TFn3Hos2IAk4Ju5dCo8B3wP7VPr/FGaKiG+T+v+TQqIrOqMTL1VdWV1DdmcbO8KXBz6esmYWYKPwDL5b5FA1a0hwapHiom0r/cKaoqr+27/XcrS5UwSMbQAAAABJRU5ErkJggg==)](https://deepwiki.com/sgl-project/SpecForge)

[![github badge](https://img.shields.io/badge/📃%20LMSYS-Blog-black.svg?style=flat)](https://lmsys.org/blog/2025-07-25-spec-forge/)
[![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://sgl-fru7574.slack.com/archives/C09784E3EN6)
[![SGLang Eagle3](https://img.shields.io/badge/🤗%20Hugging%20Face-SGLang%20Eagle3-yellow.svg?style=flat)](https://huggingface.co/collections/lmsys/eagle-3-6886b2329f3998a8bc23f8ed)
[![license](https://img.shields.io/badge/License-MIT%202.0-blue)](./LICENSE)

</div>
@@ -21,8 +23,23 @@ We have seen many open-source projects for speculative decoding, but most of the

Check out [**our documentation**](https://docs.sglang.ai/SpecForge/) to get started.


## 🚀 Accelerate with SpecBundle

SpecBundle is a collection of production-grade speculative decoding models released by the SpecForge team and our industry partners. They deliver a higher acceptance rate than existing open-source checkpoints across a wide range of domains, and together with SGLang they can provide up to a 4x inference speedup. Check out our resources below:


| Item | Link |
| --- | --- |
| 📝 Documentation | [Link](https://docs.sglang.io/SpecForge/community_resources/specbundle.html) |
| 📊 Performance Dashboard | [Link](https://docs.sglang.io/SpecForge/SpecBundle/index.html) |
| 🤗 Hugging Face Collection | [Link](https://huggingface.co/collections/lmsys/specbundle) |
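
For a quick start, serving a draft checkpoint follows the same pattern as the benchmark commands in `benchmarks/README.md`. Below is a minimal sketch with illustrative model paths; the draft checkpoint shown is the Eagle3 model used in the benchmark examples, not necessarily a SpecBundle release, so substitute the pair that matches your target model.

```shell
# Sketch: serve a target model with an EAGLE3 draft model via SGLang.
# Replace both paths with your target model and its matching draft checkpoint.
python3 -m sglang.launch_server \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path lmsys/sglang-EAGLE3-LLaMA3.1-Instruct-8B \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4 \
    --tp 1 \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
```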


## 🎉 News

- [2025-12] 🎉 Released SpecBundle (phase 1) and SpecForge v0.2. Check out our blog at [LMSYS.org](https://lmsys.org/blog/2025-12-23-spec-bundle-phase-1/)
- [2025-12] 🔔 Released the roadmap for 2026 Q1.
- [2025-08] 🔔 SpecForge is listed as a [flagship project](https://lmsys.org/about/) in LMSYS. Congratulations to the SpecForge team!
- [2025-08] 🔥 SpecForge powered the Eagle3 draft model for GPT-OSS. Check out the blog at [LMSYS.org](https://lmsys.org/blog/2025-08-27-gpt-oss/)
- [2025-07] 🔥 SpecForge is released together with Llama4-Eagle3 checkpoints. Check out our blog at [LMSYS.org](https://lmsys.org/blog/2025-07-25-spec-forge/)
1 change: 1 addition & 0 deletions benchmarks/.gitignore
@@ -1 +1,2 @@
*.jsonl
results/
83 changes: 39 additions & 44 deletions benchmarks/README.md
@@ -1,72 +1,67 @@
# Benchmarking for Speculative Decoding

## Setup
## Overview

You can create a new environment and install SGLang with the following command:
We provide a unified script to benchmark the performance of speculative decoding with the EAGLE3 algorithm on multiple datasets. Follow the steps below to run the benchmarks.

```bash
# create virtual env
uv venv sglang -p 3.11
source sglang/bin/activate
## Run Benchmarks

# install sglang
uv pip install "sglang[all]>=0.4.9.post2"
```
### Launch SGLang and Benchmarker Concurrently

You can serve your trained model with SGLang with the following command by replacing the `<target-model-path>` and `<draft-model-path>` with the actual path to the target model and draft model.
`bench_eagle3.py` launches an SGLang server process and a benchmarking process concurrently, so you don't have to start the SGLang server yourself; the script handles the server launch for each speculative decoding configuration. Some important arguments are:
- `--model-path`: the path to the target model.
- `--speculative-draft-model-path`: the path to the draft model.
- `--port`: the port to launch the SGLang server.
- `--trust-remote-code`: trust remote code when loading the model.
- `--mem-fraction-static`: the fraction of GPU memory reserved for static allocation (model weights and KV cache).
- `--tp-size`: the tensor parallelism size.
- `--attention-backend`: the attention backend.
- `--config-list`: the list of speculative decoding configurations to test; each entry has the format `<batch-size>,<num-steps>,<topk>,<num-draft-tokens>`.
- `--benchmark-list`: the list of benchmarks to run; each entry has the format `<benchmark-name>:<num-prompts>:<subset>`.
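
For example, in the command below, `--config-list 1,0,0,0 1,3,1,4` requests two runs: one we read as a plain-decoding baseline (`1,0,0,0`) and one EAGLE3 run with batch size 1, 3 draft steps, top-k 1, and 4 draft tokens; `--benchmark-list mtbench gsm8k:5 ceval:5:accountant` runs MT-Bench with its default prompt count, 5 GSM8K prompts, and 5 prompts from the `accountant` subset of C-Eval. These readings follow the formats above and are not spelled out explicitly in this section.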

```bash
python3 -m sglang.launch_server \
--model <target-model-path> \
--speculative-algorithm EAGLE3 \
--speculative-draft-model-path <draft-model-path> \
--speculative-num-steps 3 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4 \
--mem-fraction-static 0.75 \
--cuda-graph-max-bs 2 \
--tp 1 \
--context-length 8192 \
--trust-remote-code \
--host 0.0.0.0 \
```shell
python3 bench_eagle3.py \
--model-path meta-llama/Llama-3.1-8B-Instruct \
--speculative-draft-model-path lmsys/sglang-EAGLE3-LLaMA3.1-Instruct-8B \
--port 30000 \
--trust-remote-code \
--mem-fraction-static 0.8 \
--tp-size 1 \
--attention-backend fa3 \
--config-list 1,0,0,0 1,3,1,4 \
--benchmark-list mtbench gsm8k:5 ceval:5:accountant \
--dtype bfloat16
```

## Run Benchmarks
### Launch Benchmarker Independently

You first need to start the SGLang server:
If you prefer to launch the SGLang server yourself, start it with the following command and then run the benchmarker against it.

```bash
```shell
# you can launch a server
python3 -m sglang.launch_server \
--model <target-model-path> \
--model meta-llama/Llama-3.1-8B-Instruct \
--speculative-algorithm EAGLE3 \
--speculative-draft-model-path <draft-model-path> \
--speculative-draft-model-path lmsys/sglang-EAGLE3-LLaMA3.1-Instruct-8B \
--speculative-num-steps 3 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4 \
--mem-fraction-static 0.75 \
--cuda-graph-max-bs 2 \
--tp 8 \
--context-length 8192 \
--cuda-graph-max-bs 1 \
--tp 1 \
--trust-remote-code \
--host 0.0.0.0 \
--port 30000 \
--dtype bfloat16
```

Then you can run the benchmarks:
Then you can start benchmarking. Use the same host and port as the running SGLang server, and pass `--skip-launch-server` so the script does not launch another server.

```bash
# GSM8K
python run_gsm8k.py

# MATH-500
python run_math500.py

# MTBench
python run_mtbench.py

# HumanEval
python run_humaneval.py
python bench_eagle3.py \
--model-path meta-llama/Llama-3.1-8B-Instruct \
--port 30000 \
--config-list 1,3,1,4 \
--benchmark-list mtbench:5 ceval:5:accountant gsm8k:5 humaneval:5 math500:5 mtbench:5 aime:1 \
--skip-launch-server
```