Merge main back to flash_attn #1

yubofredwang · 2025-12-29T04:50:28Z

Motivation

Resolve merge conflicts so that sgl-project#314 can be merged.

Modifications

Related Issues

Accuracy Test

Benchmark & Profiling

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://sgl-fru7574.slack.com/archives/C09784E3EN6 to discuss your PR.

* added tests for scripts * polish

* support checkpoint
* lint
* capture only required hidden states
* revert regen
* fix llama
* backward compatible
* Update specforge/modeling/target/custom_backend/qwen3_moe.py
* gemini suggests
* fix
* fix phi

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* add missing
* Update specforge/modeling/target/custom_backend/qwen3_moe.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Add --is-preformatted flag to prepare_hidden_states.py

  Added support for preformatted input data in prepare_hidden_states.py, matching the existing flag in train_eagle3.py. This allows users to skip chat template application when their data already has the template applied.

  Changes:
  - Added --is-preformatted argument to data group
  - Updated cache key to include is_preformatted for proper caching
  - Pass is_preformatted to build_eagle3_dataset()

* Update documentation for --is-preformatted flag in prepare_hidden_states.py

  - Updated script docstring with usage example for --is-preformatted
  - Updated data_preparation.md to document --is-preformatted for offline training

* Address code review: add --output-path to docstring example

  Added back the --output-path argument to the first usage example in the docstring for clarity and consistency with the pre-formatted data example.
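The --is-preformatted change described above can be illustrated with a minimal sketch. This is not the actual code in prepare_hidden_states.py: only the flag name `--is-preformatted`, the "data" argument group, and the idea of including `is_preformatted` in the cache key come from the commit message. The other arguments (`--data-path`, `--chat-template`), the helper names, and the choice of SHA-256 are assumptions made for illustration.

```python
import argparse
import hashlib
import json


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical, trimmed-down parser; only --is-preformatted comes from the PR text."""
    parser = argparse.ArgumentParser(description="Sketch of a data-preparation CLI")
    data_group = parser.add_argument_group("data")
    # --data-path and --chat-template are illustrative stand-ins, not confirmed flags.
    data_group.add_argument("--data-path", type=str, required=True)
    data_group.add_argument("--chat-template", type=str, default="llama3")
    data_group.add_argument(
        "--is-preformatted",
        action="store_true",
        help="Input text already has the chat template applied; do not apply it again.",
    )
    return parser


def dataset_cache_key(args: argparse.Namespace) -> str:
    """Hash every option that changes the processed dataset, including is_preformatted."""
    payload = {
        "data_path": args.data_path,
        "chat_template": args.chat_template,
        "is_preformatted": args.is_preformatted,
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Same input file, parsed once without and once with the flag.
raw_args = build_parser().parse_args(["--data-path", "train.jsonl"])
pre_args = build_parser().parse_args(["--data-path", "train.jsonl", "--is-preformatted"])
```

Because the flag is part of the hashed payload, a dataset cached without the flag is not reused when the same file is later processed with it, which is the failure mode that "proper caching" in the commit message appears to guard against.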

* Publish spec-bundle dashboard
* Update publish_docs.yaml
* fix bug
* refactor: migrate benchmark dashboard from XLSX to JSON data format
* fix code format

Co-authored-by: tony3liu <[email protected]>
Co-authored-by: liutao <[email protected]>

…ng (#378)

* feat: add training support for DeepSeek-V3 EAGLE-3 speculative decoding
* fix: correct values in deepseek-v3-671b-eagle3.json
* chore: update examples and templates for DeepSeek-V3 EAGLE-3

Co-authored-by: chenyefei.cyf <[email protected]>
Co-authored-by: GeLee-Q <[email protected]>
Co-authored-by: Gao016 <[email protected]>
Co-authored-by: yzlnew <[email protected]>

* add ds v3
* init
* modify sglang fit deepseek
* fix deepseek rparser
* ulysses finish
* ring offline finish
* tmp
* test pass
* test fail
* test
* clean up
* remove deepseek
* clean up
* clean up
* -
* -
* format
* fix unit test

Co-authored-by: Yu Feng <[email protected]>
Co-authored-by: daiyajun <[email protected]>

* feat: make dataloader num_workers configurable and fix prefetch_factor issue

  1. scripts/train_eagle3.py:
     - Added a num_workers argument (default: 4) to replace the hardcoded value.
     - This allows adjusting the worker count for low shared memory environments or debugging.
  2. specforge/data/utils.py:
     - Fixed the case where num_workers is 0 by forcing prefetch_factor to None.

* Fix: add num_workers argument and fix dataloader bug

Co-authored-by: yeshihai <[email protected]>
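The prefetch_factor fix described above can be sketched as a small helper. This is an illustration, not the code in specforge/data/utils.py: the function name `build_dataloader_kwargs` and its parameters are hypothetical. The underlying constraint is that PyTorch's `DataLoader` rejects a non-None `prefetch_factor` when `num_workers` is 0, because prefetching only applies to worker processes.

```python
from typing import Any, Dict, Optional


def build_dataloader_kwargs(
    batch_size: int,
    num_workers: int = 4,
    prefetch_factor: Optional[int] = 2,
) -> Dict[str, Any]:
    """Return keyword arguments that are safe to pass to torch.utils.data.DataLoader.

    Hypothetical helper: the default of 4 workers mirrors the commit message,
    while the prefetch_factor default of 2 is an assumption for illustration.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    if num_workers == 0:
        # Single-process loading: there are no workers to prefetch batches,
        # so prefetch_factor must be None.
        prefetch_factor = None
    return {
        "batch_size": batch_size,
        "num_workers": num_workers,
        "prefetch_factor": prefetch_factor,
    }


# Debugging or low shared memory: load in the main process.
debug_kwargs = build_dataloader_kwargs(batch_size=8, num_workers=0)
# Default configuration: four workers with prefetching enabled.
default_kwargs = build_dataloader_kwargs(batch_size=8)
```

The resulting dictionary would be unpacked into the loader, for example `DataLoader(dataset, **debug_kwargs)`, so that setting the worker count to 0 from the command line does not trigger the prefetch_factor error.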

FrankLeeeee and others added 30 commits November 23, 2025 12:36

added sglang arguments (#317)

00edc17

unified benchmark scripts (#319)

72337ef

* unified benchmark scripts * polish

fixed data regeneration script (#321)

95cb2ae

fix ckpt dir check (#320)

f6ec513

Co-authored-by: baojiangnan <[email protected]>

support gen hidden states use fp8 (#318)

d582d7d

* add sglang args in gen hidden states * polish * polish * polish

Add subset options for opc (#312)

34b5883

* Add subset options for opc * lint & cat datasets

Fixed the installation command

d960896

organized unit tests (#324)

1e3fb6e

fixed non-runnable examples (#322)

44409f6

* fixed non-runnable examples * polish * polish

merged data generation scripts (#323)

341abf5

Fix args type (#328)

ed30525

* fix missing import * fix-args-type

added autoflakes pre-commit hook (#327)

04a6bcf

fixed specforge imports (#332)

70f5187

bump to v0.1.1 (#330)

8dff2b7

Support more sampling params in data generation (#333)

9b05770

* support more sampling params * remove recommended * some comments * lint

Add qwen3-coder-30B-A3B-Instruct Eagle3 Training Script (#329)

b77e6f7

* Add examples of qwen3-coder-30B-A3B training script * tiny fix * Remove WANDB API key export from script to align with other examples Removed Weights & Biases configuration from script.

fix mmstart benchmark (#334)

5c43694

updated benchmark docs (#340)

3e0cda0

* updated benchmark docs * polish * polish

grouped args for better reference (#343)

44d5c62

* grouped args for better reference * grouped args for better reference

added profiling (#344)

3bca52c

Feature/online train use hf backend optimize GPU usage (#346)

94de9f8

* feature: optimize online training with the hf backend to use less GPU memory * polish * polish * polish

added model-download-dir (#347)

5c355b8

* added model-download-dir * polish

fix: is_running to get_run (#353)

c65a358

add default build_dataset_num_proc value (#354)

dc44caf

fixed kv head replication in qwen3 moe (#357)

9639a52

* fixed kv head replication in qwen3 moe * polish * polish

[Docs] add benchmark reference (#358)

e0625b0

* docs: add benchmark reference * polish * polish * polish

optimized sglang backend memory usage (#359)

e012016

* optimized sglang backend memory usage * polish

sleepcoo and others added 25 commits December 12, 2025 16:05

update sglang && support qwen3 next (#355)

381476b

* support qwen3 next * fix bug * fix bug * update sglang

remove unused code (#367)

86c1749

added more benchmarks (#369)

ef165ac

* added more benchmarks * polish * polish * polish

added deepwiki badge (#370)

901c868

fixed benchmarks (#372)

19e84eb

* fixed benchmarks * polish

feat: add support for Qwen3-Coder-480B-A35B-Instruct-FP8 training (#371)

f656ae7

- Add training script for Qwen3-Coder-480B-A35B-Instruct-FP8 with EP support - Fix MOE_TP group initialization to properly handle ep_size > 1

added specbundle doc (#383)

1c17635

* added specbundle doc * polish

fixed doc build (#384)

157745d

added link to specbundle (#385)

106874d

bump version to v0.2.0 (#386)

73e6f80

added dashboard link (#387)

e30518a

Support Qwen3, Qwen3-Next, Kimi-K2, and DeepSeek model templates (#381)

280fab9

* support thinking models * update kimi-k2 and deepseek * polish * fix lint * fix kimi-k2 and gpt-oss * fix lint * update parse to handle the boundary token

Co-authored-by: Shenggui Li <[email protected]>

fixed templates (#389)

5660635

* fixed templates * polish

corrected llama3 examples (#391)

4ac6bb7

added regenerated datasets (#395)

a686e3d

* added regenerated datasets * polish

fixed benchmark process termination (#394)

866ca44

added regenerated data processing for llama series (#396)

b7febe8

added specbundle to readme (#397)

886ab9c

Merge branch 'main' into modal-labs/flash_attn

10004e7

fix deps

6742725

yubofredwang mentioned this pull request Dec 29, 2025

feat: added low VRAM flash attention backend sgl-project/SpecForge#314

Open

6 tasks

yubofredwang added 2 commits January 1, 2026 15:31

lint

5f18a47

bump flash-attn

d75ba86

14 participants
