[Speculative Decoding] Support suffix decoding #6403

Deleter-D · 2026-02-09T08:22:09Z

Motivation

Suffix decoding is a model-free speculative decoding method that accelerates repetitive inference workloads (e.g., agent workflows, code generation). It predicts draft tokens with efficient CPU-based suffix trees, so drafting needs no draft model and adds no GPU overhead.
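
To illustrate the idea, here is a minimal sketch of suffix-based drafting. It is not the implementation in this PR: it scans a single token sequence linearly instead of using a suffix tree, and the function name `propose_draft` and its parameters are hypothetical.

```python
from typing import List, Sequence


def propose_draft(
    context: Sequence[int],
    max_suffix_len: int = 8,  # hypothetical cap on the suffix length to match
    max_draft: int = 4,       # hypothetical cap on the number of draft tokens
) -> List[int]:
    """Propose draft tokens by matching the longest recent suffix of
    `context` against an earlier position in the same sequence."""
    n = len(context)
    # Try the longest suffix first, then progressively shorter ones.
    for suffix_len in range(min(max_suffix_len, n - 1), 0, -1):
        suffix = list(context[n - suffix_len:])
        # Scan backwards so the most recent earlier occurrence wins.
        # The start bound guarantees at least one token follows the match.
        for start in range(n - suffix_len - 1, -1, -1):
            if list(context[start:start + suffix_len]) == suffix:
                follow = start + suffix_len
                return list(context[follow:follow + max_draft])
    return []  # no repetition found, so nothing to speculate


history = [1, 2, 3, 4, 5, 1, 2, 3]
draft = propose_draft(history)
```

In this example the trailing tokens `1, 2, 3` also occur at the start of the sequence, so the tokens that followed them there are proposed as the draft. The target model would then verify the draft in a single forward pass, as in other speculative decoding methods.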

Modifications

Support suffix decoding
Refine CUDA Graph replay selection in speculative decoding

Usage or Command

python -m fastdeploy.entrypoints.openai.api_server \
    --model ${path_to_main_model} \
    --tensor-parallel-size 4 \
    --config ${path_to_FastDeploy}benchmarks/yaml/eb45t-32k-wint4-mtp-h100-tp4.yaml \
    --speculative-config '{"method": "mtp", "num_speculative_tokens": 4, "suffix_decoding_max_tree_depth": 64, "suffix_decoding_max_cached_requests": 10000, "suffix_decoding_max_spec_factor": 1.0, "suffix_decoding_min_token_prob": 0.1}'

Parameter Descriptions

# The maximum length of token sequences cached in suffix trees.
self.suffix_decoding_max_tree_depth: int = 64

# The maximum number of requests that can be stored in the cache.
self.suffix_decoding_max_cached_requests: int = -1

# The scaling factor applied to the matched length: num_draft_tokens = suffix_decoding_max_spec_factor * matched_length
self.suffix_decoding_max_spec_factor: float = 1.0

# The probability threshold for speculated tokens.
self.suffix_decoding_min_token_prob: float = 0.1
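
As a rough illustration of how these parameters might interact when choosing draft tokens, consider the sketch below. The helper `select_draft_tokens`, the `(token, probability)` candidate format, and the cap by `num_speculative_tokens` are assumptions made for this example, not code taken from the PR.

```python
from typing import List, Tuple


def select_draft_tokens(
    candidates: List[Tuple[int, float]],  # hypothetical (token_id, estimated probability) pairs
    matched_length: int,
    max_spec_factor: float = 1.0,
    min_token_prob: float = 0.1,
    num_speculative_tokens: int = 4,
) -> List[int]:
    """Pick draft tokens from a candidate continuation (illustrative only)."""
    # Budget follows num_draft_tokens = max_spec_factor * matched_length.
    # Capping it by num_speculative_tokens is an assumption of this sketch.
    budget = min(num_speculative_tokens, int(max_spec_factor * matched_length))
    draft: List[int] = []
    for token, prob in candidates[:budget]:
        # Stop at the first token whose probability falls below the threshold.
        if prob < min_token_prob:
            break
        draft.append(token)
    return draft


candidates = [(11, 0.9), (12, 0.5), (13, 0.05), (14, 0.8)]
picked = select_draft_tokens(candidates, matched_length=3)
```

With a matched length of 3 and a factor of 1.0, the budget is three tokens; the third candidate falls below the 0.1 threshold, so only the first two are kept.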

Accuracy Tests

N/A

Checklist

Add at least one tag to the PR title.
- Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- You can add new tags based on the PR content, but their semantics must be clear.
Format your code and run pre-commit before committing.
Add unit tests. If no unit tests are added, explain the reason in this PR.
Provide accuracy results.
If the current PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-02-09T08:22:15Z

Thanks for your contribution!

Deleter-D had a problem deploying to Metax_ci February 9, 2026 08:22 — with GitHub Actions Failure

Deleter-D had a problem deploying to Metax_ci February 9, 2026 10:04 — with GitHub Actions Failure

Deleter-D added 3 commits February 9, 2026 22:12

support suffix decoding

7a485f1

add ut

8fb8318

solve conflict

b1a769c

Deleter-D force-pushed the dev_suffix_tree branch from 987c263 to b1a769c Compare February 10, 2026 04:12

Deleter-D temporarily deployed to Metax_ci February 10, 2026 04:12 — with GitHub Actions Inactive

simplify log

46a14dd

Deleter-D temporarily deployed to Metax_ci February 10, 2026 06:21 — with GitHub Actions Inactive

fix requirements

b0b995a

Deleter-D temporarily deployed to Metax_ci February 10, 2026 14:17 — with GitHub Actions Inactive
