
[vllm, rollout, cfg, doc] feat: Accelerate RL rollouts with EAGLE/EAGLE3 speculative decoding#5925

Open
alekseymalakhov11 wants to merge 8 commits into verl-project:main from alekseymalakhov11:add-eagle-speculative-decoding

Conversation

@alekseymalakhov11

What does this PR do?

Add inference-only speculative decoding for vLLM rollouts using EAGLE / EAGLE3 draft models. This PR adds the vLLM rollout/config wiring for the new speculative decoding path, reload handling for the draft model during weight sync, rollout metrics for speculative decoding, and documentation.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: [rollout] feat: support eagle3 speculative decode in rollout
    [rollout] feat: support eagle3 speculative decode in rollout #5509

  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)

    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

[Screenshots (2026-04-08): reward curves and speculative decoding metrics from the test runs]

We used Qwen3-8B together with RedHatAI/Qwen3-8B-speculator.eagle3 as the draft model on verl-team/lighteval-MATH-preprocessed.

In our experiments, the final reward is comparable to the baseline. The screenshots also highlight two implementation details that are important for making this work correctly in RL rollout: reloading the draft-model weights after actor weight sync, and rebuilding the RoPE cos_sin cache after reload. They also show speculative decoding metrics such as acceptance rate and mean acceptance length, so the behavior and performance of EAGLE/EAGLE3 can be analyzed directly during training.

In some settings, the draft model's performance may degrade (its acceptance length drops). Even with that limitation, this change opens a useful direction for more practical research on speculative decoding in RL for LLMs, and makes it easier to explore follow-up ideas such as retraining or adapting the draft model.
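As background for why acceptance length matters, here is an illustrative back-of-the-envelope sketch (the helper name and the numbers are assumptions for illustration, not measurements from this PR): each target-model verification step emits the accepted draft tokens plus one token sampled by the target model itself, so mean acceptance length directly bounds the per-step token yield.

```python
def expected_tokens_per_step(mean_accepted: float) -> float:
    """Each target-model forward verifies the draft and emits the
    accepted draft tokens plus one token from the target model."""
    return mean_accepted + 1.0

# If the draft model averages 2 accepted tokens per step, each target
# forward yields ~3 tokens instead of 1 -- an upper bound on speedup,
# before accounting for the draft model's own overhead.
yield_per_step = expected_tokens_per_step(2.0)  # -> 3.0
```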

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this
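A hypothetical sketch of what the rollout configuration might look like. Apart from `speculative_decoding`, `draft_tensor_parallel_size`, and `tensor_model_parallel_size`, which appear in the diff, the key names and values below are assumptions, not the PR's actual schema:

```python
# Hypothetical rollout config sketch; exact key names may differ from the PR.
rollout_config = {
    "tensor_model_parallel_size": 1,
    "speculative_decoding": {
        "method": "eagle3",  # assumed key: EAGLE or EAGLE3
        # Draft model used in the Test section of this PR.
        "model": "RedHatAI/Qwen3-8B-speculator.eagle3",
        "num_speculative_tokens": 3,  # assumed key
        # Validated by the config: must be 1 or equal to the target
        # model's tensor_model_parallel_size.
        "draft_tensor_parallel_size": 1,
    },
}
```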

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces support for inference-only speculative decoding in vLLM rollouts, including the necessary configuration updates, metrics tracking, and weight-loading adjustments. I have identified two issues: incorrect handling of the acceptance rate when no draft tokens are generated, and a malformed f-string in a validation error message.

Comment on lines +1180 to +1182
spec_delta["num_accepted_tokens"] / spec_delta["num_draft_tokens"]
if spec_delta["num_draft_tokens"] > 0
else float("inf")

high

The calculation for acceptance_rate when spec_delta["num_draft_tokens"] is 0 results in float("inf"). This is likely incorrect and can cause issues with metric aggregation (e.g., np.mean over inf will be inf). When no draft tokens are generated, the acceptance rate should be 0.0.

Suggested change
- spec_delta["num_accepted_tokens"] / spec_delta["num_draft_tokens"]
- if spec_delta["num_draft_tokens"] > 0
- else float("inf")
+ spec_delta["num_accepted_tokens"] / spec_delta["num_draft_tokens"]
+ if spec_delta["num_draft_tokens"] > 0
+ else 0.0
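For reference, the guarded metric can be factored into a small standalone helper. This is a sketch; the `acceptance_rate` name and signature are illustrative, not the PR's code:

```python
def acceptance_rate(num_accepted_tokens: int, num_draft_tokens: int) -> float:
    """Fraction of draft tokens the target model accepted.

    Returns 0.0 when no draft tokens were proposed, so aggregates
    such as np.mean over a batch of rates stay finite.
    """
    if num_draft_tokens > 0:
        return num_accepted_tokens / num_draft_tokens
    return 0.0

acceptance_rate(6, 8)  # -> 0.75
acceptance_rate(0, 0)  # -> 0.0, not inf
```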

Comment on lines +369 to +373
raise ValueError(
f"draft_tensor_parallel_size={self.speculative_decoding.draft_tensor_parallel_size} "
"cannot be other value than 1 or target model "
"tensor_parallel_size={self.tensor_model_parallel_size} "
)

high

The f-string for the ValueError message is malformed. The variable self.tensor_model_parallel_size is outside the curly braces, so it will be printed literally instead of its value being interpolated. This will produce a confusing error message for users.

Suggested change
- raise ValueError(
-     f"draft_tensor_parallel_size={self.speculative_decoding.draft_tensor_parallel_size} "
-     "cannot be other value than 1 or target model "
-     "tensor_parallel_size={self.tensor_model_parallel_size} "
- )
+ raise ValueError(
+     f"draft_tensor_parallel_size={self.speculative_decoding.draft_tensor_parallel_size} "
+     "cannot be other value than 1 or target model "
+     f"tensor_parallel_size={self.tensor_model_parallel_size}"
+ )
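A minimal standalone demonstration of the pitfall, with illustrative values rather than the PR's code: only string literals carrying the `f` prefix interpolate `{...}` placeholders, and in implicitly concatenated literals each piece needs its own prefix.

```python
tensor_model_parallel_size = 4

# Without the f prefix, the placeholder is kept literally:
broken = "tensor_parallel_size={tensor_model_parallel_size}"
# -> 'tensor_parallel_size={tensor_model_parallel_size}'

# With the f prefix, the value is interpolated:
fixed = f"tensor_parallel_size={tensor_model_parallel_size}"
# -> 'tensor_parallel_size=4'
```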
