[vllm, rollout, cfg, doc] feat: Accelerate RL rollouts with EAGLE/EAGLE3 speculative decoding#5925
Code Review
This pull request introduces support for inference-only speculative decoding in vLLM rollout, including necessary configuration updates, metrics tracking, and weight loading adjustments. I have identified two issues: an incorrect handling of the acceptance rate when no draft tokens are generated, and a malformed f-string in a validation error message.
```python
spec_delta["num_accepted_tokens"] / spec_delta["num_draft_tokens"]
if spec_delta["num_draft_tokens"] > 0
else float("inf")
```
The calculation for acceptance_rate when spec_delta["num_draft_tokens"] is 0 results in float("inf"). This is likely incorrect and can cause issues with metric aggregation (e.g., np.mean over inf will be inf). When no draft tokens are generated, the acceptance rate should be 0.0.
Suggested change:

```diff
 spec_delta["num_accepted_tokens"] / spec_delta["num_draft_tokens"]
 if spec_delta["num_draft_tokens"] > 0
-else float("inf")
+else 0.0
```
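To make the reviewer's point concrete, here is a small self-contained sketch (a hypothetical helper, not code from this PR) showing why `0.0` is the safer sentinel when the metric is later averaged:

```python
import math

def acceptance_rate(num_accepted: int, num_draft: int) -> float:
    # Mirror the suggested fix: report 0.0 when no draft tokens were
    # generated, so that averaging the metric is never poisoned by inf.
    return num_accepted / num_draft if num_draft > 0 else 0.0

# One step produced no draft tokens; with float("inf") as the fallback,
# the mean over these steps would itself be inf.
rates = [acceptance_rate(a, d) for a, d in [(3, 4), (0, 0), (2, 2)]]
mean_rate = sum(rates) / len(rates)
assert math.isfinite(mean_rate)
```

The same reasoning applies to `np.mean` or any other aggregation used by the metrics tracker: a single `inf` sample makes the aggregate `inf`.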
```python
raise ValueError(
    f"draft_tensor_parallel_size={self.speculative_decoding.draft_tensor_parallel_size} "
    "cannot be other value than 1 or target model "
    "tensor_parallel_size={self.tensor_model_parallel_size} "
)
```
The f-string for the ValueError message is malformed. The variable self.tensor_model_parallel_size is outside the curly braces, so it will be printed literally instead of its value being interpolated. This will produce a confusing error message for users.
Suggested change:

```diff
 raise ValueError(
     f"draft_tensor_parallel_size={self.speculative_decoding.draft_tensor_parallel_size} "
     "cannot be other value than 1 or target model "
-    "tensor_parallel_size={self.tensor_model_parallel_size} "
+    f"tensor_parallel_size={self.tensor_model_parallel_size}"
 )
```
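A minimal standalone illustration of the missing `f` prefix (using a stand-in variable rather than the actual config object):

```python
tensor_model_parallel_size = 4  # stand-in for self.tensor_model_parallel_size

# Without the f prefix, the braces are emitted literally:
broken = "tensor_parallel_size={self.tensor_model_parallel_size}"
# With the f prefix, the value is interpolated:
fixed = f"tensor_parallel_size={tensor_model_parallel_size}"

print(broken)  # tensor_parallel_size={self.tensor_model_parallel_size}
print(fixed)   # tensor_parallel_size=4
```

Note that in Python, adjacent string literals are concatenated at compile time, but the `f` prefix applies only to the literals that carry it, which is how one segment of a multi-line message can silently lose interpolation.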
What does this PR do?
Checklist Before Starting
Search for similar PRs. Paste at least one query link here: [rollout] feat: support eagle3 speculative decode in rollout #5509
Format the PR title as `[{modules}] {type}: {description}` (this will be checked by the CI).
- `{modules}` include `fsdp`, `megatron`, `veomni`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`, `fully_async`, `one_step_off`, like `[megatron, fsdp, doc]`.
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`.
- For a breaking change, add `[BREAKING]` to the beginning of the title, like `[BREAKING][fsdp, megatron] feat: dynamic batching`.

Test
We used `qwen3-8b` together with `RedHatAI/Qwen3-8B-speculator.eagle3` on `verl-team/lighteval-MATH-preprocessed`. In our experiments, the final reward is comparable to the baseline.

The screenshots also highlight two implementation details that are important for making this work correctly in RL rollout: reloading the draft-model weights after actor weight sync, and rebuilding the RoPE `cos_sin_cache` after the reload. They also show speculative decoding metrics such as acceptance rate and mean acceptance length, so the behavior and performance of EAGLE/EAGLE3 can be analyzed directly during training.

In some settings, the performance of the draft model may degrade (the acceptance length drops). Even with that limitation, this change opens a useful direction for more practical research on speculative decoding in RL for LLMs, and makes it easier to explore follow-up ideas such as retraining or adapting the draft model.
API and Usage Example
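A hedged sketch of what the rollout config might look like. The field names `speculative_decoding.draft_tensor_parallel_size` and `tensor_model_parallel_size` appear in the validation code reviewed above; `method`, `model`, and `num_speculative_tokens` are assumptions mirroring vLLM's speculative-decoding options and may differ from the final schema:

```yaml
rollout:
  name: vllm
  tensor_model_parallel_size: 1
  speculative_decoding:              # assumed field names; see validation code above
    method: eagle3                   # EAGLE or EAGLE3
    model: RedHatAI/Qwen3-8B-speculator.eagle3  # draft model used in the Test section
    num_speculative_tokens: 3        # illustrative value
    draft_tensor_parallel_size: 1    # must be 1 or equal to tensor_model_parallel_size
```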
```python
# Add code snippet or script demonstrating how to use this
```

Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Run `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`.
- Once the PR is ready for CI, request it in the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- If the PR changes the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.