-
-
Notifications
You must be signed in to change notification settings - Fork 10.6k
[Bugfix] Padded Eagle Specdec with Chunked Prefill #26263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Rémi Delacourt <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request provides a bugfix for padded speculative decoding, specifically within the prepare_next_token_ids_padded
function. The change removes a faulty special case for single-token generation (max_gen_len == 1
) that did not correctly handle discarded requests. This could have resulted in invalid token IDs being used in subsequent steps. The new implementation generalizes the validity masking logic, making it simpler, more robust, and correct for all scenarios. The fix is sound and improves the correctness of the speculative decoding implementation.
valid_mask = (valid_sampled_token_ids_gpu != -1) & ( | ||
valid_sampled_token_ids_gpu < gpu_input_batch.vocab_size | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This simplification is a great improvement. By removing the special case for max_gen_len == 1
, the code is now more robust. The previous logic didn't account for discarded requests when max_gen_len == 1
, which could lead to using an invalid token ID of -1. This unified approach correctly handles all cases.
Signed-off-by: Rémi Delacourt <[email protected]>
This bug has been fixed in #26231. I think it would still be nice to merge updated tests, so please update your PR with the fix from main if you wish to continue. |
@benchislett sounds good! Done. |
Signed-off-by: Rémi Delacourt <[email protected]>
Purpose
This adds test to #26231
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md
andexamples
for a new model.