
refactor: optimize speculative decoding pipeline. #974

Open
RobbieLeung wants to merge 1 commit into jd-opensource:main from RobbieLeung:refactor/mtp_input

Conversation

@RobbieLeung
Collaborator

Wait for #960.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and well-executed refactoring of the speculative decoding pipeline. The introduction of spec_input_builder centralizes input construction logic, greatly improving code clarity and maintainability. The EmbeddingCache has been effectively redesigned to better support the specific needs of speculative decoding, and the worker implementations (MTPWorkerImpl, Eagle3WorkerImpl) are now more modular. Overall, these changes represent a substantial improvement to the architecture. I have one critical point regarding a potential crash scenario due to a strong assumption about the output of the rejection sampler.


3 participants