Closed
Labels
AutoDeploy (AutoDeploy Backend), Speculative Decoding (MTP/Eagle/Medusa/Lookahead/Prompt-Lookup-Decoding/Draft-Target-Model/ReDrafter)
Description
🚀 The feature, motivation and pitch
Currently we have no good way to write smoke tests for speculative decoding in AutoDeploy, because we cannot prevent the draft model (whether it is a full-fledged draft model or Eagle) from loading its weights. The draft weights are not large, but skipping them may still be worthwhile.
It would be good to have lightweight smoke tests for this. As a first step, check whether it is easy to skip weight loading for the draft model.
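For illustration only, here is a minimal sketch of what "skip loading weights" can look like in plain PyTorch: build the module on the `meta` device so no checkpoint is read, then materialize it with random values. This is not AutoDeploy's API. `TinyDrafter` and `build_without_checkpoint` are hypothetical names, and the sketch assumes PyTorch 2.0 or newer.

```python
import torch
from torch import nn


class TinyDrafter(nn.Module):
    """Hypothetical stand-in for a draft model; not a real drafter architecture."""

    def __init__(self, vocab_size: int = 32, hidden_size: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.proj = nn.Linear(hidden_size, vocab_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.proj(self.embed(input_ids))


def build_without_checkpoint(factory, device: str = "cpu", seed: int = 0) -> nn.Module:
    """Construct a module without reading any checkpoint (hypothetical helper)."""
    # Parameters created under the meta device have shapes and dtypes but no storage.
    with torch.device("meta"):
        model = factory()
    # Allocate uninitialized storage on the real device.
    model = model.to_empty(device=device)
    # Fill parameters with seeded random values so the smoke test is reproducible.
    # Buffers, if a model has any, would need their own initialization.
    gen = torch.Generator(device=device).manual_seed(seed)
    with torch.no_grad():
        for param in model.parameters():
            param.normal_(mean=0.0, std=0.02, generator=gen)
    return model


drafter = build_without_checkpoint(TinyDrafter)
logits = drafter(torch.tensor([[1, 2, 3]]))
```

With random weights the drafter's proposals are meaningless, so a smoke test built this way can check that the speculative decoding path runs end to end, but not acceptance rates or output quality.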
Alternatives
Since drafters are small, especially Eagle checkpoints, we could simply load them with their weights and accept that cost in our smoke tests. Even then, we would not be loading the target model weights.
Additional context
No response