
[Bugfix][Speculative Decoding] Fix Eagle3 quantization config inheritance #120

Closed
rahul-tuli wants to merge 1 commit into main from fix/eagle3-quantization-config

rahul-tuli (Member) commented Sep 29, 2025

Eagle3 drafters were incorrectly inheriting the verifier's quantization
configuration instead of using their own, which caused a KeyError when an
unquantized drafter's weights were loaded alongside a quantized verifier.
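A toy model of one plausible way this KeyError arises. It assumes, hypothetically, that a quantized layer registers its weights under different parameter names than an unquantized one (the names qweight and scales are illustrative, not taken from the PR), and that the loader looks up each checkpoint tensor by name. build_params and load_weights are hypothetical helpers, not vLLM functions, and this is not a trace of vLLM's actual loader.

```python
from typing import Optional


def build_params(prefix: str, quant_method: Optional[str]) -> dict[str, dict]:
    """Hypothetical: the parameters a linear layer registers.

    Unquantized layers expose a plain "weight"; this made-up quantized
    scheme stores packed weights and scales under different names instead.
    """
    if quant_method is None:
        return {f"{prefix}.weight": {}}
    return {f"{prefix}.qweight": {}, f"{prefix}.scales": {}}


def load_weights(params: dict[str, dict], checkpoint: dict[str, list]) -> None:
    """Hypothetical loader: look up each checkpoint tensor by name."""
    for name, tensor in checkpoint.items():
        slot = params[name]  # KeyError if the layer never registered this name
        slot["data"] = tensor


# Unquantized drafter checkpoint.
drafter_ckpt = {"layers.0.mlp.weight": [0.1, 0.2]}

# Layer built with the drafter's own (absent) quant config: loads fine.
load_weights(build_params("layers.0.mlp", None), drafter_ckpt)

# Layer built with the verifier's quant config: the names no longer match.
try:
    load_weights(build_params("layers.0.mlp", "verifier-quant"), drafter_ckpt)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'layers.0.mlp.weight'
```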

This PR implements a clean inheritance pattern:

  • The base LlamaDecoderLayer gains an overridable get_quant_config() method
  • The Eagle3 LlamaDecoderLayer overrides it to use the drafter's quantization config
  • The override reuses the existing VllmConfig.get_quantization_config() infrastructure
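The pattern in the bullets above can be sketched roughly as follows. This is a minimal, self-contained approximation, not the PR's code: QuantizationConfig, SpeculativeConfig and VllmConfig here are hypothetical stand-ins for vLLM's classes, the Eagle3 subclass is renamed Eagle3LlamaDecoderLayer so both classes fit in one file, and the drafter's config is stored precomputed rather than derived through VllmConfig.get_quantization_config() as the PR describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantizationConfig:
    # Hypothetical stand-in for vLLM's QuantizationConfig.
    method: str


@dataclass
class SpeculativeConfig:
    # Hypothetical: only the drafter's (precomputed) quant config is modelled.
    draft_quant_config: Optional[QuantizationConfig]


@dataclass
class VllmConfig:
    # Hypothetical stand-in; quant_config is the verifier's config.
    quant_config: Optional[QuantizationConfig]
    speculative_config: Optional[SpeculativeConfig] = None


class LlamaDecoderLayer:
    def __init__(self, vllm_config: VllmConfig) -> None:
        # Ask the hook instead of reading vllm_config.quant_config directly,
        # so subclasses can substitute a different config.
        self.quant_config = self.get_quant_config(vllm_config)

    def get_quant_config(self, vllm_config: VllmConfig) -> Optional[QuantizationConfig]:
        # Default: the verifier (target model) quantization config.
        return vllm_config.quant_config


class Eagle3LlamaDecoderLayer(LlamaDecoderLayer):
    def get_quant_config(self, vllm_config: VllmConfig) -> Optional[QuantizationConfig]:
        # Eagle3 drafter: use the drafter's own config (None if unquantized).
        spec = vllm_config.speculative_config
        return spec.draft_quant_config if spec is not None else None


# Quantized verifier, unquantized drafter.
cfg = VllmConfig(
    quant_config=QuantizationConfig("fp8"),
    speculative_config=SpeculativeConfig(draft_quant_config=None),
)
print(LlamaDecoderLayer(cfg).quant_config)        # QuantizationConfig(method='fp8')
print(Eagle3LlamaDecoderLayer(cfg).quant_config)  # None
```

Because the base constructor goes through the hook, the Eagle3 drafter only has to override one method and inherits the rest of the layer unchanged.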

Commit message:

Eagle3 drafters were incorrectly inheriting the verifier's quantization
configuration instead of using their own, causing KeyError when loading
unquantized drafter weights with quantized verifiers.

This implements a clean inheritance pattern where:
- Base LlamaDecoderLayer has configurable get_quant_config() method
- Eagle3 LlamaDecoderLayer overrides to use drafter's quantization config
- Uses existing VllmConfig._get_quantization_config() infrastructure

Fixes speculative decoding with quantized verifier + unquantized drafter.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: rtuli@redhat.com

Signed-off-by: Rahul Tuli <rtuli@redhat.com>
rahul-tuli (Member, Author) commented:

Landed on vllm main!

rahul-tuli closed this Sep 29, 2025