Motivation.
Elastic Speculation, an adaptive control layer for EAGLE speculative decoding that delivers double-digit latency improvements over fixed-length speculation while reducing KV-cache DRAM traffic. refer to: https://iluvatarlabs.com/blog/2025/11/elastic-speculation/#confidence-based-early-exit-cutting-speculative-kv-writes
Proposed Change.
Two independent features:
Adaptive Draft Length: Dynamically adjusts speculation depth based on acceptance rates
Confidence-Based Early Exit: Gates KV writes for low-confidence draft tokens
Feedback Period.
No response
CC List.
No response
Any Other Things.
No response