[RFC]: Implement Elastic Speculation: Adaptive Draft Length + Confidence-Based Early Exit

### Motivation.

This RFC proposes adding Elastic Speculation, an adaptive control layer for EAGLE speculative decoding that delivers double-digit latency improvements over fixed-length speculation while reducing KV-cache DRAM traffic. See the write-up for details: https://iluvatarlabs.com/blog/2025/11/elastic-speculation/#confidence-based-early-exit-cutting-speculative-kv-writes

### Proposed Change.

The proposal consists of two independent features:

- **Adaptive Draft Length:** dynamically adjusts speculation depth based on acceptance rates.
- **Confidence-Based Early Exit:** gates KV writes for low-confidence draft tokens.
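
The RFC does not specify the control law for Adaptive Draft Length. A minimal sketch, assuming an exponential moving average (EMA) of the per-step acceptance rate plus two hysteresis thresholds, could look like the following. The class name `AdaptiveDraftLength`, its parameters, and the threshold values are hypothetical illustrations, not an existing vLLM API or the values used in the linked write-up.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveDraftLength:
    """Hypothetical controller: adjusts the number of draft tokens (k) per
    step from an EMA of the observed acceptance rate."""

    k: int = 4                  # current draft length (assumed starting value)
    k_min: int = 1              # never draft fewer than this many tokens
    k_max: int = 8              # never draft more than this many tokens
    alpha: float = 0.5          # EMA smoothing factor (weight of the newest sample)
    raise_above: float = 0.8    # grow k when the EMA acceptance exceeds this
    lower_below: float = 0.4    # shrink k when the EMA acceptance falls below this
    ema: float = 0.5            # initial estimate, placed between the thresholds

    def update(self, num_accepted: int, num_drafted: int) -> int:
        """Record one verification step and return the draft length to use next."""
        if num_drafted <= 0:
            return self.k  # nothing was drafted, so there is no signal
        rate = num_accepted / num_drafted
        self.ema = self.alpha * rate + (1 - self.alpha) * self.ema
        if self.ema > self.raise_above:
            self.k = min(self.k + 1, self.k_max)
        elif self.ema < self.lower_below:
            self.k = max(self.k - 1, self.k_min)
        return self.k


ctrl = AdaptiveDraftLength()
# Every drafted token accepted: k grows by one once the EMA passes 0.8.
print([ctrl.update(num_accepted=4, num_drafted=4) for _ in range(3)])  # [4, 5, 6]
```

Using two thresholds instead of one leaves a dead band between them, so the draft length does not oscillate when the acceptance rate hovers near a single cut-off.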

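The confidence signal and gating rule for Confidence-Based Early Exit are not spelled out in this RFC. One plausible sketch, assuming the draft model's top-1 softmax probability as the confidence score and a fixed threshold, stops drafting at the first low-confidence token, so that token and every later speculative position never get a KV-cache entry. The function names and the toy draft model are hypothetical; in this pure-Python model, appending a token to `tokens` stands in for writing its KV entries in a real engine.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def draft_with_early_exit(
    draft_logits: Callable[[List[int]], Sequence[float]],
    prefix: List[int],
    max_draft_len: int,
    confidence_threshold: float,
) -> Tuple[List[int], List[float]]:
    """Hypothetical greedy drafting loop with confidence-based early exit.

    Returns the drafted tokens and their top-1 probabilities.
    """
    tokens = list(prefix)
    drafted: List[int] = []
    confidences: List[float] = []
    for _ in range(max_draft_len):
        probs = softmax(draft_logits(tokens))
        token = max(range(len(probs)), key=probs.__getitem__)  # greedy top-1
        if probs[token] < confidence_threshold:
            # Early exit: this token and all later positions are never drafted,
            # so no speculative KV entries are written for them.
            break
        drafted.append(token)
        confidences.append(probs[token])
        # Stand-in for the KV write: the token becomes context for the next step.
        tokens.append(token)
    return drafted, confidences


# Toy draft model (hypothetical): confident in token 1 while the context has
# fewer than 4 tokens, then uniform over a 3-token vocabulary.
def toy_draft(tokens: List[int]) -> List[float]:
    return [0.0, 5.0, 0.0] if len(tokens) < 4 else [0.0, 0.0, 0.0]


drafted, confs = draft_with_early_exit(toy_draft, [7, 8], max_draft_len=4,
                                       confidence_threshold=0.5)
print(drafted)  # [1, 1]
```

Raising `confidence_threshold` trades shorter drafts (and fewer speculative KV writes) for fewer tokens proposed per verification step.
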
### Feedback Period.

_No response_

### CC List.

_No response_

### Any Other Things.

_No response_

[RFC]: Implement Elastic Speculation: Adaptive Draft Length + Confidence-Based Early Exit #4203