docs/features/speculative_decoding.md
13 additions & 0 deletions
@@ -18,6 +18,13 @@ This project implements an efficient **Speculative Decoding** inference framewor
 - ⏳ Coming Soon: Support Chunk-prefill
 - ⏳ Coming Soon: Multi-layer MTP Layer
 
+- **Decoding with Hybrid MTP and Ngram Methods (Hybrid-MTP-with-Ngram)**
+
+- Overview: A hybrid method that combines MTP and Ngram. MTP first generates N draft tokens, then Ngram matching supplements them with additional draft tokens.
+
+- Use Cases: Suitable when higher draft-token coverage is required, since it combines MTP’s generation capability with the efficiency of Ngram matching.
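The two-stage flow in the Overview can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `hybrid_draft`, the `Drafter` signature, `num_mtp_tokens`, and `max_draft_tokens` are hypothetical names, and the choice to run Ngram matching on the context extended by the MTP drafts is an assumption.

```python
from typing import Callable, List

# Hypothetical drafter signature: (context tokens, max draft tokens) -> draft tokens.
Drafter = Callable[[List[int], int], List[int]]


def hybrid_draft(
    context: List[int],
    mtp_drafter: Drafter,
    ngram_drafter: Drafter,
    num_mtp_tokens: int,
    max_draft_tokens: int,
) -> List[int]:
    """Return MTP draft tokens followed by Ngram-matched draft tokens."""
    # Stage 1: MTP proposes the first N draft tokens.
    mtp_tokens = mtp_drafter(context, num_mtp_tokens)[:num_mtp_tokens]

    # Stage 2: Ngram matching fills whatever draft budget is left.
    remaining = max_draft_tokens - len(mtp_tokens)
    if remaining <= 0:
        return mtp_tokens[:max_draft_tokens]

    # Assumption: the Ngram stage sees the context plus the MTP drafts.
    extra = ngram_drafter(context + mtp_tokens, remaining)[:remaining]
    return mtp_tokens + extra


# Toy stand-ins for the two drafters, for illustration only.
def toy_mtp(context: List[int], n: int) -> List[int]:
    return [context[-1] + i + 1 for i in range(n)]


def toy_ngram(context: List[int], n: int) -> List[int]:
    return [0] * n


print(hybrid_draft([1, 2, 3], toy_mtp, toy_ngram, 2, 5))  # [4, 5, 0, 0, 0]
```

Passing the drafters in as callables keeps the sketch independent of how either method is actually implemented; in all cases the target model still verifies every draft token.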
The Ngram method slides an n-gram window over the prompt and the tokens generated so far, and uses matching sequences to predict draft tokens. It is particularly effective in scenarios with high input-output overlap (e.g., code completion, document search).
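As a rough sketch of the sliding-window matching described above: take the last few tokens as the query n-gram, look for an earlier occurrence in the prompt plus generated tokens, and propose whatever followed it as the draft. The function name `ngram_draft` and the parameters `max_ngram` and `num_draft` are hypothetical, and the longest-match-first, most-recent-first policy is an assumption rather than the project's documented behavior.

```python
from typing import List


def ngram_draft(tokens: List[int], max_ngram: int = 3, num_draft: int = 4) -> List[int]:
    """Propose draft tokens by matching the trailing n-gram earlier in `tokens`.

    `tokens` is the prompt followed by everything generated so far.
    """
    # Try the longest suffix first: longer matches are more specific.
    for n in range(min(max_ngram, len(tokens) - 1), 0, -1):
        suffix = tokens[-n:]
        # Scan right to left so the most recent earlier occurrence wins.
        for start in range(len(tokens) - n - 1, -1, -1):
            if tokens[start:start + n] == suffix:
                # The tokens that followed the match become the draft.
                return tokens[start + n:start + n + num_draft]
    # No earlier occurrence of any suffix: propose nothing.
    return []


history = [10, 11, 12, 13, 14, 10, 11, 12]
print(ngram_draft(history, max_ngram=3, num_draft=2))  # [13, 14]
```

Because the draft is copied from text that already exists, the lookup needs no model forward pass, which is why the method pays off most when the output repeats spans of the input.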