commit 89ec06c (1 parent: 9fde251)
docs/source/models/spec_decode.rst
@@ -17,6 +17,7 @@ Speculating with a draft model
 
 The following code configures vLLM to use speculative decoding with a draft model, speculating 5 tokens at a time.
 
 .. code-block:: python
+
     from vllm import LLM, SamplingParams
 
     prompts = [
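
For readers landing on this hunk without the full file, here is a minimal sketch of the draft-model configuration the surrounding text describes. It assumes the ``speculative_model`` / ``num_speculative_tokens`` engine arguments of vLLM from this period; the model names, sampling values, and ``use_v2_block_manager`` flag are illustrative assumptions, not taken from this commit.

.. code-block:: python

    from vllm import LLM, SamplingParams

    prompts = [
        "The future of AI is",
    ]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    # Target model plus a small draft model; the draft proposes 5 tokens
    # per step, which the target model then verifies in a single pass.
    # Model names and num_speculative_tokens are illustrative.
    llm = LLM(
        model="facebook/opt-6.7b",
        speculative_model="facebook/opt-125m",
        num_speculative_tokens=5,
        use_v2_block_manager=True,  # required for spec decode in some vLLM versions
    )

    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
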
@@ -45,6 +46,7 @@ The following code configures vLLM to use speculative decoding where proposals a
 matching n-grams in the prompt. For more information read `this thread. <https://x.com/joao_gante/status/1747322413006643259>`_
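
As a companion to the n-gram hunk above, a hedged sketch of prompt-lookup speculation as documented around this time: passing ``speculative_model="[ngram]"`` makes vLLM propose draft tokens by matching n-grams already present in the prompt instead of running a separate draft model. The ``ngram_prompt_lookup_max`` value and the model name below are illustrative assumptions.

.. code-block:: python

    from vllm import LLM, SamplingParams

    prompts = ["The future of AI is"]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    # "[ngram]" selects prompt-lookup decoding: proposals come from n-gram
    # matches in the prompt, so no second model is loaded.
    llm = LLM(
        model="facebook/opt-6.7b",       # illustrative target model
        speculative_model="[ngram]",
        num_speculative_tokens=5,
        ngram_prompt_lookup_max=4,       # illustrative lookup window
        use_v2_block_manager=True,       # required in some vLLM versions
    )

    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
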