@@ -51,18 +56,23 @@ TRT-LLM supports a modified version of the algorithm presented in the paper: tre
The following draft model checkpoints can be used for EAGLE 3:
* Llama 3 variants: [use the checkpoints from the authors of the original EAGLE 3 paper](https://huggingface.co/yuhuili).
* Llama 4 Maverick: [use the checkpoint from the NVIDIA HuggingFace repository](https://huggingface.co/nvidia/Llama-4-Maverick-17B-128E-Eagle3).
* Other models, including `gpt-oss-120b` and `Qwen3`: check out the [Speculative Decoding Modules](https://huggingface.co/collections/nvidia/speculative-decoding-modules) collection from NVIDIA.
```python
from tensorrt_llm.llmapi import EagleDecodingConfig

# Enable to use the faster one-model implementation for Llama 4.
@@ -137,14 +147,34 @@ Speculative decoding options must be specified via `--config config.yaml` for bo
The remaining argument names and valid values match those of the corresponding configuration class described in the Quick Start section. For example, a YAML configuration could look like this:
```yaml
# Using a HuggingFace Hub model ID (auto-downloaded)
The field name `speculative_model_dir` can also be used as an alias for `speculative_config.speculative_model`. For example:
speculative_config:
  decoding_type: Eagle
  max_draft_len: 4
  speculative_model_dir: /path/to/draft/model
```
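The alias relationship described above can be sketched in plain Python. This is only an illustration: `normalize_speculative_config` is a hypothetical helper, not a TensorRT-LLM function, and rejecting conflicting values is an assumption rather than documented behavior. It operates on a dictionary shaped like the `speculative_config` mapping in the YAML example.

```python
from typing import Any, Dict


def normalize_speculative_config(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `spec` with `speculative_model_dir` folded into
    `speculative_model`.

    Hypothetical helper for illustration only; not part of TensorRT-LLM.
    """
    normalized = dict(spec)
    if "speculative_model_dir" in normalized:
        alias_value = normalized.pop("speculative_model_dir")
        canonical = normalized.get("speculative_model")
        # Assumption: reject conflicting values instead of silently picking one.
        if canonical is not None and canonical != alias_value:
            raise ValueError(
                "speculative_model and speculative_model_dir disagree: "
                f"{canonical!r} vs {alias_value!r}"
            )
        normalized["speculative_model"] = alias_value
    return normalized


# Mirrors the YAML example: the alias form...
alias_form = {
    "decoding_type": "Eagle",
    "max_draft_len": 4,
    "speculative_model_dir": "/path/to/draft/model",
}
# ...and the canonical form describe the same draft model.
canonical_form = {
    "decoding_type": "Eagle",
    "max_draft_len": 4,
    "speculative_model": "/path/to/draft/model",
}

assert normalize_speculative_config(alias_form) == canonical_form
assert normalize_speculative_config(canonical_form) == canonical_form
```

Both spellings reduce to the same mapping, which is the sense in which one field name is an alias for the other.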
## Developer Guide
This section describes the components of a speculative decoding algorithm. All of the interfaces are defined in [`_torch/speculative/interface.py`](https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/speculative/interface.py).
Specify the path to the Eagle3 draft model (ensure the corresponding draft model weights are prepared).
- `speculative_config.speculative_model: <HUGGINGFACE ID / LOCAL PATH>`
Specify the Eagle3 draft model either as a HuggingFace model ID or a local path. You can find ready-to-use Eagle3 draft models at https://huggingface.co/collections/nvidia/speculative-decoding-modules.
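Since the same field accepts two kinds of value, it can help to check which one a given string would be before launching. The sketch below is a rough illustration under one assumption: a value that names an existing directory is a local checkpoint, and anything else is treated as a Hub model ID. `classify_draft_model` is a hypothetical helper, and TensorRT-LLM's own resolution logic may differ.

```python
from pathlib import Path


def classify_draft_model(value: str) -> str:
    """Classify a `speculative_model` value as "local_path" or "hub_id".

    Hypothetical helper; TensorRT-LLM's own resolution logic may differ.
    """
    # Assumption: an existing directory on disk is a local checkpoint.
    if Path(value).expanduser().is_dir():
        return "local_path"
    # Anything else is treated as a HuggingFace Hub model ID.
    return "hub_id"


print(classify_draft_model("nvidia/Llama-4-Maverick-17B-128E-Eagle3"))
```

A check like this catches a mistyped local path early, because it would otherwise be interpreted as a model ID to download.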
Currently, there are some limitations when enabling Eagle3: