For online training checkpoints, we can run in-framework evaluation on MT-bench:

```sh
python ar_validate.py --model_path $ONLINE_CKPT
```

**Note**: In-framework evaluation is supported only for online training. For offline training checkpoints, please export the model and evaluate it using serving frameworks.
## Export
See more details on deploying quantized models to TRTLLM [here](../llm_ptq/RE
## Advanced Usage
### Other Datasets

In addition to `daring-anteater`, we provide scripts for adding several other commonly used datasets in `prepare_input_conversations`:
```text
prepare_input_conversations/
├── add_daring_anteater.py
├── add_mtbench.py
├── add_sharegpt.py
├── add_ultrachat.py
└── example_make_prompt_dataset.sh
```

To use your own datasets, please preprocess your data into a `.jsonl` file with each line in the format:

```json
{
    "conversation_id": <unique id>,
    "conversations": [{"role": <user or assistant>, "content": <content>}]
}
```
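As an illustrative sketch (the file name and sample conversations here are made up), a few lines of Python suffice to serialize your data into this format:

```python
import json

# Toy conversations; replace with your own data source.
conversations = [
    [
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi, how can I help?"},
    ],
]

# Write one JSON record per line (.jsonl), matching the schema above.
with open("my_dataset.jsonl", "w") as f:
    for i, turns in enumerate(conversations):
        record = {"conversation_id": f"conv-{i}", "conversations": turns}
        f.write(json.dumps(record) + "\n")
```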
### Data Synthesis
To achieve higher acceptance rates during speculative decoding, it is beneficial to use conversations generated by the base model as training data. This ensures that the draft model's output distribution closely aligns with that of the base model.
**Note**: Add the `--quantization=modelopt` flag for quantized models.
Then, we generate conversations with the base model using prompts from Daring-Anteater:
To add a system prompt, use the `--system_prompt <system_prompt_text>` argument.
For large-scale data generation, please see [SLURM prepare data](SLURM_prepare_d
We can optionally use a smaller vocab size for the draft model for faster training and inference. For example, Llama3.2-1B has a vocab size of 128256. In this example, we construct a draft vocab mapping of size 32k by finding the most frequently occurring tokens in our training set:
This will produce a `d2t.pt` file in `save_dir`, which stores the mapping from draft tokens to target tokens. During inference, draft tokens can be mapped back to target tokens via `target_token = draft_token + d2t[draft_token]`.
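To illustrate that formula, here is a minimal sketch using a plain Python list as a stand-in for the offset table loaded from `d2t.pt` (the actual file holds a tensor; the values below are hypothetical):

```python
# Hypothetical offset table for a draft vocab of 4 tokens mapped into a
# larger target vocab. d2t[i] stores the offset from draft token id i to
# its target token id, so target = draft + d2t[draft].
d2t = [0, 2, 5, 9]  # stand-in for the tensor stored in d2t.pt

def draft_to_target(draft_token: int) -> int:
    """Map a draft-vocab token id back to the target-vocab token id."""
    return draft_token + d2t[draft_token]

print([draft_to_target(t) for t in range(len(d2t))])  # -> [0, 3, 7, 12]
```

Because the offsets are sorted by token frequency, the common case is that a draft token maps to a nearby target id; the table only needs one entry per draft-vocab token rather than a full target-vocab lookup.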