Commit 8821791

docs: add instructions on how to correctly specify the chat template (#549)
Signed-off-by: Harikrishnan Balagopal <[email protected]>
1 parent dc77c63 commit 8821791

1 file changed: +51 -0 lines changed

docs/advanced-data-preprocessing.md

Lines changed: 51 additions & 0 deletions
@@ -277,6 +277,57 @@ Note: Streaming datasets or use of `IterableDatasets` is not compatible with the
If the dataset size is known to the user, `max_steps` can be calculated as the total number of samples divided by the batch size.
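
As a rough illustration of the calculation above, the sketch below estimates `max_steps`. The function name and the `num_epochs` parameter are hypothetical, `batch_size` is assumed to be the effective global batch size (per-device batch size × number of devices × gradient accumulation steps), and rounding up so that a final partial batch counts as a step is also an assumption.

```python
import math


def estimate_max_steps(num_samples: int, batch_size: int, num_epochs: int = 1) -> int:
    """Estimate max_steps for a dataset of known size (hypothetical helper)."""
    # Assumption: a final partial batch still counts as one optimizer step.
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * num_epochs


# 10,000 samples with an effective batch size of 32, for one and three epochs.
print(estimate_max_steps(10_000, 32))
print(estimate_max_steps(10_000, 32, num_epochs=3))
```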

### How users can specify the chat template

In the `data_config.yaml` file:

**✅ USE:**

```yaml
dataprocessor:
    chat_template: "my single line chat template"
```

The recommended approach is to copy and paste the chat template from the `tokenizer_config.json` of the official checkpoint, for example: https://huggingface.co/ibm-granite/granite-3.1-8b-instruct/blob/main/tokenizer_config.json#L188
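
The copy-and-paste step can also be scripted. The sketch below is one possible approach, not part of this project: it reads the `chat_template` field from the text of a `tokenizer_config.json` file and prints a single-line YAML fragment. The helper name and the tiny example template are made up for illustration, and the field is assumed to hold a plain string.

```python
import json


def single_line_chat_template_yaml(tokenizer_config_text: str) -> str:
    """Build a `dataprocessor` YAML fragment with the template on one line."""
    # Assumption: `chat_template` is a plain string in the config.
    template = json.loads(tokenizer_config_text)["chat_template"]
    # json.dumps returns a double-quoted string with newlines written as \n.
    # YAML double-quoted scalars understand the same escapes, so the result
    # can be used directly as the value of `chat_template`.
    quoted = json.dumps(template, ensure_ascii=False)
    return f"dataprocessor:\n    chat_template: {quoted}\n"


# Tiny stand-in for tokenizer_config.json (NOT the real Granite template).
example_template = "{%- for m in messages %}\n{{ m['content'] }}\n{%- endfor %}"
example_config = json.dumps({"chat_template": example_template})

fragment = single_line_chat_template_yaml(example_config)
print(fragment)
```

Because the value stays double-quoted, the escape sequences are decoded back into real newlines when the YAML file is loaded.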

**✅ (Optional) USE:**

```yaml
dataprocessor:
    chat_template: |
        my multi-line chat template
```

Specifying a multi-line chat template requires some manual effort on the user's part to ensure that newlines are specified correctly.
This approach is mainly useful for readability, especially if you are customizing the chat template.

Example:

```yaml
dataprocessor:
    chat_template: |
        {%- if messages[0]['role'] == 'system' %}
        {%- set system_message = messages[0]['content'] %}
        {%- set loop_messages = messages[1:] %}
        {%- else %}
        {%- set system_message = "Knowledge Cutoff Date: April 2024.
        Today's Date: " + strftime_now('%B %d, %Y') + ".
        You are Granite, developed by IBM." %}
        {%- if tools and documents %}
        ................
```
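
Part of the manual effort is turning the escaped `\n` sequences of a single-line template into real, consistently indented lines. The sketch below shows one way to do that with the Python standard library. The helper name and the sample template are hypothetical, and it assumes the input is a valid JSON string whose first line does not start with whitespace.

```python
import json
import textwrap


def multi_line_chat_template_yaml(escaped_template: str, indent: int = 4) -> str:
    """Render a JSON-escaped single-line template as a YAML `|` block scalar."""
    # Decode \n, \" and similar escapes into real characters.
    template = json.loads(escaped_template)
    # Every content line must be indented deeper than the `chat_template` key.
    body = textwrap.indent(template, " " * (2 * indent), lambda line: True)
    # Note: with `|`, the loaded value ends with one trailing newline.
    return f"dataprocessor:\n{' ' * indent}chat_template: |\n{body}\n"


# Hypothetical template, written the way it would appear inside a JSON file.
escaped = r'"{%- if a %}\n{{ b }}\n{%- endif %}"'
block_fragment = multi_line_chat_template_yaml(escaped)
print(block_fragment)
```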

**❌ DO NOT USE:**

```yaml
dataprocessor:
    chat_template: |
        my single line chat template
```

This can add extra backslashes to your chat template, making it invalid: a block scalar (`|`) keeps escape sequences such as `\n` as literal characters instead of interpreting them.
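
The difference can be illustrated without a YAML parser. The sketch below only emulates the two YAML styles with the standard library (it does not call this project's code), and the template text is made up. It shows one plausible way the extra backslashes arise: the literal `\n` kept by a block scalar is escaped again when the value is serialized.

```python
import json

# Text exactly as it appears in a JSON file: \n is two characters here.
pasted = r"{{ bos_token }}\n{{ message }}"

# Double-quoted YAML scalar: escapes are decoded, so \n becomes a real newline.
# (Emulated with json.loads, whose escape rules match for this input.)
double_quoted_value = json.loads('"' + pasted + '"')

# `|` block scalar: the text is taken literally and a trailing newline is kept.
block_scalar_value = pasted + "\n"

print(repr(double_quoted_value))
print(repr(block_scalar_value))
# Serializing the block-scalar value escapes the backslash itself.
print(json.dumps(block_scalar_value))
```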

### Example data configs.

We provide some example data configs [here](../tests/artifacts/predefined_data_configs/)
