You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+77Lines changed: 77 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -184,6 +184,83 @@ For the [granite model above](https://huggingface.co/ibm-granite/granite-3.0-8b-
184
184
185
185
The code internally uses [`DataCollatorForCompletionOnlyLM`](https://github.com/huggingface/trl/blob/main/trl/trainer/utils.py#L93) to perform masking of text ensuring model learns only on the `assistant` responses for both single and multi turn chat.
186
186
187
+
#### Aligning dataset formats
188
+
In some cases the chat template might not be aligned with the data format of the dataset. For example, consider the following data sample and suppose we want to use the list of contents associated with the `messages` key from the data sample for our multi-turn training job!
189
+
190
+
```
191
+
{
192
+
"messages": [
193
+
{"content": "You are an AI...", "role": "system"},
194
+
{"content": "Look up a word...", "role": "user"},
195
+
{"content": "A word that rhymes is 'mist'", "role": "assistant"}
196
+
],
197
+
"group": "lab_extension",
198
+
"dataset": "base/full-extension",
199
+
"metadata": "{\"num_turns\": 2}"
200
+
}
201
+
```
202
+
Different Chat templates support different data formats and the chat template might not always align with the data format of the dataset!
203
+
204
+
Here is a example of chat template that iterates over the nested data sample by addressing the "messages" key in `for message in messages['messages']` :
{% if loop.last and add_generation_prompt %}{{ '<|assistant|>' }}\
212
+
{% endif %}\
213
+
{% endfor %}
214
+
```
215
+
While the above template might be suitable for certain data formats, not all chat templates access the nested contents in a data sample.
216
+
217
+
In the following example notice the `for message in messages` line which does not access any nested contents in the data and expects the nested content to be passed directly to the chat template!
When working with multi-turn datasets, it's often necessary to extract specific fields from the data depending on the format. For example, in many multi-turn datasets, conversations may be stored under a dedicated key (e.g., `conversations`, `messages`, etc), and you may only need the content of that key for processing.
239
+
240
+
```
241
+
{
242
+
"conversations": [
243
+
{"content": "You are an AI...", "role": "system"},
244
+
{"content": "Look up a word...", "role": "user"},
245
+
{"content": "A word that rhymes is 'mist'", "role": "assistant"}
246
+
],
247
+
"group": "lab_extension",
248
+
"dataset": "base/full-extension",
249
+
"metadata": "{\"num_turns\": 2}"
250
+
}
251
+
252
+
```
253
+
To extract and use the conversations field, pass the following flag when running:
254
+
```
255
+
--dataset_conversation_field "conversations"
256
+
```
257
+
258
+
*Note:* For most cases, users using `Granite3.1+ Instruct` series models which already contain chat template should look to pass `--dataset_conversation_field "messages"` while using multi-turn data on the commandline or use `conversations_column` argument in the [data handler](https://github.com/foundation-model-stack/fms-hf-tuning/blob/30ceecc63f3e2bf3aadba2dfc3336b62187c240f/tests/artifacts/predefined_data_configs/mt_data_granite_3_1B_tokenize_and_mask_handler.yaml#L63) which processes chat template
259
+
260
+
We recommend inspecting the data and chat template to decide if you need to pass this flag.
261
+
262
+
### Guidelines
263
+
187
264
Depending on various scenarios users might need to decide on how to use chat template with their data or which chat template to use for their use case.
188
265
189
266
Following are the Guidelines from us in a flow chart :
{"conversations": [{"content": "You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.", "role": "system"}, {"content": "Using the word \"grace\", come up with a word that rhymes and has the same number of syllables\n<nopace>", "role": "user"}, {"content": "Certainly! Here's a word that rhymes with \"grace\" and has the same number of syllables:\n1\\. Space", "role": "assistant"}, {"content": "Using the word \"grace\", come up with a word that rhymes and has the same number of syllables\n<nopace>", "role": "user"}, {"content": "Certainly! Here's a word that rhymes with \"grace\" and has the same number of syllables:\n1\\. Space", "role": "assistant"}], "group": "lab_extension", "dataset": "base/full-extension", "metadata": "{\"num_turns\": 2}"}
2
+
{"conversations": [{"content": "You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.", "role": "system"}, {"content": "Tell me one word that rhymes and has the opposite meaning of \"open\".", "role": "user"}, {"content": "Sure! Here's a word that rhymes with \"open\" and has the opposite meaning:\n1\\. Shut", "role": "assistant"}, {"content": "Tell me one word that rhymes and has the opposite meaning of \"open\".", "role": "user"}, {"content": "Sure! Here's a word that rhymes with \"open\" and has the opposite meaning:\n1\\. Shut", "role": "assistant"}], "group": "lab_extension", "dataset": "base/full-extension", "metadata": "{\"num_turns\": 2}"}
3
+
{"conversations": [{"content": "You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.", "role": "system"}, {"content": "For the word \"dream\", give an example of a word that rhymes with it and its synonym.", "role": "user"}, {"content": "Here's an example for \"dream\" that includes a word that rhymes with it and a synonym:\n1\\. Word that rhymes with \"dream\": \"beam\"\nSynonym: \"ideal\"", "role": "assistant"}, {"content": "For the word \"dream\", give an example of a word that rhymes with it and its synonym.", "role": "user"}, {"content": "Here's an example for \"dream\" that includes a word that rhymes with it and a synonym:\n1\\. Word that rhymes with \"dream\": \"beam\"\nSynonym: \"ideal\"", "role": "assistant"}], "group": "lab_extension", "dataset": "base/full-extension", "metadata": "{\"num_turns\": 2}"}
4
+
{"conversations": [{"content": "You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.", "role": "system"}, {"content": "Can you think of a word that rhymes with tree?", "role": "user"}, {"content": "Of course, I can help! Here's one word that rhymes with \"tree\":\n1\\. Three", "role": "assistant"}, {"content": "Can you think of a word that rhymes with tree?", "role": "user"}, {"content": "Of course, I can help! Here's one word that rhymes with \"tree\":\n1\\. Three", "role": "assistant"}], "group": "lab_extension", "dataset": "base/full-extension", "metadata": "{\"num_turns\": 2}"}
5
+
{"conversations": [{"content": "You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.", "role": "system"}, {"content": "Look up a word that rhymes with exist", "role": "user"}, {"content": "I found a word that rhymes with \"exist\":\n1\\. Mist", "role": "assistant"}, {"content": "Look up a word that rhymes with exist", "role": "user"}, {"content": "I found a word that rhymes with \"exist\":\n1\\. Mist", "role": "assistant"}], "group": "lab_extension", "dataset": "base/full-extension", "metadata": "{\"num_turns\": 2}"}
6
+
{"conversations": [{"content": "You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.", "role": "system"}, {"content": "Provide five words that rhyme with \"green\".", "role": "user"}, {"content": "Here are five words that rhyme with \"green\":\n1\\. Bean\n2\\. Clean\n3\\. Keen\n4\\. Mean\n5\\. Seen", "role": "assistant"}, {"content": "Provide five words that rhyme with \"green\".", "role": "user"}, {"content": "Here are five words that rhyme with \"green\":\n1\\. Bean\n2\\. Clean\n3\\. Keen\n4\\. Mean\n5\\. Seen", "role": "assistant"}], "group": "lab_extension", "dataset": "base/full-extension", "metadata": "{\"num_turns\": 2}"}
7
+
{"conversations": [{"content": "You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.", "role": "system"}, {"content": "What are three words that rhyme with \"peace\"?", "role": "user"}, {"content": "Here are three words that rhyme with \"peace\":\n1\\. Breeze\n2\\. Pieces\n3\\. Increase", "role": "assistant"}, {"content": "What are three words that rhyme with \"peace\"?", "role": "user"}, {"content": "Here are three words that rhyme with \"peace\":\n1\\. Breeze\n2\\. Pieces\n3\\. Increase", "role": "assistant"}], "group": "lab_extension", "dataset": "base/full-extension", "metadata": "{\"num_turns\": 2}"}
8
+
{"conversations": [{"content": "You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.", "role": "system"}, {"content": "Can you name a word that rhymes with high and starts with the letter \"t\"?", "role": "user"}, {"content": "Certainly! Here's a word that rhymes with \"high\" and starts with \"t\":\n1\\. Sky", "role": "assistant"}, {"content": "Can you name a word that rhymes with high and starts with the letter \"t\"?", "role": "user"}, {"content": "Certainly! Here's a word that rhymes with \"high\" and starts with \"t\":\n1\\. Sky", "role": "assistant"}], "group": "lab_extension", "dataset": "base/full-extension", "metadata": "{\"num_turns\": 2}"}
9
+
{"conversations": [{"content": "You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.", "role": "system"}, {"content": "Identify the word that rhymes with \"love\" and has the same vowel sounds in the same order as the word \"busy\".", "role": "user"}, {"content": "The word that rhymes with \"love\" and has the same vowel sounds in the same order as the word \"busy\" is:\n1\\. \"Shoe\".", "role": "assistant"}, {"content": "Identify the word that rhymes with \"love\" and has the same vowel sounds in the same order as the word \"busy\".", "role": "user"}, {"content": "The word that rhymes with \"love\" and has the same vowel sounds in the same order as the word \"busy\" is:\n1\\. \"Shoe\".", "role": "assistant"}], "group": "lab_extension", "dataset": "base/full-extension", "metadata": "{\"num_turns\": 2}"}
10
+
{"conversations": [{"content": "You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.", "role": "system"}, {"content": "Provide two rhyming words for the word \"love\".\n<no above>", "role": "user"}, {"content": "Here are two rhyming words for \"love\":\n1\\. Dove\n2\\. Glove", "role": "assistant"}, {"content": "Provide two rhyming words for the word \"love\".\n<no above>", "role": "user"}, {"content": "Here are two rhyming words for \"love\":\n1\\. Dove\n2\\. Glove", "role": "assistant"}], "group": "lab_extension", "dataset": "base/full-extension", "metadata": "{\"num_turns\": 2}"}
0 commit comments