README.md: 66 additions & 1 deletion
@@ -14,11 +14,18 @@ The InstructLab Training library is an optimized model instruction-tuning librar
 To simplify the process of fine-tuning models with the [LAB
 method](https://arxiv.org/abs/2403.01081), or for general use, this library provides a simple pythonic training interface.
 
+### Reasoning Content Support
+
+The library now supports reasoning traces through the `reasoning_content` field in message samples. This enables training models that can handle both regular content and structured reasoning traces, making it ideal for training reasoning-capable models that can separate their thinking process from their final output.
 - [Learning about the training arguments](#learning-about-training-arguments)
 - [`TrainingArgs`](#trainingargs)
 - [`DeepSpeedOptions`](#deepspeedoptions)
@@ -80,6 +87,64 @@ You can then define various training arguments. They will serve as the parameter
 - [Learning about the training argument](#learning-about-training-arguments)
 - [Example training run with arguments](#example-training-run-with-arguments)
 
+## Data format
+
+The library expects training data in the messages format, where each sample contains a list of messages with different roles (user, assistant, system, etc.). Each message should have at minimum:
+
+- `role`: The role of the message sender (e.g., "user", "assistant", "system")
+- `content`: The main content of the message
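
To make the minimum message shape above concrete, here is a minimal Python sketch that checks one sample for the two required keys. The helper name `check_sample` is hypothetical and is not part of the library; it only illustrates the structure described in the added README text.

```python
from typing import Any

# Minimum per-message fields named in the README text above.
REQUIRED_KEYS = ("role", "content")


def check_sample(sample: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one messages-format sample.

    Hypothetical helper for illustration; not a library function.
    """
    messages = sample.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["sample has no non-empty 'messages' list"]

    problems: list[str] = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        for key in REQUIRED_KEYS:
            if key not in msg:
                problems.append(f"message {i} is missing '{key}'")
    return problems


sample = {
    "messages": [
        {"role": "user", "content": "Hello! How are you?"},
        {"role": "assistant"},  # deliberately missing "content"
    ]
}
print(check_sample(sample))
```

Running a check like this over a dataset before training can surface malformed samples early; how the library itself reports such samples is not covered by this diff.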
+
+### Reasoning content support
+
+The library now supports an optional `reasoning_content` field in addition to the standard `content` field. This enables training models with structured reasoning traces. The `reasoning_content` field is particularly useful for:
+
+- Training reasoning-capable models that can separate their thinking process from their output
+- Supporting models that need to generate internal reasoning traces
+- Enabling step-by-step reasoning in model responses
+
+**Example message structure with reasoning content:**
+```json
+{
+  "messages": [
+    {
+      "role": "user",
+      "content": "What is 15 * 23?"
+    },
+    {
+      "role": "assistant",
+      "reasoning_content": "I need to multiply 15 by 23. Let me break this down: 15 * 23 = 15 * (20 + 3) = 15 * 20 + 15 * 3 = 300 + 45 = 345",
+      "content": "15 * 23 = 345"
+    }
+  ]
+}
+```
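
A sample like the one above could be produced programmatically. The following is a rough Python sketch, not the library's API: `assistant_message` is a hypothetical helper, and writing one JSON object per line (JSONL) is an assumption about the on-disk layout that this diff does not confirm.

```python
import json
from typing import Optional


def assistant_message(content: str, reasoning: Optional[str] = None) -> dict:
    """Build an assistant message; add reasoning_content only when given.

    Hypothetical helper for illustration; not a library function.
    """
    msg = {"role": "assistant"}
    if reasoning is not None:
        msg["reasoning_content"] = reasoning
    msg["content"] = content
    return msg


sample = {
    "messages": [
        {"role": "user", "content": "What is 15 * 23?"},
        assistant_message(
            "15 * 23 = 345",
            reasoning="15 * 23 = 15 * (20 + 3) = 300 + 45 = 345",
        ),
    ]
}

# Assumption: one serialized sample per line (JSONL).
line = json.dumps(sample)
print(line)
```

Because the field is optional, the same helper also covers standard messages: calling it without `reasoning` yields a message with only `role` and `content`.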
+
+**Standard message structure:**
+```json
+{
+  "messages": [
+    {
+      "role": "user",
+      "content": "Hello! How are you?"
+    },
+    {
+      "role": "assistant",
+      "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
+    }
+  ]
+}
+```
+
+Both `content` and `reasoning_content` fields are processed during training according to the unmasking rules specified by the `unmask_roles` parameter. When a message role is included in `unmask_roles`, both fields (if present) will be unmasked for training.
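
The unmasking rule in the paragraph above can be sketched at the message level. This is only an illustration under stated assumptions: the function `unmasked_fields` is hypothetical, `unmask_roles` is modeled here as a plain set of role names, and the library's actual masking (which presumably operates on tokens) is not shown in this diff.

```python
# Fields that the README text says are unmasked together for a selected role.
TRAINABLE_FIELDS = ("reasoning_content", "content")


def unmasked_fields(messages, unmask_roles):
    """Return (message index, field name) pairs that would be unmasked.

    Hypothetical message-level illustration; not the library's implementation.
    """
    result = []
    for i, msg in enumerate(messages):
        if msg["role"] not in unmask_roles:
            continue  # every field of this message stays masked
        for field in TRAINABLE_FIELDS:
            if field in msg:  # reasoning_content is optional
                result.append((i, field))
    return result


messages = [
    {"role": "user", "content": "What is 15 * 23?"},
    {
        "role": "assistant",
        "reasoning_content": "15 * (20 + 3) = 300 + 45 = 345",
        "content": "15 * 23 = 345",
    },
]

print(unmasked_fields(messages, {"assistant"}))
# [(1, 'reasoning_content'), (1, 'content')]
```

With `{"assistant"}` as the selected roles, the user turn contributes nothing and both assistant fields are included, which matches the "both fields (if present)" wording.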
+
+## Documentation
+
+For detailed information about specific features:
+
+- **[Reasoning Content Support](docs/reasoning_content.md)**: Comprehensive guide to using the `reasoning_content` field for training reasoning-capable models
+- **[CI Documentation](docs/ci.md)**: Information about continuous integration processes
+- **[Logging Documentation](docs/logging.md)**: Guide to logging configuration and usage
+
 ## Learning about training arguments
 
 The `TrainingArgs` class provides most of the customization options
@@ -378,4 +443,4 @@ Below is a list of custom environment variables users can set in the training li
 
 ## Developer Certificate of Origin
 
-When you make a contribution to InstructLab training, you implicitly agree to the Developer Certificate of Origin terms as set in `DCO.txt` at the root of this repository.
+When you make a contribution to InstructLab training, you implicitly agree to the Developer Certificate of Origin terms as set in `DCO.txt` at the root of this repository.