Commit 79c2f2e

enable the training repo to handle reasoning traces within reasoning_content fields
Signed-off-by: Oleg S <97077423+RobotSail@users.noreply.github.com>
1 parent a479f0b

4 files changed (+624, -8 lines)

README.md

Lines changed: 66 additions & 1 deletion
@@ -14,11 +14,18 @@ The InstructLab Training library is an optimized model instruction-tuning librar
 To simplify the process of fine-tuning models with the [LAB
 method](https://arxiv.org/abs/2403.01081), or for general use, this library provides a simple pythonic training interface.
 
+### Reasoning Content Support
+
+The library now supports reasoning traces through the `reasoning_content` field in message samples. This enables training models that can handle both regular content and structured reasoning traces, making it ideal for training reasoning-capable models that can separate their thinking process from their final output.
+
 ## Usage and Guidance Sections
 
 - [Installing](#installing-the-library)
 - [Additional Nvidia packages](#additional-nvidia-packages)
 - [Using the library](#using-the-library)
+- [Data format](#data-format)
+- [Reasoning content support](#reasoning-content-support-1)
+- [Documentation](#documentation)
 - [Learning about the training arguments](#learning-about-training-arguments)
 - [`TrainingArgs`](#trainingargs)
 - [`DeepSpeedOptions`](#deepspeedoptions)
@@ -80,6 +87,64 @@ You can then define various training arguments. They will serve as the parameter
 - [Learning about the training argument](#learning-about-training-arguments)
 - [Example training run with arguments](#example-training-run-with-arguments)
 
+## Data format
+
+The library expects training data in the messages format, where each sample contains a list of messages with different roles (user, assistant, system, etc.). Each message should have at minimum:
+
+- `role`: The role of the message sender (e.g., "user", "assistant", "system")
+- `content`: The main content of the message
+
+### Reasoning content support
+
+The library now supports an optional `reasoning_content` field in addition to the standard `content` field. This enables training models with structured reasoning traces. The `reasoning_content` field is particularly useful for:
+
+- Training reasoning-capable models that can separate their thinking process from their output
+- Supporting models that need to generate internal reasoning traces
+- Enabling step-by-step reasoning in model responses
+
+**Example message structure with reasoning content:**
+```json
+{
+  "messages": [
+    {
+      "role": "user",
+      "content": "What is 15 * 23?"
+    },
+    {
+      "role": "assistant",
+      "reasoning_content": "I need to multiply 15 by 23. Let me break this down: 15 * 23 = 15 * (20 + 3) = 15 * 20 + 15 * 3 = 300 + 45 = 345",
+      "content": "15 * 23 = 345"
+    }
+  ]
+}
+```
+
+**Standard message structure:**
+```json
+{
+  "messages": [
+    {
+      "role": "user",
+      "content": "Hello! How are you?"
+    },
+    {
+      "role": "assistant",
+      "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
+    }
+  ]
+}
+```
+
+Both `content` and `reasoning_content` fields are processed during training according to the unmasking rules specified by the unmask_roles parameter. When a message role is included in unmask_roles, both fields (if present) will be unmasked for training.
+
+## Documentation
+
+For detailed information about specific features:
+
+- **[Reasoning Content Support](docs/reasoning_content.md)**: Comprehensive guide to using the `reasoning_content` field for training reasoning-capable models
+- **[CI Documentation](docs/ci.md)**: Information about continuous integration processes
+- **[Logging Documentation](docs/logging.md)**: Guide to logging configuration and usage
+
 ## Learning about training arguments
 
 The `TrainingArgs` class provides most of the customization options
@@ -378,4 +443,4 @@ Below is a list of custom environment variables users can set in the training li
 
 ## Developer Certificate of Origin
 
-When you make a contribution to InstructLab training, you implicitly agree to the Developer Certificate of Origin terms as set in `DCO.txt` at the root of this repository.
+When you make a contribution to InstructLab training, you implicitly agree to the Developer Certificate of Origin terms as set in `DCO.txt` at the root of this repository.
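The README additions define the expected sample shape: a `messages` list whose entries carry string `role` and `content` fields, plus an optional `reasoning_content`. A quick pre-flight check of a dataset against those rules can catch malformed samples before training. The following is a minimal sketch, assuming each sample has already been parsed from JSON into a Python dict; `validate_sample` is a hypothetical helper, not part of `instructlab.training`, and it enforces only the rules stated in the README and the `str` annotations on `Message`.

```python
from typing import Any, Dict, List

# Hypothetical helper (not part of instructlab.training): checks one sample
# against the rules stated in the README and the `Message` type.
REQUIRED_FIELDS = ("role", "content")
OPTIONAL_FIELDS = ("reasoning_content",)


def validate_sample(sample: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in `sample`; an empty list means it passed."""
    messages = sample.get("messages")
    if not isinstance(messages, list):
        return ["sample must contain a 'messages' list"]

    problems: List[str] = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        # `role` and `content` must be present.
        for field in REQUIRED_FIELDS:
            if field not in msg:
                problems.append(f"message {i} is missing '{field}'")
        # Every field that is present, including the optional one, must be a string.
        for field in REQUIRED_FIELDS + OPTIONAL_FIELDS:
            if field in msg and not isinstance(msg[field], str):
                problems.append(f"message {i}: '{field}' must be a string")
    return problems


reasoning_sample = {
    "messages": [
        {"role": "user", "content": "What is 15 * 23?"},
        {
            "role": "assistant",
            "reasoning_content": "15 * 23 = 15 * 20 + 15 * 3 = 300 + 45 = 345",
            "content": "15 * 23 = 345",
        },
    ]
}
print(validate_sample(reasoning_sample))  # []
print(validate_sample({"messages": [{"role": "user"}]}))
# ["message 0 is missing 'content'"]
```

Returning a list of problems rather than raising on the first one makes it easy to report every defect in a sample at once.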

src/instructlab/training/data_process.py

Lines changed: 41 additions & 5 deletions
@@ -444,21 +444,57 @@ def wrap_masked_messages(
     """
     Given a list of messages and a set of roles we want to unmask, return
     a list with the matching messages wrapped with `<|UNMASK_BEGIN|>` and `<|UNMASK_END|>` tokens
-    wrapped around the `message.content` field.
+    wrapped around both the `content` and `reasoning_content` fields (if present).
 
     Args:
         msgs (List[Message]): List of messages we want to wrap with unmask tokens.
         unmask_roles (List[str]): The roles whose messages we should wrap.
 
     Returns:
         List[Message]: The resultant list with all appropriate messages wrapped.
+
+    Note:
+        Both `content` and `reasoning_content` fields are processed if present in a message.
+        The `reasoning_content` field is optional and enables support for structured reasoning traces.
     """
     new_msgs: t.List[Message] = []
     for msg in msgs:
-        content = msg["content"]
-        if msg["role"] in unmask_roles:
-            content = UNMASK_BEGIN_TOKEN + content + UNMASK_END_TOKEN
-        new_msgs.append({"role": msg["role"], "content": content})
+        if msg["role"] not in unmask_roles:
+            # do nothing
+            new_msgs += [msg]
+            continue
+
+        # here, we need to be on the lookout for both string and non-string
+        # entries (e.g. other content types, or pure reasoning traces)
+        interesting_fields = ["content", "reasoning_content"]
+        new_msg = {k: v for k, v in msg.items() if k not in interesting_fields}
+
+        # what's left to add then is content or reasoning_content
+        content = msg.get("content", None)
+        reasoning_content = msg.get("reasoning_content", None)
+
+        # we handle these conditionally since these may become optional fields in the future.
+        if content:
+            if type(content) is not str:
+                raise ValueError(
+                    "Error: unmasking non-string data types is currently unsupported. "
+                )
+            new_msg["content"] = UNMASK_BEGIN_TOKEN + content + UNMASK_END_TOKEN
+
+        if reasoning_content:
+            if type(reasoning_content) is not str:
+                raise ValueError(
+                    "Error: received an entry for `reasoning_content` which was not a string. "
+                    "Non-string datatypes for this field are currently unsupported, if this is intentional please raise an issue."
+                )
+
+            new_msg["reasoning_content"] = (
+                UNMASK_BEGIN_TOKEN + reasoning_content + UNMASK_END_TOKEN
+            )
+
+        # finally we save this
+        new_msgs += [new_msg]
+
     return new_msgs
 
 
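The new `wrap_masked_messages` logic can be exercised outside the library to see how each field is treated. The block below is a standalone, condensed re-implementation for experimentation, named `wrap_masked_messages_sketch` to make clear it is not an import of the library function. It assumes `UNMASK_BEGIN_TOKEN` and `UNMASK_END_TOKEN` hold the `<|UNMASK_BEGIN|>` and `<|UNMASK_END|>` strings named in the docstring; the real module defines its own constants.

```python
import typing as t

# Assumption: these mirror the token strings named in the docstring; the
# library module defines its own UNMASK_BEGIN_TOKEN / UNMASK_END_TOKEN.
UNMASK_BEGIN_TOKEN = "<|UNMASK_BEGIN|>"
UNMASK_END_TOKEN = "<|UNMASK_END|>"

WRAPPABLE_FIELDS = ("content", "reasoning_content")


def wrap_masked_messages_sketch(
    msgs: t.List[t.Dict[str, t.Any]], unmask_roles: t.List[str]
) -> t.List[t.Dict[str, t.Any]]:
    """Standalone copy of the new wrapping logic, for experimentation only."""
    new_msgs: t.List[t.Dict[str, t.Any]] = []
    for msg in msgs:
        if msg["role"] not in unmask_roles:
            # Masked roles pass through as the same object, extra keys included.
            new_msgs.append(msg)
            continue

        # Copy every key except the two fields that may be wrapped.
        new_msg = {k: v for k, v in msg.items() if k not in WRAPPABLE_FIELDS}
        for field in WRAPPABLE_FIELDS:
            value = msg.get(field)
            # Truthiness check, as in the diff: None and "" are both skipped.
            if value:
                if type(value) is not str:
                    raise ValueError(f"unmasking non-string `{field}` is unsupported")
                new_msg[field] = UNMASK_BEGIN_TOKEN + value + UNMASK_END_TOKEN
        new_msgs.append(new_msg)
    return new_msgs


msgs = [
    {"role": "user", "content": "What is 15 * 23?"},
    {"role": "assistant", "reasoning_content": "300 + 45 = 345", "content": "345"},
]
wrapped = wrap_masked_messages_sketch(msgs, unmask_roles=["assistant"])
print(wrapped[0])  # {'role': 'user', 'content': 'What is 15 * 23?'}
print(wrapped[1]["reasoning_content"])  # <|UNMASK_BEGIN|>300 + 45 = 345<|UNMASK_END|>

# An empty `content` on an unmasked message is dropped, not kept as "".
print(wrap_masked_messages_sketch([{"role": "assistant", "content": ""}], ["assistant"]))
# [{'role': 'assistant'}]
```

Because the diff tests `if content:` rather than `is not None`, an empty-string `content` on an unmasked message is omitted from the result instead of being kept as `""`, as the last call shows. Messages from masked roles are now passed through unchanged with any extra keys, whereas the removed code rebuilt every message as a fresh dict holding only `role` and `content`.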

src/instructlab/training/type_definitions.py

Lines changed: 21 additions & 2 deletions
@@ -10,14 +10,33 @@
 # Standard
 import typing as t
 
+# For Python 3.8+ compatibility
+try:
+    from typing import Required, NotRequired
+except ImportError:
+    try:
+        from typing_extensions import Required, NotRequired
+    except ImportError:
+        # Fallback for older Python versions
+        Required = t.Annotated
+        NotRequired = t.Annotated
+
 
 class Message(t.TypedDict):
     """
     Format of a single message sample.
+
+    Fields:
+        content: The main content of the message.
+        role: The role of the message sender (e.g., "user", "assistant", "system").
+        reasoning_content: Optional reasoning trace or thinking process associated with the message.
+            This field is particularly useful for training reasoning-capable models
+            that can separate their thinking process from their final output.
     """
 
-    content: str
-    role: str
+    content: Required[str]
+    role: Required[str]
+    reasoning_content: NotRequired[str]
 
 
 class ProcessedMessagesData(t.TypedDict):
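The `Required`/`NotRequired` qualifiers imported above come from PEP 655 and are available in `typing` only from Python 3.11; older interpreters need `typing_extensions`. The last fallback assigns `t.Annotated`, which exists only on Python 3.9+ and requires at least two arguments, so `Required[str]` would raise `TypeError` on that path. A version-agnostic alternative is to split required and optional keys across two `TypedDict` classes and mark the subclass `total=False`. The sketch below shows that pattern under the hypothetical names `_MessageRequired` and `MessageSketch`; it is an alternative, not the code in this commit.

```python
import typing as t


# Alternative pattern (not what the commit uses): keys declared in a
# total=True base are required; keys declared in a total=False subclass are
# optional. This works without Required/NotRequired.
class _MessageRequired(t.TypedDict):
    content: str
    role: str


class MessageSketch(_MessageRequired, total=False):
    reasoning_content: str


plain: MessageSketch = {"role": "user", "content": "Hello! How are you?"}
with_trace: MessageSketch = {
    "role": "assistant",
    "content": "15 * 23 = 345",
    "reasoning_content": "15 * 20 + 15 * 3 = 300 + 45 = 345",
}

# __required_keys__ / __optional_keys__ are populated on Python 3.9+.
print(sorted(MessageSketch.__required_keys__))  # ['content', 'role']
print(sorted(MessageSketch.__optional_keys__))  # ['reasoning_content']
```

Type checkers treat `MessageSketch` the same way as a single class using `Required`/`NotRequired`. The trade-off is one extra base class in exchange for no import fallback.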
