enable the training repo to handle reasoning traces within reasoning_content fields

RobotSail · RobotSail · commit 79c2f2ebee37 · 2025-06-24T00:03:59.000-04:00
Signed-off-by: Oleg S <97077423+RobotSail@users.noreply.github.com>
diff --git a/README.md b/README.md
@@ -14,11 +14,18 @@ The InstructLab Training library is an optimized model instruction-tuning librar
 To simplify the process of fine-tuning models with the [LAB
 method](https://arxiv.org/abs/2403.01081), or for general use, this library provides a simple pythonic training interface.
 
+### Reasoning Content Support
+
+The library supports reasoning traces through the optional `reasoning_content` field in message samples. A sample can carry both regular `content` and a structured reasoning trace, which lets you train reasoning-capable models that keep their thinking process separate from their final output.
+
 ## Usage and Guidance Sections
 
 - [Installing](#installing-the-library)
   - [Additional Nvidia packages](#additional-nvidia-packages)
 - [Using the library](#using-the-library)
+- [Data format](#data-format)
+  - [Reasoning content support](#reasoning-content-support-1)
+- [Documentation](#documentation)
 - [Learning about the training arguments](#learning-about-training-arguments)
   - [`TrainingArgs`](#trainingargs)
   - [`DeepSpeedOptions`](#deepspeedoptions)
@@ -80,6 +87,64 @@ You can then define various training arguments. They will serve as the parameter
 - [Learning about the training argument](#learning-about-training-arguments)
 - [Example training run with arguments](#example-training-run-with-arguments)
 
+## Data format
+
+The library expects training data in the messages format, where each sample contains a list of messages with different roles (user, assistant, system, etc.). Each message should have at minimum:
+
+- `role`: The role of the message sender (e.g., "user", "assistant", "system")
+- `content`: The main content of the message
+
+### Reasoning content support
+
+The library now supports an optional `reasoning_content` field in addition to the standard `content` field. This enables training models with structured reasoning traces. The `reasoning_content` field is particularly useful for:
+
+- Training reasoning-capable models that can separate their thinking process from their output
+- Supporting models that need to generate internal reasoning traces
+- Enabling step-by-step reasoning in model responses
+
+**Example message structure with reasoning content:**
+```json
+{
+  "messages": [
+    {
+      "role": "user",
+      "content": "What is 15 * 23?"
+    },
+    {
+      "role": "assistant",
+      "reasoning_content": "I need to multiply 15 by 23. Let me break this down: 15 * 23 = 15 * (20 + 3) = 15 * 20 + 15 * 3 = 300 + 45 = 345",
+      "content": "15 * 23 = 345"
+    }
+  ]
+}
+```
+
+**Standard message structure:**
+```json
+{
+  "messages": [
+    {
+      "role": "user",
+      "content": "Hello! How are you?"
+    },
+    {
+      "role": "assistant",
+      "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
+    }
+  ]
+}
+```
+
+Both the `content` and `reasoning_content` fields are processed during training according to the unmasking rules specified by the `unmask_roles` parameter. When a message's role is included in `unmask_roles`, both fields (if present) are unmasked for training.
+
+## Documentation
+
+For detailed information about specific features:
+
+- **[Reasoning Content Support](docs/reasoning_content.md)**: Comprehensive guide to using the `reasoning_content` field for training reasoning-capable models
+- **[CI Documentation](docs/ci.md)**: Information about continuous integration processes
+- **[Logging Documentation](docs/logging.md)**: Guide to logging configuration and usage
+
 ## Learning about training arguments
 
 The `TrainingArgs` class provides most of the customization options
@@ -378,4 +443,4 @@ Below is a list of custom environment variables users can set in the training li
 
 ## Developer Certificate of Origin
 
-When you make a contribution to InstructLab training, you implicitly agree to the Developer Certificate of Origin terms as set in `DCO.txt` at the root of this repository.
+When you make a contribution to InstructLab training, you implicitly agree to the Developer Certificate of Origin terms as set in `DCO.txt` at the root of this repository.
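
The "Data format" section added to the README above describes the expected shape of a sample: a `messages` list whose entries carry `role` and `content`, plus an optional `reasoning_content`. As a minimal sketch of how that description could be checked before training, the snippet below validates one sample. The `validate_sample` helper is hypothetical and is not part of the library; it only encodes the rules stated in the README text.

```python
import json
from typing import Any, List


def validate_sample(sample: Any) -> List[str]:
    """Return a list of problems found in one sample (empty if none).

    Hypothetical helper: it encodes the README's description, not library code.
    """
    if not isinstance(sample, dict) or not isinstance(sample.get("messages"), list):
        return ["sample must be an object with a 'messages' list"]

    problems: List[str] = []
    for i, msg in enumerate(sample["messages"]):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not an object")
            continue
        # The README lists `role` and `content` as the minimum fields.
        for field in ("role", "content"):
            if not isinstance(msg.get(field), str):
                problems.append(f"message {i}: missing or non-string '{field}'")
        # `reasoning_content` is optional, but must be a string when present.
        if "reasoning_content" in msg and not isinstance(msg["reasoning_content"], str):
            problems.append(f"message {i}: 'reasoning_content' must be a string")
    return problems


# One sample as it might appear on a single line of a JSONL file.
line = json.dumps(
    {
        "messages": [
            {"role": "user", "content": "What is 15 * 23?"},
            {
                "role": "assistant",
                "reasoning_content": "15 * 23 = 15 * 20 + 15 * 3 = 300 + 45 = 345",
                "content": "15 * 23 = 345",
            },
        ]
    }
)
sample = json.loads(line)
print(validate_sample(sample))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a dataset in a single pass.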
diff --git a/src/instructlab/training/data_process.py b/src/instructlab/training/data_process.py
@@ -444,21 +444,57 @@ def wrap_masked_messages(
     """
     Given a list of messages and a set of roles we want to unmask, return
     a list with the matching messages wrapped with `<|UNMASK_BEGIN|>` and `<|UNMASK_END|>` tokens
-    wrapped around the `message.content` field.
+    wrapped around both the `content` and `reasoning_content` fields (if present).
 
     Args:
         msgs (List[Message]): List of messages we want to wrap with unmask tokens.
         unmask_roles (List[str]): The roles whose messages we should wrap.
 
     Returns:
         List[Message]: The resultant list with all appropriate messages wrapped.
+
+    Note:
+        Both `content` and `reasoning_content` fields are processed if present in a message.
+        The `reasoning_content` field is optional and enables support for structured reasoning traces.
     """
     new_msgs: t.List[Message] = []
     for msg in msgs:
-        content = msg["content"]
-        if msg["role"] in unmask_roles:
-            content = UNMASK_BEGIN_TOKEN + content + UNMASK_END_TOKEN
-        new_msgs.append({"role": msg["role"], "content": content})
+        if msg["role"] not in unmask_roles:
+            # do nothing
+            new_msgs += [msg]
+            continue
+
+        # here, we need to be on the lookout for both string and non-string
+        # entries (e.g. other content types, or pure reasoning traces)
+        interesting_fields = ["content", "reasoning_content"]
+        new_msg = {k: v for k, v in msg.items() if k not in interesting_fields}
+
+        # what's left to add then is content or reasoning_content
+        content = msg.get("content", None)
+        reasoning_content = msg.get("reasoning_content", None)
+
+        # we handle these conditionally since these may become optional fields in the future.
+        if content:
+            if not isinstance(content, str):
+                raise ValueError(
+                    "Error: unmasking non-string data types is currently unsupported."
+                )
+            new_msg["content"] = UNMASK_BEGIN_TOKEN + content + UNMASK_END_TOKEN
+
+        if reasoning_content:
+            if not isinstance(reasoning_content, str):
+                raise ValueError(
+                    "Error: received an entry for `reasoning_content` which was not a string. "
+                    "Non-string datatypes for this field are currently unsupported; if this is intentional, please raise an issue."
+                )
+
+            new_msg["reasoning_content"] = (
+                UNMASK_BEGIN_TOKEN + reasoning_content + UNMASK_END_TOKEN
+            )
+
+        # finally we save this
+        new_msgs += [new_msg]
+
     return new_msgs
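
To make the effect of the hunk above concrete, here is a self-contained sketch that restates the same wrapping logic outside the library. The token strings are taken from the docstring; the function name `wrap_for_unmasking` and the plain-dict message type are illustrative assumptions, not the module's actual definitions.

```python
from typing import Any, Dict, List

# Token strings as named in the docstring of `wrap_masked_messages`.
UNMASK_BEGIN_TOKEN = "<|UNMASK_BEGIN|>"
UNMASK_END_TOKEN = "<|UNMASK_END|>"

WRAPPED_FIELDS = ("content", "reasoning_content")


def wrap_for_unmasking(
    msgs: List[Dict[str, Any]], unmask_roles: List[str]
) -> List[Dict[str, Any]]:
    """Illustrative re-statement of the wrapping logic shown in the diff."""
    wrapped: List[Dict[str, Any]] = []
    for msg in msgs:
        if msg["role"] not in unmask_roles:
            # Roles that stay masked pass through untouched.
            wrapped.append(msg)
            continue

        # Copy every key except the two fields that get wrapped.
        new_msg = {k: v for k, v in msg.items() if k not in WRAPPED_FIELDS}
        for field in WRAPPED_FIELDS:
            value = msg.get(field)
            if not value:
                # As in the diff, absent or empty values are left out.
                continue
            if not isinstance(value, str):
                raise ValueError(f"non-string `{field}` is unsupported")
            new_msg[field] = UNMASK_BEGIN_TOKEN + value + UNMASK_END_TOKEN
        wrapped.append(new_msg)
    return wrapped


msgs = [
    {"role": "user", "content": "What is 15 * 23?"},
    {
        "role": "assistant",
        "reasoning_content": "15 * 20 + 15 * 3 = 345",
        "content": "15 * 23 = 345",
    },
]
for wrapped_msg in wrap_for_unmasking(msgs, ["assistant"]):
    print(wrapped_msg)
```

One consequence of the truthiness checks, visible in this sketch as well as in the diff, is that an unmasked message whose `content` is an empty string comes out without a `content` key at all.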
 
 
diff --git a/src/instructlab/training/type_definitions.py b/src/instructlab/training/type_definitions.py
@@ -10,14 +10,33 @@
 # Standard
 import typing as t
 
+# `Required` and `NotRequired` were added to `typing` in Python 3.11;
+# fall back to `typing_extensions` on older versions.
+try:
+    from typing import Required, NotRequired
+except ImportError:
+    try:
+        from typing_extensions import Required, NotRequired
+    except ImportError:
+        # Fallback for older Python versions
+        Required = t.Annotated
+        NotRequired = t.Annotated
+
 
 class Message(t.TypedDict):
     """
     Format of a single message sample.
+
+    Fields:
+        content: The main content of the message.
+        role: The role of the message sender (e.g., "user", "assistant", "system").
+        reasoning_content: Optional reasoning trace or thinking process associated with the message.
+                          This field is particularly useful for training reasoning-capable models
+                          that can separate their thinking process from their final output.
     """
 
-    content: str
-    role: str
+    content: Required[str]
+    role: Required[str]
+    reasoning_content: NotRequired[str]
 
 
 class ProcessedMessagesData(t.TypedDict):
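
A note on the last fallback in the import block above: `typing.Annotated` requires at least two arguments, so `Required[str]` would most likely raise a `TypeError` if that branch were ever reached. As an alternative sketch (a different technique from the one in the diff), an optional key can be declared with the standard library alone by splitting the `TypedDict` into a required base and a `total=False` subclass. The name `_MessageRequired` is hypothetical.

```python
import typing as t


class _MessageRequired(t.TypedDict):
    """Keys that every message must carry (hypothetical base class)."""

    content: str
    role: str


class Message(_MessageRequired, total=False):
    """Adds keys that may be omitted, without `Required`/`NotRequired`."""

    reasoning_content: str


# Both shapes are accepted by a static type checker.
plain: Message = {"role": "user", "content": "Hello! How are you?"}
with_trace: Message = {
    "role": "assistant",
    "reasoning_content": "15 * 20 + 15 * 3 = 345",
    "content": "15 * 23 = 345",
}

# Python 3.9+ exposes the split at runtime.
print(sorted(Message.__required_keys__), sorted(Message.__optional_keys__))
```

The trade-off is an extra class in exchange for dropping the conditional import; the `NotRequired` form in the diff keeps all fields in one class body.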
diff --git a/tests/unit/test_data_process.py b/tests/unit/test_data_process.py