Skip to content

Commit 5107611

Browse files
authored
feat: Add TraceType enum for granular trace control (#284)
1 parent 7248b9f commit 5107611

File tree

13 files changed

+233
-37
lines changed

13 files changed

+233
-37
lines changed

docs/assets/recipes/mcp_and_tooluse/basic_mcp.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -132,7 +132,7 @@ def build_config(model_alias: str, provider_name: str) -> dd.DataDesignerConfigB
132132
),
133133
system_prompt="You must call the get_fact tool before answering. Only use information from tool results.",
134134
tool_alias="basic-tools",
135-
with_trace=True,
135+
with_trace=dd.TraceType.ALL_MESSAGES,
136136
)
137137
)
138138

@@ -163,7 +163,7 @@ def build_config(model_alias: str, provider_name: str) -> dd.DataDesignerConfigB
163163
),
164164
system_prompt="You must call the add_numbers tool to perform the calculation. Report the exact result.",
165165
tool_alias="basic-tools",
166-
with_trace=True,
166+
with_trace=dd.TraceType.ALL_MESSAGES,
167167
)
168168
)
169169

docs/assets/recipes/mcp_and_tooluse/pdf_qa.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -312,7 +312,7 @@ def build_config(model_alias: str, provider_name: str) -> dd.DataDesignerConfigB
312312
),
313313
output_format=TopicList,
314314
tool_alias="doc-search",
315-
with_trace=True, # Enable trace to capture tool call history
315+
with_trace=dd.TraceType.ALL_MESSAGES, # Enable trace to capture tool call history
316316
)
317317
)
318318

@@ -341,7 +341,7 @@ def build_config(model_alias: str, provider_name: str) -> dd.DataDesignerConfigB
341341
),
342342
output_format=QAPair,
343343
tool_alias="doc-search",
344-
with_trace=True, # Enable trace to capture tool call history
344+
with_trace=dd.TraceType.ALL_MESSAGES, # Enable trace to capture tool call history
345345
)
346346
)
347347

docs/concepts/columns.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -39,7 +39,7 @@ LLM-Text columns generate natural language text: product descriptions, customer
3939
Use **Jinja2 templating** in prompts to reference other columns. Data Designer automatically manages dependencies and injects the referenced column values into the prompt.
4040

4141
!!! note "Generation Traces"
42-
LLM columns can optionally capture a full message trace in a separate `{column_name}__trace` column. Enable traces per-column via `with_trace=True` on the column config, or globally for all columns via `RunConfig(debug_override_save_all_column_traces=True)`. The trace includes the ordered message history for the final generation attempt (system/user/assistant/tool calls/tool results), and may include model reasoning fields when the provider exposes them.
42+
LLM columns can optionally capture message traces in a separate `{column_name}__trace` column. Set `with_trace` on the column config to control what's captured: `TraceType.NONE` (default, no trace), `TraceType.LAST_MESSAGE` (final assistant message only), or `TraceType.ALL_MESSAGES` (full conversation history). Override globally via `RunConfig(debug_trace_override=TraceType.ALL_MESSAGES)`. The trace includes the ordered message history for the final generation attempt (system/user/assistant/tool calls/tool results), and may include model reasoning fields when the provider exposes them.
4343

4444
!!! tip "Tool Use in LLM Columns"
4545
LLM columns can invoke external tools during generation via MCP (Model Context Protocol). Enable tools by setting `tool_alias` to reference a configured `ToolConfig`:
@@ -50,7 +50,7 @@ Use **Jinja2 templating** in prompts to reference other columns. Data Designer a
5050
model_alias="nvidia-text",
5151
prompt="Search for information and answer: {{ question }}",
5252
tool_alias="search-tools", # References a ToolConfig
53-
with_trace=True, # Capture tool call history
53+
with_trace=dd.TraceType.ALL_MESSAGES, # Capture tool call history
5454
)
5555
```
5656

@@ -162,6 +162,6 @@ You read this property for introspection but never set it—always computed from
162162

163163
### `side_effect_columns`
164164

165-
Computed property listing columns created implicitly alongside the primary column. Currently, only LLM columns produce side effects (trace columns like `{name}__trace` when `with_trace=True` is set on the column or `debug_override_save_all_column_traces` is enabled globally).
165+
Computed property listing columns created implicitly alongside the primary column. Currently, only LLM columns produce side effects (trace columns like `{name}__trace` when `with_trace` is not `TraceType.NONE` on the column or `debug_trace_override` is set globally).
166166

167167
For detailed information on each column type, refer to the [column configuration code reference](../code_reference/column_configs.md).

docs/concepts/mcp/enabling-tools.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -90,7 +90,7 @@ builder.add_column(
9090
prompt="Use the available tools to research and answer: {{ question }}",
9191
model_alias="nvidia-text",
9292
tool_alias="my-tools", # Enable tools
93-
with_trace=True, # Capture tool call history
93+
with_trace=dd.TraceType.ALL_MESSAGES, # Capture tool call history
9494
)
9595
)
9696

docs/concepts/traces.md

Lines changed: 41 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# Message Traces
22

3-
Traces capture the full conversation history during LLM generation, including system prompts, user prompts, model reasoning, tool calls, tool results, and the final response. This visibility is essential for understanding model behavior, debugging generation issues, and iterating on prompts.
3+
Traces capture the conversation history during LLM generation, including system prompts, user prompts, model reasoning, tool calls, tool results, and the final response. This visibility is essential for understanding model behavior, debugging generation issues, and iterating on prompts.
44

55
Traces are also useful in certain scenarios as the target output of the workflow, e.g. producing an SFT dataset for fine-tuning tool-use capability, for instance.
66

@@ -19,39 +19,74 @@ When generating content with LLM columns, you often need to understand what happ
1919

2020
Traces provide this visibility by capturing the ordered message history for each generation, including any multi-turn conversations that occur during tool use or retry scenarios.
2121

22+
## Trace Types
23+
24+
Data Designer supports three trace modes via the `TraceType` enum:
25+
26+
| TraceType | Description |
27+
|-----------|-------------|
28+
| `TraceType.NONE` | No trace captured (default) |
29+
| `TraceType.LAST_MESSAGE` | Only the final assistant message is captured |
30+
| `TraceType.ALL_MESSAGES` | Full conversation history (system/user/assistant/tool) |
31+
2232
## Enabling Traces
2333

2434
### Per-Column (Recommended)
2535

26-
Enable `with_trace=True` on specific LLM columns:
36+
Set `with_trace` on specific LLM columns:
2737

2838
```python
2939
import data_designer.config as dd
3040

41+
# Capture full conversation history
3142
builder.add_column(
3243
dd.LLMTextColumnConfig(
3344
name="answer",
3445
prompt="Answer: {{ question }}",
3546
model_alias="nvidia-text",
36-
with_trace=True, # Enable trace for this column
47+
with_trace=dd.TraceType.ALL_MESSAGES, # Full trace
48+
)
49+
)
50+
51+
# Capture only the final assistant response
52+
builder.add_column(
53+
dd.LLMTextColumnConfig(
54+
name="summary",
55+
prompt="Summarize: {{ text }}",
56+
model_alias="nvidia-text",
57+
with_trace=dd.TraceType.LAST_MESSAGE, # Just the final response
3758
)
3859
)
3960
```
4061

4162
### Global Debug Override
4263

43-
Enable traces for ALL LLM columns (useful during development):
64+
Override trace settings for ALL LLM columns (useful during development):
4465

4566
```python
4667
import data_designer.config as dd
4768
from data_designer.interface import DataDesigner
4869

4970
data_designer = DataDesigner()
71+
72+
# Enable full traces for all columns
73+
data_designer.set_run_config(
74+
dd.RunConfig(debug_trace_override=dd.TraceType.ALL_MESSAGES)
75+
)
76+
77+
# Or capture only last messages for all columns
5078
data_designer.set_run_config(
51-
dd.RunConfig(debug_override_save_all_column_traces=True)
79+
dd.RunConfig(debug_trace_override=dd.TraceType.LAST_MESSAGE)
80+
)
81+
82+
# Disable all traces (overrides per-column settings)
83+
data_designer.set_run_config(
84+
dd.RunConfig(debug_trace_override=dd.TraceType.NONE)
5285
)
5386
```
5487

88+
When `debug_trace_override` is set (not `None`), it takes precedence over per-column `with_trace` settings.
89+
5590
## Trace Column Naming
5691

5792
When enabled, LLM columns produce an additional side-effect column:
@@ -161,4 +196,4 @@ When an assistant message includes tool calls:
161196
## See Also
162197

163198
- **[Safety and Limits](mcp/safety-and-limits.md)**: Understand turn limits and timeout behavior
164-
- **[Run Config](../code_reference/run_config.md)**: Runtime options including `debug_override_save_all_column_traces`
199+
- **[Run Config](../code_reference/run_config.md)**: Runtime options including `debug_trace_override`

packages/data-designer-config/src/data_designer/config/__init__.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -74,6 +74,7 @@
7474
)
7575
from data_designer.config.utils.code_lang import CodeLang
7676
from data_designer.config.utils.info import InfoType
77+
from data_designer.config.utils.trace_type import TraceType
7778
from data_designer.config.validator_params import (
7879
CodeValidatorParams,
7980
LocalCallableValidatorParams,
@@ -144,6 +145,7 @@ def get_config_exports() -> list[str]:
144145
SeedDatasetColumnConfig.__name__,
145146
SubcategorySamplerParams.__name__,
146147
TimeDeltaSamplerParams.__name__,
148+
TraceType.__name__,
147149
UniformDistribution.__name__,
148150
UniformDistributionParams.__name__,
149151
UniformSamplerParams.__name__,

packages/data-designer-config/src/data_designer/config/column_configs.py

Lines changed: 10 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@
1616
from data_designer.config.utils.code_lang import CodeLang
1717
from data_designer.config.utils.constants import TRACE_COLUMN_POSTFIX
1818
from data_designer.config.utils.misc import assert_valid_jinja2_template, extract_keywords_from_jinja2_template
19+
from data_designer.config.utils.trace_type import TraceType
1920
from data_designer.config.validator_params import ValidatorParamsT, ValidatorType
2021

2122

@@ -162,10 +163,12 @@ class LLMTextColumnConfig(SingleColumnConfig):
162163
tool_alias: Optional alias of the tool configuration to use for MCP tool calls.
163164
Must match a tool alias defined when initializing the DataDesignerConfigBuilder.
164165
When provided, the model may call permitted tools during generation.
165-
with_trace: If True, creates a `{column_name}__trace` column containing the full
166-
ordered message history (system/user/assistant/tool) for the generation.
167-
Can be overridden globally via `RunConfig.debug_override_save_all_column_traces`.
168-
Defaults to False.
166+
with_trace: Specifies what trace information to capture in a `{column_name}__trace`
167+
column. Options are:
168+
- `TraceType.NONE` (default): No trace is captured.
169+
- `TraceType.LAST_MESSAGE`: Only the final assistant message is captured.
170+
- `TraceType.ALL_MESSAGES`: Full conversation history (system/user/assistant/tool).
171+
Can be overridden globally via `RunConfig.debug_trace_override`.
169172
column_type: Discriminator field, always "llm-text" for this configuration type.
170173
"""
171174

@@ -174,7 +177,7 @@ class LLMTextColumnConfig(SingleColumnConfig):
174177
system_prompt: str | None = None
175178
multi_modal_context: list[ImageContext] | None = None
176179
tool_alias: str | None = None
177-
with_trace: bool = False
180+
with_trace: TraceType = TraceType.NONE
178181
column_type: Literal["llm-text"] = "llm-text"
179182

180183
@staticmethod
@@ -197,8 +200,8 @@ def required_columns(self) -> list[str]:
197200
def side_effect_columns(self) -> list[str]:
198201
"""Returns the trace column, which may be generated alongside the main column.
199202
200-
Traces are generated when `with_trace=True` on the column config or
201-
when `RunConfig.debug_override_save_all_column_traces=True` globally.
203+
Traces are generated when `with_trace` is not `TraceType.NONE` on the column config
204+
or when `RunConfig.debug_trace_override` is set globally.
202205
203206
Returns:
204207
List containing the trace column name.

packages/data-designer-config/src/data_designer/config/run_config.py

Lines changed: 10 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
1+
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
22
# SPDX-License-Identifier: Apache-2.0
33

44
from __future__ import annotations
@@ -7,6 +7,7 @@
77
from typing_extensions import Self
88

99
from data_designer.config.base import ConfigBase
10+
from data_designer.config.utils.trace_type import TraceType
1011

1112

1213
class RunConfig(ConfigBase):
@@ -33,10 +34,13 @@ class RunConfig(ConfigBase):
3334
max_conversation_correction_steps: Maximum number of correction rounds permitted within a
3435
single conversation when generation tasks call `ModelFacade.generate(...)`. Must be >= 0.
3536
Default is 0.
36-
debug_override_save_all_column_traces: If True, overrides per-column `with_trace` settings
37-
and includes `__trace` columns for ALL LLM generations, containing the full ordered
38-
message history (system/user/assistant/tool) for the final generation attempt.
39-
Useful for debugging. Default is False.
37+
debug_trace_override: If set, overrides per-column `with_trace` settings for ALL LLM
38+
generations. Options are:
39+
- `None` (default): Use per-column `with_trace` settings.
40+
- `TraceType.NONE`: Disable all traces, ignoring per-column settings.
41+
- `TraceType.LAST_MESSAGE`: Capture only the final assistant message for all columns.
42+
- `TraceType.ALL_MESSAGES`: Capture full conversation history for all columns.
43+
Useful for debugging or bulk trace collection.
4044
"""
4145

4246
disable_early_shutdown: bool = False
@@ -46,7 +50,7 @@ class RunConfig(ConfigBase):
4650
non_inference_max_parallel_workers: int = Field(default=4, ge=1)
4751
max_conversation_restarts: int = Field(default=5, ge=0)
4852
max_conversation_correction_steps: int = Field(default=0, ge=0)
49-
debug_override_save_all_column_traces: bool = False
53+
debug_trace_override: TraceType | None = None
5054

5155
@model_validator(mode="after")
5256
def normalize_shutdown_settings(self) -> Self:
Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2+
# SPDX-License-Identifier: Apache-2.0
3+
4+
from __future__ import annotations
5+
6+
from data_designer.config.utils.type_helpers import StrEnum
7+
8+
9+
class TraceType(StrEnum):
10+
"""Specifies the type of reasoning trace to capture for LLM columns.
11+
12+
Traces capture the conversation history during LLM generation, which is
13+
useful for debugging, analysis, and understanding model behavior.
14+
15+
Attributes:
16+
NONE: No trace is captured. This is the default.
17+
LAST_MESSAGE: Only the final assistant message is captured.
18+
ALL_MESSAGES: The full conversation history (system/user/assistant/tool)
19+
is captured.
20+
"""
21+
22+
NONE = "none"
23+
LAST_MESSAGE = "last_message"
24+
ALL_MESSAGES = "all_messages"

packages/data-designer-config/tests/config/test_columns.py

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,7 @@
3636
)
3737
from data_designer.config.utils.code_lang import CodeLang
3838
from data_designer.config.utils.errors import UserJinjaTemplateSyntaxError
39+
from data_designer.config.utils.trace_type import TraceType
3940
from data_designer.config.validator_params import CodeValidatorParams
4041

4142
stub_prompt = "test_prompt {{some_column}}"
@@ -86,6 +87,7 @@ def test_llm_text_column_config():
8687
assert llm_text_column_config.column_type == DataDesignerColumnType.LLM_TEXT
8788
assert set(llm_text_column_config.required_columns) == {"some_column", "some_other_column"}
8889
assert llm_text_column_config.side_effect_columns == ["test_llm_text__trace"]
90+
assert llm_text_column_config.with_trace == TraceType.NONE
8991

9092
# invalid prompt
9193
with pytest.raises(
@@ -110,6 +112,35 @@ def test_llm_text_column_config():
110112
)
111113

112114

115+
def test_llm_text_column_config_with_trace_serialization() -> None:
116+
"""Test that with_trace field serializes and deserializes correctly."""
117+
config = LLMTextColumnConfig(
118+
name="test_llm_text",
119+
prompt=stub_prompt,
120+
model_alias=stub_model_alias,
121+
with_trace=TraceType.ALL_MESSAGES,
122+
)
123+
assert config.with_trace == TraceType.ALL_MESSAGES
124+
125+
# Serialize
126+
serialized = config.model_dump()
127+
assert serialized["with_trace"] == "all_messages"
128+
129+
# Deserialize
130+
deserialized = LLMTextColumnConfig(**serialized)
131+
assert deserialized.with_trace == TraceType.ALL_MESSAGES
132+
133+
# Test with LAST_MESSAGE
134+
config_last = LLMTextColumnConfig(
135+
name="test_llm_text",
136+
prompt=stub_prompt,
137+
model_alias=stub_model_alias,
138+
with_trace=TraceType.LAST_MESSAGE,
139+
)
140+
assert config_last.with_trace == TraceType.LAST_MESSAGE
141+
assert config_last.model_dump()["with_trace"] == "last_message"
142+
143+
113144
def test_llm_code_column_config():
114145
llm_code_column_config = LLMCodeColumnConfig(
115146
name="test_llm_code",

0 commit comments

Comments
 (0)