Commit 75abac1

jeremymanning and claude committed
feat: Implement comprehensive ValidationTool with JSON Schema support (#106)
- Add ValidationTool with full JSON Schema Draft 7 validation
- Support three validation modes: STRICT, LENIENT, REPORT_ONLY
- Implement type coercion in lenient mode (string->number/boolean)
- Add built-in format validators for orchestrator-specific formats:
  - model-id (e.g., 'openai/gpt-4')
  - tool-name (e.g., 'web-search')
  - file-path, yaml-path, pipeline-ref, task-ref
- Implement schema inference from sample data
- Add custom format registration API
- Implement basic structured extraction (Pydantic integration)
- Add graph-based schema resolution infrastructure
- Add AUTO tag schema inference with pattern matching
- Create comprehensive documentation and quick reference
- Add 15 unit tests with 100% passing rate

Co-Authored-By: Claude <[email protected]>
1 parent 3ab02d4 commit 75abac1

14 files changed: +3196 -244 lines changed

docs/api/tools/index.rst

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ Tool Categories
    web_tools
    system_tools
    data_tools
+   validation_tools
    report_tools
    mcp_server

Lines changed: 264 additions & 0 deletions
@@ -0,0 +1,264 @@
Validation Tools
================

The validation tools validate data against JSON Schema, support custom formats, and can coerce compatible types in lenient mode.

.. contents:: Table of Contents
   :local:
   :depth: 2

ValidationTool
--------------

.. autoclass:: orchestrator.tools.validation.ValidationTool
   :members:
   :undoc-members:
   :show-inheritance:

The ValidationTool provides three main actions:

* **validate** - Validate data against a JSON Schema
* **infer_schema** - Automatically infer a schema from sample data
* **extract_structured** - Extract structured data from text (coming soon)

**Example usage in YAML:**

.. code-block:: yaml

   steps:
     - id: validate_user_data
       tool: validation
       action: validate
       parameters:
         data: "{{ user_input }}"
         schema:
           type: object
           properties:
             name:
               type: string
               minLength: 1
             email:
               type: string
               format: email
             age:
               type: integer
               minimum: 0
           required: ["name", "email"]
         mode: strict

**Example usage in Python:**

.. code-block:: python

   import asyncio

   from orchestrator.tools.validation import ValidationTool


   async def main():
       tool = ValidationTool()

       # Validate data against an inline schema
       result = await tool.execute(
           action="validate",
           # "john@example.com" is a placeholder address
           data={"name": "John", "email": "john@example.com", "age": 30},
           schema={
               "type": "object",
               "properties": {
                   "name": {"type": "string"},
                   "email": {"type": "string", "format": "email"},
                   "age": {"type": "integer"},
               },
               "required": ["name", "email"],
           },
           mode="strict",
       )

       if result["valid"]:
           print("Data is valid!")
       else:
           print("Validation errors:", result["errors"])


   # tool.execute is a coroutine, so it has to run inside an event loop
   asyncio.run(main())

Validation Modes
----------------

.. autoclass:: orchestrator.tools.validation.ValidationMode
   :members:
   :undoc-members:
   :show-inheritance:

The validation tool supports three modes:

* **STRICT** - Fail on any validation error (default)
* **LENIENT** - Attempt to coerce compatible types and warn on minor issues
* **REPORT_ONLY** - Never fail, only report validation issues
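
To make the difference between the modes concrete, here is a minimal sketch, not the tool's implementation. It assumes a validator has already produced a list of error messages. ``Mode``, ``apply_mode``, the ``"report_only"`` string value, and the ``warnings`` key are hypothetical names chosen for illustration; only ``valid`` and ``errors`` appear in the examples in this document.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for ValidationMode."""
    STRICT = "strict"
    LENIENT = "lenient"
    REPORT_ONLY = "report_only"


def apply_mode(errors, mode):
    """Turn a list of error messages into an outcome for the given mode."""
    if mode is Mode.REPORT_ONLY:
        # Never fail: surface every problem as a warning instead.
        return {"valid": True, "errors": [], "warnings": list(errors)}
    # STRICT fails on any error. LENIENT is assumed to have coerced what it
    # could beforehand, so whatever errors remain still fail.
    return {"valid": not errors, "errors": list(errors), "warnings": []}


problems = ["age: 'abc' is not of type 'integer'"]
print(apply_mode(problems, Mode.STRICT)["valid"])       # False
print(apply_mode(problems, Mode.REPORT_ONLY)["valid"])  # True
```

In this sketch the mode only changes how the same list of problems is reported, which keeps the validation logic itself independent of the mode.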

Type coercion in lenient mode:

* String to integer: ``"42"`` → ``42``
* String to number: ``"3.14"`` → ``3.14``
* String to boolean: ``"true"`` → ``True``, ``"false"`` → ``False``
* Number to string: ``42`` → ``"42"``
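
The coercion rules listed above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the tool's code: ``coerce`` is a hypothetical helper, and it assumes that a value which cannot be converted is returned unchanged so that normal validation can report it.

```python
def coerce(value, target_type):
    """Return value converted to target_type, or unchanged if not coercible."""
    if isinstance(value, str):
        text = value.strip()
        try:
            if target_type == "integer":
                return int(text)
            if target_type == "number":
                return float(text)
        except ValueError:
            # Not a numeric string: leave it for the validator to reject.
            return value
        if target_type == "boolean" and text.lower() in ("true", "false"):
            return text.lower() == "true"
    # bool is a subclass of int in Python, so exclude it explicitly.
    elif (target_type == "string" and isinstance(value, (int, float))
          and not isinstance(value, bool)):
        return str(value)
    return value
```

For example, ``coerce("42", "integer")`` yields the integer ``42``, while ``coerce("abc", "integer")`` returns the original string.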

Schema State
------------

.. autoclass:: orchestrator.tools.validation.SchemaState
   :members:
   :undoc-members:
   :show-inheritance:

Schema resolution states:

* **FIXED** - Fully determined at compile time
* **PARTIAL** - Some parts known, others ambiguous
* **AMBIGUOUS** - Cannot be determined until runtime
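
One way to picture the three states is to classify a schema by how many of its leaf values are still unresolved. The sketch below is purely illustrative: ``State``, ``leaves``, ``is_unresolved``, and ``classify`` are hypothetical names, and the rule that ``<AUTO>`` tags and ``{{ }}`` templates count as unresolved is an assumption, not a description of how the orchestrator decides.

```python
from enum import Enum


class State(Enum):
    """Hypothetical stand-in for SchemaState."""
    FIXED = "fixed"
    PARTIAL = "partial"
    AMBIGUOUS = "ambiguous"


def leaves(node):
    """Yield every leaf value of a nested dict/list structure."""
    if isinstance(node, dict):
        for child in node.values():
            yield from leaves(child)
    elif isinstance(node, list):
        for child in node:
            yield from leaves(child)
    else:
        yield node


def is_unresolved(value):
    # Assumption: AUTO tags and {{ }} templates are only known at runtime.
    return isinstance(value, str) and ("<AUTO>" in value or "{{" in value)


def classify(schema):
    """Classify a schema by the share of leaves that are unresolved."""
    flags = [is_unresolved(v) for v in leaves(schema)]
    if not any(flags):
        return State.FIXED
    if all(flags):
        return State.AMBIGUOUS
    return State.PARTIAL
```

Under these assumptions a literal schema is ``FIXED``, a schema given entirely as an AUTO tag is ``AMBIGUOUS``, and a schema that mixes both is ``PARTIAL``.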

Format Validators
-----------------

.. autoclass:: orchestrator.tools.validation.FormatValidator
   :members:
   :undoc-members:
   :show-inheritance:

Built-in format validators:

* **model-id** - AI model identifiers (e.g., ``openai/gpt-4``)
* **tool-name** - Tool names (e.g., ``web-search``)
* **file-path** - Valid file system paths
* **yaml-path** - JSONPath expressions
* **pipeline-ref** - Pipeline identifiers
* **task-ref** - Task output references (e.g., ``task1.output``)
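
As a rough illustration of what pattern-based format checks for three of these formats might look like, consider the sketch below. The regular expressions are guesses derived only from the examples in the list above; the built-in validators may accept a different syntax. ``PATTERNS`` and ``matches_format`` are hypothetical names.

```python
import re

# Hypothetical patterns; the real validators may accept a different syntax.
PATTERNS = {
    "model-id": re.compile(r"[a-z0-9_-]+/[A-Za-z0-9_.:-]+"),   # provider/model
    "tool-name": re.compile(r"[a-z][a-z0-9]*(-[a-z0-9]+)*"),   # kebab-case
    "task-ref": re.compile(r"[A-Za-z_]\w*\.[A-Za-z_]\w*"),     # task.field
}


def matches_format(name, value):
    """Check a string against one of the sketch patterns above."""
    return isinstance(value, str) and PATTERNS[name].fullmatch(value) is not None
```

With these patterns, ``openai/gpt-4``, ``web-search``, and ``task1.output`` are accepted for their respective formats, while a model identifier without a provider prefix is rejected.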

**Registering custom formats:**

.. code-block:: python

   from orchestrator.tools.validation import ValidationTool

   tool = ValidationTool()

   # Pattern-based validator: the value must match the regular expression
   tool.register_format(
       "order-id",
       r"^ORD-\d{6}$",
       "Order ID format (ORD-XXXXXX)"
   )

   # Function-based validator: the callable returns True for valid values
   def validate_even(value):
       return isinstance(value, int) and value % 2 == 0

   tool.register_format(
       "even-number",
       validate_even,
       "Even integer validator"
   )
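
The example above passes either a regular expression or a callable as the second argument. A registry that accepts both could be organised as in the following sketch. ``FormatRegistry`` and its methods are hypothetical and are not the ``ValidationTool`` implementation; the sketch also assumes that a pattern-based format never matches non-string values.

```python
import re


class FormatRegistry:
    """Hypothetical registry; not the ValidationTool implementation."""

    def __init__(self):
        self._checks = {}

    def register(self, name, validator, description=""):
        """Store a callable as-is, or wrap a regex string in a callable."""
        if callable(validator):
            check = validator
        else:
            pattern = re.compile(validator)

            def check(value):
                # Non-strings never match a pattern-based format.
                return isinstance(value, str) and pattern.search(value) is not None

        self._checks[name] = (check, description)

    def check(self, name, value):
        """Return True if value satisfies the named format."""
        check, _description = self._checks[name]
        return bool(check(value))


registry = FormatRegistry()
registry.register("order-id", r"^ORD-\d{6}$", "Order ID format (ORD-XXXXXX)")
registry.register("even-number", lambda v: isinstance(v, int) and v % 2 == 0)
```

Normalising both kinds of validator to a callable at registration time means the lookup path does not need to know which kind was registered.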

Schema Validator
----------------

.. autoclass:: orchestrator.tools.validation.SchemaValidator
   :members:
   :undoc-members:
   :show-inheritance:

Core schema validation engine using JSON Schema Draft 7.
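
To give a feel for what such an engine checks, here is a deliberately small sketch that handles only a handful of Draft 7 keywords (``type``, ``required``, ``properties``, ``items`` as a single schema, ``minLength``, ``minimum``). It is not the ``SchemaValidator`` implementation and omits most of the specification; ``TYPES`` and ``check`` are hypothetical names, and unlike Draft 7 it does not treat ``1.0`` as an integer.

```python
# JSON Schema type names mapped to Python types.
TYPES = {
    "object": dict, "array": list, "string": str,
    "number": (int, float), "integer": int, "boolean": bool,
}


def check(data, schema, path="$"):
    """Return a list of error strings for a small subset of Draft 7 keywords."""
    errors = []
    expected = schema.get("type")
    if expected in TYPES:
        ok = isinstance(data, TYPES[expected])
        # bool is a subclass of int in Python, but JSON booleans are not numbers.
        if expected in ("number", "integer") and isinstance(data, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    if isinstance(data, dict):
        for key in schema.get("required", []):
            if key not in data:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in data:
                errors += check(data[key], subschema, f"{path}.{key}")
    if isinstance(data, list) and isinstance(schema.get("items"), dict):
        for index, item in enumerate(data):
            errors += check(item, schema["items"], f"{path}[{index}]")
    if isinstance(data, str) and len(data) < schema.get("minLength", 0):
        errors.append(f"{path}: shorter than minLength")
    if (isinstance(data, (int, float)) and not isinstance(data, bool)
            and "minimum" in schema and data < schema["minimum"]):
        errors.append(f"{path}: below minimum")
    return errors
```

Collecting every error with its path, rather than stopping at the first one, is what allows a report-only mode to list all problems at once.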

Validation Result
-----------------

.. autoclass:: orchestrator.tools.validation.ValidationResult
   :members:
   :undoc-members:
   :show-inheritance:

Result object containing validation outcome and details.

Working with AUTO Tags
----------------------

The ValidationTool supports AUTO tags for dynamic schema and mode selection:

.. code-block:: yaml

   steps:
     - id: smart_validation
       tool: validation
       action: validate
       parameters:
         data: "{{ input_data }}"
         schema: <AUTO>Infer appropriate schema based on the data structure</AUTO>
         mode: <AUTO>Choose validation mode based on data quality and criticality</AUTO>
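
Before an AUTO tag can be resolved, the instruction it wraps has to be separated from ordinary parameter values. The sketch below shows one way to do that with a regular expression. ``AUTO_TAG`` and ``auto_prompt`` are hypothetical names, and the sketch assumes a tag spans the entire parameter value; how the orchestrator actually parses and resolves tags is not shown here.

```python
import re

# Assumption: an AUTO tag wraps a free-text instruction and spans the whole value.
AUTO_TAG = re.compile(r"<AUTO>(.*?)</AUTO>", re.DOTALL)


def auto_prompt(value):
    """Return the instruction inside an AUTO tag, or None if value has no tag."""
    if not isinstance(value, str):
        return None
    match = AUTO_TAG.fullmatch(value.strip())
    return match.group(1).strip() if match else None
```

A literal value such as ``strict`` or an inline schema dictionary yields ``None`` and can be used as-is, while a tagged value yields the instruction text to resolve.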

Schema Inference Example
------------------------

.. code-block:: python

   import asyncio
   import json

   from orchestrator.tools.validation import ValidationTool


   async def main():
       tool = ValidationTool()

       # Sample data ("alice@example.com" is a placeholder address)
       sample_data = {
           "users": [
               {
                   "name": "Alice",
                   "email": "alice@example.com",
                   "age": 30,
                   "active": True
               }
           ],
           "created": "2024-01-15"
       }

       # Infer schema
       result = await tool.execute(
           action="infer_schema",
           data=sample_data
       )

       print("Inferred schema:")
       print(json.dumps(result["schema"], indent=2))


   # tool.execute is a coroutine, so it has to run inside an event loop
   asyncio.run(main())

The inferred schema records the type of each field and any detected formats (for example, the ``email`` format for the ``email`` field).
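
The general idea behind inference from a sample can be sketched as a recursive walk over the data. The code below is an assumption-laden illustration, not the tool's algorithm: ``infer``, ``EMAIL``, and ``DATE`` are hypothetical names, the patterns are intentionally loose, date detection is an assumption of this sketch, and arrays are described by their first element only.

```python
import re

# Deliberately loose patterns, used only to guess a "format" for strings.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def infer(value):
    """Infer a JSON Schema fragment from one sample value."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, str):
        schema = {"type": "string"}
        if EMAIL.fullmatch(value):
            schema["format"] = "email"
        elif DATE.fullmatch(value):
            schema["format"] = "date"
        return schema
    if isinstance(value, list):
        # Simplification: describe all items by the first element only.
        return {"type": "array", "items": infer(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer(item) for key, item in value.items()},
        }
    return {}
```

A single sample cannot tell which properties are optional, so this sketch emits no ``required`` list; a fuller approach would compare several samples.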

Integration with Pipelines
--------------------------

Validation steps can act as quality gates in a pipeline, so that later steps run only on data that passed validation:

.. code-block:: yaml

   steps:
     - id: fetch_data
       tool: web-search
       parameters:
         query: "{{ search_term }}"

     - id: validate_results
       tool: validation
       action: validate
       parameters:
         data: "{{ fetch_data.results }}"
         schema:
           type: array
           items:
             type: object
             properties:
               title: {type: string}
               url: {type: string, format: uri}
             required: ["title", "url"]
         mode: strict

     - id: process_valid_data
       tool: data-processing
       parameters:
         data: "{{ validate_results.data }}"
       dependencies: [validate_results]
       condition: "{{ validate_results.valid == true }}"

See Also
--------

* :doc:`data_tools` - For data processing and transformation
* :doc:`base` - For creating custom tools
* `JSON Schema Documentation <https://json-schema.org/>`_ - For schema syntax reference
