Commit 0daa522

Make Area and Capability description fields required instead of optional.
Add file references to all dataclass sections in PIPELINE_SCHEMAS.md documentation.
1 parent ec5e715 commit 0daa522

6 files changed (+153, -170 lines)

src/schemas/PIPELINE_SCHEMAS.md

Lines changed: 28 additions & 12 deletions
@@ -209,6 +209,8 @@ All dataclasses used across pipeline stages are defined below. Stage implementat

 ### PipelineMetadata

+**File:** [`metadata_schemas.py`](metadata_schemas.py)
+
 All pipeline outputs include a `metadata` object (represented by the `PipelineMetadata` dataclass) that provides pipeline execution context and traceability.

 **Required Fields:**
@@ -228,6 +230,8 @@ All pipeline outputs include a `metadata` object (represented by the `PipelineMe

 ### Experiment

+**File:** [`experiment_schemas.py`](experiment_schemas.py)
+
 **Fields:**
 - `experiment_id`: String (required, experiment identifier)
 - `domain`: String (required, human-readable domain name)
@@ -237,18 +241,22 @@ All pipeline outputs include a `metadata` object (represented by the `PipelineMe

 ### Domain

+**File:** [`domain_schemas.py`](domain_schemas.py)
+
 **Fields:**
 - `name`: String (required, human-readable domain name)
 - `domain_id`: String (required)
 - `description`: String (optional, domain description)

 ### Area

+**File:** [`area_schemas.py`](area_schemas.py)
+
 **Fields:**
 - `name`: String (required, human-readable area name)
 - `area_id`: String (required)
-- `description`: String (optional, area description)
-- `domain`: Optional[Domain] (optional, Domain dataclass object)
+- `domain`: Domain (required, Domain dataclass object)
+- `description`: String (required, area description)
 - `generation_metadata`: Dict (optional, nested dictionary containing process-specific information)
   - This field can contain any generation-specific data (e.g., generation method, parameters, intermediate steps)
   - Structure is flexible and depends on the generation method
@@ -257,11 +265,13 @@ All pipeline outputs include a `metadata` object (represented by the `PipelineMe

 ### Capability

+**File:** [`capability_schemas.py`](capability_schemas.py)
+
 **Fields:**
 - `name`: String (required, capability name)
 - `capability_id`: String (required)
-- `description`: String (optional, capability description)
-- `area`: Optional[Area] (optional, Area dataclass object)
+- `area`: Area (required, Area dataclass object)
+- `description`: String (required, capability description)
 - `generation_metadata`: Dict (optional, nested dictionary containing process-specific information)
   - This field can contain any generation-specific data (e.g., generation method, parameters, intermediate steps)
   - Structure is flexible and depends on the generation method
@@ -270,40 +280,46 @@ All pipeline outputs include a `metadata` object (represented by the `PipelineMe

 ### Task

+**File:** [`task_schemas.py`](task_schemas.py)
+
 **Fields:**
 - `task_id`: String (required, unique within capability)
 - `task`: String (required, the task/problem text)
-- `capability`: Optional[Capability] (optional, Capability dataclass object)
+- `capability`: Capability (required, Capability dataclass object)

 **Note:** When serialized to JSON, the `capability` object is flattened to `capability` (string), `capability_id` (string), `area` (string), `area_id` (string), `domain` (string), and `domain_id` (string) fields.

 ### TaskSolution

+**File:** [`solution_schemas.py`](solution_schemas.py)
+
 **Fields:**
 - `task_id`: String (required)
 - `task`: String (required, the task/problem text from Stage 3)
 - `solution`: String (required, the final solution)
 - `reasoning`: String (required, explanation of the solution)
+- `task_obj`: Task (required, Task dataclass object with full hierarchy)
 - `numerical_answer`: String (optional, JSON string with numerical results)
 - `generation_metadata`: Dict (optional, nested dictionary containing process-specific information)
   - This field can contain any generation-specific data (e.g., debate rounds, agent interactions, pipeline type)
   - Structure is flexible and depends on the generation method (agentic, single-agent, etc.)
-- `task_obj`: Optional[Task] (optional, Task dataclass object with full hierarchy)

 **Note:** When serialized to JSON, the `task_obj` object is flattened to `capability` (string), `capability_id` (string), `area` (string), `area_id` (string), `domain` (string), and `domain_id` (string) fields.

 ### ValidationResult

+**File:** [`validation_schemas.py`](validation_schemas.py)
+
 **Fields:**
 - `task_id`: String (required)
 - `task`: String (required, the task/problem text from Stage 3)
 - `verification`: Boolean (required, overall validation status - whether the solution is verified/valid)
 - `feedback`: String (required, detailed feedback on the validation)
+- `task_obj`: Task (required, Task dataclass object with full hierarchy)
 - `score`: Float (optional, validation score, typically 0.0 to 1.0)
 - `generation_metadata`: Dict (optional, nested dictionary containing process-specific information)
   - This field can contain any validation-specific data (e.g., validation method, criteria details, error details)
   - Structure is flexible and depends on the validation method
-- `task_obj`: Optional[Task] (optional, Task dataclass object with full hierarchy)

 **Note:** When serialized to JSON, the `task_obj` object is flattened to `capability` (string), `capability_id` (string), `area` (string), `area_id` (string), `domain` (string), and `domain_id` (string) fields.

@@ -333,7 +349,7 @@ This stage creates two files:
 #### Output 1: `experiment.json`

 **Stage Output:** Experiment dataclass + PipelineMetadata
-**Save Function:** `save_experiment(experiment: Experiment, metadata: PipelineMetadata, output_path: Path)`
+**Save Function:** `save_experiment(experiment: Experiment, metadata: PipelineMetadata, output_path: Path)` (see [`io_utils.py`](io_utils.py))

 **File Path:** `<output_dir>/<experiment_id>/experiment.json`

@@ -364,7 +380,7 @@ This stage creates two files:
 #### Output 2: `domain.json`

 **Stage Output:** Domain dataclass object + PipelineMetadata
-**Save Function:** `save_domain(domain: Domain, metadata: PipelineMetadata, output_path: Path)`
+**Save Function:** `save_domain(domain: Domain, metadata: PipelineMetadata, output_path: Path)` (see [`io_utils.py`](io_utils.py))

 **File Path:** `<output_dir>/<experiment_id>/domain/domain.json`

@@ -402,7 +418,7 @@ This stage creates two files:
 ### Output: `areas.json`

 **Stage Output:** List[Area] dataclasses + PipelineMetadata
-**Save Function:** `save_areas(areas: List[Area], metadata: PipelineMetadata, output_path: Path)`
+**Save Function:** `save_areas(areas: List[Area], metadata: PipelineMetadata, output_path: Path)` (see [`io_utils.py`](io_utils.py))

 **File Path:** `<output_dir>/<experiment_id>/areas/<tag>/areas.json`
 ```json
@@ -447,7 +463,7 @@ This stage creates two files:
 ### Output: `capabilities.json` (one per area)

 **Stage Output:** List[Capability] dataclasses + PipelineMetadata
-**Save Function:** `save_capabilities(capabilities: List[Capability], metadata: PipelineMetadata, output_path: Path)`
+**Save Function:** `save_capabilities(capabilities: List[Capability], metadata: PipelineMetadata, output_path: Path)` (see [`io_utils.py`](io_utils.py))

 **File Path:** `<output_dir>/<experiment_id>/capabilities/<cap_tag>/<area_id>/capabilities.json`

@@ -495,7 +511,7 @@ This stage creates two files:
 ### Output: `tasks.json` (one per capability)

 **Stage Output:** List[Task] dataclasses + PipelineMetadata
-**Save Function:** `save_tasks(tasks: List[Task], metadata: PipelineMetadata, output_path: Path)`
+**Save Function:** `save_tasks(tasks: List[Task], metadata: PipelineMetadata, output_path: Path)` (see [`io_utils.py`](io_utils.py))

 **File Path:** `<output_dir>/<experiment_id>/tasks/<task_tag>/<area_id>/<capability_id>/tasks.json`
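
The serialization Notes in this file describe how the nested `Task` hierarchy collapses into flat string fields in JSON. A minimal sketch of that flattening follows, assuming simplified stand-ins for the dataclasses (the real definitions live in the `*_schemas.py` files and carry more fields). The helper `flatten_task` and the ID values are hypothetical and are not part of the repository.

```python
from dataclasses import dataclass
from typing import Optional


# Simplified stand-ins for the documented dataclasses (assumption: only the
# fields needed to show the flattening are included here).
@dataclass
class Domain:
    name: str
    domain_id: str
    description: Optional[str] = None


@dataclass
class Area:
    name: str
    area_id: str
    domain: Domain
    description: str


@dataclass
class Capability:
    name: str
    capability_id: str
    area: Area
    description: str


@dataclass
class Task:
    task_id: str
    task: str
    capability: Capability


def flatten_task(task: Task) -> dict:
    """Hypothetical helper: emit the six flat hierarchy fields named in the Note."""
    cap = task.capability
    return {
        "task_id": task.task_id,
        "task": task.task,
        "capability": cap.name,
        "capability_id": cap.capability_id,
        "area": cap.area.name,
        "area_id": cap.area.area_id,
        "domain": cap.area.domain.name,
        "domain_id": cap.area.domain.domain_id,
    }


# Example values are invented for illustration.
domain = Domain(name="Personal Finance", domain_id="domain_000")
area = Area(name="Budgeting", area_id="area_000", domain=domain,
            description="Planning income and expenses")
capability = Capability(name="Budget Creation", capability_id="cap_000",
                        area=area, description="Drafting a monthly budget")
task = Task(task_id="task_000", task="Draft a budget for one month.",
            capability=capability)
flat = flatten_task(task)
```

Because every level of the hierarchy is now required, a helper like this needs no `None` checks, which is the simplification the Python changes in this commit make.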

src/schemas/area_schemas.py

Lines changed: 12 additions & 16 deletions
@@ -1,7 +1,7 @@
 """Schemas for area generation stage (Stage 1).

 Defines Area dataclass for domain area. Areas are high-level categories
-within a domain (e.g., "Budgeting" within "Personal Finance").
+within a domain.
 """

 from dataclasses import dataclass, field
@@ -16,39 +16,35 @@ class Area:

     name: str
     area_id: str
-    description: Optional[str] = None
-    domain: Optional[Domain] = None
+    domain: Domain
+    description: str
     generation_metadata: Optional[Dict] = field(default_factory=dict)

     def to_dict(self):
         """Convert to dictionary."""
         result = {
             "name": self.name,
             "area_id": self.area_id,
+            "domain": self.domain.name,
+            "domain_id": self.domain.domain_id,
+            "description": self.description,
         }
-        if self.domain is not None:
-            result["domain"] = self.domain.name
-            result["domain_id"] = self.domain.domain_id
-        if self.description is not None:
-            result["description"] = self.description
         if self.generation_metadata:
             result["generation_metadata"] = self.generation_metadata
         return result

     @classmethod
     def from_dict(cls, data: dict):
         """Create from dictionary."""
-        domain = None
-        if "domain" in data and "domain_id" in data:
-            domain = Domain(
-                name=data["domain"],
-                domain_id=data["domain_id"],
-                description=None,
-            )
+        domain = Domain(
+            name=data["domain"],
+            domain_id=data["domain_id"],
+            description=data.get("domain_description"),
+        )
         return cls(
             name=data["name"],
             area_id=data["area_id"],
-            description=data.get("description"),
             domain=domain,
+            description=data["description"],
             generation_metadata=data.get("generation_metadata", {}),
         )
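
One detail in this file may be worth a second look: `from_dict` reads `domain_description`, but `to_dict` as shown in the diff never writes that key, so a domain's description does not appear to survive a round trip through `Area`. The sketch below reproduces this with `Area` copied from the new side of the diff. `Domain` is a stand-in assumed from the fields documented in `PIPELINE_SCHEMAS.md`, since `domain_schemas.py` is not part of this diff, and all example values are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Domain:
    # Stand-in (assumption): mirrors the documented fields only.
    name: str
    domain_id: str
    description: Optional[str] = None


@dataclass
class Area:
    # Copied from the "+" side of the diff above.
    name: str
    area_id: str
    domain: Domain
    description: str
    generation_metadata: Optional[Dict] = field(default_factory=dict)

    def to_dict(self):
        result = {
            "name": self.name,
            "area_id": self.area_id,
            "domain": self.domain.name,
            "domain_id": self.domain.domain_id,
            "description": self.description,
        }
        if self.generation_metadata:
            result["generation_metadata"] = self.generation_metadata
        return result

    @classmethod
    def from_dict(cls, data: dict):
        domain = Domain(
            name=data["domain"],
            domain_id=data["domain_id"],
            # to_dict() above never emits this key, so this is always None
            # for data produced by to_dict().
            description=data.get("domain_description"),
        )
        return cls(
            name=data["name"],
            area_id=data["area_id"],
            domain=domain,
            description=data["description"],
            generation_metadata=data.get("generation_metadata", {}),
        )


original = Area(
    name="Budgeting",
    area_id="area_000",
    domain=Domain("Personal Finance", "domain_000", "Managing personal money"),
    description="Planning income and expenses",
)
restored = Area.from_dict(original.to_dict())
```

In this sketch the area's own fields round-trip intact while `restored.domain.description` comes back as `None`. If that loss is unintended, writing a `domain_description` key in `to_dict` would be one way to close the gap; whether some other writer (for example in `io_utils.py`) already supplies it cannot be determined from this diff.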

src/schemas/capability_schemas.py

Lines changed: 21 additions & 27 deletions
@@ -1,7 +1,7 @@
 """Schemas for capability generation stage (Stage 2).

 Defines Capability dataclass for capability within an area. Capabilities
-are specific skills or abilities (e.g., "Budget Creation" within "Budgeting" area).
+are specific skills or abilities.
 """

 from dataclasses import dataclass, field
@@ -17,50 +17,44 @@ class Capability:

     name: str
     capability_id: str
-    description: Optional[str] = None
-    area: Optional[Area] = None
+    area: Area
+    description: str
     generation_metadata: Optional[Dict] = field(default_factory=dict)

     def to_dict(self):
         """Convert to dictionary."""
         result = {
             "name": self.name,
             "capability_id": self.capability_id,
+            "area": self.area.name,
+            "area_id": self.area.area_id,
+            "area_description": self.area.description,
+            "domain": self.area.domain.name,
+            "domain_id": self.area.domain.domain_id,
+            "description": self.description,
         }
-        if self.area is not None:
-            result["area"] = self.area.name
-            result["area_id"] = self.area.area_id
-            if self.area.domain is not None:
-                result["domain"] = self.area.domain.name
-                result["domain_id"] = self.area.domain.domain_id
-        if self.description is not None:
-            result["description"] = self.description
         if self.generation_metadata:
             result["generation_metadata"] = self.generation_metadata
         return result

     @classmethod
     def from_dict(cls, data: dict):
         """Create from dictionary."""
-        area = None
-        if "area" in data and "area_id" in data:
-            domain = None
-            if "domain" in data and "domain_id" in data:
-                domain = Domain(
-                    name=data["domain"],
-                    domain_id=data["domain_id"],
-                    description=None,
-                )
-            area = Area(
-                name=data["area"],
-                area_id=data["area_id"],
-                description=None,
-                domain=domain,
-            )
+        domain = Domain(
+            name=data["domain"],
+            domain_id=data["domain_id"],
+            description=data.get("domain_description"),
+        )
+        area = Area(
+            name=data["area"],
+            area_id=data["area_id"],
+            domain=domain,
+            description=data["area_description"],
+        )
         return cls(
             name=data["name"],
             capability_id=data["capability_id"],
-            description=data.get("description"),
             area=area,
+            description=data["description"],
             generation_metadata=data.get("generation_metadata", {}),
         )
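
After this change `Capability.from_dict` indexes eight keys directly, so a `capabilities.json` record written by the pre-commit `to_dict` (which never emitted `area_description`) would raise `KeyError` on load. The sketch below shows one way to report the missing keys up front. The helper name `missing_capability_keys` and the sample record are hypothetical; the key list is taken from the `data[...]` lookups on the new side of the diff.

```python
from typing import Dict, List

# Keys that the new Capability.from_dict reads with data[...] (no default).
REQUIRED_CAPABILITY_KEYS = (
    "name",
    "capability_id",
    "area",
    "area_id",
    "area_description",
    "domain",
    "domain_id",
    "description",
)


def missing_capability_keys(data: Dict) -> List[str]:
    """Hypothetical helper: list required keys absent from a serialized record."""
    return [key for key in REQUIRED_CAPABILITY_KEYS if key not in data]


# A record shaped like the old to_dict() output (values invented).
old_style_record = {
    "name": "Budget Creation",
    "capability_id": "cap_000",
    "area": "Budgeting",
    "area_id": "area_000",
    "domain": "Personal Finance",
    "domain_id": "domain_000",
    "description": "Drafting a monthly budget",
}

missing = missing_capability_keys(old_style_record)
if missing:
    print(f"Record cannot be loaded by the new from_dict; missing: {missing}")
```

A check like this could run before `from_dict` to turn a bare `KeyError` into a message that names every absent field at once. Whether previously generated files need migrating depends on how the pipeline is used, which this diff does not show.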

src/schemas/solution_schemas.py

Lines changed: 32 additions & 39 deletions
@@ -21,9 +21,9 @@ class TaskSolution:
     task: str
     solution: str
     reasoning: str
+    task_obj: Task
     numerical_answer: Optional[str] = None
     generation_metadata: Optional[Dict] = field(default_factory=dict)
-    task_obj: Optional[Task] = None  # Full task object with hierarchy

     def to_dict(self):
         """Convert to dictionary."""
@@ -32,16 +32,15 @@ def to_dict(self):
             "task": self.task,
             "solution": self.solution,
             "reasoning": self.reasoning,
+            "capability_id": self.task_obj.capability.capability_id,
+            "capability": self.task_obj.capability.name,
+            "capability_description": self.task_obj.capability.description,
+            "area": self.task_obj.capability.area.name,
+            "area_id": self.task_obj.capability.area.area_id,
+            "area_description": self.task_obj.capability.area.description,
+            "domain": self.task_obj.capability.area.domain.name,
+            "domain_id": self.task_obj.capability.area.domain.domain_id,
         }
-        if self.task_obj is not None and self.task_obj.capability is not None:
-            result["capability_id"] = self.task_obj.capability.capability_id
-            result["capability"] = self.task_obj.capability.name
-            if self.task_obj.capability.area is not None:
-                result["area"] = self.task_obj.capability.area.name
-                result["area_id"] = self.task_obj.capability.area.area_id
-                if self.task_obj.capability.area.domain is not None:
-                    result["domain"] = self.task_obj.capability.area.domain.name
-                    result["domain_id"] = self.task_obj.capability.area.domain.domain_id
         if self.numerical_answer is not None:
             result["numerical_answer"] = self.numerical_answer
         if self.generation_metadata:
@@ -51,40 +50,34 @@ def to_dict(self):
     @classmethod
     def from_dict(cls, data: dict):
         """Create from dictionary."""
-        task_obj = None
-        if "capability" in data and "capability_id" in data:
-            area = None
-            if "area" in data and "area_id" in data:
-                domain = None
-                if "domain" in data and "domain_id" in data:
-                    domain = Domain(
-                        name=data["domain"],
-                        domain_id=data["domain_id"],
-                        description=None,
-                    )
-                area = Area(
-                    name=data["area"],
-                    area_id=data["area_id"],
-                    description=None,
-                    domain=domain,
-                )
-            capability = Capability(
-                name=data["capability"],
-                capability_id=data["capability_id"],
-                description=None,
-                area=area,
-            )
-            task_obj = Task(
-                task_id=data["task_id"],
-                task=data["task"],
-                capability=capability,
-            )
+        domain = Domain(
+            name=data["domain"],
+            domain_id=data["domain_id"],
+            description=data.get("domain_description"),
+        )
+        area = Area(
+            name=data["area"],
+            area_id=data["area_id"],
+            domain=domain,
+            description=data["area_description"],
+        )
+        capability = Capability(
+            name=data["capability"],
+            capability_id=data["capability_id"],
+            area=area,
+            description=data["capability_description"],
+        )
+        task_obj = Task(
+            task_id=data["task_id"],
+            task=data["task"],
+            capability=capability,
+        )
         return cls(
             task_id=data["task_id"],
             task=data["task"],
             solution=data["solution"],
             reasoning=data["reasoning"],
+            task_obj=task_obj,
             numerical_answer=data.get("numerical_answer"),
             generation_metadata=data.get("generation_metadata", {}),
-            task_obj=task_obj,
         )
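
The move of `task_obj` above `numerical_answer` in this file is not cosmetic: a dataclass field without a default cannot follow one that has a default, so making `task_obj` required forces the reordering. A minimal sketch of that rule follows, using cut-down hypothetical classes (`BadSolution`, `GoodSolution`) rather than the real `TaskSolution`, with `object` standing in for `Task`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


def define_with_required_field_last() -> Optional[str]:
    """Try the old field order with task_obj made required; return the error text."""
    try:
        @dataclass
        class BadSolution:  # hypothetical, for illustration only
            task_id: str
            numerical_answer: Optional[str] = None
            task_obj: object  # required field placed after a defaulted one
    except TypeError as exc:
        # The dataclass decorator rejects this ordering at class-creation time.
        return str(exc)
    return None


@dataclass
class GoodSolution:  # hypothetical; mirrors the new ordering in the diff
    task_id: str
    task_obj: object
    numerical_answer: Optional[str] = None
    generation_metadata: Optional[Dict] = field(default_factory=dict)


error_message = define_with_required_field_last()
solution = GoodSolution(task_id="task_000", task_obj=object())
```

The same constraint explains the `domain`/`description` and `area`/`description` placement ahead of `generation_metadata` in the other schema files. Callers that passed `task_obj` positionally would be affected by the new order, while keyword arguments, as used in `from_dict` here, are not.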
