145 changes: 145 additions & 0 deletions docs/features/self_refinement.md
@@ -0,0 +1,145 @@
# Self Refinement

> **Iterative candidate improvement with judge scores and a reflection trajectory (as a reusable subgraph)**

## Overview

Self Refinement is a workflow pattern where SyGra:

- Generates a candidate output for a given input prompt
- Uses a separate *judge* step to score and critique the candidate
- Optionally refines the candidate using the critique
- Repeats until the candidate is accepted (meets a score threshold) or a maximum number of iterations is reached

This feature is implemented as a reusable **subgraph recipe**: `sygra.recipes.self_refinement`.

---

## What you get

When used in a task, the self-refinement subgraph produces:

- `candidate`: the final accepted (or last) candidate
- `self_refine_judge`: judge output containing score, pass/fail, critique, and raw parsed output
- `reflection_trajectory`: an ordered list capturing each iteration's candidate and judge result (useful for training or analysis)

### Example output shape

```json
{
  "candidate": "...final candidate...",
  "self_refine_judge": {
    "score": 4,
    "is_good": true,
    "critique": "...",
    "raw": {"score": 4, "is_good": true, "critique": "..."}
  },
  "reflection_trajectory": [
    {
      "iteration": 0,
      "candidate_key": "candidate",
      "candidate": "...candidate at iteration 0...",
      "judge": {"score": 2, "is_good": false, "critique": "...", "raw": {"...": "..."}}
    },
    {
      "iteration": 1,
      "candidate_key": "candidate",
      "candidate": "...candidate at iteration 1...",
      "judge": {"score": 4, "is_good": true, "critique": "...", "raw": {"...": "..."}}
    }
  ]
}
```

---

## How it works (recipe)

The recipe lives at:

- `sygra/recipes/self_refinement/graph_config.yaml`
- `sygra/recipes/self_refinement/task_executor.py`

### Nodes

- `init_self_refine` (lambda)
  - Initializes internal loop state and `reflection_trajectory`
  - Uses internal state keys prefixed with `_self_refine_*` to minimize collision risk in host graphs
- `generate_candidate` (llm)
  - Creates the initial candidate from `{prompt}`
- `judge_candidate` (llm + post-process)
  - Returns JSON with `{score, is_good, critique}` (see the excerpt after this list)
  - Post-processor normalizes and writes:
    - `self_refine_judge`
    - `self_refine_is_good`
    - `self_refine_critique`
- `update_trajectory` (lambda)
  - Appends `{candidate, judge, iteration}` to `reflection_trajectory`
- `refine_candidate` (llm)
  - Produces an improved candidate using `{self_refine_critique}`
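
For reference, here is how `judge_candidate` wires the post-processor and its output keys in the recipe's `graph_config.yaml` (excerpt; prompt, model, and schema omitted):

```yaml
judge_candidate:
  node_type: llm
  post_process: sygra.recipes.self_refinement.task_executor.SelfRefinementJudgePostProcessor
  output_keys:
    - self_refine_judge
    - self_refine_is_good
    - self_refine_critique
```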

### Looping behavior

A conditional edge (`SelfRefinementLoopCondition`), excerpted after this list, ends the subgraph when either:

- The judge accepts (`self_refine_is_good == True`), or
- The maximum number of refinement iterations is reached
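
In the recipe's `graph_config.yaml`, the loop is expressed as a conditional edge with a `path_map` (excerpt):

```yaml
edges:
  - from: update_trajectory
    condition: sygra.recipes.self_refinement.task_executor.SelfRefinementLoopCondition
    path_map:
      accept: END
      refine: refine_candidate
  - from: refine_candidate
    to: judge_candidate
```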

---

## Using self refinement as a subgraph in a task

A typical pattern is to add a `subgraph` node in your task and point it at the recipe.

The example task is located at:

- `tasks/examples/self_refinement/graph_config.yaml`
- `tasks/examples/self_refinement/input.json`

### Example task config (high level)

In `tasks/examples/self_refinement/graph_config.yaml`, the task defines a single node:

- `self_refine`:
  - `node_type: subgraph`
  - `subgraph: sygra.recipes.self_refinement`
  - Overrides recipe node config via `node_config_map` (e.g., model selection, temperatures, and loop parameters)

It also maps the key outputs into the final dataset using `output_config.output_map` (a sketch of the full task file follows this list):

- `candidate` from `candidate`
- `self_refine_judge` from `self_refine_judge`
- `reflection_trajectory` from `reflection_trajectory`
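
Put together, the task file looks roughly like the sketch below. The edge syntax matches the recipe's own, but the exact `output_map` entry shape (`from:`) is illustrative; defer to the shipped example file:

```yaml
graph_config:
  nodes:
    self_refine:
      node_type: subgraph
      subgraph: sygra.recipes.self_refinement
  edges:
    - from: START
      to: self_refine
    - from: self_refine
      to: END

output_config:
  output_map:
    candidate:
      from: candidate
    self_refine_judge:
      from: self_refine_judge
    reflection_trajectory:
      from: reflection_trajectory
```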

---

## Running the example

From the repo root:

```bash
python main.py --task examples.self_refinement --num_records 2
```

Artifacts:

- Output records are written under `tasks/examples/self_refinement/` (typically `output_<timestamp>.json`)
- Metadata is written under `tasks/examples/self_refinement/metadata/` (typically `metadata_tasks_examples_self_refinement_<timestamp>.json`)

The metadata file includes per-node execution statistics (e.g., `self_refine.generate_candidate`, `self_refine.judge_candidate`) and aggregate token/cost summaries.

---

## Customization

You can tune behavior either in the recipe or (recommended) from the task via `node_config_map` (a sketch follows this list):

- `init_self_refine.params.max_iterations`
- `init_self_refine.params.score_threshold`
- `judge_candidate.model.parameters.temperature` (usually `0` for consistent scoring)
- Prompts for `generate_candidate`, `judge_candidate`, and `refine_candidate`
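
A sketch of those overrides from the host task (how the dotted paths above nest under `node_config_map` is illustrative):

```yaml
self_refine:
  node_type: subgraph
  subgraph: sygra.recipes.self_refinement
  node_config_map:
    init_self_refine:
      params:
        max_iterations: 3
        score_threshold: 4
    judge_candidate:
      model:
        parameters:
          temperature: 0
```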

If you need the recipe to write outputs under different keys (to avoid collisions in a larger host graph), you can override the following, as sketched below:

- `update_trajectory.params.candidate_key`
- `update_trajectory.params.judge_key`
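
For example (the `draft*` key names are hypothetical):

```yaml
node_config_map:
  update_trajectory:
    params:
      candidate_key: draft
      judge_key: draft_judge
```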
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -83,6 +83,7 @@ nav:
- Structured Output: concepts/structured_output/README.md
- Features:
- Metadata Tracking: features/metadata_tracking.md
- Self Refinement: features/self_refinement.md
- Tutorials:
- Agent Simulation: tutorials/agent_simulation_tutorial.md
- Agent Simulation with Tools: tutorials/agent_tool_simulation_tutorial.md
112 changes: 112 additions & 0 deletions sygra/recipes/self_refinement/graph_config.yaml
@@ -0,0 +1,112 @@
graph_config:
  nodes:
    init_self_refine:
      node_type: lambda
      lambda: sygra.recipes.self_refinement.task_executor.InitSelfRefinementState
      # Loop controls: accept once the judge score meets score_threshold,
      # or stop after max_iterations refinement rounds.
      params:
        max_iterations: 2
        score_threshold: 4
      output_keys:
        - _self_refine_iteration
        - _self_refine_max_iterations
        - _self_refine_score_threshold
        - reflection_trajectory

    generate_candidate:
      node_type: llm
      output_keys: candidate
      prompt:
        - system: |
            You are a helpful assistant.
        - user: |
            {prompt}
      model:
        name: gpt-4o-mini
        parameters:
          temperature: 0.7

    judge_candidate:
      node_type: llm
      post_process: sygra.recipes.self_refinement.task_executor.SelfRefinementJudgePostProcessor
      output_keys:
        - self_refine_judge
        - self_refine_is_good
        - self_refine_critique
      prompt:
        - system: |
            You are a strict judge. Evaluate the candidate against the user's request.
            Return JSON only.
        - user: |
            User request:
            {prompt}

            Candidate:
            {candidate}

            Score the candidate from 1-5.
            If score >= 4 it is good.

            OUTPUT JSON:
            {{"score": <int>, "is_good": <true/false>, "critique": "..."}}
      model:
        name: gpt-4o-mini
        parameters:
          temperature: 0
      structured_output:
        schema:
          fields:
            score:
              type: float
              description: "Judge's quality score for the candidate"
            is_good:
              type: bool
              description: "Boolean value indicating whether the candidate is good or bad"
            critique:
              type: str
              description: "Judge's critique of the candidate"

    update_trajectory:
      node_type: lambda
      lambda: sygra.recipes.self_refinement.task_executor.UpdateSelfRefinementTrajectory
      params:
        candidate_key: candidate
        judge_key: self_refine_judge

    refine_candidate:
      node_type: llm
      output_keys: candidate
      prompt:
        - system: |
            You improve the candidate based on critique. Return only the improved candidate.
        - user: |
            User request:
            {prompt}

            Previous candidate:
            {candidate}

            Critique:
            {self_refine_critique}

            Produce an improved candidate.
      model:
        name: gpt-4o-mini
        parameters:
          temperature: 0.7

  edges:
    - from: START
      to: init_self_refine
    - from: init_self_refine
      to: generate_candidate
    - from: generate_candidate
      to: judge_candidate
    - from: judge_candidate
      to: update_trajectory
    # Loop: "accept" ends the subgraph; "refine" routes back through
    # refine_candidate, whose output is then re-judged.
    - from: update_trajectory
      condition: sygra.recipes.self_refinement.task_executor.SelfRefinementLoopCondition
      path_map:
        accept: END
        refine: refine_candidate
    - from: refine_candidate
      to: judge_candidate