Commit 540a309

Changes in agent core + new capabilities
Author: Dariusz Debowczyk
1 parent 3f6a5fb commit 540a309

23 files changed, +1341 -29 lines changed

.beads/.local_version

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.49.6
+0.49.3

.beads/issues.jsonl

Lines changed: 27 additions & 26 deletions
Large diffs are not rendered by default.
Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
+---
+title: 'Agent Execution Retrospective (D-Mail)'
+docname: 'agent_retrospective'
+order: 8
+id: 'f3a1'
+---
+## Overview
+
+Execution retrospective lets an agent "rewind" its conversation to an earlier checkpoint
+when it realizes it has been going in circles or took a wrong path. Inspired by kimi-cli's
+D-Mail mechanism, this capability injects visible `[CHECKPOINT N]` markers before each step.
+When the agent calls `execution_retrospective(checkpoint_id, guidance)`, the message context
+is truncated to before that checkpoint and the guidance is injected as a message from the
+agent's "future self".
+
+Key properties:
+- **Only the message buffer is rewound** — execution history (steps, token usage) is preserved
+- **Side effects are NOT undone** — file changes, API calls remain; guidance should account for them
+- **Checkpoint markers are visible to the LLM** — the agent can reference them by ID
+- **`onRewind` callback** — extension point for user-defined self-improvement (logging, memory, prompt tuning)
+
+This significantly reduces wasted steps by:
+- Cutting dead-end exploration from the context window
+- Providing focused guidance to the agent's "past self"
+- Preserving full execution history for observability
+
+Key concepts:
+- `UseExecutionRetrospective`: Capability that adds checkpoint markers, rewind logic, and system prompt instructions
+- `RetrospectivePolicy`: Configuration (maxRewinds, systemPromptInstructions)
+- `onRewind`: User callback invoked on every rewind with the result and agent state
+- `AgentConsoleLogger`: Shows checkpoint injection, tool calls, and step progression
+
+## Example
+
```php
36+
<?php
37+
require 'examples/boot.php';
38+
39+
use Cognesy\Agents\Builder\AgentBuilder;
40+
use Cognesy\Agents\Capability\Bash\UseBash;
41+
use Cognesy\Agents\Capability\Core\UseContextConfig;
42+
use Cognesy\Agents\Capability\Core\UseGuards;
43+
use Cognesy\Agents\Capability\Core\UseLLMConfig;
44+
use Cognesy\Agents\Capability\Retrospective\ExecutionRetrospectiveResult;
45+
use Cognesy\Agents\Capability\Retrospective\RetrospectivePolicy;
46+
use Cognesy\Agents\Capability\Retrospective\UseExecutionRetrospective;
47+
use Cognesy\Agents\Data\AgentState;
48+
use Cognesy\Agents\Events\Support\AgentConsoleLogger;
49+
use Cognesy\Messages\Messages;
50+
51+
// Track rewinds for observability
52+
$rewindLog = [];
53+
54+
// Create console logger for execution visibility
55+
$logger = new AgentConsoleLogger(
56+
useColors: true,
57+
showTimestamps: true,
58+
showContinuation: true,
59+
showToolArgs: true,
60+
);
61+
62+
// Configure working directory — point at the Instructor codebase root (so `bd` works)
63+
$workDir = dirname(__DIR__, 3);
64+
65+
// Build agent with bash + retrospective capabilities
66+
// Note: The system prompt gives NO instructions about `bd` — the agent must explore it.
67+
// The massive --help output becomes wasted context once the agent knows the right command.
68+
// UseExecutionRetrospective automatically appends retrospective instructions
69+
// to the system prompt via BeforeExecution hook — no manual prompt setup needed.
70+
$agent = AgentBuilder::base()
71+
->withCapability(new UseLLMConfig(model: 'gpt-5.2'))
72+
->withCapability(new UseContextConfig(
73+
systemPrompt: <<<'SYSTEM'
74+
You are a CLI automation agent. You accomplish tasks using bash commands.
75+
Always limit command output — use --limit, | head -20, etc.
76+
77+
WORKFLOW — you always work in two passes:
78+
Pass 1: Explore the tool (--help, trial runs). Once you get the result, do NOT answer.
79+
Instead call execution_retrospective to rewind with the exact command as guidance.
80+
Pass 2: After rewind, guidance from your future self is in the conversation.
81+
Trust it. Run the command from guidance. Answer. Done.
82+
Do NOT explore again. Do NOT call execution_retrospective again.
83+
SYSTEM,
84+
))
85+
->withCapability(new UseBash(baseDir: $workDir))
86+
->withCapability(new UseExecutionRetrospective(
87+
policy: new RetrospectivePolicy(
88+
maxRewinds: 1,
89+
systemPromptInstructions: <<<'PROMPT'
90+
## Execution Retrospective (IMPORTANT)
91+
92+
The conversation contains [CHECKPOINT N] markers before each step. You have the
93+
`execution_retrospective` tool available.
94+
95+
[CHECKPOINT N] markers appear before each step. You have `execution_retrospective`.
96+
97+
After a rewind, guidance from your future self appears as an assistant message.
98+
If you see such guidance: trust it, run the command it specifies, answer. Done.
99+
Do NOT read --help. Do NOT explore. Do NOT call execution_retrospective again.
100+
PROMPT,
101+
),
102+
onRewind: function (ExecutionRetrospectiveResult $result, AgentState $state) use (&$rewindLog) {
103+
$rewindLog[] = [
104+
'checkpoint' => $result->checkpointId,
105+
'guidance' => $result->guidance,
106+
'step' => $state->stepCount(),
107+
];
108+
echo "\n ** REWIND to checkpoint {$result->checkpointId}: {$result->guidance}\n\n";
109+
},
110+
))
111+
->withCapability(new UseGuards(maxSteps: 20, maxTokens: 65536, maxExecutionTime: 180))
112+
->build()
113+
->wiretap($logger->wiretap());
114+
115+
// Task: List issues using the `bd` CLI — with zero prior knowledge.
116+
// The agent has no idea what `bd` is. It must explore via --help and trial/error.
117+
//
118+
// Expected flow:
119+
// Phase 1 (steps 1-3): Agent explores `bd` (--help, list --help, maybe a wrong attempt)
120+
// → Context now polluted with massive help output
121+
// Phase 2 (step 4): Agent successfully runs `bd list`
122+
// Phase 3 (step 5): Agent recognizes exploration waste → calls execution_retrospective
123+
// → Rewinds to checkpoint 1 with guidance: "Run `bd list` to list issues"
124+
// Phase 4 (step 6): With clean context, agent one-shots `bd list` and responds
125+
// ~6 steps total, but context is clean after rewind
126+
$question = <<<'QUESTION'
127+
List the 5 most recent open issues tracked in this project.
128+
I believe the command is `bd issues --open --limit 5`.
129+
QUESTION;
130+
131+
$state = AgentState::empty()->withMessages(
132+
Messages::fromString($question)
133+
);
134+
135+
echo "=== Agent Execution Log ===\n";
136+
echo "Task: List issues using unknown CLI tool (bd)\n\n";
137+
138+
// Execute agent until completion
139+
$finalState = $agent->execute($state);
140+
141+
echo "\n=== Result ===\n";
142+
$answer = $finalState->finalResponse()->toString() ?: 'No answer';
143+
echo "Answer: {$answer}\n";
144+
echo "Steps: {$finalState->stepCount()}\n";
145+
echo "Tokens: {$finalState->usage()->total()}\n";
146+
echo "Status: {$finalState->status()->value}\n";
147+
148+
if ($rewindLog !== []) {
149+
echo "\n=== Rewind Log ===\n";
150+
foreach ($rewindLog as $i => $entry) {
151+
echo "Rewind #{$i}: checkpoint={$entry['checkpoint']}, at step={$entry['step']}\n";
152+
echo " Guidance: {$entry['guidance']}\n";
153+
}
154+
} else {
155+
echo "\nNo rewinds occurred — agent completed on first attempt.\n";
156+
}
157+
158+
// Assertions
159+
assert($finalState->stepCount() >= 1, 'Expected at least 1 step');
160+
assert($finalState->usage()->total() > 0, 'Expected token usage > 0');
161+
?>
162+
```
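The rewind described in the Overview (truncate the message buffer to just before a checkpoint, then append the guidance) can be sketched on a plain array. This is only an illustration under stated assumptions: `rewindToCheckpoint()`, the array message shape, the `system` role for markers, and the guidance wording are all hypothetical and are not taken from the capability's source.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical stand-in for the rewind step; not the library's implementation.
 * Messages are plain ['role' => ..., 'content' => ...] arrays.
 */
function rewindToCheckpoint(array $messages, int $checkpointId, string $guidance): array
{
    $marker = "[CHECKPOINT {$checkpointId}]";

    foreach ($messages as $index => $message) {
        if (str_contains($message['content'], $marker)) {
            // Keep only what came before the marker; the marker and everything after it are dropped.
            $kept = array_slice($messages, 0, $index);
            // Assumption: guidance is appended as an assistant message.
            $kept[] = ['role' => 'assistant', 'content' => "Guidance from your future self: {$guidance}"];
            return $kept;
        }
    }

    // Unknown checkpoint: leave the buffer untouched.
    return $messages;
}

$messages = [
    ['role' => 'user', 'content' => 'List the 5 most recent open issues.'],
    ['role' => 'system', 'content' => '[CHECKPOINT 1]'],
    ['role' => 'assistant', 'content' => 'Running bd --help ...'],
    ['role' => 'system', 'content' => '[CHECKPOINT 2]'],
    ['role' => 'assistant', 'content' => 'Running bd list ...'],
];

$rewound = rewindToCheckpoint($messages, 1, 'Run `bd list` to list issues');

foreach ($rewound as $message) {
    echo "{$message['role']}: {$message['content']}\n";
}
```

Only the buffer shrinks here. Anything the dropped steps did on disk or over the network stays done, which is why the guidance text has to mention it.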

packages/agents/src/AgentLoop.php

Lines changed: 2 additions & 1 deletion
@@ -130,6 +130,7 @@ public function iterate(AgentState $state): iterable {

     protected function onBeforeExecution(AgentState $state): AgentState {
         $state = $this->ensureNextExecution($state);
+        $state = $state->with(executionCount: $state->executionCount() + 1);
         $this->emitExecutionStarted($state, count($this->tools->names()));
         $state = $this->interceptor->intercept(HookContext::beforeExecution($state))->state();
         return $state;
@@ -191,7 +192,7 @@ private function handleStopException(AgentState $state, AgentStopException $stop

     private function ensureNextExecution(AgentState $state): AgentState {
         return match ($state->status()) {
-            ExecutionStatus::Completed, ExecutionStatus::Failed => $state->forNextExecution(),
+            ExecutionStatus::Completed, ExecutionStatus::Stopped, ExecutionStatus::Failed => $state->forNextExecution(),
             default => $state,
         };
     }
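The two changes in this file interact: a state left in `Stopped` is now reset like `Completed` or `Failed`, and the execution counter is bumped once per run. A minimal sketch of that flow follows. It assumes `forNextExecution()` clears per-execution data such as the step count while keeping the counter. `Status`, `State`, and the `Pending`/`InProgress` case names are hypothetical stand-ins, not the library's types.

```php
<?php declare(strict_types=1);

// Hypothetical stand-ins; the real ExecutionStatus and AgentState live in Cognesy\Agents.
enum Status: string
{
    case Pending = 'pending';
    case InProgress = 'in_progress';
    case Completed = 'completed';
    case Stopped = 'stopped';
    case Failed = 'failed';
}

final class State
{
    public function __construct(
        public readonly Status $status = Status::Pending,
        public readonly int $executionCount = 0,
        public readonly int $stepCount = 0,
    ) {}
}

function ensureNextExecution(State $state): State
{
    return match ($state->status) {
        // Any terminal status starts a fresh execution; the counter carries over.
        Status::Completed, Status::Stopped, Status::Failed
            => new State(Status::Pending, $state->executionCount, 0),
        default => $state,
    };
}

function onBeforeExecution(State $state): State
{
    $state = ensureNextExecution($state);
    // Mirrors the added line: increment once per execution, after the reset.
    return new State($state->status, $state->executionCount + 1, $state->stepCount);
}

$first = onBeforeExecution(new State());
$stopped = new State(Status::Stopped, $first->executionCount, 7);
$second = onBeforeExecution($stopped);

echo "first={$first->executionCount} second={$second->executionCount} steps={$second->stepCount}\n";
// prints "first=1 second=2 steps=0"
```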

packages/agents/src/Capability/Core/UseLLMConfig.php

Lines changed: 5 additions & 0 deletions
@@ -13,6 +13,7 @@
 {
     public function __construct(
         private ?string $preset = null,
+        private ?string $model = null,
         private int $maxRetries = 1,
     ) {}

@@ -28,6 +29,10 @@ public function configure(CanConfigureAgent $agent): CanConfigureAgent {
             default => LLMProvider::using($this->preset),
         };

+        if ($this->model !== null) {
+            $llm = $llm->withModel($this->model);
+        }
+
         $retryPolicy = match (true) {
             $this->maxRetries > 1 => new InferenceRetryPolicy(maxAttempts: $this->maxRetries),
             default => null,
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+<?php declare(strict_types=1);
+
+namespace Cognesy\Agents\Capability\ExecutionHistory;
+
+/**
+ * In-memory execution store backed by a plain array.
+ * Useful for testing, short-lived scripts, and single-process agents.
+ */
+final class ArrayExecutionStore implements ExecutionStore
+{
+    /** @var array<string, ExecutionSummary[]> */
+    private array $store = [];
+
+    #[\Override]
+    public function record(string $agentId, ExecutionSummary $summary): void
+    {
+        $this->store[$agentId][] = $summary;
+    }
+
+    #[\Override]
+    public function all(string $agentId): array
+    {
+        return $this->store[$agentId] ?? [];
+    }
+
+    #[\Override]
+    public function last(string $agentId): ?ExecutionSummary
+    {
+        $history = $this->store[$agentId] ?? [];
+        return $history !== [] ? $history[array_key_last($history)] : null;
+    }
+
+    #[\Override]
+    public function count(string $agentId): int
+    {
+        return count($this->store[$agentId] ?? []);
+    }
+}
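To see how the store behaves per agent ID without pulling in the library, the same logic can be exercised against a reduced summary type. `SummaryStub` and `StubStore` are hypothetical stand-ins introduced only for this sketch; the method bodies copy `ArrayExecutionStore` above.

```php
<?php declare(strict_types=1);

// Hypothetical stand-in for ExecutionSummary, reduced to two fields.
final class SummaryStub
{
    public function __construct(
        public readonly int $executionNumber,
        public readonly string $status,
    ) {}
}

// Same shape as ArrayExecutionStore above, typed against the stub.
final class StubStore
{
    /** @var array<string, SummaryStub[]> */
    private array $store = [];

    public function record(string $agentId, SummaryStub $summary): void
    {
        $this->store[$agentId][] = $summary;
    }

    /** @return SummaryStub[] */
    public function all(string $agentId): array
    {
        return $this->store[$agentId] ?? [];
    }

    public function last(string $agentId): ?SummaryStub
    {
        $history = $this->store[$agentId] ?? [];
        return $history !== [] ? $history[array_key_last($history)] : null;
    }

    public function count(string $agentId): int
    {
        return count($this->store[$agentId] ?? []);
    }
}

$store = new StubStore();
$store->record('agent-a', new SummaryStub(1, 'completed'));
$store->record('agent-a', new SummaryStub(2, 'stopped'));

echo $store->count('agent-a'), "\n";         // 2
echo $store->last('agent-a')?->status, "\n"; // stopped
var_dump($store->last('agent-b'));           // NULL: unknown agents have no history
```

Unknown agent IDs never raise an error. `all()` and `count()` fall back to an empty history and `last()` returns `null`, so callers need the null check shown above.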
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+<?php declare(strict_types=1);
+
+namespace Cognesy\Agents\Capability\ExecutionHistory;
+
+use Cognesy\Agents\Hook\Contracts\HookInterface;
+use Cognesy\Agents\Hook\Data\HookContext;
+
+/**
+ * AfterExecution hook that records an ExecutionSummary into the ExecutionStore.
+ */
+final class ExecutionHistoryHook implements HookInterface
+{
+    public function __construct(
+        private readonly ExecutionStore $store,
+    ) {}
+
+    #[\Override]
+    public function handle(HookContext $context): HookContext
+    {
+        $state = $context->state();
+
+        if ($state->execution() === null) {
+            return $context;
+        }
+
+        $summary = ExecutionSummary::fromState($state);
+        $this->store->record($state->agentId(), $summary);
+
+        return $context;
+    }
+}
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+<?php declare(strict_types=1);
+
+namespace Cognesy\Agents\Capability\ExecutionHistory;
+
+/**
+ * Contract for storing and retrieving execution summaries by agent ID.
+ */
+interface ExecutionStore
+{
+    public function record(string $agentId, ExecutionSummary $summary): void;
+
+    /** @return ExecutionSummary[] */
+    public function all(string $agentId): array;
+
+    public function last(string $agentId): ?ExecutionSummary;
+
+    public function count(string $agentId): int;
+}
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+<?php declare(strict_types=1);
+
+namespace Cognesy\Agents\Capability\ExecutionHistory;
+
+use Cognesy\Agents\Data\AgentState;
+use Cognesy\Agents\Enums\ExecutionStatus;
+use Cognesy\Polyglot\Inference\Data\Usage;
+use DateTimeImmutable;
+
+/**
+ * Lightweight summary of a completed execution, suitable for storage and querying.
+ */
+final readonly class ExecutionSummary
+{
+    public function __construct(
+        public string $executionId,
+        public int $executionNumber,
+        public ExecutionStatus $status,
+        public int $stepCount,
+        public Usage $usage,
+        public float $duration,
+        public DateTimeImmutable $startedAt,
+        public ?DateTimeImmutable $completedAt,
+        public ?string $stopReason,
+        public ?string $stopMessage,
+        public int $errorCount,
+    ) {}
+
+    /**
+     * Build a summary from the agent state at execution end.
+     *
+     * NOTE: AfterExecution hooks fire before withExecutionCompleted() sets the
+     * final status, so we derive it from stop signals and error state.
+     */
+    public static function fromState(AgentState $state): self
+    {
+        $execution = $state->execution();
+        $signal = $state->lastStopSignal();
+
+        $status = match (true) {
+            $execution?->isFailed() => ExecutionStatus::Failed,
+            $execution?->hasErrors() => ExecutionStatus::Failed,
+            $signal?->reason->wasForceStopped() => ExecutionStatus::Stopped,
+            default => ExecutionStatus::Completed,
+        };
+
+        return new self(
+            executionId: $execution?->executionId() ?? '',
+            executionNumber: $state->executionCount(),
+            status: $status,
+            stepCount: $state->stepCount(),
+            usage: $state->usage(),
+            duration: $execution?->totalDuration() ?? 0.0,
+            startedAt: $execution?->startedAt() ?? new DateTimeImmutable(),
+            completedAt: $execution?->completedAt() ?? new DateTimeImmutable(),
+            stopReason: $signal?->reason->value,
+            stopMessage: $signal?->message,
+            errorCount: $state->errors()->count(),
+        );
+    }
+
+    public function toArray(): array
+    {
+        return [
+            'executionId' => $this->executionId,
+            'executionNumber' => $this->executionNumber,
+            'status' => $this->status->value,
+            'stepCount' => $this->stepCount,
+            'usage' => $this->usage->toArray(),
+            'duration' => $this->duration,
+            'startedAt' => $this->startedAt->format(DateTimeImmutable::ATOM),
+            'completedAt' => $this->completedAt?->format(DateTimeImmutable::ATOM),
+            'stopReason' => $this->stopReason,
+            'stopMessage' => $this->stopMessage,
+            'errorCount' => $this->errorCount,
+        ];
+    }
+}
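The status derivation in `fromState()` relies on two PHP behaviours: `match (true)` compares each arm with strict identity, and the nullsafe operator yields `null` when its receiver is `null`, so a missing execution or stop signal simply falls through to `default`. A minimal sketch of that precedence follows. `ExecutionStub` and `deriveStatus()` are hypothetical stand-ins, and the statuses are plain strings rather than `ExecutionStatus` cases.

```php
<?php declare(strict_types=1);

// Hypothetical stand-in for the execution object; only the two checks used here.
final class ExecutionStub
{
    public function __construct(
        private readonly bool $failed = false,
        private readonly bool $errors = false,
    ) {}

    public function isFailed(): bool { return $this->failed; }
    public function hasErrors(): bool { return $this->errors; }
}

function deriveStatus(?ExecutionStub $execution, ?bool $forceStopped): string
{
    return match (true) {
        // With $execution === null, ?-> yields null, and null === true is false.
        $execution?->isFailed() => 'failed',
        $execution?->hasErrors() => 'failed',
        // Checked only after both failure arms, so errors win over a forced stop.
        $forceStopped => 'stopped',
        default => 'completed',
    };
}

echo deriveStatus(null, null), "\n";                            // completed
echo deriveStatus(new ExecutionStub(errors: true), true), "\n"; // failed
echo deriveStatus(new ExecutionStub(), true), "\n";             // stopped
```

Arm order is the precedence: an execution that both recorded errors and was force-stopped is summarised as failed.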
