Commit 5044ab0

feat(board): add AI-assisted verification milestone

Create milestone 04-ai-assisted-verification in coherence-verification epic with SRS, DESIGN, and 6 stories for implementing multimodal AI analysis of verification artifacts.

Stories:
- CHORE0146: Create config and templates
- FEAT0197: Implement story parser
- FEAT0198: Implement artifact collector
- FEAT0199: Implement model router (Ollama/Claude)
- FEAT0200: Implement AI report generator
- FEAT0201: Add just verify ai command

Also fix _next-milestone to search epics/*/milestones/ instead of global milestones/ directory.
1 parent 9be0ed5 commit 5044ab0

12 files changed: +561 -3 lines changed
.justfiles/board.just

Lines changed: 10 additions & 1 deletion
@@ -626,11 +626,20 @@ _next-id TYPE:
 
     printf "%04d" $((max + 1))
 
-# Get next milestone number
+# Get next milestone number (searches all epics)
 _next-milestone:
     #!/usr/bin/env bash
     cd "{{board_dir}}"
     max=0
+    # Search milestones within all epics
+    for ms_dir in epics/*/milestones/[0-9]*/; do
+        [[ -d "$ms_dir" ]] || continue
+        num=$(basename "$ms_dir" | sed -n 's/^\([0-9]*\).*/\1/p')
+        if [[ -n "$num" ]] && [[ "$num" -gt "$max" ]]; then
+            max=$num
+        fi
+    done
+    # Also check top-level milestones for backwards compatibility
     for d in milestones/[0-9]*/; do
         [[ -d "$d" ]] || continue
         num=$(basename "$d" | sed -n 's/^\([0-9]*\).*/\1/p')
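
The numbering rule in this hunk can also be written as a pure function, which makes it easy to test in isolation. The following is a minimal TypeScript sketch and is not part of the commit: `nextMilestoneNumber` and its input (a flat list of directory names collected from both search locations) are hypothetical. It parses the leading digits in base 10; by contrast, bash evaluates the operands of `[[ ... -gt ... ]]` arithmetically, so a zero-padded prefix such as `08` may be rejected there as an invalid octal number. How the result is formatted is left out because the hunk does not show that part of the recipe.

```typescript
// Hypothetical helper: given milestone directory names gathered from
// epics/*/milestones/ and the legacy top-level milestones/ directory,
// return the next free milestone number.
function nextMilestoneNumber(dirNames: string[]): number {
  let max = 0;
  for (const name of dirNames) {
    // Mirror the sed expression: keep only the leading run of digits.
    const digits = name.replace(/^(\d*).*$/, "$1");
    if (digits === "") continue;
    // Radix 10 keeps zero-padded prefixes such as "08" decimal.
    const num = Number.parseInt(digits, 10);
    if (num > max) max = num;
  }
  return max + 1;
}

// Example input mixing epic-scoped and legacy names (illustrative only).
const exampleDirs = ["01-verification-artifact-pipeline", "08-example", "54-enhanced-cli-experience"];
console.log(nextMilestoneNumber(exampleDirs));
```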

docs/board/README.md

Lines changed: 8 additions & 1 deletion
@@ -13,11 +13,17 @@
 
 | Story | Type | Priority | Scope |
 |-------|------|----------|-------|
+| [[CHORE][0146]-create-ai-verification-config](stages/backlog/stories/[CHORE][0146]-create-ai-verification-config.md) | chore | high | coherence-verification/04-ai-assisted-verification |
 | [[FEAT][0115]-eval-web-ui](stages/backlog/stories/[FEAT][0115]-eval-web-ui.md) | feat | medium | evals/39-performance-evaluation |
 | [[FEAT][0116]-pty-server-core](stages/backlog/stories/[FEAT][0116]-pty-server-core.md) | feat | medium | tui/46-terminal-server |
 | [[FEAT][0117]-session-management](stages/backlog/stories/[FEAT][0117]-session-management.md) | feat | medium | tui/46-terminal-server |
 | [[FEAT][0118]-websocket-pty-endpoint](stages/backlog/stories/[FEAT][0118]-websocket-pty-endpoint.md) | feat | medium | tui/46-terminal-server |
 | [[FEAT][0119]-xtermjs-web-integration](stages/backlog/stories/[FEAT][0119]-xtermjs-web-integration.md) | feat | medium | tui/46-terminal-server |
+| [[FEAT][0197]-implement-story-parser](stages/backlog/stories/[FEAT][0197]-implement-story-parser.md) | feat | high | coherence-verification/04-ai-assisted-verification |
+| [[FEAT][0198]-implement-artifact-collector](stages/backlog/stories/[FEAT][0198]-implement-artifact-collector.md) | feat | high | coherence-verification/04-ai-assisted-verification |
+| [[FEAT][0199]-implement-model-router](stages/backlog/stories/[FEAT][0199]-implement-model-router.md) | feat | high | coherence-verification/04-ai-assisted-verification |
+| [[FEAT][0200]-implement-ai-report-generator](stages/backlog/stories/[FEAT][0200]-implement-ai-report-generator.md) | feat | high | coherence-verification/04-ai-assisted-verification |
+| [[FEAT][0201]-add-verify-ai-command](stages/backlog/stories/[FEAT][0201]-add-verify-ai-command.md) | feat | high | coherence-verification/04-ai-assisted-verification |
 
 ## Icebox
 
@@ -42,13 +48,14 @@
 | [35-guided-setup](epics/cli/milestones/35-guided-setup/) | done |
 | [54-enhanced-cli-experience](epics/cli/milestones/54-enhanced-cli-experience/) | planned |
 
-### [coherence-verification](epics/coherence-verification/) (active) - 3 milestones, 2 done
+### [coherence-verification](epics/coherence-verification/) (active) - 4 milestones, 2 done
 
 | Milestone | Status |
 |-----------|--------|
 | [01-verification-artifact-pipeline](epics/coherence-verification/milestones/01-verification-artifact-pipeline/) | done |
 | [02-epic-based-project-hierarchy](epics/coherence-verification/milestones/02-epic-based-project-hierarchy/) | done |
 | [03-formal-planning-process](epics/coherence-verification/milestones/03-formal-planning-process/) | in-progress |
+| [04-ai-assisted-verification](epics/coherence-verification/milestones/04-ai-assisted-verification/) | backlog |
 
 ### [evals](epics/evals/) (planned) - 6 milestones, 0 done
 
docs/board/epics/coherence-verification/README.md

Lines changed: 2 additions & 1 deletion
@@ -22,12 +22,13 @@ Build a verification system that captures system behavior as visual artifacts (s
 <!-- BEGIN GENERATED -->
 ## Milestones
 
-**Progress:** 2/3 milestones complete, 15/15 stories done
+**Progress:** 2/4 milestones complete, 15/21 stories done
 **Active:** Formal Planning Process
 
 | ID | Milestone | Stories | Status |
 |----|-----------|---------|--------|
 | 01 | [Verification Artifact Pipeline](milestones/01-verification-artifact-pipeline/) | 0/0 | done |
 | 02 | [Epic-Based Project Hierarchy](milestones/02-epic-based-project-hierarchy/) | 6/6 | done |
 | 03 | [Formal Planning Process](milestones/03-formal-planning-process/) | 9/9 | in-progress |
+| 04 | [AI-Assisted Verification](milestones/04-ai-assisted-verification/) | 0/6 | backlog |
 <!-- END GENERATED -->
Lines changed: 241 additions & 0 deletions
@@ -0,0 +1,241 @@
+# AI-Assisted Verification — Design Document
+
+> Configurable multimodal AI analysis of verification artifacts against acceptance criteria.
+
+**SRS:** [SRS.md](SRS.md)
+
+## Overview
+
+AI-Assisted Verification adds intelligent analysis to the existing verification pipeline. Given a story and its captured artifacts, an AI model reviews whether acceptance criteria are met and produces a detailed report with confidence levels.
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                      AI VERIFICATION FLOW                       │
+│                                                                 │
+│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐            │
+│  │    Story    │   │  Artifacts  │   │    Model    │            │
+│  │   Parser    │──▶│  Collector  │──▶│   Router    │            │
+│  │             │   │             │   │             │            │
+│  └─────────────┘   └─────────────┘   └─────────────┘            │
+│         │                 │                 │                   │
+│         ▼                 ▼                 ▼                   │
+│  ┌─────────────────────────────────────────────────┐            │
+│  │               Verification Engine               │            │
+│  │  • Parse acceptance criteria from story         │            │
+│  │  • Match criteria to artifacts via annotations  │            │
+│  │  • Route to configured model (Ollama/Claude)    │            │
+│  │  • Aggregate results with confidence levels     │            │
+│  └─────────────────────────────────────────────────┘            │
+│                           │                                     │
+│                           ▼                                     │
+│                   ┌───────────────┐                             │
+│                   │   AI Report   │                             │
+│                   │  (markdown)   │                             │
+│                   └───────────────┘                             │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## Key Decisions
+
+| Decision | Choice | Rationale |
+|----------|--------|-----------|
+| **Default model** | Ollama qwen3-vl:32b | Local-first, no API costs, good multimodal support |
+| **Implementation** | TypeScript | Consistent with existing web-ui tooling |
+| **Configuration** | TOML file | Simple, human-readable, matches project style |
+| **Confidence levels** | High/Medium/Low | Clear thresholds for trust vs review |
+| **Report format** | Markdown | Matches existing verification reports |
+
+## Components
+
+### Story Parser (`verification/scripts/lib/parser.ts`)
+
+Extracts acceptance criteria and verification annotations from story markdown.
+
+```typescript
+interface ParsedStory {
+  id: string;
+  title: string;
+  scope: string;
+  criteria: Criterion[];
+}
+
+interface Criterion {
+  text: string;
+  annotation?: {
+    type: 'snapshot' | 'checkpoint' | 'video';
+    name: string;
+    hint?: string; // Optional: "should show 3 sessions"
+  };
+}
+```
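
To make the `Criterion` shape above concrete, here is a minimal sketch of how criteria might be pulled out of story markdown. The design document does not specify the annotation syntax, so the format used here (a task-list item followed by an HTML comment of the form `<!-- verify: <type> <name> "<hint>" -->`) is purely an assumption for illustration, as are the names `parseCriteria` and `CRITERION_RE`.

```typescript
type AnnotationType = 'snapshot' | 'checkpoint' | 'video';

interface Annotation {
  type: AnnotationType;
  name: string;
  hint?: string;
}

interface Criterion {
  text: string;
  annotation?: Annotation;
}

// ASSUMPTION (not from the source): a criterion is a markdown task-list item,
// optionally followed by <!-- verify: snapshot sessions "hint text" -->.
const CRITERION_RE =
  /^- \[[ xX]\] (.*?)(?:\s*<!--\s*verify:\s*(snapshot|checkpoint|video)\s+(\S+?)(?:\s+"([^"]*)")?\s*-->)?\s*$/;

function parseCriteria(markdown: string): Criterion[] {
  const criteria: Criterion[] = [];
  for (const line of markdown.split('\n')) {
    const m = CRITERION_RE.exec(line.trim());
    if (m === null) continue; // not a task-list item
    const text = m[1] ?? '';
    const type = m[2];
    const name = m[3];
    const hint = m[4];
    const criterion: Criterion = { text };
    // Only attach an annotation when the comment was present on the line.
    if (type !== undefined && name !== undefined) {
      const annotation: Annotation = { type: type as AnnotationType, name };
      if (hint !== undefined) annotation.hint = hint;
      criterion.annotation = annotation;
    }
    criteria.push(criterion);
  }
  return criteria;
}
```

Criteria without an annotation are kept rather than dropped, which matches the error-handling table later in the document: a story with no verify annotations still yields criteria to analyze.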
+
+### Artifact Collector (`verification/scripts/lib/collector.ts`)
+
+Gathers referenced artifacts from `verification/` directory.
+
+```typescript
+interface CollectedArtifact {
+  type: 'image' | 'video';
+  path: string;
+  data: Buffer; // For images, base64 for API
+}
+```
+
+### Model Router (`verification/scripts/lib/router.ts`)
+
+Routes artifacts to configured model.
+
+```typescript
+interface ModelConfig {
+  provider: 'ollama' | 'claude';
+  model: string; // e.g., 'qwen3-vl:32b' or 'claude-sonnet-4-20250514'
+}
+
+async function analyze(
+  artifact: CollectedArtifact,
+  criterion: Criterion,
+  config: ModelConfig
+): Promise<Verdict>
+```
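
The configuration and the `--model` override elsewhere in this document write models as a single string such as `ollama:qwen3-vl:32b`, while `ModelConfig` keeps provider and model apart. A minimal sketch of the conversion, assuming the provider is everything before the first colon (the model name itself may contain colons); `parseModelSpec` is a hypothetical name, and the `claude:` prefix spelling is an assumption since the document only shows `ollama:` examples:

```typescript
type Provider = 'ollama' | 'claude';

interface ModelConfig {
  provider: Provider;
  model: string;
}

// Hypothetical helper: split "<provider>:<model>" at the FIRST colon only,
// because Ollama model names carry their own tag after a colon.
function parseModelSpec(spec: string): ModelConfig {
  const sep = spec.indexOf(':');
  if (sep === -1) {
    throw new Error(`Expected "<provider>:<model>", got "${spec}"`);
  }
  const provider = spec.slice(0, sep);
  const model = spec.slice(sep + 1);
  if (provider !== 'ollama' && provider !== 'claude') {
    throw new Error(`Unknown provider "${provider}"`);
  }
  if (model === '') {
    throw new Error(`Missing model name in "${spec}"`);
  }
  return { provider, model };
}

console.log(parseModelSpec('ollama:qwen3-vl:32b'));
```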
+
+### Report Generator (`verification/scripts/lib/report.ts`)
+
+Produces markdown report from verdicts.
+
+## Configuration
+
+`verification/config.toml`:
+
+```toml
+[ai]
+# Default model for all artifacts
+default_model = "ollama:qwen3-vl:32b"
+
+# Confidence thresholds
+[ai.confidence]
+high = 80      # ≥80% = trust result
+medium = 50    # 50-79% = review recommended
+               # <50% = human review required
+```
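
The thresholds in `[ai.confidence]` define three bands. A minimal sketch of the mapping from a 0-100 confidence score to a level, using the values from the TOML above as defaults; the names `confidenceLevel` and `ConfidenceThresholds` are hypothetical:

```typescript
type ConfidenceLevel = 'high' | 'medium' | 'low';

interface ConfidenceThresholds {
  high: number;   // at or above: trust result
  medium: number; // at or above (but below high): review recommended
}

// Defaults copied from the [ai.confidence] table in verification/config.toml.
const DEFAULT_THRESHOLDS: ConfidenceThresholds = { high: 80, medium: 50 };

function confidenceLevel(
  percent: number,
  thresholds: ConfidenceThresholds = DEFAULT_THRESHOLDS
): ConfidenceLevel {
  if (percent >= thresholds.high) return 'high';
  if (percent >= thresholds.medium) return 'medium';
  return 'low'; // below medium: human review required
}

console.log(confidenceLevel(92)); // prints "high"
```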
+
+## Commands
+
+| Command | Description |
+|---------|-------------|
+| `just verify ai <story-id>` | Run AI verification for a story |
+| `just verify ai <story-id> --model "ollama:llava:34b"` | Override model |
+
+### Just Command
+
+```just
+# Run AI verification for a story
+ai STORY_ID *ARGS:
+    npx tsx verification/scripts/ai-verify.ts {{STORY_ID}} {{ARGS}}
+```
+
+## Report Format
+
+Generated at `verification/reports/<scope>/<id>-ai.md`:
+
+```markdown
+# AI Verification Report: FEAT0109
+
+**Story:** Board generator grouped layout
+**Scope:** coherence-verification/01-artifact-pipeline
+**Model:** ollama:qwen3-vl:32b
+**Generated:** 2026-01-19 14:32:05
+
+## Summary
+
+| Result | Count |
+|--------|-------|
+| ✅ Pass | 3 |
+| ❌ Fail | 1 |
+| ⚠️ Needs Review | 1 |
+
+## Criteria
+
+### 1. Sessions page displays list
+**Artifact:** `snapshots/sessions.png`
+**Verdict:** ✅ Pass (High confidence: 92%)
+
+> The screenshot shows a sessions page with a table containing 4 session
+> rows. Each row displays session ID, status, and timestamp.
+```
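
The Summary table in the sample report is a tally of per-criterion verdicts. A minimal sketch of rendering it, assuming the three verdict values from the prompt's JSON schema (`pass`, `fail`, `unclear`) and assuming that `unclear` is what the report shows as "Needs Review"; the document does not state that mapping, and `renderSummary` is a hypothetical name:

```typescript
type VerdictKind = 'pass' | 'fail' | 'unclear';

// Hypothetical helper: build the "## Summary" table rows from a list of verdicts.
function renderSummary(verdicts: VerdictKind[]): string {
  const count = (kind: VerdictKind): number =>
    verdicts.filter((v) => v === kind).length;
  return [
    '| Result | Count |',
    '|--------|-------|',
    `| ✅ Pass | ${count('pass')} |`,
    `| ❌ Fail | ${count('fail')} |`,
    // ASSUMPTION: "unclear" verdicts are reported as "Needs Review".
    `| ⚠️ Needs Review | ${count('unclear')} |`,
  ].join('\n');
}

console.log(renderSummary(['pass', 'pass', 'pass', 'fail', 'unclear']));
```

With the five verdicts above, the output reproduces the counts in the sample report (3 pass, 1 fail, 1 needs review).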
+
+## Prompt Structure
+
+```typescript
+const VERIFICATION_PROMPT = `
+You are verifying if a UI artifact meets an acceptance criterion.
+
+**Criterion:** {criterion}
+
+**Additional context:** {annotation_hint}
+
+Analyze the provided artifact and determine:
+1. Does this artifact demonstrate the criterion is met?
+2. What specific evidence supports your verdict?
+3. How confident are you? (0-100%)
+
+Respond in JSON:
+{
+  "verdict": "pass" | "fail" | "unclear",
+  "confidence": <0-100>,
+  "evidence": "<what you observed>",
+  "suggestion": "<if fail, how to fix>" | null
+}
+`;
+```
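
The template above leaves two steps implicit: substituting the `{criterion}` and `{annotation_hint}` placeholders, and turning the model's reply into a typed verdict. The sketch below shows one way to do both. The `Verdict` fields follow the JSON schema in the prompt; everything else (`fillPrompt`, `parseVerdict`, the `'none'` fallback for a missing hint, and tolerating text around the JSON object) is an assumption rather than the documented design.

```typescript
type VerdictKind = 'pass' | 'fail' | 'unclear';

interface Verdict {
  verdict: VerdictKind;
  confidence: number; // 0-100
  evidence: string;
  suggestion: string | null;
}

function isVerdictKind(value: unknown): value is VerdictKind {
  return value === 'pass' || value === 'fail' || value === 'unclear';
}

// Replacer functions avoid special handling of "$" sequences in the inputs.
function fillPrompt(template: string, criterion: string, hint?: string): string {
  return template
    .replace('{criterion}', () => criterion)
    .replace('{annotation_hint}', () => hint ?? 'none');
}

// ASSUMPTION: the reply may wrap the JSON in prose or a code fence, so only
// the span from the first "{" to the last "}" is parsed.
function parseVerdict(raw: string): Verdict {
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in model response');
  }
  const data: unknown = JSON.parse(raw.slice(start, end + 1));
  if (typeof data !== 'object' || data === null) {
    throw new Error('Model response is not a JSON object');
  }
  const fields = data as Record<string, unknown>;
  const { verdict, confidence, evidence, suggestion } = fields;
  if (!isVerdictKind(verdict)) {
    throw new Error('Invalid "verdict" value');
  }
  if (typeof confidence !== 'number' || confidence < 0 || confidence > 100) {
    throw new Error('Invalid "confidence" value');
  }
  if (typeof evidence !== 'string') {
    throw new Error('Invalid "evidence" value');
  }
  return {
    verdict,
    confidence,
    evidence,
    suggestion: typeof suggestion === 'string' ? suggestion : null,
  };
}
```

Rejecting malformed replies with an error, rather than guessing, lets the caller decide whether to retry or to mark the criterion for human review.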
+
+## Error Handling
+
+| Scenario | Behavior |
+|----------|----------|
+| Story not found | Exit with error: `Story FEAT0109 not found` |
+| No acceptance criteria | Exit with error: `No acceptance criteria found` |
+| No verify annotations | Warning, analyze all criteria against all artifacts |
+| Artifact missing | Report shows `⚠️ Artifact not found` |
+| Ollama not running | Exit with error: `Ollama not reachable. Run: ollama serve` |
+| Model not available | Exit with error: `Model not found. Run: ollama pull <model>` |
+| Model timeout | Report shows `⚠️ Analysis timed out` |
+
+## Data Flow
+
+1. User runs `just verify ai FEAT0109`
+2. Script finds story file by ID
+3. Parser extracts criteria and annotations
+4. Collector gathers referenced artifacts
+5. Router loads config, determines model
+6. For each criterion:
+   - Send artifact + criterion to model
+   - Receive verdict with confidence
+7. Report generator creates markdown
+8. Report saved to `verification/reports/<scope>/<id>-ai.md`
+
+## Dependencies
+
+```json
+{
+  "devDependencies": {
+    "ollama": "^0.5.0",
+    "@anthropic-ai/sdk": "^0.25.0",
+    "toml": "^3.0.0",
+    "tsx": "^4.0.0"
+  }
+}
+```
+
+## Deliverables
+
+- [ ] `verification/config.toml` - Model configuration
+- [ ] `verification/templates/ai-report.md` - Report template
+- [ ] `verification/scripts/ai-verify.ts` - Main entry point
+- [ ] `verification/scripts/lib/parser.ts` - Story parser
+- [ ] `verification/scripts/lib/collector.ts` - Artifact collector
+- [ ] `verification/scripts/lib/router.ts` - Model router
+- [ ] `verification/scripts/lib/report.ts` - Report generator
+- [ ] `.justfiles/verify.just` - Add `ai` command
+- [ ] Documentation in CLAUDE.md
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+---
+id: 04-ai-assisted-verification
+title: AI-Assisted Verification
+status: backlog
+epic: coherence-verification
+created: 2026-01-19
+---
+
+# AI-Assisted Verification
+
+> Use multimodal AI to validate artifacts against story acceptance criteria.
+
+## Documents
+
+| Document | Description |
+|----------|-------------|
+| [SRS.md](SRS.md) | Requirements and verification criteria |
+| [DESIGN.md](DESIGN.md) | Architecture and implementation details |
+
+## Stories
+
+| # | Story | Description | Status |
+|---|-------|-------------|--------|
+| 1 | [CHORE0146](../../../../stages/backlog/stories/[CHORE][0146]-create-ai-verification-config.md) | Create AI verification config and templates | backlog |
+| 2 | [FEAT0197](../../../../stages/backlog/stories/[FEAT][0197]-implement-story-parser.md) | Implement story parser for AI verification | backlog |
+| 3 | [FEAT0198](../../../../stages/backlog/stories/[FEAT][0198]-implement-artifact-collector.md) | Implement artifact collector | backlog |
+| 4 | [FEAT0199](../../../../stages/backlog/stories/[FEAT][0199]-implement-model-router.md) | Implement model router for Ollama and Claude | backlog |
+| 5 | [FEAT0200](../../../../stages/backlog/stories/[FEAT][0200]-implement-ai-report-generator.md) | Implement AI report generator | backlog |
+| 6 | [FEAT0201](../../../../stages/backlog/stories/[FEAT][0201]-add-verify-ai-command.md) | Add just verify ai command | backlog |
+
+## Progress
+
+**Requirements:** 0/14 verified
+**Stories:** 0/6 complete
+