| 1 | +# AI-Assisted Verification — Design Document |
| 2 | + |
| 3 | +> Configurable multimodal AI analysis of verification artifacts against acceptance criteria. |
| 4 | + |
| 5 | +**SRS:** [SRS.md](SRS.md) |
| 6 | + |
| 7 | +## Overview |
| 8 | + |
| 9 | +AI-Assisted Verification adds model-based analysis to the existing verification pipeline. Given a story and its captured artifacts, a multimodal AI model judges whether each acceptance criterion is met and produces a Markdown report with a verdict and confidence level per criterion. |
| 10 | + |
| 11 | +## Architecture |
| 12 | + |
| 13 | +``` |
| 14 | +┌─────────────────────────────────────────────────────────────────┐ |
| 15 | +│ AI VERIFICATION FLOW │ |
| 16 | +│ │ |
| 17 | +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ |
| 18 | +│ │ Story │ │ Artifacts │ │ Model │ │ |
| 19 | +│ │ Parser │──▶│ Collector │──▶│ Router │ │ |
| 20 | +│ │ │ │ │ │ │ │ |
| 21 | +│ └─────────────┘ └─────────────┘ └─────────────┘ │ |
| 22 | +│ │ │ │ │ |
| 23 | +│ ▼ ▼ ▼ │ |
| 24 | +│ ┌─────────────────────────────────────────────────┐ │ |
| 25 | +│ │ Verification Engine │ │ |
| 26 | +│ │ • Parse acceptance criteria from story │ │ |
| 27 | +│ │ • Match criteria to artifacts via annotations │ │ |
| 28 | +│ │ • Route to configured model (Ollama/Claude) │ │ |
| 29 | +│ │ • Aggregate results with confidence levels │ │ |
| 30 | +│ └─────────────────────────────────────────────────┘ │ |
| 31 | +│ │ │ |
| 32 | +│ ▼ │ |
| 33 | +│ ┌───────────────┐ │ |
| 34 | +│ │ AI Report │ │ |
| 35 | +│ │ (markdown) │ │ |
| 36 | +│ └───────────────┘ │ |
| 37 | +└─────────────────────────────────────────────────────────────────┘ |
| 38 | +``` |
| 39 | + |
| 40 | +## Key Decisions |
| 41 | + |
| 42 | +| Decision | Choice | Rationale | |
| 43 | +|----------|--------|-----------| |
| 44 | +| **Default model** | Ollama qwen3-vl:32b | Local-first, no API costs, good multimodal support | |
| 45 | +| **Implementation** | TypeScript | Consistent with existing web-ui tooling | |
| 46 | +| **Configuration** | TOML file | Simple, human-readable, matches project style | |
| 47 | +| **Confidence levels** | High/Medium/Low | Clear thresholds for trust vs review | |
| 48 | +| **Report format** | Markdown | Matches existing verification reports | |
| 49 | + |
| 50 | +## Components |
| 51 | + |
| 52 | +### Story Parser (`verification/scripts/lib/parser.ts`) |
| 53 | + |
| 54 | +Extracts acceptance criteria and verification annotations from story markdown. |
| 55 | + |
| 56 | +```typescript |
| 57 | +interface ParsedStory { |
| 58 | +  id: string; |
| 59 | +  title: string; |
| 60 | +  scope: string; |
| 61 | +  criteria: Criterion[]; |
| 62 | +} |
| 63 | + |
| 64 | +interface Criterion { |
| 65 | +  text: string; |
| 66 | +  annotation?: { |
| 67 | +    type: 'snapshot' | 'checkpoint' | 'video'; |
| 68 | +    name: string; |
| 69 | +    hint?: string; // Optional: "should show 3 sessions" |
| 70 | +  }; |
| 71 | +} |
| 72 | +``` |
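The annotation syntax itself is not specified here, so the following is only a minimal sketch of how criteria might be extracted. It assumes (hypothetically) that criteria are Markdown checklist items and that an annotation is a trailing HTML comment of the form `<!-- verify: <type> <name> "<hint>" -->`. The function name `parseCriteria` and both regular expressions are illustrative, not part of the design.

```typescript
type AnnotationType = 'snapshot' | 'checkpoint' | 'video';

interface Annotation {
  type: AnnotationType;
  name: string;
  hint?: string;
}

interface Criterion {
  text: string;
  annotation?: Annotation;
}

// Assumed story format: "- [ ] criterion text <!-- verify: snapshot sessions "hint" -->"
const CRITERION_RE = /^\s*[-*] \[[ xX]\] (.+)$/;
const ANNOTATION_RE =
  /<!--\s*verify:\s*(snapshot|checkpoint|video)\s+(\S+)(?:\s+"([^"]*)")?\s*-->/;

function parseCriteria(markdown: string): Criterion[] {
  const criteria: Criterion[] = [];
  for (const line of markdown.split(/\r?\n/)) {
    const item = CRITERION_RE.exec(line);
    if (!item) continue; // Not a checklist item, so not a criterion.

    const body = item[1];
    const match = ANNOTATION_RE.exec(body);
    // The criterion text is the checklist item with the annotation removed.
    const text = body.replace(ANNOTATION_RE, '').trim();
    if (!match) {
      criteria.push({ text });
      continue;
    }

    const annotation: Annotation = {
      type: match[1] as AnnotationType,
      name: match[2],
    };
    if (match[3] !== undefined) annotation.hint = match[3];
    criteria.push({ text, annotation });
  }
  return criteria;
}
```

Criteria without an annotation are kept with `annotation` left undefined, which lets the caller decide how to handle the "No verify annotations" case.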
| 73 | + |
| 74 | +### Artifact Collector (`verification/scripts/lib/collector.ts`) |
| 75 | + |
| 76 | +Gathers the referenced artifacts from the `verification/` directory. |
| 77 | + |
| 78 | +```typescript |
| 79 | +interface CollectedArtifact { |
| 80 | +  type: 'image' | 'video'; |
| 81 | +  path: string; |
| 82 | +  data: Buffer; // Raw bytes; images are base64-encoded for the model API |
| 83 | +} |
| 84 | +``` |
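As a rough illustration of the collector, the sketch below classifies a file by extension, loads it into a `CollectedArtifact`, and base64-encodes the bytes. The extension lists and the names `artifactTypeFor`, `collectArtifact`, and `toBase64` are assumptions made for this example; the injectable `read` parameter exists only to keep the function easy to test.

```typescript
import { readFileSync } from 'node:fs';
import { extname } from 'node:path';

interface CollectedArtifact {
  type: 'image' | 'video';
  path: string;
  data: Buffer;
}

// Assumed extension lists; adjust to whatever the capture tooling writes.
const IMAGE_EXTENSIONS = new Set(['.png', '.jpg', '.jpeg', '.webp']);
const VIDEO_EXTENSIONS = new Set(['.mp4', '.webm']);

function artifactTypeFor(path: string): CollectedArtifact['type'] | undefined {
  const ext = extname(path).toLowerCase();
  if (IMAGE_EXTENSIONS.has(ext)) return 'image';
  if (VIDEO_EXTENSIONS.has(ext)) return 'video';
  return undefined;
}

function collectArtifact(
  path: string,
  read: (path: string) => Buffer = (p) => readFileSync(p),
): CollectedArtifact {
  const type = artifactTypeFor(path);
  if (type === undefined) {
    throw new Error(`Unsupported artifact type: ${path}`);
  }
  return { type, path, data: read(path) };
}

// Encode the raw bytes as base64 for a model API request.
function toBase64(artifact: CollectedArtifact): string {
  return artifact.data.toString('base64');
}
```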
| 85 | + |
| 86 | +### Model Router (`verification/scripts/lib/router.ts`) |
| 87 | + |
| 88 | +Routes each artifact to the configured model. |
| 89 | + |
| 90 | +```typescript |
| 91 | +interface ModelConfig { |
| 92 | +  provider: 'ollama' | 'claude'; |
| 93 | +  model: string; // e.g., 'qwen3-vl:32b' or 'claude-sonnet-4-20250514' |
| 94 | +} |
| 95 | + |
| 96 | +declare function analyze( |
| 97 | +  artifact: CollectedArtifact, |
| 98 | +  criterion: Criterion, |
| 99 | +  config: ModelConfig |
| 100 | +): Promise<Verdict>; // Verdict: the JSON object described under "Prompt Structure" |
| 101 | +``` |
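The configuration and the `--model` flag both use a single `provider:model` string (for example `ollama:qwen3-vl:32b`), while `ModelConfig` keeps the two parts separate. One way to bridge them, sketched here with a hypothetical helper `parseModelSpec`, is to split on the first colon only, because Ollama model tags contain colons themselves.

```typescript
type Provider = 'ollama' | 'claude';

interface ModelConfig {
  provider: Provider;
  model: string;
}

function isProvider(value: string): value is Provider {
  return value === 'ollama' || value === 'claude';
}

// Split "provider:model" on the FIRST colon; the model part may contain more colons.
function parseModelSpec(spec: string): ModelConfig {
  const separator = spec.indexOf(':');
  if (separator === -1) {
    throw new Error(`Expected "<provider>:<model>", got "${spec}"`);
  }
  const provider = spec.slice(0, separator);
  const model = spec.slice(separator + 1);
  if (!isProvider(provider)) {
    throw new Error(`Unknown provider "${provider}"`);
  }
  if (model === '') {
    throw new Error(`Missing model name in "${spec}"`);
  }
  return { provider, model };
}
```

With this approach, `parseModelSpec('ollama:qwen3-vl:32b')` yields provider `ollama` and model `qwen3-vl:32b`.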
| 102 | + |
| 103 | +### Report Generator (`verification/scripts/lib/report.ts`) |
| 104 | + |
| 105 | +Produces a Markdown report from the verdicts. |
| 106 | + |
| 107 | +## Configuration |
| 108 | + |
| 109 | +`verification/config.toml`: |
| 110 | + |
| 111 | +```toml |
| 112 | +[ai] |
| 113 | +# Default model for all artifacts |
| 114 | +default_model = "ollama:qwen3-vl:32b" |
| 115 | + |
| 116 | +# Confidence thresholds |
| 117 | +[ai.confidence] |
| 118 | +high = 80 # ≥80% = trust result |
| 119 | +medium = 50 # 50-79% = review recommended |
| 120 | +# <50% = human review required |
| 121 | +``` |
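The thresholds above map a 0-100 confidence score onto the High/Medium/Low levels. A minimal sketch of that mapping follows; the names `confidenceLevel` and `DEFAULT_THRESHOLDS` are illustrative, and the defaults simply mirror the values in the TOML example.

```typescript
type ConfidenceLevel = 'high' | 'medium' | 'low';

interface ConfidenceThresholds {
  high: number;   // score >= high   -> trust result
  medium: number; // score >= medium -> review recommended
}

const DEFAULT_THRESHOLDS: ConfidenceThresholds = { high: 80, medium: 50 };

function confidenceLevel(
  score: number,
  thresholds: ConfidenceThresholds = DEFAULT_THRESHOLDS,
): ConfidenceLevel {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`Confidence must be within 0-100, got ${score}`);
  }
  if (score >= thresholds.high) return 'high';
  if (score >= thresholds.medium) return 'medium';
  return 'low'; // Below the medium threshold: human review required.
}
```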
| 122 | + |
| 123 | +## Commands |
| 124 | + |
| 125 | +| Command | Description | |
| 126 | +|---------|-------------| |
| 127 | +| `just verify ai <story-id>` | Run AI verification for a story | |
| 128 | +| `just verify ai <story-id> --model "ollama:llava:34b"` | Override model | |
| 129 | + |
| 130 | +### Just Command |
| 131 | + |
| 132 | +```just |
| 133 | +# Run AI verification for a story |
| 134 | +ai STORY_ID *ARGS: |
| 135 | +    npx tsx verification/scripts/ai-verify.ts {{STORY_ID}} {{ARGS}} |
| 136 | +``` |
| 137 | + |
| 138 | +## Report Format |
| 139 | + |
| 140 | +Generated at `verification/reports/<scope>/<id>-ai.md`: |
| 141 | + |
| 142 | +```markdown |
| 143 | +# AI Verification Report: FEAT0109 |
| 144 | + |
| 145 | +**Story:** Board generator grouped layout |
| 146 | +**Scope:** coherence-verification/01-artifact-pipeline |
| 147 | +**Model:** ollama:qwen3-vl:32b |
| 148 | +**Generated:** 2026-01-19 14:32:05 |
| 149 | + |
| 150 | +## Summary |
| 151 | + |
| 152 | +| Result | Count | |
| 153 | +|--------|-------| |
| 154 | +| ✅ Pass | 3 | |
| 155 | +| ❌ Fail | 1 | |
| 156 | +| ⚠️ Needs Review | 1 | |
| 157 | + |
| 158 | +## Criteria |
| 159 | + |
| 160 | +### 1. Sessions page displays list |
| 161 | +**Artifact:** `snapshots/sessions.png` |
| 162 | +**Verdict:** ✅ Pass (High confidence: 92%) |
| 163 | + |
| 164 | +> The screenshot shows a sessions page with a table containing 4 session |
| 165 | +> rows. Each row displays session ID, status, and timestamp. |
| 166 | +``` |
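The Summary table in the example report could be produced along the following lines. How verdicts map onto "Needs Review" is not spelled out in this document, so the sketch assumes that an `unclear` verdict, or any verdict whose confidence falls below the medium threshold, counts as needing review. The names `outcomeFor` and `renderSummary` are hypothetical.

```typescript
type Outcome = 'pass' | 'fail' | 'needs-review';

interface Verdict {
  verdict: 'pass' | 'fail' | 'unclear';
  confidence: number; // 0-100
}

// Assumption: "unclear" or low-confidence verdicts are surfaced as Needs Review.
function outcomeFor(v: Verdict, mediumThreshold = 50): Outcome {
  if (v.verdict === 'unclear') return 'needs-review';
  if (v.confidence < mediumThreshold) return 'needs-review';
  return v.verdict;
}

function renderSummary(verdicts: Verdict[]): string {
  const counts: Record<Outcome, number> = { pass: 0, fail: 0, 'needs-review': 0 };
  for (const v of verdicts) counts[outcomeFor(v)] += 1;
  return [
    '| Result | Count |',
    '|--------|-------|',
    `| ✅ Pass | ${counts.pass} |`,
    `| ❌ Fail | ${counts.fail} |`,
    `| ⚠️ Needs Review | ${counts['needs-review']} |`,
  ].join('\n');
}
```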
| 167 | + |
| 168 | +## Prompt Structure |
| 169 | + |
| 170 | +```typescript |
| 171 | +const VERIFICATION_PROMPT = ` |
| 172 | +You are verifying if a UI artifact meets an acceptance criterion. |
| 173 | + |
| 174 | +**Criterion:** {criterion} |
| 175 | + |
| 176 | +**Additional context:** {annotation_hint} |
| 177 | + |
| 178 | +Analyze the provided artifact and determine: |
| 179 | +1. Does this artifact demonstrate the criterion is met? |
| 180 | +2. What specific evidence supports your verdict? |
| 181 | +3. How confident are you? (0-100%) |
| 182 | + |
| 183 | +Respond in JSON: |
| 184 | +{ |
| 185 | +  "verdict": "pass" | "fail" | "unclear", |
| 186 | +  "confidence": <0-100>, |
| 187 | +  "evidence": "<what you observed>", |
| 188 | +  "suggestion": "<if fail, how to fix>" | null |
| 189 | +} |
| 190 | +`; |
| 191 | +``` |
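The template uses plain `{criterion}` and `{annotation_hint}` placeholders, so they have to be substituted before the prompt is sent, and the JSON reply has to be validated before it is trusted. The sketch below shows one possible way to do both. `fillPrompt` and `parseVerdict` are hypothetical helpers, and the `Verdict` shape is inferred from the JSON format requested in the prompt. Because a model may surround its JSON with extra text, the parser takes the outermost `{...}` span rather than assuming a clean response.

```typescript
interface Verdict {
  verdict: 'pass' | 'fail' | 'unclear';
  confidence: number; // 0-100
  evidence: string;
  suggestion: string | null;
}

// Replace {name} placeholders; unknown placeholders are left untouched.
function fillPrompt(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (placeholder: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : placeholder,
  );
}

function parseVerdict(raw: string): Verdict {
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in model response');
  }
  const data: unknown = JSON.parse(raw.slice(start, end + 1));
  if (typeof data !== 'object' || data === null) {
    throw new Error('Model response is not a JSON object');
  }
  const { verdict, confidence, evidence, suggestion } = data as Record<string, unknown>;
  if (verdict !== 'pass' && verdict !== 'fail' && verdict !== 'unclear') {
    throw new Error(`Unexpected verdict: ${String(verdict)}`);
  }
  if (typeof confidence !== 'number' || confidence < 0 || confidence > 100) {
    throw new Error(`Confidence out of range: ${String(confidence)}`);
  }
  if (typeof evidence !== 'string') {
    throw new Error('Missing evidence');
  }
  return {
    verdict,
    confidence,
    evidence,
    suggestion: typeof suggestion === 'string' ? suggestion : null,
  };
}
```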
| 192 | + |
| 193 | +## Error Handling |
| 194 | + |
| 195 | +| Scenario | Behavior | |
| 196 | +|----------|----------| |
| 197 | +| Story not found | Exit with error: `Story FEAT0109 not found` | |
| 198 | +| No acceptance criteria | Exit with error: `No acceptance criteria found` | |
| 199 | +| No verify annotations | Warn, then analyze every criterion against every artifact | |
| 200 | +| Artifact missing | Report shows `⚠️ Artifact not found` | |
| 201 | +| Ollama not running | Exit with error: `Ollama not reachable. Run: ollama serve` | |
| 202 | +| Model not available | Exit with error: `Model not found. Run: ollama pull <model>` | |
| 203 | +| Model timeout | Report shows `⚠️ Analysis timed out` | |
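For the "Model timeout" row, a timeout can be layered over any analysis call with `Promise.race`. The following is a sketch only: `withTimeout`, `analyzeOrTimeout`, and the `TIMED_OUT` sentinel are illustrative names, and the timeout duration is left to the caller since this document does not specify one. Note that the race stops waiting for the model but does not cancel the underlying request.

```typescript
const TIMED_OUT = Symbol('timed-out');

async function withTimeout<T>(
  work: Promise<T>,
  ms: number,
): Promise<T | typeof TIMED_OUT> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<typeof TIMED_OUT>((resolve) => {
    timer = setTimeout(() => resolve(TIMED_OUT), ms);
  });
  try {
    // Whichever settles first wins; the loser is ignored, not cancelled.
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // Avoid keeping the process alive after a fast result.
  }
}

// Turn a timeout into the report text from the table above.
async function analyzeOrTimeout(
  run: () => Promise<string>,
  ms: number,
): Promise<string> {
  const result = await withTimeout(run(), ms);
  return result === TIMED_OUT ? '⚠️ Analysis timed out' : result;
}
```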
| 204 | + |
| 205 | +## Data Flow |
| 206 | + |
| 207 | +1. User runs `just verify ai FEAT0109` |
| 208 | +2. Script finds story file by ID |
| 209 | +3. Parser extracts criteria and annotations |
| 210 | +4. Collector gathers referenced artifacts |
| 211 | +5. Router loads config, determines model |
| 212 | +6. For each criterion: |
| 213 | +   - Send artifact + criterion to model |
| 214 | +   - Receive verdict with confidence |
| 215 | +7. Report generator creates markdown |
| 216 | +8. Report saved to `verification/reports/<scope>/<id>-ai.md` |
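The steps above can be sketched as a single orchestration function. The `Pipeline` interface and `runAiVerification` are hypothetical: they stand in for the parser, collector, router, and report generator so the flow can be read (and tested with stubs) without any model running. Processing criteria one at a time is an assumption of this sketch, chosen because a local model may not handle concurrent requests well.

```typescript
interface Criterion {
  text: string;
  annotation?: { type: 'snapshot' | 'checkpoint' | 'video'; name: string; hint?: string };
}
interface ParsedStory { id: string; title: string; scope: string; criteria: Criterion[] }
interface CollectedArtifact { type: 'image' | 'video'; path: string; data: Buffer }
interface Verdict {
  verdict: 'pass' | 'fail' | 'unclear';
  confidence: number;
  evidence: string;
  suggestion: string | null;
}
interface CriterionResult { criterion: Criterion; verdict?: Verdict; note?: string }

// Hypothetical seams for the components described in this document.
interface Pipeline {
  loadStory(id: string): Promise<ParsedStory>;
  collect(criterion: Criterion): Promise<CollectedArtifact | undefined>;
  analyze(artifact: CollectedArtifact, criterion: Criterion): Promise<Verdict>;
  writeReport(path: string, story: ParsedStory, results: CriterionResult[]): Promise<void>;
}

async function runAiVerification(storyId: string, pipeline: Pipeline): Promise<string> {
  const story = await pipeline.loadStory(storyId); // steps 2-3: find and parse
  const results: CriterionResult[] = [];
  for (const criterion of story.criteria) {
    const artifact = await pipeline.collect(criterion); // step 4
    if (!artifact) {
      // Matches the "Artifact missing" row under Error Handling.
      results.push({ criterion, note: '⚠️ Artifact not found' });
      continue;
    }
    const verdict = await pipeline.analyze(artifact, criterion); // steps 5-6
    results.push({ criterion, verdict });
  }
  const reportPath = `verification/reports/${story.scope}/${story.id}-ai.md`;
  await pipeline.writeReport(reportPath, story, results); // steps 7-8
  return reportPath;
}
```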
| 217 | + |
| 218 | +## Dependencies |
| 219 | + |
| 220 | +```json |
| 221 | +{ |
| 222 | +  "devDependencies": { |
| 223 | +    "ollama": "^0.5.0", |
| 224 | +    "@anthropic-ai/sdk": "^0.25.0", |
| 225 | +    "toml": "^3.0.0", |
| 226 | +    "tsx": "^4.0.0" |
| 227 | +  } |
| 228 | +} |
| 229 | +``` |
| 230 | + |
| 231 | +## Deliverables |
| 232 | + |
| 233 | +- [ ] `verification/config.toml` — Model configuration |
| 234 | +- [ ] `verification/templates/ai-report.md` — Report template |
| 235 | +- [ ] `verification/scripts/ai-verify.ts` — Main entry point |
| 236 | +- [ ] `verification/scripts/lib/parser.ts` — Story parser |
| 237 | +- [ ] `verification/scripts/lib/collector.ts` — Artifact collector |
| 238 | +- [ ] `verification/scripts/lib/router.ts` — Model router |
| 239 | +- [ ] `verification/scripts/lib/report.ts` — Report generator |
| 240 | +- [ ] `.justfiles/verify.just` — Add `ai` command |
| 241 | +- [ ] Documentation in CLAUDE.md |