Summary
Algorithm v3.7.0 moves directly from PLAN to BUILD without checking that the planned prerequisites actually exist. Running the PAI Upgrade skill's reflection mining workflow over 62 Algorithm reflections collected across 3 weeks identified this as the most frequent recurring failure pattern: 18 occurrences (29% of sessions) spanning diverse task types. Each incident cost 1-5 wasted iterations while the agent discovered missing tools, wrong data formats, or unexpected page structures mid-execution.
Evidence from Reflection Mining
The PAI Upgrade skill's MineReflections workflow clusters Q1/Q2 answers from algorithm-reflections.jsonl by similarity. The PROBE-BEFORE-BUILD cluster had the highest frequency of any theme:
| Pattern | Occurrences | Example Quote |
|---|---|---|
| Tool/binary assumed to exist | 6 | "Should have verified [tool] existence before PLAN phase instead of discovering it failed in EXECUTE" |
| Data format assumed without sampling | 5 | "Should have immediately verified the data format before assuming [expected format] exists" |
| DOM/page structure assumed | 4 | "Should have inspected the actual DOM structure first before assuming standard selectors would work" |
| API/auth not validated | 3 | "Should have anticipated connection authorization issue before building around it" |
Average wasted iterations per incident: ~2.5. With 18 incidents across 62 sessions, that comes to roughly 45 wasted iterations (18 × 2.5), most of which a single prerequisite gate could have prevented.
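The headline numbers follow from simple arithmetic on the table above. A minimal Python sketch, assuming the ~2.5 per-incident average applies uniformly across all four patterns (the reflections report one overall average, not per-pattern figures) and that each incident falls in a different session:

```python
# Back-of-envelope waste estimate from the reflection-mining figures.
SESSIONS = 62
INCIDENTS_BY_PATTERN = {
    "tool_or_binary_assumed": 6,
    "data_format_assumed": 5,
    "dom_structure_assumed": 4,
    "api_auth_not_validated": 3,
}
AVG_WASTED_ITERATIONS = 2.5  # reported average; assumed uniform across patterns

incidents = sum(INCIDENTS_BY_PATTERN.values())  # 6 + 5 + 4 + 3 = 18
session_share = incidents / SESSIONS  # fraction of sessions affected
wasted = incidents * AVG_WASTED_ITERATIONS  # total wasted iterations

print(f"incidents={incidents} share={session_share:.0%} wasted~{wasted:.0f}")
```

Running it prints `incidents=18 share=29% wasted~45`.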
Proposal
Add a PREREQUISITE GATE to the Algorithm that mandates validating assumptions before committing to execution. The gate runs lightweight probes (single commands, not full scripts) to confirm that the planned approach is viable.
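A probe in this sense is one bounded command whose exit status decides PASS or FAIL. The sketch below shows one way to run such a probe from Python; `ProbeResult`, `run_probe`, and the 10-second default timeout are illustrative assumptions, not part of the Algorithm.

```python
import shutil
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ProbeResult:
    name: str
    command: list[str]
    passed: bool
    detail: str


def run_probe(name: str, command: list[str], timeout_s: float = 10.0) -> ProbeResult:
    """Run ONE lightweight command; exit status 0 counts as PASS."""
    # Fail fast if the executable itself is missing (the `which <tool>` check).
    if shutil.which(command[0]) is None:
        return ProbeResult(name, command, False, f"{command[0]} not found on PATH")
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return ProbeResult(name, command, False, f"timed out after {timeout_s}s")
    # Keep a short excerpt of the output for the PRD, not the whole dump.
    detail = (proc.stdout or proc.stderr).strip()[:200]
    return ProbeResult(name, command, proc.returncode == 0, detail)


if __name__ == "__main__":
    # Example: can this interpreter import the `json` module?
    print(run_probe("json lib", [sys.executable, "-c", "import json"]))
```

Keeping each probe to a single command with a timeout is what keeps the gate cheap: a hung network call fails the probe instead of stalling the session.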
Gate Placement Options
| Option | Where | Pros | Cons |
|---|---|---|---|
| A. End of PLAN phase | After planning, before BUILD. "Validate what you just planned." | Probes are logically part of planning. Keeps BUILD/EXECUTE clean. No phase renumbering. | Blurs the PLAN phase's scope slightly. |
| B. Start of BUILD phase | First action in BUILD, before any construction. "Check your tools before building." | Semantically clean — you're in BUILD, so check tools. No new sections in PLAN. | BUILD phase gets longer. Easy to skip under time pressure. |
| C. New phase 3.5: PROBE | Explicit new phase between PLAN and BUILD. Own header, own voice announcement. | Maximum visibility. Impossible to skip. Clear audit trail in PRD. | Adds an eighth phase, so phase count and numbering change (7 → 8). More ceremony. |
Gate Strictness Options
| Option | Behavior on Probe Failure | Details and Trade-offs |
|---|---|---|
| Hard gate | MUST revise plan before proceeding | Cannot enter BUILD with any failed prerequisite. Strictest; prevents the most waste. |
| Soft gate | Log as risk, proceed if agent judges acceptable | Document in PRD under Risks, but allow BUILD entry. More flexible, but may not prevent the pattern. |
| Tiered | Hard for tools, soft for data | Tool/binary existence = hard (can't proceed without the tool). Data format/DOM = soft (try the likely format, pivot if wrong). Balanced. |
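The three strictness options differ only in how probe failures map to a go/no-go decision for BUILD. A sketch of that mapping, where `Category`, `gate_decision`, and the strictness strings are hypothetical names. The proposal does not assign API/auth probes to a tier, so treating them as hard here is an assumption, and the soft gate's "agent judges acceptable" step is assumed to have already said yes.

```python
from enum import Enum


class Category(Enum):
    TOOL = "tool"
    DATA = "data"
    DOM = "dom"
    AUTH = "auth"


# Tiered mode: tool failures block BUILD; data-format and DOM failures become risks.
# AUTH is not tiered in the proposal; treating it as blocking is an assumption.
HARD_CATEGORIES = {Category.TOOL, Category.AUTH}


def gate_decision(
    failures: list[Category], strictness: str = "tiered"
) -> tuple[bool, list[Category]]:
    """Return (may_enter_build, failures_to_log_under_risks)."""
    if not failures:
        return True, []
    if strictness == "hard":
        # Any failure forces a plan revision before BUILD.
        return False, []
    if strictness == "soft":
        # Assumes the agent judged every risk acceptable; all failures are logged.
        return True, list(failures)
    if strictness == "tiered":
        blocking = [f for f in failures if f in HARD_CATEGORIES]
        risks = [f for f in failures if f not in HARD_CATEGORIES]
        return not blocking, risks
    raise ValueError(f"unknown strictness: {strictness!r}")
```

For example, `gate_decision([Category.TOOL, Category.DOM])` blocks BUILD and still returns the DOM failure so it can be recorded under Risks once the plan is revised.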
Probe Categories to Consider
These are the recurring failure categories from reflection mining. All, some, or a different set could be mandatory:
- Tool/binary existence: Verify that all tools planned for use actually exist (`which <tool>`, `python3 -c "import <lib>"`, etc.)
- Data format sampling: Fetch/inspect ONE sample from each data source to confirm the expected format (image vs text, JSON vs HTML, etc.)
- DOM/page structure inspection: For browser automation tasks, inspect the actual page structure before writing selectors
- API/auth validation: For API-dependent tasks, verify that endpoints are reachable and auth tokens are valid
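The first two categories reduce to one-line checks. A Python sketch, assuming probes run in a Python environment; `tool_exists`, `lib_importable`, and `sniff_format` are hypothetical helpers. The sniffer recognizes only a few common cases (PNG/JPEG magic bytes, HTML documents, complete JSON objects or arrays), so a truncated JSON sample falls through to "text". DOM and auth probes depend on the browser tool and API in use, so they are not sketched here.

```python
import importlib.util
import json
import shutil


def tool_exists(tool: str) -> bool:
    """Same question as `which <tool>`: is it on PATH?"""
    return shutil.which(tool) is not None


def lib_importable(module: str) -> bool:
    """Cheaper, weaker cousin of `python3 -c "import <lib>"`: locates the module without running it."""
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:  # raised when a parent package of a dotted name is missing
        return False


def sniff_format(sample: bytes) -> str:
    """Classify ONE sample as 'png', 'jpeg', 'html', 'json', 'text', or 'binary'."""
    if sample.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if sample.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    try:
        text = sample.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"
    head = text.lstrip().lower()
    if head.startswith(("<!doctype html", "<html")):
        return "html"
    if head.startswith(("{", "[")):
        try:
            json.loads(text)
            return "json"
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            pass
    return "text"
```

`find_spec` only proves the module can be located; a library whose own dependencies are broken would still pass, which is the trade-off for not executing anything.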
Suggested Output Format
🔍 PREREQUISITE PROBES:
🔍 [Tool]: [command to verify] → [PASS/FAIL]
🔍 [Data]: [sample command] → [format confirmed / unexpected → revise plan]
🔍 [DOM]: [inspection command] → [structure confirmed / unexpected → revise plan]
🔍 [Auth]: [validation command] → [PASS/FAIL]
⚠️ PROBE FAILURES: [list any, with revised plan if hard-gate category]
Target probe time: 10-30 seconds in total for most tasks, with a hard cap of 60 seconds.
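The suggested output is regular enough to generate mechanically. A sketch, assuming each probe records its own elapsed time; `Probe`, `render_probes`, and the budget-exceeded warning line are illustrative additions, not part of the proposed format, and the revised plan for hard-gate failures is left out for brevity.

```python
from dataclasses import dataclass

TOTAL_BUDGET_S = 60.0  # hard cap from the proposal
CONFIRMED_LABEL = {"Data": "format confirmed", "DOM": "structure confirmed"}


@dataclass
class Probe:
    kind: str  # "Tool", "Data", "DOM", or "Auth"
    command: str
    passed: bool
    elapsed_s: float = 0.0


def render_probes(probes: list[Probe]) -> str:
    lines = ["🔍 PREREQUISITE PROBES:"]
    for p in probes:
        # Data/DOM probes report confirmation; Tool/Auth probes report PASS/FAIL.
        if p.kind in CONFIRMED_LABEL:
            status = CONFIRMED_LABEL[p.kind] if p.passed else "unexpected → revise plan"
        else:
            status = "PASS" if p.passed else "FAIL"
        lines.append(f"🔍 [{p.kind}]: {p.command} → {status}")
    failures = [p for p in probes if not p.passed]
    if failures:
        listed = "; ".join(f"{p.kind}: {p.command}" for p in failures)
        lines.append(f"⚠️ PROBE FAILURES: {listed}")
    total = sum(p.elapsed_s for p in probes)
    if total > TOTAL_BUDGET_S:
        # Hypothetical extra line; the proposal only states the cap.
        lines.append(f"⚠️ PROBE BUDGET EXCEEDED: {total:.0f}s > {TOTAL_BUDGET_S:.0f}s")
    return "\n".join(lines)
```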
PRD Integration
Probe results go into the PRD's ## Context section under ### Prerequisites. Failed probes that trigger plan revisions get documented in ## Decisions.
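A sketch of that PRD mapping follows. The checkbox layout under `### Prerequisites` and the wording of the `## Decisions` entries are assumptions; the proposal only specifies which sections receive the results. `prd_sections` is a hypothetical helper.

```python
def prd_sections(
    results: list[tuple[str, str, bool]], revisions: dict[str, str]
) -> dict[str, str]:
    """Build PRD markdown from probe results.

    results:   (kind, command, passed) for every probe that ran
    revisions: failed command -> plan change it triggered
    """
    prereq = ["### Prerequisites"]  # goes inside the PRD's `## Context` section
    for kind, command, passed in results:
        box = "x" if passed else " "
        prereq.append(f"- [{box}] {kind}: `{command}`")
    decisions = [
        f"- Probe failed: `{cmd}`. Revised plan: {change}"
        for cmd, change in revisions.items()
    ]
    return {"## Context": "\n".join(prereq), "## Decisions": "\n".join(decisions)}
```

Keying the output by target section keeps passing probes in Context as a checklist while only plan-changing failures reach Decisions.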
Related Patterns
The same reflection mining run surfaced two connected themes:
- PROGRAMMATIC-FIRST (10 occurrences): Defaulting to browser automation when a direct API would be faster. A tool selection heuristic in the probe gate could enforce API-first choices.
- COOKIE-AND-AUTH-HANDLING (5 occurrences): Browser scripts failing on known-problematic domains. A domain behavior lookup during probing would catch these.
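For illustration, an API-first heuristic combined with a domain behavior lookup could be as small as two tables consulted before any browser script is written. Both tables and `choose_approach` are hypothetical; only the GitHub REST API base URL (`https://api.github.com`) is a real endpoint, and `auth-heavy.example` is a placeholder domain.

```python
from urllib.parse import urlparse

# Hypothetical lookup tables; the issue does not define their contents.
KNOWN_APIS = {"github.com": "https://api.github.com"}
PROBLEM_DOMAINS = {"auth-heavy.example": "cookie wall breaks headless sessions"}


def choose_approach(url: str) -> tuple[str, str]:
    """Return (approach, note): prefer an API, else browser, flagging known-bad domains."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host in KNOWN_APIS:
        return "api", KNOWN_APIS[host]
    if host in PROBLEM_DOMAINS:
        return "browser-with-warning", PROBLEM_DOMAINS[host]
    return "browser", ""
```

Checking the API table first encodes the PROGRAMMATIC-FIRST preference; the problem-domain table only matters once browser automation is unavoidable.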
Impact
- Sessions affected: ~29% of Algorithm runs (18/62)
- Estimated iterations saved per incident: ~2.5
- Estimated total waste preventable: ~45 iterations over 3 weeks
- Implementation effort: Low-Medium (Algorithm text change + PRD format extension)
Identified by the PAI Upgrade skill's MineReflections workflow, which clusters algorithm-reflections.jsonl entries by theme and frequency to surface structural improvement candidates.
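The MineReflections implementation is not shown in this issue. As a rough stand-in, a greedy token-overlap (Jaccard) clustering over Q1/Q2 answers conveys the idea; the `q1`/`q2` field names, the 0.3 threshold, and `cluster_reflections` are assumptions, and a real workflow would more likely use embeddings or an LLM to judge similarity.

```python
import json
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphabetic words; a deliberately crude similarity basis."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_reflections(jsonl: str, threshold: float = 0.3) -> list[list[str]]:
    """Greedy single pass: each answer joins the first cluster whose seed is similar enough."""
    clusters: list[tuple[set[str], list[str]]] = []  # (seed tokens, member answers)
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        for key in ("q1", "q2"):  # hypothetical field names for the Q1/Q2 answers
            answer = record.get(key)
            if not answer:
                continue
            toks = tokens(answer)
            for seed, members in clusters:
                if jaccard(seed, toks) >= threshold:
                    members.append(answer)
                    break
            else:  # no similar cluster found: start a new one seeded by this answer
                clusters.append((toks, [answer]))
    # Largest clusters (most frequent themes) first.
    return sorted((members for _, members in clusters), key=len, reverse=True)
```

Sorting by cluster size is what turns raw reflections into a frequency-ranked list of candidate themes like PROBE-BEFORE-BUILD.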