feat(algorithm): Add PROBE-BEFORE-BUILD prerequisite validation gate #1001

@catchingknives

Summary

Algorithm v3.7.0 transitions directly from PLAN to BUILD without validating that planned prerequisites actually exist. Running the PAI Upgrade skill's reflection mining workflow against 62 algorithm reflections collected over 3 weeks showed this to be the #1 recurring failure pattern: 18 occurrences (29% of sessions) across diverse task types. Each incident wastes 1-5 iterations while the agent discovers missing tools, wrong data formats, or unexpected page structures mid-execution.

Evidence from Reflection Mining

The PAI Upgrade skill's MineReflections workflow clusters Q1/Q2 answers from algorithm-reflections.jsonl by similarity. The PROBE-BEFORE-BUILD cluster had the highest frequency of any theme:

| Pattern | Occurrences | Example Quote |
|---|---|---|
| Tool/binary assumed to exist | 6 | "Should have verified [tool] existence before PLAN phase instead of discovering it failed in EXECUTE" |
| Data format assumed without sampling | 5 | "Should have immediately verified the data format before assuming [expected format] exists" |
| DOM/page structure assumed | 4 | "Should have inspected the actual DOM structure first before assuming standard selectors would work" |
| API/auth not validated | 3 | "Should have anticipated connection authorization issue before building around it" |

Average wasted iterations per incident: ~2.5. At 18 incidents over 62 sessions, this is roughly 45 wasted iterations that a single prerequisite gate would have prevented.
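
For anyone who wants to re-check the arithmetic, here is a minimal sketch that recomputes the totals from the per-pattern counts in the table above. The dictionary keys are made-up labels for illustration; only the numbers come from this issue.

```python
# Counts per pattern, copied from the table above (keys are illustrative labels).
occurrences = {
    "tool_or_binary_assumed": 6,
    "data_format_assumed": 5,
    "dom_structure_assumed": 4,
    "api_auth_not_validated": 3,
}
total_sessions = 62
avg_wasted_per_incident = 2.5  # approximate average reported in this issue

incidents = sum(occurrences.values())
share_of_sessions = incidents / total_sessions
wasted_iterations = incidents * avg_wasted_per_incident

print(f"incidents: {incidents}")
print(f"share of sessions: {share_of_sessions:.0%}")
print(f"estimated wasted iterations: {wasted_iterations:.0f}")
```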

Proposal

Add a PREREQUISITE GATE to the Algorithm that mandates validating assumptions before committing to execution. The gate runs lightweight probes (single commands, not full scripts) to confirm that the planned approach is viable.

Gate Placement Options

| Option | Where | Pros | Cons |
|---|---|---|---|
| A. End of PLAN phase | After planning, before BUILD. "Validate what you just planned." | Probes are logically part of planning. Keeps BUILD/EXECUTE clean. No phase renumbering. | Blurs the PLAN phase's scope slightly. |
| B. Start of BUILD phase | First action in BUILD, before any construction. "Check your tools before building." | Semantically clean: you're in BUILD, so check tools. No new sections in PLAN. | BUILD phase gets longer. Easy to skip under time pressure. |
| C. New phase 3.5: PROBE | Explicit new phase between PLAN and BUILD. Own header, own voice announcement. | Maximum visibility. Impossible to skip. Clear audit trail in PRD. | Changes phase numbering (7→8). More ceremony. |

Gate Strictness Options

| Option | Behavior | When a Probe Fails |
|---|---|---|
| Hard gate | MUST revise plan before proceeding | Cannot enter BUILD with any failed prerequisite. Strictest: prevents the most waste. |
| Soft gate | Log as risk, proceed if agent judges acceptable | Document in PRD under Risks, but allow BUILD entry. More flexible, may not prevent the pattern. |
| Tiered | Hard for tools, soft for data | Tool/binary existence = hard (can't proceed without the tool). Data format/DOM = soft (try likely format, pivot if wrong). Balanced. |
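
To make the tiered option concrete, here is a rough sketch of how a gate decision could be derived from probe results. `ProbeResult`, `TIERED_POLICY`, and `evaluate_gate` are hypothetical names, not part of the Algorithm. The proposal does not assign a tier to API/auth probes, so treating them (and any unknown category) as hard is an assumption made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Strictness(Enum):
    HARD = "hard"  # a failure blocks BUILD until the plan is revised
    SOFT = "soft"  # a failure is logged as a risk, BUILD may proceed


# Tiered policy from the table: tools are hard, data format and DOM are soft.
TIERED_POLICY = {
    "tool": Strictness.HARD,
    "data": Strictness.SOFT,
    "dom": Strictness.SOFT,
    "auth": Strictness.HARD,  # assumption: not specified in the proposal
}


@dataclass
class ProbeResult:
    category: str
    name: str
    passed: bool


def evaluate_gate(results, policy=TIERED_POLICY):
    """Return (may_enter_build, blockers, risks) for a list of ProbeResult."""
    blockers, risks = [], []
    for result in results:
        if result.passed:
            continue
        # Unknown categories default to HARD, the conservative choice.
        if policy.get(result.category, Strictness.HARD) is Strictness.HARD:
            blockers.append(result)
        else:
            risks.append(result)
    return (not blockers, blockers, risks)


results = [
    ProbeResult("tool", "jq", True),
    ProbeResult("data", "feed sample", False),
]
may_enter_build, blockers, risks = evaluate_gate(results)
print(may_enter_build, [r.name for r in blockers], [r.name for r in risks])
```

A pure hard gate or pure soft gate falls out of the same function by passing a policy that maps every category to one strictness.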

Probe Categories to Consider

These are the recurring failure categories from reflection mining. All, some, or a different set could be mandatory:

  1. Tool/binary existence — Verify all tools planned for use actually exist (which <tool>, python3 -c "import <lib>", etc.)
  2. Data format sampling — Fetch/inspect ONE sample from each data source to confirm expected format (image vs text, JSON vs HTML, etc.)
  3. DOM/page structure inspection — For browser automation tasks: inspect actual page structure before writing selectors
  4. API/auth validation — For API-dependent tasks: verify endpoints are reachable and auth tokens are valid
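
As an illustration of how cheap the first two categories can be, here is a minimal sketch using only the Python standard library. The function names are hypothetical, and the format sniffer covers just a handful of formats chosen for the example. DOM and API/auth probes are left out because they depend on the browser tooling and the service in question.

```python
import importlib.util
import json
import shutil


def probe_tool(name):
    """True if an executable called `name` is on PATH (same idea as `which`)."""
    return shutil.which(name) is not None


def probe_python_module(module):
    """True if `module` can be located, without importing the module itself."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        return False


def sniff_format(sample: bytes) -> str:
    """Guess the format of ONE sample: png, jpeg, json, html, text, or binary."""
    if sample.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if sample.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    try:
        text = sample.decode("utf-8").strip()
    except UnicodeDecodeError:
        return "binary"
    try:
        json.loads(text)
        return "json"
    except ValueError:
        pass
    lowered = text.lower()
    if lowered.startswith("<!doctype html") or lowered.startswith("<html"):
        return "html"
    return "text"


print("python3 on PATH:", probe_tool("python3"))
print("json module importable:", probe_python_module("json"))
print("sample format:", sniff_format(b'{"status": "ok"}'))
```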

Suggested Output Format

🔍 PREREQUISITE PROBES:
 🔍 [Tool]: [command to verify] → [PASS/FAIL]
 🔍 [Data]: [sample command] → [format confirmed / unexpected → revise plan]
 🔍 [DOM]: [inspection command] → [structure confirmed / unexpected → revise plan]
 🔍 [Auth]: [validation command] → [PASS/FAIL]

⚠️ PROBE FAILURES: [list any, with revised plan if hard-gate category]

Total probe time: 10-30 seconds for most tasks, capped at 60 seconds.
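
A possible shape for a runner that emits this format and respects the 60-second cap is sketched below. `run_probes` is a hypothetical helper, each probe is reduced to a plain PASS/FAIL for brevity, and marking probes as SKIPPED once the budget is used up is an assumption (the proposal does not say what should happen at the cap). The budget is only checked between probes, so a single slow probe is not interrupted.

```python
import time

PROBE_BUDGET_SECONDS = 60  # cap suggested in this proposal


def run_probes(probes, budget=PROBE_BUDGET_SECONDS, clock=time.monotonic):
    """Run (label, command, check) probes and return (report, failures).

    `check` is a zero-argument callable returning True on success.
    """
    start = clock()
    lines = ["🔍 PREREQUISITE PROBES:"]
    failures = []
    for label, command, check in probes:
        if clock() - start >= budget:
            status = "SKIPPED"  # assumption: out-of-budget probes are not run
        else:
            status = "PASS" if check() else "FAIL"
        if status != "PASS":
            failures.append(f"[{label}] {command}")
        lines.append(f" 🔍 [{label}]: {command} → {status}")
    lines.append("")
    summary = ", ".join(failures) if failures else "none"
    lines.append(f"⚠️ PROBE FAILURES: {summary}")
    return "\n".join(lines), failures


report, failures = run_probes([
    ("Tool", "which python3", lambda: True),
    ("Auth", "token check", lambda: False),
])
print(report)
```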

PRD Integration

Probe results go into the PRD's ## Context section under ### Prerequisites. Failed probes that trigger plan revisions get documented in ## Decisions.
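
One way this could look in practice, assuming the PRD is a markdown file with a `## Context` heading on its own line, is sketched below. `add_prerequisites` is a hypothetical helper, and it assumes the PRD does not already contain a `### Prerequisites` subsection; the real PRD layout may differ.

```python
def add_prerequisites(prd_text, probe_lines):
    """Append a '### Prerequisites' subsection at the end of '## Context'."""
    lines = prd_text.splitlines()
    if "## Context" not in lines:
        lines += ["", "## Context"]
    ctx = lines.index("## Context")
    # The Context section ends at the next level-2 heading or at end of file.
    end = next(
        (i for i in range(ctx + 1, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    # Drop blank lines at the end of the section so spacing stays tidy.
    while end > ctx + 1 and lines[end - 1] == "":
        del lines[end - 1]
        end -= 1
    block = ["", "### Prerequisites", ""]
    block += [f"- {line}" for line in probe_lines]
    block.append("")
    lines[end:end] = block
    return "\n".join(lines) + "\n"


prd = "# PRD\n\n## Context\n\nSome background.\n\n## Decisions\n"
updated = add_prerequisites(prd, ["[Tool] which jq → PASS"])
print(updated)
```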

Related Patterns

The same reflection mining run surfaced two connected themes:

  • PROGRAMMATIC-FIRST (10 occurrences): Defaulting to browser automation when a direct API would be faster. A tool selection heuristic in the probe gate could enforce API-first choices.
  • COOKIE-AND-AUTH-HANDLING (5 occurrences): Browser scripts failing on known-problematic domains. A domain behavior lookup during probing would catch these.

Impact

  • Sessions affected: ~29% of Algorithm runs (18/62)
  • Iterations saved per incident: ~2.5
  • Total waste prevented: ~45 iterations over 3 weeks
  • Implementation effort: Low-Medium (Algorithm text change + PRD format extension)

Identified by the PAI Upgrade skill's MineReflections workflow, which clusters algorithm-reflections.jsonl entries by theme and frequency to surface structural improvement candidates.
