Description
Summary
Restructure the AI-DLC workflows from a rules-only delivery model to a multi-mechanism architecture using rules, skills, subagents, and hooks. The core change: convert the 532-line always-loaded core-workflow.md into a lean AGENTS.md (~125 lines of permanent constraints) plus 16 on-demand skills (13 stage skills + 1 orchestrator skill that replaces core-workflow.md + 1 workflow-changes skill + 1 security-check fallback skill). For typical workflows, total main-context consumption drops 33–45% depending on the number of stages executed, with always-loaded context reduced from ~12,000 tokens to ~3,000 tokens (75% reduction at session start). The restructure also enables independent stage invocation, deterministic quality enforcement through hooks, context-isolated security review via subagents (with skill-based fallback on platforms without subagent support), and explicit mid-workflow change handling (going back, skipping, cascade re-generation).
Motivation
Current State: Measured Context Consumption
Today, AI-DLC is delivered entirely as rules (markdown files placed in platform-specific directories). The following measurements are from the current aidlc-rules/ directory:
Always loaded at workflow start (~12,000 tokens):
| File | Lines | Words | Est. Tokens |
|---|---|---|---|
| `core-workflow.md` | 532 | 3,271 | ~4,250 |
| `process-overview.md` | 140 | 733 | ~950 |
| `question-format-guide.md` | 332 | 1,343 | ~1,750 |
| `session-continuity.md` | 46 | 336 | ~440 |
| `content-validation.md` | 78 | 376 | ~490 |
| `welcome-message.md` | 109 | 602 | ~780 |
| `security-baseline.md` | 323 | 2,601 | ~3,380 |
| **Total always loaded** | 1,560 | 9,262 | ~12,040 |
Loaded on demand per stage (~18,000 tokens across 13 files):
| Group | Lines | Words | Est. Tokens |
|---|---|---|---|
| Inception stages (7 files) | 1,722 | 8,824 | ~11,470 |
| Construction stages (6 files) | 973 | 5,050 | ~6,570 |
| Total on demand | 2,695 | 13,874 | ~18,040 |
Other common files loaded as needed (~7,400 tokens across 6 files):
| File | Lines | Words | Est. Tokens |
|---|---|---|---|
| `error-handling.md` | 373 | 1,833 | ~2,380 |
| `workflow-changes.md` | 285 | 1,561 | ~2,030 |
| `terminology.md` | 189 | 925 | ~1,200 |
| `ascii-diagram-standards.md` | 116 | 367 | ~480 |
| `overconfidence-prevention.md` | 99 | 591 | ~770 |
| `depth-levels.md` | 73 | 382 | ~500 |
| **Total other common** | 1,135 | 5,659 | ~7,360 |
Grand total across all 27 files: 5,409 lines, 28,880 words, ~37,500 tokens.
The ~12,000 tokens always loaded at workflow start represent a meaningful cost. On a 200k-token context window, this is 6%. But on models with smaller context windows, or in brownfield projects where the AI also needs to hold thousands of lines of existing code in context, this overhead compounds. More critically, the bulk of those 12,000 tokens are workflow orchestration instructions (core-workflow.md at 4,250 tokens) and question formatting rules (question-format-guide.md at 1,750 tokens)—content that is only relevant during specific stages, not permanently.
Problems with the rules-only approach
- **Upfront context cost for stage-specific content**: 12,000 tokens are loaded before any work begins. Of those, only ~2,000 tokens (directory structure, terminology, content validation basics) are genuinely needed at all times. The remaining ~10,000 tokens are orchestration logic, question formatting, session resumption, and security rules—content needed only at specific stages.
- **No independent stage invocation**: A user cannot say "just reverse-engineer my codebase" or "just generate requirements" without triggering the entire AI-DLC workflow. Every entry point goes through the full `core-workflow.md` orchestration.
- **No automated enforcement**: Content validation (Mermaid syntax, ASCII diagrams), artifact existence checks, and build verification are instructions to the AI rather than deterministic automated checks. The AI may skip or forget them.
- **No composability**: The workflow stages cannot be mixed with other tools, triggered independently, or reused as building blocks. Everything is monolithic.
- **Context accumulation across stages**: When the AI loads `inception/reverse-engineering.md` (311 lines) during the reverse engineering stage, those instructions persist in the conversation context through subsequent stages. Over a full workflow run, on-demand files accumulate. There is no platform-supported mechanism to remove content from context once loaded (see "Skill Context Lifecycle" section below).
- **Platform fragmentation**: The core workflow contains platform-detection logic (check 3 paths for rule details). Each platform requires different file placement, and some require special wrapping (Cursor needs YAML frontmatter).
Detailed Proposal
Skill Context Lifecycle: What Actually Happens
Before describing the architecture, it's important to be precise about how skills behave across platforms. The Agent Skills specification and platform implementations follow a deferred loading model, not a load/unload lifecycle:
- **At session start**: Only skill `name` and `description` fields are loaded into the system prompt (~50-100 tokens per skill). This is how the agent knows what skills are available.
- **On invocation**: When a task matches a skill (or the user explicitly triggers it), the full `SKILL.md` body is loaded into the conversation context.
- **After completion**: The skill content remains in the conversation context. No platform (Claude Code, Kiro, Cursor) supports explicit skill unloading. Content persists until the context is compacted or the session ends.
This means the primary context benefit is deferred loading, not unloading. In the current rules-only model, 12,000 tokens load upfront. With skills, only ~980 tokens of AGENTS.md + ~1,600 tokens of 16 skill descriptions + ~400 tokens of subagent descriptions load upfront (~3,000 tokens total at session start, before any stage is invoked). Each stage's content loads only when that stage is reached.
For stages requiring true context isolation, Claude Code supports `context: fork` in skill frontmatter, which runs the skill in a subagent with its own context window. The skill's content never enters the main conversation context. This RFC uses this mechanism for context-heavy stages (reverse engineering, security review).
How skill invocation works mechanically
The orchestration map in AGENTS.md says things like "Invoke aidlc-reverse-engineering." This relies on a specific interaction between rules and skills:
- **AGENTS.md is always in context** (it's a rule). The orchestration map is always visible to the agent.
- **Skill descriptions are always indexed at session start** (~50-100 tokens each). The agent sees the names and descriptions of all available skills.
- **When the agent follows the orchestration map** and reaches "Invoke `aidlc-reverse-engineering`", it recognizes this as a registered skill name and triggers it. The platform then loads the full `SKILL.md` body into context.
- **On re-invocation** (e.g., the same skill for a second unit of work), the skill content is already in context from the first invocation—it does not load a second copy. The agent simply re-follows the procedure.
This is an agent-directed loading model: the AGENTS.md instructs the agent when to trigger each skill; the platform handles how (loading the body into context). This differs from purely platform-controlled loading (where the platform heuristically decides which skill matches). The orchestration map provides the deterministic sequencing; skills provide the procedural content.
For users invoking individual stages directly (e.g., /aidlc-reverse-engineering), the platform's standard slash-command mechanism triggers the skill without needing the orchestration map.
Platform support for the deferred loading model:
| Platform | Skill descriptions at startup | Full content on invocation | `context: fork` isolation |
|---|---|---|---|
| Claude Code | Yes (2% context budget) | Yes | Yes |
| Kiro | Yes | Yes | No |
| Cursor | Yes | Yes | No |
| 30+ others via agentskills.io | Yes | Yes | Varies |
New Directory Structure
```
aidlc/
├── AGENTS.md                        # Rules: always-on project constraints
├── skills/                          # Skills: stage procedures (deferred loading)
│   ├── aidlc/                       # Orchestrator: workflow entry point
│   │   └── SKILL.md
│   ├── aidlc-workspace-detection/
│   │   └── SKILL.md
│   ├── aidlc-reverse-engineering/
│   │   └── SKILL.md
│   ├── aidlc-requirements-analysis/
│   │   └── SKILL.md
│   ├── aidlc-user-stories/
│   │   └── SKILL.md
│   ├── aidlc-workflow-planning/
│   │   └── SKILL.md
│   ├── aidlc-application-design/
│   │   └── SKILL.md
│   ├── aidlc-units-generation/
│   │   └── SKILL.md
│   ├── aidlc-functional-design/
│   │   └── SKILL.md
│   ├── aidlc-nfr-requirements/
│   │   └── SKILL.md
│   ├── aidlc-nfr-design/
│   │   └── SKILL.md
│   ├── aidlc-infrastructure-design/
│   │   └── SKILL.md
│   ├── aidlc-code-generation/
│   │   └── SKILL.md
│   ├── aidlc-build-and-test/
│   │   └── SKILL.md
│   ├── aidlc-workflow-changes/      # Mid-workflow changes: go back, skip, cascade
│   │   └── SKILL.md
│   └── aidlc-security-check/        # Security fallback for non-subagent platforms
│       └── SKILL.md
├── agents/                          # Subagents: isolated context work
│   ├── aidlc-security-reviewer.md   # Cross-cutting security compliance
│   ├── aidlc-reverse-engineer.md    # Large codebase analysis
│   └── aidlc-code-reviewer.md       # Post-generation code review
└── scripts/                         # Supporting scripts for hooks + build
    ├── validate-mermaid.sh          # Mermaid syntax validation
    ├── validate-audit-format.sh     # Audit log format checker
    └── build-rules.sh               # Backward-compat rules generation
```
Mechanism 1: Rules (AGENTS.md) — Always-on constraints
The AGENTS.md replaces core-workflow.md + mandatory common files with a lean file containing only what the AI needs at all times. Content is organized into two categories: project constraints (things the AI must always obey) and skill orchestration (how the AI navigates between skills).
Content inventory:
| Section | Source | Est. Lines | Purpose |
|---|---|---|---|
| Adaptive workflow principle | `core-workflow.md` lines 1-9 | 10 | Core philosophy |
| Directory structure | `core-workflow.md` lines 502-532 | 20 | aidlc-docs/ layout |
| Terminology glossary | `common/terminology.md` | 20 | Key terms (subset) |
| Overconfidence prevention | `common/overconfidence-prevention.md` | 15 | "When in doubt, ask" guidelines |
| Approval gate protocol | `core-workflow.md` lines 442-454 | 15 | Standardized 2-option messages, audit requirements |
| Content validation basics | `common/content-validation.md` | 15 | Mermaid/ASCII rules (brief; hooks enforce) |
| Workflow change trigger | `common/workflow-changes.md` | 3 | One-line trigger: "When the user requests a workflow change (go back, skip, restart, add/remove stage or unit, change depth), invoke aidlc-workflow-changes." Full procedures live in the aidlc-workflow-changes skill. |
| Error recovery basics | `common/error-handling.md` | 10 | Severity levels, general recovery principle (detail in skills) |
| Shared procedures | `common/question-format-guide.md` | 15 | Condensed question format (see "Common Content" below) |
| Session state detection | — | 3 | Always-on rule: "If aidlc-docs/aidlc-state.md exists, inform user and suggest /aidlc to resume or a specific stage skill." Prevents silent state loss on session resume. |
| **Total** | | **~125** | |
The orchestration map (workflow sequencing logic, welcome message, session resume) is not in AGENTS.md. It lives in the aidlc orchestrator skill, loaded only when the user starts a full AI-DLC workflow. AGENTS.md contains two triggers: "When the user says 'Using AI-DLC' or invokes /aidlc, invoke the aidlc skill" and "When the user requests a workflow change (go back, skip, restart, add/remove stage or unit, change depth), invoke aidlc-workflow-changes." AGENTS.md also contains a session state detection rule: "If aidlc-docs/aidlc-state.md exists in the workspace, inform the user and suggest /aidlc to resume or a specific stage skill."
Estimated always-on context: ~125 lines, ~750 words, ~980 tokens of AGENTS.md + ~1,600 tokens of 16 skill descriptions (13 stages + 1 orchestrator + 1 workflow-changes + 1 security-check) + ~400 tokens of 3 subagent descriptions = ~3,000 tokens at session start (down from ~12,000). When the user starts a full workflow, the aidlc orchestrator skill loads an additional ~2,130 tokens.
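The session-start figures above reduce to a quick calculation. A sketch (the token counts are the estimates quoted in this section, not measurements):

```python
# Estimated always-on context at session start, per the figures above.
agents_md = 980              # lean AGENTS.md body
skill_descriptions = 1_600   # 16 skill name/description entries
subagent_descriptions = 400  # 3 subagent descriptions

startup = agents_md + skill_descriptions + subagent_descriptions
print(startup)  # 2980, rounded to ~3,000 in the text

baseline = 12_040  # current always-loaded rules total
print(f"{1 - startup / baseline:.0%}")  # 75% reduction at session start
```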
Orchestration Design: The aidlc Orchestrator Skill
The orchestration logic lives in a dedicated aidlc skill (skills/aidlc/SKILL.md), not in AGENTS.md. This skill replaces core-workflow.md as the workflow entry point. It is loaded only when the user starts a full AI-DLC workflow (by saying "Using AI-DLC, ..." or typing /aidlc), and contains four things:
- Welcome message (~780 tokens, from current `welcome-message.md`)
- Session resume detection (~440 tokens, from current `session-continuity.md`—checks for existing `aidlc-state.md` and offers to resume)
- Skill orchestration map with enforcement (~710 tokens, the sequential workflow logic with health checks, post-invocation artifact verification, and cascade dependency annotations)
- Cross-cutting concerns (~200 tokens, security enforcement with subagent/skill fallback, mid-workflow change delegation)
Total orchestrator skill size: ~2,130 tokens. This content is not in the always-on AGENTS.md. For standalone stage invocations (e.g., /aidlc-reverse-engineering), the orchestrator is never loaded.
The orchestration map within the skill:
## Skill Orchestration
### Startup Health Check
Before starting the workflow, verify all required skills are registered.
Check that the following skills are accessible: aidlc-workspace-detection,
aidlc-reverse-engineering, aidlc-requirements-analysis, aidlc-user-stories,
aidlc-workflow-planning, aidlc-application-design, aidlc-units-generation,
aidlc-functional-design, aidlc-nfr-requirements, aidlc-nfr-design,
aidlc-infrastructure-design, aidlc-code-generation, aidlc-build-and-test,
aidlc-workflow-changes, aidlc-security-check.
If any skill is not recognized, halt and display:
"AIDLC ERROR: Skill '{name}' not found. Verify aidlc/skills/{name}/SKILL.md
exists and has valid YAML frontmatter."
### Workflow Execution Rules
At each step, invoke the named skill and wait for its
completion and user approval before proceeding. **After each
skill completes, verify its expected output artifacts exist
before advancing to the next step.** If expected artifacts are
missing after skill completion, re-invoke the skill or halt
with: "AIDLC ERROR: Stage '{name}' completed but expected
artifacts are missing: {list}. Re-run this stage or investigate."
### Stage Dependency Map (Cascade Rules)
When a user requests going back to a previous stage, invoke
`aidlc-workflow-changes` to handle the full cascade procedure
(impact assessment, artifact archival, state reset, re-execution
sequencing). The following dependency map defines which downstream
stages are invalidated when a stage is re-executed:
- workspace-detection → reverse-engineering, requirements-analysis
- reverse-engineering → requirements-analysis
- requirements-analysis → user-stories, workflow-planning, application-design
- user-stories → workflow-planning
- workflow-planning → all remaining stages
- application-design → units-generation, all construction stages
- units-generation → all construction stages (per unit)
- functional-design → code-generation (per unit)
- nfr-requirements → nfr-design (per unit)
- nfr-design → infrastructure-design, code-generation (per unit)
- infrastructure-design → code-generation (per unit)
- code-generation → build-and-test
### Inception Phase (WHAT and WHY)
1. **Always**: Invoke `aidlc-workspace-detection`
- Verify: aidlc-state.md updated with workspace type
- If result is "brownfield" and no reverse engineering artifacts exist → step 2
- Otherwise → step 3
2. **Conditional**: Invoke `aidlc-reverse-engineering`
- For large codebases (>500 files), prefer delegating to the
`aidlc-reverse-engineer` subagent for context isolation
- Verify: aidlc-docs/inception/reverse-engineering/ artifacts exist
3. **Always**: Invoke `aidlc-requirements-analysis`
- Verify: aidlc-docs/inception/requirements/ artifacts exist
4. **Conditional**: Invoke `aidlc-user-stories`
- Execute if: new user-facing features, multiple user types, complex business
requirements, cross-functional collaboration
- Skip if: pure refactoring, simple bug fixes, infrastructure-only,
documentation-only
- Verify (if executed): aidlc-docs/inception/user-stories/ artifacts exist
5. **Always**: Invoke `aidlc-workflow-planning`
- This skill determines which remaining stages to execute or skip
- User can override inclusions/exclusions
- Verify: aidlc-docs/inception/workflow-planning/ artifacts exist
6. **Conditional**: Invoke `aidlc-application-design`
- Execute if: new components, service layer design needed
- Verify (if executed): aidlc-docs/inception/application-design/ artifacts exist
7. **Conditional**: Invoke `aidlc-units-generation`
- Execute if: system needs decomposition into multiple units
- Verify (if executed): aidlc-docs/inception/application-design/unit-of-work.md exists
### Construction Phase (HOW)
For each unit of work (from aidlc-docs/inception/application-design/unit-of-work.md),
execute stages 8-12 in sequence. Track the current unit in aidlc-state.md under
`## Current Unit`. Each skill reads the current unit name from state to determine
which unit's artifacts to produce (output path: aidlc-docs/construction/{unit-name}/).
On the first unit, skill invocations load their full content into context.
On subsequent units, the skill content is already in context — the agent
re-follows the same procedures for the next unit without reloading.
8. **Conditional**: Invoke `aidlc-functional-design`
- Verify (if executed): aidlc-docs/construction/{unit-name}/functional-design/ exists
9. **Conditional**: Invoke `aidlc-nfr-requirements`
- Verify (if executed): aidlc-docs/construction/{unit-name}/nfr-requirements/ exists
10. **Conditional**: Invoke `aidlc-nfr-design` (only if step 9 executed)
- Verify (if executed): aidlc-docs/construction/{unit-name}/nfr-design/ exists
11. **Conditional**: Invoke `aidlc-infrastructure-design`
- Verify (if executed): aidlc-docs/construction/{unit-name}/infrastructure-design/ exists
12. **Always**: Invoke `aidlc-code-generation`
- Verify: generated code files exist at expected paths
After completing all stages for a unit, update aidlc-state.md to mark the unit
complete and set the next unit as current **in a single write operation** (atomic
state update to prevent inconsistency on interruption). Repeat until all units
are done.
After all units complete:
13. **Always**: Invoke `aidlc-build-and-test`
- Verify: build succeeds, test results recorded in aidlc-state.md
### Mid-Workflow Changes
When the user requests going back to a previous stage, skipping a stage,
adding/removing stages or units, or changing depth mid-workflow: invoke
`aidlc-workflow-changes`. This skill handles cascade impact assessment,
artifact archival, state reset, and re-execution sequencing. **Do not
attempt to handle workflow changes inline** — always delegate to the skill,
which contains the full procedural guidance from the current
`workflow-changes.md` (285 lines of tested change-handling logic).
### Cross-cutting: Security
When the security extension is enabled (tracked in aidlc-state.md),
enforce security compliance at each stage completion:
1. **Try subagent first**: Delegate compliance checking to the
`aidlc-security-reviewer` subagent. The subagent runs in its own
context, evaluates artifacts independently, and returns a compliance
report.
2. **Fallback to skill**: If subagent invocation fails (platform does not
support subagents, or subagent times out / returns malformed output),
invoke the `aidlc-security-check` skill instead. This skill contains
the same 15 SECURITY rules and evaluation procedure, but runs in the
main context (~3,600 tokens) rather than in isolation.
3. **Fail-closed policy**: If neither the subagent nor the fallback skill
produces a valid compliance report, treat the result as NON-COMPLIANT.
Display: "Security review could not be completed. Resolve the issue
before proceeding." **Never silently skip security review.**
Non-compliance (from either mechanism) is a blocking finding. Block stage
transition until all non-compliant findings are resolved.

The orchestration map portion is ~105 lines and ~700 words (~910 tokens). This is larger than the original ~55-line/~450-token map because it now includes enforcement and delegation mechanisms that were absent from the initial proposal:
- **Startup health check** (~8 lines): Verifies all 15 skills are registered before the workflow begins, preventing silent skill-loading failures.
- **Post-invocation artifact verification** (~1 line per step): After each skill completes, the orchestrator verifies expected output artifacts exist before proceeding. This replaces the "MANDATORY" enforcement blocks in the current `core-workflow.md` with a more targeted mechanism.
- **Stage dependency map** (~15 lines): Explicit cascade annotations so the agent knows which downstream stages are invalidated when a stage is re-executed. This enables the `aidlc-workflow-changes` skill to compute cascade impacts.
- **Mid-workflow change delegation** (~6 lines): Explicit instruction to invoke `aidlc-workflow-changes` for any user-requested change, rather than attempting inline handling.
- **Security enforcement with fallback** (~15 lines): Try-subagent-first, fallback-to-skill, fail-closed policy. Ensures security enforcement is preserved on all platforms.
Combined with the welcome message and session resume logic, the full orchestrator skill is ~2,130 tokens. It captures the same adaptive workflow logic as the current 532-line core-workflow.md by:
- Keeping the conditional execution criteria as brief inline rules (2-3 lines each)
- Delegating all procedural detail to the individual skills
- Replacing "MANDATORY" enforcement blocks with targeted post-invocation verification and a fail-closed security policy
- Delegating mid-workflow changes to the `aidlc-workflow-changes` skill rather than embedding change-handling logic inline
The current core-workflow.md is long because it repeats a ~17-20 line execution block for every stage (load steps, log in audit, execute, wait for approval, log response), with some inception stages embedding larger conditional assessment criteria (up to ~70 lines for User Stories). In the skill model, each skill owns its own execution procedure, so the orchestrator focuses on sequencing, verification, and delegation.
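As an illustration of how the `aidlc-workflow-changes` skill could compute cascade impacts from the stage dependency map, here is a small sketch. The edge list is a simplified transcription of the map above (the map's "all remaining stages" shorthand is not expanded), and the function name is illustrative, not part of the proposal:

```python
from collections import deque

# Direct downstream dependencies, transcribed (simplified) from the
# cascade map in the orchestrator skill above.
DEPENDS = {
    "workspace-detection": ["reverse-engineering", "requirements-analysis"],
    "reverse-engineering": ["requirements-analysis"],
    "requirements-analysis": ["user-stories", "workflow-planning",
                              "application-design"],
    "user-stories": ["workflow-planning"],
    "application-design": ["units-generation"],
    "units-generation": ["functional-design", "nfr-requirements",
                         "infrastructure-design"],
    "functional-design": ["code-generation"],
    "nfr-requirements": ["nfr-design"],
    "nfr-design": ["infrastructure-design", "code-generation"],
    "infrastructure-design": ["code-generation"],
    "code-generation": ["build-and-test"],
}

def invalidated_by(stage: str) -> set[str]:
    """Return every downstream stage invalidated when `stage` is re-executed
    (transitive closure over the direct-dependency edges)."""
    seen: set[str] = set()
    queue = deque(DEPENDS.get(stage, []))
    while queue:
        s = queue.popleft()
        if s not in seen:
            seen.add(s)
            queue.extend(DEPENDS.get(s, []))
    return seen
```

For example, `invalidated_by("nfr-design")` yields `infrastructure-design`, `code-generation`, and `build-and-test`.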
Mechanism 2: Skills — Stage procedures with deferred loading
Each AI-DLC stage becomes an independent skill following the Agent Skills specification.
Skill YAML frontmatter structure:
```yaml
---
name: aidlc-reverse-engineering
description: >
  Comprehensive analysis of an existing codebase. Generates business overview,
  architecture docs, code structure, API documentation, component inventory,
  technology stack, and dependencies documentation. Use when working on a
  brownfield project that hasn't been analyzed yet.
user-invocable: true
---
```

Setting `user-invocable: true` allows users to invoke any stage directly as a slash command (`/aidlc-reverse-engineering`), solving the "no independent stage invocation" problem without needing a separate commands mechanism.
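The orchestrator's startup health check requires each SKILL.md to carry valid frontmatter. A minimal sketch of such a check, without a YAML dependency (the function name and exact rules are illustrative assumptions, not the specification's validation algorithm):

```python
def check_frontmatter(skill_md: str) -> tuple[bool, str]:
    """Verify a SKILL.md opens with a '---' ... '---' frontmatter block
    that declares at least `name` and `description`."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return False, "missing opening '---'"
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        return False, "missing closing '---'"
    # Top-level keys are unindented "key: value" lines.
    keys = {line.split(":", 1)[0].strip()
            for line in lines[1:end]
            if ":" in line and not line.startswith((" ", "\t"))}
    missing = {"name", "description"} - keys
    if missing:
        return False, f"missing fields: {sorted(missing)}"
    return True, ""
```

A failed check would surface the "AIDLC ERROR: Skill '{name}' not found" message from the health check rather than silently degrading the workflow.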
Skill body structure (SKILL.md):

```markdown
# [Stage Name]

## Prerequisites
- What artifacts must exist before this skill runs
- What state conditions are required

## Shared Procedures
- Reference to AGENTS.md question format (brief: "Follow the question
  format defined in AGENTS.md § Question Format")
- Reference to AGENTS.md approval protocol

## Procedure
[Full step-by-step instructions — migrated from the current detail file]

## Outputs
- List of artifacts produced with paths under aidlc-docs/
- State updates to aidlc-state.md

## Completion
[Standardized approval gate message]
```
Skills to create (16 total — 1 orchestrator + 13 stages + 1 workflow-changes + 1 security-check fallback):
Token estimates below include the original stage detail file content plus absorbed common content (error handling sections, depth level guidance, ASCII diagram standards, approval blocks). The "Absorbed from" column shows which Group 3 common files contribute content to each skill.
| Skill | Source File | Base Tokens | Absorbed from | Est. Total Tokens | `context: fork` |
|---|---|---|---|---|---|
| `aidlc` (orchestrator) | core-workflow.md + welcome-message.md + session-continuity.md | ~2,130 | — | ~2,130 | No |
| `aidlc-workspace-detection` | inception/workspace-detection.md | ~520 | error-handling (~150) | ~700 | No |
| `aidlc-reverse-engineering` | inception/reverse-engineering.md | ~1,300 | error-handling (~150), ascii-diagrams (~250) | ~1,730 | Optional |
| `aidlc-requirements-analysis` | inception/requirements-analysis.md | ~1,310 | error-handling (~150), depth-levels (~500) | ~1,990 | No |
| `aidlc-user-stories` | inception/user-stories.md | ~2,910 | error-handling (~150) | ~3,090 | No |
| `aidlc-workflow-planning` | inception/workflow-planning.md | ~2,520 | error-handling (~150), depth-levels (~300), ascii-diagrams (~250) | ~3,250 | No |
| `aidlc-application-design` | inception/application-design.md | ~1,240 | error-handling (~150), ascii-diagrams (~250) | ~1,670 | No |
| `aidlc-units-generation` | inception/units-generation.md | ~1,670 | error-handling (~150) | ~1,850 | No |
| `aidlc-functional-design` | construction/functional-design.md | ~1,010 | error-handling (~150) | ~1,190 | No |
| `aidlc-nfr-requirements` | construction/nfr-requirements.md | ~870 | error-handling (~150) | ~1,050 | No |
| `aidlc-nfr-design` | construction/nfr-design.md | ~680 | error-handling (~150) | ~860 | No |
| `aidlc-infrastructure-design` | construction/infrastructure-design.md | ~740 | error-handling (~150), ascii-diagrams (~250) | ~1,170 | No |
| `aidlc-code-generation` | construction/code-generation.md | ~1,820 | error-handling (~200) | ~2,050 | No |
| `aidlc-build-and-test` | construction/build-and-test.md | ~1,460 | error-handling (~200) | ~1,690 | No |
| `aidlc-workflow-changes` | common/workflow-changes.md | ~2,030 | — | ~2,030 | No |
| `aidlc-security-check` | extensions/security/baseline/security-baseline.md | ~3,380 | evaluation procedure (~220) | ~3,600 | No |
| **Total** | | | | **~30,050** | |
Note on the security extension: The security baseline (323 lines, ~3,380 tokens) is a cross-cutting concern evaluated at every stage boundary. On platforms with subagent support, it moves to the aidlc-security-reviewer subagent (see Mechanism 3), which has its own context window and loads the security rules internally — keeping the main context clean. On platforms without subagent support (Cline), the aidlc-security-check fallback skill provides the same 15-rule evaluation procedure in the main context (~3,600 tokens). The orchestrator's cross-cutting security section defines the try-subagent-first, fallback-to-skill, fail-closed policy that ensures security enforcement is never silently dropped regardless of platform capabilities.
Common Content Distribution
Currently, shared content is delivered as 11 common rule files plus 1 extension file. The skill model must account for every one of them. The following table traces each current file to its destination:
Group 1 files (currently always loaded):
| Current file | Tokens | Destination | Rationale |
|---|---|---|---|
| `core-workflow.md` | ~4,250 | `aidlc` orchestrator skill (orchestration map, ~910 tokens, deferred loading) + skills own their procedures | Orchestration logic condensed; stage detail delegated |
| `process-overview.md` | ~950 | Dropped (redundant with orchestration map) | The map IS the process overview |
| `question-format-guide.md` | ~1,750 | AGENTS.md (~200 token summary) + embedded in skills that ask questions (~150 tokens each in requirements-analysis, user-stories, workflow-planning) | Multiple-choice format and [Answer]: tag condensed |
| `session-continuity.md` | ~440 | Embedded in `aidlc` orchestrator skill (resume detection at workflow start) | Orchestrator checks for aidlc-state.md and offers to resume |
| `content-validation.md` | ~490 | AGENTS.md (~200 token brief) + hooks enforce deterministically | Rules state the principle; hooks enforce it |
| `welcome-message.md` | ~780 | Embedded in `aidlc` orchestrator skill (displayed at workflow start) | Loaded only when full workflow is invoked, not always-on |
| `security-baseline.md` | ~3,380 | `aidlc-security-reviewer` subagent (embedded in full) + `aidlc-security-check` fallback skill (for non-subagent platforms) | Cross-cutting concern isolated in subagent where supported; fallback skill ensures enforcement on all platforms |
Group 3 files (currently loaded as needed):
| Current file | Tokens | Destination | Rationale |
|---|---|---|---|
| `error-handling.md` | ~2,380 | AGENTS.md (~130 token summary: severity levels, general recovery principle) + stage-specific recovery procedures embedded in each skill (~150-200 tokens per skill) | General principle is permanent; per-stage recovery is stage-specific |
| `workflow-changes.md` | ~2,030 | AGENTS.md (~40 token trigger) + `aidlc-workflow-changes` skill (full procedures, deferred loading) | Cross-cutting but infrequent; one-line trigger in AGENTS.md invokes the skill only when the user requests a change |
| `depth-levels.md` | ~500 | Embedded in `aidlc-requirements-analysis` (~500 tokens) and `aidlc-workflow-planning` (~300 tokens) | Only these two skills use adaptive depth selection |
| `terminology.md` | ~1,200 | AGENTS.md (~260 token subset: Phase, Stage, Unit of Work, Service, Component, Planning vs Generation) | Core terms only; full glossary available in skills |
| `overconfidence-prevention.md` | ~770 | AGENTS.md (~200 tokens condensed) | Always-relevant behavioral constraint |
| `ascii-diagram-standards.md` | ~480 | Embedded in skills that produce diagrams: reverse-engineering, application-design, workflow-planning, infrastructure-design (~250 tokens each) + hooks validate | Only 4 skills produce ASCII diagrams |
Three-layer distribution model:

1. **Truly permanent content → AGENTS.md**: Terminology, directory structure, approval protocol, condensed question format, workflow change trigger, error recovery basics, overconfidence prevention, session state detection, and content validation principles. These are always needed regardless of which stage is active. (~125 lines, ~980 tokens)
2. **Stage-specific content → embedded in each skill**: Error recovery procedures specific to each phase, detailed question formatting examples, depth-level guidance (requirements + planning only), ASCII diagram standards (4 diagram-producing skills only), and the approval gate message block. Each skill is self-contained and can be invoked independently. Duplication is intentional and quantified in the skills table above.
3. **Enforcement → hooks**: Mermaid syntax and ASCII diagram validation, audit format checking. The AI doesn't need to "remember" to validate—the hook runs automatically.
Duplication budget: Among the 13 stage skills, the absorbed common content adds ~2,100 tokens of error-handling (vs. a single 2,380-token file — net savings of ~280 tokens since each skill embeds only its relevant recovery procedures), ~1,000 tokens of ASCII standards across 4 skills (vs. a single 480-token file — +520 overhead), and ~800 tokens of depth-levels across 2 skills (vs. a single 500-token file — +300 overhead). Net duplication overhead is ~540 tokens above the original file sizes. The aidlc-workflow-changes and aidlc-security-check skills do not duplicate content — they absorb their source files wholesale (~2,030 and ~3,600 tokens respectively) and are only loaded when triggered. This is a modest cost for skill independence.
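The duplication budget reduces to a short calculation over the figures quoted above:

```python
# Net token overhead of embedding common content per skill, per the
# duplication-budget figures above.
error_handling  = 2_100 - 2_380      # embedded total vs. single file: -280
ascii_standards = 4 * 250 - 480      # 4 diagram-producing skills:     +520
depth_levels    = (500 + 300) - 500  # requirements + planning:        +300

print(error_handling + ascii_standards + depth_levels)  # 540
```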
Mechanism 3: Subagents — Context-isolated specialized work
Subagents handle tasks that require their own context window. Each subagent is defined as a markdown file with YAML frontmatter in the agents/ directory.
| Subagent | Purpose | Context isolation needed because | When invoked |
|---|---|---|---|
| `aidlc-security-reviewer` | Evaluates all 15 SECURITY rules against stage artifacts. Returns a compliance report with compliant/non-compliant/N/A per rule. | The security baseline is 3,380 tokens of cross-cutting rules. Loading it into the main context at every stage boundary would accumulate. Running it in a subagent keeps the main context clean. | At each stage completion when security extension is enabled |
| `aidlc-reverse-engineer` | Analyzes large codebases. Produces 8 reverse engineering artifacts (business overview, architecture, code structure, APIs, components, tech stack, dependencies, code quality). | Large codebase analysis can consume 50k+ tokens of code context. Isolating this prevents the main conversation from being overwhelmed. | When `aidlc-reverse-engineering` skill detects a large codebase (>500 files) |
| `aidlc-code-reviewer` | Reviews generated code against requirements documents, design docs, and security rules. | Independent review benefits from a fresh context that isn't anchored to the generation process. | After code generation, before build-and-test |
Example subagent definition (`agents/aidlc-security-reviewer.md`):

```markdown
---
name: aidlc-security-reviewer
description: >
  Evaluates AI-DLC stage artifacts against the 15 SECURITY baseline rules
  (OWASP Top 10 2025 mapped). Returns a structured compliance report.
  Invoke at each stage completion when security extension is enabled.
model: haiku
tools:
  - Read
  - Glob
  - Grep
---

# Security Baseline Reviewer

## Your Role

You are a security compliance reviewer. You evaluate stage artifacts against
the security baseline rules below and return a structured compliance report.

## Input

You will receive:
- The stage name and its output artifacts (file paths)
- The current extension configuration from aidlc-state.md

## Security Rules

[Full content of current security-baseline.md embedded here — 323 lines]

## Output Format

Return a markdown compliance report:
- For each applicable SECURITY rule: Compliant / Non-Compliant / N/A
- For Non-Compliant: specific finding and remediation guidance
- Summary: blocking findings count, N/A count, compliant count
```

Using a lightweight model (e.g., Haiku) for the security reviewer keeps cost low while the main agent continues on a more capable model.
Mechanism 4: Hooks — Deterministic quality enforcement
Hooks replace AI instructions with automated checks. Hook configuration is platform-specific — there is no universal hooks file. The build-rules.sh script generates the appropriate configuration for each platform:
- Claude Code: Hooks go in `.claude/settings.json` under the `hooks` key
- Cursor: Hooks go in `.cursor/hooks.json`
- Kiro: Hooks configured via Kiro's hooks UI or config files

The validation scripts themselves (`scripts/validate-mermaid.sh`, `scripts/validate-audit-format.sh`) are platform-agnostic shell scripts. Only the hook wiring is platform-specific.
Claude Code format (`.claude/settings.json`):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "./aidlc/scripts/validate-mermaid.sh \"$TOOL_INPUT_FILE_PATH\""
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "echo 'AI-DLC session ended. Check aidlc-docs/aidlc-state.md for last saved state (may not reflect in-progress work).'"
          }
        ]
      }
    ]
  }
}
```

Note on file path filtering: Claude Code's PostToolUse `matcher` field matches tool names (e.g., `Write|Edit`), not file paths. File path filtering (only validating files under `aidlc-docs/`) is handled inside the validation scripts themselves. `validate-mermaid.sh` checks whether `$TOOL_INPUT_FILE_PATH` matches `aidlc-docs/**/*.md` before running validation, and exits 0 (no-op) for non-matching paths. Similarly, `validate-audit-format.sh` only acts when the target file is `aidlc-docs/audit.md`.
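The in-script filter described above can be sketched as a small guard at the top of each validation script. This is a hypothetical sketch, not the shipped script: the function name `in_scope` and the exact patterns are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the path filter at the top of validate-mermaid.sh.
# In a shell `case` pattern, `*` also matches `/`, so one pattern covers any
# depth under aidlc-docs/.
in_scope() {
  case "$1" in
    aidlc-docs/*.md) return 0 ;;  # in scope: proceed to validation
    *)               return 1 ;;  # out of scope: caller exits 0 (no-op)
  esac
}

# The hook script would start with:
#   in_scope "$TOOL_INPUT_FILE_PATH" || exit 0
in_scope "aidlc-docs/design/diagram.md" && echo "validating"
in_scope "src/index.ts" || echo "skipping"
```

The same guard, pointed at `aidlc-docs/audit.md`, would serve `validate-audit-format.sh`.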
What moves from rules/instructions to hooks:
- Mermaid syntax validation → `PostToolUse` hook on `Write|Edit`, script filters to `aidlc-docs/**/*.md`
- Audit format enforcement → `PostToolUse` hook on `Write|Edit`, script filters to `audit.md`
- Session end notification → `Stop` hook
What stays as skill instructions (no suitable hook event exists):
- Artifact existence verification before stage transition — this remains in each skill's "Prerequisites" section because no platform offers an "after skill completion" hook event
- Build and lint execution after code generation — the `aidlc-build-and-test` skill handles this procedurally
Supporting scripts: The `scripts/` directory contains the actual validation logic. `validate-mermaid.sh` checks the file path against `aidlc-docs/**/*.md`, then uses `mmdc --validate` (Mermaid CLI) if available, or falls back to a regex-based syntax checker for common errors (unclosed brackets, invalid diagram type keywords). `validate-audit-format.sh` checks that the last entry in `audit.md` has the required fields (Timestamp in ISO 8601, User Input, AI Response, Context). Both scripts exit 0 on success and exit 1 with an error message on failure, which the platform surfaces to the agent for correction.
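As an illustration, one field check from the audit validator might look like the following sketch. The function name and regex are assumptions; the field names come from the audit format described above.

```shell
#!/bin/sh
# Hypothetical sketch of a single check from validate-audit-format.sh:
# verify the entry carries a "Timestamp:" field in ISO 8601 form.
has_iso_timestamp() {
  printf '%s\n' "$1" |
    grep -Eq '^Timestamp: [0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}'
}

entry="Timestamp: 2025-01-15T10:30:00Z"
if has_iso_timestamp "$entry"; then
  echo "timestamp ok"
else
  echo "missing or malformed Timestamp field" >&2
  exit 1   # the non-zero exit is what the platform surfaces to the agent
fi
```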
Context Budget Analysis: Before and After
Baseline comparison
| Metric | Current (rules-only) | Proposed | Difference |
|---|---|---|---|
| Always-on context at session start | ~12,040 tokens | ~3,000 tokens | -75% |
| After full-workflow invocation (`/aidlc`) | ~12,040 tokens | ~5,130 tokens (+ orchestrator skill) | -57% |
| Security rules in main context | ~3,380 tokens (always) | 0 tokens (subagent) | -100% |
| Total stage skill content (13 stages) | ~18,040 tokens | ~22,290 tokens | +24% (absorbed common content) |
| Orchestrator skill | (in `core-workflow.md`) | ~2,130 tokens | Extracted from AGENTS.md |
The total stage skill content is larger than the current stage files because common content (error handling, depth levels, ASCII standards) is now embedded in skills rather than loaded as shared files. The net duplication overhead across the 13 stage skills is ~540 tokens (see Duplication Budget above). The aidlc-workflow-changes (~2,030 tokens) and aidlc-security-check (~3,600 tokens) skills are demand-loaded and not included in the scenario analyses below unless specifically triggered.
Scenario-based analysis
Scenario 1: Simple bug fix (3 stages: workspace-detection, requirements-analysis, code-generation)
| | Current | Proposed |
|---|---|---|
| Always-on | ~12,040 | ~3,000 |
| Orchestrator | (included above) | ~2,130 |
| Stage content loaded | ~3,650 (3 detail files) | ~4,740 (3 skills with absorbed content) |
| Common files loaded | ~2,380 (error-handling) | 0 (embedded in skills) |
| Security (if disabled) | 0 | 0 |
| Total in main context | ~18,070 | ~9,870 |
Savings: ~45%. This is the primary use case where the restructure delivers the most value. Most AI-DLC invocations are partial workflows. For standalone invocations (e.g., /aidlc-reverse-engineering without the orchestrator), savings are even larger since the orchestrator's ~2,130 tokens are never loaded.
Scenario 2: Greenfield single-unit project (8 stages: no reverse engineering, no NFR, no infrastructure)
Stages: workspace-detection, requirements-analysis, user-stories, workflow-planning, application-design, units-generation, code-generation, build-and-test.
| | Current | Proposed |
|---|---|---|
| Always-on | ~12,040 | ~3,000 |
| Orchestrator | (included above) | ~2,130 |
| Stage content loaded | ~13,450 (8 detail files) | ~16,290 (8 skills with absorbed content) |
| Common files loaded | ~5,390 (error-handling, workflow-changes, depth-levels, ascii-diagrams) | 0 (embedded in skills) |
| Security (if enabled) | ~3,380 (in main context) | 0 (subagent) |
| Total in main context | ~34,260 | ~21,420 |
Savings: ~37%. The security subagent isolation accounts for a significant portion.
Scenario 3: Brownfield 3-unit project with security (full 13 stages, construction loop runs 3 times)
All 13 stages execute. Construction stages 8-12 run for each of 3 units. Skills loaded in the first unit iteration remain in context for subsequent units (no re-loading, no accumulation of duplicates).
| | Current | Proposed |
|---|---|---|
| Always-on | ~12,040 | ~3,000 |
| Orchestrator | (included above) | ~2,130 |
| Inception skills (stages 1-7) | ~11,470 | ~14,280 (with absorbed content) |
| Construction skills (stages 8-13, loaded once) | ~6,570 | ~8,010 (with absorbed content) |
| Common files loaded during workflow | ~7,360 (all 6 common files) | 0 (embedded) |
| Security (always on) | ~3,380 | 0 (subagent) |
| Total in main context | ~40,820 | ~27,420 |
Savings: ~33%. In the worst case (full workflow), savings come primarily from the leaner always-on context (-9,040 vs. always-loaded rules), security subagent isolation (-3,380), and eliminating shared common-file loading (-7,360), partially offset by the orchestrator skill (+2,130) and the common content absorbed into the stage skills (+4,250 over the original stage file sizes). These components net to the -13,400 shown in the table.
Note on multi-unit loops: Skills invoked for unit 1 remain in context for units 2 and 3. The agent re-follows the same procedures without reloading the skill body. This means multi-unit workflows do not multiply skill context cost—the construction skills are loaded once regardless of unit count. This matches the current behavior where detail files, once loaded, also persist.
Note on standalone invocations: When a user invokes a single stage directly (e.g., /aidlc-reverse-engineering), neither the orchestrator skill nor unrelated stage skills are loaded. Context is: AGENTS.md (~980 tokens) + skill descriptions (~1,600 tokens) + subagent descriptions (~400 tokens) + the invoked skill body only. For reverse engineering: ~980 + 1,600 + 400 + 1,730 = ~4,710 tokens, vs. ~12,040 + 1,300 = ~13,340 in the current model. Savings: ~65%.
Migration Path
Two phases, both committed:
Phase 1: Skills + Rules (core value)
- Create `AGENTS.md` with condensed permanent constraints (~125 lines)
- Create `aidlc` orchestrator skill with welcome message, session resume, and orchestration map
- Convert all 13 stage detail files to skills with SKILL.md format
- Create `aidlc-workflow-changes` skill with full cascade/skip/restart procedures
- Create `aidlc-security-check` fallback skill for non-subagent platforms
- Create `aidlc-security-reviewer` subagent with embedded security rules
- Create `build-rules.sh` backward-compatibility script (see below)
- Run evaluation suite (Tier 1 all prompts, Tier 2 key prompts, cross-platform validation)
- Update README with new installation instructions
- Release as v0.2.0
Phase 2: Hooks + Additional Subagents
- Add per-platform hook configurations with Mermaid validation and audit format hooks
- Create validation scripts (`validate-mermaid.sh`, `validate-audit-format.sh`)
- Create `aidlc-reverse-engineer` and `aidlc-code-reviewer` subagents
- Release as v0.3.0
Evaluation Plan
Small wording tweaks can produce large behavioral swings. This restructure changes the delivery format of every instruction file. Before shipping, we must validate that the skills-based delivery produces equivalent workflow behavior to the current rules-based delivery.
Evaluation methodology
1. Reference prompt suite: Create 6-8 reference prompts that exercise the key workflow paths:
| Prompt | Stages exercised | Key behaviors to verify |
|---|---|---|
| "Using AI-DLC, add a login page to this React app" (brownfield) | 1, 2, 3, 4, 5, 6, 7, 8, 12, 13 | Brownfield detection, reverse engineering triggers, full inception |
| "Using AI-DLC, build a REST API for todo items" (greenfield) | 1, 3, 5, 12, 13 | Greenfield detection, minimal stages for simple request |
| "Using AI-DLC, refactor the database layer" (brownfield, no user-facing) | 1, 2, 3, 5, 12, 13 | User stories correctly skipped |
| "Using AI-DLC, build a multi-service e-commerce platform" (greenfield, complex) | 1, 3, 4, 5, 6, 7, 8-12 (×N units), 13 | Full workflow with multi-unit loop |
| `/aidlc-reverse-engineering` (standalone invocation) | 1, 2 | Independent stage invocation works |
| Resume from interrupted session (aidlc-state.md exists) | Varies | Session continuity preserved |
2. Behavioral equivalence criteria: For each reference prompt, compare the rules-based and skills-based runs on:
- Stage selection: Same stages executed/skipped (pass/fail)
- Artifact production: Same output files created in `aidlc-docs/` (pass/fail)
- Approval gates: Same number of user approval checkpoints (pass/fail)
- Question quality: Clarifying questions cover the same topics (subjective, scored 1-5)
- Audit completeness: Audit log captures all interactions (pass/fail)
- Security enforcement: When enabled, blocking findings are surfaced at the same points (pass/fail)
3. Execution: Run each reference prompt against both the current rules and the proposed skills on the same model (Claude Sonnet, as the most common user model). Compare outputs side by side. A prompt passes if all pass/fail criteria match and subjective scores are ≥3.
Full multi-turn workflow runs are expensive. To manage cost, the eval is split into two tiers:
- Tier 1 — Automated (all 6 prompts): Run each prompt through the first 2-3 stages only and programmatically verify stage selection logic (which stages are invoked/skipped) and artifact creation (correct files exist in `aidlc-docs/`). This validates the orchestration map and skill loading without running full workflows.
- Tier 2 — Full run (3 key prompts): Run prompts 1, 2, and 4 end-to-end and manually evaluate all criteria including question quality. These three cover brownfield/greenfield, simple/complex, and multi-unit paths.
4. Cross-platform validation: The agent-directed skill loading model (AGENTS.md instructs the agent to trigger skills by name) must be validated on at least one non-Claude-Code platform (Kiro or Cursor) to confirm that the orchestration map's sequential "invoke skill X" instructions are followed correctly. If a platform's skill loading is purely heuristic (platform decides, not agent), the orchestration map may need platform-specific adaptation. This must be confirmed before Phase 1 release.
5. Regression detection: After initial validation, the Tier 1 automated checks become a regression suite run in CI. The build-rules.sh backward-compatibility script is validated by running the same prompts against the generated rules output and confirming equivalence with the current rules.
Evaluation gates
- Phase 1 release gate: All 6 prompts pass Tier 1 automated checks. 3 key prompts pass Tier 2 full-run evaluation. Cross-platform validation passes on Claude Code + one skill-supporting platform (Kiro or Cursor).
- Phase 2 release gate: Hook enforcement must not produce false positives on the reference prompts (validation scripts don't block valid content).
Backward Compatibility: Rules Generation Build
For platforms that only support rules (e.g., GitHub Copilot, Amazon Q with rules-only mode), a build script generates the current rules format from the skills source.
scripts/build-rules.sh:
The script performs the following transformations:
- `AGENTS.md` → `core-workflow.md`: Takes the AGENTS.md orchestration map and expands each "invoke skill X" step into the current inline execution block format (load steps, log audit, execute, wait for approval, log response). This is a template expansion: for each skill reference, inject a ~15-line block using the skill's description and output paths.
- Skills → detail files: Strips YAML frontmatter from each SKILL.md and copies the body to the appropriate platform path (`aws-aidlc-rule-details/inception/`, `aws-aidlc-rule-details/construction/`).
- Security subagent → extension file: Extracts the security rules section from `agents/aidlc-security-reviewer.md` and writes it to `aws-aidlc-rule-details/extensions/security/baseline/security-baseline.md`.
- Platform-specific output: Generates the complete directory structure for each supported platform:
  - `.kiro/steering/` + `.kiro/aws-aidlc-rule-details/`
  - `.amazonq/rules/` + `.amazonq/aws-aidlc-rule-details/`
  - `.cursor/rules/` (with YAML frontmatter wrapping)
  - `CLAUDE.md` + `.aidlc-rule-details/`
  - `.github/copilot-instructions.md` + `.aidlc-rule-details/`
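The "Skills → detail files" step reduces to stripping the YAML frontmatter block. A minimal sketch follows; the awk state machine is an assumed implementation, not the actual build script.

```shell
#!/bin/sh
# Hypothetical sketch of the frontmatter strip in build-rules.sh: drop
# everything between the opening "---" on line 1 and the closing "---",
# then print the rest of the SKILL.md body.
strip_frontmatter() {
  awk 'NR == 1 && $0 == "---" { fm = 1; next }
       fm && $0 == "---"      { fm = 0; next }
       !fm                    { print }' "$1"
}

# Demo with a toy SKILL.md.
skill=$(mktemp)
printf -- '---\nname: demo-skill\ndescription: example\n---\n# Demo Body\n' > "$skill"
strip_frontmatter "$skill"   # prints "# Demo Body"
```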
The build script is tested in CI: the release pipeline runs build-rules.sh, and a validation step confirms the generated output matches the expected structure and contains all stage content.
The release pipeline produces two artifacts:
- `ai-dlc-skills-vX.X.X.zip` — the new skills-based structure (for Claude Code, Kiro, Cursor)
- `ai-dlc-rules-vX.X.X.zip` — the generated rules structure (backward compatibility)
Platform Compatibility
| Platform | Primary delivery | Mechanisms supported |
|---|---|---|
| Claude Code | Skills + subagents + hooks | All 4 mechanisms |
| Kiro | Skills + subagents + hooks | All 4 mechanisms |
| Cursor | Skills + subagents + hooks | All 4 mechanisms |
| Amazon Q Developer | Generated rules (build script) | Rules only |
| GitHub Copilot | Generated rules (build script) | Rules only |
| Cline | Skills (via AGENTS.md + skills/) | Skills. No hooks/subagents — uses aidlc-security-check fallback skill. |
For rules-only platforms, the generated output is functionally identical to the current delivery. No degradation occurs. For skills-capable platforms without subagent support (Kiro, Cline), the aidlc-security-check fallback skill ensures security enforcement is preserved, though the security baseline loads into the main context (~3,600 tokens) rather than being isolated in a subagent.
Alternatives Considered
Alternative 1: Keep rules-only, split into smaller files
Split core-workflow.md into smaller rule files with more granular auto/manual loading. This reduces per-file size but doesn't address the core problems: always-loaded rules consume the same total context, stages can't be invoked independently, no automated enforcement, no cross-platform standard for the split.
Alternative 2: Skills only (no subagents or hooks)
Convert stages to skills and keep security as a skill. Simpler, but: (a) security as a skill means it must be explicitly loaded at each stage boundary, breaking the cross-cutting enforcement model; (b) no deterministic validation—Mermaid and audit format checking remain AI instructions.
Drawbacks
- Increased structural complexity: The rules-only model is one directory of markdown files. The proposed model has 4 directories (skills, agents, hooks, scripts) plus AGENTS.md. Contributors need to understand which mechanism to use for new content.
- Skill content accumulation: Skills do not unload from context. A full 13-stage workflow accumulates ~22,290 tokens of stage skill content plus ~2,130 tokens for the orchestrator. Combined with AGENTS.md (~980 tokens), skill descriptions (~1,600 tokens), and subagent descriptions (~400 tokens), this totals ~27,400 tokens in the main context—a 33% reduction from the current ~40,820 worst case, but not the 75% session-start improvement. The benefit is most pronounced for partial workflows (45% savings on a 3-stage bug fix) and standalone invocations (65% savings). The `aidlc-workflow-changes` skill (~2,030 tokens) and `aidlc-security-check` fallback skill (~3,600 tokens) are only loaded when triggered by a workflow change request or on platforms without subagent support, respectively — they do not contribute to the typical workflow accumulation.
- Common content duplication: Absorbing Group 3 common files into individual skills adds ~540 tokens of net duplication overhead (ASCII standards in 4 skills, depth levels in 2 skills, partially offset by error-handling being shorter per skill than the monolithic file). This is a modest cost for skill independence and is quantified in the skills table.
- Build script maintenance: The backward-compatibility build script must be updated whenever skills are added or modified. This is additional CI/release engineering work.
- Platform disparity: Users on Claude Code, Kiro and Cursor get the full experience (skills + subagents + hooks). Users on platforms without subagent support (Cline) get a skill-based security fallback that loads the security baseline into the main context (~3,600 tokens) instead of isolating it in a subagent. Users on rules-only platforms get a generated version that is functionally identical to today but doesn't benefit from subagent isolation or hook enforcement. This creates a three-tier experience, though all tiers preserve security enforcement.
Additional Context
- Agent Skills specification: agentskills.io — adopted by Claude Code, Cursor, Kiro, Gemini CLI, JetBrains Junie, GitHub, VS Code, OpenAI Codex, and 20+ others
- Current AI-DLC version: 0.1.5
- Current delivery: ZIP from GitHub Releases containing the `aidlc-rules/` directory
- The Operations phase (currently a placeholder) would be added as a new skill when implemented, without modifying the orchestrator—demonstrating the extensibility benefit of this architecture