Commit 00e9113

Author: David Hache
feat: add distilled rules — token-optimized AI-DLC rule compression
Add a distillation pipeline that compresses the human-readable AIDLC source rules into token-efficient versions optimized for AI agent consumption. The distilled rules produce identical output artifacts while using significantly fewer tokens (~59% reduction) and faster execution (~29% wall-clock savings).

## What's included

- DISTILLATION-INSTRUCTIONS.md: Complete pipeline specification for generating and maintaining distilled rules (scan, merge, deduplicate, reorder, compress, encode) with quality checks and output equivalence verification.
- aidlc-rules-distilled/: Compressed rule files mirroring the source structure:
  - core-workflow.md (top-level orchestrator)
  - common/workflow-rules.md (process, questions, validation, session, errors)
  - inception/inception-rules.md (workspace detection through units generation)
  - construction/construction-rules.md (functional design through build & test)
  - extensions/security/baseline/security-baseline.md (OWASP security rules)
  - operations/operations.md (placeholder)

## Distillation principles

- Output equivalence: distilled rules must produce the same file names, directory structure, document templates, section headers, and detail level as source rules
- Verbatim preservation: output templates, file paths, tag names, and format patterns are never compressed
- Granularity preservation: sequential numbered steps that produce distinct artifacts remain as separate items (never joined with + or &)
- Sub-category preservation: evaluation area sub-items are kept as inline lists since they serve as behavioral cues guiding output specificity
- Template preservation: full markdown template code blocks defining output artifact structure are preserved verbatim, never summarized

## Regression testing results

Validated against sci-calc benchmark (Scientific Calculator API):

- Unit tests: 100% (124/124), Contract tests: 88/88, Lint: 0, Security: 0
- Token usage: 7.5M vs 18.4M golden (-59.2%)
- Wall clock: 16.9m vs 23.8m golden (-29.0%)
- Qualitative score: 0.76 vs 0.85 golden (-10.8%) — remaining gap attributed to AI judgment variance in stage skip decisions and question prioritization, not rule compression losses
1 parent 94619af commit 00e9113

8 files changed, +2734 -2 lines changed

DISTILLATION-INSTRUCTIONS.md

Lines changed: 235 additions & 0 deletions
@@ -0,0 +1,235 @@
# Distillation Instructions

## What This Is

`aidlc-rules-distilled/` contains AI-optimized compressed versions of the source rules in `aidlc-rules/`. Source files are the human-readable truth. Distilled files exist purely for AI context efficiency: same rules, fewer tokens.

Relationship is one-way: source → distilled. Never modify source files when editing distilled files.

## Audience: AI Agents Only

No human will ever read distilled files. Optimize exclusively for minimal token count and parse efficiency:

- Strip ALL emoji from rule prose and headers — zero exceptions
- No bold, italic, or decorative formatting — use formatting only when structurally meaningful (code blocks for templates, `backticks` for paths)
- No horizontal rules, emoji bullet markers, or visual hierarchy tricks
- EXCEPTION: emoji/formatting inside output templates the AI must reproduce verbatim are PRESERVED exactly as-is — the rule is: strip from INSTRUCTIONS, preserve in OUTPUT TEMPLATES

## Output Equivalence

Distillation compresses RULES, not the outputs those rules produce. A distilled ruleset is correct only if an AI following it produces identical output artifacts — same file names, same directory structure, same document sections, same detail level.

- Source says "create `inception/application-design/components.md`" → distilled MUST preserve that exact path
- Source defines a template with specific sections → distilled MUST preserve that template verbatim
- Source rules across multiple files specify different output artifacts → distilled MUST preserve ALL specs — merging rules ≠ merging outputs
- Output file names, directory structure, document templates, section headers → PRESERVE_VERBATIM, not compressible

### Output Contract Checklist

Before finalizing any distilled file:
1. Every output file path in source rules appears in distilled version
2. Every document template/structure preserved verbatim
3. Every required section header within output documents preserved
4. Distinct output artifact count equals source count
5. No two source output artifacts merged into one

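The first checklist item (every source output path appears in the distilled version) lends itself to a mechanical check. The sketch below is one possible approach, not part of the commit: it assumes an output path is any backticked token ending in `.md`, and the helper name `missing_output_paths` and the sample rule strings are hypothetical.

```python
import re

# Assumption: an output path is any backticked token ending in ".md".
PATH_RE = re.compile(r"`([^`\s]+\.md)`")


def missing_output_paths(source_text: str, distilled_text: str) -> set[str]:
    """Return paths named in the source rules but absent from the distilled rules."""
    source_paths = set(PATH_RE.findall(source_text))
    distilled_paths = set(PATH_RE.findall(distilled_text))
    return source_paths - distilled_paths


# Hypothetical rule snippets, not taken from the real rule files.
source = (
    "Create `inception/application-design/components.md` and "
    "`inception/application-design/services.md`."
)
distilled = "write `inception/application-design/components.md`"

print(sorted(missing_output_paths(source, distilled)))
# → ['inception/application-design/services.md']
```

A non-empty result would mean an output artifact path was lost during compression. Paths written without backticks would be missed by this pattern, so it is a lower bound rather than a full equivalence check.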
### Common Mistakes

- Merging `components.md`, `services.md`, `component-methods.md`, `component-dependency.md` rules into one section → AI infers one file → WRONG. Must still produce four separate files
- Compressing output template section headers → WRONG. Required format, not prose
- Dropping explicit file path from "write this to X" → WRONG. Path is required output spec
- Joining separate plan steps with `+` or commas (e.g., "Business Logic Generation + Unit Testing + Summary") → AI interprets as ONE combined step → WRONG. Each source step MUST remain a separate numbered item in the distilled output
- Compressing enumerated sub-categories into a flat list (e.g., source lists "Functional Requirements: Core features, user interactions, system behaviors" → distilled drops the sub-items) → AI loses specificity cues that guide correct behavior → WRONG. Preserve enumerated sub-categories as inline lists
- Flattening sequential numbered steps (e.g., "Step 1... Step 2... Step 3..." each producing a distinct artifact) into a single bullet list → AI treats artifacts as optional/low-priority → WRONG. Preserve step numbering when each step produces a distinct output artifact
- Replacing a full markdown template code block with a summary of its section names (e.g., source has a complete ```markdown template for build-instructions.md → distilled replaces it with "template sections: Prerequisites, Build Steps, Troubleshooting") → AI generates shallow/incomplete artifacts because it doesn't know the expected structure → WRONG. Full markdown template code blocks that define the structure of output artifacts MUST be preserved verbatim in the distilled file

## Distilled File Organization

```
aidlc-rules-distilled/
├── core-workflow.md                        # top-level orchestrator (distilled last)
├── common/workflow-rules.md                # all common/ sources merged
├── inception/inception-rules.md            # all inception/ sources merged
├── construction/construction-rules.md      # all construction/ sources merged
├── extensions/...                          # each extension standalone
└── operations/...                          # standalone
```

Merge strategy:
- Same subdirectory, same phase → merge into one distilled file
- Cross-cutting extensions → standalone
- Placeholder/minimal files → standalone

Merging RULES into one file ≠ merging OUTPUTS. Merged rule file must still instruct AI to produce every individual output artifact.

## Processing Order

Deepest/leaf files first → root. Prevents broken cross-references.

Order: `extensions/` → `operations/` → `construction/` → `inception/` → `common/` → `core-workflow.md`

Within each directory: runtime execution order.

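As an illustration of the processing order, the following sketch ranks source files by their top-level directory. The ranking list mirrors the order stated in this section; the function name `processing_rank` and the individual file names (other than the security baseline path mentioned in the commit message) are hypothetical.

```python
from pathlib import PurePosixPath

# Order taken from the section above: leaves first, orchestrator last.
PROCESSING_ORDER = [
    "extensions", "operations", "construction",
    "inception", "common", "core-workflow.md",
]


def processing_rank(rel_path: str) -> int:
    """Rank a source file by its top-level directory (or file) name."""
    top = PurePosixPath(rel_path).parts[0]
    return PROCESSING_ORDER.index(top)


# Hypothetical file names; only the top-level names come from the document.
files = [
    "core-workflow.md",
    "common/process-overview.md",
    "inception/requirements-analysis.md",
    "extensions/security/baseline/security-baseline.md",
]

# sorted() is stable, so files within one directory keep their input order,
# which is assumed here to already be runtime execution order.
ordered = sorted(files, key=processing_rank)
print(ordered)
```

Because the sort is stable, the "runtime execution order" within a directory has to be supplied by the caller; the sketch does not try to infer it.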
---

## PIPELINE

follow_order: each step builds on previous

### 1_SCAN
- read ALL source files in `aidlc-rules/` recursively
- group by directory (common, inception, construction, extensions/*, operations)
- merge candidates: files in same dir serving same phase → single distilled file
- standalone candidates: cross-cutting extensions, placeholders, isolated files
- extract output manifest: catalog every output file path, document template, required section header

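The grouping part of 1_SCAN could be sketched as below. This is a minimal illustration under assumptions: the merged target names come from "Distilled File Organization", everything outside `common/`, `inception/`, and `construction/` is treated as standalone, and the function name `plan_distilled_files` and the sample source file names are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Merged targets as listed under "Distilled File Organization".
MERGED_TARGETS = {
    "common": "common/workflow-rules.md",
    "inception": "inception/inception-rules.md",
    "construction": "construction/construction-rules.md",
}


def plan_distilled_files(source_files: list[str]) -> dict[str, list[str]]:
    """Map each distilled file to the source files that feed it."""
    plan: dict[str, list[str]] = defaultdict(list)
    for rel_path in source_files:
        top = PurePosixPath(rel_path).parts[0]
        # Anything outside the merged directories stays standalone.
        target = MERGED_TARGETS.get(top, rel_path)
        plan[target].append(rel_path)
    return dict(plan)


# Hypothetical source file names, for illustration only.
sources = [
    "inception/workspace-detection.md",
    "inception/units-generation.md",
    "operations/operations.md",
    "extensions/security/baseline/security-baseline.md",
]
plan = plan_distilled_files(sources)
for target, members in plan.items():
    print(target, "<-", members)
```

The plan only decides which rule files are merged; per the note on merging above, the output artifacts those rules describe stay separate.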
### 2_MERGE
- combine merge candidates into single content block per distilled file
- `## UPPERCASE_SECTION_NAME` headers to separate former files
- preserve every output artifact specification from every merged file — never consolidate output specs

### 3_DEDUPLICATE
- rules stated >1 time across sections → keep in most relevant section only
- rule in summary + intro + body → appears exactly once
- NEVER deduplicate output artifact specifications — different output files both remain even if surrounding instructions are similar

### 4_REORDER
- sections in execution order (runtime sequence)
- earlier sections MUST NOT reference concepts defined later
- within phase: follow stage execution sequence from core-workflow.md

### 5_SEMANTIC_COMPRESSION
Rewrite in terse imperative language:
- drop articles, conjunctions, filler ("you should", "make sure to", "it is important that")
- prose paragraphs → `key: value` lines
- "if X then Y" → `condition→action`
- keep ONLY examples that define a required format AI must reproduce exactly
- drop all other examples, rationale, "why" explanations
- strip ALL emoji from rule/instruction prose (preserve inside verbatim output templates)
- strip decorative formatting: no bold/italic for emphasis, no horizontal rules, no emoji bullets
- NEVER compress output file paths, document templates, or output section headers

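The emoji rule (strip from instruction prose, preserve inside verbatim templates) can be illustrated with a small line-based filter. This is a sketch under assumptions: the Unicode ranges in `EMOJI_RE` only approximate "emoji", templates are assumed to be delimited by simple non-nested fences, and the function name is hypothetical.

```python
import re

FENCE = "`" * 3  # built this way to avoid a literal fence inside the example

# Assumption: these ranges approximate "emoji"; they are not exhaustive.
# Arrows such as U+2192 fall outside the ranges and are left alone.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F] ?")


def strip_emoji_outside_fences(text: str) -> str:
    """Remove emoji from rule prose while leaving fenced templates untouched."""
    out = []
    in_fence = False
    for line in text.split("\n"):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            out.append(line)
        elif in_fence:
            out.append(line)  # template content: preserve verbatim
        else:
            out.append(EMOJI_RE.sub("", line))
    return "\n".join(out)


rules = "\n".join([
    "## \U0001F680 Build",   # header with an emoji prefix
    FENCE + "markdown",
    "# \u2705 Done",         # emoji inside an output template
    FENCE,
    "run tests \u2192 report",
])
cleaned = strip_emoji_outside_fences(rules)
print(cleaned)
```

In this sample the header becomes `## Build`, while the template line and the `→` arrow are left unchanged.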
#### Granularity Preservation Rules
- when source lists N separate sequential steps that each produce a distinct artifact or plan item, distilled MUST preserve N separate numbered items — NEVER join with `+`, `&`, or commas into fewer items
- when source enumerates sub-categories under a category (e.g., "Functional Requirements: Core features, user interactions, system behaviors"), distilled MUST preserve the sub-items as an inline list — dropping them removes specificity cues the AI needs to produce correct output
- when source uses numbered steps (Step 1, Step 2...) where each step has its own artifact, distilled MUST preserve the step-per-artifact structure — flattening into a single list degrades artifact quality
- when source lists explicit evaluation areas with sub-bullets (e.g., completeness analysis categories with their specific concerns), preserve the sub-bullets as terse inline items — these are behavioral cues, not prose

#### Template Preservation Rules
- when a source step contains a markdown code block template (a fenced block tagged `markdown`) that defines the structure of an output artifact, the ENTIRE template code block MUST be preserved verbatim in the distilled file — it is an output specification, not compressible prose
- NEVER replace a full template with a summary of its section names (e.g., "template sections: X, Y, Z") — the AI needs the complete template to produce correct output
- this applies to ALL artifact templates: build-instructions.md, unit-test-instructions.md, integration-test-instructions.md, performance-test-instructions.md, build-and-test-summary.md, execution-plan.md, aidlc-state.md, and any other template that defines output structure
- the surrounding instructional prose (e.g., "Create this file with the following structure:") CAN be compressed — only the template code block itself is verbatim

### 6_SYMBOLIC_ENCODING
Convert to structured notation:
- sequences: `A → B → C`
- conditional: `[COND]` or `condition→action`
- required/mandatory: `[REQ]`
- file references: `@filename` (relative to distilled root)
- OR: `A | B`, AND: `A & B`
- unordered lists: comma-separated inline
- ordered steps: `1. 2. 3.`
- section headers: `## UPPERCASE_NAME` (no emoji prefixes)
- no emoji in encoded output (exception: inside verbatim output template code blocks)
- output artifact paths and templates: preserve as-is, do not encode symbolically

---

## PRESERVE_VERBATIM
- file names and paths (rule file references AND output artifact paths)
- tag names (e.g. `[Answer]:`)
- allowed/forbidden character sets
- format patterns AI must reproduce exactly (question file structure, completion message templates)
- markdown code blocks defining required output formats
- output document templates including all section headers
- output directory structure specifications
- handoff message formats referencing specific file paths
- enumerated sub-categories under evaluation/analysis areas (these are behavioral cues that guide output specificity, not compressible prose)
- per-step artifact granularity in plan/generation sequences (source has N steps → distilled has N steps, never fewer)
- FULL markdown template code blocks that define the structure of output artifacts (e.g., build-instructions.md template, unit-test-instructions.md template, build-and-test-summary.md template) — these are output specifications, NEVER replace with section name summaries

## DROP
- worked examples unless example IS the required format
- ALL emoji from rule/instruction prose (exception: emoji inside output template code blocks preserved)
- decorative formatting: bold single-line rules, horizontal rules, italic emphasis, emoji bullet markers
- summaries restating rules already in body
- rationale and "why" context
- section headers adding no information beyond content below them

---

## UPDATE_EXISTING

source_file_changed:
1. read modified source file in full
2. find corresponding section in distilled file (by section header)
3. identify delta: new | removed | modified | reworded rule
4. delta affects output artifacts → update output manifest
5. apply delta using 5_SEMANTIC_COMPRESSION + 6_SYMBOLIC_ENCODING
6. run QUALITY_CHECK

new_source_file_added:
1. read new source file
2. belongs in existing distilled file (new section) | needs own distilled file
3. existing → full pipeline on new content, append as `## SECTION_NAME`
4. new file → create `aidlc-rules-distilled/{folder}/{name}.md`, full pipeline
5. extract new output artifact specs → add to output manifest
6. run QUALITY_CHECK

source_file_removed:
1. find corresponding section in distilled file
2. remove section
3. distilled file now empty → delete it
4. run QUALITY_CHECK

## FULL_REBUILD

when: major restructuring, or distilled files suspected out of sync

1. delete all files in `aidlc-rules-distilled/`
2. scan `aidlc-rules/` recursively → group by merge strategy
3. extract complete output manifest from all source rules
4. process backwards: extensions → operations → construction → inception → common → core-workflow
5. each group: full pipeline (steps 1–6)
6. QUALITY_CHECK on every distilled file
7. OUTPUT_EQUIVALENCE_CHECK

---

## QUALITY_CHECK

after any update:
- every behavior-changing rule present in distilled version
- no distilled rule contradicts source
- no rule appears >1 time across sections in a distilled file
- distilled file shorter than combined source(s) — if not, re-compress
- no examples unless they define a required format
- sections in execution order
- cross-references use correct `@path` and `## SECTION` targets
- all `[REQ]` markers preserved
- no emoji in rule/instruction prose (only inside verbatim output template code blocks)

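Two of the checks above (no rule repeated, distilled shorter than combined sources) are mechanical enough to sketch. The example below is illustrative only: it uses whitespace-separated word count as a rough stand-in for token count, detects only exact duplicate lines, and the function name and sample strings are hypothetical.

```python
from collections import Counter


def quality_problems(source_texts: list[str], distilled: str) -> list[str]:
    """Report size and duplicate-line problems in a distilled file."""
    problems = []
    # Assumption: word count is a rough proxy for token count.
    source_words = sum(len(text.split()) for text in source_texts)
    distilled_words = len(distilled.split())
    if distilled_words >= source_words:
        problems.append("distilled is not shorter than combined sources")
    # Exact-match duplicates only; reworded repeats need a separate review.
    lines = [line.strip() for line in distilled.splitlines() if line.strip()]
    rule_lines = [line for line in lines if not line.startswith("#")]
    for line, count in Counter(rule_lines).items():
        if count > 1:
            problems.append(f"rule appears {count} times: {line}")
    return problems


# Hypothetical source and distilled snippets.
sources = [
    "You should always make sure to ask the user for approval before continuing.",
    "It is important that you log every decision in the audit file.",
]
distilled_ok = "approval: ask user before continuing\naudit: log every decision"
distilled_dup = distilled_ok + "\naudit: log every decision"

print(quality_problems(sources, distilled_ok))
print(quality_problems(sources, distilled_dup))
```

Lines inside verbatim output templates can legitimately repeat, so a fuller check would skip fenced blocks before counting duplicates.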
### Step Granularity Check
- count numbered plan/generation steps in source → count in distilled → must be equal
- no `+` or `&` joining what were separate steps in source
- each source step that produces a distinct artifact → separate numbered item in distilled

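A possible way to automate the step count comparison and flag joined steps is sketched below. It is a heuristic under assumptions: steps are recognised as lines starting with `N.`, and any ` + ` or ` & ` inside a distilled step is only reported as a candidate for review, since `&` is also valid AND notation and appears in names such as "build & test". The function names are hypothetical.

```python
import re

STEP_RE = re.compile(r"^\s*\d+\.\s+\S")   # a line such as "1. Do something"
JOIN_RE = re.compile(r"\s[+&]\s")         # " + " or " & " between words


def numbered_steps(text: str) -> list[str]:
    """Return the numbered step lines found in the text."""
    return [line.strip() for line in text.splitlines() if STEP_RE.match(line)]


def granularity_problems(source: str, distilled: str) -> list[str]:
    """Compare step counts and flag distilled steps that look joined."""
    problems = []
    src_steps = numbered_steps(source)
    dst_steps = numbered_steps(distilled)
    if len(src_steps) != len(dst_steps):
        problems.append(
            f"step count: source={len(src_steps)}, distilled={len(dst_steps)}"
        )
    for step in dst_steps:
        if JOIN_RE.search(step):
            problems.append(f"possible joined step: {step}")
    return problems


# Sample based on the joined-step mistake described in "Common Mistakes".
source = "1. Business Logic Generation\n2. Unit Testing\n3. Summary"
distilled = "1. Business Logic Generation + Unit Testing + Summary"

for problem in granularity_problems(source, distilled):
    print(problem)
```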
### Sub-Category Preservation Check
- for every source section that lists evaluation areas with sub-items (e.g., completeness analysis, question categories), verify distilled preserves the sub-items as inline lists
- missing sub-items = FAIL (they are behavioral cues, not droppable prose)

### Template Code Block Check
- for every source step that contains a markdown code block template defining an output artifact's structure, verify the distilled version preserves the COMPLETE template code block verbatim
- template replaced with section name summary = FAIL (AI cannot produce correct output without the full template)
- count template code blocks in source → count in distilled → must be equal

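The template check can be approximated by extracting fenced blocks from both texts and confirming each source block appears verbatim in the distilled text. This sketch assumes simple, non-nested fences (a template that itself contains fenced code would need a longer outer fence and a smarter parser); `fenced_blocks`, `missing_templates`, and the sample template are hypothetical.

```python
FENCE = "`" * 3  # built this way to avoid a literal fence inside the example


def fenced_blocks(text: str) -> list[str]:
    """Return every fenced block, including its opening and closing fence lines."""
    blocks = []
    current = None
    for line in text.split("\n"):
        if line.lstrip().startswith(FENCE):
            if current is None:
                current = [line]            # opening fence
            else:
                current.append(line)        # closing fence
                blocks.append("\n".join(current))
                current = None
        elif current is not None:
            current.append(line)
    return blocks


def missing_templates(source: str, distilled: str) -> list[str]:
    """Return source template blocks that are not preserved verbatim."""
    kept = set(fenced_blocks(distilled))
    return [block for block in fenced_blocks(source) if block not in kept]


# Hypothetical template, for illustration only.
template = "\n".join([FENCE + "markdown", "# Build Instructions", "## Prerequisites", FENCE])
source = "Create this file with the following structure:\n" + template
distilled_ok = "build-instructions.md:\n" + template
distilled_bad = "template sections: Prerequisites, Build Steps, Troubleshooting"

print(len(missing_templates(source, distilled_ok)))   # 0
print(len(missing_templates(source, distilled_bad)))  # 1
```

Comparing whole blocks as strings enforces the verbatim requirement and also covers the count check, since a dropped or summarized template shows up as missing.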
## OUTPUT_EQUIVALENCE_CHECK

after any update:
- every output file path in source rules → confirmed in distilled rules
- every document template in source → preserved verbatim in distilled
- every required section header in output templates → preserved
- distinct output artifact count: source == distilled
- any missing or merged → FAIL, fix before proceeding
- output manifest from 1_SCAN vs distilled content → full coverage required

SETUP.md

Lines changed: 2 additions & 2 deletions
@@ -29,9 +29,9 @@ Set up AI-DLC in this project by doing the following:
 - Any other agent → create `AGENTS.md`

 2. The file content should be:
-When the user invokes AI-DLC, read and follow `.aidlc/aidlc-rules/core-workflow.md`
+When the user invokes AI-DLC, read and follow `.aidlc/aidlc-rules-distilled/core-workflow.md`
 to start the workflow. All rule detail files referenced by the workflow are
-located under `.aidlc/aidlc-rules/`.
+located under `.aidlc/aidlc-rules-distilled/`.

 3. Add `.aidlc` to `.gitignore` unless I explicitly ask you not to.
