Rule System Reform — 65% rule token reduction + 28% full prose compression #908
13 comments · 16 replies
-
Nice work. I'll watch with interest. Please post updates here when you have them.
-
Yeah, this looks really valuable. Perhaps it can be turned into a…
-
Good day @jlacour-git, I would definitely be interested. If you would be so kind as to share this, I'd be happy to learn :) Thank you kindly!
-
Hey @Raazgar! The rule reform methodology is explained in the original post above. The short version: lateral reclassification (move rules to where they naturally belong instead of one giant rules file), priority hierarchy so rules don't conflict, and aggressive prose compression with two quality gates (Scenario Test + Walkthrough Test). For the practical tooling side — tracking local patches and surviving upgrades — check out #923 where I shared the tools as a gist: https://gist.github.com/jlacour-git/0e2ab62014dc5bcc3977be82ba26e68a
-
The "instruction fatigue" framing is the key insight. Rules written as prose all look the same to the model. Constraint, context, style preference, format spec: mixed together without semantic labels. The model has to re-infer what kind of rule each one is on every response, and at high density it stops tracking them reliably.

Explicit block types solve this at the prompt level. A constraints block, an output_format block, a role block: each signals a different kind of instruction. The model doesn't have to classify before applying. Reduced token count and cleaner compliance follow naturally.

I've been building flompt around exactly this: a visual prompt builder that decomposes prompts into 12 typed semantic blocks and compiles to Claude-optimized XML. Open-source: github.com/Nyrok/flompt
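The typed-block idea above can be sketched in a few lines. This is illustrative only: the block names and the `compile_prompt` function are assumptions drawn from the comment, not flompt's actual schema or API.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str   # e.g. "role", "constraints", "output_format"
    body: str


def compile_prompt(blocks):
    """Wrap each block in a tag named after its kind, so the model
    sees the instruction type without having to infer it from prose."""
    parts = []
    for b in blocks:
        parts.append(f"<{b.kind}>\n{b.body}\n</{b.kind}>")
    return "\n\n".join(parts)


prompt = compile_prompt([
    Block("role", "You are a code reviewer."),
    Block("constraints", "Never rewrite code the user did not ask about."),
    Block("output_format", "Return findings as a numbered list."),
])
print(prompt)
```

The point of the compile step is that the instruction type is carried by structure, not by wording the model has to re-classify on every turn.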
-
Hey @Nyrok! The "instruction fatigue" framing resonates — that's exactly what we observed. Rules as prose all look the same to the model, and at high density it just stops tracking them reliably.

Your typed block concept is interesting. I've been looking at flompt and started experimenting with applying it to my steering rules. Specifically, I'm wrapping my highest-priority constraint rules in XML. Early experiment — no results yet. But the decomposition principle (separate constraints from process instructions from identity directives) already made the rule structure cleaner even before considering the XML angle. Will share findings here once I have enough sessions to compare against the baseline!
-
Hey @virtualian @Drizzt321 @Raazgar @Nyrok — update as promised! The rule reform from this post was the foundation, but the adherence problem needed its own investigation. I ran a full Science → Council → RedTeam pipeline on why the AI keeps skipping rules despite the reformed structure. Short version: the…

Full writeup with methodology, before/after, and measurement plan: #945. Early data — not claiming victory yet. Will post measurement results once I have 20 sessions.
-
I honestly wonder if it is possible to use instructions to enforce agent behavior (agent as in Claude Code, Codex, OpenCode, etc.).

Today I was working in a super basic setup at work. The only skill was Anthropic's frontend-design skill, plus a CLAUDE.md file with project info and tech stack info in less than 10 lines, and a reference to 1 user stories file. The user stories file included 1 workflow, at a high level: implement story, lint, validate, if pass, commit, move to next story. Again, like 10 lines. There were 3 stories of about 3-4 lines each. Very lean, to-the-point context.

I used Plan mode to generate a PLAN.md and reviewed its proposals. They looked OK, so I told it to proceed. It didn't follow the workflow at all. It implemented all three stories in one go, didn't lint or test until the end, and never committed at all. I asked if the instructions were unclear so I could improve them. It said no, the instructions were clear. It just didn't follow them. It chose to implement a different way that it thought was faster. To be clear, I've used this same setup before and it worked OK. Today the agent just went off-script.

"Models struggle with complex instructions … According to him, once the number of instructions crosses eight, the models start dropping some directives. For businesses that rely on precision and consistency, this behaviour poses a serious risk."

In the next paragraph they speak about "AI drift", which Claude Code just implemented the /btw command to help address. Anthropic thinks this was worth addressing. Ignoring the clickbait, there are nuggets of insight here: Salesforce is still using plenty of AI, and the point is that even large corporations with big budgets and departments of AI engineers are seeing these same issues.
-
I think something must be broken, because that's the exact opposite of the experience I've had and that most people report. I just tried a default Claude Code experience again recently, and it was nowhere near as good as staying on the rails.
On Fri, Mar 13, 2026 at 11:20 AM, Drizzt321 wrote:

In regards to implementing things in one go, etc., I've actually found since installing PAI, it's degraded from my stock Claude Code with a relatively small amount of memory guidance I built up over a few weeks of doing some development.

I think doing this rule analysis/replacement is going to prove very valuable, to reduce what is actually seen, and do a better job of priority classification to interact with the PAI core rules/guidance.
-
I find that PAI is a lot better at staying on the rails than vanilla. Also, vanilla Claude Code with a subscription is better than other agents with vanilla configs. My example above was using VS Code with GitHub Copilot Chat on a GitHub Enterprise account, which exposes a list of models from multiple vendors, and letting "Copilot" choose the model, which usually means OpenAI Codex.

My point is, at some point I've had every agent harness and model, including PAI, not follow instructions. I think it's part of what Daniel has talked about many times: scaffolding. PAI has a lot of scaffolding, and it helps. But every setup I've used has this issue. It's just a matter of degree.

My question to those following this discussion: do others think that instructions (prose in Markdown) can be guardrails rather than suggestions, no matter how strongly worded?
-
Also make sure that your projects file and your steering rules are getting loaded by default on startup. That setting is inside of settings.json.
On Fri, Mar 13, 2026 at 3:08 PM, Drizzt321 wrote:

Now I recently had it move an extracted rule (around not charging ahead on code/work) from learnings/memory to ~/.claude/PAI/USER/AISTEERINGRULES.md, but that was just last night, so I haven't really seen if that helps.

Or maybe I just hadn't run the needed "take the learnings and materialize them" yet? Is that a thing? I need to do a good, full tour of the expected flow/usage in those terms.
-
Yeah, there's a workflow in PAI upgrade where you can ask for examples of how to upgrade your particular algorithm.
We don't do it by default because it might be custom for you.
On Fri, Mar 13, 2026 at 4:54 PM, Drizzt321 wrote:

Are LEARNINGS loaded by default? Hm. According to my instance, only the most recent 3 system learnings, algorithm learnings, and the last 2 days of relationship notes. So it looks like if it's in LEARNINGS/, it might not actually get loaded.

Exploring where I had those rules, they *weren't* actually being loaded, since the only place they were referenced was MEMORY.md, as an index file listing those others, not the actual rules.

@danielmiessler Is there supposed to be a regular "evaluate learnings and move them to PAI/USER/AISTEERINGRULES.md" type process I'm supposed to be doing?
-
@virtualian Good questions. Here's how the loop actually works for me:

How rules get created: The Digest skill scans…

So yes — the digest output does generate actual rules. But it's not automatic. I review each proposal and decide whether it becomes a steering rule, a hook change, or gets rejected. Human in the loop every time. The key difference from vanilla…

Closing the loop on synthesis: The raw synthesis output is too verbose for a prompt — you're right about that. The Digest skill's fix is to extract atomic proposals from it. "Tool X failed because Y" becomes a specific rule: "Before X, verify Y." Each one is 1-2 lines in the steering rules file.

I just shared the complete Digest skill files (SKILL.md + workflow) in #946 if you want to try it: https://gist.github.com/jlacour-git/bb9e8b6e88ce7e6afa20fd4251beca37
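The "Tool X failed because Y" → "Before X, verify Y" transform can be sketched mechanically. This is a minimal illustration of the rewrite pattern only; the function name and regex are mine, and the actual Digest skill routes every proposal through human review rather than applying anything automatically.

```python
import re
from typing import Optional


def proposal_to_rule(observation: str) -> Optional[str]:
    """Turn a digest observation of the form '<tool> failed because <cause>'
    into a 1-line steering rule. Returns None if the observation does not
    fit the atomic-failure pattern and needs human review instead."""
    m = re.match(r"(.+?) failed because (.+)", observation.strip().rstrip("."))
    if not m:
        return None
    tool, cause = m.groups()
    return f"Before {tool}, verify {cause}."
```

Each extracted rule stays at one line, which is what keeps the steering file's always-loaded cost flat as learnings accumulate.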
-
Hey everyone!
I just ran a holistic review of my entire rule system (SYSTEM AISteering, USER AISteering, Algorithm, CLAUDE.md) and wanted to share the approach and results. Might be useful for anyone who's been adding rules over time and wondering if the system is getting too heavy.
The problem
My rule system grew organically to 41 rules across 4 layers, consuming ~8,200 always-loaded tokens (~13,100 with Algorithm). Every rule was born from a real failure. But performance was sitting at 3-4/10 on average — despite all those rules.
The hypothesis: the rules were competing with each other for attention. At that density, instruction fatigue degrades compliance rather than improving it. More rules ≠ better behavior.
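A rough audit of that always-loaded weight is easy to reproduce on your own setup. The 4-characters-per-token ratio below is an assumed heuristic for English prose; exact figures need the model's actual tokenizer.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough assumed ratio; exact counts need the tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def audit(paths) -> int:
    """Sum the approximate always-loaded cost of a list of rule files."""
    total = 0
    for p in paths:
        n = estimate_tokens(Path(p).read_text())
        print(f"{p}: ~{n} tokens")
        total += n
    return total
```

Running this over every file the agent loads at startup makes "is the system getting too heavy?" a number you can track across reform passes.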
The approach
Three frameworks made the difference:
1. Lateral reclassification — "Is this actually a rule?"
Not everything in a steering rules file should BE a steering rule. I found four types of content mixed together: actual behavioral rules, procedures, routing entries, and plain facts.
7 of my 23 USER rules turned out not to be rules at all. A learning digest workflow was a procedure. File routing paths were a routing entry. Session hygiene conventions were just facts.
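The triage can be sketched as a decision function over those four buckets. The buckets come from the post; the specific test questions below are illustrative stand-ins, not the author's exact criteria.

```python
def classify(entry: str) -> str:
    """Triage a steering-file entry into the four content types.
    The keyword checks are illustrative heuristics only."""
    text = entry.lower()
    if "step 1" in text or "then " in text or "workflow" in text:
        return "procedure"      # belongs in a skill/workflow file
    if "/" in text and (".md" in text or "path" in text):
        return "routing entry"  # belongs in a routing table
    if not any(v in text for v in ("never", "always", "before", "must")):
        return "fact"           # context, not an instruction
    return "rule"               # an actual behavioral constraint
```

The useful property is the default: an entry only stays a rule if nothing else claims it, which matches the post's finding that many "rules" were really misfiled procedures and facts.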
2. Priority hierarchy — Trust > Correctness > Quality > Efficiency
The biggest structural issue was that all 41 rules had equal weight. "Check before classifying" (Trust) competed with "Minimize tokens" (Efficiency) with no way to resolve conflicts.
Now rules are grouped by priority level. Trust rules go at the top of the file (highest positional attention) and never yield. Efficiency rules go at the bottom and yield to everything above. A hierarchy preamble at the top and a priority reminder at the bottom counter the U-shaped attention curve — LLMs attend most to the beginning and end of text, so the closing reminder re-anchors the hierarchy right where Efficiency gets its recency boost. When "save tokens" conflicts with "spawn a specialized agent," the hierarchy resolves it: Quality > Efficiency, spawn the agent.
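The conflict-resolution rule is simple enough to write down directly. A minimal sketch, assuming each rule is tagged with one of the four levels from the post:

```python
# Priority levels from the post; lower rank wins on conflict:
# Trust > Correctness > Quality > Efficiency.
PRIORITY = {"trust": 0, "correctness": 1, "quality": 2, "efficiency": 3}


def resolve(rule_a: dict, rule_b: dict) -> dict:
    """Return whichever of two conflicting rules sits higher in the hierarchy."""
    return min(rule_a, rule_b, key=lambda r: PRIORITY[r["level"]])
```

So for the example in the post, a quality-level "spawn a specialized agent" beats an efficiency-level "minimize tokens" without the model having to adjudicate prose against prose.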
3. Algorithm capability enforcement (B+C)
Selected capabilities now become ISC criteria in the PRD (ISC-C1: FirstPrinciples invoked via Skill tool). They get verified through the existing checkbox mechanism. The verbose capability selection guidance in the Algorithm (~250 tokens) got replaced with ~60 tokens that say "list them, they become criteria, invoke or remove."

The results
Zero behavioral coverage lost. Every failure mode still has a rule — just expressed more concisely and in the right location.
What I'd do differently
The analysis used a 4-agent Council debate (Prompt Engineer, Cognitive Load Designer, Systems Architect, Status Quo Defender). The Status Quo Defender was essential — without that voice pushing back, I would have cut too aggressively. The phased approach (analyze → debate → implement) caught gaps that a single pass would have missed.
The biggest miss in the first pass was only asking "how to compress?" instead of "should this be here at all?" The lateral reclassification framework came from a follow-up question, not from the initial analysis.
Prose compression — the second pass
After the structural reform, I ran a separate prose-level compression pass — first on steering rules, then on all LLM-processed documents (Algorithm, CLAUDE.md, PRDFORMAT, CONTEXT_ROUTING, PROJECTS, MEMORY).
The key insight was that compression can be lossy — dropping a phrase like "to obscure negative information" from a passive-voice rule changes it from a scoped correction to a blanket style ban. Different behavioral effect.
The fix: a Scenario Test for each compression — "Can I construct a scenario where the original wording catches a mistake but the compressed version doesn't?" 7 of 13 initial steering rule compressions failed this test. After restoring the critical phrases, all 13 passed.
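One narrow slice of that test can be automated as a regression check. This is a mechanical stand-in only: the real Scenario Test asks whether a failure scenario exists, while the sketch below only catches the specific failure mode from the example above, where a scoping phrase gets dropped during compression.

```python
def survives_compression(compressed: str, critical_phrases) -> bool:
    """Check that every phrase scoping a rule survived compression.
    If a scoping phrase is gone, the rule's behavioral effect has changed
    (e.g. a scoped correction silently becomes a blanket ban)."""
    return all(phrase in compressed for phrase in critical_phrases)
```

A guard like this won't replace constructing scenarios by hand, but it keeps an already-caught regression from sneaking back in on the next compression pass.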
For the Algorithm (PAI's core workflow engine), the Scenario Test wasn't enough. Three additional checks were needed:
The takeaway: per-rule testing is necessary but not sufficient for systemic documents. A workflow engine has interaction effects between sections that single-rule tests can't catch.
Combined results
Biggest savings came from reducing verbose examples (35-line ISC decomposition example → 3 lines), templating repeated phase preambles (7 identical blocks → 1 template instruction), and removing informational-but-not-instructional sections (PRDFORMAT Design Rationale).
Early impressions
After running with the reformed + compressed system:
Happy to share the full analysis doc or the before/after steering rule files if anyone wants to try this on their own setup.