Operational playbook for sustained LLM projects
This repository is a companion to the blog post "Operational Discipline for LLM Projects: What It Actually Takes". It contains field-tested systems for managing Claude (or any LLM) on complex, multi-session projects where context management, scope control, and output verification become critical.
These aren't theoretical best practices — they're defensive infrastructure built in response to specific, recurring failure modes encountered during sustained LLM collaboration on multi-document projects spanning dozens of sessions.
Core operational documents for managing LLM projects:
- Claude_Context_Cheat_Sheet.md — Quick reference for context management commands, file structure conventions, and merge protection patterns
- Claude_Project_Instructions.md — Complete project instructions template. Paste this into your Claude project settings to establish baseline operational discipline
- Claude_Project_Setup_Guide.md — Step-by-step guide for setting up a new Claude project with proper context management from day one
- Document_Recovery_Prompts.md — Recovery procedures for common failure modes: merge damage, fabrication verification, project audits
- Research_Project_System_Prompt_v3.md — System prompt for evidence-based critical analysis projects. Five-step workflow with source classification (Primary/Secondary × Direct/Analogical/Contextual), source inventory, six decision checkpoints, and 16 standing rules. Built from documented failures during a real research session — see the companion blog post "When Your AI Research Partner Fails the Peer Review"
Guidelines for AI-assisted content drafting:
- faithful_narration_rules.md — 20 rules for instructing Claude to draft content in your voice without editorializing, fabricating scenes, attributing intent to the tool, or filling epistemic gaps with plausible-sounding content. Each rule grounded in a specific documented failure across five blog post projects
- Blog_From_Project_Instructions.md — Workflow for using a lightweight model to draft and a frontier model to QA blog posts from project materials
- copilot-instructions-template.md — Behavioral guardrails for GitHub Copilot (Chat + Agent Mode), derived from the same failure modes documented in the blog post. Covers scope protection, verification discipline, merge protection, and anti-sycophancy. Copy this template to .github/copilot-instructions.md in any repository where you use Copilot. Customize the Project Context section per repo; the behavioral sections remain stable across projects.
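The split matters: the behavioral sections are meant to travel unchanged between repositories, with only the Project Context block swapped per repo. A minimal sketch of that idea, with stand-in section names (not the template's actual headings):

```python
# Behavioral sections: fixed text, reused verbatim across repositories.
# These headings are illustrative, not the template's real wording.
STABLE_SECTIONS = """\
## Scope protection
Add new content only; never rewrite existing prose unasked.

## Verification discipline
Quote sources verbatim; flag anything you cannot verify.
"""

def render_instructions(project_context: str) -> str:
    """Combine a per-repo Project Context with the stable behavioral sections."""
    return f"## Project Context\n{project_context}\n\n{STABLE_SECTIONS}"

print(render_instructions("Python geoscience utilities; tests live in tests/."))
```

Keeping the stable sections in one place means a guardrail fix propagates to every repo on the next copy, instead of drifting per project.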
Architecture and design notes on the methodology:
- claude_code_architecture_mapping.md — Mapping between the current research methodology system (Claude Projects) and the Claude Code skills/agents architecture described by Shane Butler. Documents the structural parallel, where the two approaches diverge (syntactic vs. semantic error surfaces), and what a migration to Claude Code would look like.
Annotated external sources that validate and extend the operational discipline framework:
- hafner-beyond-the-vibes.md — Summary and key takeaways from Robert Hafner's comprehensive guide to AI coding assistants and agents (AGENTS.md, spec-driven development, anti-sycophancy guardrails)
- bullshitbench.md — Summary and relevance of Peter Gostev's empirical benchmark measuring model pushback on nonsensical prompts
- claude-code-project-template.md — Analysis of a recommended Claude Code project structure, with mapping to this repo's equivalents
- palantir-ontology-augmented-generation.md — Palantir (2025), "Connecting AI to Decisions with the Palantir Ontology." LLMs need access to logic assets (functions, models, optimizers), not just data (RAG). Their "Ontology-Augmented Generation" (OAG) framing amounts to deterministic tools surfaced to non-deterministic LLM reasoning. Bullshit-detector's repo-as-agent-registry is a lightweight implementation of the same pattern. The enterprise orchestration layer is overkill for constrained analytical pipelines.
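The shared pattern in that last entry can be sketched in a few lines: deterministic functions are registered by name, the LLM proposes a call, and the host executes it. Everything here (the registry, the decorator, the grading rule) is a hypothetical illustration, not Palantir's API or this repo's actual registry:

```python
from typing import Callable

# Name-to-function registry of deterministic tools the LLM may request.
TOOL_REGISTRY: dict[str, Callable[..., object]] = {}

def tool(fn: Callable) -> Callable:
    """Register a deterministic function under its own name."""
    TOOL_REGISTRY[fn.__name__] = fn
    return fn

@tool
def grade_evidence(source_class: str) -> str:
    """Deterministic lookup, not model judgment: same input, same grade."""
    ranks = {"primary-direct": "strong", "secondary-analogical": "weak"}
    return ranks.get(source_class, "unclassified")

# The LLM emits something like {"tool": "grade_evidence", "args": ["primary-direct"]};
# the host resolves the name and runs the function, keeping the logic out of the model.
print(TOOL_REGISTRY["grade_evidence"]("primary-direct"))  # strong
```

The point of the pattern is that the grading rule is auditable and repeatable; the model only chooses which tool to invoke, never what the tool returns.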
If you're setting up a new Claude project:
- Start with Claude_Project_Setup_Guide.md and follow the setup steps
- Paste Claude_Project_Instructions.md into your Claude project settings
- Keep Claude_Context_Cheat_Sheet.md open for reference during sessions
- Bookmark Document_Recovery_Prompts.md for when things go wrong
If you're doing AI-assisted research:
- Use Research_Project_System_Prompt_v3.md as your Claude project instructions
- The prompt enforces source classification, decision checkpoints, and evidence grading — read the standing rules before starting
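The classification the prompt enforces is a two-axis grid (Primary/Secondary × Direct/Analogical/Contextual). A sketch of how that grid constrains labels, with a class and validation of my own invention rather than the prompt's actual code:

```python
from dataclasses import dataclass

# The two axes from Research_Project_System_Prompt_v3.md.
ORIGINS = ("primary", "secondary")
RELEVANCE = ("direct", "analogical", "contextual")

@dataclass(frozen=True)
class SourceClass:
    origin: str      # primary | secondary
    relevance: str   # direct | analogical | contextual

    def __post_init__(self):
        # Reject anything outside the six valid grid cells.
        if self.origin not in ORIGINS or self.relevance not in RELEVANCE:
            raise ValueError(f"invalid classification: {self.origin}-{self.relevance}")

    def label(self) -> str:
        return f"{self.origin}-{self.relevance}"

print(SourceClass("primary", "direct").label())  # primary-direct
```

Forcing every source into one of six cells is what makes later evidence grading checkable: a claim resting only on secondary-analogical sources can be flagged mechanically.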
If you're drafting content with Claude:
- Use faithful_narration_rules.md as project instructions for any drafting session
- Follow the Blog_From_Project_Instructions.md pipeline for blog posts
If you're rescuing an existing degraded project:
- Use the Project Audit Prompt in Document_Recovery_Prompts.md to assess context health
- Apply the Two-Step Merge Recovery protocol if Claude has damaged existing prose
This playbook was built to prevent and recover from specific, documented failure patterns:
- Compaction data corruption — Information loss and distortion when Claude summarizes project context during automatic compaction
- Scope violations / merge damage — Claude rewriting existing prose when instructed to add new content, creating inconsistency and regression
- Fabrication under questioning (Layer 2) — Claude inventing citations from source documents when challenged, requiring verification against actual file contents
- Sycophancy — Agreeing with user feedback even when it contradicts source material or project requirements
- Context bloat → premature compaction — Inefficient context usage forcing earlier-than-necessary summarization with attendant data loss
- Groundhog Day effect — Repeating resolved issues across sessions due to context degradation
- Evidence weight inflation — Citing real sources at higher evidential weight than they support, constructing arguments that look rigorous but aren't
- Faithful narration failures — Editorializing the user's experience, attributing intent to AI tools, upgrading speculative statements to declarative ones, fabricating scenes and timeframes
For detailed analysis of each failure mode and the specific mechanisms that address them, see the blog posts.
llm-operational-discipline/
├── README.md
├── LICENSE
├── notes/
│ └── claude_code_architecture_mapping.md
├── playbook/
│ ├── Claude_Context_Cheat_Sheet.md
│ ├── Claude_Project_Instructions.md
│ ├── Claude_Project_Setup_Guide.md
│ └── Document_Recovery_Prompts.md
├── references/
│ ├── hafner-beyond-the-vibes.md
│ ├── bullshitbench.md
│ ├── claude-code-project-template.md
│ └── palantir-ontology-augmented-generation.md
├── research-prompt/
│ └── Research_Project_System_Prompt_v3.md
├── templates/
│ └── copilot-instructions-template.md
└── writing/
├── faithful_narration_rules.md
└── Blog_From_Project_Instructions.md
External references that validate and extend the operational discipline framework from different angles. Full annotated summaries in the references/ directory.
- Beyond the Vibes: A Rigorous Guide to AI Coding Assistants and Agents (Robert Hafner, March 2026) — Covers AGENTS.md, spec-driven development, anti-sycophancy guardrails, and context window management from a software engineering perspective. Same problems, same solutions, different domain.
- BullshitBench (Peter Gostev) — Empirical benchmark measuring model pushback on nonsensical prompts. Anthropic models dominate; Sonnet pushes back harder than Opus. Direct evidence for the sycophancy failure mode documented in this repo.
- Claude Code Project Structure Template — Recommended repo layout formalizing the pattern of repository structure as AI instruction set. Introduces hooks/ (automated guardrails) and per-directory CLAUDE.md files. See references/claude-code-project-template.md.
This project is licensed under the MIT License - see the LICENSE file for details.
Matteo Niccoli
- Blog: mycartablog.com
- LinkedIn: Matteo Niccoli
Built collaboratively with Claude (Opus and Sonnet). The irony is noted.