Skip to content

mycarta/llm-operational-discipline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LLM Operational Discipline

Operational playbook for sustained LLM projects


What This Is

This repository is a companion to the blog post "Operational Discipline for LLM Projects: What It Actually Takes". It contains field-tested systems for managing Claude (or any LLM) on complex, multi-session projects where context management, scope control, and output verification become critical.

These aren't theoretical best practices — they're defensive infrastructure built in response to specific, recurring failure modes encountered during sustained LLM collaboration on multi-document projects spanning dozens of sessions.


What's in the Repo

Playbook (4 documents)

Core operational documents for managing LLM projects:

Research Prompt

Writing (2 documents)

Guidelines for AI-assisted content drafting:

  • faithful_narration_rules.md — 20 rules for instructing Claude to draft content in your voice without editorializing, fabricating scenes, attributing intent to the tool, or filling epistemic gaps with plausible-sounding content. Each rule grounded in a specific documented failure across five blog post projects
  • Blog_From_Project_Instructions.md — Workflow for using a lightweight model to draft and a frontier model to QA blog posts from project materials

Templates

  • copilot-instructions-template.md — Behavioral guardrails for GitHub Copilot (Chat + Agent Mode), derived from the same failure modes documented in the blog post. Covers scope protection, verification discipline, merge protection, and anti-sycophancy. Copy this template to .github/copilot-instructions.md in any repository where you use Copilot. Customize the Project Context section per repo; the behavioral sections remain stable across projects.

Notes

Architecture and design notes on the methodology:

  • claude_code_architecture_mapping.md — Mapping between the current research methodology system (Claude Projects) and the Claude Code skills/agents architecture described by Shane Butler. Documents the structural parallel, where the two approaches diverge (syntactic vs. semantic error surfaces), and what a migration to Claude Code would look like.

References

Annotated external sources that validate and extend the operational discipline framework:

  • hafner-beyond-the-vibes.md — Summary and key takeaways from Robert Hafner's comprehensive guide to AI coding assistants and agents (AGENTS.md, spec-driven development, anti-sycophancy guardrails)
  • bullshitbench.md — Summary and relevance of Peter Gostev's empirical benchmark measuring model pushback on nonsensical prompts
  • claude-code-project-template.md — Analysis of a recommended Claude Code project structure, with mapping to this repo's equivalents

Minor References

  • palantir-ontology-augmented-generation.md — Palantir (2025), "Connecting AI to Decisions with the Palantir Ontology." LLMs need access to logic assets (functions, models, optimizers), not just data (RAG). Their "Ontology-Augmented Generation" (OAG) framing = deterministic tools surfaced to non-deterministic LLM reasoning. Bullshit-detector's repo-as-agent-registry is a lightweight implementation of the same pattern. The enterprise orchestration layer is overkill for constrained analytical pipelines.

Quick Start

If you're setting up a new Claude project:

  1. Start with Claude_Project_Setup_Guide.md and follow the setup steps
  2. Paste Claude_Project_Instructions.md into your Claude project settings
  3. Keep Claude_Context_Cheat_Sheet.md open for reference during sessions
  4. Bookmark Document_Recovery_Prompts.md for when things go wrong

If you're doing AI-assisted research:

  1. Use Research_Project_System_Prompt_v3.md as your Claude project instructions
  2. The prompt enforces source classification, decision checkpoints, and evidence grading — read the standing rules before starting

If you're drafting content with Claude:

  1. Use faithful_narration_rules.md as project instructions for any drafting session
  2. Follow the Blog_From_Project_Instructions.md pipeline for blog posts

If you're rescuing an existing degraded project:

  • Use the Project Audit Prompt in Document_Recovery_Prompts.md to assess context health
  • Apply the Two-Step Merge Recovery protocol if Claude has damaged existing prose

The Failure Modes This Addresses

This playbook was built to prevent and recover from specific, documented failure patterns:

  • Compaction data corruption — Information loss and distortion when Claude summarizes project context during automatic compaction
  • Scope violations / merge damage — Claude rewriting existing prose when instructed to add new content, creating inconsistency and regression
  • Fabrication under questioning (Layer 2) — Claude inventing citations from source documents when challenged, requiring verification against actual file contents
  • Sycophancy — Agreeing with user feedback even when it contradicts source material or project requirements
  • Context bloat → premature compaction — Inefficient context usage forcing earlier-than-necessary summarization with attendant data loss
  • Groundhog Day effect — Repeating resolved issues across sessions due to context degradation
  • Evidence weight inflation — Citing real sources at higher evidential weight than they support, constructing arguments that look rigorous but aren't
  • Faithful narration failures — Editorializing the user's experience, attributing intent to AI tools, upgrading speculative statements to declarative ones, fabricating scenes and timeframes

For detailed analysis of each failure mode and the specific mechanisms that address them, see the blog posts.


Repository Structure

llm-operational-discipline/
├── README.md
├── LICENSE
├── notes/
│   └── claude_code_architecture_mapping.md
├── playbook/
│   ├── Claude_Context_Cheat_Sheet.md
│   ├── Claude_Project_Instructions.md
│   ├── Claude_Project_Setup_Guide.md
│   └── Document_Recovery_Prompts.md
├── references/
│   ├── hafner-beyond-the-vibes.md
│   ├── bullshitbench.md
│   ├── claude-code-project-template.md
│   └── palantir-ontology-augmented-generation.md
├── research-prompt/
│   └── Research_Project_System_Prompt_v3.md
├── templates/
│   └── copilot-instructions-template.md
└── writing/
    ├── faithful_narration_rules.md
    └── Blog_From_Project_Instructions.md

Related Resources

External references that validate and extend the operational discipline framework from different angles. Full annotated summaries in the references/ directory.

  • Beyond the Vibes: A Rigorous Guide to AI Coding Assistants and Agents (Robert Hafner, March 2026) — Covers AGENTS.md, spec-driven development, anti-sycophancy guardrails, and context window management from a software engineering perspective. Same problems, same solutions, different domain.
  • BullshitBench (Peter Gostev) — Empirical benchmark measuring model pushback on nonsensical prompts. Anthropic models dominate; Sonnet pushes back harder than Opus. Direct evidence for the sycophancy failure mode documented in this repo.
  • Claude Code Project Structure Template — Recommended repo layout formalizing the pattern of repository structure as AI instruction set. Introduces hooks/ (automated guardrails) and per-directory CLAUDE.md files. See references/claude-code-project-template.md.

License

This project is licensed under the MIT License - see the LICENSE file for details.


Author

Matteo Niccoli


Acknowledgment

Built collaboratively with Claude (Opus and Sonnet). The irony is noted.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors