Building Reliable Multi-Agent Claude Code Plugins

An Empirical Investigation of Enforcement Mechanisms

An empirical investigation into what actually works when building multi-agent plugins for Claude Code — and what doesn't, despite what the documentation says.

Why This Exists

Prompts are suggestions. The LLM can ignore them silently.

When you build a multi-agent system that must guarantee execution order, enforce TDD, or block unsafe operations, you need mechanical enforcement, not prose.
This report documents 6 empirical tests, 12 GitHub issues, and community patterns from systems with up to 112 agents in production.

Key Findings

Finding	Status
`exit code 2` in PreToolUse hooks blocks tool calls	Works
`exit code 1` blocks tool calls	Does not block (by design)
`permissionDecision: "deny"` blocks tool calls	Docs say yes, issue #4669 says no
Subagents can spawn subagents	No — absolute restriction
Blocked Write/Edit? Claude bypasses via Bash (`sed`, `echo >`, `python3 -c`)	Known vector — requires additional Bash hook
Hooks can trigger infinite loops without recursion guards	3 open issues (#10205, #9579, #9704)
Agent frontmatter (hooks, skills) works in teammates	No — bug #30703
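
The first two rows of the table can be illustrated with a minimal PreToolUse hook sketch. It assumes the hook receives a JSON payload on stdin with `tool_name` and `tool_input` fields. The `BLOCKED_TOOLS` policy and the function names are hypothetical and not taken from the report.

```python
import io
import json
import sys

# Hypothetical policy: tool names this hook refuses to let through.
BLOCKED_TOOLS = {"Write", "Edit"}


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one PreToolUse payload."""
    tool = payload.get("tool_name", "")
    if tool in BLOCKED_TOOLS:
        # Exit code 2 is the blocking code; per the table, exit code 1 does not block.
        return 2, f"{tool} is blocked by policy"
    return 0, ""


def run_hook(stdin=sys.stdin, stderr=sys.stderr) -> int:
    """Read one JSON payload from stdin and return the exit code to use."""
    try:
        payload = json.load(stdin)
    except json.JSONDecodeError:
        # Design choice in this sketch: fail closed on an unreadable payload.
        print("unreadable hook payload", file=stderr)
        return 2
    code, message = decide(payload)
    if message:
        print(message, file=stderr)
    return code


# Demo with an in-memory payload instead of real stdin (assumed payload shape).
sample = io.StringIO(json.dumps(
    {"tool_name": "Write", "tool_input": {"file_path": "src/app.ts"}}
))
print(run_hook(sample))  # prints 2
```

In a real hook script the last lines would be replaced by `sys.exit(run_hook())`. As the table notes, blocking Write and Edit alone leaves the Bash vector open.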

Validated Patterns

Hub-and-spoke orchestration: A central "brain" agent runs as the main session (--agent namespace:brain) and spawns specialist subagents
Recursion guards: Temporary flag files prevent infinite hook loops when hooks propagate to subagents
Bash bypass defense: When Write/Edit are blocked, Claude uses sed, python3 -c, or echo > instead; a second hook on Bash closes this vector
Prerequisite gates: Hooks verify artifact existence before allowing specialist agents to execute
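
Three of the patterns above (recursion guards, Bash bypass defense, prerequisite gates) can be sketched as small helpers. This is an illustrative sketch under stated assumptions, not the report's implementation. The regex list is a heuristic that will miss some write vectors and flag some harmless commands. The flag file name and `plan.md` are hypothetical.

```python
import os
import re
import tempfile
from contextlib import contextmanager
from pathlib import Path

# Heuristic patterns for Bash commands that write files (assumed, not exhaustive).
WRITE_PATTERNS = [
    re.compile(r"\bsed\b[^|;&]*\s-i"),                # in-place sed
    re.compile(r">{1,2}\s*(?!/dev/null)[^\s&>]"),     # > file or >> file
    re.compile(r"\bpython3?\s+-c\b"),                 # inline Python
    re.compile(r"\btee\b"),                           # tee writes its arguments
]


def writes_to_file(command: str) -> bool:
    """Return True if a Bash command looks like it writes to a file."""
    return any(p.search(command) for p in WRITE_PATTERNS)


@contextmanager
def recursion_guard(flag: Path):
    """Yield True if this caller created the flag file, False if it already existed."""
    try:
        # O_EXCL makes creation fail if another hook already holds the flag.
        fd = os.open(flag, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False
        return
    os.close(fd)
    try:
        yield True
    finally:
        flag.unlink(missing_ok=True)


def missing_prerequisites(root: Path, required: list[str]) -> list[str]:
    """Return the required artifact names that do not exist under root."""
    return [name for name in required if not (root / name).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    flag = root / ".hook-running"  # hypothetical flag file name
    with recursion_guard(flag) as outer:
        with recursion_guard(flag) as inner:
            print(outer, inner)  # True False
    print(flag.exists())  # False
    print(missing_prerequisites(root, ["plan.md"]))  # ['plan.md']
```

A hook built on these helpers would exit early when the guard yields False, and would block the tool call when `writes_to_file` returns True or the prerequisite list is non-empty.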

Empirical Validation

These patterns were validated through a complete Express-to-NestJS migration of a multi-tenant production project:

11 agents spawned with real parallelism (up to 3 concurrent)
Quality gate rejected first pass (44% coverage), forced correction to 93%
422 tests, zero regressions, TypeScript strict with zero errors
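
A coverage quality gate like the one described can be sketched as a threshold check that returns a blocking exit code. The source does not state the threshold used, so the 80% default below is purely hypothetical, as are the function and parameter names.

```python
def coverage_gate(covered_lines: int, total_lines: int,
                  threshold_pct: float = 80.0) -> int:
    """Return 0 if coverage meets the threshold, else the blocking exit code 2.

    threshold_pct is a hypothetical default; the report does not give the real one.
    """
    if total_lines <= 0:
        # No measurable code: treat as a failed gate rather than a pass.
        return 2
    pct = 100.0 * covered_lines / total_lines
    return 0 if pct >= threshold_pct else 2


print(coverage_gate(44, 100))  # 2 (rejected)
print(coverage_gate(93, 100))  # 0 (accepted)
```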

Community References

Blake Crosley: 95 hooks in production over 9 months
kenryu42/claude-code-safety-net: semantic analysis + 5-level recursive wrapper detection
wshobson/agents: 112 agents, 16 orchestrators
Issue #29795: 5-layer QA system built from 68 documented failures

Full Report

Credits

Author: Diego Cheloni https://github.com/Dich01/

Date: March 14, 2026

Environment: Claude Code CLI (March 2026), Claude Opus 4.6
