
Dich01/claude-code-multi-agent-research


Building Reliable Multi-Agent Claude Code Plugins

An Empirical Investigation of Enforcement Mechanisms

An empirical investigation into what actually works when building multi-agent plugins for Claude Code — and what doesn't, despite what the documentation says.

Why This Exists

Prompts are suggestions. The LLM can ignore them silently.

When you build a multi-agent system that needs to guarantee execution order, enforce TDD, or block unsafe operations, you need mechanical enforcement, not prose.
This report documents 6 empirical tests, 12 GitHub issues, and community patterns from systems with up to 112 agents in production.

Key Findings

| Finding | Status |
|---|---|
| exit code 2 in PreToolUse hooks blocks tool calls | Works |
| exit code 1 blocks tool calls | Does not block (by design) |
| permissionDecision: "deny" blocks tool calls | Docs say yes, issue #4669 says no |
| Subagents can spawn subagents | No (absolute restriction) |
| Blocked Write/Edit? Claude bypasses via Bash (sed, echo >, python3 -c) | Known vector (requires additional Bash hook) |
| Hooks can trigger infinite loops without recursion guards | 3 open issues (#10205, #9579, #9704) |
| Agent frontmatter (hooks, skills) works in teammates | No (bug #30703) |
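
The exit-code findings can be illustrated with a minimal PreToolUse hook sketch in Python. It assumes the hook receives the event as JSON on stdin with a `tool_name` field. The names `run_hook` and `BLOCKED_TOOLS` are hypothetical and chosen for this example; they are not taken from the report.

```python
import json
import sys

# Assumption: tools this sketch treats as file-mutating. Adjust to your policy.
BLOCKED_TOOLS = {"Write", "Edit"}

BLOCK = 2  # exit code 2 blocks the tool call (see the findings above)
ALLOW = 0  # exit code 0 lets the tool call proceed


def run_hook(raw_event: str, err=sys.stderr) -> int:
    """Return the exit code a PreToolUse hook should use for this event."""
    try:
        event = json.loads(raw_event)
    except json.JSONDecodeError:
        event = None
    if not isinstance(event, dict):
        # Fail closed. Returning 1 here would NOT block the call.
        print("hook: unreadable event payload, blocking", file=err)
        return BLOCK
    tool = event.get("tool_name", "")
    if tool in BLOCKED_TOOLS:
        # The message goes to stderr so the reason is visible to the caller.
        print(f"hook: {tool} is blocked in this session", file=err)
        return BLOCK
    return ALLOW
```

A real hook script built on this sketch would finish with `sys.exit(run_hook(sys.stdin.read()))`. The point of the design is that every refusal path returns 2, never 1, because exit code 1 does not block.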

Validated Patterns

  • Hub-and-spoke orchestration: A central "brain" agent as main
    session (--agent namespace:brain) spawns specialist subagents
  • Recursion guards: Temporary flag files prevent infinite hook
    loops when hooks propagate to subagents
  • Bash bypass defense: When Write/Edit are blocked, Claude uses sed, python3 -c, or echo > instead — a second hook on
    Bash closes this vector
  • Prerequisite gates: Hooks verify artifact existence before
    allowing specialist agents to execute
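
Two of these patterns, the Bash bypass defense and the recursion guard, can be sketched as follows. This is a rough illustration under stated assumptions, not the report's implementation: the regex list is a naive heuristic that is easy to evade (the community tools listed below use semantic analysis instead), and `is_write_bypass`, `recursion_guard`, and `DEFAULT_FLAG` are hypothetical names.

```python
import os
import re
import tempfile
from contextlib import contextmanager
from pathlib import Path

# Assumption: a hypothetical flag-file location for the recursion guard.
DEFAULT_FLAG = Path(tempfile.gettempdir()) / "claude-hook-guard.flag"

# Naive patterns for shell commands that can write files.
WRITE_PATTERNS = [
    re.compile(r"\bsed\b[^|;&]*\s-i"),      # in-place sed edit
    re.compile(r"\bpython3?\s+-c\b"),       # inline Python one-liner
    re.compile(r"(?<![0-9&])>{1,2}(?!&)"),  # output redirect (> or >>)
]


def is_write_bypass(command: str) -> bool:
    """Return True if a Bash command looks like a file write in disguise."""
    return any(p.search(command) for p in WRITE_PATTERNS)


@contextmanager
def recursion_guard(flag: Path = DEFAULT_FLAG):
    """Yield True if this hook owns the guard, False if one is already active."""
    try:
        # O_EXCL makes creation atomic: only one hook can create the flag.
        fd = os.open(flag, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False  # a hook is already running, so the caller should skip
        return
    os.close(fd)
    try:
        yield True
    finally:
        flag.unlink(missing_ok=True)  # always remove the temporary flag
```

A Bash hook would call `is_write_bypass` on the command string and exit with code 2 on a match. Any hook that can be re-triggered from a subagent would wrap its body in `with recursion_guard() as owned:` and return early when `owned` is false.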

Empirical Validation

These patterns were validated through a complete Express-to-NestJS
migration of a multi-tenant production project:

  • 11 agents spawned with real parallelism (up to 3 concurrent)
  • Quality gate rejected first pass (44% coverage), forced correction to 93%
  • 422 tests, zero regressions, TypeScript strict with zero errors
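
A quality gate of the kind described above can be sketched as a pure function that a hook or orchestrator calls before accepting an agent's work. The report does not state the threshold that was used, so the 80% default below is a placeholder assumption, and `coverage_gate` and `GateResult` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    passed: bool
    reason: str


def coverage_gate(covered_lines: int, total_lines: int,
                  min_percent: float = 80.0) -> GateResult:
    """Reject work whose line coverage falls below min_percent.

    min_percent defaults to 80.0 as a placeholder; the actual threshold
    used in the migration is not given in the report.
    """
    if total_lines <= 0:
        # No coverage data is treated as a failure, not a pass.
        return GateResult(False, "no coverable lines reported")
    percent = 100.0 * covered_lines / total_lines
    if percent < min_percent:
        return GateResult(False, f"coverage {percent:.1f}% is below {min_percent:.1f}%")
    return GateResult(True, f"coverage {percent:.1f}% meets {min_percent:.1f}%")
```

Under this placeholder threshold, a 44% first pass would be rejected and a 93% second pass accepted. A caller would translate a failed result into exit code 2 so that the rejection is enforced mechanically.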

Community References

  • Blake Crosley: 95 hooks in production over 9 months
  • kenryu42/claude-code-safety-net: semantic analysis + 5-level
    recursive wrapper detection
  • wshobson/agents: 112 agents, 16 orchestrators
  • Issue #29795: 5-layer QA system built from 68 documented failures

Full Report

Credits

Author: Diego Cheloni https://github.com/Dich01/

Date: March 14, 2026

Environment: Claude Code CLI (March 2026), Claude Opus 4.6

About

An empirical technical report on building reliable multi-agent Claude Code plugins. This research uncovers critical CLI discrepancies, documents why exit code 2 is the only reliable blocking hook, and provides a framework for complex orchestration.
