Skip to content

Latest commit

 

History

History
533 lines (400 loc) · 24.4 KB

File metadata and controls

533 lines (400 loc) · 24.4 KB

Development Methodologies Reference

Confidence: Tier 2 — Validated by multiple production reports and official documentation.

Last updated: February 2026

This is a quick reference for 15 structured development methodologies that have emerged for AI-assisted development in 2025-2026. For hands-on practical workflows, see workflows/.


Table of Contents

  1. Decision Tree
  2. The 15 Methodologies
  3. SDD Tools Reference
  4. Writing Effective Specs
  5. Combination Patterns
  6. Sources

Decision Tree: What Do You Need?

┌─ "I want quality code" ────────────→ workflows/tdd-with-claude.md
│
├─ "I want to spec before code" ─────→ workflows/spec-first.md
│
├─ "I need to plan architecture" ────→ workflows/plan-driven.md
│
├─ "I'm iterating on something" ─────→ workflows/iterative-refinement.md
│
└─ "I need methodology theory" ──────→ Continue reading below

The 15 Methodologies

Organized in a 6-tier pyramid from strategic orchestration down to optimization techniques.

Tier 1: Strategic Orchestration

Name What Best For Claude Fit
BMAD Multi-agent governance with constitution as guardrail Enterprise 10+ teams, long-term projects ⭐⭐ Niche but powerful
GSD Meta-prompting 6-phase workflow with fresh contexts per task Solo devs, Claude Code CLI ⭐⭐ Similar to patterns in guide

BMAD (Breakthrough Method for Agile AI-Driven Development) inverts the traditional paradigm: documentation becomes the source of truth, not code. Uses specialized agents (Analyst, PM, Architect, Developer, QA) orchestrated with strict governance. Note: BMAD's role-based agent naming reflects their methodology; see §9.17 Agent Anti-Patterns for scope-focused alternatives.

  • Key concept: Constitution.md as strategic guardrail
  • When to use: Complex enterprise projects needing governance
  • When to avoid: Small teams, MVPs, rapid prototyping

GSD (Get Shit Done) addresses context rot through systematic 6-phase workflow (Initialize → Discuss → Plan → Execute → Verify → Complete) with fresh 200k-token contexts per task. Core concepts (multi-agent orchestration, fresh context management) overlap significantly with existing patterns like Ralph Loop, Gas Town, and BMAD. See resource evaluation for detailed comparison.

Emerging: Ralph Inferno implements autonomous multi-persona workflows (Analyst→PM→UX→Architect→Business) with VM-based execution and self-correcting E2E loops. Experimental but interesting for "vibe coding at scale".


Foundational Discipline: Plan-First Workflow

"Once the plan is good, the code is good." — Boris Cherny, creator of Claude Code

Not just a feature (/plan command) — a systematic discipline.

Context Engineering: Thoughtworks designates this broader approach "Context Engineering" in their Technology Radar (Nov 2025)1 — the systematic design of information provided to LLMs during inference. Three core techniques: context setup (minimal system prompts, few-shot examples), context management for long-horizon tasks (summarization, external memories, sub-agent architectures), and dynamic information retrieval (JIT context loading). Related patterns in Claude Code: AGENTS.md, MCP Context7, Plan Mode.

The Mental Model:

Planning isn't optional for complex tasks. It's the difference between:

  • ❌ 8 iterations of "try → fix → retry → fix again"
  • ✅ 1 iteration of "plan → validate → execute cleanly"

When to plan first:

Task Complexity Plan First? Why
>3 files modified ✅ Yes Cross-file dependencies need architecture
>50 lines changed ✅ Yes Enough complexity for mistakes
Architectural changes ✅ Yes Impact analysis required
Unfamiliar codebase ✅ Yes Need exploration before action
Typo/obvious fix ❌ No Planning overhead > task time
Single-line change ❌ No Just do it

How plan-first works:

  1. Exploration phase (/plan mode):

    • Claude reads files, explores architecture
    • No edits allowed → forces thinking before action
    • Proposes approach with trade-offs
  2. Validation phase (you review):

    • Plan exposes assumptions and gaps
    • Easier to correct direction now vs after 100 lines written
    • Plan becomes contract for execution
  3. Execution phase (/execute):

    • Plan → code becomes mechanical translation
    • Fewer surprises, cleaner implementation
    • Faster overall despite "slower" start

Boris Cherny workflow:

"I run many sessions, start in plan mode, then switch into execution once the plan looks right. The signature upgrade is verification—giving Claude a way to test and confirm its own output."

Benefits over "just start coding":

  • Fewer correction iterations: Plan catches issues before they become code
  • Better architecture: Forced to think about structure first
  • Clearer communication: Plan is shared understanding with team/Claude
  • Reduced cost: One clean iteration < multiple messy iterations (even if plan phase costs tokens)

Integration with CLAUDE.md:

Document your team's plan-first triggers:

## Planning Policy
- ALWAYS plan first: API changes, database migrations, new features
- OPTIONAL planning: Bug fixes <10 lines, test additions
- NEVER skip: Changes affecting >2 modules

See also: Plan Mode documentation for /plan command usage.


Tier 2: Specification & Architecture

Name What Best For Claude Fit
SDD Specs before code APIs, contracts ⭐⭐⭐ Core pattern
Doc-Driven Docs = source of truth Cross-team alignment ⭐⭐⭐ CLAUDE.md native
Req-Driven Rich artifact context (20+ artifacts) Complex requirements ⭐⭐ Heavy setup
DDD Domain language first Business logic ⭐⭐ Design-time

SDD (Spec-Driven Development) — Specifications BEFORE code. One well-structured iteration equals 8 unstructured ones. CLAUDE.md IS your spec file.

Doc-Driven Development — Living documentation versioned in git becomes the single source of truth. Changes to specs trigger implementation.

Requirements-Driven Development — Uses CLAUDE.md as comprehensive implementation guide with 20+ structured artifacts.

DDD (Domain-Driven Design) — Aligns software with business language through:

  • Ubiquitous Language: Shared vocabulary in code
  • Bounded Contexts: Isolated domain boundaries
  • Domain Distillation: Core vs Support vs Generic domains

Tier 3: Behavior & Acceptance

Name What Best For Claude Fit
BDD Given-When-Then scenarios Stakeholder collaboration ⭐⭐⭐ Tests & specs
ATDD Acceptance criteria first Compliance, regulated ⭐⭐ Process-heavy
CDD API contracts as interface Microservices ⭐⭐⭐ OpenAPI native

BDD (Behavior-Driven Development) — Beyond testing: a collaboration process.

  1. Discovery: Involve devs and business experts
  2. Formulation: Write Given-When-Then examples
  3. Automation: Convert to executable tests (Gherkin/Cucumber)
Feature: Order Management
  Scenario: Cannot buy without stock
    Given product with 0 stock
    When customer attempts purchase
    Then system refuses with error message

ATDD (Acceptance Test-Driven Development) — Acceptance criteria defined BEFORE coding, collaboratively ("Three Amigos": Business, Dev, Test).

CDD (Contract-Driven Development) — API contracts (OpenAPI specs) as executable interface between teams. Patterns: Contract as Test, Contract as Stub.


Tier 4: Feature Delivery

Name What Best For Claude Fit
FDD Feature-by-feature delivery Large teams 10+ ⭐⭐ Structure
Context Eng. Context as first-class design Long sessions ⭐⭐⭐ Fundamental

FDD (Feature-Driven Development) — Five processes:

  1. Develop Overall Model
  2. Build Features List
  3. Plan by Feature
  4. Design by Feature
  5. Build by Feature

Strict iteration: 2 weeks max per feature.

Context Engineering — Treat context as design element:

  • Progressive Disclosure: Let agent discover incrementally
  • Memory Management: Conversation vs persistent memory
  • Dynamic Refresh: Rewrite TODO list before response

Tier 5: Implementation

Name What Best For Claude Fit
TDD Red-Green-Refactor Quality code ⭐⭐⭐ Core workflow
Eval-Driven Evals for LLM outputs AI products ⭐⭐⭐ Agents
Multi-Agent Orchestrate sub-agents Complex tasks ⭐⭐⭐ Task tool

TDD (Test-Driven Development) — The classic cycle:

  1. Red: Write failing test
  2. Green: Minimal code to pass
  3. Refactor: Clean up, tests stay green

With Claude: Be explicit. "Write FAILING tests that don't exist yet."

Verification Loops — A formalized pattern for autonomous iteration (broader than TDD):

Core principle: Give Claude a mechanism to verify its own output.

Code generated → Verification tool → Feedback loop → Improvement

Why it works (Boris Cherny): "An agent that can 'see' what it has done produces better results."

Verification mechanisms by domain:

Domain Verification Tool What Claude "Sees"
Frontend Browser preview (live reload) Visual rendering, layout, interactions
Backend Tests (unit/integration) Pass/fail status, error messages
Types TypeScript compiler Type errors, incompatibilities
Style Linters (ESLint, Prettier) Style violations, formatting issues
Performance Profilers, benchmarks Execution time, memory usage
Accessibility axe-core, screen readers WCAG violations, navigation issues
Security Static analyzers (Semgrep) Vulnerability patterns
UX User testing, recordings Usability problems, confusion points

TDD as canonical example:

  1. Claude writes tests for the feature
  2. Claude iterates code until tests pass
  3. Continue until explicit completion criteria met

Official guidance: "Tell Claude to keep going until all tests pass. It will usually take a few iterations."Anthropic Best Practices

Implementation patterns:

  • Hooks: PostToolUse hook runs verification after each edit
  • Browser extension: Claude in Chrome sees rendered output
  • Test watchers: Jest/Vitest watch mode provides instant feedback
  • CI/CD gates: GitHub Actions runs full validation suite
  • Multi-Claude verification: One Claude codes, another reviews

Anti-pattern: Blind iteration without feedback. Without verification mechanism, Claude can't converge toward correct solution—it guesses.

Eval-Driven Development — TDD for LLMs. Test agent behaviors via evals:

  • Code-based: output == golden_answer
  • LLM-based: Another Claude evaluates
  • Human grading: Reference, slow

Eval Harness — The infrastructure that runs evaluations end-to-end: providing instructions and tools, running tasks concurrently, recording steps, grading outputs, and aggregating results.

See Anthropic's comprehensive guide: Demystifying Evals for AI Agents

Multi-Agent Orchestration — From single assistant to orchestrated team:

Meta-Agent (Orchestrator)
├── Analyst (requirements)
├── Architect (design)
├── Developer (code)
└── Reviewer (validation)

ADR-Driven Development

Pattern: Write plain English ADRs → Feed to implement-adr skill → Execute natively

Architecture Decision Records (ADRs) combined with Claude Code skills create a workflow where architectural decisions drive implementation directly.

Workflow Steps:

  1. Document decision in ADR format (context, decision, consequences)
  2. Create implementation skill (generic or implement-adr specialized)
  3. Feed ADR as prompt to skill with clear acceptance criteria
  4. Claude executes based on architectural guidance in ADR

Example ADR Template:

# ADR-001: Database Migration Strategy

## Context
Legacy MySQL schema needs migration to PostgreSQL for better JSON support.

## Decision
Use incremental dual-write pattern with feature flags.

## Consequences
- Positive: Zero-downtime migration
- Negative: Temporary code complexity during transition

Implementation Workflow:

# 1. Write ADR (plain English)
vim docs/adr/001-database-migration.md

# 2. Feed to implementation skill
/implement-adr docs/adr/001-database-migration.md

# 3. Claude executes based on ADR guidance
# → Creates migration scripts
# → Updates ORM configuration
# → Adds feature flags
# → Implements dual-write logic

Benefits:

  • Documentation-driven: Architecture and code stay synchronized
  • Native execution: No external frameworks needed
  • Traceable decisions: Clear audit trail from decision to implementation
  • Team alignment: ADRs communicate intent to both humans and AI

Source: Gur Sannikov embedded engineering workflow


Tier 6: Optimization

Name What Best For Claude Fit
Iterative Loops Autonomous refinement Optimization ⭐⭐⭐ Core
Fresh Context Reset per task, state in files Long autonomous sessions ⭐⭐⭐ Power users
Prompt Engineering Technique foundation Everything ⭐⭐⭐ Prerequisite

Iterative Refinement Loops — Autonomous convergence:

  1. Execute prompt
  2. Observe result
  3. If result ≠ "DONE" → refine and repeat

Prompt Engineering — Foundations for ALL Claude usage:

  • Zero-Shot Chain of Thought: "Think step by step"
  • Few-Shot Learning: 2-3 examples of expected pattern
  • Structured Prompts: XML tags for organization
  • Position Matters: For long docs, place question at end

Fresh Context Pattern (Ralph Loop) — Solves context rot by spawning fresh agent instances per task. State persists in git + progress files, not chat history. Ideal for long autonomous sessions (migrations, overnight runs). See Ultimate Guide - Fresh Context Pattern for implementation.


SDD Tools Reference

Three tools have emerged to formalize Spec-Driven Development:

Tool Use Case Official Docs Claude Integration
Spec Kit Greenfield, governance github.blog/spec-kit /speckit.constitution, /speckit.specify, /speckit.plan
OpenSpec Brownfield, changes github.com/Fission-AI/OpenSpec /openspec:proposal, /openspec:apply, /openspec:archive
Specmatic API contract testing specmatic.io MCP agent available

Spec Kit (Greenfield)

5-phase workflow:

  1. Constitution: /speckit.constitution → guardrails
  2. Specify: /speckit.specify → requirements
  3. Plan: /speckit.plan → architecture
  4. Tasks: /speckit.tasks → decomposition
  5. Implement: /speckit.implement → code

OpenSpec (Brownfield)

Two-folder architecture:

openspec/
├── specs/      ← Current truth (stable)
└── changes/    ← Proposals (temporary)

Workflow: Proposal → Review → Apply → Archive

Specmatic (API Contracts)

  • Contract as Test: Auto-generates 1000s of tests from OpenAPI spec
  • Contract as Stub: Mock server for parallel development
  • Backward Compatibility: Detects breaking changes

Writing Effective Specs

Based on analysis of 2,500+ agent configuration files. Source: Addy Osmani

The Six Essential Components

Component What to Include Example
Commands Executable with flags npm test -- --coverage
Testing Framework, coverage, locations vitest, 80%, tests/
Project structure Explicit directories src/, lib/, tests/
Code style One example > paragraphs Show a real function
Git workflow Branch, commit, PR format feat/name, conventional commits
Boundaries Permission tiers See below

Permission Tiers

Tier Symbol Use For
Always do Safe actions, no approval (lint, format)
Ask first ⚠️ High-impact changes (delete, publish)
Never do 🚫 Hard stops (commit secrets, force push main)

Curse of Instructions

⚠️ Research shows more instructions = worse adherence to each one.

Solution: Feed only relevant spec sections per task, not the entire document.

Monolithic vs Modular Specs

Project Size Approach
Small (<10 files) Single spec file
Medium (10-50 files) Sectioned spec, feed per task
Large (50+ files) Sub-agent routing by domain

Combination Patterns

Recommended stacks by situation:

Situation Recommended Stack Notes
Solo MVP SDD + TDD Minimal overhead, quality focus
Team 5-10, greenfield Spec Kit + TDD + BDD Governance + quality + collaboration
Microservices CDD + Specmatic Contract-first, parallel dev
Existing SaaS (100+ features) OpenSpec + BDD Change tracking, no spec drift
Enterprise 10+ BMAD + Spec Kit + Specmatic Full governance + contracts
LLM-native product Eval-Driven + Multi-Agent Self-improving systems

Quick Reference Table

Methodology Level Primary Focus Team Size Learning Curve
BMAD Orchestration Governance 10+ High
SDD Specification Contracts Any Medium
Doc-Driven Specification Alignment Any Low
Req-Driven Specification Context 5+ Medium
DDD Specification Domain 5+ Very High
BDD Behavior Collaboration 5+ Medium
ATDD Behavior Compliance 5+ Medium
CDD Behavior APIs 5+ Medium
FDD Delivery Features 10+ Medium
Context Eng. Delivery AI sessions Any Low
TDD Implementation Quality Any Low
Eval-Driven Implementation AI outputs Any Medium
Multi-Agent Implementation Complexity Any Medium
Iterative Optimization Refinement Any Low
Prompt Eng. Optimization Foundation Any Very Low

Sources

Official Documentation (Tier 1)

Methodology References (Tier 2)

SDD & Spec-First

BMAD

TDD with AI

BDD & DDD

Context Engineering

Eval-Driven & Multi-Agent

Tools Documentation (Tier 1)

Additional References


See Also

Footnotes

  1. Thoughtworks Technology Radar Vol 33, Nov 2025. PDF. See also: Macro trends blog post.