| 1 | +# Research: Reverse Engineering & Codebase Analysis Patterns |
| 2 | + |
| 3 | +**Last Updated:** 2025-01-21 |
| 4 | +**Status:** Research Complete - Implementation Phase 1 Complete |
| 5 | + |
| 6 | +--- |
| 7 | + |
| 8 | +## Overview |
| 9 | + |
| 10 | +This directory contains research and analysis conducted to improve our MCP spec-driven development prompts. The research synthesizes patterns from: |
| 11 | + |
| 12 | +1. **Claude Code feature-dev plugin** - Production-tested 7-phase workflow |
| 13 | +2. **Existing research files** - code-analyst, information-analyst, context_bootstrap patterns |
| 14 | +3. **Best practices** - Evidence-based analysis, confidence assessment, interactive questioning |
| 15 | + |
| 16 | +**Primary Goal:** Enhance prompts with battle-tested patterns for better feature development outcomes. |
| 17 | + |
| 18 | +--- |
| 19 | + |
| 20 | +## Research Documents |
| 21 | + |
| 22 | +### 1. Claude Code Feature-Dev Comparison |
| 23 | +**File:** [`claude-code-feature-dev-comparison.md`](./claude-code-feature-dev-comparison.md) |
| 24 | +**Size:** 18,287 words |
| 25 | +**Purpose:** Comprehensive analysis of Claude Code's feature-dev plugin |
| 26 | + |
| 27 | +**Contents:** |
| 28 | +- Complete 7-phase workflow breakdown |
| 29 | +- Agent specifications (code-explorer, code-architect, code-reviewer) |
| 30 | +- Comparison with our current MCP prompts |
| 31 | +- Gap analysis with priority ratings (Critical/Important/Minor) |
| 32 | +- Implementation roadmap (3 sprints) |
| 33 | +- Updated workflow diagrams |
| 34 | +- Detailed recommendations |
| 35 | + |
| 36 | +**Key Findings:** |
| 37 | +- ❌ Missing mandatory clarifying questions phase |
| 38 | +- ❌ No architecture options comparison |
| 39 | +- ❌ No quality review before completion |
| 40 | +- ✅ Good: Document-based artifacts |
| 41 | +- ✅ Good: Explicit sequencing |
| 42 | +- ✅ Good: Comprehensive analysis |
| 43 | + |
| 44 | +**Use This For:** |
| 45 | +- Understanding Claude Code's proven workflow |
| 46 | +- Identifying gaps in our current approach |
| 47 | +- Planning future enhancements |
| 48 | +- Architecture decision justification |
| 49 | + |
| 50 | +--- |
| 51 | + |
| 52 | +### 2. Research Synthesis |
| 53 | +**File:** [`research-synthesis.md`](./research-synthesis.md) |
| 54 | +**Size:** 8,000+ words |
| 55 | +**Purpose:** Actionable integration plan combining all research sources |
| 56 | + |
| 57 | +**Contents:** |
| 58 | +- Core philosophy: Code (WHAT/HOW) vs Docs (WHY) vs User (Intent) |
| 59 | +- Two-agent specialization pattern (code-analyst + information-analyst) |
| 60 | +- Manager orchestration pattern (context_bootstrap) |
| 61 | +- Comparison matrix: Our approach vs Research best practices |
| 62 | +- Actionable recommendations with priority matrix |
| 63 | +- Specific enhancements for each prompt |
| 64 | +- Implementation roadmap (3 sprints) |
| 65 | +- Success metrics |
| 66 | + |
| 67 | +**Key Recommendations:** |
| 68 | +- 🔴 HIGH: Evidence citation standards (file:line, path#heading) |
| 69 | +- 🔴 HIGH: Confidence assessment (High/Medium/Low) |
| 70 | +- 🔴 HIGH: Mandatory clarifying phase in spec generation |
| 71 | +- 🔴 HIGH: Architecture options prompt (new) |
| 72 | +- 🔴 HIGH: Implementation review prompt (new) |
| 73 | +- 🟡 MEDIUM: Interactive phased questioning |
| 74 | +- 🟡 MEDIUM: ADR template creation |
| 75 | + |
| 76 | +**Use This For:** |
| 77 | +- Planning specific prompt enhancements |
| 78 | +- Understanding priority of improvements |
| 79 | +- Implementation guidance with examples |
| 80 | +- Success criteria for each enhancement |
| 81 | + |
| 82 | +--- |
| 83 | + |
| 84 | +### 3. Code Analyst Pattern |
| 85 | +**File:** [`code-analyst.md`](./code-analyst.md) |
| 86 | +**Source:** Existing research file (cataloged) |
| 87 | +**Purpose:** Specialized agent for discovering WHAT and HOW from code |
| 88 | + |
| 89 | +**Responsibilities:** |
| 90 | +- Discover WHAT the system does (features, workflows, business rules) |
| 91 | +- Discover HOW it's structured (architecture, patterns, communication) |
| 92 | +- Identify WHAT technologies are used |
| 93 | + |
| 94 | +**Key Principles:** |
| 95 | +- Code is ground truth - report what exists |
| 96 | +- Be specific - reference exact file:line |
| 97 | +- Distinguish fact from inference |
| 98 | +- Flag feature toggles and dormant code |
| 99 | +- **Stay in lane** - don't infer WHY |
| 100 | + |
| 101 | +**What NOT to include:** |
| 102 | +- ❌ Internal data models (implementation detail) |
| 103 | +- ❌ Missing/planned features (belongs in roadmap) |
| 104 | +- ❌ Code quality judgments |
| 105 | +- ❌ Specific versions (too volatile) |
| 106 | +- ❌ Testing infrastructure details |
| 107 | + |
| 108 | +**Applied To:** `generate-codebase-context` Phase 3 (Code Analysis) |
| 109 | + |
| 110 | +--- |
| 111 | + |
| 112 | +### 4. Information Analyst Pattern |
| 113 | +**File:** [`information-analyst.md`](./information-analyst.md) |
| 114 | +**Source:** Existing research file (cataloged) |
| 115 | +**Purpose:** Specialized agent for extracting WHY from documentation |
| 116 | + |
| 117 | +**Primary Job:** Extract decision rationale from docs (not discoverable from code) |
| 118 | + |
| 119 | +**Responsibilities:** |
| 120 | +- Discover WHY the system was built this way |
| 121 | +- Extract rationale from documentation |
| 122 | +- Find decision context and trade-offs |
| 123 | +- Capture historical evolution |
| 124 | + |
| 125 | +**What to Look For:** |
| 126 | +- Why was [technology X] chosen? |
| 127 | +- Why [pattern Y] over alternatives? |
| 128 | +- What constraints drove decisions? |
| 129 | +- What trade-offs were considered? |
| 130 | + |
| 131 | +**Key Principles:** |
| 132 | +- Direct quotes for "why" |
| 133 | +- Source everything (path#heading) |
| 134 | +- Attach metadata (timestamps) |
| 135 | +- Flag conflicts, don't resolve |
| 136 | +- Distinguish explicit vs implicit |
| 137 | +- Focus on rationale (unique value) |
| 138 | + |
| 139 | +**Applied To:** `generate-codebase-context` Phase 2 (Documentation Audit) |
| 140 | + |
| 141 | +--- |
| 142 | + |
| 143 | +### 5. Context Bootstrap Pattern |
| 144 | +**File:** [`context_bootstrap.md`](./context_bootstrap.md) |
| 145 | +**Source:** Existing research file (cataloged) |
| 146 | +**Purpose:** Manager orchestration pattern for coordinating specialized agents |
| 147 | + |
| 148 | +**Core Philosophy:** |
| 149 | +> "Code explains HOW the system currently behaves; the user supplies WHAT it is supposed to achieve and WHY choices were made." |
| 150 | + |
| 151 | +**Six-Phase Workflow:** |
| 152 | +1. Analyze repository structure |
| 153 | +2. Audit existing documentation |
| 154 | +3. Deep code analysis (subprocess: Code Analyst) |
| 155 | +4. User collaboration (fill gaps, resolve conflicts) |
| 156 | +5. Draft documentation set (PRDs, ADRs, SYSTEM-OVERVIEW) |
| 157 | +6. Review with user |
| 158 | + |
| 159 | +**Key Pattern:** "Keep dialog interactive. Ask focused follow-up questions instead of long questionnaires." |
| 160 | + |
| 161 | +**Deliverables:** |
| 162 | +- PRDs (Product Requirements) |
| 163 | +- ADRs (Architecture Decision Records in MADR format) |
| 164 | +- SYSTEM-OVERVIEW.md |
| 165 | +- README.md updates |
| 166 | + |
| 167 | +**Applied To:** Overall `generate-codebase-context` structure and phasing |
| 168 | + |
| 169 | +--- |
| 170 | + |
| 171 | +## How Research Was Applied |
| 172 | + |
| 173 | +### Phase 1 (Completed) ✅ |
| 174 | + |
| 175 | +**Enhanced `generate-codebase-context` Prompt:** |
| 176 | + |
| 177 | +From **code-analyst.md:** |
| 178 | +- ✅ File:line evidence citations for all code findings |
| 179 | +- ✅ Confidence levels (High/Needs Validation/Unknown) |
| 180 | +- ✅ "Stay in your lane" - don't infer WHY from code |
| 181 | +- ✅ Flag feature toggles and dormant paths |
| 182 | +- ✅ Technology names only (NO versions) |
| 183 | +- ✅ Focus on working features, not missing ones |
| 184 | + |
| 185 | +From **information-analyst.md:** |
| 186 | +- ✅ Documentation audit phase (scan + timestamp + inventory) |
| 187 | +- ✅ Rationale extraction with direct quotes |
| 188 | +- ✅ Source references with path#heading format |
| 189 | +- ✅ Conflict detection between docs |
| 190 | +- ✅ Distinguish explicit vs implicit knowledge |
| 191 | + |
| 192 | +From **context_bootstrap.md:** |
| 193 | +- ✅ Repository structure detection (workspace/monorepo/single) |
| 194 | +- ✅ User collaboration phase (interactive, not batch) |
| 195 | +- ✅ Capture user answers as direct quotes for citation |
| 196 | + |
| 197 | +From **Claude Code feature-dev:** |
| 198 | +- ✅ Essential files list with line ranges (5-10 files) |
| 199 | +- ✅ Execution path traces (step-by-step flows) |
| 200 | +- ✅ Interactive short questions (not batch questionnaires) |
| 201 | + |
| 202 | +--- |
| 203 | + |
| 204 | +### Phase 2 (Planned for Next PR) |
| 205 | + |
| 206 | +**Enhancements Planned:** |
| 207 | + |
| 208 | +1. **`generate-spec` Enhancement:** |
| 209 | + - Mandatory clarifying phase (Claude Code Phase 3) |
| 210 | + - Phased interactive questioning (context_bootstrap pattern) |
| 211 | + - WHY questions (information-analyst focus) |
| 212 | + |
| 213 | +2. **`generate-architecture-options` (NEW):** |
| 214 | + - Based on Claude Code code-architect agent |
| 215 | + - Generate 2-3 approaches with trade-offs |
| 216 | + - User must choose before proceeding |
| 217 | + |
| 218 | +3. **`review-implementation` (NEW):** |
| 219 | + - Based on Claude Code code-reviewer agent |
| 220 | + - Multi-focus review (bugs, quality, conventions) |
| 221 | + - Confidence-based filtering (≥80%) |
| 222 | + |
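The confidence-based filtering planned for `review-implementation` could look roughly like the sketch below. It assumes each review issue carries a 0-100 confidence score; the `ReviewIssue` type, its field names, and `filter_issues` are hypothetical and not taken from the Claude Code plugin.

```python
from dataclasses import dataclass
from typing import List

# Assumption: confidence is an integer percentage, matching the ">=80%" rule above.
CONFIDENCE_THRESHOLD = 80


@dataclass
class ReviewIssue:
    """Hypothetical shape of a single review finding."""
    description: str
    focus: str        # e.g. "bugs", "quality", "conventions"
    confidence: int   # 0-100


def filter_issues(issues: List[ReviewIssue],
                  threshold: int = CONFIDENCE_THRESHOLD) -> List[ReviewIssue]:
    """Keep issues at or above the threshold, most confident first."""
    kept = [issue for issue in issues if issue.confidence >= threshold]
    return sorted(kept, key=lambda issue: issue.confidence, reverse=True)


issues = [
    ReviewIssue("Unchecked null return", "bugs", 92),
    ReviewIssue("Variable name is unclear", "quality", 55),
    ReviewIssue("Missing error handling", "bugs", 80),
]
reported = filter_issues(issues)
```

Here `reported` contains only the two issues scored 80 or higher, so low-confidence noise never reaches the user.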
| 223 | +See [`../../PROGRESS.md`](../../PROGRESS.md) for detailed roadmap. |
| 224 | + |
| 225 | +--- |
| 226 | + |
| 227 | +## Key Insights |
| 228 | + |
| 229 | +### 1. Separation of Concerns |
| 230 | +**Discovery:** Code, docs, and users each provide different information |
| 231 | + |
| 232 | +- **Code → WHAT + HOW:** Features, architecture, patterns (observable facts) |
| 233 | +- **Docs → WHY:** Decisions, rationale, trade-offs (recorded intent) |
| 234 | +- **User → Goals + Intent:** Purpose, value, strategic fit (current direction) |
| 235 | + |
| 236 | +**Application:** Don't conflate these sources - keep them separate and clearly attributed |
| 237 | + |
| 238 | +--- |
| 239 | + |
| 240 | +### 2. Evidence-Based Analysis |
| 241 | +**Discovery:** Every claim needs proof |
| 242 | + |
| 243 | +- Code findings: `file.ts:45-67` (line ranges) |
| 244 | +- Doc findings: `doc.md#heading` (section anchors) |
| 245 | +- User input: `[User confirmed: YYYY-MM-DD]` (dated quotes) |
| 246 | + |
| 247 | +**Application:** Traceability and accountability for all findings |
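As an illustration, the three citation formats above can be checked mechanically. This is a minimal sketch: the regular expressions and the `classify_citation` helper are assumptions made for this example, not part of the prompts themselves.

```python
import re
from typing import Optional

# Hypothetical patterns for the three citation styles listed above.
CODE_CITATION = re.compile(r"^(?P<path>[^\s:#]+):(?P<start>\d+)(?:-(?P<end>\d+))?$")
DOC_CITATION = re.compile(r"^(?P<path>[^\s:#]+)#(?P<heading>[^\s#]+)$")
USER_CITATION = re.compile(r"^\[User confirmed: (?P<date>\d{4}-\d{2}-\d{2})\]$")


def classify_citation(text: str) -> Optional[str]:
    """Return 'code', 'doc', or 'user' for a well-formed citation, else None."""
    text = text.strip()
    match = CODE_CITATION.match(text)
    if match:
        start = int(match.group("start"))
        end = int(match.group("end") or start)
        # Line numbers are 1-based and the range must not be reversed.
        return "code" if 1 <= start <= end else None
    if DOC_CITATION.match(text):
        return "doc"
    if USER_CITATION.match(text):
        return "user"
    return None
```

A finding whose citation returns `None` can then be flagged as unsupported instead of being reported as fact.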
| 248 | + |
| 249 | +--- |
| 250 | + |
| 251 | +### 3. Confidence Assessment |
| 252 | +**Discovery:** Distinguish facts from inferences |
| 253 | + |
| 254 | +- High: Strong evidence from working code or explicit docs |
| 255 | +- Medium: Inferred from context, feature flags, or implied behavior |
| 256 | +- Low: Cannot determine, conflicts, unknowns |
| 257 | + |
| 258 | +**Application:** Flag gaps explicitly rather than guessing |
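One way to make these levels concrete is sketched below. The mapping rule (no evidence means Low, inferred evidence means Medium, direct evidence means High) is a simplifying assumption for illustration, and the `Finding` type and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Confidence(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass
class Finding:
    """Hypothetical record for one analysis claim."""
    claim: str
    evidence: str    # e.g. "file.ts:45-67" or "doc.md#heading"; empty if none
    inferred: bool   # True when the claim is deduced rather than observed


def assess(finding: Finding) -> Confidence:
    """Assumed rule: no evidence -> Low, inferred -> Medium, direct -> High."""
    if not finding.evidence:
        return Confidence.LOW
    return Confidence.MEDIUM if finding.inferred else Confidence.HIGH


def open_gaps(findings: List[Finding]) -> List[str]:
    """List the claims that should be raised with the user rather than guessed."""
    return [f.claim for f in findings if assess(f) is Confidence.LOW]
```

Calling `open_gaps` before drafting documentation gives an explicit list of unknowns to take into the user collaboration phase.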
| 259 | + |
| 260 | +--- |
| 261 | + |
| 262 | +### 4. Interactive Collaboration |
| 263 | +**Discovery:** Short, focused conversations work better than long questionnaires |
| 264 | + |
| 265 | +- Ask 3-5 questions, wait for answers |
| 266 | +- Use answers to inform next round of questions |
| 267 | +- Capture direct quotes for later citation |
| 268 | + |
| 269 | +**Application:** Better engagement, more thoughtful answers |
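The round-based questioning described above might be driven by a loop like the following sketch. The `plan_round` and `ask` callbacks, the cap of five questions, and the default of three rounds are all assumptions for illustration.

```python
from typing import Callable, Dict, List

MAX_QUESTIONS_PER_ROUND = 5  # upper end of the "3-5 questions" guideline


def run_rounds(plan_round: Callable[[Dict[str, str]], List[str]],
               ask: Callable[[str], str],
               max_rounds: int = 3) -> Dict[str, str]:
    """Ask questions in short rounds; each round is planned from earlier answers."""
    answers: Dict[str, str] = {}
    for _ in range(max_rounds):
        # The planner sees every answer so far and proposes the next questions.
        # Anything beyond the cap is left for the planner to re-offer later.
        questions = plan_round(answers)[:MAX_QUESTIONS_PER_ROUND]
        if not questions:
            break
        for question in questions:
            # Store the reply verbatim so it can be quoted and cited later.
            answers[question] = ask(question)
    return answers
```

In a real session `ask` would block on user input; in tests it can be a simple function that returns canned replies.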
| 270 | + |
| 271 | +--- |
| 272 | + |
| 273 | +### 5. Mandatory Checkpoints |
| 274 | +**Discovery:** Critical decisions need explicit user approval |
| 275 | + |
| 276 | +- ⛔ STOP after clarifying questions (don't proceed without answers) |
| 277 | +- ⛔ STOP after architecture options (user must choose) |
| 278 | +- ⛔ STOP after implementation review (user decides what to fix) |
| 279 | + |
| 280 | +**Application:** User control at key decision points |
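A mandatory checkpoint can be modeled as a gate that refuses to continue until the user has responded. The sketch below is illustrative only: the `Checkpoint` names, the `Workflow` class, and the exception type are hypothetical.

```python
from enum import Enum
from typing import Dict


class Checkpoint(Enum):
    CLARIFYING_QUESTIONS = "clarifying_questions"
    ARCHITECTURE_CHOICE = "architecture_choice"
    REVIEW_TRIAGE = "review_triage"


class CheckpointNotApproved(Exception):
    """Raised when a workflow tries to pass a gate the user has not approved."""


class Workflow:
    def __init__(self) -> None:
        self._responses: Dict[Checkpoint, str] = {}

    def approve(self, checkpoint: Checkpoint, user_response: str) -> None:
        """Record the user's explicit response; blank responses do not count."""
        if not user_response.strip():
            raise ValueError("an explicit user response is required")
        self._responses[checkpoint] = user_response

    def require(self, checkpoint: Checkpoint) -> None:
        """Stop here unless the user has already approved this checkpoint."""
        if checkpoint not in self._responses:
            raise CheckpointNotApproved(checkpoint.value)

    def response_for(self, checkpoint: Checkpoint) -> str:
        return self._responses[checkpoint]
```

Each later phase would call `require` on the preceding checkpoint first, so skipping a user decision fails loudly instead of silently proceeding.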
| 281 | + |
| 282 | +--- |
| 283 | + |
| 284 | +## Success Metrics |
| 285 | + |
| 286 | +### Phase 1 Metrics ✅ |
| 287 | +- ✅ 100% of code findings have file:line citations |
| 288 | +- ✅ 100% of findings categorized by confidence level |
| 289 | +- ✅ Documentation audit phase included |
| 290 | +- ✅ Interactive questioning approach (3-5 questions per round) |
| 291 | +- ✅ Essential files list structure (5-10 files with ranges) |
| 292 | +- ✅ Execution path traces included in examples |
| 293 | + |
| 294 | +### Phase 2 Metrics (Target) |
| 295 | +- [ ] Clarifying questions are mandatory (cannot proceed without) |
| 296 | +- [ ] Architecture options always present 2-3 approaches |
| 297 | +- [ ] User explicitly chooses architecture before tasks |
| 298 | +- [ ] Review catches common issues before PR |
| 299 | +- [ ] All prompts use consistent evidence standards |
| 300 | + |
| 301 | +--- |
| 302 | + |
| 303 | +## References |
| 304 | + |
| 305 | +### External Sources |
| 306 | +- [Claude Code Repository](https://github.com/anthropics/claude-code) |
| 307 | +- [Feature-Dev Plugin](https://github.com/anthropics/claude-code/tree/main/plugins/feature-dev) |
| 308 | +- [Feature-Dev README](https://github.com/anthropics/claude-code/blob/main/plugins/feature-dev/README.md) |
| 309 | +- [Code Explorer Agent](https://github.com/anthropics/claude-code/blob/main/plugins/feature-dev/agents/code-explorer.md) |
| 310 | +- [Code Architect Agent](https://github.com/anthropics/claude-code/blob/main/plugins/feature-dev/agents/code-architect.md) |
| 311 | +- [Code Reviewer Agent](https://github.com/anthropics/claude-code/blob/main/plugins/feature-dev/agents/code-reviewer.md) |
| 312 | +- [MADR Format](https://adr.github.io/madr/) |
| 313 | + |
| 314 | +### Internal Documents |
| 315 | +- [Progress Tracking](../../PROGRESS.md) |
| 316 | +- [Main README](../../../README.md) |
| 317 | + |
| 318 | +--- |
| 319 | + |
| 320 | +## Next Steps |
| 321 | + |
| 322 | +1. **Review Phase 1 PR:** `add-reverse-engineer-codebase-prompt` branch |
| 323 | +2. **Plan Phase 2 PR:** After Phase 1 merge |
| 324 | +3. **Implement remaining enhancements:** Per roadmap in PROGRESS.md |
| 325 | + |
| 326 | +--- |
| 327 | + |
| 328 | +**Research Status:** Complete and applied to Phase 1 |
| 329 | +**Next Research:** None planned - focus on implementation |
| 330 | +**Last Updated:** 2025-01-21 |