Welcome to the refined SPARC-SAPPO Agentic Development Framework, a highly structured system for AI-assisted software engineering built on Roo Code. This framework leverages meticulous planning, ontological guidance, a robust Targeted TDD cycle, and strategic AI research to deliver high-quality, cost-effective results using Anthropic's Claude 3.7 Sonnet. Inspired by Reuven Cohen's original SPARC/Boomerang concepts, this evolution emphasizes Micro-Tasking and Research-Driven Development, and is backed by my custom-made Software Architecture Problem Prediction Ontology (SAPPO).
Key Pillars:
- SPARC Principles: Core symbolic guidelines (Clarity, Extensibility, Testing, Collaboration) embedded within agent instructions.
- SAPPO Ontology: A custom ontology (:Problem, :Solution, :Context, :ArchitecturalPattern, :TechnologyVersion) explicitly used by agents to structure tasks, anticipate issues, and document decisions.
- Micro-Task Orchestration & "Boomerang" TDD Cycle: A central ⚡️ SAPPO Orchestrator breaks down plans into single micro-tasks. Crucially, it manages an immediate Code -> Test -> Fix -> Re-Test cycle for every implementation unit, driven by explicit PASS/FAIL reporting from the Tester.
- Tiered Research-Driven Development (RDD): Specialist agents strategically use Perplexity MCP tools (search, get_documentation, etc.) based on defined tiers (MUST/SHOULD/MAY/DO NOT USE), optimizing API usage and ensuring research is applied only when truly necessary.
- Targeted Test-Driven Development (TDD): A dedicated 🎯 Tester (@tester-core) employs a precise two-part strategy within the immediate TDD cycle:
  - Core Logic Testing: Focused unit tests verifying the internal correctness of the newly implemented code, including specific checks for :RecursiveAlgorithm patterns (base cases, steps, edge cases targeting :LogicError or :StackOverflowError).
  - Contextual Integration Testing: Using :Context provided by the Orchestrator (e.g., feature, interacting components), the tester writes a small number of focused tests verifying only the most critical, direct interactions of the new code with its immediate collaborators. This catches integration issues like :InterfaceMismatch early.
- Unified LLM, Dual-Mode Strategy (Claude 3.7 Sonnet): Utilizes Anthropic's Claude 3.7 Sonnet (~200k token context window) with distinct temperature settings:
- 🧠 Thinking Mode (Temp ~0.7): For planning, design, architecture, complex reasoning.
- 🛠️ Instruct Mode (Temp ~0.25): For precise code, test generation, debugging following instructions.
The Core Idea: Provide a detailed plan. The Orchestrator (Thinking Mode) delegates one micro-task, framed with SAPPO. The Specialist (e.g., Coder - Instruct Mode) executes, using Tiered RDD for targeted research. The Tester (@tester-core - Instruct Mode) then immediately applies the Targeted Testing Strategy (Core Logic + Contextual Integration based on Orchestrator-provided context). Based on PASS/FAIL, the Orchestrator either initiates a fix micro-task (Coder/Debugger -> Tester) or proceeds with the plan. Broader regression testing is handled by the Integrator later. This methodology drastically reduces the need for massive context windows, making development robust, fast, AND cost-effective. We prioritize structure and focused testing over excessive context, proving that ~200k tokens, used wisely, is highly effective and economical.
👆 Click the image above for a walkthrough! (Note: This video demonstrates the previous "Dual Testing" strategy. While setup remains similar, the testing methodology has evolved to the "Targeted Testing Strategy" described here for faster, more focused feedback in the immediate loop. Key principles of planning and micro-tasking remain.) This README reflects the current Targeted Testing approach.
Leveraging Anthropic's Claude 3.7 Sonnet (~200k context) with role-based temperature control:
- 🧠 Thinking Mode (Temperature: ~0.7)
  - Model: Claude 3.7 Sonnet
  - Purpose: Planning, design, architecture, specs, documentation, complex reasoning.
  - Agents: Orchestrator, Spec Writer, Architect, Docs Writer, Ask Guide, Security Reviewer (uses edit).
- 🛠️ Instruct Mode (Temperature: ~0.25)
  - Model: Claude 3.7 Sonnet
  - Purpose: Precise code/test generation, debugging, integration, operations following strict guidance.
  - Agents: Coder, Tester, Debugger, Integrator, Optimizer, DevOps.
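Since profiles are switched by hand, a small lookup can help you check which profile an agent expects. The sketch below is only an illustration of the agent-to-mode split listed above: the `Profile` class and `profile_for` helper are hypothetical names, and Roo Code does not read this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    temperature: float


# Profile names mirror the two profiles this README asks you to create;
# temperatures are the approximate values quoted above.
THINKING = Profile("claude-3.7-sonnet-thinking", 0.7)
INSTRUCT = Profile("claude-3.7-sonnet-instruct", 0.25)

THINKING_AGENTS = {
    "Orchestrator", "Spec Writer", "Architect",
    "Docs Writer", "Ask Guide", "Security Reviewer",
}
INSTRUCT_AGENTS = {
    "Coder", "Tester", "Debugger", "Integrator", "Optimizer", "DevOps",
}


def profile_for(agent: str) -> Profile:
    """Return the profile to select manually before this agent runs."""
    if agent in THINKING_AGENTS:
        return THINKING
    if agent in INSTRUCT_AGENTS:
        return INSTRUCT
    raise KeyError(f"Unknown agent: {agent}")


print(profile_for("Coder").name)
```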
Why Claude 3.7 Sonnet & the ~200k Window Strategy (Cost-Efficiency by Design):
- Peak Capability, Optimized Cost: Claude 3.7 Sonnet offers leading performance within a generous yet manageable ~200k context.
- Methodology Trumps Raw Context: This framework intentionally avoids multi-million token dependency. Our rigorous Targeted TDD cycle acts as the immediate quality gate. Combined with micro-tasking, focused RDD, and later comprehensive integration tests by @integrator, vast context becomes methodologically unnecessary and economically wasteful for the core loop.
- The Targeted TDD Advantage: Core Logic testing ensures unit correctness. Contextual Integration testing validates key interactions early, guided by Orchestrator context. The LLM doesn't need to reread everything constantly in the fastest loop; the targeted tests confirm the unit works and fits its immediate purpose, while @integrator verifies broader system health later.
- Dramatic Cost Savings: Extremely large context windows incur significantly higher API costs, because input is billed per token and every call re-sends its full context (potentially $0.30 to $2.00+ per call). Our micro-tasking and Targeted TDD keep Instruct calls lean (often 50k-150k tokens, focusing on the unit + immediate context), staying well below the 200k limit and slashing LLM expenses without sacrificing initial quality checks.
- Structure Over Size: Success hinges on your detailed plan and the framework's structured execution (SAPPO, Targeted TDD Cycle), enabling Claude 3.7 Sonnet to excel within its efficient ~200k window. Rigor beats brute force.
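To make the cost argument concrete, here is a back-of-the-envelope sketch of input cost per call. The default price of $3 per million input tokens is an assumption for illustration only; check your provider's current price list. Output tokens are ignored, and `input_cost_usd` is a hypothetical helper.

```python
def input_cost_usd(input_tokens: int, usd_per_million_tokens: float = 3.0) -> float:
    """Estimate the input-side cost of one call (assumed price, output ignored)."""
    return input_tokens * usd_per_million_tokens / 1_000_000


# Compare lean micro-task calls with a hypothetical million-token call.
for tokens in (50_000, 150_000, 1_000_000):
    print(f"{tokens:>9,} input tokens -> ${input_cost_usd(tokens):.2f}")
```

Under that assumed price, a 50k-150k token call costs roughly $0.15-$0.45 of input, while a single million-token call would cost about $3.00, and the gap compounds over the many calls of a Code -> Test -> Fix loop.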
- Ontology Mandate: SAPPO terms (:Problem, :Solution, :TechnologyVersion, :Context, etc.) structure tasks and completion summaries.
- Granularity: Orchestrator breaks plans into minimal, single logical units.
- Specialization: Agents collaborate via the Orchestrator, using appropriate Claude 3.7 modes.
- Strategic Perplexity Use: Specialist agents use MCP tools (search, get_documentation, etc.) based on clear tiers:
  - MUST USE: For critical unknowns, unfamiliar APIs, complex errors, vulnerability checks.
  - SHOULD USE: For best practices, specific technology/pattern nuances.
  - MAY USE: For examples, broader context understanding.
  - DO NOT USE: For basic knowledge, straightforward tasks.
- Optimized & Justified Research: Avoids unnecessary API calls while ensuring accuracy when needed.
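The tier rules above can be read as a simple lookup from "why do I want to research?" to a tier. The sketch below is one possible encoding, not the rules shipped in the .roomodes file: the reason tags and the `research_tier` helper are hypothetical, and defaulting unknown reasons to DO NOT USE is an assumption that keeps research explicitly justified.

```python
from enum import Enum


class Tier(Enum):
    MUST_USE = "MUST USE"
    SHOULD_USE = "SHOULD USE"
    MAY_USE = "MAY USE"
    DO_NOT_USE = "DO NOT USE"


# Hypothetical reason tags derived from the tier descriptions above.
TIER_BY_REASON = {
    "critical_unknown": Tier.MUST_USE,
    "unfamiliar_api": Tier.MUST_USE,
    "complex_error": Tier.MUST_USE,
    "vulnerability_check": Tier.MUST_USE,
    "best_practice": Tier.SHOULD_USE,
    "technology_nuance": Tier.SHOULD_USE,
    "example": Tier.MAY_USE,
    "broader_context": Tier.MAY_USE,
    "basic_knowledge": Tier.DO_NOT_USE,
    "straightforward_task": Tier.DO_NOT_USE,
}


def research_tier(reason: str) -> Tier:
    """Map a research reason to its tier; unknown reasons get DO NOT USE (assumption)."""
    return TIER_BY_REASON.get(reason, Tier.DO_NOT_USE)


def research_allowed(reason: str) -> bool:
    """True when the tier permits an MCP call at all."""
    return research_tier(reason) is not Tier.DO_NOT_USE


print(research_tier("unfamiliar_api").value)
```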
- Immediate Testing Cycle ("Boomerang"): 🎯 Tester (@tester-core) runs immediately after Coder completion.
- Focused Validation (Targeted Strategy):
  - Core Logic Testing: Unit tests verify the implemented code's internal correctness, including specific handling for patterns like :RecursiveAlgorithm (base/step/edge cases targeting relevant SAPPO :Problems like :LogicError, :StackOverflowError).
  - Contextual Integration Testing: Orchestrator provides :Context (e.g., 'Part of Feature X', 'Consumed by Service Y'). Tester writes a few critical tests verifying direct interactions based on this context, preventing early integration :Problems (:InterfaceMismatch, :CompatibilityIssue) related to the unit's role.
- PASS/FAIL Driven Workflow: Tester's explicit PASS/FAIL result for the targeted tests dictates the Orchestrator's next step (initiate fix or proceed).
- Later Comprehensive Checks: The 🔗 Integrator agent runs broader, potentially system-wide tests after a unit has successfully passed its targeted TDD cycle and is ready for merge.
- Plan Integration: Your plan should anticipate these targeted testing steps and provide necessary :Context hints for the Orchestrator to pass to the Tester.
- User Responsibility: Success requires a well-structured, phased plan. The AI executes your strategy. Clearly defining expected interactions helps the Orchestrator provide good :Context for testing.
- Methodology > Model Size: Good planning + strong TDD (Targeted loop + Comprehensive integration checks) are essential for quality and cost-efficiency, regardless of LLM context size.
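As an illustration of the two-part Targeted Testing Strategy, the sketch below tests a made-up unit whose :Context says it reports changes to an audit collaborator. Everything here is hypothetical (`update_profile`, the `record` method, the test names); it only shows the shape of a Core Logic test next to a Contextual Integration test that mocks the immediate collaborator.

```python
from unittest.mock import Mock


def update_profile(profile: dict, changes: dict, audit_service) -> dict:
    """Hypothetical unit under test: merge changes and report them to an audit collaborator."""
    if not changes:
        return profile  # nothing changed, so nothing is audited
    updated = {**profile, **changes}
    audit_service.record(user_id=profile["id"], changed=sorted(changes))
    return updated


def test_core_logic():
    # Core Logic: the unit's own behaviour, including an edge case.
    assert update_profile({"id": 1, "name": "A"}, {"name": "B"}, Mock()) == {"id": 1, "name": "B"}
    assert update_profile({"id": 1}, {}, Mock()) == {"id": 1}


def test_contextual_integration():
    # Contextual Integration: only the direct interaction named in the :Context.
    # spec=["record"] makes any other attribute access fail, surfacing an interface mismatch.
    audit = Mock(spec=["record"])
    update_profile({"id": 1, "name": "A"}, {"name": "B"}, audit)
    audit.record.assert_called_once_with(user_id=1, changed=["name"])

    idle_audit = Mock(spec=["record"])
    update_profile({"id": 1}, {}, idle_audit)
    idle_audit.record.assert_not_called()


def run_targeted_tests() -> str:
    """Return the explicit PASS/FAIL string the Orchestrator acts on."""
    try:
        test_core_logic()
        test_contextual_integration()
    except AssertionError as exc:
        return f"FAIL: {exc}"
    return "PASS"


print(run_targeted_tests())
```

Note how the contextual test stays small: it checks one call to one collaborator and leaves system-wide behaviour to @integrator.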
1. You: Provide a detailed plan to the Orchestrator, including hints about component interactions (:Context).
2. Orchestrator (Claude 3.7 Sonnet @ ~0.7): Identifies next micro-task, frames with SAPPO, delegates (e.g., new_task @coder Implement X... using :RecursiveAlgorithm... CONTEXT: Part of 'User Profile Update', sends data to :AuditService.).
3. Coder (Claude 3.7 Sonnet @ ~0.25): Receives task, performs Tiered RDD (if needed), implements the single unit.
4. Coder: Reports attempt_completion (mentions pattern, SAPPO considerations, RDD usage). States "Code complete, ready for immediate targeted testing via @tester-core. Returning control to Orchestrator."
5. Orchestrator: Immediately assigns testing task with context: new_task @tester-core Apply TARGETED TESTING STRATEGY to X. CONTEXT: Part of 'User Profile Update', sends data to :AuditService. Test core logic (esp. recursion if any) AND context (interaction w/ :AuditService). Report PASS/FAIL.
6. Tester (@tester-core, Claude 3.7 Sonnet @ ~0.25): Writes/executes Targeted Tests: (1) Core Logic tests (unit tests, recursive checks). (2) Contextual Integration tests (mocks :AuditService, checks if X calls it correctly based on context).
7. Tester: Reports attempt_completion with explicit "Targeted testing complete. Result: PASS" or "Result: FAIL [details on Core/Contextual test failure...]", confirms strategy application.
8. Orchestrator (Analyzes Test Result):
   - IF PASS: Proceeds to the next step in your plan (e.g., documentation, pass to integrator).
   - IF FAIL: Initiates fix cycle: assigns fix task to @coder or @debugger (new_task @coder Fix :InterfaceMismatch in X identified by Contextual Integration test...), awaits fix attempt_completion, then loops back to Step 5 to re-assign testing to @tester-core with the same context.
9. Cycle Repeats until the plan phase involving implementation/unit testing is complete. Later steps might involve @integrator for merging and broader tests.
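The workflow above boils down to a small control loop. The sketch below models it with plain callables standing in for the agents; it is not how Roo Code dispatches tasks. `run_micro_task` is a hypothetical name, and `max_fix_attempts` is an assumption, since the workflow itself does not state a retry cap.

```python
from typing import Callable, List, Tuple


def run_micro_task(
    task: str,
    context: str,
    coder: Callable[[str], str],
    tester: Callable[[str, str], Tuple[bool, str]],
    max_fix_attempts: int = 3,  # assumed cap; not specified by the framework
) -> bool:
    """Code -> Test -> Fix -> Re-Test for a single unit. True means PASS."""
    artifact = coder(task)
    for attempt in range(max_fix_attempts + 1):
        passed, details = tester(artifact, context)  # same context on every re-test
        if passed:
            return True
        if attempt < max_fix_attempts:
            artifact = coder(f"Fix {details} in: {task}")
    return False


# Stub agents: the first implementation fails its contextual test, the fix passes.
coder_calls: List[str] = []


def fake_coder(task: str) -> str:
    coder_calls.append(task)
    return f"artifact-{len(coder_calls)}"


def fake_tester(artifact: str, context: str) -> Tuple[bool, str]:
    if artifact == "artifact-1":
        return False, ":InterfaceMismatch"
    return True, ""


result = run_micro_task("Implement X", "sends data to :AuditService", fake_coder, fake_tester)
print("PASS" if result else "FAIL", coder_calls)
```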
- 🏛️ Structured & Predictable: SAPPO + Micro-tasking + Targeted TDD cycle create discipline.
- 💡 Accurate & Efficient Research: Tiered RDD ensures up-to-date info without waste.
- ✅ Robust & Fast Feedback: Targeted TDD catches core logic and critical integration issues immediately. Comprehensive checks by @integrator ensure broader stability before release.
- 🧠 Optimized AI Usage: Leverages Claude 3.7 Sonnet's strengths via Thinking/Instruct modes within its efficient ~200k window.
- 💰 MASSIVELY Cost-Effective: Methodology (Micro-tasks + Targeted TDD loop) makes >200k context unnecessary for the rapid iteration cycle. Keeps API calls lean (often 50k-150k Instruct), slashing LLM costs versus huge context models. Avoid budget blowouts!
- 📈 Superior Quality: Leads to modular, rigorously unit-tested and integration-aware, secure, documented code.
- 🤝 Clear Collaboration: Explicit SAPPO terms, Targeted TDD cycle steps, Context passing, and PASS/FAIL reports improve traceability.
- SPARC-SAPPO Agent Army (.roomodes): JSON definitions with SAPPO/Targeted TDD/RDD instructions, Thinking/Instruct roles, Tiered MCP rules. (You are reading the description for V3 with Targeted TDD).
- SPARC Syntax: Underlying symbolic principles. See Overview.
- SAPPO Ontology: Conceptual framework.
- Perplexity Research Tools (MCP): Integrated via search, get_documentation, etc., used according to tiers.
- Unified LLM Engine (Claude 3.7 Sonnet): Configured with two temperature profiles (~200k context limit).
User Plan Snippet: "Phase 3: Implement recursive file search for report generator. Step 3.1: Specify findFilesRecursive. Step 3.2: Code findFilesRecursive. CONTEXT: Consumed by :ReportGeneratorService, must handle '.txt' files only. Step 3.3: Apply Targeted TDD. Step 3.4: Document."
⚡️ Orchestrator (Claude @ 0.7): "new_task @spec-writer Specify findFilesRecursive(dir, pattern) using :RecursiveAlgorithm. Detail base case, recursive step. Target: .txt files. Potential :StackOverflowError :Problem. Define TDD anchors (Core Logic: base, recursion, edge cases like symlinks; Contextual Integration: check filter logic for .txt)."
(Later) 📝 Spec Writer (Claude @ 0.7): "attempt_completion Spec written. Notes :RecursiveAlgorithm, targets .txt files, potential :StackOverflowError. Provides TDD Anchors: Core Logic: empty dir, nested .txt find, symlink loop avoidance. Contextual Integration: Ensure only .txt files are returned when called with various inputs. Used MCP search (SHOULD USE tier) for robust symlink loop detection patterns. Returning."
⚡️ Orchestrator (Claude @ 0.7): "new_task @coder Implement findFilesRecursive per spec... Use :NodeJS v20... Handle symlink loops (:LogicError). Add depth limit (:StackOverflowError mitigation). Ensure filter for .txt pattern passed in."
(Later) 🧠 Coder (Claude @ 0.25): "attempt_completion Implemented findFilesRecursive. Used :RecursiveAlgorithm, added depth limit, visited set for symlinks, respects input pattern (expects .txt). Used MCP get_documentation (MUST USE tier) for fs.promises.readdir withFileTypes option. Code complete, ready for immediate targeted testing via @tester-core. Returning."
⚡️ Orchestrator (Claude @ 0.7): "new_task @tester-core Apply TARGETED TESTING STRATEGY to findFilesRecursive. CONTEXT: Must return only '.txt' files for :ReportGeneratorService. TEST: (1) CORE LOGIC (base/step/edges/recursion). (2) CONTEXTUAL INTEGRATION (verify .txt filter works, interacts as expected by context). Report PASS/FAIL."
(Later) 🎯 Tester (@tester-core, Claude @ 0.25): "attempt_completion Applied Targeted Testing Strategy. Context: .txt filter for :ReportGeneratorService. (1) Core Logic Testing: Base case (empty dir) passed. Recursive step (nested file find) passed. Edge case (symlink loop) passed. (2) Contextual Integration Testing: Test with pattern *.txt correctly returned only .txt files. Test with other patterns returned empty list as expected per context. Result: PASS. All targeted tests passed. Returning control."
(If FAIL: ... Result: FAIL. Contextual Integration test 'test_txt_filter_ignores_other_extensions' failed. Returned '.log' files unexpectedly. Suspected :LogicError in pattern filtering... Returning control. The orchestrator would then trigger the fix cycle.)
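For readers who want to see what such a unit and its targeted checks could look like, here is a rough Python analogue of the example. It is not the Coder's actual output (the plan above targets Node.js v20), and the names `find_files_recursive`, `suffix`, `max_depth`, and `demo` are assumptions; a plain suffix match stands in for the pattern argument.

```python
import os
import tempfile


def find_files_recursive(directory, suffix=".txt", max_depth=32, _depth=0, _visited=None):
    """Return sorted paths under `directory` whose file names end with `suffix`."""
    visited = set() if _visited is None else _visited
    real = os.path.realpath(directory)
    # Base cases: depth limit exceeded, or directory already seen (symlink loop).
    if _depth > max_depth or real in visited:
        return []
    visited.add(real)
    found = []
    with os.scandir(real) as entries:
        for entry in entries:
            if entry.is_dir():  # follows symlinks, so loops are stopped by `visited`
                found.extend(
                    find_files_recursive(entry.path, suffix, max_depth, _depth + 1, visited)
                )
            elif entry.name.endswith(suffix):
                found.append(entry.path)
    return sorted(found)


def demo():
    """Build a throwaway tree and return (empty-dir result, .txt names, .md result)."""
    with tempfile.TemporaryDirectory() as root:
        empty = os.path.join(root, "empty")
        sub = os.path.join(root, "sub")
        os.mkdir(empty)
        os.mkdir(sub)
        for path in (
            os.path.join(root, "a.txt"),
            os.path.join(root, "b.log"),
            os.path.join(sub, "c.txt"),
        ):
            open(path, "w").close()
        try:
            # Symlink loop: sub/loop points back at the root directory.
            os.symlink(root, os.path.join(sub, "loop"), target_is_directory=True)
        except (OSError, NotImplementedError):
            pass  # symlinks may be unavailable on some platforms
        names = [os.path.basename(p) for p in find_files_recursive(root)]
        return find_files_recursive(empty), names, find_files_recursive(root, suffix=".md")


print(demo())
```

The first element covers the Core Logic base case (empty directory), the second covers the recursive step and the symlink-loop edge case, and the third is the Contextual Integration check that non-matching files are never returned.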
Ready for structured, Targeted TDD-powered, cost-effective AI development?
- VS Code + Roo Code extension.
- Anthropic API Key (Claude 3.7 Sonnet access).
- Perplexity API Key.
- Perplexity MCP Setup (RDD Tools):
  - Use cline (recommended): Install "Perplexity AI" provider, enter key, copy MCP URL/Header.
  - Paste URL/Header into Roo Code's Perplexity MCP settings.
- Dual-Mode LLM Configuration (CRITICAL!):
  - Create TWO model profiles in Roo Code settings, BOTH pointing to your Anthropic Claude 3.7 Sonnet endpoint:
    - Profile 1 (Thinking): Name: claude-3.7-sonnet-thinking. Endpoint: [Your Claude 3.7 Sonnet Endpoint]. Default Temperature: ~0.7.
    - Profile 2 (Instruct): Name: claude-3.7-sonnet-instruct. Endpoint: [Your Claude 3.7 Sonnet Endpoint]. Default Temperature: ~0.25.
  - Why two profiles? To easily switch between high-temp Thinking (Orchestrator, Architect, Spec Writer, etc.) and low-temp Instruct (Coder, Tester, Debugger, etc.) as required by the framework roles, optimizing Claude 3.7 Sonnet's performance within its ~200k context.
  - Manual Switching Required: The .roomodes file does NOT auto-assign profiles. You MUST manually select the correct profile (thinking or instruct) in the Roo Code UI before invoking or allowing delegation to an agent. Select thinking for Orchestrator interactions, switch to instruct when an Instruct agent (Coder, Tester) takes over, then switch back to thinking when control returns to Orchestrator. Diligence is key here.
- Install SPARC-SAPPO RooModes:
  - Copy the entire JSON object from the *.roomodes file source representing the Targeted Testing Strategy (V3).
  - Go to Roo Code settings -> "Edit Global Modes".
  - Paste the JSON, replacing existing content. Save.
  - Reload Roo Code (Roo Code: Reload Custom Modes or restart VS Code).
- Open a Roo Code chat.
- Manually select the claude-3.7-sonnet-thinking profile.
- Select the ⚡️ SAPPO Orchestrator mode.
- Provide your detailed, phased plan (mentioning testing needs, :Context hints for interaction, recursive parts).
- Observe the micro-tasking and Targeted Boomerang TDD Cycle.
- Remember to manually switch profiles (Thinking <-> Instruct) based on which agent is active. Keep Instruct context minimal, focused on the current micro-task + immediate context.
- Use ❓ Ask Guide (Thinking profile) for help structuring plans or understanding the Targeted TDD/cost-efficiency rationale.
The symbolic syntax (Φ•Ω, Γ•Μ•Υ, etc.) represents core principles encoded in agent instructions. Key concepts relevant to the Targeted TDD approach:
- Φ•Ω [Core•Flow]: Workflow clarity, systematic extension, code quality (§͟qual), confirmation (⊦confirm).
- Γ•Μ•Υ [Context]: Doc/context-driven action (⊦⟨doc⟩ ⟨ctx⟩ → ⟨action⟩), arch boundaries (⊤⟨arch⟩), tech management (⊢{...} ⊥newΔ). Orchestrator passes necessary :Context for targeted testing.
- Τ•Ρ [Tasks]: Micro-tasks (⊦⟨micro⟩), tiered RDD for uncertainty (‼️Ρ{pMCP tiers}→{search|...}✓findings), self-verify (⊗self{...}→⊦complete).
- Κ•Σ [Code]: Best practices (⊢{bestPractice}), conventions (≡⟨conventions⟩), modularity (⊤⟨module⟩), size limits (⟨file⟩≤350Λ), DRY (¬⟨duplication⟩).
- Χ [Refactor]: Improve code based on specific triggers (⋉{...}), verify via tests (initially @tester-core for targeted checks, potentially @integrator later) (⊨{⋈intact}⇒{⊦tester-core}).
- Δ [Testing]: Test-driven (⊢⟨test→code⟩), targeted coverage (🎯⟨coverage:core+context⟩), completion gated by passing targeted tests (‼️✓⟨tests pass:targeted⟩→⊦complete). Includes core recursive validation. Later comprehensive testing via @integrator.
- Β [Debug]: SAPPO root cause (⊙{⟨root⟩:SAPPO}), precise logs (⊢⟨log⟩). Supports Targeted TDD cycle.
- Ξ [Security]: Server-logic (⊤{server-logic}), validation/sanitization (‼️⊤⟨validate⟩), no hardcoded secrets (¬⟨hardcode⟩).
- Ψ•Ε [VCS•Env]: Git usage (⊤⟨git⟩), env-agnostic code (⟨code⟩→⟨agnostic⟩).
- Λ [Docs]: Accurate (⊤⟨mirror-reality⟩), including Targeted Test strategy explanation. Aids context recall.
- Θ [Limits]: File size (⟨file⟩≤350Λ), abstract credentials (¬⟨credentials⟩).
Builds heavily on Reuven Cohen's SPARC methodology and 'Boomerang Tasks' concept.
- Reference: 🪃 Boomerang Tasks by Reuven Cohen
- SAPPO is a custom ontology for this framework.
MIT License - see LICENSE file.
Found this framework valuable? Consider supporting its development.
Your support helps offset API costs (even with efficient models/methods) and allows for further refinement.
Thank you!
- Roo Code Docs
- Perplexity API Docs
- Anthropic API Docs (Claude)
- Software Architecture Problem Prediction Ontology (SAPPO)
- cline MCP Installer
- Requestly (Useful for API mocking/testing)
- MCP Standard
Embrace structured, ontology-driven, Targeted TDD-powered, and cost-effective AI-accelerated development. Plan meticulously, execute precisely, test smartly!
