Time: 10 minutes | Level: Beginner | Prerequisites: Installation complete
Welcome! In this quickstart, you will build a calculator module using Test-Driven Development (TDD) with Babysitter. By the end, you will have experienced:
- Automatic quality convergence (iterate until quality target met)
- The TDD workflow (tests first, then implementation)
- Journal-based persistence (everything is recorded)
Note: TDD Quality Convergence is the full name; we use "TDD" as shorthand throughout this guide.
Let's get started!
You will build a simple calculator module with:
- add(a, b) - Add two numbers
- subtract(a, b) - Subtract two numbers
- multiply(a, b) - Multiply two numbers
- divide(a, b) - Divide two numbers (with error handling)
The result will include:
- Working implementation
- Test suite with multiple test cases
- 80%+ quality score achieved through automatic iteration
If you haven't already, configure your personal preferences:
/babysitter:user-install
This personalizes Babysitter for your workflow: breakpoint frequency, communication style, and expertise areas.
In your project directory:
/babysitter:project-install
This analyzes your codebase and configures project-specific settings.
Quick check that everything is working:
# In your terminal
npx -y @a5c-ai/babysitter-sdk@latest --version
# Or run diagnostics
/babysitter:doctor
You should see a version number. If not, revisit the installation guide.
Navigate to your project directory (or create a new one):
# Create a new project directory
mkdir my-babysitter-project
cd my-babysitter-project
# Initialize npm (optional but recommended)
npm init -y
Open Claude Code in your project directory and enter this command:
/babysitter:call create a calculator module with add, subtract, multiply, and divide functions using TDD with 80% quality target
Alternative (natural language):
Use the babysitter skill to build a calculator module with TDD and 80% quality target
Babysitter will start and show output like:
Creating new babysitter run: calculator-20260125-143012
Process: TDD Quality Convergence
Target Quality: 80%
Run ID: 01KFFTSF8TK8C9GT3YM9QYQ6WG
Run Directory: .a5c/runs/01KFFTSF8TK8C9GT3YM9QYQ6WG/
Babysitter is now orchestrating your TDD workflow!
Sit back and observe as Babysitter works through the TDD methodology:
[Phase 1] Research
- Analyzing project structure... done
- Checking existing patterns... done
- Identifying test framework... done
Babysitter examines your codebase to understand the context.
[Phase 2] Specifications
- Defining calculator interface...
- Specifying test cases...
- Creating implementation plan...
Specifications complete:
- 4 functions defined
- 12 test cases planned
- Jest test framework selected
Babysitter creates a clear specification before coding.
This is the core of the workflow. Babysitter iterates until the quality score meets the target:
[Iteration 1/5] Starting TDD implementation...
Writing tests:
- add.test.js: 3 test cases
- subtract.test.js: 3 test cases
- multiply.test.js: 3 test cases
- divide.test.js: 3 test cases (including error handling)
Implementing code:
- calculator.js: add, subtract, multiply, divide functions
Quality checks:
- Tests: 11/12 passing
- Coverage: 75%
- Linting: 2 warnings
Quality Score: 72/100 (target: 80)
Below target, continuing...
[Iteration 2/5] Refining implementation...
Fixes:
- Fixed divide by zero test
- Improved edge case handling
- Resolved lint warnings
Quality checks:
- Tests: 12/12 passing
- Coverage: 92%
- Linting: 0 warnings
Quality Score: 88/100 (target: 80)
Target achieved!
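The loop in the output above can be sketched as follows. This is a minimal illustration of the idea, not Babysitter's actual implementation: `converge` and `runIteration` are hypothetical names, and the scores are simply the two values from the sample output (72, then 88).

```javascript
// Hypothetical sketch of a quality-convergence loop (not Babysitter's real code).
// `runIteration` stands in for "write tests, implement, run quality checks"
// and returns a quality score between 0 and 100.
function converge(runIteration, { target = 80, maxIterations = 5 } = {}) {
  const history = [];
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const score = runIteration(iteration);
    history.push({ iteration, score });
    if (score >= target) {
      // Stop as soon as the target is met.
      return { achieved: true, iterations: iteration, finalScore: score, history };
    }
  }
  // Iteration budget exhausted without reaching the target.
  const last = history[history.length - 1];
  return {
    achieved: false,
    iterations: history.length,
    finalScore: last ? last.score : null,
    history,
  };
}

// Simulated scores taken from the sample output: 72 on iteration 1, 88 on iteration 2.
const sampleScores = [72, 88];
const result = converge((i) => sampleScores[i - 1], { target: 80, maxIterations: 5 });
console.log(result.achieved, result.iterations, result.finalScore);
```

The loop has two exits: the score reaches the target, or the iteration budget runs out. That is why the sample run reports "Iterations: 2 of 5".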
When Babysitter completes, you'll see a summary:
Run completed successfully!
Summary:
- Iterations: 2 of 5
- Final Quality Score: 88/100
- Test Coverage: 92%
- Tests: 12 passing
- Duration: 3m 45s
Files created:
- calculator.js
- calculator.test.js
Run ID: 01KFFTSF8TK8C9GT3YM9QYQ6WG
Journal: .a5c/runs/01KFFTSF8TK8C9GT3YM9QYQ6WG/journal/*.json
Check your project directory:
ls -la
You should see new files:
calculator.js # Your calculator implementation
calculator.test.js # Test suite
.a5c/ # Babysitter run data
Open calculator.js:
// calculator.js - Created by Babysitter TDD workflow
function add(a, b) {
  return a + b;
}

function subtract(a, b) {
  return a - b;
}

function multiply(a, b) {
  return a * b;
}

function divide(a, b) {
  if (b === 0) {
    throw new Error('Cannot divide by zero');
  }
  return a / b;
}

module.exports = { add, subtract, multiply, divide };
Run the tests:
npm test
# or
npx jest
Expected output:
PASS ./calculator.test.js
Calculator
add
✓ adds two positive numbers
✓ adds negative numbers
✓ adds zero
subtract
✓ subtracts two numbers
...
Test Suites: 1 passed, 1 total
Tests: 12 passed, 12 total
Every action Babysitter took is recorded in the journal. Let's explore:
# View the journal files
ls .a5c/runs/01KFFTSF8TK8C9GT3YM9QYQ6WG/journal/*.json
Sample events from journal JSON files:
{"type":"RUN_CREATED","recordedAt":"2026-01-25T14:30:12Z","data":{"runId":"01KFFTSF8TK8C9GT3YM9QYQ6WG"},"checksum":"a1b2c3"}
{"type":"EFFECT_REQUESTED","recordedAt":"2026-01-25T14:30:13Z","data":{"effectId":"research-001","effectType":"agent"},"checksum":"d4e5f6"}
{"type":"EFFECT_RESOLVED","recordedAt":"2026-01-25T14:30:38Z","data":{"effectId":"research-001","duration":25000},"checksum":"g7h8i9"}
{"type":"EFFECT_REQUESTED","recordedAt":"2026-01-25T14:31:00Z","data":{"effectId":"tdd-impl-001","effectType":"agent","iteration":1},"checksum":"j0k1l2"}
{"type":"EFFECT_RESOLVED","recordedAt":"2026-01-25T14:33:00Z","data":{"effectId":"tdd-impl-001","iteration":1},"checksum":"m3n4o5"}
{"type":"EFFECT_REQUESTED","recordedAt":"2026-01-25T14:33:01Z","data":{"effectId":"tdd-impl-002","effectType":"agent","iteration":2},"checksum":"p6q7r8"}
{"type":"EFFECT_RESOLVED","recordedAt":"2026-01-25T14:34:30Z","data":{"effectId":"tdd-impl-002","iteration":2},"checksum":"s9t0u1"}
{"type":"RUN_COMPLETED","recordedAt":"2026-01-25T14:34:45Z","data":{"status":"success"},"checksum":"v2w3x4"}
This is the audit trail: every effect request and every resolution is recorded. The five SDK event types are: RUN_CREATED, EFFECT_REQUESTED, EFFECT_RESOLVED, RUN_COMPLETED, and RUN_FAILED.
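Because each journal event is plain JSON, you can analyze a run with a few lines of code. The sketch below counts events by type and pairs each EFFECT_REQUESTED with its EFFECT_RESOLVED to estimate how long each effect took. The helper names (`countByType`, `effectDurations`) are hypothetical, and the events are a trimmed copy of the sample above. Loading them from the journal directory is left out because the on-disk layout may differ from what this guide shows.

```javascript
// Events copied from the sample journal above, trimmed to the fields used here.
const events = [
  { type: 'RUN_CREATED', recordedAt: '2026-01-25T14:30:12Z', data: { runId: '01KFFTSF8TK8C9GT3YM9QYQ6WG' } },
  { type: 'EFFECT_REQUESTED', recordedAt: '2026-01-25T14:30:13Z', data: { effectId: 'research-001' } },
  { type: 'EFFECT_RESOLVED', recordedAt: '2026-01-25T14:30:38Z', data: { effectId: 'research-001' } },
  { type: 'EFFECT_REQUESTED', recordedAt: '2026-01-25T14:31:00Z', data: { effectId: 'tdd-impl-001' } },
  { type: 'EFFECT_RESOLVED', recordedAt: '2026-01-25T14:33:00Z', data: { effectId: 'tdd-impl-001' } },
  { type: 'RUN_COMPLETED', recordedAt: '2026-01-25T14:34:45Z', data: { status: 'success' } },
];

// Count how many events of each type the journal contains.
function countByType(journal) {
  const counts = {};
  for (const event of journal) {
    counts[event.type] = (counts[event.type] || 0) + 1;
  }
  return counts;
}

// Milliseconds between each effect's request and its resolution, keyed by effectId.
function effectDurations(journal) {
  const requestedAt = new Map();
  const durations = {};
  for (const event of journal) {
    const id = event.data.effectId;
    if (event.type === 'EFFECT_REQUESTED') {
      requestedAt.set(id, Date.parse(event.recordedAt));
    } else if (event.type === 'EFFECT_RESOLVED' && requestedAt.has(id)) {
      durations[id] = Date.parse(event.recordedAt) - requestedAt.get(id);
    }
  }
  return durations;
}

console.log(countByType(events));
console.log(effectDurations(events));
```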
Let's see how easy it is to extend your calculator. Ask Babysitter to add more features:
/babysitter:call add a power function and square root function to the calculator with TDD
Babysitter will:
- Analyze the existing calculator
- Write new tests for power and sqrt
- Implement the new functions
- Iterate until quality is achieved
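To give a feel for the result, here is a hand-written sketch of what the two new functions might look like. The code Babysitter generates may differ. The function names and the error message are assumptions that follow the style of the existing divide function.

```javascript
// Hypothetical additions to calculator.js; generated code may differ.
function power(base, exponent) {
  return base ** exponent;
}

function sqrt(x) {
  // Mirror divide(): reject input that has no real-valued result.
  if (x < 0) {
    throw new Error('Cannot take the square root of a negative number');
  }
  return Math.sqrt(x);
}

console.log(power(2, 10));
console.log(sqrt(9));
```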
Let's recap what Babysitter did for you by comparing the two workflows.
Without Babysitter (manual prompting):
- You: "Claude, write tests for a calculator"
- You: "Now implement the calculator"
- You: "Run the tests... 2 failed. Fix them."
- You: "Check coverage... too low. Add more tests."
- You: "Run tests again... passed!"
- You: (repeat if you want higher quality)
Time: 20-30 minutes with multiple back-and-forth interactions
With Babysitter:
- You: "/babysitter:call create calculator with TDD, 80% quality"
- (Babysitter handles everything automatically)
- Done!
Time: ~5 minutes, hands-free
Key takeaways:
- Quality Convergence: You set an 80% target, and Babysitter iterated until it reached 88%
- TDD Methodology: Tests were written before implementation
- Complete Audit Trail: Every action logged in the journal
- No Context Loss: If interrupted, you can resume exactly where you left off
You just used /babysitter:call, the default interactive mode. Babysitter has four modes, each with a different level of autonomy:
| Mode | Command | When to Use |
|---|---|---|
| Interactive | /babysitter:call | What you just used. Pauses for approval. |
| YOLO | /babysitter:yolo | Full auto. Ship while you sleep. |
| Forever | /babysitter:forever | Never-ending loops for monitoring tasks. |
| Plan | /babysitter:plan | Review the process before executing. |
Try YOLO mode for a trusted task:
/babysitter:yolo add input validation to all form fields
No breakpoints, no questions. Babysitter handles everything autonomously.
Full reference: Slash Commands Reference
One of Babysitter's superpowers is persistence. Let's try it:
/babysitter:call build a REST API for task management with authentication, using TDD with 85% quality target and max 10 iterations
Close Claude Code or press Ctrl+C while it's running.
Open Claude Code again and run:
/babysitter:call resume the babysitter run
or
/babysitter:call resume
Babysitter will:
- Find the interrupted run
- Replay the journal to restore state
- Continue from exactly where it stopped
No work lost!
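Conceptually, replaying a journal means folding its events, in order, into a state object and then looking at what is still unfinished. The sketch below shows the idea using the five event types listed earlier. It is an illustration only, not the SDK's actual replay logic: the `replay` function and the shape of the state object are assumptions.

```javascript
// Hypothetical replay sketch: rebuild run state from an ordered list of events.
function replay(journal) {
  const state = { status: 'unknown', pending: new Set(), resolved: new Set() };
  for (const event of journal) {
    switch (event.type) {
      case 'RUN_CREATED':
        state.status = 'running';
        break;
      case 'EFFECT_REQUESTED':
        state.pending.add(event.data.effectId);
        break;
      case 'EFFECT_RESOLVED':
        state.pending.delete(event.data.effectId);
        state.resolved.add(event.data.effectId);
        break;
      case 'RUN_COMPLETED':
        state.status = 'completed';
        break;
      case 'RUN_FAILED':
        state.status = 'failed';
        break;
    }
  }
  return state;
}

// A journal cut off mid-run: the second effect was requested but never resolved.
const interruptedJournal = [
  { type: 'RUN_CREATED', data: { runId: 'example-run' } },
  { type: 'EFFECT_REQUESTED', data: { effectId: 'research-001' } },
  { type: 'EFFECT_RESOLVED', data: { effectId: 'research-001' } },
  { type: 'EFFECT_REQUESTED', data: { effectId: 'tdd-impl-001' } },
];

const restored = replay(interruptedJournal);
console.log(restored.status, [...restored.pending], [...restored.resolved]);
```

In this sketch, a resumed run would skip `research-001` (already resolved) and pick up at `tdd-impl-001`, which is the behavior "continue from exactly where it stopped" describes.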
Cause: Plugin may not be loaded.
Solution:
- Check that /skills shows "babysit"
- Restart Claude Code if needed
- Verify plugin is enabled:
claude plugin list
Cause: You may have missed the question in the chat or the session timed out.
Solution:
- Scroll up to find the breakpoint question and respond
- Or resume the run:
claude "/babysitter:call resume the babysitter run"
Cause: Target may be too high for the task complexity.
Solution:
- Lower the target (try 70% instead of 90%)
- Increase max iterations:
--max-iterations 10
- Be more specific in your request
Cause: Waiting for breakpoint approval.
Solution:
- Look for a question from Claude in your chat
- Respond to approve and continue the workflow
Congratulations! You've completed your first Babysitter run. Here's what to explore next:
- First Run Deep Dive - Understand exactly what happened in detail
- Try different prompts:
/babysitter:call refactor the calculator for better error handling
/babysitter:call add comprehensive documentation to the calculator
/babysitter:call increase test coverage to 95%
- Explore methodologies:
  - TDD (Test-Driven Development) - what you just used
  - GSD (Get Shit Done) - faster, less formal
  - Spec-Kit - specification-driven development
- Configure breakpoints for approval workflows
- Custom quality targets and scoring criteria
- Parallel execution for faster runs
- Custom process definitions (for power users)
Commands used in this quickstart:
# Start a TDD run with quality target (in Claude Code)
/babysitter:call <description> with TDD and <X>% quality target
# Resume an interrupted run
/babysitter:call resume
# View run journal files
ls .a5c/runs/<runId>/journal/*.json
# List all runs
ls .a5c/runs/
In just 10 minutes, you:
- Built a calculator module with TDD methodology
- Achieved automatic quality convergence (set target, iterate until met)
- Explored the event journal (complete audit trail)
- Learned how to resume interrupted sessions
Babysitter turns complex AI workflows into single commands with deterministic, resumable execution.
Ready to go deeper? Continue to First Run Deep Dive to understand exactly what happened under the hood.