Instructions for AI agents working on the Paperize codebase.
Paperize is a CLI that distills unstructured text sources (notes, markdown, research) into qualified goals via AI. It has an interactive TUI (Ink/React) and a headless mode. Output: JSON, markdown, or YAML.
See CLAUDE.md for build commands and architecture details. See doc/ROADMAP.md for planned features.
- Runtime: Node.js 20+, ESM
- UI: Ink 6 + React 19 (terminal UI)
- Build: esbuild → single dist/cli.mjs bundle
- Test: node:test (built-in runner), no framework deps
- Lint/Format: ESLint 9 flat config + Prettier
- AI: Anthropic Messages API (direct fetch, no SDK)
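
The "direct fetch, no SDK" approach might look roughly like the sketch below. The function names (buildRequest, callClaude), the maxTokens default, and the injectable fetchImpl parameter are assumptions for illustration and are not taken from the codebase; the endpoint and header names follow the public Messages API.

```javascript
// Sketch only: buildRequest and callClaude are hypothetical names.
const API_URL = 'https://api.anthropic.com/v1/messages';

/** Build the fetch arguments for one Messages API call. */
function buildRequest({ apiKey, model, system, userText, maxTokens = 4096 }) {
  return {
    url: API_URL,
    init: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'x-api-key': apiKey,
        'anthropic-version': '2023-06-01',
      },
      body: JSON.stringify({
        model,
        max_tokens: maxTokens, // assumed default, tune per prompt
        system,
        messages: [{ role: 'user', content: userText }],
      }),
    },
  };
}

/** Send the request; fetchImpl is injectable so tests never hit the network. */
async function callClaude(params, fetchImpl = fetch) {
  const { url, init } = buildRequest(params);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`Anthropic API error ${res.status}: ${await res.text()}`);
  }
  const data = await res.json();
  // The response carries a list of content blocks; keep only the text ones.
  return data.content
    .filter((block) => block.type === 'text')
    .map((block) => block.text)
    .join('');
}
```

Passing fetchImpl explicitly is one way to keep tests away from the real API.
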
- JSX for Ink components (.jsx), plain JS for logic (.js)
- No TypeScript: plain JavaScript with JSDoc annotations where helpful
- Functional style: no classes. Logic files export pure functions.
- One component per file; step components are named Step<Name>.jsx
- Logic lives in src/logic/, components in src/components/
- Single entry point: src/cli.jsx routes to <App> (interactive) or runHeadless() (headless)
The interactive app flows through steps. Each step is a React component that calls onComplete(data) to advance:
SOURCE → SCAN → CONTEXT → ANALYZE → GOALS → DONE
State lives in src/app.jsx as useState hooks. No external state management.
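
The step flow above can be pictured as a small pure transition function. This is a sketch under assumptions: nextStep and advance are hypothetical helpers, and the real app.jsx keeps this state in useState hooks rather than in a single object.

```javascript
// Step order as listed above.
const STEPS = ['SOURCE', 'SCAN', 'CONTEXT', 'ANALYZE', 'GOALS', 'DONE'];

/** Return the step that follows `current`; DONE is terminal. */
function nextStep(current) {
  const index = STEPS.indexOf(current);
  if (index === -1) throw new Error(`Unknown step: ${current}`);
  return STEPS[Math.min(index + 1, STEPS.length - 1)];
}

/**
 * Hypothetical helper: store the data a step passed to onComplete(data)
 * under that step's name, then move to the next step.
 */
function advance(state, data) {
  return {
    step: nextStep(state.step),
    results: { ...state.results, [state.step]: data },
  };
}
```

In a component tree, each step's onComplete prop would call something equivalent to advance and the parent would render the component matching the new step.
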
Two modes, auto-selected by total char count:
- Single-shot (< 150K chars): one API call, SINGLE_SHOT_SYSTEM prompt
- Map-reduce (>= 150K chars): buildBatches() splits files into ~100K char batches
  - Phase 1: EXTRACT_SYSTEM prompt per batch (parallel, up to 3)
  - Phase 2: SYNTHESIZE_SYSTEM prompt on merged ideas
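
A minimal sketch of the mode selection, batching, and bounded parallelism described above. The file shape ({ path, content }), the greedy batching strategy, and the names chooseMode, buildBatchesSketch, and mapWithLimit are assumptions; the real buildBatches() in scan.js may split differently.

```javascript
const SINGLE_SHOT_LIMIT = 150_000; // chars, threshold stated above
const BATCH_TARGET = 100_000; // approximate chars per batch
const MAX_PARALLEL = 3; // Phase 1 concurrency stated above

/** Pick the analysis mode from the total character count. */
function chooseMode(files) {
  const total = files.reduce((sum, file) => sum + file.content.length, 0);
  return total < SINGLE_SHOT_LIMIT ? 'single-shot' : 'map-reduce';
}

/** Greedy batching: start a new batch when the next file would exceed the target. */
function buildBatchesSketch(files, target = BATCH_TARGET) {
  const batches = [];
  let current = [];
  let size = 0;
  for (const file of files) {
    if (current.length > 0 && size + file.content.length > target) {
      batches.push(current);
      current = [];
      size = 0;
    }
    current.push(file);
    size += file.content.length;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

/** Run `worker` over items with at most `limit` in flight; results keep input order. */
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function run() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, run));
  return results;
}
```

With these pieces, Phase 1 would be something like mapWithLimit(batches, MAX_PARALLEL, extractBatch), where extractBatch is whatever function sends one batch with the EXTRACT_SYSTEM prompt.
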
Each API call uses callClaudeWithTicker() which wraps the call with a 1-second timer emitting onStatus events for UI progress.
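
One way such a ticker wrapper could be written is sketched below. The name withTicker, its option names, and the assumption that elapsed is reported in whole seconds are illustrative only; the actual callClaudeWithTicker() signature is not documented here.

```javascript
/**
 * Hypothetical sketch: run callFn while emitting a status event on a fixed
 * interval, then emit an 'idle' event so the UI can clear its status line.
 */
async function withTicker(callFn, { phase, label, onStatus, intervalMs = 1000 }) {
  const started = Date.now();
  const emit = () =>
    onStatus({ phase, label, elapsed: Math.round((Date.now() - started) / 1000) });
  emit(); // show the status immediately instead of waiting for the first tick
  const timer = setInterval(emit, intervalMs);
  try {
    return await callFn();
  } finally {
    // Runs on success and on failure, so the timer never leaks.
    clearInterval(timer);
    onStatus({ phase: 'idle', label: '', elapsed: 0 });
  }
}
```

The try/finally matters: if the API call throws, the interval is still cleared and the status is still reset.
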
- All prompts end with "You MUST respond with a valid JSON array"
- Prompts explicitly handle multilingual content (German, English, etc.)
- Response parsing: parseIdeasJson() and parseGoalsJson() extract JSON from markdown fences or raw text
- When 0 ideas are extracted, the raw AI response is logged for debugging
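
The fence-or-raw extraction described above could work along these lines. extractJsonArray is a hypothetical name, and returning an empty array on failure is an assumption; the real parseIdeasJson() and parseGoalsJson() may behave differently.

```javascript
// Build the triple-backtick marker programmatically so this sample can sit
// inside a fenced block without closing it.
const FENCE = '`'.repeat(3);
const FENCE_RE = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE);

/** Extract a JSON array from a model response, fenced or not. */
function extractJsonArray(raw) {
  // Prefer the body of a markdown fence when one is present.
  const fenced = raw.match(FENCE_RE);
  const candidate = fenced ? fenced[1] : raw;
  // Fall back to the outermost [...] span in the remaining text.
  const start = candidate.indexOf('[');
  const end = candidate.lastIndexOf(']');
  if (start === -1 || end <= start) return [];
  try {
    const parsed = JSON.parse(candidate.slice(start, end + 1));
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // caller can log the raw response when nothing was extracted
  }
}
```
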
src/logic/scan.js — recursive file discovery with two paths:
- Node.js path: readdir + readFile (default)
- Shell fallback: find + cat via execSync (when Node.js is blocked by macOS TCC on iCloud/protected dirs)
Key constants: SUPPORTED_EXTENSIONS, SKIP_DIRS, MAX_FILE_SIZE (512 KB).
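
A rough sketch of the Node.js path only (the shell fallback is not shown). The extension and skip-dir values are placeholders, since the real lists live in src/logic/scan.js, and scanSketch with its { path, content } return shape is an assumption.

```javascript
import { readdir, readFile, stat } from 'node:fs/promises';
import { join, extname } from 'node:path';

// Placeholder values: the real constants are defined in src/logic/scan.js.
const SUPPORTED_EXTENSIONS = new Set(['.md', '.txt']);
const SKIP_DIRS = new Set(['node_modules', '.git']);
const MAX_FILE_SIZE = 512 * 1024; // 512 KB, as stated above

/** Recursively collect supported files under `dir` (hypothetical helper). */
async function scanSketch(dir) {
  const found = [];
  const entries = await readdir(dir, { withFileTypes: true });
  for (const entry of entries) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      // Descend unless the directory is on the skip list.
      if (!SKIP_DIRS.has(entry.name)) found.push(...(await scanSketch(full)));
    } else if (
      entry.isFile() &&
      SUPPORTED_EXTENSIONS.has(extname(entry.name).toLowerCase())
    ) {
      const { size } = await stat(full);
      // Oversized files are skipped rather than truncated.
      if (size <= MAX_FILE_SIZE) {
        found.push({ path: full, content: await readFile(full, 'utf8') });
      }
    }
  }
  return found;
}
```
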
The analysis pipeline uses two callback patterns:
- onProgress(line: string): permanent log lines (batch results, phase headers)
- onStatus({ phase, label, elapsed }): ephemeral status (spinner/ticker during API calls). Phase 'idle' clears the status.
Both the Ink UI (StepAnalyze.jsx) and headless mode (headless.js) consume these differently — Ink renders React state, headless uses in-place terminal overwrite.
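
For the headless side, an in-place overwrite consumer might look like the sketch below. createHeadlessReporter is a hypothetical name, and writing to stderr and skipping ephemeral output when the stream is not a TTY are assumptions, not a description of headless.js.

```javascript
/** Hypothetical consumer of onProgress/onStatus for non-interactive runs. */
function createHeadlessReporter(stream = process.stderr) {
  let statusShown = false;
  const clearStatus = () => {
    // "\r" returns to column 0, "\x1b[2K" erases the current line.
    if (statusShown && stream.isTTY) stream.write('\r\x1b[2K');
    statusShown = false;
  };
  return {
    onProgress(line) {
      clearStatus(); // permanent lines must not be glued to a status line
      stream.write(line + '\n');
    },
    onStatus({ phase, label, elapsed }) {
      if (phase === 'idle') return clearStatus();
      if (!stream.isTTY) return; // pipes and CI: skip ephemeral output
      stream.write(`\r\x1b[2K${label} (${elapsed}s)`);
      statusShown = true;
    },
  };
}
```

The reporter's two methods can be passed straight through as the onProgress and onStatus callbacks of the analysis pipeline.
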
- esbuild banner injects #!/usr/bin/env node shebang + createRequire shim for CJS deps
- react-devtools-core aliased to src/shims/empty.js (Ink imports it, not needed in production)
- The bundle is a single ESM file: all deps are inlined
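
The build notes above correspond to esbuild options along these lines. This is a guess at the shape of esbuild.config.mjs, not a copy of it: the target value, the exact banner text, and the omission of any JSX settings are assumptions.

```javascript
// Hypothetical options object; the real esbuild.config.mjs may differ.
const buildOptions = {
  entryPoints: ['src/cli.jsx'],
  outfile: 'dist/cli.mjs',
  bundle: true, // inline all dependencies into the single output file
  platform: 'node',
  format: 'esm',
  target: 'node20',
  // Ink imports react-devtools-core; point it at an empty module instead.
  alias: { 'react-devtools-core': './src/shims/empty.js' },
  banner: {
    js: [
      '#!/usr/bin/env node',
      // Bundled CJS dependencies expect a global-like require() in ESM output.
      "import { createRequire as __createRequire } from 'node:module';",
      'const require = __createRequire(import.meta.url);',
    ].join('\n'),
  },
};
```

The object would then be passed to esbuild, for example with `import { build } from 'esbuild'` followed by `await build(buildOptions)`.
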
- Tests use node:test + node:assert (no external test framework)
- Test files: *.test.js next to the module they test
- Current coverage: scan.test.js (12 tests: scanDirectory, buildDocument, buildBatches, summarizeScan)
- Use tmpdir + real filesystem for scan tests (not mocks)
- For analyze tests: inject a mock callFn or intercept fetch; don't call the real API
paperize/
├── AGENTS.md               # You are here
├── CLAUDE.md               # Claude Code instructions
├── CHANGELOG.md            # Version history
├── LICENSE                 # MIT
├── README.md               # Public-facing docs
├── package.json            # paperize
├── esbuild.config.mjs      # Build config
├── eslint.config.mjs       # Lint config
├── .prettierrc             # Format config
├── .gitignore
├── doc/
│   └── ROADMAP.md          # Feature roadmap (v0.2–v0.5)
├── src/
│   ├── cli.jsx             # Entry point + flag parsing
│   ├── app.jsx             # Ink state machine
│   ├── headless.js         # Non-interactive mode
│   ├── shims/
│   │   └── empty.js        # Empty shim for react-devtools-core
│   ├── components/
│   │   ├── Header.jsx      # Progress bar with step labels
│   │   ├── StepSource.jsx  # Folder input
│   │   ├── StepScan.jsx    # File discovery + strategy preview
│   │   ├── StepContext.jsx # Optional guiding context
│   │   ├── StepAnalyze.jsx # AI analysis with live progress
│   │   ├── StepGoals.jsx   # Multi-select goal reviewer
│   │   └── StepDone.jsx    # Output summary
│   └── logic/
│       ├── scan.js         # File scanner + batching
│       ├── scan.test.js    # Scanner tests (12)
│       └── analyze.js      # AI pipeline (extract → synthesize)
└── dist/
    └── cli.mjs             # Built bundle
Do:
- Keep the single-bundle architecture: everything bundles into one dist/cli.mjs
- Use onProgress/onStatus for any long-running operation
- Test with real filesystem fixtures in tmpdir, not mocks, for scan logic
- Support both interactive and headless modes for every feature
- Handle multilingual content in prompts (user's vault may be mixed German/English)
Don't:
- Add Python dependencies — stay pure Node.js
- Add TypeScript — the project uses plain JS intentionally
- Use external test frameworks (jest, vitest, mocha): node:test is sufficient
- Call the real Anthropic API in tests
- Assume TTY availability — headless mode must work in pipes and CI