Rule: Never move to the next stage until every success criterion in the current stage passes.
Goal: Repo structure, tooling, containerized infrastructure, multi-tenant schema.
-
1.1 — Monorepo Setup & Tooling
- `go build ./...` compiles
- `cd api/gateway && pnpm install && pnpm build` succeeds
- Structure matches layout
- README mentions B2B, 50M/day scale, and high-throughput patterns
- Every Go package has doc.go
- Both Go binaries start and shut down gracefully
- Gateway responds to GET /health
1.2 — Docker & Infrastructure
- `make docker-up` — all services healthy, including TigerBeetle and PgBouncer
- PgBouncer accepts connections on 6433, 6434, 6435
- TigerBeetle responds on port 3001
- NATS JetStream enabled
- Application env vars point to PgBouncer, not raw Postgres
- `make docker-reset` gives a clean slate
1.3 — Database Migrations & SQLC Setup
- All migrations apply cleanly
- 6 monthly partitions created per partitioned table
- Tenant isolation: UNIQUE constraints are per-tenant
- Capacity comments in migration files
- SQLC generates and compiles
- Seed creates both demo tenants
Goal: Domain types, dual-backend ledger, settlement engine, in-memory treasury, smart router.
2.1 — Domain Types & Interfaces
- All tests pass (56/56), zero infrastructure imports
- Ledger interface documented for dual-backend
- TreasuryManager uses Reserve/Release (not Lock/Unlock)
- All types include TenantID
- Coverage 81.7% (>80%)
2.2 — Settla Ledger (Dual Backend)
- `go test ./ledger/... -v -race` — all 22 tests pass
- PostEntries writes to TigerBeetle, not Postgres
- GetBalance reads from TigerBeetle (authoritative)
- GetEntries reads from Postgres (query layer)
- Sync consumer populates Postgres from TigerBeetle
- Idempotency works end-to-end
- Write batching reduces round-trips (batch test confirms fewer TB calls)
- System degrades gracefully if Postgres read-side is down
2.3 — Settla Core (Settlement Engine)
- All tests pass with `-race`
- Tenant validation enforced
- Uses Reserve/Release (not Lock)
- Ledger entries use tenant account codes
- `go list` confirms no imports of concrete modules
2.4 — Settla Treasury (In-Memory Reservation)
- `go test ./treasury/... -v -race` — all pass
- Reserve takes <1μs (benchmark)
- 10,000 concurrent reserves: no over-reservation
- Complete tenant isolation
- Background flush writes to DB
- Crash recovery works (restart from DB state)
2.5 — Settla Rail & Mock Providers
- Routes sorted by score, insufficient liquidity filtered
- Different tenants get different fees
- Mock providers support GBP↔NGN corridor via USDT
- All tests pass with `-race`
Goal: Real blockchain transactions on testnets. Fiat simulated, crypto real.
2.5.1 — Wallet Management (WP-1 through WP-3)
- BIP-44 HD derivation: Tron, Solana, Ethereum, Base
- AES-256-GCM key encryption at rest
- System wallets (`system/{chain}/hot`) and tenant wallets (`tenant/{slug}/{chain}`)
- Faucet integration: Tron Nile (automated), Solana Devnet (automated), Sepolia/Base (manual)
- Private keys never appear in logs or error messages
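The at-rest encryption criterion maps directly onto Go's standard library. A minimal sketch, assuming a 32-byte master key and a nonce-prefixed ciphertext layout (the layout is an assumption, not the project's actual format):

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// sealKey encrypts a private key with AES-256-GCM. The random nonce is
// prepended to the ciphertext so decryption is self-contained.
func sealKey(masterKey, privKey []byte) ([]byte, error) {
	block, err := aes.NewCipher(masterKey) // 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, privKey, nil), nil
}

// openKey reverses sealKey; GCM authenticates as well as decrypts.
func openKey(masterKey, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(masterKey)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	master := make([]byte, 32)
	rand.Read(master)
	sealed, _ := sealKey(master, []byte("illustrative-private-key"))
	plain, _ := openKey(master, sealed)
	fmt.Println(string(plain))
}
```

GCM's authentication tag also means a tampered ciphertext fails to decrypt rather than yielding a corrupted key.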
2.5.2 — Blockchain Clients (WP-4 through WP-7)
- Tron Nile client: TRX + TRC20 balance, send, get tx, subscribe
- Ethereum Sepolia + Base Sepolia client: ETH/ERC20, gas estimation, nonce management
- Solana Devnet client: SOL + SPL token transfers, ATA creation
- Blockchain registry: `GetClient(chain)`, RPC failover, circuit breaker
- Explorer URL generation for all four testnets
2.5.3 — Settla Provider: FX Oracle & Fiat Simulator (WP-8, WP-9)
- FX oracle: NGN/GBP/EUR/GHS/USD with ±0.15% jitter, cross rates, thread-safe
- Fiat simulator: collection (PENDING → PROCESSING → COLLECTED) + payout (PAYOUT_INITIATED → COMPLETED)
- Per-currency delays: NGN 3–5s, GBP 5–10s, USD 10–30s, EUR/GHS 5–10s
- Configurable failure rate (default 2%)
2.5.4 — On-Ramp Provider (WP-10)
- `ID() → "settla-onramp"`, fiat → stablecoin pairs (GBP/NGN/USD/EUR/GHS → USDT/USDC)
- 30bps spread + minimum fee applied to quotes
- Async flow: fiat collection → real blockchain send → `GetStatus` polling
- Explorer URL in all transaction metadata
- USDT defaults to Tron, USDC defaults to Ethereum/Base
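The 30bps-plus-minimum-fee pricing above is simple integer math. A sketch in minor units; the minimum-fee value here is an assumed floor, not the project's configured one:

```go
package main

import "fmt"

// quote applies a 30bps spread to the amount and enforces a minimum fee,
// then converts the net amount at the mid-rate.
func quote(amountMinor int64, midRate float64) (outMinor, feeMinor int64) {
	const spreadBps = 30
	const minFeeMinor = 50 // assumed floor, e.g. £0.50
	fee := amountMinor * spreadBps / 10_000
	if fee < minFeeMinor {
		fee = minFeeMinor
	}
	net := amountMinor - fee
	return int64(float64(net) * midRate), fee
}

func main() {
	out, fee := quote(100_000, 1.27) // £1,000.00 GBP at an illustrative 1.27 rate
	fmt.Println(out, fee)            // fee is 300 minor units: 30bps of 100,000
}
```

Keeping amounts in minor units and applying the rate last avoids accumulating float error across the fee arithmetic.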
2.5.5 — Off-Ramp Provider (WP-11)
- `ID() → "settla-offramp"`, stablecoin → fiat pairs (USDT/USDC → GBP/NGN/USD/EUR/GHS)
- 30bps spread applied (rate < 1 for provider profit on stablecoin→fiat)
- Async flow: crypto receipt verification → fiat payout simulation
- Falls back to simulated receipt when RPC unavailable (graceful degradation)
- Deposit address (system hot wallet) returned on Execute
- Explorer URL in all transaction metadata
2.5.6 — Provider Registry & Router Integration (WP-12, WP-13)
- `SETTLA_PROVIDER_MODE` env var: `mock | testnet | live`
- Registry wires Settla on/off-ramp based on mode
- Transfer API response includes `blockchain_transactions` with explorer URLs
- Router includes explorer URLs in `RouteInfo`
2.5.7 — Testnet Setup & Makefile (WP-14)
- `make testnet-setup` initialises and funds wallets
- `make provider-mode-mock` / `make provider-mode-testnet`
- `.env.example` updated with all testnet variables
- `docker-compose.yml` updated with testnet env vars
Goal: Partitioned NATS for parallel processing, Redis with local cache.
3.1 — Settla Node (Partitioned NATS Workers)
- Events partitioned by tenant hash
- Same tenant's events always route to same partition
- Different tenants processed in parallel
- Full saga works through partitioned routing
- Dev mode: single instance handles all partitions
3.2 — Redis & Local Cache
- Local cache auth lookup <1μs (benchmark) — 107ns measured
- Two-level cache: local → Redis → DB
- Rate limits approximate but correct over 5-second windows
- Tenant isolation on all cache operations
Goal: Fastify gateway with local tenant cache, gRPC connection pool, per-tenant webhooks.
4.1 — Protocol Buffers & gRPC
- `make proto` generates Go + TypeScript
- gRPC server starts with high-throughput config
- All tenant-scoped RPCs include tenant_id
4.2 — Settla API Gateway (Fastify)
- gRPC connection pool working (not per-request)
- Auth resolves from local cache in <1ms on cache hit
- Tenant isolation verified
- Response serialization uses schema (not JSON.stringify)
- OpenAPI spec valid
4.3 — Webhook Dispatcher
- Correct tenant's URL and HMAC secret
- Retry and dead letter work
- Worker pool handles concurrent delivery
Goal: Ops console with capacity monitoring, per-tenant metrics.
5.1 — Settla Dashboard
- Capacity page shows live throughput metrics
- TigerBeetle write rate visible
- Treasury flush lag visible
- NATS partition queue depths visible
- Per-tenant volume vs limit
5.2 — Observability
- Structured logging: slog (Go) with JSON/text handler, pino (TS) — service, version, tenant_id on every log
- Prometheus metrics: Go (settla-server :8080/metrics, settla-node :9091/metrics), TS (gateway :3000/metrics, webhook :3001/metrics)
- TigerBeetle write metrics (settla_ledger_tb_writes_total, _write_latency, _batch_size)
- Treasury reservation latency metric (settla_treasury_reserve_latency_seconds, sub-microsecond buckets)
- Treasury flush metrics (settla_treasury_flush_lag_seconds, _flush_duration)
- Treasury balance/locked gauges per tenant/currency/location
- PG sync lag metric (settla_ledger_pg_sync_lag_seconds)
- NATS partition metrics (settla_nats_messages_total, _partition_queue_depth)
- Transfer metrics (settla_transfers_total, _transfer_duration_seconds) with tenant/status/corridor labels
- Provider metrics (settla_provider_requests_total, _latency_seconds)
- gRPC interceptor metrics (settla_grpc_requests_total, _request_latency_seconds)
- Gateway HTTP metrics (settla_gateway_requests_total, _request_duration_seconds, auth cache hits/misses)
- Webhook delivery metrics (settla_webhook_deliveries_total, _delivery_duration_seconds)
- Docker: Prometheus (prom/prometheus:v2.51.0, :9092) + Grafana (grafana:10.4.1, :3002)
- 5 provisioned Grafana dashboards: Overview, Capacity Planning, Treasury Health, API Performance, Tenant Health
- No PII in logs; metrics use low-cardinality labels
Goal: Wire everything, E2E tests, demo, capacity documentation.
6.1 — End-to-End Integration
- Both corridors work end-to-end (GBP→NGN, NGN→GBP)
- TigerBeetle receives ledger writes, Postgres has synced data
- Treasury reservations work under concurrent load
- Complete tenant isolation
- Per-tenant fees and limits enforced
- 100 concurrent transfers: no over-reservation
- Import boundaries enforced
6.2 — Demo Script & Documentation
- `make demo` runs all 5 scenarios
- Burst scenario shows concurrent handling
- README leads with B2B positioning and 50M/day scale
- Capacity planning doc has real math
- All 13 ADRs present with threshold-driven reasoning
Goal: Prove 50M txn/day with measured results. Numbers for README and articles.
7.1 — Component Benchmarks (Go)
- `make bench` runs all benchmarks and produces `bench-results.txt`
- All targets met (threshold comparison script shows all PASS — 76/76)
- Treasury Reserve ~1.5-2μs measured (>500K/sec, 100x above 5K TPS needed)
- Ledger batch throughput measured with mock TB (real TB: 1M+ TPS)
- Concurrent reservation: no over-reservation detected
- All benchmarks include allocation reporting (`-benchmem`)
- Results reproducible across runs (targets set with variance headroom)
7.2 — Integration Load Tests
- `make loadtest-quick` completes in <5 minutes, all checks pass
- Peak load (5,000 TPS): sustained for 10 min with p99 <50ms
- Post-test verification: all consistency checks pass
- Live dashboard shows real-time metrics during test
- Report generated with throughput, latency percentiles, error rates
- No goroutine leaks after test completion
- Single tenant flood: no over-reservation detected
7.3 — Soak Test & Profiling
- `make soak-short` (15 min) passes all stability checks
- No memory leaks detected (RSS growth <50MB)
- No goroutine leaks (count stable ±5%)
- No PgBouncer connection exhaustion
- p99 latency degradation <20% from baseline
- Report generated with all metrics
- Profile comparison shows stable CPU/heap patterns
7.4 — Chaos Testing
- TigerBeetle restart: no money lost, transfers fail/refund cleanly
- Postgres pause: system continues, catches up after recovery
- NATS restart: no duplicates, all transfers complete eventually
- Redis down: transfers still work (degraded caching)
- Server crash: recovery from DB state, no over-reservation
- PgBouncer saturation: queues but doesn't crash
- ALL scenarios: ledger balanced, treasury consistent after recovery
7.5 — Benchmark Report & Capacity Documentation
- `make report` generates a complete benchmark report
- All sections show measured data (not estimates)
- Extrapolation math is sound (measured peak → daily capacity)
- README updated with real numbers
- Capacity planning doc has measured vs required comparison
- Report is reproducible
git clone && cp .env.example .env # 1. Clean clone
make build # 2. Build
make docker-up && sleep 25 # 3. Infrastructure
make migrate-up && make db-seed # 4. Database + tenants
make test # 5. Unit tests
make test-integration # 6. Integration tests
make bench # 7. Component benchmarks
make loadtest-quick # 8. Load test (quick)
make soak-short # 9. Soak test (short)
make chaos # 10. Chaos tests
make report # 11. Full benchmark report
make demo # 12. Demo
# 13. API verification (curl gateway)
# 14. Tenant isolation proof (cross-tenant 404)
# 15. Observability (Prometheus metrics)
# 16. Dashboard (capacity page)
make lint && go test -race ./... # 17. Code quality
# 18. Module boundaries (no core→concrete imports)