
Commit 96fdce0

chore: tidy up readme and docs
1 parent 0fcad13 commit 96fdce0

4 files changed (+269, -2 lines)


README.md

Lines changed: 16 additions & 2 deletions
@@ -149,9 +149,22 @@ This creates:
/q1-hypothesize "Your problem..." # Generate hypotheses
```

-> **Pro tip:** For best results, see [Advanced Setup](docs/advanced.md#agent-configuration) to optimize your AI's understanding of the reasoning process.
+Here is a library of [workflow examples](docs/workflow_example/) that can help you get started.

-## How It Works
+That said, the best way to learn is to dive straight in and get a feel for the flow. Slash commands have a numeric prefix for your convenience.
+
+### Recommended: Add FPF Context to Your Agent Rules
+
+For best results, we highly recommend using the [`CLAUDE.md`](CLAUDE.md) from this repository as a reference for your own project's agent instructions. It's optimized for software engineering work with FPF.
+
+At minimum, copy the **FPF Glossary** section to your:
+- `CLAUDE.md` (Claude Code)
+- `.cursorrules` or `AGENTS.md` (Cursor)
+- Agent system prompts (other tools)
+
+This helps the AI understand FPF concepts like L0/L1/L2 layers, WLNK, R_eff, and the Transformer Mandate without re-explaining them each session.
+
+## How Quint Code Works

Quint Code implements the **[First Principles Framework (FPF)](https://github.com/ailev/FPF)** by Anatoly Levenchuk — a methodology for rigorous, auditable reasoning. The killer feature is turning the black box of AI reasoning into a transparent, evidence-backed audit trail.

@@ -184,6 +197,7 @@ See [docs/fpf-engine.md](docs/fpf-engine.md) for the full breakdown.

## Documentation

+- [Workflow Examples](docs/workflow_example/) — Step-by-step walkthroughs
- [Quick Reference](docs/fpf-engine.md) — Commands and workflow
- [Advanced: FPF Deep Dive](docs/advanced.md) — Theory, glossary, tuning
- [Architecture](docs/architecture.md) — How it works under the hood
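
To make the new "Recommended: Add FPF Context to Your Agent Rules" step above concrete, here is a minimal sketch of the copy step. The clone path and target file names are assumptions; adjust them to wherever this repository lives and to whichever rules file your agent reads.

```bash
# Minimal sketch: seed your agent rules with this repo's CLAUDE.md.
# "path/to/quint-code" is a placeholder for wherever you cloned this repository.
cat path/to/quint-code/CLAUDE.md >> CLAUDE.md      # Claude Code
cat path/to/quint-code/CLAUDE.md >> AGENTS.md      # Cursor / other agents
```

Appending the whole file is the simplest option; copying only the **FPF Glossary** section by hand keeps your rules file smaller.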

docs/workflow_example/README.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
# Workflow Examples

Step-by-step walkthroughs showing Quint in real scenarios.

| Example | Scenario |
|---------|----------|
| [Payment Webhooks](payment-webhooks.md) | Handling unreliable external events |
| [CI/CD Strategy](cicd-strategy.md) | Choosing deployment infrastructure |
| [LLM Pipeline Debugging](llm-pipeline-debugging.md) | Improving ML/AI accuracy with empirical testing |

docs/workflow_example/cicd-strategy.md

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
# Example: Choosing a CI/CD Strategy

Your legacy deployments work — SSH into server, git pull, build in place.
But every deployment is a prayer. No rollbacks, no consistency, no audit trail.

You're building a new service and want to do it right this time.

## The Problem

Current state:
- Git clone via SSH directly to EC2
- Build happens on the server
- No rollback mechanism
- "Works on my machine" is the deployment strategy

Requirements:
- Build/release idempotency
- Cost-effective (no Kubernetes)
- Must scale to other services later
- Private repos, AWS infrastructure

## With Quint

```bash
$ /q1-hypothesize "CICD strategy for new service - no k8s, cost-effective, idempotent"
```

AI generates competing approaches:

| # | Approach | Complexity | Cost | Rollback |
|---|----------|------------|------|----------|
| H1 | GitHub Actions + SSH deploy | Low | Free | Manual |
| H2 | Docker Swarm + ECR | Medium | ~$5/mo registry | Built-in |
| H3 | ECS Fargate | Medium-High | ~$30+/mo | Built-in |
| H4 | Kamal (DHH's tool) | Medium | Free | Built-in |

```bash
$ /q2-verify
```

AI checks constraints:
- **H1 fails:** "No idempotency — same problem as current setup"
- **H3 partial:** "Overkill for B2B fintech with <100 RPS"
- **H4 partial:** "Requires Ruby runtime on deploy machine"
- **H2 passes:** "Native Docker, no external dependencies, handles multi-service"

```bash
$ /q1-add "concerned about blast radius — all services on one Swarm node?"
```

AI generates sub-hypotheses:

| # | Approach | Blast Radius | Cost |
|---|----------|--------------|------|
| H2a | Single Swarm node, all services | High | Low (~$50/mo) |
| H2b | Swarm node per service | Low | Higher (~$150/mo) |
| H2c | Hybrid: Swarm for new, legacy stays | Medium | Medium |

```bash
$ /q3-validate
```

AI researches evidence:

```
H2 — Docker Swarm + ECR

Evidence collected:
1. [Docker docs] Swarm handles 1000s of containers per node — CL2
2. [AWS pricing] ECR ~$0.10/GB, t3.medium ~$30/mo — CL3 (our context)
3. [HN thread] "Swarm dead?" — maintainers confirm active, just feature-complete — CL1

R_eff = 0.68 (limited by HN thread uncertainty)
```

```bash
$ /q4-audit
```

```
Bias check: Initial preference was Kamal (H4) — rejected on Ruby dependency

Trust scores:
- H2b (Swarm per service): R_eff = 0.68

Cost analysis:
- Staging: Single Swarm node acceptable (blast radius OK for non-prod)
- Production: Swarm per service for isolation

Recommendation: H2b for prod, H2a for staging. Revisit if RPS > 1000.
```

```bash
$ /q5-decide
```

```
Decision: Docker Swarm + ECR

Architecture:
- Staging: Single t3.medium, all services in Swarm
- Production: t3.medium per service, Swarm mode
- Registry: AWS ECR (private, ~$5/mo)
- CI: GitHub Actions (free tier sufficient)

Rationale:
- Idempotent deploys via image tags
- Rolling updates with automatic rollback
- No Ruby/external dependencies
- Migration path for legacy services

Review trigger: If any service needs >1000 RPS or HA across AZs
```
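
To ground "idempotent deploys via image tags" and "rolling updates with automatic rollback", here is a rough sketch of what a single deploy step could look like under H2b. The account ID, region, image and service names, and the `GIT_SHA` variable are placeholders, and running this as a script rather than a GitHub Actions job is an assumption.

```bash
# Hypothetical deploy step for one service (all names and IDs are placeholders).
set -euo pipefail

REGISTRY="123456789012.dkr.ecr.us-east-1.amazonaws.com"   # placeholder account/region
IMAGE="${REGISTRY}/my-service:${GIT_SHA}"                 # GIT_SHA assumed to be set by CI

# Authenticate Docker against the private ECR registry.
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin "$REGISTRY"

# Build and push an image tagged by commit: re-running the deploy for the
# same commit produces the same release, which is the idempotency requirement.
docker build -t "$IMAGE" .
docker push "$IMAGE"

# Rolling update on the Swarm manager; Swarm rolls back automatically on failure.
docker service update \
  --image "$IMAGE" \
  --update-failure-action rollback \
  --with-registry-auth \
  my-service
```

Because the tag is the commit SHA rather than `latest`, any node can pull the exact release that was decided on, and a failed rolling update reverts to the previous tag on its own.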
When a new developer asks: *"Why aren't we using Kubernetes? Everyone uses it."*

```bash
$ /q-query "CICD architecture decision"
```

```
Decision: 2024-02-15 — Docker Swarm selected over ECS/K8s

Key evidence:
- B2B fintech, <100 RPS typical load
- K8s operational overhead not justified
- Swarm provides 90% of benefits at 10% complexity
- Cost: ~$80/mo vs ~$300/mo for ECS

Recommendation: Revisit only if:
- Multi-AZ HA required
- RPS exceeds 1000 sustained
- Team grows beyond 5 engineers
```

**The decision survives team changes. No tribal knowledge required.**

docs/workflow_example/payment-webhooks.md

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
# Example: Handling Payment Confirmations

Your checkout works. Stripe charges the card.
But three weeks later, finance finds $12,000 in "ghost payments" —
customers charged but never got access.

The webhook endpoint returned 200. Logs look clean.
What went wrong?

## Without Quint

Your AI suggests: *"Just add a webhook endpoint that activates the subscription"*

You ship it. It works in testing. Production looks fine.

Until it doesn't. Webhooks fail silently. Your endpoint timed out during a DB hiccup. Stripe retried, and you processed the event twice. A network blip ate three webhooks completely.

Now you're debugging production with no record of why you built it this way.

## With Quint

```bash
$ /q1-hypothesize "handle stripe payment confirmation"
```

AI generates competing approaches:

| # | Approach | Risk | Recovery |
|---|----------|------|----------|
| H1 | Webhook-only | Silent failures, no detection | None without manual audit |
| H2 | Webhook + sync processing | Timeout = lost event, retries = duplicates | Stripe retry (3 days) |
| H3 | Webhook → Queue + Polling backup | Complex, two code paths | Self-healing |

```bash
$ /q2-verify
```

AI checks each hypothesis for logical consistency:
- **H1 fails:** "No mechanism detects missed webhooks"
- **H2 partial:** "Idempotency key needed, still misses network failures"
- **H3 passes:** "Polling catches what webhooks miss, queue handles spikes"

```bash
$ /q3-validate
```

AI gathers evidence for surviving hypotheses:

```
H3 — Webhook → Queue + Polling backup

Evidence collected:
1. [Stripe docs] "Webhooks are not guaranteed" — CL2 (external docs)
2. [Internal test] Queue handles 10k events/sec — CL3 (same context)
3. [Case study] Company X lost $50k before adding polling — CL1 (different context)

R_eff = 0.72 (limited by CL1 case study)
```

```bash
$ /q4-audit
```

```
Bias check: No anchoring detected (H1 was initial suggestion, rejected)

Trust scores:
- H3: R_eff = 0.72 (WLNK: external case study)

Recommendation: H3 is viable. Consider refreshing the case study
with internal metrics after 30 days in production.
```

```bash
$ /q5-decide
```

```
Decision: H3 — Async queue + 15-min polling reconciliation

Rationale:
- Webhook acknowledges immediately (200 in <100ms)
- Background job processes with idempotency check
- Polling job catches silent failures
- Accepted tradeoff: 15-min max delay for edge cases

Evidence: Stripe docs recommend polling backup.
Review trigger: If webhook success rate drops below 99%
```
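
As an illustration of the 15-minute reconciliation job (not part of the recorded decision above), here is a rough bash sketch. It assumes a PaymentIntent-based Stripe integration and two hypothetical internal helpers, `check_access` and `enqueue_activation`; the endpoint and auth style follow Stripe's public REST API.

```bash
# Hypothetical reconciliation poll, run every 15 minutes from cron.
# Requires curl and jq; STRIPE_SECRET_KEY must be set in the environment.
set -euo pipefail

SINCE=$(date -d '-20 minutes' +%s)   # small overlap with the previous run (GNU date)

curl -sG https://api.stripe.com/v1/payment_intents \
  -u "${STRIPE_SECRET_KEY}:" \
  -d "created[gte]=${SINCE}" \
  -d limit=100 \
| jq -r '.data[] | select(.status == "succeeded") | .id' \
| while read -r payment_intent_id; do
    # check_access / enqueue_activation are hypothetical internal scripts:
    # anything the webhook path missed gets queued for activation again.
    ./check_access "$payment_intent_id" || ./enqueue_activation "$payment_intent_id"
  done
```

The deliberate overlap between runs means some succeeded payments are checked twice; the idempotency check on the activation side handles the duplicates.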
## 3 weeks later

Finance asks: *"Why do we poll every 15 minutes? Can we remove it?"*

```bash
$ /q-query "payment confirmation architecture"
```

```
Decision: 2024-01-15 — H3 selected over webhook-only

Key evidence:
- Stripe admits webhook delivery "not guaranteed"
- Polling catches ~0.3% of transactions (measured)
- Removing polling = ~$400/month in silent failures

Recommendation: Keep polling. Document in runbook.
```
