docs: add infrastructure documentation and security guidelines

ciro-maciel · ciro-maciel · commit 63ceae65ebf6 · 2025-08-19T11:13:47.000-03:00
diff --git a/README.md b/README.md
@@ -1,27 +1,103 @@
-# spamguard-mlops
+# SpamGuard MLOps
 
-## Referências
+Production-minded, didactic MLOps template for spam detection. Service-oriented structure with clear seams to evolve from a baseline to production.
 
-- [Elysia](https://elysiajs.com/)
-- [Drizzle](https://drizzle-orm.com/)
-- [Mantine](https://mantine.dev/)
-- [Recharts](https://recharts.org/)
-- [Bun](https://bun.sh/)
-- [Fly.io](https://fly.io/)
-- [GitHub Actions](https://github.com/features/actions)
+## Start here
 
-## Tools
+- Executive summary (1‑min): `infra/ONE-PAGER.md`
+- Quickstart: see "Quickstart (local)" and "Quickstart (Docker Compose)" below
 
-- [DVC](https://github.com/iterative/dvc)
-- [MLflow](https://github.com/mlflow/mlflow)
+## Value proposition
 
-## Fly.io
+- Fast: minutes from dataset to serving a model
+- Clear: simple stack for onboarding and interviews
+- Extensible: upgrade DB, swap model, add observability without rewrites
+- Portable: Bun-native dev, Docker/K8s-ready
 
-```bash
-fly auth login
-fly apps create spamguard-mlflow
+## Repository structure
+
+```
+artifacts/                 # Centralized model artifacts (JSON)
+data/raw/dataset.csv       # Sample dataset (CSV)
+dashboard/                 # React + Mantine UI (demo/dashboard)
+inference/                 # API (Bun + Elysia) + Drizzle ORM + SQLite
+training/                  # Training job (Natural Naive Bayes)
+infra/                     # Dockerfiles, Compose, K8s manifests, docs
+```
+
+## Quickstart (local)
+
+Prereq: Bun installed (https://bun.sh/)
+
+Install dependencies (workspaces):
+
+```sh
+bun install
+```
+
+Run DB migrations (if needed):
+
+```sh
+bun --cwd inference run db:generate
+bun --cwd inference run db:migrate
+```
+
+Train and promote a model:
+
+```sh
+bun --cwd training run train
+ls -l artifacts
+```
+
+Start services (inference API + dashboard):
+
+```sh
+bun run dev
+# API: http://localhost:3001
+# UI:  http://localhost:5173 (Vite default) or as configured in dashboard
 ```
 
-```bash
-fly deploy
+## Quickstart (Docker Compose)
+
+```sh
+docker compose -f infra/docker-compose.yml up -d --build inference dashboard
+# Optional on-demand training job
+docker compose -f infra/docker-compose.yml --profile training run --rm training
 ```
+
+## API (short)
+
+- POST `/predict` -> `{ prediction: [{ label, value }, ...] }`
+- GET `/dashboard` -> list of runs with metrics
+
+Details: see `infra/API.md`.
+
+## CI/CD
+
+GitHub Actions pipeline (`.github/workflows/ci-cd.yml`):
+
+1. Install deps
+2. Generate + apply DB migrations
+3. Train and (if better) promote model, saving artifact to `artifacts/`
+4. Build dashboard and upload artifact (deploy step is a placeholder)
+
+## Tech stack
+
+- Runtime: Bun (JavaScript ESM)
+- API: Elysia, @elysiajs/cors
+- ORM/DB: Drizzle ORM + SQLite
+- ML: Natural (Naive Bayes)
+- UI: React, Mantine, Recharts, Vite
+- CI: GitHub Actions
+- Infra: Docker, Docker Compose, Kubernetes manifests
+
+## Architecture & Ops
+
+- One-pager (simple): `infra/ONE-PAGER.md`
+- API reference: `infra/API.md`
+- Runbook (local, Compose, K8s, troubleshooting): `infra/RUNBOOK.md`
+- Security checklist: `infra/SECURITY.md`
+- Infra index: `infra/README.md`
+
+## Notes
+
+- This is an educational template. For production, upgrade DB (e.g., Postgres), add auth/rate limiting, observability, and persistent volumes.
diff --git a/infra/API.md b/infra/API.md
@@ -0,0 +1,55 @@
+# API
+
+Base URL (local): http://localhost:3001
+
+## POST /predict
+- Description: Returns ranked spam/ham classifications for a given message.
+- Request
+  - Headers: `Content-Type: application/json`
+  - Body:
+    ```json
+    { "message": "congratulations! you won a prize" }
+    ```
+- Response (200)
+  ```json
+  {
+    "prediction": [
+      { "label": "spam", "value": 0.92 },
+      { "label": "ham", "value": 0.08 }
+    ]
+  }
+  ```
+- Error
+  - 400: invalid body
+  - 500: internal error
+
+- cURL
+  ```bash
+  curl -s -X POST http://localhost:3001/predict \
+    -H "Content-Type: application/json" \
+    -d '{"message":"congratulations! you won a prize"}' | jq
+  ```
+
+## GET /dashboard
+- Description: Returns a list of training runs (newest first) with metrics.
+- Response (200)
+  ```json
+  [
+    {
+      "id": 1739999999999,
+      "experimentId": 1,
+      "createdAt": "2025-08-19T12:34:56.000Z",
+      "gitCommit": "a1b2c3d",
+      "metrics": { "accuracy": 0.90, "f1Score": 0.90 },
+      "modelArtifactPath": "artifacts/model_1739999999999.json",
+      "isProduction": true
+    }
+  ]
+  ```
+
+## CORS
+- CORS is enabled in the inference service (`@elysiajs/cors`). The dashboard (Vite) can call the API directly in local dev.
+
+## Notes
+- If you get `"Model is not loaded"`, run a training job: `bun --cwd training run train`.
+- Artifacts are JSON files saved under `artifacts/` and referenced from the DB.
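
A minimal sketch of a client for the `POST /predict` contract described in `infra/API.md` above, assuming the request and response shapes shown there and a runtime with a global `fetch` (Bun, or Node 18+). The helper names `buildPredictRequest`, `topLabel`, and `predict` are hypothetical and do not exist in the repository.

```javascript
// Hypothetical helpers; names are illustrative, not part of the repository.
const BASE_URL = "http://localhost:3001"; // local base URL from the API doc

// Build the fetch options for POST /predict from a plain message string.
function buildPredictRequest(message) {
  if (typeof message !== "string" || message.trim() === "") {
    throw new TypeError("message must be a non-empty string");
  }
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  };
}

// Return the highest-scoring { label, value } entry without assuming
// that the server already sorted the list.
function topLabel(prediction) {
  if (!Array.isArray(prediction) || prediction.length === 0) {
    throw new Error("prediction must be a non-empty array");
  }
  return prediction.reduce((best, item) => (item.value > best.value ? item : best));
}

// Call the API and return the top classification.
async function predict(message, baseUrl = BASE_URL) {
  const res = await fetch(`${baseUrl}/predict`, buildPredictRequest(message));
  if (!res.ok) {
    throw new Error(`POST /predict failed with status ${res.status}`);
  }
  const { prediction } = await res.json();
  return topLabel(prediction);
}
```

With the API running locally, `predict("congratulations! you won a prize").then(console.log)` would log the top-ranked `{ label, value }` entry. Validating the message before sending keeps obviously bad input from ever reaching the documented 400 path.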
diff --git a/infra/ONE-PAGER.md b/infra/ONE-PAGER.md
@@ -0,0 +1,42 @@
+# SpamGuard MLOps — Executive Summary (1‑minute read)
+
+## What it is
+
+A minimal, production‑minded template for spam detection with three services:
+
+- Inference API (Bun + Elysia) serving a Naive Bayes model
+- Training job that promotes the best model automatically
+- Dashboard (React + Mantine) to view runs/metrics
+
+State is simple and local by default:
+
+- SQLite DB at `inference/main.db` via Drizzle ORM
+- Model artifacts (JSON) in `artifacts/`
+
+## Why it matters
+
+- Ship fast: minutes from dataset → trained → served
+- Stay clear: small, readable stack for onboarding and demos
+- Evolve safely: promotion only when metrics improve
+
+## Run in 60 seconds
+
+```sh
+bun install
+bun --cwd inference run db:generate && bun --cwd inference run db:migrate
+bun --cwd training run train
+bun run dev  # API: http://localhost:3001  UI: http://localhost:5173
+```
+
+## KPIs
+
+- Accuracy on the dataset (used as a proxy for F1)
+- Promotion rule: only promote if accuracy improves
+- Lead time to change: single CI run from train → promote → serve
+
+## Upgrade paths
+
+- DB: SQLite → Postgres
+- Tracking: add MLflow (experiments/artifacts)
+- Ops: add auth, rate limiting, and observability (Prometheus/Grafana)
+- Deploy: use provided Docker/K8s scaffolding and your registry
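
The promotion rule listed under KPIs above ("only promote if accuracy improves") can be sketched as a small pure function. This is an illustration under stated assumptions, not the training job's actual code: `shouldPromote` is a hypothetical name, the run shape follows the `GET /dashboard` example in `infra/API.md`, and promoting the first run when no production model exists is an assumption.

```javascript
// Hypothetical promotion check; field names follow the run objects
// shown in infra/API.md (metrics.accuracy, isProduction).

// True when `value` is a usable numeric metric.
function isMetric(value) {
  return typeof value === "number" && !Number.isNaN(value);
}

// Decide whether a candidate run should replace the current production run.
function shouldPromote(candidate, production) {
  const candidateAccuracy = candidate?.metrics?.accuracy;
  if (!isMetric(candidateAccuracy)) {
    return false; // never promote a run without a usable metric
  }
  // Assumption: with no production run (or no metric on it), there is
  // nothing to beat, so the candidate is promoted.
  const productionAccuracy = production?.metrics?.accuracy;
  if (!isMetric(productionAccuracy)) {
    return true;
  }
  // Strict comparison: a tie keeps the existing production model.
  return candidateAccuracy > productionAccuracy;
}
```

Using a strict `>` means retraining on unchanged data does not churn the production model.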
diff --git a/infra/README.md b/infra/README.md
@@ -3,6 +3,7 @@
 This directory contains infrastructure artifacts to run the monorepo in containerized or orchestrated environments.
 
 Structure:
+
 - docker/
   - Dockerfile.inference: image for the inference service (Elysia/Bun)
   - Dockerfile.dashboard: image for the dashboard (Vite -> Nginx)
@@ -13,6 +14,16 @@ Structure:
   - dashboard-deployment.yaml, dashboard-service.yaml
 
 Notes:
+
 - Model artifacts are centralized under `artifacts/` at the repo root and are mounted into containers.
 - The SQLite database (`inference/main.db`) is shared between training and inference. In compose, the DB is ephemeral for simplicity; you can bind-mount if needed.
 - Images in the k8s directory are placeholders; update them with your registry.
+
+## Docs index
+
+### Start here (essential)
+
+- Executive summary (simple): `infra/ONE-PAGER.md`
+- API reference (quick check): `infra/API.md`
+- Security checklist: `infra/SECURITY.md`
+- Quickstart: see root `README.md` (Local and Docker Compose)
diff --git a/infra/RUNBOOK.md b/infra/RUNBOOK.md
@@ -0,0 +1,49 @@
+# Runbook
+
+Operational guidance to run, observe, and troubleshoot the system.
+
+## Quick run (local)
+- Install deps:
+  ```sh
+  bun install
+  ```
+- Migrate DB (first run or schema changes):
+  ```sh
+  bun --cwd inference run db:generate && bun --cwd inference run db:migrate
+  ```
+- Start API + dashboard:
+  ```sh
+  bun run dev
+  # API: http://localhost:3001  UI: http://localhost:5173
+  ```
+- Train and promote model:
+  ```sh
+  bun --cwd training run train && ls -l artifacts
+  ```
+
+## Docker Compose
+- Up API + dashboard:
+  ```sh
+  docker compose -f infra/docker-compose.yml up -d --build inference dashboard
+  ```
+- One-off training:
+  ```sh
+  docker compose -f infra/docker-compose.yml --profile training run --rm training
+  ```
+- Down:
+  ```sh
+  docker compose -f infra/docker-compose.yml down
+  ```
+
+## Kubernetes (optional)
+- Update images in `infra/k8s/*.yaml` and apply manifests. Replace `emptyDir` with PVCs for persistence.
+
+## Health
+- API: `POST /predict` returns classifications; `GET /dashboard` returns runs
+- UI: open the dashboard URL and check that the charts load
+
+## Troubleshooting
+- Missing model: run training, then verify that a run is marked `isProduction` in the DB and that its artifact path exists
+- CORS: ensure the API is reachable at http://localhost:3001 and that CORS is enabled in `inference/src/index.js`
+- Migrations: rerun generate/migrate; inspect `inference/drizzle/` and `inference/main.db`
+- Artifacts in containers: ensure the `../artifacts:/app/artifacts` volume (Compose) or a PVC (K8s) is mounted
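
The Health and Troubleshooting checks in the runbook above can be scripted. The sketch below assumes the response shapes documented in `infra/API.md` and a global `fetch` (Bun, or Node 18+). The names `isPredictResponse`, `findProductionRun`, and `smokeCheck` are hypothetical and not part of the repository.

```javascript
// Hypothetical smoke-check helpers; shapes follow infra/API.md.

// True when the body looks like a POST /predict response.
function isPredictResponse(body) {
  return (
    body !== null &&
    typeof body === "object" &&
    Array.isArray(body.prediction) &&
    body.prediction.length > 0 &&
    body.prediction.every(
      (p) =>
        p !== null &&
        typeof p === "object" &&
        typeof p.label === "string" &&
        typeof p.value === "number"
    )
  );
}

// Return the run flagged as production, or undefined if none is.
function findProductionRun(runs) {
  return Array.isArray(runs) ? runs.find((r) => r.isProduction === true) : undefined;
}

// Hit both endpoints and summarize what was found.
async function smokeCheck(baseUrl = "http://localhost:3001") {
  const predictRes = await fetch(`${baseUrl}/predict`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: "smoke test" }),
  });
  const runsRes = await fetch(`${baseUrl}/dashboard`);
  const production = runsRes.ok ? findProductionRun(await runsRes.json()) : undefined;
  return {
    predictOk: predictRes.ok && isPredictResponse(await predictRes.json()),
    dashboardOk: runsRes.ok,
    productionArtifact: production ? production.modelArtifactPath : null,
  };
}
```

A `productionArtifact` of `null` corresponds to the "Missing model" case: no run is marked as production, so training has to run first.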
diff --git a/infra/SECURITY.md b/infra/SECURITY.md
@@ -0,0 +1,16 @@
+# Security (checklist)
+
+Quick checklist for minimal hardening. Deep-dive: `infra/deep-dive/SECURITY.DEEP.md`.
+
+- AuthN/Z: Protect inference API (API key/JWT). Limit roles for dashboard.
+- Rate limiting: Prevent abuse on `/predict`.
+- Secrets: Never commit them. Store them in CI/CD secrets or a secrets manager.
+- TLS: Encrypt in transit (HTTPS/ingress). Avoid plain HTTP over public networks.
+- Data at rest: If using managed DB/object storage, enable encryption + backups.
+- Logs: Redact PII/tokens. Use structured logs.
+- CORS: Restrict origins in production.
+- Dependencies: Pin/update; enable Dependabot/Renovate.
+- Containers: Minimal images; add `.dockerignore`; scan images.
+- CI/CD: Branch protections, PR reviews, SAST/dep/container scans.
+- Observability: Metrics + alerts for latency, errors, model load failures.
+- Privacy: Document retention/deletion if handling user data.
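
As an illustration of the "Rate limiting" item in the checklist above, here is a minimal fixed-window limiter, written framework-agnostically because the checklist does not prescribe an implementation. `createRateLimiter` and its options are hypothetical names, and wiring the limiter into the Elysia app is left out on purpose.

```javascript
// Hypothetical fixed-window rate limiter; not part of the repository.
// `limit` requests are allowed per `windowMs` for each key (for example a
// client IP or API key). `now` is injectable so the logic can be tested
// without waiting for real time to pass.
function createRateLimiter({ limit, windowMs, now = () => Date.now() }) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const windows = new Map(); // key -> { start, count }

  return function allow(key) {
    const t = now();
    const current = windows.get(key);
    // Start a fresh window on first sight or once the old one has expired.
    if (!current || t - current.start >= windowMs) {
      windows.set(key, { start: t, count: 1 });
      return true;
    }
    if (current.count < limit) {
      current.count += 1;
      return true;
    }
    return false; // over the limit: the caller should answer with HTTP 429
  };
}
```

For example, `const allow = createRateLimiter({ limit: 60, windowMs: 60_000 })` permits 60 calls per minute per key. The state lives in one process's memory and is never evicted, so a deployment with several replicas or many distinct clients would need a shared store with expiry instead.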