| 1 | +# Deployment & Operations Guide |
| 2 | + |
| 3 | +This guide outlines supported deployment patterns for Provenance, from local Docker runs to production-ready Kubernetes clusters, and highlights operational concerns such as scaling detectors, managing secrets, and monitoring. |
| 4 | + |
| 5 | +## Architecture Overview |
| 6 | + |
| 7 | +Core services: |
| 8 | + |
| 9 | +- **API** – FastAPI application (ASGI) served by `uvicorn`. |
| 10 | +- **Redis** – Primary datastore for analyses, findings, and decisions. |
| 11 | +- **Optional analytics sinks** – ClickHouse, Snowflake, BigQuery, or file-based JSONL exports. |
| 12 | +- **Optional observability** – Prometheus/OTLP exporters for metrics. |
| 13 | + |
| 14 | +Background work (detector execution, governance, analytics) runs inside the API process today; no external workers are required. |
| 15 | + |
| 16 | +## Local & Docker Compose |
| 17 | + |
| 18 | +Use Docker Compose when validating changes locally or running against mocked dependencies: |
| 19 | + |
| 20 | +```bash |
| 21 | +docker compose up --build |
| 22 | +``` |
| 23 | + |
| 24 | +The compose stack includes: |
| 25 | + |
| 26 | +- API container (`provenance-api`) exposing `8000`. |
| 27 | +- Redis (`redis:7-alpine`) with a persistent volume. |
| 28 | +- Optional ClickHouse (if you enable `docker-compose.clickhouse.yml`). |
| 29 | + |
| 30 | +Override environment variables in `.env` or `docker-compose.override.yml`. See the [Configuration Reference](configuration.md) for available settings. |
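|  | + |
|  | +As a minimal sketch, an override file might look like the following. The service names `provenance-api` and `redis` are assumptions based on the container and image names listed above; check them against the project's actual compose file before use. |
|  | + |
|  | +```yaml |
|  | +# docker-compose.override.yml (illustrative; service names are assumed) |
|  | +services: |
|  | +  provenance-api: |
|  | +    environment: |
|  | +      # Point the API at the Redis service on the compose network. |
|  | +      PROVENANCE_REDIS_URL: redis://redis:6379/0 |
|  | +``` |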
| 31 | + |
| 32 | +## Container Image |
| 33 | + |
| 34 | +Build a production image with: |
| 35 | + |
| 36 | +```bash |
| 37 | +docker build -t your-registry/provenance:<tag> . |
| 38 | +``` |
| 39 | + |
| 40 | +Key build arguments: |
| 41 | + |
| 42 | +- `UV_LOCKFILE=uv.lock` – Install pinned dependencies. |
| 43 | +- `TARGET_ENV=production` – (Optional) adjust if you customize the Dockerfile stages. |
| 44 | + |
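| 45 | +Run the container (on Linux, `host.docker.internal` resolves only if you also pass `--add-host=host.docker.internal:host-gateway`): |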
| 46 | + |
| 47 | +```bash |
| 48 | +docker run --rm \ |
| 49 | + -p 8000:8000 \ |
| 50 | + -e PROVENANCE_REDIS_URL=redis://host.docker.internal:6379/0 \ |
| 51 | + your-registry/provenance:<tag> |
| 52 | +``` |
| 53 | + |
| 54 | +## Kubernetes (Helm/Manifests) |
| 55 | + |
| 56 | +There is no bundled Helm chart yet, but a basic Deployment manifest looks like this: |
| 57 | + |
| 58 | +```yaml |
| 59 | +apiVersion: apps/v1 |
| 60 | +kind: Deployment |
| 61 | +metadata: |
| 62 | +  name: provenance-api |
| 63 | +spec: |
| 64 | +  replicas: 2 |
| 65 | +  selector: |
| 66 | +    matchLabels: |
| 67 | +      app: provenance-api |
| 68 | +  template: |
| 69 | +    metadata: |
| 70 | +      labels: |
| 71 | +        app: provenance-api |
| 72 | +    spec: |
| 73 | +      containers: |
| 74 | +        - name: api |
| 75 | +          image: your-registry/provenance:<tag> |
| 76 | +          imagePullPolicy: IfNotPresent |
| 77 | +          ports: |
| 78 | +            - name: http |
| 79 | +              containerPort: 8000 |
| 80 | +          envFrom: |
| 81 | +            - configMapRef: |
| 82 | +                name: provenance-config |
| 83 | +            - secretRef: |
| 84 | +                name: provenance-secrets |
| 85 | +          readinessProbe: |
| 86 | +            httpGet: |
| 87 | +              path: /healthz |
| 88 | +              port: http |
| 89 | +            initialDelaySeconds: 10 |
| 90 | +            periodSeconds: 10 |
| 91 | +          livenessProbe: |
| 92 | +            httpGet: |
| 93 | +              path: /healthz |
| 94 | +              port: http |
| 95 | +            initialDelaySeconds: 30 |
| 96 | +            periodSeconds: 30 |
| 97 | +          resources: |
| 98 | +            requests: |
| 99 | +              cpu: 250m |
| 100 | +              memory: 512Mi |
| 101 | +            limits: |
| 102 | +              cpu: 1 |
| 103 | +              memory: 1Gi |
| 104 | +``` |
| 105 | + |
| 106 | +- Provide Redis as a managed service (e.g., Amazon ElastiCache) and set `PROVENANCE_REDIS_URL` accordingly. |
| 107 | +- Supply policy thresholds, API tokens, signing keys, and GitHub credentials through ConfigMaps and Secrets, such as the `provenance-config` and `provenance-secrets` objects referenced by `envFrom` above. |
| 108 | +- Use a HorizontalPodAutoscaler to scale API pods based on CPU or custom metrics. |
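|  | + |
|  | +The autoscaling bullet can be sketched with the `autoscaling/v2` API. The replica bounds and the 70% CPU target below are placeholder values, not project recommendations. Utilization is measured against the container's CPU request (`250m` in the Deployment above). |
|  | + |
|  | +```yaml |
|  | +# Illustrative HPA for the provenance-api Deployment; thresholds are assumptions. |
|  | +apiVersion: autoscaling/v2 |
|  | +kind: HorizontalPodAutoscaler |
|  | +metadata: |
|  | +  name: provenance-api |
|  | +spec: |
|  | +  scaleTargetRef: |
|  | +    apiVersion: apps/v1 |
|  | +    kind: Deployment |
|  | +    name: provenance-api |
|  | +  minReplicas: 2 |
|  | +  maxReplicas: 6 |
|  | +  metrics: |
|  | +    - type: Resource |
|  | +      resource: |
|  | +        name: cpu |
|  | +        target: |
|  | +          type: Utilization |
|  | +          averageUtilization: 70 |
|  | +``` |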
| 109 | + |
| 110 | +### Ingress & TLS |
| 111 | + |
| 112 | +- Expose the API via an ingress controller (NGINX, Traefik, ALB). |
| 113 | +- Terminate TLS at the ingress or use a service mesh (Linkerd, Istio). Ensure `PROVENANCE_SERVICE_BASE_URL` matches the external HTTPS endpoint. |
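|  | + |
|  | +A minimal sketch of a Service and Ingress for the API follows. The hostname `provenance.example.com`, the TLS secret name `provenance-tls`, and the `nginx` ingress class are all assumptions; substitute the values for your cluster. |
|  | + |
|  | +```yaml |
|  | +# Service in front of the pods labelled app: provenance-api. |
|  | +apiVersion: v1 |
|  | +kind: Service |
|  | +metadata: |
|  | +  name: provenance-api |
|  | +spec: |
|  | +  selector: |
|  | +    app: provenance-api |
|  | +  ports: |
|  | +    - name: http |
|  | +      port: 8000 |
|  | +      targetPort: http |
|  | +--- |
|  | +# Ingress terminating TLS; host, class, and secret name are placeholders. |
|  | +apiVersion: networking.k8s.io/v1 |
|  | +kind: Ingress |
|  | +metadata: |
|  | +  name: provenance-api |
|  | +spec: |
|  | +  ingressClassName: nginx |
|  | +  tls: |
|  | +    - hosts: |
|  | +        - provenance.example.com |
|  | +      secretName: provenance-tls |
|  | +  rules: |
|  | +    - host: provenance.example.com |
|  | +      http: |
|  | +        paths: |
|  | +          - path: / |
|  | +            pathType: Prefix |
|  | +            backend: |
|  | +              service: |
|  | +                name: provenance-api |
|  | +                port: |
|  | +                  name: http |
|  | +``` |
|  | + |
|  | +With this placeholder host, `PROVENANCE_SERVICE_BASE_URL` would be set to `https://provenance.example.com`. |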
| 114 | + |
| 115 | +## Scaling Considerations |
| 116 | + |
| 117 | +- **Detector Throughput** – Detector execution happens synchronously per request. Increase pod count to parallelize analyses, or shard workflows by repo/team. Monitoring request latency via Prometheus helps identify bottlenecks. |
| 118 | +- **Redis Capacity** – Tune persistence and memory policy. For large analyses, configure snapshotting and set `maxmemory-policy` deliberately: `volatile-lru` evicts only keys that carry a TTL, while `noeviction` rejects writes instead of dropping data, which is the safer choice when Redis is the system of record. |
| 119 | +- **Background Tasks** – FastAPI `BackgroundTasks` handle deferred operations such as analytics writes. They run in the API process after the response is sent, so ensure pods have enough CPU headroom to absorb that work without slowing subsequent requests. |
| 120 | +- **Analytics Warehouse** – When using ClickHouse/Snowflake/BigQuery, provision connectivity (service accounts, network policies) and monitor ingest failure logs. |
| 121 | + |
| 122 | +## Observability |
| 123 | + |
| 124 | +- Enable the Prometheus exporter by installing the `opentelemetry-exporter-prometheus` package and setting `PROVENANCE_OTEL_ENABLED=true` and `PROVENANCE_OTEL_EXPORTER=prometheus`. |
| 125 | +- Scrape `/metrics` and create alerts on: |
| 126 | +  - Request latency (P95 > SLO). |
| 127 | +  - Detector capability mismatches. |
| 128 | +  - Decision outcome imbalance (e.g., spike in `block`). |
| 129 | +- For OTLP, configure `PROVENANCE_OTEL_ENDPOINT` and deploy a collector. |
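|  | + |
|  | +On Kubernetes, the exporter settings above could live in the `provenance-config` ConfigMap that the Deployment loads via `envFrom`. This is a sketch that shows only the two variables named in this section: |
|  | + |
|  | +```yaml |
|  | +apiVersion: v1 |
|  | +kind: ConfigMap |
|  | +metadata: |
|  | +  name: provenance-config |
|  | +data: |
|  | +  # ConfigMap values must be strings, so quote the boolean. |
|  | +  PROVENANCE_OTEL_ENABLED: "true" |
|  | +  PROVENANCE_OTEL_EXPORTER: "prometheus" |
|  | +``` |
|  | + |
|  | +A matching Prometheus scrape job might look like this, assuming the API is reachable at the hypothetical address `provenance-api:8000` and serves metrics on the same port as the API: |
|  | + |
|  | +```yaml |
|  | +scrape_configs: |
|  | +  - job_name: provenance-api |
|  | +    metrics_path: /metrics |
|  | +    static_configs: |
|  | +      - targets: ["provenance-api:8000"]  # placeholder target |
|  | +``` |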
| 130 | + |
| 131 | +## Secrets Management |
| 132 | + |
| 133 | +- Store API tokens, signing keys, and GitHub credentials in Kubernetes Secrets, HashiCorp Vault, AWS Secrets Manager, etc. |
| 134 | +- Base64-encode Ed25519 signing keys before storing them; the application expects keys in that form. |
| 135 | +- Rotate secrets regularly. Environment variables are read only at container start, so redeploy or restart pods after each rotation. |
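|  | + |
|  | +As a sketch, a Kubernetes Secret holding a signing key might look like the following. The variable name `PROVENANCE_SIGNING_KEY` is hypothetical and the value is a placeholder; consult the Configuration Reference for the real setting name. |
|  | + |
|  | +```yaml |
|  | +apiVersion: v1 |
|  | +kind: Secret |
|  | +metadata: |
|  | +  name: provenance-secrets |
|  | +type: Opaque |
|  | +stringData: |
|  | +  # Hypothetical key name; the value is the base64 text of the Ed25519 key. |
|  | +  PROVENANCE_SIGNING_KEY: "<base64-encoded Ed25519 signing key>" |
|  | +``` |
|  | + |
|  | +`stringData` stores the value exactly as written, so the container sees the base64 string. Under `data`, Kubernetes expects each value to be base64-encoded itself, so the already-encoded key would need to be encoded a second time. |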
| 136 | + |
| 137 | +## Disaster Recovery |
| 138 | + |
| 139 | +- Redis is the system of record for analyses. Enable AOF and/or RDB persistence and back up the resulting files to durable storage. |
| 140 | +- Export DSSE decision bundles to long-term storage (S3, GCS) via CI to preserve audit trails. |
| 141 | +- For analytics warehouses, rely on built-in backups; if needed, events can be regenerated by replaying DSSE bundles and analysis inputs. |
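|  | + |
|  | +For self-hosted Redis under Docker Compose, persistence could be enabled roughly as follows. The service name `redis` and the volume name `redis-data` are assumptions; managed Redis services expose equivalent settings through their own configuration. |
|  | + |
|  | +```yaml |
|  | +services: |
|  | +  redis: |
|  | +    image: redis:7-alpine |
|  | +    # Turn on the append-only file so writes survive a restart. |
|  | +    command: ["redis-server", "--appendonly", "yes"] |
|  | +    volumes: |
|  | +      # The official image keeps its data files under /data. |
|  | +      - redis-data:/data |
|  | +volumes: |
|  | +  redis-data: |
|  | +``` |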
| 142 | + |
| 143 | +## Deployment Checklist |
| 144 | + |
| 145 | +1. Configure `PROVENANCE_*` variables (see [Configuration Reference](configuration.md)). |
| 146 | +2. Provision Redis with sufficient memory and persistence. |
| 147 | +3. Deploy the API (Docker/K8s) with liveness and readiness probes. |
| 148 | +4. Configure ingress/TLS and update `PROVENANCE_SERVICE_BASE_URL`. |
| 149 | +5. Wire CI to submit analyses (see [CI Integration Guide](ci-integration.md)). |
| 150 | +6. Enable observability exporters and set up dashboards/alerts. |
| 151 | +7. Archive DSSE bundles and SARIF outputs for compliance/audits. |