Commit 507b45b

committed
added doc about ci/cd
1 parent 378a9bb commit 507b45b

2 files changed

+177
-0
lines changed

docs/operations/cicd.md

Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
1+
# CI/CD Pipeline
2+
3+
The project uses GitHub Actions to automate code quality checks, security scanning, testing, and documentation
4+
deployment. Every push to `main` or `dev` and every pull request against those branches triggers the pipeline, with the workflows running in
5+
parallel to provide fast feedback.
6+
7+
## Pipeline overview
8+
9+
```mermaid
graph LR
    subgraph "Code Quality"
        Ruff["Ruff Linting"]
        MyPy["MyPy Type Check"]
    end

    subgraph "Security"
        Bandit["Bandit SAST"]
        Trivy["Trivy Container Scan"]
    end

    subgraph "Testing"
        Integration["Integration Tests"]
    end

    subgraph "Documentation"
        Docs["MkDocs Build"]
        Pages["GitHub Pages"]
    end

    Push["Push / PR"] --> Ruff
    Push --> MyPy
    Push --> Bandit
    Push --> Trivy
    Push --> Integration
    Push --> Docs
    Docs -->|main only| Pages
```
38+
39+
All workflows trigger on pushes to the `main` and `dev` branches and on pull requests against those branches, and can also be triggered
40+
manually via `workflow_dispatch`. The documentation workflow additionally filters on path changes to avoid unnecessary
41+
rebuilds.
42+
43+
## Linting and type checking
44+
45+
Two lightweight workflows give the earliest feedback, catching obvious issues quickly.
46+
47+
The linting workflow installs dependencies with [uv](https://docs.astral.sh/uv/) and
48+
runs [Ruff](https://docs.astral.sh/ruff/) against the backend codebase. Ruff checks for style violations, import
49+
ordering, and common bugs in a single pass. The configuration lives in `pyproject.toml` under `[tool.ruff]`, selecting
50+
rules from the E, F, B, I, and W categories.
51+
52+
The type checking workflow runs [mypy](https://mypy.readthedocs.io/) with strict settings. It catches type mismatches,
53+
missing return types, and incorrect function signatures before they reach production. Both workflows use uv's dependency
54+
caching to skip reinstallation when the lockfile hasn't changed.
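
The lockfile-keyed caching described above can be pictured as a content hash: identical lockfile bytes give the same key, so an earlier cache entry is reused. The sketch below only illustrates that idea. `cache_key` and the `uv` prefix are hypothetical names, and this is not a description of how `astral-sh/setup-uv` computes its keys internally.

```python
import hashlib


def cache_key(lockfile_bytes: bytes, prefix: str = "uv") -> str:
    """Derive a cache key from lockfile contents (hypothetical helper)."""
    # A short SHA-256 prefix is enough to tell lockfile versions apart here.
    digest = hashlib.sha256(lockfile_bytes).hexdigest()[:16]
    return f"{prefix}-{digest}"


unchanged = cache_key(b"version = 1\n")
print(unchanged == cache_key(b"version = 1\n"))  # same contents, same key
print(unchanged == cache_key(b"version = 2\n"))  # changed lockfile, different key
```

A key that depends only on the lockfile means code-only commits hit the cache, while any dependency change misses it and triggers a fresh install.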
55+
56+
## Security scanning
57+
58+
Security scanning runs in two places. The security workflow uses [Bandit](https://bandit.readthedocs.io/) to perform static
59+
analysis on Python source files, flagging issues like hardcoded credentials, SQL injection patterns, and unsafe
60+
deserialization. It excludes the test directory and reports only findings of medium severity and above.
61+
62+
The Docker workflow builds the backend image and scans it with [Trivy](https://trivy.dev/). Trivy checks the image
63+
layers for known vulnerabilities in OS packages and Python dependencies, failing the build if it finds any critical- or
63+
high-severity issues that have available fixes. This catches supply chain problems that static analysis would miss.
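
The gating rule above (fail only on critical or high findings that have a fix) can be sketched as a filter over a Trivy JSON report. This is a minimal illustration, not the workflow's actual step: the field names follow Trivy's JSON output, while `blocking_findings` and the sample report with placeholder CVE IDs are made up for the example. In practice the same rule is usually expressed with Trivy's own `--severity`, `--ignore-unfixed`, and `--exit-code` options; whether this project's workflow sets them that way is an assumption.

```python
import json

BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def blocking_findings(report: dict) -> list:
    """Return fixable CRITICAL/HIGH vulnerabilities from a Trivy JSON report."""
    blocking = []
    for result in report.get("Results", []):
        # "Vulnerabilities" may be missing or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            fixable = bool(vuln.get("FixedVersion"))
            if vuln.get("Severity") in BLOCKING_SEVERITIES and fixable:
                blocking.append(vuln)
    return blocking


# Placeholder report: the CVE IDs and versions are invented for illustration.
sample_report = json.loads("""
{"Results": [{"Target": "backend", "Vulnerabilities": [
  {"VulnerabilityID": "CVE-0000-0001", "Severity": "HIGH", "FixedVersion": "1.2.3"},
  {"VulnerabilityID": "CVE-0000-0002", "Severity": "CRITICAL", "FixedVersion": ""},
  {"VulnerabilityID": "CVE-0000-0003", "Severity": "LOW", "FixedVersion": "2.0.0"}
]}]}
""")

failed = blocking_findings(sample_report)
print([v["VulnerabilityID"] for v in failed])  # ['CVE-0000-0001']
```

The unfixed critical finding is skipped on purpose: blocking on issues with no available fix would fail builds that nobody can repair.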
65+
66+
## Docker build
67+
68+
The Docker workflow builds images using a two-stage approach to optimize layer caching. First it builds a shared base
69+
image (`Dockerfile.base`) containing Python, system dependencies, and all Python packages. Then it builds the main backend
70+
image on top of that base, copying only the application code. This separation means dependency changes rebuild the base
71+
layer while code changes only rebuild the thin application layer.
72+
73+
The base image includes gcc, curl, and compression libraries needed by some Python packages. It
74+
uses [uv](https://docs.astral.sh/uv/) to install dependencies from the lockfile, ensuring reproducible builds across
75+
environments. The pinned uv version (currently 0.9.17) prevents unexpected behavior from upstream changes.
76+
77+
## Integration tests
78+
79+
The integration test workflow is the most complex. It spins up the entire stack on a GitHub Actions runner to verify
80+
that services work together correctly.
81+
82+
```mermaid
sequenceDiagram
    participant GHA as GitHub Actions
    participant K3s as K3s Cluster
    participant Docker as Docker Compose
    participant Tests as pytest

    GHA->>K3s: Install k3s
    GHA->>Docker: Pre-pull base images
    GHA->>Docker: Build services (bake)
    GHA->>Docker: Start compose stack
    Docker->>Docker: Wait for health checks
    GHA->>Tests: Run pytest
    Tests->>Docker: HTTP requests
    Tests-->>GHA: Coverage report
    GHA->>GHA: Upload to Codecov
```
99+
100+
The workflow starts by installing [k3s](https://k3s.io/), a lightweight Kubernetes distribution, so the backend can
101+
interact with a real cluster during tests. It pre-pulls container images in parallel to avoid cold-start delays during
102+
the build step.
103+
104+
Before building, the workflow modifies `docker-compose.yaml` using [yq](https://github.com/mikefarah/yq) to create a
105+
CI-specific configuration. These modifications disable SASL authentication on Kafka and ZooKeeper (unnecessary for
106+
isolated CI), remove volume mounts that cause permission conflicts, inject test credentials for MongoDB, and disable
107+
OpenTelemetry export to avoid connection errors. The result is a `docker-compose.ci.yaml` that works reliably in the
108+
ephemeral CI environment.
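
The kind of transformation described above can be sketched on an already-parsed compose file. The workflow itself uses yq; the Python below is only an illustration of the same edits. The service names `mongo` and `backend`, the credential values, and the use of `OTEL_SDK_DISABLED` are assumptions, and the SASL changes are left out because they depend on the Kafka and ZooKeeper images in use. It also assumes each service's `environment` is written in mapping form rather than list form.

```python
import copy


def make_ci_compose(compose: dict) -> dict:
    """Return a CI-friendly copy of a parsed compose file (illustrative only)."""
    ci = copy.deepcopy(compose)  # never mutate the original configuration
    for name, service in ci.get("services", {}).items():
        # Drop volume mounts that can cause permission conflicts on runners.
        service.pop("volumes", None)
        env = service.setdefault("environment", {})
        if name == "mongo":  # hypothetical service name
            env["MONGO_INITDB_ROOT_USERNAME"] = "ci_user"      # placeholder credential
            env["MONGO_INITDB_ROOT_PASSWORD"] = "ci_password"  # placeholder credential
        if name == "backend":  # hypothetical service name
            # Standard OpenTelemetry switch; whether the project uses it is an assumption.
            env["OTEL_SDK_DISABLED"] = "true"
    return ci


base = {
    "services": {
        "mongo": {"image": "mongo", "volumes": ["./data:/data/db"]},
        "backend": {"environment": {"LOG_LEVEL": "info"}},
    }
}
print(make_ci_compose(base)["services"]["backend"]["environment"])
```

Working on a copy mirrors the original design, where `docker-compose.yaml` stays untouched and the CI variant is written to a separate file.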
109+
110+
The [docker/bake-action](https://github.com/docker/bake-action) builds all services with GitHub Actions cache support.
111+
It reads cache layers from previous runs and writes new layers back, so unchanged dependencies don't rebuild. The cache
112+
scopes are branch-specific with a fallback to main, meaning feature branches benefit from the main branch cache even on
113+
their first run.
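
The branch-specific scope with a fallback to main can be sketched as a small helper that produces buildx-style cache settings. The `type=gha`, `scope`, and `mode=max` attributes are real options of the GitHub Actions cache backend, but the `<service>-<branch>` naming convention and the `cache_options` helper are hypothetical, not taken from the workflow.

```python
import re


def cache_options(service: str, branch: str, default_branch: str = "main") -> dict:
    """Build GitHub Actions cache settings for one service (illustrative)."""

    def scope(ref: str) -> str:
        # Replace characters such as "/" so the branch name is safe in a scope.
        safe_ref = re.sub(r"[^A-Za-z0-9_.-]", "-", ref)
        return f"{service}-{safe_ref}"

    cache_from = [f"type=gha,scope={scope(branch)}"]
    if branch != default_branch:
        # Fallback: a new branch can reuse layers cached by the default branch.
        cache_from.append(f"type=gha,scope={scope(default_branch)}")
    return {
        "cache-from": cache_from,
        "cache-to": [f"type=gha,mode=max,scope={scope(branch)}"],
    }


print(cache_options("backend", "feature/login"))
```

Reads consult the branch scope first and then main, while writes go only to the branch's own scope, so a feature branch cannot overwrite the main cache.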
114+
115+
Once images are built, `docker compose up` starts the stack and waits for health checks. The backend needs MongoDB,
116+
Redis, Kafka, Schema Registry, and the cert-generator to be ready before it can serve requests. After stabilization, the
117+
workflow runs pytest against the integration and unit test suites with coverage reporting. Test isolation uses
118+
per-worker database names and schema registry prefixes to avoid conflicts when pytest-xdist runs tests in parallel.
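
Per-worker isolation can be sketched as a naming helper keyed on the pytest-xdist worker id. pytest-xdist exposes that id through the `PYTEST_XDIST_WORKER` environment variable (for example `gw0`). The helper name `worker_scoped`, the `main` fallback, and the base names are hypothetical and may differ from the project's actual fixtures.

```python
import os
from typing import Optional


def worker_scoped(base: str, worker_id: Optional[str] = None) -> str:
    """Append the xdist worker id so parallel workers never share a resource."""
    # PYTEST_XDIST_WORKER is set in each worker process and is absent when
    # tests run without xdist, in which case a fixed fallback suffix is used.
    worker = worker_id or os.environ.get("PYTEST_XDIST_WORKER", "main")
    return f"{base}_{worker}"


print(worker_scoped("integration_db", "gw0"))  # integration_db_gw0
print(worker_scoped("integration_db", "gw1"))  # integration_db_gw1
```

The same helper could prefix schema registry subjects, so two workers registering the same schema name do not collide.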
119+
120+
Coverage reports go to [Codecov](https://codecov.io/) for tracking over time. The workflow always collects container
121+
logs and Kubernetes events as artifacts, which helps debug failures without reproducing them locally.
122+
123+
## Documentation
124+
125+
The docs workflow builds this documentation site using [MkDocs](https://www.mkdocs.org/) with
126+
the [Material theme](https://squidfunk.github.io/mkdocs-material/). It triggers only when files under `docs/`,
127+
`mkdocs.yml`, or the workflow itself change, avoiding rebuilds for unrelated commits.
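
The path filter's decision can be sketched as a match of changed files against a few patterns. GitHub Actions evaluates its own `paths` filter with its own glob rules, so this only illustrates the logic. The patterns below are an assumption based on the paths named above and the docs workflow filename listed in the workflow table, and `docs_build_needed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Assumed patterns; fnmatch's "*" also matches "/" so nested docs files match.
DOCS_PATTERNS = ("docs/*", "mkdocs.yml", ".github/workflows/docs.yml")


def docs_build_needed(changed_files: list) -> bool:
    """Return True if any changed file matches a docs-related pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in DOCS_PATTERNS
    )


print(docs_build_needed(["docs/operations/cicd.md"]))  # True
print(docs_build_needed(["backend/app/main.py"]))      # False
```

Including the workflow file itself in the patterns means a change to the build steps is exercised immediately instead of waiting for the next docs edit.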
128+
129+
Before building, the workflow fetches the current OpenAPI spec from the production API and injects it into the docs
130+
directory. The [swagger-ui-tag](https://github.com/blueswen/mkdocs-swagger-ui-tag) plugin renders this spec as an
131+
interactive API reference.
132+
133+
On pushes to main, the workflow deploys the built site to GitHub Pages. Pull requests only build without deploying, so
134+
you can verify the build succeeds before merging. The deployment uses GitHub's native Pages action with artifact
135+
uploads, which handles cache invalidation and atomic deployments automatically.
136+
137+
## Running locally
138+
139+
You can run most checks locally before pushing.
140+
141+
```bash
cd backend

# Linting
uv run ruff check .

# Type checking
uv run mypy .

# Security scan (skip tests/, report medium severity and above)
uv tool run bandit -r . -x tests/ -ll

# Unit tests only (fast)
uv run pytest tests/unit -v

# Integration and unit tests (requires a running stack: docker compose up -d)
uv run pytest tests/integration tests/unit -v
```
159+
160+
For the full integration test experience, start the stack with `docker compose up -d`, wait for services to stabilize,
161+
then run pytest. The CI workflow's yq modifications aren't necessary locally since your environment likely has the
162+
expected configuration already.
163+
164+
## Workflow files
165+
166+
| Workflow            | File                             | Purpose                      |
|---------------------|----------------------------------|------------------------------|
| Ruff Linting        | `.github/workflows/ruff.yml`     | Code style and import checks |
| MyPy Type Checking  | `.github/workflows/mypy.yml`     | Static type analysis         |
| Security Scanning   | `.github/workflows/security.yml` | Bandit SAST                  |
| Docker Build & Scan | `.github/workflows/docker.yml`   | Image build and Trivy scan   |
| Integration Tests   | `.github/workflows/tests.yml`    | Full stack testing           |
| Documentation       | `.github/workflows/docs.yml`     | MkDocs build and deploy      |
174+
175+
All workflows use [uv](https://docs.astral.sh/uv/) for Python dependency management, with caching enabled via
176+
`astral-sh/setup-uv`. The lockfile at `backend/uv.lock` ensures reproducible installs across CI runs.

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -120,6 +120,7 @@ nav:
120120
- Schema Manager: components/schema-manager.md
121121

122122
- Operations:
123+
- CI/CD Pipeline: operations/cicd.md
123124
- Tracing: operations/tracing.md
124125
- Metrics:
125126
- Context Variables: operations/metrics-contextvars.md
