# Container Compliance Tooling

Scripts for generating attribution CSVs from built container images, listing all installed dpkg and Python packages with their SPDX license identifiers where known.

## Output format

Each run produces up to two CSV files. Both use the same columns:

| Column | Description |
|--------|-------------|
| `package_name` | Package name as reported by dpkg or pip |
| `version` | Installed version |
| `type` | `dpkg` or `python` |
| `spdx_license` | SPDX identifier (e.g. `MIT`, `Apache-2.0`) or `UNKNOWN` |

Files are sorted by `(type, package_name)` for stable diffs.

When a base image is provided, a second `_diff.csv` file is written containing only packages that are new or version-changed relative to the base, i.e. what Dynamo's build layers added on top of the upstream image.

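The sort order and the diff rule described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `sort_rows`, `diff_rows`, and `to_csv` are hypothetical, and rows are assumed to be plain dicts keyed by the four column names.

```python
import csv
import io

FIELDS = ["package_name", "version", "type", "spdx_license"]


def sort_rows(rows):
    """Sort by (type, package_name) so repeated runs produce stable diffs."""
    return sorted(rows, key=lambda r: (r["type"], r["package_name"]))


def diff_rows(image_rows, base_rows):
    """Keep rows that are new or version-changed relative to the base image."""
    base_versions = {
        (r["type"], r["package_name"]): r["version"] for r in base_rows
    }
    return sort_rows(
        r
        for r in image_rows
        # .get() returns None for packages absent from the base, so they are kept.
        if base_versions.get((r["type"], r["package_name"])) != r["version"]
    )


def to_csv(rows):
    """Render rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keying the base lookup on `(type, package_name)` keeps a dpkg package and a Python package with the same name from masking each other.
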
## Usage

```bash
# Full scan, output to stdout
python container/compliance/generate_attributions.py <image:tag>

# Write to file
python container/compliance/generate_attributions.py <image:tag> -o attribution.csv

# With base image diff, auto-resolved from context.yaml
python container/compliance/generate_attributions.py <image:tag> \
    --framework vllm \
    --cuda-version 12.9 \
    -o attribution-vllm-cuda12-amd64.csv
# Produces: attribution-vllm-cuda12-amd64.csv       (full)
#           attribution-vllm-cuda12-amd64_diff.csv  (delta from base)

# With explicit base image override
python container/compliance/generate_attributions.py <image:tag> \
    --base-image nvcr.io/nvidia/cuda:12.9.1-runtime-ubuntu24.04 \
    -o attribution.csv

# Frontend image
python container/compliance/generate_attributions.py <image:tag> \
    --framework dynamo \
    --target frontend \
    -o attribution-frontend-amd64.csv

# dpkg only
python container/compliance/generate_attributions.py <image:tag> \
    --types dpkg \
    -o attribution-dpkg.csv
```

### All flags

| Flag | Default | Description |
|------|---------|-------------|
| `image` | *(required)* | Container image to scan |
| `--output`, `-o` | stdout | Output CSV path |
| `--framework` | *(none)* | Auto-resolve base image from `context.yaml` (`vllm`, `sglang`, `trtllm`, `dynamo`) |
| `--target` | `runtime` | Build target for base resolution (`runtime` or `frontend`) |
| `--cuda-version` | *(none)* | CUDA version for base resolution (e.g. `12.9`, `13.0`, `13.1`) |
| `--base-image` | *(none)* | Explicit base image URI (overrides `--framework` auto-resolve) |
| `--context-yaml` | `container/context.yaml` | Path to context.yaml |
| `--types` | `dpkg,python` | Comma-separated list of types to extract |
| `--docker-cmd` | `docker` | Docker binary to use |
| `--verbose`, `-v` | *(none)* | Enable verbose logging to stderr |

## Base image reference

| Framework | CUDA | Base image |
|-----------|------|------------|
| `vllm` | 12.9 | `nvcr.io/nvidia/cuda:12.9.1-runtime-ubuntu24.04` |
| `vllm` | 13.0 | `nvcr.io/nvidia/cuda:13.0.2-runtime-ubuntu24.04` |
| `sglang` | 12.9 | `lmsysorg/sglang:v0.5.9-runtime` |
| `sglang` | 13.0 | `lmsysorg/sglang:v0.5.9-cu130-runtime` |
| `trtllm` | 13.1 | `nvcr.io/nvidia/cuda-dl-base:25.12-cuda13.1-runtime-ubuntu24.04` |
| `dynamo` frontend | n/a | `nvcr.io/nvidia/base/ubuntu:noble-20250619` |

These values are sourced from `container/context.yaml` at runtime; the table above reflects the current defaults.

## How it works

The script runs two lightweight helper scripts **inside the container** via `docker run --rm -v`:

- **dpkg extractor**: runs `dpkg-query` to list packages, then reads `/usr/share/doc/<pkg>/copyright` files for license info. Only DEP-5 machine-readable copyright files are parsed; ambiguous cases return `UNKNOWN`.
- **Python extractor**: uses `importlib.metadata.distributions()` to iterate installed packages. License is read from `License-Expression` (PEP 639), then `License` metadata, then trove classifiers. Ambiguous cases return `UNKNOWN`.

Both helpers are self-contained and have no external dependencies; they run with whatever Python is in the container.

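The lookup order used by the Python extractor might look roughly like the sketch below. It is an illustration only: `resolve_license`, `scan_installed`, and the two small mapping tables are hypothetical stand-ins, and the real helper's mappings are not shown in this README.

```python
from importlib.metadata import distributions

# Hypothetical, deliberately small lookup tables (assumptions for illustration).
LICENSE_ALIASES = {
    "MIT": "MIT",
    "MIT License": "MIT",
    "Apache-2.0": "Apache-2.0",
    "Apache License 2.0": "Apache-2.0",
}
CLASSIFIER_TO_SPDX = {
    "License :: OSI Approved :: MIT License": "MIT",
    "License :: OSI Approved :: ISC License (ISCL)": "ISC",
    "License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)": "MPL-2.0",
}


def resolve_license(license_expression, license_field, classifiers):
    """Apply the precedence: License-Expression, then License, then classifiers."""
    if license_expression and license_expression.strip():
        return license_expression.strip()
    if license_field:
        spdx = LICENSE_ALIASES.get(license_field.strip())
        if spdx:
            return spdx
    # Accept classifiers only when they agree on exactly one identifier.
    matches = {CLASSIFIER_TO_SPDX[c] for c in classifiers if c in CLASSIFIER_TO_SPDX}
    if len(matches) == 1:
        return matches.pop()
    return "UNKNOWN"


def scan_installed():
    """Build one row per installed distribution, using only the standard library."""
    rows = []
    for dist in distributions():
        meta = dist.metadata
        rows.append({
            "package_name": meta.get("Name") or "",
            "version": meta.get("Version") or "",
            "type": "python",
            "spdx_license": resolve_license(
                meta.get("License-Expression"),
                meta.get("License"),
                meta.get_all("Classifier") or [],
            ),
        })
    return rows
```

A generic classifier such as `License :: OSI Approved :: BSD License` is left unmapped on purpose in this sketch, since it does not say which BSD variant applies.
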
## License detection

Detection is intentionally conservative: only unambiguous matches are assigned SPDX identifiers. `UNKNOWN` entries are expected; they can be resolved with additional analysis of the raw copyright files.

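As one way to picture the conservative rule for dpkg packages, the sketch below accepts a copyright file only if it declares the DEP-5 format and names exactly one known license. The function name `spdx_from_copyright` and the short-name table are assumptions made for illustration; the real helper may apply different or additional rules.

```python
# Hypothetical mapping from DEP-5 license short names to SPDX identifiers.
DEP5_TO_SPDX = {
    "Expat": "MIT",
    "Apache-2.0": "Apache-2.0",
    "BSD-3-clause": "BSD-3-Clause",
    "GPL-2+": "GPL-2.0-or-later",
}


def spdx_from_copyright(text):
    """Return an SPDX identifier only when the file is unambiguous."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Free-form copyright files are skipped: they lack the DEP-5 Format header.
    if not lines or not lines[0].startswith("Format:"):
        return "UNKNOWN"
    if "copyright-format/1.0" not in lines[0]:
        return "UNKNOWN"

    names = set()
    for line in lines:
        # Field names start in column 0; continuation lines are indented.
        if line.startswith("License:"):
            name = line[len("License:"):].strip()
            if name:
                names.add(name)

    # More than one distinct license name is treated as ambiguous.
    if len(names) != 1:
        return "UNKNOWN"
    return DEP5_TO_SPDX.get(names.pop(), "UNKNOWN")
```

Under these assumptions, a package whose `debian/*` files carry a different license from the upstream sources comes back as `UNKNOWN` and is left for later review.
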
## CI integration

Attribution CSVs are generated automatically as part of CI after every successful image build. Artifacts are available in the GitHub Actions workflow run under:
- `compliance-{framework}-cuda{major}-{platform}`: runtime images
- `compliance-frontend-{arch}`: frontend image

The scan runs as a separate lightweight job (`prod-default-small-v2`) in parallel with tests, so it does not extend pipeline wall time.

## Requirements

- Python 3.11+
- `docker` (or compatible CLI) with access to the target registry
- `pyyaml`: only required on the host when using `--framework`/`--cuda-version` base image auto-resolution (`pip install pyyaml`)