Commit 883fa29

Add sbom generation tooling (#2232)
1 parent 24fe90f commit 883fa29

45 files changed (+12233, -2 lines changed)

.bazelrc

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
common --registry=https://raw.githubusercontent.com/eclipse-score/bazel_registry/main/
common --registry=https://bcr.bazel.build

build --java_language_version=17
build --tool_java_language_version=17
build --java_runtime_version=remotejdk_17
build --tool_java_runtime_version=remotejdk_17

.gitignore

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
bazel-*
2+
MODULE.bazel.lock
3+
4+
__pycache__

BUILD.bazel

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
# SBOM Generation Package
2+
#
3+
# This package provides Bazel-native SBOM (Software Bill of Materials) generation
4+
# using module extensions and aspects.
5+
#
6+
# Public API:
7+
# - load("@score_sbom//:defs.bzl", "sbom")
8+
# - use_extension("@score_sbom//:extensions.bzl", "sbom_metadata")
9+
10+
load("@rules_python//python:defs.bzl", "py_library")
11+
12+
package(default_visibility = ["//visibility:public"])
13+
14+
exports_files([
15+
"defs.bzl",
16+
"extensions.bzl",
17+
"cpp_metadata.json",
18+
"crates_metadata.json",
19+
])
20+
21+
# Filegroup for all SBOM-related bzl files
22+
filegroup(
23+
name = "bzl_files",
24+
srcs = [
25+
"defs.bzl",
26+
"extensions.bzl",
27+
"//internal:bzl_files",
28+
],
29+
)
30+
31+
# npm wrapper (uses system-installed npm from PATH)
32+
sh_binary(
33+
name = "npm_wrapper",
34+
srcs = ["npm_wrapper.sh"],
35+
)

MODULE.bazel

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
# *******************************************************************************
2+
# Copyright (c) 2025 Contributors to the Eclipse Foundation
3+
#
4+
# See the NOTICE file(s) distributed with this work for additional
5+
# information regarding copyright ownership.
6+
#
7+
# This program and the accompanying materials are made available under the
8+
# terms of the Apache License Version 2.0 which is available at
9+
# https://www.apache.org/licenses/LICENSE-2.0
10+
#
11+
# SPDX-License-Identifier: Apache-2.0
12+
# *******************************************************************************
13+
14+
module(
15+
name = "score_sbom",
16+
version = "0.0.1",
17+
compatibility_level = 1,
18+
)
19+
20+
bazel_dep(name = "rules_python", version = "1.4.1")
21+
22+
PYTHON_VERSION = "3.12"
23+
24+
python = use_extension("@rules_python//python/extensions:python.bzl", "python")
25+
python.toolchain(
26+
is_default = True,
27+
python_version = PYTHON_VERSION,
28+
)
29+
30+
# score_tooling provides score_py_pytest test infrastructure (dev only)
31+
bazel_dep(name = "score_tooling", dev_dependency = True)
32+
33+
local_path_override(
34+
module_name = "score_tooling",
35+
path = "../tooling",
36+
)

README.md

Lines changed: 213 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,2 +1,213 @@
1-
# sbom-tool
2-
Home of the SBOM generation tool
# About

The SBOM tooling provides a set of Bazel rules that generate a Software Bill of Materials
in SPDX 2.3 and CycloneDX 1.6 formats for a given Bazel target.

# Setup

## 1. Configure MODULE.bazel

Add the SBOM metadata extension to your **root** MODULE.bazel:

```starlark
sbom_ext = use_extension("@score_sbom//:extensions.bzl", "sbom_metadata")
use_repo(sbom_ext, "sbom_metadata")
```

**For modules using `local_path_override` or `git_override`**, also add a `track_module` tag for each such module. Without this, their versions cannot be auto-detected and will appear as `unknown` in the SBOM. The extension reads the version directly from the module's own `MODULE.bazel` file:

```starlark
sbom_ext = use_extension("@score_sbom//:extensions.bzl", "sbom_metadata")
sbom_ext.track_module(name = "score_baselibs")
sbom_ext.track_module(name = "score_kyron")
use_repo(sbom_ext, "sbom_metadata")
```

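To illustrate what "reading the version from the module's own `MODULE.bazel`" can look like, here is a minimal Python sketch. It is not the extension's actual code (which is Starlark); the function name `read_module_version` and the regex approach are assumptions made for illustration, and the `"unknown"` fallback mirrors the behaviour described above.

```python
import re

# Captures the argument list of the first module(...) call.
# Assumption: the call contains no nested parentheses, as in typical MODULE.bazel files.
_MODULE_CALL = re.compile(r"\bmodule\s*\((.*?)\)", re.DOTALL)
_VERSION_ARG = re.compile(r"""\bversion\s*=\s*["']([^"']*)["']""")


def read_module_version(module_bazel_text: str) -> str:
    """Return the version declared in module(), or "unknown" if none is found."""
    call = _MODULE_CALL.search(module_bazel_text)
    if call is None:
        return "unknown"
    version = _VERSION_ARG.search(call.group(1))
    return version.group(1) if version else "unknown"
```

Only the `module()` call is inspected, so `version` arguments of `bazel_dep()` lines elsewhere in the file are not picked up by mistake.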
## 2. Add SBOM Target in BUILD

```starlark
load("@score_sbom//:defs.bzl", "sbom")

sbom(
    name = "my_sbom",
    targets = ["//my/app:binary"],
    component_name = "my_application",
    component_version = "1.0.0",
    module_lockfiles = [
        "@score_crates//:MODULE.bazel.lock",
        ":MODULE.bazel.lock",
    ],
    auto_crates_cache = True,
    auto_cdxgen = True,
)
```

### Parameters

| Parameter | Default | Description |
|---|---|---|
| `name` | *(required)* | Rule name; also used as the output filename prefix (e.g. `my_sbom` → `my_sbom.spdx.json`). |
| `targets` | *(required)* | Bazel targets whose transitive dependencies are included in the SBOM. |
| `component_name` | rule `name` | Name of the root component written into the SBOM; defaults to the rule name if omitted. |
| `component_version` | `None` | Version string for the root component; auto-detected from the module graph when omitted. |
| `module_lockfiles` | `[]` | One or more `MODULE.bazel.lock` files used to extract dependency versions and SHA-256 checksums; C++ projects need only the workspace lockfile (`:MODULE.bazel.lock`), Rust projects should also pass `@score_crates//:MODULE.bazel.lock` to cover crate versions and checksums. |
| `auto_crates_cache` | `True` | Runs `generate_crates_metadata_cache` at build time (requires network) to fetch Rust crate license and supplier data from dash-license-scan and crates.io; set to `False` only as a workaround for air-gapped or offline build environments, because doing so produces a non-compliant SBOM where all Rust crates show `NOASSERTION` for license, supplier, and description. Has no effect when no lockfiles are provided (pure C++ projects). |
| `cargo_lockfile` | `None` | Path to a `Cargo.lock` file for crate enumeration; not needed when `module_lockfiles` is provided, as a synthetic `Cargo.lock` is generated from it automatically. **Deprecated; will be removed in a future release.** |
| `cdxgen_sbom` | `None` | Label to a pre-generated cdxgen CycloneDX JSON file; alternative to `auto_cdxgen` for C++ projects where cdxgen cannot run inside the Bazel build (e.g. CI environment without npm). Run cdxgen manually and pass its output here. Ignored for pure Rust projects. |
| `auto_cdxgen` | `False` | Runs cdxgen automatically inside the Bazel build (requires npm + `@cyclonedx/cdxgen` installed on the build machine); alternative to `cdxgen_sbom` for C++ projects. Uses `no-sandbox` execution to scan the source tree. Ignored for pure Rust projects. |
| `output_formats` | `["spdx", "cyclonedx"]` | List of output formats to generate; valid values are `"spdx"` and `"cyclonedx"`. |
| `producer_name` | `"Eclipse Foundation"` | Organisation name recorded as the SBOM producer. |
| `producer_url` | Eclipse S-CORE URL | URL of the SBOM producer organisation. |
| `sbom_authors` | `None` | List of author strings written into SBOM metadata; defaults to `producer_name` when omitted. |
| `namespace` | `https://eclipse.dev/score` | URI used as the SPDX document namespace and CycloneDX serial number base. |
| `generation_context` | `None` | CycloneDX lifecycle phase label (e.g. `"build"`, `"release"`). |
| `sbom_tools` | `None` | List of tool name strings recorded in SBOM metadata alongside the generator itself. |
| `exclude_patterns` | `None` | List of repo name substrings to exclude from the dependency graph (e.g. build tools, test frameworks). |
| `dep_module_files` | `None` | `MODULE.bazel` files from dependency modules used for additional automatic version extraction. |
| `metadata_json` | `@sbom_metadata//:metadata.json` | Label to the metadata JSON produced by the `sbom_metadata` Bazel extension; rarely needs changing. |

## 3. Install Prerequisites

**Rust crate metadata** (`auto_crates_cache = True`):

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt install openjdk-11-jre-headless # or equivalent for your distro
```

**C++ dependency scanning** (`auto_cdxgen = True`):

```bash
nvm install 20
npm install -g @cyclonedx/cdxgen
```

Set `auto_cdxgen = False` if cdxgen is not available.

## 4. Build

```bash
bazel build //:my_sbom
```

## 5. Output

Generated in `bazel-bin/`:

- `my_sbom.spdx.json`: SPDX 2.3
- `my_sbom.cdx.json`: CycloneDX 1.6
- `my_sbom_crates_metadata.json`: Rust crate cache (if `auto_crates_cache = True`)
- `my_sbom_cdxgen.cdx.json`: C++ scan output (if `auto_cdxgen = True`)

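As a quick sanity check of the generated SPDX file, the following Python sketch lists each package with its version and concluded license. It relies only on the standard SPDX 2.3 JSON keys `packages`, `name`, `versionInfo`, and `licenseConcluded`; the helper names are hypothetical and not part of this tool.

```python
import json
from pathlib import Path


def load_spdx(path: str) -> dict:
    """Read an SPDX JSON file such as bazel-bin/my_sbom.spdx.json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_spdx_packages(spdx_doc: dict) -> list:
    """Return sorted (name, version, license) tuples for every package."""
    rows = []
    for package in spdx_doc.get("packages", []):
        rows.append((
            package.get("name", ""),
            # Missing values are reported as NOASSERTION, the SPDX placeholder.
            package.get("versionInfo", "NOASSERTION"),
            package.get("licenseConcluded", "NOASSERTION"),
        ))
    return sorted(rows)
```

A typical use, assuming the build from step 4 has run, would be `for row in summarize_spdx_packages(load_spdx("bazel-bin/my_sbom.spdx.json")): print(*row)`.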
---

## Architecture

```
                    ┌──────────────────┐
                    │   Bazel build    │
                    └────────┬─────────┘
                             │
             ┌───────────────┼───────────────┐
             │               │               │
             v               v               v
       MODULE.bazel    Bazel targets     Lockfiles
             │               │               │
             v               v               v
       metadata.json    _deps.json  License + metadata
     (module versions)  (dep graph, (dash-license-scan
                         dep edges)  + crates.io API
             │               │       + cdxgen)
             └───────────────┼───────────────┘
                             │
                             v
                    ┌──────────────────┐
                    │  sbom_generator  │
                    │ (match & resolve)│
                    └────────┬─────────┘
                             │
                    ┌────────┴────────┐
                    v                 v
               .spdx.json         .cdx.json
```

**Data sources:**
- **Bazel module graph**: version, PURL, and registry info for `bazel_dep` modules
- **Bazel aspect**: transitive dependency graph and external repo dependency edges
- **dash-license-scan**: license data
- **crates.io API**: description and supplier for Rust crates
- **cdxgen**: C++ dependency licenses, descriptions, and suppliers

### Automated Metadata Sources

All license, hash, supplier, and description values are derived from automated sources: `MODULE.bazel.lock`, `http_archive` rules, dash-license-scan (Rust), crates.io API (Rust), and cdxgen (C++). Cache files such as `cpp_metadata.json` must never be hand-edited.

CPE, aliases, and pedigree are the only fields that may be set manually via `sbom_ext.license()`, as they represent identity and provenance annotations that cannot be deduced automatically.

### Required SBOM Fields (CISA 2025)

Every component entry in the generated SBOM must include the following fields, as mandated by CISA 2025 minimum elements:

| Field | SPDX 2.3 | CycloneDX 1.6 | Source | Description |
|---|---|---|---|---|
| Component name | `name` | `components[].name` | Extracted | Human-readable name of the dependency (e.g. `serde`, `boost.mp11`). |
| Component version | `versionInfo` | `components[].version` | Extracted | Exact released version string used in the build. |
| Component hash (SHA-256) | `checksums[SHA256]` | `components[].hashes` | Extracted | SHA-256 digest of the downloaded archive, sourced from `MODULE.bazel.lock` or the `http_archive` `sha256` field. |
| Software identifier (PURL) | `externalRefs[purl]` | `components[].purl` | Extracted | Package URL uniquely identifying the component by ecosystem, name, and version (e.g. `pkg:cargo/serde@1.0.228`). |
| License expression | `licenseConcluded` | `components[].licenses` | Extracted | SPDX license expression concluded for this component (e.g. `Apache-2.0 OR MIT`). |
| Dependency relationships | `relationships[DEPENDS_ON]` | `dependencies` | Extracted | Graph edges recording which component depends on which, enabling consumers to reason about transitive exposure. |
| Supplier | `supplier` | `components[].supplier.name` | Extracted | Organisation or individual that distributes the component (e.g. the crates.io publisher name). |
| Component description | `description` | `components[].description` | Extracted | Short human-readable summary of what the component does; set to `"Missing"` when no source can provide it. |
| SBOM author | `creationInfo.creators` | `metadata.authors` | Configured | Entity responsible for producing this SBOM document; set via `producer_name` in the `sbom()` rule (default: Eclipse Foundation). |
| Tool name | `creationInfo.creators` | `metadata.tools` | Auto-generated | Name and version of the tool that generated the SBOM. |
| Timestamp | `creationInfo.created` | `metadata.timestamp` | Auto-generated | ISO-8601 UTC timestamp recording when the SBOM was generated. |
| Generation context (lifecycle) | n/a | `metadata.lifecycles` | Auto-generated | CycloneDX lifecycle phase at which the SBOM was produced (e.g. `build`). |

Legend: **Extracted** values are derived automatically from the Bazel dependency graph, lockfiles, or external registries (crates.io, cdxgen). **Configured** values come from an `sbom()` rule parameter with a sensible default. **Auto-generated** values are computed at build time with no user input required.

Fields are populated automatically from the sources described in [Automated Metadata Sources](#automated-metadata-sources) and [License Data by Language](#license-data-by-language). If a source cannot provide a value (e.g. cdxgen cannot resolve a C++ component), the field is omitted rather than filled with incorrect data. The one exception is the description, which is set to `"Missing"` to make the gap visible.

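Because unresolved fields are omitted rather than guessed, it can be useful to list which components fall short of the table above. The sketch below does this for the CycloneDX output, using the component keys named in the table. The function name and the decision to treat the `"Missing"` placeholder as a gap are assumptions for illustration, not behaviour of the generator.

```python
# Per-component keys taken from the CycloneDX 1.6 column of the table above.
REQUIRED_COMPONENT_FIELDS = (
    "name", "version", "hashes", "purl", "licenses", "supplier", "description",
)


def find_missing_fields(cdx_doc: dict) -> dict:
    """Map each component name to the required fields that are absent or empty."""
    gaps = {}
    for component in cdx_doc.get("components", []):
        missing = [f for f in REQUIRED_COMPONENT_FIELDS if not component.get(f)]
        # "Missing" is the visible placeholder description; count it as a gap too.
        if component.get("description") == "Missing":
            missing.append("description")
        if missing:
            gaps[component.get("name", "<unnamed>")] = missing
    return gaps
```

An empty result means every component carries all seven per-component fields; document-level fields such as `metadata.timestamp` are not checked here.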
### Component Scope

Only transitive dependencies of the declared build targets are included. Build-time tools (compilers, build systems, test frameworks) are excluded via `exclude_patterns`.

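The parameter table describes `exclude_patterns` as a list of repo name substrings, so the exclusion step can be pictured as a plain substring filter. This is a minimal sketch under that reading; `filter_repos_sketch` is a hypothetical name and not the tool's own `filter_repos()` implementation.

```python
def filter_repos_sketch(repo_names, exclude_patterns=None):
    """Keep the repo names that contain none of the exclude substrings."""
    # Ignore empty patterns, which would otherwise match every name.
    patterns = [p for p in (exclude_patterns or []) if p]
    return [name for name in repo_names if not any(p in name for p in patterns)]
```

For example, passing `["rules_", "googletest"]` would drop `rules_cc` and `googletest` while keeping `serde` and `boost.mp11`.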
### Component Hash Source

SHA-256 checksums come exclusively from the `registryFileHashes` section of `MODULE.bazel.lock` (BCR modules) or the `sha256` field of `http_archive` rules. If neither provides a checksum, the hash field is omitted rather than emitting an incorrect value.

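The lockfile lookup can be sketched roughly as follows. This assumes `registryFileHashes` maps registry file URLs to hex SHA-256 strings, and that an entry ending in `/modules/<name>/<version>/source.json` identifies a module. Which registry file the generator actually keys on is not stated here, so both the URL pattern and the function name are assumptions.

```python
import json
import re

# Assumed URL shape: <registry>/modules/<name>/<version>/source.json
_SOURCE_JSON_URL = re.compile(r"/modules/([^/]+)/([^/]+)/source\.json$")
_SHA256_HEX = re.compile(r"^[0-9a-f]{64}$")


def registry_hashes(lockfile_text: str) -> dict:
    """Map (module name, version) to the SHA-256 recorded in registryFileHashes."""
    lock = json.loads(lockfile_text)
    result = {}
    for url, digest in lock.get("registryFileHashes", {}).items():
        match = _SOURCE_JSON_URL.search(url)
        # Skip other registry files and entries without a real 64-character digest.
        if match and isinstance(digest, str) and _SHA256_HEX.match(digest):
            result[(match.group(1), match.group(2))] = digest
    return result
```

Entries without a usable digest are skipped entirely, which matches the rule above that a hash is omitted rather than invented.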
### License Data by Language

- **Rust**: Licenses via dash-license-scan (Eclipse Foundation + ClearlyDefined); descriptions and suppliers from the crates.io API. Crates with platform-specific suffixes (e.g. `iceoryx2-bb-lock-free-qnx8`) fall back to the base crate name for lookup.
- **C++**: Licenses, descriptions, and suppliers via a cdxgen source tree scan. There is no dash-license-scan integration for C++ because it does not support the `pkg:generic/...` PURLs used by BCR modules. If cdxgen cannot resolve a component, its description is set to `"Missing"` and its license field is empty.

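The base-name fallback for platform-specific crates can be illustrated with a short sketch. The suffix list below contains only the `-qnx8` suffix from the example above; the real set of suffixes and the function name `lookup_crate` are assumptions.

```python
from typing import Optional

# Hypothetical suffix list; only "-qnx8" is taken from the example in the text.
PLATFORM_SUFFIXES = ("-qnx8",)


def lookup_crate(name: str, metadata: dict) -> Optional[dict]:
    """Look up crate metadata, retrying with a platform suffix removed."""
    if name in metadata:
        return metadata[name]
    for suffix in PLATFORM_SUFFIXES:
        if name.endswith(suffix):
            # Fall back to the base crate name, e.g. iceoryx2-bb-lock-free.
            return metadata.get(name[: -len(suffix)])
    return None
```

An exact match always wins, so a crate that genuinely has its own metadata entry is never replaced by its base crate's data.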
### Output Format Versions

- **SPDX 2.3**: Migration to SPDX 3.0 is deferred until at least one major consumer (Trivy, GitHub Dependabot, or Grype) supports it in production. As of early 2026, none do, and the reference Python library marks its own 3.0 support as experimental. `LicenseRef-*` identifiers are declared in `hasExtractedLicensingInfos` as required by SPDX 2.3; supplier is emitted as `Organization: <name>`.
- **CycloneDX 1.6**: Emitted with `"specVersion": "1.6"` and `"$schema": "http://cyclonedx.org/schema/bom-1.6.schema.json"`.

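The two SPDX 2.3 conventions mentioned above can be sketched as follows: collecting every `LicenseRef-*` identifier used in license expressions into `hasExtractedLicensingInfos` entries, and formatting the supplier string. The helper names and the `NOASSERTION` placeholder for `extractedText` are assumptions; the formatter in this tool may populate these fields differently.

```python
import re

# SPDX idstrings may contain letters, digits, "." and "-".
_LICENSE_REF = re.compile(r"LicenseRef-[A-Za-z0-9.\-]+")


def extracted_licensing_infos(license_expressions):
    """Build one hasExtractedLicensingInfos entry per distinct LicenseRef-* id."""
    ids = sorted({
        ref
        for expression in license_expressions
        for ref in _LICENSE_REF.findall(expression)
    })
    # SPDX 2.3 expects an extractedText per entry; the placeholder is an assumption.
    return [{"licenseId": ref, "extractedText": "NOASSERTION"} for ref in ids]


def spdx_supplier(name):
    """Format a supplier as 'Organization: <name>'."""
    return f"Organization: {name}"
```

Deduplicating through a set keeps the declaration list stable even when many components share the same custom license reference.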
## How the design is tested

To run the tests:
```bash
# From sbom-tool/: run all SBOM tests
bazel test //tests/...
```

The generated SBOM was also validated with an external tool:
https://sbomgenerator.com/tools/validator

#### Test descriptions

| Test file | Bazel target | What it covers |
|---|---|---|
| `test_bcr_known_licenses.py` | `test_bcr_known_licenses` | `BCR_KNOWN_LICENSES` table; `apply_known_licenses()` priority chain (5 levels); `resolve_component()` integration after license resolution |
| `test_cpp_enrich_checksum.py` | `test_cpp_enrich_checksum` | `enrich_components_from_cpp_cache()` field propagation (checksum, normalised names, parent match); no-manual-curation rule on `cpp_metadata.json` |
| `test_cyclonedx_formatter.py` | `test_cyclonedx_formatter` | CycloneDX 1.6 document structure; license encoding (single id vs compound expression); `or`/`and` normalisation; dependency graph; `_normalize_spdx_license()` |
| `test_spdx_formatter.py` | `test_spdx_formatter` | SPDX 2.3 document structure; PURL as externalRef; SHA-256 checksums; DESCRIBES/DEPENDS_ON relationships; `hasExtractedLicensingInfos` for `LicenseRef-*`; `_normalize_spdx_license()` |
| `test_sbom_generator.py` | `test_sbom_generator` | `filter_repos()`; `resolve_component()` (all 8 repo-type branches); `deduplicate_components()`; `parse_module_bazel_files()`; `parse_module_lockfiles()`; `mark_missing_cpp_descriptions()`; `main()` end-to-end (15 scenarios: SPDX/CycloneDX output, BCR licenses, crate_universe, exclude patterns, version auto-detect, dep_module_files, module_lockfiles, --crates-cache, --cdxgen-sbom, output file selection) |
| `test_generate_crates_metadata_cache.py` | `test_generate_crates_metadata_cache` | `parse_dash_summary()`; `parse_module_bazel_lock()`; `generate_synthetic_cargo_lock()`; end-to-end summary CSV round-trip |
| `test_generate_cpp_metadata_cache.py` | `test_generate_cpp_metadata_cache` | `convert_cdxgen_to_cache()`: version, license (id/name/expression/AND), supplier (name/publisher fallback), PURL, URL from externalReferences, description |
| `test_spdx_to_github_snapshot.py` | `test_spdx_to_github_snapshot` | `convert_spdx_to_snapshot()`: top-level fields; direct vs. indirect classification; package filtering; manifest naming; `pkg:generic/` PURL support |

---

cpp_metadata.json

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
{}
