Add sbom generation tooling (#2232) #1
Conversation
Pull request overview
Adds Bazel-native SBOM generation for SCORE modules, including SPDX 2.3 + CycloneDX 1.6 emitters, metadata collection via a module extension/aspect, and supporting scripts + fixtures/tests.
Changes:
- Introduces Bazel rules/aspect/extension to collect dependency metadata and generate SPDX/CycloneDX SBOM outputs.
- Adds Python generators/formatters plus helper scripts (crate metadata cache, C++ metadata cache, SPDX→GitHub Dependency Submission snapshot).
- Adds a comprehensive test suite with real fixtures and expanded README/setup documentation.
Reviewed changes
Copilot reviewed 43 out of 45 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/test_spdx_to_github_snapshot.py | Unit tests for SPDX→GitHub snapshot conversion logic. |
| tests/test_spdx_formatter.py | Unit tests for SPDX 2.3 JSON generation and license normalization. |
| tests/test_real_sbom_integration.py | Integration tests generating SBOMs from real fixture inputs (includes online validator check). |
| tests/test_generate_crates_metadata_cache.py | Tests for parsing dash-license-scan output, MODULE.bazel.lock crate extraction, and synthetic Cargo.lock generation. |
| tests/test_generate_cpp_metadata_cache.py | Tests for converting cdxgen CycloneDX output into a C++ metadata cache. |
| tests/test_cyclonedx_formatter.py | Unit tests for CycloneDX 1.6 JSON generation and license encoding rules. |
| tests/test_cpp_enrich_checksum.py | Tests for C++ cache enrichment and enforcing no manual curation in cpp_metadata.json. |
| tests/test_bcr_known_licenses.py | Tests for BCR known-license fallback table and license-application priority behavior. |
| tests/fixtures/sbom_metadata.json | Fixture metadata input for integration tests. |
| tests/fixtures/reference_integration.MODULE.bazel.lock | Fixture lockfile slice used for module version/hash enrichment tests. |
| tests/fixtures/orchestrator_cdxgen.cdx.json | Fixture cdxgen output for orchestrator integration test path. |
| tests/fixtures/kyron_cdxgen.cdx.json | Fixture cdxgen output for kyron integration test path. |
| tests/fixtures/baselibs_input.json | Fixture Bazel aspect output for baselibs integration scenario. |
| tests/__init__.py | Declares tests as a Python package. |
| tests/BUILD | Bazel pytest targets for the SBOM test suite. |
| scripts/spdx_to_github_snapshot.py | Implements SPDX 2.3 → GitHub Dependency Submission snapshot conversion. |
| scripts/generate_crates_metadata_cache.py | Script to build Rust crate metadata cache via lockfiles + dash-license-scan + crates.io API. |
| scripts/generate_cpp_metadata_cache.py | Script to convert cdxgen CycloneDX output into cpp_metadata.json cache format. |
| scripts/BUILD.bazel | Bazel py_library targets for scripts. |
| npm_wrapper.sh | Shell wrapper intended to run npm/cdxgen from Bazel actions. |
| internal/rules.bzl | Core sbom_rule implementation wiring aspect outputs + generator actions (+ optional cache/cdxgen generation). |
| internal/providers.bzl | Defines SbomDepsInfo and SbomMetadataInfo providers. |
| internal/metadata_rule.bzl | Rule wrapper to expose metadata JSON produced by the extension. |
| internal/generator/utils.py | Shared utility for SPDX license operator normalization. |
| internal/generator/spdx_formatter.py | SPDX 2.3 JSON formatter implementation. |
| internal/generator/sbom_generator.py | Main SBOM generator entry point (resolving components, enrichment, writing outputs). |
| internal/generator/cyclonedx_formatter.py | CycloneDX 1.6 JSON formatter implementation (components + dependency graph). |
| internal/generator/__init__.py | Declares the generator package. |
| internal/generator/BUILD | Bazel targets for generator binaries/libraries. |
| internal/aspect.bzl | Aspect collecting transitive deps and external dependency edges. |
| internal/__init__.py | Declares the internal package. |
| internal/BUILD | Exports internal bzl implementation files. |
| extensions.bzl | Module extension collecting dependency metadata (modules/http_archives/git/crates/licenses). |
| defs.bzl | Public sbom() macro API wrapping the rule. |
| cpp_metadata.json | Initializes C++ metadata cache file (empty). |
| README.md | Expanded setup/usage/architecture documentation for SBOM tooling. |
| MODULE.bazel | Declares module metadata and Python toolchain deps for this repo. |
| BUILD.bazel | Exports public SBOM API files and provides npm wrapper sh_binary. |
| .gitignore | Ignores Bazel outputs, lockfile, and Python bytecode caches. |
| .bazelrc | Adds registries and Java toolchain settings for builds. |
fixed the issues
Pull request overview
Copilot reviewed 43 out of 45 changed files in this pull request and generated 7 comments.
```python
# source.json entry – carries the sha256 of the downloaded source
# tarball for this module@version. Use it as the component hash.
source_match = re.search(
    r"/modules/([^/]+)/([^/]+)/source\.json$",
    url,
```
The code/comment assumes the registryFileHashes hash for .../source.json is the SHA-256 of the module source tarball. In Bazel lockfiles, registryFileHashes hashes the registry files themselves, so exposing this as the component artifact checksum makes the SBOM hash misleading.
Consider omitting module checksums from this source, or fetching/parsing source.json to extract the archive integrity/sha256 (with an explicit networked step).
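The explicit networked step could look roughly like the sketch below. It assumes the Bazel Central Registry `source.json` layout, where the archive checksum is stored as an SRI-style `integrity` field (`"sha256-<base64 digest>"`); the function name is illustrative, not part of the PR.

```python
import base64
import json


def sha256_from_source_json(text):
    """Return the hex sha256 of the module source archive, or None.

    Assumes the BCR source.json layout: the archive checksum is an
    SRI-style "integrity" field such as "sha256-<base64 digest>".
    """
    data = json.loads(text)
    integrity = data.get("integrity", "")
    if not integrity.startswith("sha256-"):
        return None
    # SRI digests are base64; SPDX/CycloneDX checksums want hex.
    return base64.b64decode(integrity[len("sha256-"):]).hex()
```

Unlike the `registryFileHashes` entry, this hash genuinely describes the downloaded source tarball, so it is safe to expose as the component checksum.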
```starlark
cache_cmd = "set -euo pipefail\npython3 {} {}".format(
    ctx.file._crates_cache_script.path,
    crates_cache.path,
)
if ctx.file.cargo_lockfile:
    cache_inputs.append(ctx.file.cargo_lockfile)
    cache_cmd += " --cargo-lock {}".format(ctx.file.cargo_lockfile.path)
for lock in ctx.files.module_lockfiles:
    cache_inputs.append(lock)
    cache_cmd += " --module-lock {}".format(lock.path)
ctx.actions.run_shell(
    inputs = cache_inputs,
    outputs = [crates_cache],
    command = cache_cmd,
    mnemonic = "CratesCacheGenerate",
    progress_message = "Generating crates metadata cache for %s" % ctx.attr.name,
    execution_requirements = {"requires-network": ""},
    use_default_shell_env = True,
```
The crates cache generation action shells out to python3 ..., which bypasses Bazel's configured Python toolchain and can fail in hermetic/remote execution environments (or when python3 isn't on PATH). Prefer running a declared executable (e.g., a py_binary target) via ctx.actions.run, or use the Python toolchain runtime explicitly, so the action is portable and reproducible.
Suggested change (replace the `run_shell` block with a `ctx.actions.run` of the declared executable):

```starlark
cache_args = ctx.actions.args()
cache_args.add(crates_cache.path)
if ctx.file.cargo_lockfile:
    cache_inputs.append(ctx.file.cargo_lockfile)
    cache_args.add("--cargo-lock", ctx.file.cargo_lockfile.path)
for lock in ctx.files.module_lockfiles:
    cache_inputs.append(lock)
    cache_args.add("--module-lock", lock.path)
ctx.actions.run(
    inputs = cache_inputs,
    outputs = [crates_cache],
    executable = ctx.executable._crates_cache_script,
    arguments = [cache_args],
    mnemonic = "CratesCacheGenerate",
    progress_message = "Generating crates metadata cache for %s" % ctx.attr.name,
    execution_requirements = {"requires-network": ""},
```
```starlark
ctx.actions.run(
    outputs = [cdxgen_sbom],
    executable = ctx.executable._npm,
    arguments = [
        "exec",
```
The auto_cdxgen action expects npm/cdxgen to be discoverable via PATH (and optionally NVM via $NVM_DIR), but this ctx.actions.run() invocation doesn't opt into inheriting the host environment. Without use_default_shell_env = True (or an explicit env), this can fail to locate system-installed Node/npm/cdxgen in common setups (e.g., nvm installs under $HOME).
```text
License metadata is collected automatically:
- Rust crates: from crates_metadata.json cache (bundled with tooling)
- C++ deps: from cpp_metadata.json cache (bundled with tooling)
- Bazel modules: version/PURL auto-extracted from module graph
```
This docstring says crate/C++ license metadata is collected automatically from the bundled crates_metadata.json/cpp_metadata.json caches, but the current rule/generator flow only loads crates metadata when --crates-cache is explicitly passed (and there is no codepath that reads the bundled files by default). Either wire the bundled cache files into the rule/generator defaults, or update the documentation here to match the actual behavior.
Suggested change:

```text
License metadata can be collected automatically when the corresponding
inputs or caches are provided:
- Rust crates: from a crates metadata cache generated from Cargo.lock
  and/or MODULE.bazel.lock files (see cargo_lockfile/module_lockfiles and
  auto_crates_cache)
- C++ deps: optionally enriched from a CycloneDX SBOM produced by cdxgen
  (see cdxgen_sbom/auto_cdxgen)
- Bazel modules: version/PURL auto-extracted from the Bazel module graph
```
```text
SBOM generation. License metadata is collected automatically from
bundled caches (crates_metadata.json, cpp_metadata.json).
```
The module extension docstring claims license metadata is collected automatically from bundled caches (crates_metadata.json, cpp_metadata.json), but the extension implementation does not read those files. This is misleading for consumers; either implement the cache loading/merging here (or in the generator) or adjust this header comment to reflect the actual data sources.
Suggested change:

```text
SBOM generation. License and other dependency metadata are consumed
from pre-generated metadata (for example, JSON caches passed via
the metadata_content attribute).
```
```python
if cdx_data.get("bomFormat") != "CycloneDX":
    print("Error: Input is not a CycloneDX JSON file", file=sys.stderr)
    sys.exit(1)
```
convert_cdxgen_to_cache() calls sys.exit(1) on non-CycloneDX input. Since this function is part of the exported py_library (and is imported in tests), exiting the interpreter makes it hard to reuse safely as a library function. Prefer raising a typed exception (e.g., ValueError) or returning {} and letting main() handle the CLI exit code.
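The library-friendly variant could be sketched as follows. The function body here is a hypothetical reduction (only the format check is shown, the real conversion logic is elided); the point is the division of labor between the library function and the CLI entry point:

```python
import sys


def convert_cdxgen_to_cache(cdx_data):
    """Raise instead of exiting, so the function is safe to import."""
    if cdx_data.get("bomFormat") != "CycloneDX":
        raise ValueError("Input is not a CycloneDX JSON file")
    return {}  # real conversion elided in this sketch


def main():
    try:
        cache = convert_cdxgen_to_cache({"bomFormat": "SPDX"})
    except ValueError as err:
        # Only the CLI entry point owns the process exit code;
        # library callers (and tests) just catch the exception.
        print("Error: {}".format(err), file=sys.stderr)
        return 1
    return 0
```

With this shape, tests can assert on the raised `ValueError` directly instead of guarding against the interpreter exiting.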
```python
uvx,
"--from",
"dash-license-scan@git+https://github.com/eclipse-score/dash-license-scan",
"dash-license-scan",
"--summary",
```
run_dash_license_scan() installs/runs dash-license-scan from an unpinned Git URL (dash-license-scan@git+https://...). That makes results non-reproducible over time and increases supply-chain risk for a networked build-time action. Consider pinning to a specific tag/commit SHA, or using a versioned/published artifact if available.
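A pinned invocation could be built along these lines. `DASH_SCAN_REV` is a placeholder (a real tag or commit SHA must be substituted) and the helper name is illustrative; pip-style requirement specs accept `name@git+<url>@<rev>`, which `uvx --from` consumes:

```python
# Placeholder: substitute a real tag or commit SHA of
# eclipse-score/dash-license-scan before use.
DASH_SCAN_REV = "TAG_OR_COMMIT_SHA"


def dash_scan_cmd(uvx, *extra_args):
    """Build the uvx invocation with the tool pinned to a fixed ref."""
    spec = (
        "dash-license-scan@git+https://github.com/eclipse-score/"
        "dash-license-scan@" + DASH_SCAN_REV
    )
    return [uvx, "--from", spec, "dash-license-scan", "--summary", *extra_args]
```

Pinning to an immutable commit SHA (rather than a movable tag or branch) gives the strongest reproducibility guarantee for a networked build-time action.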
AlexanderLanin left a comment:
all copilot findings not critical, merging as-is. can be improved afterwards.
@Lukasz-Juranek, thank you for the contribution and congrats on the first PR merged in this repo
This PR adds SBOM Bazel rules for SCORE modules. For setup details, see the README: https://github.com/Lukasz-Juranek/sbom-tool/blob/feat/issue-2232-sbom-init/README.md
Old discussion is under eclipse-score/tooling#106
Example SBOMs generated by the tooling and validated with https://sbomgenerator.com/tools/validator:
sbom_feo.cdx.json
sbom_feo.spdx.json
sbom_orchestrator.cdx.json
sbom_orchestrator.spdx.json
sbom_baselibs.cdx.json
sbom_baselibs.spdx.json
Documentation added in eclipse-score/score#2672