Add sbom generation tooling (#2232) #1

Merged
AlexanderLanin merged 1 commit into eclipse-score:main from Lukasz-Juranek:feat/issue-2232-sbom-init on Mar 11, 2026

Lukasz-Juranek commented Mar 6, 2026

Lukasz-Juranek force-pushed the feat/issue-2232-sbom-init branch 4 times, most recently from 922ffdd to 1a69f6d on March 6, 2026 10:48
Lukasz-Juranek marked this pull request as ready for review on March 6, 2026 10:49
AlexanderLanin requested a review from Copilot on March 6, 2026 11:37
Copilot AI left a comment

Pull request overview

Adds Bazel-native SBOM generation for SCORE modules, including SPDX 2.3 + CycloneDX 1.6 emitters, metadata collection via a module extension/aspect, and supporting scripts + fixtures/tests.

Changes:

  • Introduces Bazel rules/aspect/extension to collect dependency metadata and generate SPDX/CycloneDX SBOM outputs.
  • Adds Python generators/formatters plus helper scripts (crate metadata cache, C++ metadata cache, SPDX→GitHub Dependency Submission snapshot).
  • Adds a comprehensive test suite with real fixtures and expanded README/setup documentation.

Reviewed changes

Copilot reviewed 43 out of 45 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| tests/test_spdx_to_github_snapshot.py | Unit tests for SPDX→GitHub snapshot conversion logic. |
| tests/test_spdx_formatter.py | Unit tests for SPDX 2.3 JSON generation and license normalization. |
| tests/test_real_sbom_integration.py | Integration tests generating SBOMs from real fixture inputs (includes online validator check). |
| tests/test_generate_crates_metadata_cache.py | Tests for parsing dash-license-scan output, MODULE.bazel.lock crate extraction, and synthetic Cargo.lock generation. |
| tests/test_generate_cpp_metadata_cache.py | Tests for converting cdxgen CycloneDX output into a C++ metadata cache. |
| tests/test_cyclonedx_formatter.py | Unit tests for CycloneDX 1.6 JSON generation and license encoding rules. |
| tests/test_cpp_enrich_checksum.py | Tests for C++ cache enrichment and enforcing no manual curation in cpp_metadata.json. |
| tests/test_bcr_known_licenses.py | Tests for BCR known-license fallback table and license-application priority behavior. |
| tests/fixtures/sbom_metadata.json | Fixture metadata input for integration tests. |
| tests/fixtures/reference_integration.MODULE.bazel.lock | Fixture lockfile slice used for module version/hash enrichment tests. |
| tests/fixtures/orchestrator_cdxgen.cdx.json | Fixture cdxgen output for orchestrator integration test path. |
| tests/fixtures/kyron_cdxgen.cdx.json | Fixture cdxgen output for kyron integration test path. |
| tests/fixtures/baselibs_input.json | Fixture Bazel aspect output for baselibs integration scenario. |
| tests/__init__.py | Declares tests as a Python package. |
| tests/BUILD | Bazel pytest targets for the SBOM test suite. |
| scripts/spdx_to_github_snapshot.py | Implements SPDX 2.3 → GitHub Dependency Submission snapshot conversion. |
| scripts/generate_crates_metadata_cache.py | Script to build Rust crate metadata cache via lockfiles + dash-license-scan + crates.io API. |
| scripts/generate_cpp_metadata_cache.py | Script to convert cdxgen CycloneDX output into cpp_metadata.json cache format. |
| scripts/BUILD.bazel | Bazel py_library targets for scripts. |
| npm_wrapper.sh | Shell wrapper intended to run npm/cdxgen from Bazel actions. |
| internal/rules.bzl | Core sbom_rule implementation wiring aspect outputs + generator actions (+ optional cache/cdxgen generation). |
| internal/providers.bzl | Defines SbomDepsInfo and SbomMetadataInfo providers. |
| internal/metadata_rule.bzl | Rule wrapper to expose metadata JSON produced by the extension. |
| internal/generator/utils.py | Shared utility for SPDX license operator normalization. |
| internal/generator/spdx_formatter.py | SPDX 2.3 JSON formatter implementation. |
| internal/generator/sbom_generator.py | Main SBOM generator entry point (resolving components, enrichment, writing outputs). |
| internal/generator/cyclonedx_formatter.py | CycloneDX 1.6 JSON formatter implementation (components + dependency graph). |
| internal/generator/__init__.py | Declares generator package. |
| internal/generator/BUILD | Bazel targets for generator binaries/libraries. |
| internal/aspect.bzl | Aspect collecting transitive deps and external dependency edges. |
| internal/__init__.py | Declares internal package. |
| internal/BUILD | Exports internal bzl implementation files. |
| extensions.bzl | Module extension collecting dependency metadata (modules/http_archives/git/crates/licenses). |
| defs.bzl | Public sbom() macro API wrapping the rule. |
| cpp_metadata.json | Initializes C++ metadata cache file (empty). |
| README.md | Expanded setup/usage/architecture documentation for SBOM tooling. |
| MODULE.bazel | Declares module metadata and Python toolchain deps for this repo. |
| BUILD.bazel | Exports public SBOM API files and provides npm wrapper sh_binary. |
| .gitignore | Ignores Bazel outputs, lockfile, and Python bytecode caches. |
| .bazelrc | Adds registries and Java toolchain settings for builds. |

Lukasz-Juranek force-pushed the feat/issue-2232-sbom-init branch from 1a69f6d to 883fa29 on March 10, 2026 20:41
Lukasz-Juranek force-pushed the feat/issue-2232-sbom-init branch from 883fa29 to 47a1d54 on March 10, 2026 20:48
@Lukasz-Juranek (Contributor, Author):

Fixed the issues.

Copilot AI left a comment

Pull request overview

Copilot reviewed 43 out of 45 changed files in this pull request and generated 7 comments.

Comment on lines +106 to +110
# source.json entry – carries the sha256 of the downloaded source
# tarball for this module@version. Use it as the component hash.
source_match = re.search(
    r"/modules/([^/]+)/([^/]+)/source\.json$",
    url,
Copilot AI Mar 11, 2026

The code/comment assumes the registryFileHashes hash for .../source.json is the SHA-256 of the module source tarball. In Bazel lockfiles, registryFileHashes hashes the registry files themselves, so exposing this as the component artifact checksum makes the SBOM hash misleading.

Consider omitting module checksums from this source, or fetching/parsing source.json to extract the archive integrity/sha256 (with an explicit networked step).
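
The second option could look roughly like the sketch below. It assumes the registry's `source.json` carries a Subresource Integrity string of the form `sha256-<base64>` in an `integrity` field, and that the file content has already been fetched by an explicit networked step. The helper names `sri_to_sha256_hex` and `archive_sha256_from_source_json` are hypothetical and not part of this PR.

```python
import base64
import binascii
import json
from typing import Optional


def sri_to_sha256_hex(integrity: str) -> str:
    """Convert an SRI string such as 'sha256-<base64>' into a hex digest."""
    algo, sep, b64_digest = integrity.partition("-")
    if sep != "-" or algo != "sha256":
        raise ValueError(f"unsupported integrity value: {integrity!r}")
    try:
        raw = base64.b64decode(b64_digest, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64 in integrity value: {integrity!r}") from exc
    if len(raw) != 32:
        # A SHA-256 digest is always 32 bytes long.
        raise ValueError("sha256 digest must be 32 bytes")
    return raw.hex()


def archive_sha256_from_source_json(source_json_text: str) -> Optional[str]:
    """Return the archive SHA-256 (hex) recorded in source.json, if any."""
    integrity = json.loads(source_json_text).get("integrity")
    if not integrity:
        # No integrity recorded: better to emit no checksum than a wrong one.
        return None
    return sri_to_sha256_hex(integrity)
```

Returning `None` when no integrity value is present keeps the SBOM free of a checksum rather than falling back to the hash of the registry file itself.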

Copilot uses AI. Check for mistakes.
Comment on lines +116 to +133
cache_cmd = "set -euo pipefail\npython3 {} {}".format(
    ctx.file._crates_cache_script.path,
    crates_cache.path,
)
if ctx.file.cargo_lockfile:
    cache_inputs.append(ctx.file.cargo_lockfile)
    cache_cmd += " --cargo-lock {}".format(ctx.file.cargo_lockfile.path)
for lock in ctx.files.module_lockfiles:
    cache_inputs.append(lock)
    cache_cmd += " --module-lock {}".format(lock.path)
ctx.actions.run_shell(
    inputs = cache_inputs,
    outputs = [crates_cache],
    command = cache_cmd,
    mnemonic = "CratesCacheGenerate",
    progress_message = "Generating crates metadata cache for %s" % ctx.attr.name,
    execution_requirements = {"requires-network": ""},
    use_default_shell_env = True,
Copilot AI Mar 11, 2026

The crates cache generation action shells out to python3 ..., which bypasses Bazel's configured Python toolchain and can fail in hermetic/remote execution environments (or when python3 isn't on PATH). Prefer running a declared executable (e.g., a py_binary target) via ctx.actions.run, or use the Python toolchain runtime explicitly, so the action is portable and reproducible.

Suggested change
- cache_cmd = "set -euo pipefail\npython3 {} {}".format(
-     ctx.file._crates_cache_script.path,
-     crates_cache.path,
- )
- if ctx.file.cargo_lockfile:
-     cache_inputs.append(ctx.file.cargo_lockfile)
-     cache_cmd += " --cargo-lock {}".format(ctx.file.cargo_lockfile.path)
- for lock in ctx.files.module_lockfiles:
-     cache_inputs.append(lock)
-     cache_cmd += " --module-lock {}".format(lock.path)
- ctx.actions.run_shell(
-     inputs = cache_inputs,
-     outputs = [crates_cache],
-     command = cache_cmd,
-     mnemonic = "CratesCacheGenerate",
-     progress_message = "Generating crates metadata cache for %s" % ctx.attr.name,
-     execution_requirements = {"requires-network": ""},
-     use_default_shell_env = True,
+ cache_args = ctx.actions.args()
+ cache_args.add(crates_cache.path)
+ if ctx.file.cargo_lockfile:
+     cache_inputs.append(ctx.file.cargo_lockfile)
+     cache_args.add("--cargo-lock", ctx.file.cargo_lockfile.path)
+ for lock in ctx.files.module_lockfiles:
+     cache_inputs.append(lock)
+     cache_args.add("--module-lock", lock.path)
+ ctx.actions.run(
+     inputs = cache_inputs,
+     outputs = [crates_cache],
+     executable = ctx.executable._crates_cache_script,
+     arguments = [cache_args],
+     mnemonic = "CratesCacheGenerate",
+     progress_message = "Generating crates metadata cache for %s" % ctx.attr.name,
+     execution_requirements = {"requires-network": ""},

Comment on lines +140 to +144
ctx.actions.run(
    outputs = [cdxgen_sbom],
    executable = ctx.executable._npm,
    arguments = [
        "exec",
Copilot AI Mar 11, 2026

The auto_cdxgen action expects npm/cdxgen to be discoverable via PATH (and optionally NVM via $NVM_DIR), but this ctx.actions.run() invocation doesn't opt into inheriting the host environment. Without use_default_shell_env = True (or an explicit env), this can fail to locate system-installed Node/npm/cdxgen in common setups (e.g., nvm installs under $HOME).

Comment on lines +48 to +51
License metadata is collected automatically:
- Rust crates: from crates_metadata.json cache (bundled with tooling)
- C++ deps: from cpp_metadata.json cache (bundled with tooling)
- Bazel modules: version/PURL auto-extracted from module graph
Copilot AI Mar 11, 2026

This docstring says crate/C++ license metadata is collected automatically from the bundled crates_metadata.json/cpp_metadata.json caches, but the current rule/generator flow only loads crates metadata when --crates-cache is explicitly passed (and there is no codepath that reads the bundled files by default). Either wire the bundled cache files into the rule/generator defaults, or update the documentation here to match the actual behavior.

Suggested change
- License metadata is collected automatically:
- - Rust crates: from crates_metadata.json cache (bundled with tooling)
- - C++ deps: from cpp_metadata.json cache (bundled with tooling)
- - Bazel modules: version/PURL auto-extracted from module graph
+ License metadata can be collected automatically when the corresponding
+ inputs or caches are provided:
+ - Rust crates: from a crates metadata cache generated from Cargo.lock
+   and/or MODULE.bazel.lock files (see cargo_lockfile/module_lockfiles and
+   auto_crates_cache)
+ - C++ deps: optionally enriched from a CycloneDX SBOM produced by cdxgen
+   (see cdxgen_sbom/auto_cdxgen)
+ - Bazel modules: version/PURL auto-extracted from the Bazel module graph

Comment on lines +5 to +6
SBOM generation. License metadata is collected automatically from
bundled caches (crates_metadata.json, cpp_metadata.json).
Copilot AI Mar 11, 2026

The module extension docstring claims license metadata is collected automatically from bundled caches (crates_metadata.json, cpp_metadata.json), but the extension implementation does not read those files. This is misleading for consumers; either implement the cache loading/merging here (or in the generator) or adjust this header comment to reflect the actual data sources.

Suggested change
- SBOM generation. License metadata is collected automatically from
- bundled caches (crates_metadata.json, cpp_metadata.json).
+ SBOM generation. License and other dependency metadata are consumed
+ from pre-generated metadata (for example, JSON caches passed via
+ the metadata_content attribute).

Comment on lines +26 to +28
if cdx_data.get("bomFormat") != "CycloneDX":
    print("Error: Input is not a CycloneDX JSON file", file=sys.stderr)
    sys.exit(1)
Copilot AI Mar 11, 2026

convert_cdxgen_to_cache() calls sys.exit(1) on non-CycloneDX input. Since this function is part of the exported py_library (and is imported in tests), exiting the interpreter makes it hard to reuse safely as a library function. Prefer raising a typed exception (e.g., ValueError) or returning {} and letting main() handle the CLI exit code.
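
A minimal sketch of that split, assuming the format check is the only part that currently exits. The helper `check_cyclonedx` and this simplified `main` are hypothetical stand-ins, not the script's actual functions. Only the `bomFormat` check and the error message come from the quoted lines.

```python
import json
import sys


def check_cyclonedx(cdx_data) -> None:
    """Library-level check: raise instead of exiting the interpreter."""
    if not isinstance(cdx_data, dict) or cdx_data.get("bomFormat") != "CycloneDX":
        raise ValueError("Input is not a CycloneDX JSON file")


def main(argv: list) -> int:
    """CLI wrapper: the only place where errors become exit codes."""
    if len(argv) != 2:
        print("usage: <script> <cdxgen-output.json>", file=sys.stderr)
        return 2
    try:
        with open(argv[1], encoding="utf-8") as fh:
            check_cyclonedx(json.load(fh))
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError is a subclass of ValueError, so malformed
        # JSON takes the same path as a wrong bomFormat.
        print(f"Error: {exc}", file=sys.stderr)
        return 1
    return 0
```

A script entry point would pass `main(sys.argv)` to `sys.exit`, while tests and other importers can call `check_cyclonedx` directly and assert on the raised `ValueError`.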

Comment on lines +188 to +192
uvx,
"--from",
"dash-license-scan@git+https://github.com/eclipse-score/dash-license-scan",
"dash-license-scan",
"--summary",
Copilot AI Mar 11, 2026

run_dash_license_scan() installs/runs dash-license-scan from an unpinned Git URL (dash-license-scan@git+https://...). That makes results non-reproducible over time and increases supply-chain risk for a networked build-time action. Consider pinning to a specific tag/commit SHA, or using a versioned/published artifact if available.
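
Pinning could be sketched as below, assuming `uvx` accepts the usual `git+<url>@<revision>` form for VCS requirements. The helper name `pinned_scan_prefix` is hypothetical, and no real commit SHA is suggested here: the revision to pin has to be chosen by the maintainers.

```python
import re

REPO_URL = "https://github.com/eclipse-score/dash-license-scan"
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def pinned_scan_prefix(uvx: str, revision: str) -> list:
    """Build the leading uvx arguments with the Git source pinned to a commit."""
    if not _FULL_SHA.fullmatch(revision):
        # Branch names and short hashes can move or collide, so this sketch
        # only accepts a full 40-character commit SHA.
        raise ValueError("revision must be a full 40-character commit SHA")
    requirement = f"dash-license-scan@git+{REPO_URL}@{revision}"
    return [uvx, "--from", requirement, "dash-license-scan"]
```

The remaining arguments of the original call (`--summary` and whatever follows it) would be appended to the returned list unchanged. Rejecting tags is a stricter choice than the comment asks for. A tag would also be acceptable if the project treats its tags as immutable.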

AlexanderLanin (Member) left a comment

All Copilot findings are non-critical, so merging as-is. They can be improved afterwards.

AlexanderLanin merged commit 0db176c into eclipse-score:main on Mar 11, 2026
5 checks passed
@masc2023
@Lukasz-Juranek, thank you for the contribution, and congrats on the first PR merged in this repo.

4 participants