Reviewer's Guide
The CycloneDX SBOM loader has been refactored to centralize label extraction and extended to detect AI model components by adding an "ai" label; new integration tests and JSON fixtures have been added to validate ingestion of AIBOM CycloneDX documents.
Sequence diagram for ingesting an AIBOM CycloneDX SBOM
sequenceDiagram
participant Test as "Test Function"
participant Ingestor as "IngestorService"
participant Loader as "CyclonedxLoader"
participant Labels as "Labels"
Test->>Ingestor: ingest(data, Format::CycloneDX, ...)
Ingestor->>Loader: load(buffer)
Loader->>Labels: extract_labels(components)
Labels-->>Loader: labels (with "ai" if AI model found)
Loader-->>Ingestor: processed SBOM with labels
Ingestor-->>Test: ingestion result
Class diagram for CycloneDX SBOM loader label extraction changes
classDiagram
class CyclonedxLoader {
+load(buffer: &[u8])
}
class Labels {
+add(key: &str, value: &str) Labels
}
class Component {
+type_: String
}
CyclonedxLoader --> Labels
CyclonedxLoader --> Component
CyclonedxLoader : +extract_labels(components: Option<&Vec<Component>>) Labels
Labels <.. extract_labels
extract_labels --> Component
extract_labels --> Labels
|
|
@desmax74 format : ) |
f777708 to
9cc19d7
Compare
|
Searching through all components to determine a label might become problematic for large SBOMs. I would suggest that we try to figure out if we can determine what we need at this stage using Or in case like in the tests here, looking for the property Both of these pieces of information could indicate that AI is used in the SBOM.
If we need to search for a model in the components, I would propose that we do it the way we do for other SBOM entities: create a Creator/Processor that is called while iterating through the components (in a single iteration per SBOM) and extracts the necessary information. In the first iteration we can only add a label, but later on we can extract model card information and store it in the database. |
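The single-pass Creator/Processor idea above could look roughly like the following. This is only a sketch under stated assumptions: the `ComponentProcessor` trait, `AiModelProcessor`, and `run_processors` are hypothetical names, and the simplified `Component` struct and the `BTreeMap` stand in for the project's real `Component` and `Labels` types.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the CycloneDX component type (assumption).
struct Component {
    type_: String,
}

/// Hypothetical hook invoked once per component during the single pass.
trait ComponentProcessor {
    fn process(&mut self, component: &Component);
    /// Called after the pass so the processor can contribute labels.
    fn finish(&self, labels: &mut BTreeMap<String, String>);
}

/// Records whether any machine-learning-model component was seen.
#[derive(Default)]
struct AiModelProcessor {
    found: bool,
}

impl ComponentProcessor for AiModelProcessor {
    fn process(&mut self, component: &Component) {
        if component.type_ == "machine-learning-model" {
            self.found = true;
        }
    }

    fn finish(&self, labels: &mut BTreeMap<String, String>) {
        if self.found {
            labels.insert("ai".to_string(), "machine-learning-model".to_string());
        }
    }
}

/// Walks the components exactly once, feeding every registered processor.
fn run_processors(
    components: &[Component],
    processors: &mut [Box<dyn ComponentProcessor>],
) -> BTreeMap<String, String> {
    let mut labels = BTreeMap::new();
    labels.insert("type".to_string(), "cyclonedx".to_string());
    for component in components {
        for processor in processors.iter_mut() {
            processor.process(component);
        }
    }
    for processor in processors.iter() {
        processor.finish(&mut labels);
    }
    labels
}
```

A later processor (for example one collecting model card data) could be added to the same list without a second pass over the components.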
|
If we are going to go with the current approach, here are some quick code improvements that avoid duplicate value assignments and improve performance, as they exit the loop when the first model is encountered:
diff --git a/modules/ingestor/src/service/sbom/cyclonedx.rs b/modules/ingestor/src/service/sbom/cyclonedx.rs
index 17343837..336ec036 100644
--- a/modules/ingestor/src/service/sbom/cyclonedx.rs
+++ b/modules/ingestor/src/service/sbom/cyclonedx.rs
@@ -30,7 +30,7 @@ impl<'g> CyclonedxLoader<'g> {
         let cdx: Box<serde_cyclonedx::cyclonedx::v_1_6::CycloneDx> = serde_json::from_slice(buffer)
             .map_err(|err| Error::UnsupportedFormat(format!("Failed to parse: {err}")))?;
 
-        let labels = labels_with_ai_type_check(cdx.components.clone());
+        let labels = extract_labels(cdx.components.as_ref());
 
         log::info!(
             "Storing - version: {:?}, serialNumber: {:?}",
@@ -76,22 +76,20 @@ impl<'g> CyclonedxLoader<'g> {
     }
 }
 
-fn labels_with_ai_type_check(components: Option<Vec<Component>>) -> Labels {
-    match components {
-        Some(vec) => {
-            for component in vec {
-                if component.type_ == "machine-learning-model" {
-                    return Labels::new()
-                        .add("type", "cyclonedx")
-                        .add("ai", "machine-learning-model");
-                }
-            }
-        }
-        None => {
-            return Labels::new().add("type", "cyclonedx");
+fn extract_labels(components: Option<&Vec<Component>>) -> Labels {
+    let mut labels = Labels::new().add("type", "cyclonedx");
+
+    // find if there are machine learning model components in the SBOM
+    if let Some(components) = components {
+        if components
+            .iter()
+            .any(|c| c.type_ == "machine-learning-model")
+        {
+            labels = labels.add("ai", "machine-learning-model");
         }
     }
-    Labels::new().add("type", "cyclonedx")
+
+    labels
 }
 
 #[cfg(test)]
Before approving this, let's test whether there is any ingestion performance degradation for CycloneDX. I'll try to find a good example. |
|
I don't think we have a good one in our examples yet, but this one with ~400 components can be a good start |
|
Just noticed this PR and how it works. There's another PR that actually imports those properties: #1913. I use this as an example for currently ignored data. |
|
@ctron Thanks for the pointer. Unfortunately, it turns out that those properties are not reliable for identifying AI components in the SBOM. I think searching through the components is the only way. The good news is that, in the first tests, this doesn't slow down the ingestion process. The other option is to create a full
26e68c2 to
a0fb298
Compare
Hey there - I've reviewed your changes - here's some feedback:
- extract_labels currently creates a brand-new Labels object instead of augmenting the incoming labels—consider changing it to take and merge with the existing labels to preserve other metadata.
- The AI component type is hardcoded as "machine-learning-model"—it would be more future-proof to parameterize or extract this list into a constant or config so you can easily support additional AI types later.
- Add a negative test case for a CycloneDX SBOM without any ML components to assert that the "ai" label is only added when appropriate, preventing false positives.
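The three suggestions above (merge into existing labels, keep the AI types in one place, cover the negative case) can be sketched together as follows. This is an illustrative sketch only: `AI_COMPONENT_TYPES` and `with_ai_label` are hypothetical names, and the simplified `Component` struct and the `BTreeMap` are stand-ins for the project's actual `Component` and `Labels` types.

```rust
use std::collections::BTreeMap;

/// Component types treated as AI-related (hypothetical constant; extend as needed).
const AI_COMPONENT_TYPES: &[&str] = &["machine-learning-model"];

/// Simplified stand-in for the CycloneDX component type (assumption).
struct Component {
    type_: String,
}

/// Adds an "ai" label to the labels passed in, leaving existing entries intact.
/// Stops at the first component whose type is listed in `AI_COMPONENT_TYPES`.
fn with_ai_label(
    mut labels: BTreeMap<String, String>,
    components: Option<&[Component]>,
) -> BTreeMap<String, String> {
    let ai_type = components
        .into_iter()
        .flatten()
        .map(|c| c.type_.as_str())
        .find(|t| AI_COMPONENT_TYPES.contains(t));
    if let Some(t) = ai_type {
        labels.insert("ai".to_string(), t.to_string());
    }
    labels
}
```

A negative test would then pass a component list without any listed type (or `None`) and assert that the returned labels contain no "ai" key while the original entries survive.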
eada6c6 to
f244087
Compare
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1991      +/-   ##
==========================================
- Coverage   68.43%   68.22%   -0.22%
==========================================
  Files         359      359
  Lines       19923    19947      +24
  Branches    19923    19947      +24
==========================================
- Hits        13634    13608      -26
- Misses       5504     5559      +55
+ Partials      785      780       -5
|
a50b271 to
72e5ed4
Compare
This PR loops through the SBOM components and applies a kind label if it finds AI or crypto components inside.
Signed-off-by: desmax74 <mdessi@redhat.com>
|
/scale-test |
|
🛠️ Scale test has started! Follow the progress here: Workflow Run |
Goose Attack Report (sections: Plan Overview, Request Metrics, Response Time Metrics, Status Code Metrics, Transaction Metrics, Scenario Metrics)
📄 Full Report (Go to "Artifacts" and download report) |
|
/backport |
|
Successfully created backport PR for |
An AI label is added when the CycloneDX 1.6 file contains "type": "machine-learning-model" on its components, based on the spec: https://cyclonedx.org/docs/1.6/json/#components_items_type
See: https://issues.redhat.com/browse/TC-1910
Summary by Sourcery
Add tests and fixtures to validate ingestion and analysis of AI BOM (AIBOM) CycloneDX SBOMs and ensure component group and version fields are correctly captured for SPDX and CycloneDX packages.
Summary by Sourcery
Tag CycloneDX SBOMs with an AI label when machine-learning-model components are present and validate the ingestion with new tests and fixtures.