feat: add support for CSAF advisories with CPE-based product IDs instead of PURLs#2257
feat: add support for CSAF advisories with CPE-based product IDs instead of PURLs#2257Strum355 wants to merge 2 commits intoguacsec:mainfrom
Conversation
Reviewer's GuideExtends vulnerability analysis to incorporate CSAF product_status entries that reference products by CPE instead of PURL, by prefetching CPE records, expanding SQL to union product_status-based advisories, threading CPE context through the analysis pipeline, and adding tests to validate behavior on real CSAF documents. Sequence diagram for CPE-aware vulnerability analysis pipelinesequenceDiagram
actor Client
participant VulnerabilityService
participant DB as Database
participant CPEMap as cpe_map
participant PurlStatus
Client->>VulnerabilityService: analyze_vulnerabilities(connection, purls)
Note over VulnerabilityService: Fetch purl_status and product_status data
VulnerabilityService->>DB: SELECT ... FROM purl_status ...
DB-->>VulnerabilityService: purl_status rows (JSON advisories)
VulnerabilityService->>DB: SELECT ... FROM product_status LEFT JOIN cpe ...
DB-->>VulnerabilityService: product_status rows (JSON advisories with context_cpe)
Note over VulnerabilityService: Collect IDs from JSONB advisories
VulnerabilityService->>VulnerabilityService: extract advisory_id, context_cpe into sets
VulnerabilityService->>DB: SELECT * FROM advisory WHERE id = ANY(advisory_ids)
DB-->>VulnerabilityService: advisory models
VulnerabilityService->>DB: SELECT * FROM cpe WHERE id = ANY(cpe_ids)
DB-->>VulnerabilityService: cpe models
VulnerabilityService->>CPEMap: build HashMap<Uuid, cpe_Model>
Note over VulnerabilityService: Build analysis details per PURL
loop for each purl
VulnerabilityService->>VulnerabilityService: deserialize AdvisoryEntryDetailsPhase (includes context_cpe)
alt context_cpe present
VulnerabilityService->>CPEMap: lookup cpe_model by id
CPEMap-->>VulnerabilityService: cpe_Model
VulnerabilityService->>VulnerabilityService: cpe_string = cpe_Model.to_string()
else no context_cpe
VulnerabilityService->>VulnerabilityService: cpe_string = None
end
VulnerabilityService->>PurlStatus: from_head(head, advisory_head, purl, status, version_range, cpe_string, scores)
PurlStatus-->>VulnerabilityService: PurlStatus instance
end
VulnerabilityService-->>Client: AnalysisData and AnalysisDetailsV3
Class diagram for updated vulnerability analysis data flow with CPE supportclassDiagram
class VulnerabilityService {
+analyze_vulnerabilities(connection, purls) Result~AnalysisData~
+build_analysis_details_v3(connection, head, purl, descriptions_map, scores_map, advisories_map, cpe_map) Result~tuple~
}
class AnalysisData {
+purls_with_vulnerabilities : Vec
+warnings : Vec
+descriptions_map : HashMap~String, vulnerability_description_Model~
+scores : Vec~advisory_vulnerability_score_Model~
+advisories_map : HashMap~Uuid, AdvisoryData~
+cpe_map : HashMap~Uuid, cpe_Model~
}
class AdvisoryEntryQueryPhase {
+advisory_id : Uuid
+context_cpe : Option~Uuid~
}
class AdvisoryEntryDetailsPhase {
+status : String
+advisory_id : Uuid
+version_range : VersionRange
+remediations : Vec~RemediationEntry~
+context_cpe : Option~Uuid~
}
class cpe_Model {
+id : Uuid
+to_string() String
}
class PurlStatus {
+from_head(head, advisory_head, purl, status, version_range, context_cpe_string, scores) Result~PurlStatus~
}
class AdvisoryData {
<<struct>>
}
class VersionRange {
<<struct>>
}
class RemediationEntry {
<<struct>>
}
class AnalysisDetailsV3 {
<<struct>>
}
class AnalysisPurlStatus {
+purl_status : PurlStatus
}
VulnerabilityService --> AnalysisData : builds
VulnerabilityService --> AnalysisDetailsV3 : builds
AnalysisData --> cpe_Model : uses as map values
AnalysisData --> AdvisoryData : uses as map values
VulnerabilityService --> AdvisoryEntryQueryPhase : deserializes from JSONB
VulnerabilityService --> AdvisoryEntryDetailsPhase : deserializes from JSONB
VulnerabilityService --> cpe_Model : prefetches
VulnerabilityService --> PurlStatus : constructs via from_head
PurlStatus --> cpe_Model : optional context via to_string
AnalysisDetailsV3 --> AnalysisPurlStatus : contains
AnalysisPurlStatus --> PurlStatus : wraps
Flow diagram for building combined purl_status and product_status SQLflowchart TD
A[Start: build SQL for a requested PURL] --> B[Determine namespace condition for base_purl]
B --> C[Build purl_status_sql string
SELECT ... FROM purl_status ...]
C --> D[Create purl_status_query using Statement::from_sql_and_values
with requested_purl, name, type, version]
D --> E[Determine package_condition
- if namespace present: product_status.package = full_name OR name
- else: product_status.package = name]
E --> F[Build product_status_sql string
SELECT ... FROM product_status
JOIN status, vulnerability, product_version_range, version_range
LEFT JOIN cpe ON product_status.context_cpe_id = cpe.id
WHERE package_condition AND product_status.package IS NOT NULL
AND status.slug NOT IN fixed, not_affected, recommended]
F --> G[Create product_status_query using Statement::from_sql_and_values
with requested_purl]
G --> H[Concatenate SQL
final_sql = purl_status_query UNION ALL product_status_query]
H --> I[Return final_sql for execution]
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- The dynamic SQL composition for
package_conditionandproduct_status_sqlrelies onStatement::to_string()and string interpolation, which is brittle and backend-specific; consider using SeaORM's query builder (or consistently tracked parameter lists) to avoid subtle issues with placeholders and SQL injection/escaping. - There is a
println!left inanalyze_purls_product_statusthat prints remediations during the test run; consider removing or converting this to alogcall to keep test output clean.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The dynamic SQL composition for `package_condition` and `product_status_sql` relies on `Statement::to_string()` and string interpolation, which is brittle and backend-specific; consider using SeaORM's query builder (or consistently tracked parameter lists) to avoid subtle issues with placeholders and SQL injection/escaping.
- There is a `println!` left in `analyze_purls_product_status` that prints remediations during the test run; consider removing or converting this to a `log` call to keep test output clean.
## Individual Comments
### Comment 1
<location path="modules/fundamental/src/vulnerability/service/mod.rs" line_range="537-546" />
<code_context>
);
- Ok(Some(query.to_string()))
+ let package_condition = match &purl.namespace {
+ Some(namespace) => {
+ let full_name = format!("{}/{}", namespace, purl.name);
+ let sql = "(product_status.package = $1 OR product_status.package = $2)";
+ Statement::from_sql_and_values(
+ connection.get_database_backend(),
+ sql,
+ [full_name.into(), purl.name.clone().into()],
+ )
+ .to_string()
+ }
+ None => {
</code_context>
<issue_to_address>
**issue (bug_risk):** Building `package_condition` via `Statement::to_string()` is brittle and may silently depend on SeaORM’s debug formatting/parameter inlining behavior.
This approach couples `package_condition` to SeaORM’s `Statement::to_string()` debug behavior and its current choice to inline bound parameters. If SeaORM changes `to_string()` to keep `$1`/`$2` placeholders, `product_status_sql` will contain unbound parameters since `product_status_query` only supplies `[p.into()]`, leading to runtime errors or parameter mismatches.
Prefer either building `package_condition` as plain SQL with explicit, safely-escaped interpolations, or representing the entire `product_status_sql` as a single `Statement` with all bindings managed together, rather than embedding a stringified `Statement`.
</issue_to_address>
### Comment 2
<location path="modules/fundamental/src/vulnerability/service/test.rs" line_range="933-939" />
<code_context>
+ "All statuses should be affected"
+ );
+
+ // Should have CPE context on statuses
+ assert!(
+ details
+ .purl_statuses
+ .iter()
+ .any(|s| s.purl_status.context.is_some()),
+ "Should have CPE context from product_status"
+ );
+
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen CPE context assertions to verify the actual CPE value, not only presence
This assertion only checks that *some* `purl_status.context` is `Some`, so it would still pass with the wrong CPE or a changed format. To properly validate the new `context_cpe` wiring, assert that the returned `context` values include the specific expected CPE string(s) from the CSAF (e.g. compare against a known `"cpe:2.3:..."`), so the `cpe_map` join and `PurlStatus` serialization are actually verified.
Suggested implementation:
```rust
// All statuses should be "affected" (from known_affected)
assert!(
details
.purl_statuses
.iter()
.all(|s| s.purl_status.status == "affected"),
"All statuses should be affected"
);
// Should have CPE context on statuses and it should match the expected CPE(s)
let contexts: Vec<&str> = details
.purl_statuses
.iter()
.filter_map(|s| s.purl_status.context.as_deref())
.collect();
assert!(
!contexts.is_empty(),
"Expected at least one PurlStatus to have CPE context"
);
// Verify that the expected CPE from the CSAF is present in the contexts.
// NOTE: replace the placeholder CPE string with the specific value from the CSAF fixture.
let expected_cpe = "cpe:2.3:REPLACE_ME_WITH_EXPECTED_CPE_FROM_CSAF";
assert!(
contexts.iter().any(|c| *c == expected_cpe),
"Expected CPE context `{}` not found in PurlStatus contexts: {:?}",
expected_cpe,
contexts
);
```
To fully implement the intent of the review comment, update the `expected_cpe` placeholder string to the actual CPE value present in the `csaf/CVE-2023-20862.json` (or the relevant CSAF fixture for this test).
If multiple distinct CPEs are expected, you can extend this check by:
1. Defining a `Vec<&str>` or `HashSet<&str>` of expected CPEs.
2. Asserting that each expected CPE is contained in `contexts` (e.g. with `all(|expected| contexts.contains(expected))`).
</issue_to_address>
### Comment 3
<location path="modules/fundamental/src/vulnerability/service/test.rs" line_range="1033-1038" />
<code_context>
+ let details = &analysis.details[0];
+ assert_eq!(details.head.identifier, "CVE-2023-0044");
+
+ let remediations = details
+ .purl_statuses
+ .iter()
+ .flat_map(|status| status.remediations.clone())
+ .collect_vec();
+ assert_eq!(remediations.len(), 2);
+
+ Ok(())
</code_context>
<issue_to_address>
**suggestion (testing):** Make the remediation assertions more specific than just counting entries
Here, the test only checks that there are 2 remediations, so it would still pass if their categories or contents changed. Since this test is specific to `cve-2023-0044.json`, please also assert the expected remediation categories (e.g., `Workaround` and `NoneAvailable`) and, if possible, that key fields like `details` or `url` are present. That will better guard the product_status→remediation mapping for namespaced packages.
Suggested implementation:
```rust
let details = &analysis.details[0];
assert_eq!(details.head.identifier, "CVE-2023-0044");
let remediations = details
.purl_statuses
.iter()
.flat_map(|status| status.remediations.iter())
.collect_vec();
assert_eq!(remediations.len(), 2, "Expected exactly two remediations for CVE-2023-0044");
// Ensure we have the expected remediation categories for this CSAF document
let categories = remediations
.iter()
.map(|remediation| &remediation.category)
.collect_vec();
assert!(
categories.iter().any(|c| matches!(c, RemediationCategory::Workaround)),
"Expected at least one remediation with category Workaround"
);
assert!(
categories.iter().any(|c| matches!(c, RemediationCategory::NoneAvailable)),
"Expected at least one remediation with category NoneAvailable"
);
// Ensure each remediation has some meaningful content attached
for remediation in remediations {
assert!(
remediation.details.is_some() || remediation.url.is_some(),
"Each remediation should have either details text or a URL"
);
}
```
You may need to:
1. Import or qualify the `RemediationCategory` enum at the top of this file, for example:
`use trustify_csaf::model::RemediationCategory;`
or whatever path is used elsewhere in the codebase.
2. Adjust the field names `details` and `url` on the remediation struct if they differ
(e.g. `description` instead of `details`), so that the assertions compile correctly.
3. If `status.remediations` is already a slice or reference type, you can keep using
`.clone()` as in the original code; I switched to iterating by reference with
`.iter()` to avoid unnecessary cloning, but you can align this with the existing
conventions in the file.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| // Should have CPE context on statuses | ||
| assert!( | ||
| details | ||
| .purl_statuses | ||
| .iter() | ||
| .any(|s| s.purl_status.context.is_some()), | ||
| "Should have CPE context from product_status" |
There was a problem hiding this comment.
suggestion (testing): Strengthen CPE context assertions to verify the actual CPE value, not only presence
This assertion only checks that some purl_status.context is Some, so it would still pass with the wrong CPE or a changed format. To properly validate the new context_cpe wiring, assert that the returned context values include the specific expected CPE string(s) from the CSAF (e.g. compare against a known "cpe:2.3:..."), so the cpe_map join and PurlStatus serialization are actually verified.
Suggested implementation:
// All statuses should be "affected" (from known_affected)
assert!(
details
.purl_statuses
.iter()
.all(|s| s.purl_status.status == "affected"),
"All statuses should be affected"
);
// Should have CPE context on statuses and it should match the expected CPE(s)
let contexts: Vec<&str> = details
.purl_statuses
.iter()
.filter_map(|s| s.purl_status.context.as_deref())
.collect();
assert!(
!contexts.is_empty(),
"Expected at least one PurlStatus to have CPE context"
);
// Verify that the expected CPE from the CSAF is present in the contexts.
// NOTE: replace the placeholder CPE string with the specific value from the CSAF fixture.
let expected_cpe = "cpe:2.3:REPLACE_ME_WITH_EXPECTED_CPE_FROM_CSAF";
assert!(
contexts.iter().any(|c| *c == expected_cpe),
"Expected CPE context `{}` not found in PurlStatus contexts: {:?}",
expected_cpe,
contexts
);
To fully implement the intent of the review comment, update the expected_cpe placeholder string to the actual CPE value present in the csaf/CVE-2023-20862.json (or the relevant CSAF fixture for this test).
If multiple distinct CPEs are expected, you can extend this check by:
- Defining a
Vec<&str>orHashSet<&str>of expected CPEs. - Asserting that each expected CPE is contained in
contexts(e.g. withall(|expected| contexts.contains(expected))).
| let remediations = details | ||
| .purl_statuses | ||
| .iter() | ||
| .flat_map(|status| status.remediations.clone()) | ||
| .collect_vec(); | ||
| assert_eq!(remediations.len(), 2); |
There was a problem hiding this comment.
suggestion (testing): Make the remediation assertions more specific than just counting entries
Here, the test only checks that there are 2 remediations, so it would still pass if their categories or contents changed. Since this test is specific to cve-2023-0044.json, please also assert the expected remediation categories (e.g., Workaround and NoneAvailable) and, if possible, that key fields like details or url are present. That will better guard the product_status→remediation mapping for namespaced packages.
Suggested implementation:
let details = &analysis.details[0];
assert_eq!(details.head.identifier, "CVE-2023-0044");
let remediations = details
.purl_statuses
.iter()
.flat_map(|status| status.remediations.iter())
.collect_vec();
assert_eq!(remediations.len(), 2, "Expected exactly two remediations for CVE-2023-0044");
// Ensure we have the expected remediation categories for this CSAF document
let categories = remediations
.iter()
.map(|remediation| &remediation.category)
.collect_vec();
assert!(
categories.iter().any(|c| matches!(c, RemediationCategory::Workaround)),
"Expected at least one remediation with category Workaround"
);
assert!(
categories.iter().any(|c| matches!(c, RemediationCategory::NoneAvailable)),
"Expected at least one remediation with category NoneAvailable"
);
// Ensure each remediation has some meaningful content attached
for remediation in remediations {
assert!(
remediation.details.is_some() || remediation.url.is_some(),
"Each remediation should have either details text or a URL"
);
}You may need to:
- Import or qualify the
RemediationCategoryenum at the top of this file, for example:
use trustify_csaf::model::RemediationCategory;
or whatever path is used elsewhere in the codebase. - Adjust the field names
detailsandurlon the remediation struct if they differ
(e.g.descriptioninstead ofdetails), so that the assertions compile correctly. - If
status.remediationsis already a slice or reference type, you can keep using
.clone()as in the original code; I switched to iterating by reference with
.iter()to avoid unnecessary cloning, but you can align this with the existing
conventions in the file.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2257 +/- ##
==========================================
+ Coverage 69.08% 70.51% +1.42%
==========================================
Files 407 413 +6
Lines 23047 23943 +896
Branches 23047 23943 +896
==========================================
+ Hits 15922 16883 +961
+ Misses 6231 6142 -89
- Partials 894 918 +24 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Output from /v3/vulnerability/analyze output given https://github.com/guacsec/trustify/blob/main/etc/test-data/csaf/cve-2023-0044.json document for the purl {
"vulnerability": {
"normative": true,
"identifier": "CVE-2023-0044",
"title": null,
"description": null,
"reserved": null,
"published": null,
"modified": null,
"withdrawn": null,
"discovered": null,
"released": null,
"cwes": []
},
"advisory": {
"uuid": "urn:uuid:dc6d3239-ddaa-4241-81ad-910dfe143d0e",
"identifier": "https://www.redhat.com/#CVE-2023-0044",
"document_id": "CVE-2023-0044",
"issuer": {
"id": "88d35642-663a-4daa-9982-590d74a7409d",
"name": "Red Hat Product Security",
"cpe_key": null,
"website": null
},
"published": "2023-01-04T00:00:00Z",
"modified": "2023-11-13T11:31:31Z",
"withdrawn": null,
"title": "quarkus-vertx-http: a cross-site attack may be initiated which might lead to the Information Disclosure",
"labels": {
"source": "TrustifyContext",
"type": "csaf"
}
},
"scores": [
{
"type": "3.1",
"value": 5.3,
"severity": "medium"
}
],
"average_severity": "medium",
"average_score": 5.3,
"status": "affected",
"context": {
"cpe": "cpe:/a:redhat:quarkus:2:*:*:*"
},
"version_range": {
"version_scheme_id": "rpm",
"low_version": "2.0.0",
"low_inclusive": true,
"high_version": "3.0.0",
"high_inclusive": false
},
"remediations": [
{
"id": "b8e80493-7c71-5e98-a584-9cd0fc4434d5",
"category": "workaround",
"details": "This attack can be prevented with the Quarkus CSRF Prevention feature.",
"url": null,
"data": {
"details": "This attack can be prevented with the Quarkus CSRF Prevention feature.",
"category": "workaround",
"product_ids": [
"Red Hat build of Quarkus",
"Red Hat build of Quarkus 2.7.7",
"a-mq_clients_2:quarkus-vertx-http",
"red_hat_build_of_quarkus:io.quarkus/quarkus-vertx-http",
"red_hat_integration_camel_k:quarkus-vertx-http",
"red_hat_integration_camel_quarkus:quarkus-vertx-http",
"red_hat_integration_change_data_capture:quarkus-vertx-http",
"red_hat_integration_service_registry:quarkus-vertx-http",
"red_hat_jboss_enterprise_application_platform_7:quarkus-vertx-http",
"red_hat_jboss_enterprise_application_platform_expansion_pack:quarkus-vertx-http",
"red_hat_jboss_fuse_7:quarkus-vertx-http",
"red_hat_process_automation_7:quarkus-vertx-http"
]
}
},
{
"id": "e2826a19-8598-5e67-bab2-4fe1a4c33583",
"category": "none_available",
"details": "Affected",
"url": null,
"data": {
"details": "Affected",
"category": "none_available",
"product_ids": [
"red_hat_build_of_quarkus:io.quarkus/quarkus-vertx-http"
]
}
}
]
} |
|
Sorry for asking maybe basic questions, I wasn't part of the discussions up until now. I'd like to understand the motivation. If the user provides an SBOM (most likely, of their own project), which contains a package I am not sure, that this is what users are looking for. So, is that the intention? |
The |
|
Thanks for linking the PR, that should help me understanding things. |
After reading this, I'm still not sure that this what the user wants. But to my understanding you all discussed this and this is what we want. |
|
It looks like the PR does some significant changes on database queries. So I want to be sure to understand the performance impact on this. Which endpoints would be affected by this? And do we have tests in scale testing (https://github.com/guacsec/trustify-scale-testing/blob/main/src/restapi.rs) covering them? If not, we should add them first. As those endpoints seem to exist already. This PR is just adding more data to it. |
There was a problem hiding this comment.
This function is getting a bit too big. We should start splitting it up.
|
|
||
| let product_status_sql = format!( | ||
| r#" | ||
| SELECT |
There was a problem hiding this comment.
There is a lot of code duplication with the previous SQL statement. It should be possible to reduce that by introducing a common function, helping to build that statement.
There was a problem hiding this comment.
Ive tried to address this in the below commit, let me know if this works for you or youd rather a different approach?
e32ad98
|
|
||
| #[test_context(TrustifyContext)] | ||
| #[test(tokio::test)] | ||
| async fn analyze_purls_product_status(ctx: &TrustifyContext) -> Result<(), anyhow::Error> { |
There was a problem hiding this comment.
I think it would be good to have a quick summary of with both tests are supposed to test.
|
|
||
| #[test_context(TrustifyContext)] | ||
| #[test(tokio::test)] | ||
| async fn analyze_purls_product_status(ctx: &TrustifyContext) -> Result<(), anyhow::Error> { |
There was a problem hiding this comment.
Why not transform those tests into endpoint tests? Covering the endpoint handling as well.
If Im reading the CSAF document right (which I may not be), the remediation linkage seems correct: The reason
The affected endpoints should be /api/v3/purl/recommend and /api/v2/vulnerability/analyze. The former doesn't have any scale tests currently (Ill go ahead and add some), the latter has scale tests (https://github.com/guacsec/trustify-scale-testing/blob/main/src/restapi.rs#L355C33-L368) Thanks for the review feedback so far! @ctron |
|
So, to my understanding, the CSAF document contains a list of vulnerabilities. And each vulnerability lists:
So the data is modeled in a way that makes is easy to transport information, for the use case of CSAF. However, that doesn't mean that just returning the whole section makes sense.
This field defines for which product IDs this remediation entry relates to. However, when returning a remediation entry for "component A", it doesn't seem to make sense to return the information that the same remediation also applies to "component B". |

Summary by Sourcery
Extend vulnerability analysis to include CSAF product_status data with CPE-based product identifiers alongside existing PURL-based matching.
New Features:
Enhancements:
Tests: