refactor: fetch source_document alongside, and only what's required#2299
refactor: fetch source_document alongside, and only what's required#2299ctron wants to merge 18 commits intoguacsec:mainfrom
Conversation
Reviewer's GuideRefactors SBOM fetching and related APIs to eagerly load SBOM+node+source_document via SelectThree, introduces a generic IntoPackage abstraction and v3 SBOM listing that returns lightweight package summaries, updates delete/related/package endpoints and OpenAPI/utoipa metadata accordingly, and adds DB/util/migration support (SelectThree limiting/filtering, new p2p index, slow-SQL logging, and auth schema test). Sequence diagram for updated SBOM delete endpoint behaviorsequenceDiagram
actor User
participant SbomEndpoints_delete as SbomEndpoints_delete
participant Authorizer
participant Database as Db
participant SbomService
participant Storage
User->>SbomEndpoints_delete: DELETE /api/v2/sbom/{id}
SbomEndpoints_delete->>Authorizer: require(user, ReadSbom)
Authorizer-->>SbomEndpoints_delete: ok
SbomEndpoints_delete->>Db: begin()
Db-->>SbomEndpoints_delete: tx
SbomEndpoints_delete->>SbomService: fetch_sbom(id, &tx)
SbomService->>Db: SELECT sbom + sbom_node + source_document
Db-->>SbomService: Option<(sbom, sbom_node, source_document)>
SbomService-->>SbomEndpoints_delete: Some((sbom, sbom_node, source_document)) or None
alt SBOM not found
SbomEndpoints_delete-->>User: 404 Not Found
else SBOM found
SbomEndpoints_delete->>SbomService: delete_sbom(sbom.sbom_id, &tx)
SbomService->>Db: DELETE sbom and related rows
Db-->>SbomService: deleted:true/false
SbomService-->>SbomEndpoints_delete: true or false
alt delete returned false
SbomEndpoints_delete-->>User: 404 Not Found
else delete returned true
SbomEndpoints_delete->>Db: tx.commit()
Db-->>SbomEndpoints_delete: committed
SbomEndpoints_delete->>Storage: delete_doc(SourceDocument::from_entity(&source_document))
Storage-->>SbomEndpoints_delete: Ok or Err(logged and ignored)
SbomEndpoints_delete-->>User: 204 No Content
end
end
ER diagram for SBOM, node, source_document, and package relationshipserDiagram
SBOM {
uuid sbom_id
uuid document_id
string document_namespace
datetime published
jsonb authors
jsonb suppliers
jsonb data_licenses
uuid source_document_id
}
SBOM_NODE {
uuid node_id
uuid sbom_id
string name
}
SOURCE_DOCUMENT {
uuid id
string content_type
string location
}
SBOM_PACKAGE {
string id
uuid sbom_id
string name
string group
string version
}
PACKAGE_RELATES_TO_PACKAGE {
uuid sbom_id
string relationship
uuid left_node_id
uuid right_node_id
index idx_p2p_right_id "(sbom_id, relationship, right_node_id)"
}
SBOM ||--o{ SBOM_NODE : has_node
SBOM ||--o{ SBOM_PACKAGE : has_package
SBOM ||--o{ PACKAGE_RELATES_TO_PACKAGE : has_relation
SOURCE_DOCUMENT ||--|| SBOM : is_source_of
SBOM_NODE ||--|| SBOM_PACKAGE : describes
PACKAGE_RELATES_TO_PACKAGE }o--|| SBOM_NODE : left_node
PACKAGE_RELATES_TO_PACKAGE }o--|| SBOM_NODE : right_node
Class diagram for SBOM package abstractions and IntoPackage refactorclassDiagram
direction LR
class SbomService {
+fetch_sbom~C:ConnectionTrait~(id:Id, connection:C) Result~Option~Tuple~sbom::Model,sbom_node::Model,source_document::Model~~~~, Error~
+fetch_sboms~C:ConnectionTrait, P:IntoPackage~(search:Query, paginated:Paginated, options:FetchOptions, connection:C) Result~PaginatedResults~SbomSummary~P~~~~, Error~
+describes_packages~C:ConnectionTrait, R:Resulting, P:IntoPackage~(sbom_id:Uuid, paginated:R, db:C) Result~R::Output~P~, Error~
+fetch_related_packages~C:ConnectionTrait, R:Resulting, P:IntoPackage~(sbom_id:Uuid, search:Query, options:R, reference:SbomNodeReference, relationship:Option~Relationship~, db:C) Result~R::Output~SbomPackageRelation~P~~~~, Error~
}
class IntoPackage {
<<trait>>
+type Row
+build_query(query:Select~package_relates_to_package::Entity~) Select~package_relates_to_package::Entity~
+from_row(row:Row) Self
}
class SbomPackageSummary {
+id:String
+name:String
+group:Option~String~
+version:Option~String~
}
class SbomPackage {
+id:String
+name:String
+group:Option~String~
+version:Option~String~
+purl:Vec~PurlSummary~
+cpe:Vec~String~
+licenses:Vec~LicenseInfo~
+licenses_ref_mapping:Vec~LicenseRefMapping~
}
class SbomPackageRelation_P {
<<generic P:IntoPackage>>
+relationship:Relationship
+package:P
}
class PackageCatcherBase {
+id:String
+name:String
+group:Option~String~
+version:Option~String~
}
class PackageCatcher {
+base:PackageCatcherBase
+purls:Vec~Value~
+cpes:Value
+licenses:Vec~LicenseBasicInfo~
}
class LicenseBasicInfo {
+id:String
+name:String
+license_type:i32
}
class PaginatedResults_T {
<<generic T>>
+items:Vec~T~
+total:u64
+new~C,S1,S2~(db:C, fetch_selector:S1, count_selector:S2) PaginatedResults~R~
+default() PaginatedResults~T~
}
IntoPackage <|.. SbomPackageSummary
IntoPackage <|.. SbomPackage
SbomPackageSummary ..> PackageCatcherBase : Row = PackageCatcherBase
SbomPackage ..> PackageCatcher : Row = PackageCatcher
SbomService --> IntoPackage : generic P
SbomService --> SbomPackage
SbomService --> SbomPackageSummary
SbomService --> SbomPackageRelation_P
SbomPackageRelation_P --> IntoPackage : P
PackageCatcher --> PackageCatcherBase : base
PackageCatcher --> LicenseBasicInfo
PaginatedResults_T o--> "*" T
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
a19e319 to
3d95abf
Compare
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The new
.filter_map(|(sbom, node, source_document)| Some((sbom, node?, source_document?)))in the listing queries will now silently drop SBOMs that are missing a node or source document; if this is not intentional, consider either making the join inner or handling theNonecases explicitly (e.g., erroring or providing a fallback) instead of filtering them out. - The tuple normalization from
(sbom, Option<sbom_node>, Option<source_document>)to(sbom, sbom_node, source_document)is duplicated in several places; consider extracting a small helper (e.g.,fn require_related(...) -> Option<...>) to centralize this logic and make the intent and failure mode clearer.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `.filter_map(|(sbom, node, source_document)| Some((sbom, node?, source_document?)))` in the listing queries will now silently drop SBOMs that are missing a node or source document; if this is not intentional, consider either making the join inner or handling the `None` cases explicitly (e.g., erroring or providing a fallback) instead of filtering them out.
- The tuple normalization from `(sbom, Option<sbom_node>, Option<source_document>)` to `(sbom, sbom_node, source_document)` is duplicated in several places; consider extracting a small helper (e.g., `fn require_related(...) -> Option<...>`) to centralize this logic and make the intent and failure mode clearer.
## Individual Comments
### Comment 1
<location path="modules/fundamental/src/sbom/model/mod.rs" line_range="88-95" />
<code_context>
#[instrument(skip(service, tx), err(level=tracing::Level::INFO))]
pub async fn from_entity<C>(
- (sbom, node): (sbom::Model, Option<sbom_node::Model>),
+ (sbom, node, source_document): (sbom::Model, sbom_node::Model, source_document::Model),
service: &SbomService,
tx: &C,
</code_context>
<issue_to_address>
**question (bug_risk):** `SbomSummary::from_entity` now assumes node and source document always exist; consider whether failing earlier with an explicit error would make missing data easier to diagnose.
Previously, missing data either returned `Ok(None)` (no node) or an explicit `Error::NotFound` (no source document). Now this function assumes both `node` and `source_document` exist and depends on earlier filtering to enforce that.
If every SBOM must have both, consider keeping this function defensive by asserting that invariant or returning a clear error when it’s violated, so join/data issues are easier to trace. If the current behavior is intentional, it’d help to document that this function only accepts fully joined rows to avoid accidental use with partial data in the future.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| (sbom, node, source_document): (sbom::Model, sbom_node::Model, source_document::Model), | ||
| service: &SbomService, | ||
| db: &C, | ||
| ) -> Result<Option<SbomSummary>, Error> { | ||
| ) -> Result<SbomSummary, Error> { | ||
| // TODO: consider improving the n-select issues here | ||
| let described_by = service.describes_packages(sbom.sbom_id, (), db).await?; | ||
|
|
||
| let source_document = sbom | ||
| .find_related(source_document::Entity) | ||
| .one(db) | ||
| .await? | ||
| .ok_or_else(|| Error::NotFound("Missing source document".to_string()))?; | ||
|
|
||
| Ok(match node { | ||
| Some(_) => Some(SbomSummary { | ||
| head: SbomHead::from_entity(&sbom, node, db).await?, | ||
| source_document: SourceDocument::from_entity(&source_document), | ||
| described_by, | ||
| }), | ||
| None => None, | ||
| Ok(SbomSummary { |
There was a problem hiding this comment.
question (bug_risk): SbomSummary::from_entity now assumes node and source document always exist; consider whether failing earlier with an explicit error would make missing data easier to diagnose.
Previously, missing data either returned Ok(None) (no node) or an explicit Error::NotFound (no source document). Now this function assumes both node and source_document exist and depends on earlier filtering to enforce that.
If every SBOM must have both, consider keeping this function defensive by asserting that invariant or returning a clear error when it’s violated, so join/data issues are easier to trace. If the current behavior is intentional, it’d help to document that this function only accepts fully joined rows to avoid accidental use with partial data in the future.
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The new
fetch_sbomand list queries silently drop SBOMs when the node or source document is missing (node?/source_document?in the tuple andfilter_map), which can hide data integrity issues; consider either surfacing an explicit error or returning a 404 instead of skipping these records. - The repeated pattern
filter_map(|(sbom, node, source_document)| Some((sbom, node?, source_document?)))is a bit opaque and duplicated; a small helper to enforcesbom_node/source_documentpresence (with a clear error path) would make the intent clearer and reduce repetition.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `fetch_sbom` and list queries silently drop SBOMs when the node or source document is missing (`node?` / `source_document?` in the tuple and `filter_map`), which can hide data integrity issues; consider either surfacing an explicit error or returning a 404 instead of skipping these records.
- The repeated pattern `filter_map(|(sbom, node, source_document)| Some((sbom, node?, source_document?)))` is a bit opaque and duplicated; a small helper to enforce `sbom_node`/`source_document` presence (with a clear error path) would make the intent clearer and reduce repetition.
## Individual Comments
### Comment 1
<location path="modules/fundamental/src/sbom/endpoints/mod.rs" line_range="334" />
<code_context>
),
responses(
- (status = 200, description = "Matching SBOM", body = SbomSummary),
+ (status = 204, description = "Matching SBOM as deleted"),
(status = 404, description = "The SBOM could not be found"),
),
</code_context>
<issue_to_address>
**nitpick:** OpenAPI description suggests a body for a 204 response, which cannot have one.
The handler returns `HttpResponse::NoContent().finish()`, so the 204 has no body. The description "Matching SBOM as deleted" can be read as if a body is returned. Please rephrase to make the empty response explicit (e.g. "SBOM deleted" or "Deletion successful; no content returned") to keep generated API docs/clients unambiguous.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| ), | ||
| responses( | ||
| (status = 200, description = "Matching SBOM", body = SbomSummary), | ||
| (status = 204, description = "Matching SBOM as deleted"), |
There was a problem hiding this comment.
nitpick: OpenAPI description suggests a body for a 204 response, which cannot have one.
The handler returns HttpResponse::NoContent().finish(), so the 204 has no body. The description "Matching SBOM as deleted" can be read as if a body is returned. Please rephrase to make the empty response explicit (e.g. "SBOM deleted" or "Deletion successful; no content returned") to keep generated API docs/clients unambiguous.
aad43fc to
1d977c0
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2299 +/- ##
==========================================
- Coverage 67.71% 67.47% -0.24%
==========================================
Files 433 435 +2
Lines 24875 24692 -183
Branches 24875 24692 -183
==========================================
- Hits 16843 16660 -183
- Misses 7111 7139 +28
+ Partials 921 893 -28 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
aff35fc to
436df3c
Compare
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- Both
fetch_sbomandfetch_sbomsnow silently drop SBOMs where thesbom_nodeorsource_documentisNone(vianode?/source_document?infilter_map), which changes previous behavior and can hide data inconsistencies; consider either surfacing a specific error or at least logging when such rows are discarded. - The new v3
/api/v3/sbomendpoint is declared in OpenAPI (andutoipa::path) as returningPaginatedResults<SbomPackageSummary>, but the implementation still usesfetch_sbomswhich returnsPaginatedResults<SbomSummary<P>>; align the schema or the implementation so the wire format and OpenAPI contract match.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Both `fetch_sbom` and `fetch_sboms` now silently drop SBOMs where the `sbom_node` or `source_document` is `None` (via `node?` / `source_document?` in `filter_map`), which changes previous behavior and can hide data inconsistencies; consider either surfacing a specific error or at least logging when such rows are discarded.
- The new v3 `/api/v3/sbom` endpoint is declared in OpenAPI (and `utoipa::path`) as returning `PaginatedResults<SbomPackageSummary>`, but the implementation still uses `fetch_sboms` which returns `PaginatedResults<SbomSummary<P>>`; align the schema or the implementation so the wire format and OpenAPI contract match.
## Individual Comments
### Comment 1
<location path="modules/fundamental/src/sbom/service/sbom.rs" line_range="82-84" />
<code_context>
- .find_also_linked(SbomNodeLink)
- .one(connection)
- .await?)
+ let map = |(sbom, node, source_document)| Some((sbom, node?, source_document?));
+
+ Ok(select.one(connection).await?.and_then(map))
}
</code_context>
<issue_to_address>
**issue (bug_risk):** `fetch_sbom` now silently drops SBOMs missing a node or source document instead of failing on missing source documents
Previously, a missing `source_document` caused `SbomSummary::from_entity` to return `Error::NotFound(..)`, while a missing `sbom_node` yielded `Ok(None)` and was filtered out. With the new mapping, both `sbom_node` and `source_document` are required, but either being missing now returns `None`, making this indistinguishable from a non‑existent SBOM id to the caller. If missing `source_document` should still be treated as a data integrity error, consider preserving the error behavior for that case (and only treating missing `sbom_node` as absence), or at least log when a row is dropped due to a missing join.
</issue_to_address>
### Comment 2
<location path="modules/fundamental/src/sbom/service/sbom.rs" line_range="300-307" />
<code_context>
- .try_filter_map(futures_util::future::ok)
- .try_collect()
- .await?;
+ let items = stream::iter(
+ sboms
+ .into_iter()
+ .filter_map(|(sbom, node, source_document)| Some((sbom, node?, source_document?))),
+ )
+ .then(|row| async { SbomSummary::from_entity(row, self, connection).await })
+ .try_collect()
+ .instrument(info_span!("from_entity"))
+ .await?;
</code_context>
<issue_to_address>
**issue (bug_risk):** `fetch_sboms` now also silently drops SBOM rows with missing node or source document, changing error semantics
The new `filter_map(|(sbom, node, source_document)| Some((sbom, node?, source_document?)))` now skips any row missing `sbom_node` or `source_document`. Previously, only missing nodes were skipped; missing `source_document` caused `SbomSummary::from_entity` to error. This changes external behaviour (fewer results instead of an error) and can hide data issues. If this change is intentional, consider at least logging skipped rows or still erroring on missing `source_document` so data problems aren’t silently masked.
</issue_to_address>
### Comment 3
<location path="modules/fundamental/src/sbom/model/mod.rs" line_range="80-81" />
<code_context>
}
#[derive(Serialize, Deserialize, Debug, Clone, ToSchema)]
-pub struct SbomSummary {
+pub struct SbomSummary<P: IntoPackage = SbomPackage> {
#[serde(flatten)]
pub head: SbomHead,
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `ToSchema` on a generic `SbomSummary<P>` may be fragile for OpenAPI generation
Deriving `ToSchema` on a generic `SbomSummary<P: IntoPackage = SbomPackage>` requires utoipa to describe a generic type, which may not be well-supported and can lead to compile errors or odd schemas, especially with the default type parameter. Since the API only exposes specific instantiations (e.g. with `SbomPackage` or `SbomPackageSummary`), consider either introducing concrete wrapper types for those cases or removing `ToSchema` from the generic and deriving it only on the concrete types/aliases used in the OpenAPI spec.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| let map = |(sbom, node, source_document)| Some((sbom, node?, source_document?)); | ||
|
|
||
| Ok(select.one(connection).await?.and_then(map)) |
There was a problem hiding this comment.
issue (bug_risk): fetch_sbom now silently drops SBOMs missing a node or source document instead of failing on missing source documents
Previously, a missing source_document caused SbomSummary::from_entity to return Error::NotFound(..), while a missing sbom_node yielded Ok(None) and was filtered out. With the new mapping, both sbom_node and source_document are required, but either being missing now returns None, making this indistinguishable from a non‑existent SBOM id to the caller. If missing source_document should still be treated as a data integrity error, consider preserving the error behavior for that case (and only treating missing sbom_node as absence), or at least log when a row is dropped due to a missing join.
| let items = stream::iter( | ||
| sboms | ||
| .into_iter() | ||
| .filter_map(|(sbom, node, source_document)| Some((sbom, node?, source_document?))), | ||
| ) | ||
| .then(|row| async { SbomSummary::from_entity(row, self, connection).await }) | ||
| .try_collect() | ||
| .instrument(info_span!("from_entity")) |
There was a problem hiding this comment.
issue (bug_risk): fetch_sboms now also silently drops SBOM rows with missing node or source document, changing error semantics
The new filter_map(|(sbom, node, source_document)| Some((sbom, node?, source_document?))) now skips any row missing sbom_node or source_document. Previously, only missing nodes were skipped; missing source_document caused SbomSummary::from_entity to error. This changes external behaviour (fewer results instead of an error) and can hide data issues. If this change is intentional, consider at least logging skipped rows or still erroring on missing source_document so data problems aren’t silently masked.
ac56fbb to
275994d
Compare
|
/scale-test |
|
🛠️ Scale test has started! Follow the progress here: Workflow Run |
Instead of fully fetching all information of an SBOM and then just using the ID, this fetches the bare minimum. But then also fetches the source_document in the same step. Passing on to other parts which need it. This should prevent one N+1 select issues.
This adds a v3 sbom list endpoint, which returns less information, but is much faster. Also, allow overriding the slow SQL threshold.
This was a red herring from AI. Node IDs are strings.
Co-authored-by: Claude <noreply@anthropic.com>
Fix license filter using UNION subquery instead of OR'd IN subqueries to avoid full table scan on qualified_purl Co-authored-by: Claude <noreply@anthropic.com>
Replaces OR'd IN subqueries with a single UNION subquery in fetch_sboms and fetch_sbom_packages, avoiding full table scans on sbom and sbom_package. Co-authored-by: Claude <noreply@anthropic.com>
Instrumentation on the sync `new` function doesn't add much.
275994d to
19b457f
Compare
Goose ReportGoose Attack ReportPlan Overview
Request Metrics
Response Time Metrics
Status Code Metrics
Transaction Metrics
Scenario Metrics
📄 Full Report (Go to "Artifacts" and download report) |
Instead of fully fetching all information of an SBOM and then just using the ID, this fetches the bare minimum. But then also fetches the source_document in the same step. Passing on to other parts which need it. This should prevent one N+1 select issues.
Summary by Sourcery
Optimize SBOM fetching to include only required data and source documents, introduce a more flexible package representation, and add a new v3 SBOM listing API while deprecating older behaviors.
New Features:
Bug Fixes:
Enhancements:
Documentation:
Tests:
Chores: