Commit 4d438c1

Merge remote-tracking branch 'origin/main' into decomm-vuln-analyzer
2 parents 4fc723a + 5657f47 commit 4d438c1

6 files changed, +295 -69 lines changed

.github/workflows/update-config-docs.yml

Lines changed: 2 additions & 2 deletions
@@ -46,7 +46,7 @@ jobs:
           repository: DependencyTrack/hyades-apiserver
           path: hyades-apiserver
       - name: Generate API Server Documentation
-        uses: jbangdev/jbang-action@c31b3ed5004fff51232945e77a2c09dd7c0df37d # tag=v0.136.0
+        uses: jbangdev/jbang-action@7b0e42ecace72f0a0df37a1574c8c57b0d5a77aa # tag=v0.137.0
         with:
           trust: https://github.com/DependencyTrack/jbang-catalog
           script: gen-config-docs@DependencyTrack
@@ -55,7 +55,7 @@ jobs:
             --output ./docs/reference/configuration/api-server.md
             ./hyades-apiserver/apiserver/src/main/resources/application.properties
       - name: Generate Repository Metadata Analyzer Documentation
-        uses: jbangdev/jbang-action@c31b3ed5004fff51232945e77a2c09dd7c0df37d # tag=v0.136.0
+        uses: jbangdev/jbang-action@7b0e42ecace72f0a0df37a1574c8c57b0d5a77aa # tag=v0.137.0
         with:
           trust: https://github.com/DependencyTrack/jbang-catalog
           script: gen-config-docs@DependencyTrack
Lines changed: 222 additions & 0 deletions
@@ -0,0 +1,222 @@
| Status   | Date       | Author(s)                            |
|:---------|:-----------|:-------------------------------------|
| Accepted | 2026-02-23 | [@nscuro](https://github.com/nscuro) |

## Context

Vulnerability aliases are stored in the denormalized `VULNERABILITYALIAS` table:

| Column      | Type   | Constraints      |
|:------------|:-------|:-----------------|
| ID          | BIGINT | PK               |
| CVE_ID      | TEXT   |                  |
| GHSA_ID     | TEXT   |                  |
| GSD_ID      | TEXT   |                  |
| INTERNAL_ID | TEXT   |                  |
| OSV_ID      | TEXT   |                  |
| SNYK_ID     | TEXT   |                  |
| SONATYPE_ID | TEXT   |                  |
| VULNDB_ID   | TEXT   |                  |
| UUID        | UUID   | NOT NULL, UNIQUE |

This design poses a few challenges:

* Rows lack a natural key, making it impossible to detect and prevent duplicates.
* Modifying rows (i.e. adding a new ID to an existing alias group) is prone to race conditions.
* Due to the combination of the points above, batching operations on this table is not possible.
* Vulnerability sources are hardcoded as columns, making it unnecessarily challenging to add new sources.
* Querying the table is unnecessarily hard, as it requires the caller to know what column to query on.
* The lack of provenance for alias relationships prevents safe removal of relationships,
  e.g. when upstream sources correct their data.

The [logic to create or modify alias records](https://github.com/DependencyTrack/hyades-apiserver/blob/f969a32387c03b45eff186e2fcc4ba900a7059f9/apiserver/src/main/java/org/dependencytrack/persistence/VulnerabilityQueryManager.java#L474-L591)
is brittle and non-deterministic. Making it concurrency-safe would require acquisition of coarse advisory locks.

Alias synchronization unfortunately is in the hot path for vulnerability analysis result reconciliation,
and is performed concurrently with potentially overlapping data. To ensure that synchronization is both
performant and correct, we need a solution that allows us to batch database operations,
while effectively shielding us against data races.

## Decision

### Schema

Normalize the data into a new `VULNERABILITY_ALIAS` table with the following schema:

| Column   | Type | Constraints |
|:---------|:-----|:------------|
| GROUP_ID | UUID | NOT NULL    |
| SOURCE   | TEXT | PK          |
| VULN_ID  | TEXT | PK          |

* The separate ID columns are collapsed into `SOURCE` and `VULN_ID`.
* `SOURCE` and `VULN_ID` form the natural (primary) key, effectively preventing duplicates.
* Alias relationships are identified via matching `GROUP_ID`.

### Querying

To query all aliases of a vulnerability identified by `source` and `vulnId`, *excluding the input pair itself*:

```sql linenums="1"
SELECT va.*
  FROM "VULNERABILITY_ALIAS" AS va
 WHERE va."GROUP_ID" IN (
         SELECT va2."GROUP_ID"
           FROM "VULNERABILITY_ALIAS" AS va2
          WHERE va2."SOURCE" = :source
            AND va2."VULN_ID" = :vulnId
       )
   AND (va."SOURCE", va."VULN_ID") != (:source, :vulnId)
```
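
The query can be exercised end to end with a small in-memory database. The following is a minimal sketch, assuming SQLite (via Python's `sqlite3`) as a stand-in for PostgreSQL; the group IDs, sample rows, and the `find_aliases` helper are hypothetical. It selects only `SOURCE` and `VULN_ID` and adds an `ORDER BY` so that the output is deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; row-value comparison needs SQLite 3.15+.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE "VULNERABILITY_ALIAS" (
      "GROUP_ID" TEXT NOT NULL
    , "SOURCE"   TEXT NOT NULL
    , "VULN_ID"  TEXT NOT NULL
    , PRIMARY KEY ("SOURCE", "VULN_ID")
    )
    """
)
# Hypothetical sample data: one group of three aliases, one singleton group.
conn.executemany(
    'INSERT INTO "VULNERABILITY_ALIAS" VALUES (?, ?, ?)',
    [
        ("group-a", "NVD", "CVE-1"),
        ("group-a", "GITHUB", "GHSA-1"),
        ("group-a", "SNYK", "SNYK-1"),
        ("group-b", "NVD", "CVE-2"),
    ],
)

ALIAS_QUERY = """
SELECT va."SOURCE", va."VULN_ID"
  FROM "VULNERABILITY_ALIAS" AS va
 WHERE va."GROUP_ID" IN (
         SELECT va2."GROUP_ID"
           FROM "VULNERABILITY_ALIAS" AS va2
          WHERE va2."SOURCE" = :source
            AND va2."VULN_ID" = :vulnId
       )
   AND (va."SOURCE", va."VULN_ID") != (:source, :vulnId)
 ORDER BY va."SOURCE", va."VULN_ID"
"""


def find_aliases(source, vuln_id):
    """Return all (source, vuln_id) pairs sharing a group with the input, excluding the input."""
    params = {"source": source, "vulnId": vuln_id}
    return conn.execute(ALIAS_QUERY, params).fetchall()


aliases = find_aliases("NVD", "CVE-1")
no_aliases = find_aliases("NVD", "CVE-2")
print(aliases)
print(no_aliases)
```

A vulnerability that is alone in its group, such as `CVE-2` in this sample data, yields an empty result, because the only row in its group is the input pair itself.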

### Alias Assertions

To track provenance of alias relationships, a separate `VULNERABILITY_ALIAS_ASSERTION` table records
which entity asserted that two vulnerabilities are aliases:

| Column       | Type           | Constraints             |
|:-------------|:---------------|:------------------------|
| ASSERTER     | TEXT           | PK                      |
| VULN_SOURCE  | TEXT           | PK                      |
| VULN_ID      | TEXT           | PK                      |
| ALIAS_SOURCE | TEXT           | PK                      |
| ALIAS_ID     | TEXT           | PK                      |
| CREATED_AT   | TIMESTAMPTZ(3) | NOT NULL, DEFAULT NOW() |

Each row records that `ASSERTER` claimed (`VULN_SOURCE`, `VULN_ID`) and (`ALIAS_SOURCE`, `ALIAS_ID`)
are aliases. Assertions are directional: (`VULN_SOURCE`, `VULN_ID`) is the declaring vulnerability,
(`ALIAS_SOURCE`, `ALIAS_ID`) is the alias attributed to it. This enables efficient reconciliation
by querying existing assertions for a given vulnerability.

Alias groups in the `VULNERABILITY_ALIAS` table are derived from assertions and serve as a
materialized view for efficient read queries. They are recomputed whenever assertions change.
Assertions provide an audit trail and enable workflows such as revoking assertions from
a specific source, without affecting others.

### Synchronization Algorithm

Given an asserter (e.g. `NVD`) and a map of declaring vulnerabilities to their asserted aliases:

```js linenums="1"
{
  {source: 'NVD', vulnId: 'CVE-1'}: [
    {source: 'GITHUB', vulnId: 'GHSA-1'},
    {source: 'SNYK', vulnId: 'SNYK-1'}
  ]
}
```

1. Begin transaction.
2. Acquire PostgreSQL advisory locks for all declaring vulnerabilities,
    ordered by key to prevent deadlocks between concurrent transactions:
    ```sql linenums="1"
    SELECT PG_ADVISORY_XACT_LOCK(HASHTEXT(key))
      FROM (
        SELECT DISTINCT UNNEST(ARRAY['vuln-alias-sync|NVD|CVE-1']) AS key
         ORDER BY 1
      ) AS t
    ```
3. Fetch existing assertions for the declaring vulnerabilities:
    ```sql linenums="1"
    SELECT "ASSERTER"
         , "VULN_SOURCE"
         , "VULN_ID"
         , "ALIAS_SOURCE"
         , "ALIAS_ID"
      FROM "VULNERABILITY_ALIAS_ASSERTION"
     WHERE ("VULN_SOURCE", "VULN_ID") IN (SELECT * FROM UNNEST(:sources, :vulnIds))
    ```
4. Reconcile incoming aliases against existing assertions, scoped to the current asserter:
    * Assertions to create: incoming alias keys minus existing alias keys for this asserter.
    * Assertions to delete: existing alias keys for this asserter minus incoming alias keys.
    * `UNKNOWN` cleanup: if the asserter is not `UNKNOWN` and `UNKNOWN` assertions
      exist for the same declaring vulnerability, mark those `UNKNOWN` assertions for removal.
5. Delete stale assertions:
    ```sql linenums="1"
    DELETE
      FROM "VULNERABILITY_ALIAS_ASSERTION"
     WHERE ("ASSERTER", "VULN_SOURCE", "VULN_ID", "ALIAS_SOURCE", "ALIAS_ID")
        IN (SELECT * FROM UNNEST(:asserters, :vulnSources, :vulnIds, :aliasSources, :aliasIds))
    ```
6. Create new assertions:
    ```sql linenums="1"
    INSERT INTO "VULNERABILITY_ALIAS_ASSERTION" (
      "ASSERTER"
    , "VULN_SOURCE"
    , "VULN_ID"
    , "ALIAS_SOURCE"
    , "ALIAS_ID"
    )
    SELECT *
      FROM UNNEST(:asserters, :vulnSources, :vulnIds, :aliasSources, :aliasIds)
    ```
7. Delete `UNKNOWN` assertions for declaring vulnerabilities where a real asserter now provides claims:
    ```sql linenums="1"
    DELETE
      FROM "VULNERABILITY_ALIAS_ASSERTION"
     WHERE "ASSERTER" = 'UNKNOWN'
       AND ("VULN_SOURCE", "VULN_ID") IN (SELECT * FROM UNNEST(:sources, :vulnIds))
    ```
8. Recompute alias groups for all modified vulnerabilities:
    1. Expand transitively: iteratively query both `VULNERABILITY_ALIAS` and
       `VULNERABILITY_ALIAS_ASSERTION` to discover all transitively related keys.
       For example, if `CVE-1` is being linked to `GHSA-1`, but `GHSA-1` already
       has an assertion linking it to `GHSA-2`, expansion ensures `GHSA-2` is included.
    2. Build a [union-find] from the expanded assertions to compute [connected components].
    3. For each component, pick the lowest existing group UUID (deterministic via sorted set),
       or generate a new one if the component has no prior group.
    4. Upsert alias records, only writing when the group ID actually changed:
        ```sql linenums="1"
        INSERT INTO "VULNERABILITY_ALIAS" AS va ("GROUP_ID", "SOURCE", "VULN_ID")
        SELECT * FROM UNNEST(:groupIds, :sources, :vulnIds)
        ON CONFLICT ("SOURCE", "VULN_ID") DO UPDATE
        SET "GROUP_ID" = EXCLUDED."GROUP_ID"
        WHERE va."GROUP_ID" IS DISTINCT FROM EXCLUDED."GROUP_ID"
        ```
    5. Delete orphaned aliases no longer backed by any assertion.
9. Commit transaction and release locks (implicit).
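
The reconciliation in step 4 boils down to two set differences plus an `UNKNOWN` check. The following is a minimal Python sketch of that logic, not the actual implementation: the `VulnKey` and `Assertion` types, the `reconcile` function, and the `OSV-1` sample alias are all hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class VulnKey(NamedTuple):
    source: str
    vuln_id: str


class Assertion(NamedTuple):
    asserter: str
    vuln: VulnKey   # declaring vulnerability
    alias: VulnKey  # alias attributed to it


def reconcile(asserter, incoming, existing):
    """Compute assertions to create and delete, and declaring vulns needing UNKNOWN cleanup.

    incoming: dict mapping each declaring VulnKey to the set of asserted alias VulnKeys.
    existing: list of Assertion rows already stored for those declaring vulnerabilities.
    """
    to_create, to_delete, unknown_cleanup = set(), set(), set()
    for vuln, incoming_aliases in incoming.items():
        # Only this asserter's own assertions take part in the set differences.
        own = {a.alias for a in existing if a.vuln == vuln and a.asserter == asserter}
        to_create |= {Assertion(asserter, vuln, alias) for alias in incoming_aliases - own}
        to_delete |= {Assertion(asserter, vuln, alias) for alias in own - incoming_aliases}
        has_unknown = any(a.vuln == vuln and a.asserter == "UNKNOWN" for a in existing)
        if asserter != "UNKNOWN" and has_unknown:
            unknown_cleanup.add(vuln)
    return to_create, to_delete, unknown_cleanup


cve1 = VulnKey("NVD", "CVE-1")
incoming = {cve1: {VulnKey("GITHUB", "GHSA-1"), VulnKey("SNYK", "SNYK-1")}}
existing = [
    Assertion("NVD", cve1, VulnKey("GITHUB", "GHSA-1")),      # unchanged
    Assertion("NVD", cve1, VulnKey("OSV", "OSV-1")),          # no longer asserted
    Assertion("UNKNOWN", cve1, VulnKey("GITHUB", "GHSA-1")),  # seeded without provenance
]
to_create, to_delete, unknown_cleanup = reconcile("NVD", incoming, existing)
print(to_create, to_delete, unknown_cleanup)
```

In this sketch the three results would feed the batched statements of steps 5 to 7: `to_delete` for the stale-assertion `DELETE`, `to_create` for the `INSERT`, and `unknown_cleanup` for the `UNKNOWN` `DELETE`.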

!!! note
    Advisory locks are scoped to the *declaring* vulnerability only. This is sufficient because
    assertions are directional: a given asserter always writes assertions under the declaring
    vulnerability it owns (e.g. NVD writes assertions under `NVD|CVE-*`).
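
To illustrate why ordered acquisition prevents deadlocks, the sketch below derives distinct, sorted lock keys for a batch of declaring vulnerabilities. It is an assumption-laden illustration: the `vuln-alias-sync|<source>|<vulnId>` key format is inferred from the single example in step 2, the `lock_keys` helper is hypothetical, and in the algorithm above deduplication and ordering happen in SQL (`DISTINCT` and `ORDER BY`), not in application code.

```python
def lock_keys(declaring_vulns):
    """Build distinct advisory lock keys for (source, vuln_id) pairs, in sorted order."""
    # Assumed key format, inferred from the example 'vuln-alias-sync|NVD|CVE-1'.
    return sorted({f"vuln-alias-sync|{source}|{vuln_id}" for source, vuln_id in declaring_vulns})


# Two hypothetical transactions with overlapping batches, supplied in different orders.
keys_a = lock_keys([("NVD", "CVE-2"), ("NVD", "CVE-1"), ("NVD", "CVE-2")])
keys_b = lock_keys([("NVD", "CVE-3"), ("NVD", "CVE-2"), ("NVD", "CVE-1")])

# Keys shared by both transactions appear in the same relative order in each list,
# so neither can hold a later key while waiting on an earlier one held by the other.
shared_a = [k for k in keys_a if k in keys_b]
shared_b = [k for k in keys_b if k in keys_a]
print(keys_a)
print(shared_a == shared_b)
```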

All `SELECT`, `DELETE`, and `INSERT` operations are batched via `UNNEST`, allowing multiple
vulnerabilities to be processed in a single transaction with minimal round trips.
The upsert's `WHERE ... IS DISTINCT FROM` clause avoids unnecessary writes.
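
The group recomputation in steps 8.2 and 8.3 can be sketched with a small union-find. This is a minimal Python illustration under stated assumptions, not the project's implementation: `compute_groups` is a hypothetical helper, keys are plain `(source, vulnId)` tuples, and "lowest" follows Python's `uuid.UUID` ordering, which may differ from the ordering used by the real code.

```python
import uuid


def compute_groups(edges, existing_groups):
    """Assign a group ID to every key that appears in the expanded assertions.

    edges: iterable of (key_a, key_b) pairs, one per expanded assertion.
    existing_groups: dict mapping keys to their current group UUID, if any.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # union the two components

    # Collect the members of each connected component by their root.
    components = {}
    for key in list(parent):
        components.setdefault(find(key), []).append(key)

    result = {}
    for members in components.values():
        prior = sorted(existing_groups[m] for m in members if m in existing_groups)
        # Reuse the lowest existing group ID, or mint a new one for a fresh component.
        group_id = prior[0] if prior else uuid.uuid4()
        for member in members:
            result[member] = group_id
    return result


cve1, ghsa1, ghsa2 = ("NVD", "CVE-1"), ("GITHUB", "GHSA-1"), ("GITHUB", "GHSA-2")
cve9, osv9 = ("NVD", "CVE-9"), ("OSV", "OSV-9")
edges = [(cve1, ghsa1), (ghsa1, ghsa2), (cve9, osv9)]
existing = {ghsa1: uuid.UUID(int=2), ghsa2: uuid.UUID(int=1)}
groups = compute_groups(edges, existing)
print(groups)
```

Here `CVE-1`, `GHSA-1`, and `GHSA-2` collapse into one component and inherit the lower of the two existing group IDs, while `CVE-9` and `OSV-9` share a newly generated one. Only keys whose group ID differs from the stored value would then be written by the upsert.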

### Data Migration

Existing data is migrated from `VULNERABILITYALIAS` to `VULNERABILITY_ALIAS` via Liquibase.
The migration replicates the [synchronization algorithm](#synchronization-algorithm) in SQL.

The old `VULNERABILITYALIAS` table is dropped afterwards.

Assertions are seeded from the migrated alias groups. For each group, one assertion per unordered
pair of members is inserted with `ASSERTER = 'UNKNOWN'`, since the original data does not carry
provenance information.
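
The seeding rule, one assertion per unordered pair of group members, can be illustrated as follows. This is only a sketch: the actual migration runs in SQL via Liquibase, `seed_assertions` is a hypothetical helper, and which member of a pair ends up as the declaring vulnerability is not specified above, so the sorted order used here is an assumption.

```python
from itertools import combinations


def seed_assertions(groups):
    """Build UNKNOWN assertion rows from migrated alias groups.

    groups: dict mapping a group ID to its (source, vuln_id) members.
    Returns rows shaped as (asserter, vuln_source, vuln_id, alias_source, alias_id).
    """
    rows = []
    for members in groups.values():
        # combinations() yields each unordered pair exactly once; sorting and
        # deduplicating the members first makes the output deterministic.
        for vuln, alias in combinations(sorted(set(members)), 2):
            rows.append(("UNKNOWN", *vuln, *alias))
    return rows


migrated = {
    "group-a": [("NVD", "CVE-1"), ("GITHUB", "GHSA-1"), ("SNYK", "SNYK-1")],
    "group-b": [("NVD", "CVE-2")],  # a single member produces no pairs
}
rows = seed_assertions(migrated)
print(rows)
```

A group of n members produces n * (n - 1) / 2 seeded assertions, so the three-member group above yields three rows and the singleton group yields none.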

An integration test verifies that the migration works as expected,
including the handling of potential duplicates in the existing data set,
and the correctness of seeded assertions.

## Consequences

* Adding new vulnerability sources requires no schema changes.
* Alias synchronization can be fully batched, reducing round trips in the hot path.
* The natural primary key prevents duplicate alias entries by construction.
* Querying aliases is uniform, and callers no longer need source-specific column knowledge.
* The old `UUID` column is dropped. Any external references to alias records by UUID will break.
  No known external consumers depend on this identifier.
* Advisory locks add contention under concurrent writes to overlapping alias sets.
  This is bounded by the lock granularity (per declaring vulnerability key), and acceptable
  given the correctness guarantees it provides.
* Alias group recomputation requires transitive expansion, which issues additional queries.
  In practice, alias groups are small (< 5 members), so this is negligible.
* Alias assertions provide provenance but grow linearly with the number of aliases per
  declaring vulnerability. Given the small expected group sizes, this is acceptable.
* `UNKNOWN` assertions seeded during migration are automatically superseded when a real
  asserter (e.g. NVD, GitHub) provides claims for the same declaring vulnerability.

[connected components]: https://en.wikipedia.org/wiki/Component_(graph_theory)
[union-find]: https://en.wikipedia.org/wiki/Disjoint-set_data_structure

docs/architecture/design/workflow-state-tracking.md renamed to docs/architecture/design/archive/workflow-state-tracking.md

Lines changed: 3 additions & 0 deletions
@@ -1,3 +1,6 @@
+!!! warning "Superseded"
+    This document describes an old design that has been superseded by [durable execution](../durable-execution.md).
+
 # Tracking of Workflow State for BOM Processing and Analysis
 
 !!! note
