| 1 | +| Status | Date | Author(s) | |
| 2 | +|:---------|:-----------|:-------------------------------------| |
| 3 | +| Accepted | 2026-02-23 | [@nscuro](https://github.com/nscuro) | |
| 4 | + |
| 5 | +## Context |
| 6 | + |
| 7 | +Vulnerability aliases are stored in the denormalized `VULNERABILITYALIAS` table: |
| 8 | + |
| 9 | +| Column | Type | Constraints | |
| 10 | +|:------------|:-------|:-----------------| |
| 11 | +| ID | BIGINT | PK | |
| 12 | +| CVE_ID | TEXT | | |
| 13 | +| GHSA_ID | TEXT | | |
| 14 | +| GSD_ID | TEXT | | |
| 15 | +| INTERNAL_ID | TEXT | | |
| 16 | +| OSV_ID | TEXT | | |
| 17 | +| SNYK_ID | TEXT | | |
| 18 | +| SONATYPE_ID | TEXT | | |
| 19 | +| VULNDB_ID | TEXT | | |
| 20 | +| UUID | UUID | NOT NULL, UNIQUE | |
| 21 | + |
| 22 | +This design poses a few challenges: |
| 23 | + |
| 24 | +* Rows lack a natural key, making it impossible to detect and prevent duplicates. |
| 25 | +* Modifying rows (e.g. adding a new ID to an existing alias group) is prone to race conditions. |
| 26 | +* Due to the combination of the points above, batching operations on this table is not possible. |
| 27 | +* Vulnerability sources are hardcoded as columns, making it unnecessarily challenging to add new sources. |
| 28 | +* Querying the table is unnecessarily hard, as it requires the caller to know what column to query on. |
| 29 | +* The lack of provenance for alias relationships prevents safe removal of relationships, |
| 30 | + e.g. when upstream sources correct their data. |
| 31 | + |
| 32 | +The [logic to create or modify alias records](https://github.com/DependencyTrack/hyades-apiserver/blob/f969a32387c03b45eff186e2fcc4ba900a7059f9/apiserver/src/main/java/org/dependencytrack/persistence/VulnerabilityQueryManager.java#L474-L591) |
| 33 | +is brittle and non-deterministic. Making it concurrency-safe would require acquisition of coarse advisory locks. |
| 34 | + |
| 35 | +Unfortunately, alias synchronization is in the hot path for vulnerability analysis result reconciliation, |
| 36 | +and is performed concurrently with potentially overlapping data. To ensure that synchronization is both |
| 37 | +performant and correct, we need a solution that allows us to batch database operations, |
| 38 | +while effectively shielding us against data races. |
| 39 | + |
| 40 | +## Decision |
| 41 | + |
| 42 | +### Schema |
| 43 | + |
| 44 | +Normalize the data into a new `VULNERABILITY_ALIAS` table with the following schema: |
| 45 | + |
| 46 | +| Column | Type | Constraints | |
| 47 | +|:---------|:-----|:------------| |
| 48 | +| GROUP_ID | UUID | NOT NULL | |
| 49 | +| SOURCE | TEXT | PK | |
| 50 | +| VULN_ID | TEXT | PK | |
| 51 | + |
| 52 | +* The separate ID columns are collapsed into `SOURCE` and `VULN_ID`. |
| 53 | +* `SOURCE` and `VULN_ID` form the natural (primary) key, effectively preventing duplicates. |
| 54 | +* Alias relationships are identified via matching `GROUP_ID`. |
| 55 | + |
| 56 | +### Querying |
| 57 | + |
| 58 | +To query all aliases of a vulnerability identified by `source` and `vulnId`, *excluding the input pair itself*: |
| 59 | + |
| 60 | +```sql linenums="1" |
| 61 | +SELECT va.* |
| 62 | + FROM "VULNERABILITY_ALIAS" AS va |
| 63 | + WHERE va."GROUP_ID" IN ( |
| 64 | + SELECT va2."GROUP_ID" |
| 65 | + FROM "VULNERABILITY_ALIAS" AS va2 |
| 66 | + WHERE va2."SOURCE" = :source |
| 67 | + AND va2."VULN_ID" = :vulnId |
| 68 | + ) |
| 69 | + AND (va."SOURCE", va."VULN_ID") != (:source, :vulnId) |
| 70 | +``` |
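
The query above can be exercised against a throwaway database. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for PostgreSQL; the sample rows, the group names, and the `find_aliases` helper are hypothetical and exist only for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE "VULNERABILITY_ALIAS" (
      "GROUP_ID" TEXT NOT NULL,
      "SOURCE"   TEXT NOT NULL,
      "VULN_ID"  TEXT NOT NULL,
      PRIMARY KEY ("SOURCE", "VULN_ID")
    )
""")
# Hypothetical sample data: one group with three members, one with a single member.
conn.executemany(
    'INSERT INTO "VULNERABILITY_ALIAS" VALUES (?, ?, ?)',
    [
        ("group-a", "NVD", "CVE-1"),
        ("group-a", "GITHUB", "GHSA-1"),
        ("group-a", "SNYK", "SNYK-1"),
        ("group-b", "NVD", "CVE-2"),
    ],
)


def find_aliases(source, vuln_id):
    """Return all (source, vuln_id) pairs sharing a group with the input, excluding the input."""
    rows = conn.execute(
        """
        SELECT va."SOURCE", va."VULN_ID"
          FROM "VULNERABILITY_ALIAS" AS va
         WHERE va."GROUP_ID" IN (
           SELECT va2."GROUP_ID"
             FROM "VULNERABILITY_ALIAS" AS va2
            WHERE va2."SOURCE" = :source
              AND va2."VULN_ID" = :vulnId
         )
           AND (va."SOURCE", va."VULN_ID") != (:source, :vulnId)
         ORDER BY va."SOURCE", va."VULN_ID"
        """,
        {"source": source, "vulnId": vuln_id},
    )
    return rows.fetchall()
```

The `ORDER BY` is added here only to make the result deterministic; the original query does not specify an order.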
| 71 | + |
| 72 | +### Alias Assertions |
| 73 | + |
| 74 | +To track provenance of alias relationships, a separate `VULNERABILITY_ALIAS_ASSERTION` table records |
| 75 | +which entity asserted that two vulnerabilities are aliases: |
| 76 | + |
| 77 | +| Column | Type | Constraints | |
| 78 | +|:-------------|:---------------|:------------------------| |
| 79 | +| ASSERTER | TEXT | PK | |
| 80 | +| VULN_SOURCE | TEXT | PK | |
| 81 | +| VULN_ID | TEXT | PK | |
| 82 | +| ALIAS_SOURCE | TEXT | PK | |
| 83 | +| ALIAS_ID | TEXT | PK | |
| 84 | +| CREATED_AT | TIMESTAMPTZ(3) | NOT NULL, DEFAULT NOW() | |
| 85 | + |
| 86 | +Each row records that `ASSERTER` claimed (`VULN_SOURCE`, `VULN_ID`) and (`ALIAS_SOURCE`, `ALIAS_ID`) |
| 87 | +are aliases. Assertions are directional: (`VULN_SOURCE`, `VULN_ID`) is the declaring vulnerability, |
| 88 | +(`ALIAS_SOURCE`, `ALIAS_ID`) is the alias attributed to it. This enables efficient reconciliation |
| 89 | +by querying existing assertions for a given vulnerability. |
| 90 | + |
| 91 | +Alias groups in the `VULNERABILITY_ALIAS` table are derived from assertions and serve as a |
| 92 | +materialized view for efficient read queries. They are recomputed whenever assertions change. |
| 93 | +Assertions provide an audit trail and enable workflows such as revoking assertions from |
| 94 | +a specific source, without affecting others. |
| 95 | + |
| 96 | +### Synchronization Algorithm |
| 97 | + |
| 98 | +Given an asserter (e.g. `NVD`) and a map of declaring vulnerabilities to their asserted aliases: |
| 99 | + |
| 100 | +```js linenums="1" |
| 101 | +{ |
| 102 | + {source: 'NVD', vulnId: 'CVE-1'}: [ |
| 103 | + {source: 'GITHUB', vulnId: 'GHSA-1'}, |
| 104 | + {source: 'SNYK', vulnId: 'SNYK-1'} |
| 105 | + ] |
| 106 | +} |
| 107 | +``` |
| 108 | + |
| 109 | +1. Begin transaction. |
| 110 | +2. Acquire PostgreSQL advisory locks for all declaring vulnerabilities, |
| 111 | + ordered by key to prevent deadlocks between concurrent transactions: |
| 112 | + ```sql linenums="1" |
| 113 | + SELECT PG_ADVISORY_XACT_LOCK(HASHTEXT(key)) |
| 114 | + FROM ( |
| 115 | + SELECT DISTINCT UNNEST(ARRAY['vuln-alias-sync|NVD|CVE-1']) AS key |
| 116 | + ORDER BY 1 |
| 117 | + ) AS t |
| 118 | + ``` |
| 119 | +3. Fetch existing assertions for the declaring vulnerabilities: |
| 120 | + ```sql linenums="1" |
| 121 | + SELECT "ASSERTER" |
| 122 | + , "VULN_SOURCE" |
| 123 | + , "VULN_ID" |
| 124 | + , "ALIAS_SOURCE" |
| 125 | + , "ALIAS_ID" |
| 126 | + FROM "VULNERABILITY_ALIAS_ASSERTION" |
| 127 | + WHERE ("VULN_SOURCE", "VULN_ID") IN (SELECT * FROM UNNEST(:sources, :vulnIds)) |
| 128 | + ``` |
| 129 | +4. Reconcile incoming aliases against existing assertions, scoped to the current asserter: |
| 130 | + * Assertions to create: incoming alias keys minus existing alias keys for this asserter. |
| 131 | + * Assertions to delete: existing alias keys for this asserter minus incoming alias keys. |
| 132 | + * `UNKNOWN` cleanup: if the asserter is not `UNKNOWN` and `UNKNOWN` assertions |
| 133 | + exist for the same declaring vulnerability, mark them for removal. |
| 134 | +5. Delete stale assertions: |
| 135 | + ```sql linenums="1" |
| 136 | + DELETE |
| 137 | + FROM "VULNERABILITY_ALIAS_ASSERTION" |
| 138 | + WHERE ("ASSERTER", "VULN_SOURCE", "VULN_ID", "ALIAS_SOURCE", "ALIAS_ID") |
| 139 | + IN (SELECT * FROM UNNEST(:asserters, :vulnSources, :vulnIds, :aliasSources, :aliasIds)) |
| 140 | + ``` |
| 141 | +6. Create new assertions: |
| 142 | + ```sql linenums="1" |
| 143 | + INSERT INTO "VULNERABILITY_ALIAS_ASSERTION" ( |
| 144 | + "ASSERTER" |
| 145 | + , "VULN_SOURCE" |
| 146 | + , "VULN_ID" |
| 147 | + , "ALIAS_SOURCE" |
| 148 | + , "ALIAS_ID" |
| 149 | + ) |
| 150 | + SELECT * |
| 151 | + FROM UNNEST(:asserters, :vulnSources, :vulnIds, :aliasSources, :aliasIds) |
| 152 | + ``` |
| 153 | +7. Delete `UNKNOWN` assertions for declaring vulnerabilities where a real asserter now provides claims: |
| 154 | + ```sql linenums="1" |
| 155 | + DELETE |
| 156 | + FROM "VULNERABILITY_ALIAS_ASSERTION" |
| 157 | + WHERE "ASSERTER" = 'UNKNOWN' |
| 158 | + AND ("VULN_SOURCE", "VULN_ID") IN (SELECT * FROM UNNEST(:sources, :vulnIds)) |
| 159 | + ``` |
| 160 | +8. Recompute alias groups for all modified vulnerabilities: |
| 161 | + 1. Expand transitively: iteratively query both `VULNERABILITY_ALIAS` and |
| 162 | + `VULNERABILITY_ALIAS_ASSERTION` to discover all transitively related keys. |
| 163 | + For example, if `CVE-1` is being linked to `GHSA-1`, but `GHSA-1` already |
| 164 | + has an assertion linking it to `GHSA-2`, expansion ensures `GHSA-2` is included. |
| 165 | + 2. Build a [union-find] from the expanded assertions to compute [connected components]. |
| 166 | + 3. For each component, pick the lowest existing group UUID (deterministic via sorted set), |
| 167 | + or generate a new one if the component has no prior group. |
| 168 | + 4. Upsert alias records, only writing when the group ID actually changed: |
| 169 | + ```sql linenums="1" |
| 170 | + INSERT INTO "VULNERABILITY_ALIAS" AS va ("GROUP_ID", "SOURCE", "VULN_ID") |
| 171 | + SELECT * FROM UNNEST(:groupIds, :sources, :vulnIds) |
| 172 | + ON CONFLICT ("SOURCE", "VULN_ID") DO UPDATE |
| 173 | + SET "GROUP_ID" = EXCLUDED."GROUP_ID" |
| 174 | + WHERE va."GROUP_ID" IS DISTINCT FROM EXCLUDED."GROUP_ID" |
| 175 | + ``` |
| 176 | + 5. Delete orphaned aliases no longer backed by any assertion. |
| 177 | +9. Commit transaction and release locks (implicit). |
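
The reconciliation in step 4 amounts to two set differences per declaring vulnerability, plus a check for `UNKNOWN` rows. A minimal sketch, assuming assertions are passed around as plain tuples; the `reconcile` function and its argument shapes are hypothetical and not taken from the actual implementation:

```python
def reconcile(asserter, incoming, existing):
    """Compute assertion changes for one asserter.

    incoming: dict mapping a declaring (source, vuln_id) to a set of alias (source, vuln_id).
    existing: rows of (asserter, vuln_source, vuln_id, alias_source, alias_id), as fetched in step 3.
    """
    existing_own = {}    # declaring key -> alias keys already asserted by this asserter
    has_unknown = set()  # declaring keys that still carry UNKNOWN assertions
    for row_asserter, vuln_source, vuln_id, alias_source, alias_id in existing:
        vuln_key = (vuln_source, vuln_id)
        if row_asserter == asserter:
            existing_own.setdefault(vuln_key, set()).add((alias_source, alias_id))
        elif row_asserter == "UNKNOWN":
            has_unknown.add(vuln_key)

    to_create, to_delete, unknown_cleanup = [], [], []
    for vuln_key, aliases in incoming.items():
        current = existing_own.get(vuln_key, set())
        # Incoming minus existing: assertions to create.
        to_create += [(asserter, *vuln_key, *a) for a in sorted(set(aliases) - current)]
        # Existing minus incoming: stale assertions to delete.
        to_delete += [(asserter, *vuln_key, *a) for a in sorted(current - set(aliases))]
        # A real asserter supersedes UNKNOWN assertions for the same declaring key.
        if asserter != "UNKNOWN" and vuln_key in has_unknown:
            unknown_cleanup.append(vuln_key)
    return to_create, to_delete, unknown_cleanup
```

Assertions made by other asserters are ignored entirely, which is what scoping reconciliation to the current asserter means here. The three returned lists would feed the batched statements in steps 5 through 7.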
| 178 | + |
| 179 | +!!! note |
| 180 | + Advisory locks are scoped to the *declaring* vulnerability only. This is sufficient because |
| 181 | + assertions are directional: a given asserter always writes assertions under the declaring |
| 182 | + vulnerability it owns (e.g. NVD writes assertions under `NVD|CVE-*`). |
| 183 | + |
| 184 | +All `SELECT`, `DELETE`, and `INSERT` operations are batched via `UNNEST`, allowing multiple |
| 185 | +vulnerabilities to be processed in a single transaction with minimal round trips. |
| 186 | +The upsert's `WHERE ... IS DISTINCT FROM` clause avoids unnecessary writes. |
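
The group recomputation in step 8 (union-find, then picking the lowest existing group ID per component) could look roughly like the sketch below. It assumes the transitive expansion has already produced the full edge list, and that group IDs are compared as canonical lowercase UUID strings; `compute_groups` and its argument shapes are hypothetical.

```python
import uuid


def compute_groups(edges, existing_groups):
    """Assign a group ID to every key that appears in an assertion edge.

    edges: iterable of (declaring_key, alias_key) pairs, already transitively expanded.
    existing_groups: dict mapping a key to its current group ID, if it has one.
    """
    parent = {}

    def find(key):
        parent.setdefault(key, key)
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving keeps the trees shallow
            key = parent[key]
        return key

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in edges:
        union(a, b)

    # Collect connected components by their root.
    components = {}
    for key in list(parent):
        components.setdefault(find(key), []).append(key)

    result = {}
    for members in components.values():
        prior = sorted({existing_groups[m] for m in members if m in existing_groups})
        # Reuse the lowest existing group ID, or mint a new one for a brand-new component.
        group_id = prior[0] if prior else str(uuid.uuid4())
        for member in members:
            result[member] = group_id
    return result
```

Reusing the lowest prior ID means that merging two groups rewrites only the members of the group whose ID loses, which fits the upsert's `IS DISTINCT FROM` guard.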
| 187 | + |
| 188 | +### Data Migration |
| 189 | + |
| 190 | +Existing data is migrated from `VULNERABILITYALIAS` to `VULNERABILITY_ALIAS` via Liquibase. |
| 191 | +The migration replicates the [synchronization algorithm](#synchronization-algorithm) in SQL. |
| 192 | + |
| 193 | +The old `VULNERABILITYALIAS` table is dropped afterwards. |
| 194 | + |
| 195 | +Assertions are seeded from the migrated alias groups. For each group, one assertion per unordered |
| 196 | +pair of members is inserted with `ASSERTER = 'UNKNOWN'`, since the original data does not carry |
| 197 | +provenance information. |
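
Seeding one assertion per unordered pair can be illustrated with `itertools.combinations`. This is a sketch only: the actual migration runs in SQL, and the choice of which member of a pair becomes the declaring side (here, the one that sorts first) is an assumption. `seed_unknown_assertions` is a hypothetical helper.

```python
from itertools import combinations


def seed_unknown_assertions(groups):
    """Build UNKNOWN assertion rows from migrated alias groups.

    groups: dict mapping a group ID to an iterable of (source, vuln_id) members.
    Returns rows of (asserter, vuln_source, vuln_id, alias_source, alias_id).
    """
    rows = []
    for members in groups.values():
        # Assumption: within each pair, the member that sorts first is the declaring side.
        for (vuln_source, vuln_id), (alias_source, alias_id) in combinations(sorted(set(members)), 2):
            rows.append(("UNKNOWN", vuln_source, vuln_id, alias_source, alias_id))
    return rows
```

A group of `n` members yields `n * (n - 1) / 2` rows, so a group of three produces three assertions and a single-member group produces none.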
| 198 | + |
| 199 | +An integration test verifies that the migration works as expected, |
| 200 | +including the handling of potential duplicates in the existing data set, |
| 201 | +and the correctness of seeded assertions. |
| 202 | + |
| 203 | +## Consequences |
| 204 | + |
| 205 | +* Adding new vulnerability sources requires no schema changes. |
| 206 | +* Alias synchronization can be fully batched, reducing round trips in the hot path. |
| 207 | +* The natural primary key prevents duplicate alias entries by construction. |
| 208 | +* Querying aliases is uniform, and callers no longer need source-specific column knowledge. |
| 209 | +* The old `UUID` column is dropped. Any external references to alias records by UUID will break. |
| 210 | + No known external consumers depend on this identifier. |
| 211 | +* Advisory locks add contention under concurrent writes to overlapping alias sets. |
| 212 | + This is bounded by the lock granularity (per declaring vulnerability key), and acceptable |
| 213 | + given the correctness guarantees it provides. |
| 214 | +* Alias group recomputation requires transitive expansion, which issues additional queries. |
| 215 | + In practice, alias groups are small (< 5 members), so this is negligible. |
| 216 | +* Alias assertions provide provenance but grow linearly with the number of aliases per |
| 217 | + declaring vulnerability. Given the small expected group sizes, this is acceptable. |
| 218 | +* `UNKNOWN` assertions seeded during migration are automatically superseded when a real |
| 219 | + asserter (e.g. NVD, GitHub) provides claims for the same declaring vulnerability. |
| 220 | + |
| 221 | +[connected components]: https://en.wikipedia.org/wiki/Component_(graph_theory) |
| 222 | +[union-find]: https://en.wikipedia.org/wiki/Disjoint-set_data_structure |