|
| 1 | +| Status | Date | Author(s) | |
| 2 | +|:---------|:-----------|:-------------------------------------| |
| 3 | +| Accepted | 2026-02-12 | [@nscuro](https://github.com/nscuro) | |
| 4 | + |
| 5 | +## Context |
| 6 | + |
| 7 | +Findings are expressed as records in the `COMPONENTS_VULNERABILITIES` table. |
| 8 | +The table is a simple junction table with the following schema: |
| 9 | + |
| 10 | +| Column | Type | Constraints | |
| 11 | +|:-----------------|:-------|:------------| |
| 12 | +| COMPONENT_ID | BIGINT | PK, FK | |
| 13 | +| VULNERABILITY_ID | BIGINT | PK, FK | |
| 14 | + |
| 15 | +For each finding, additional metadata is recorded in the `FINDINGATTRIBUTION` table: |
| 16 | + |
| 17 | +| Column | Type | Constraints | |
| 18 | +|:-----------------|:------------|:-------------| |
| 19 | +| ID | BIGINT | PK | |
| 20 | +| PROJECT_ID | BIGINT | FK, NOT NULL | |
| 21 | +| COMPONENT_ID | BIGINT | FK, NOT NULL | |
| 22 | +| VULNERABILITY_ID | BIGINT | FK, NOT NULL | |
| 23 | +| ANALYZERIDENTITY | TEXT | NOT NULL | |
| 24 | +| ATTRIBUTED_ON | TIMESTAMPTZ | NOT NULL | |
| 25 | +| ALT_ID | TEXT | | |
| 26 | +| REFERENCE_URL | TEXT | | |
| 27 | +| UUID | UUID | NOT NULL | |
| 28 | + |
| 29 | +Only a single `FINDINGATTRIBUTION` record can exist for each `COMPONENTS_VULNERABILITIES` record. |
| 30 | +This is enforced using a `UNIQUE` constraint on the `COMPONENT_ID` and `VULNERABILITY_ID` columns. |
| 31 | + |
| 32 | +Consequently, only the first analyzer that reported a finding gets an attribution. |
| 33 | + |
| 34 | +So far this design has been sufficient, because findings were only *added*, but never *removed*. |
| 35 | + |
| 36 | +This does not reflect reality though: |
| 37 | + |
| 38 | +* Vulnerability databases get updated, vulnerable version ranges get revised. |
| 39 | +* Upstream analyzers such as OSS Index correct their data in response to false positive reports. |
| 40 | +* Users disable analyzers that they no longer want to use. |
| 41 | + |
| 42 | +In any of the cases above, findings would need to be *removed*. |
| 43 | +The current design makes this challenging to do. Consider the following sequence of events: |
| 44 | + |
| 45 | +1. Analyzer **A** reports vuln **X** on component **C**. An attribution is created for analyzer **A**. |
| 46 | +2. Analyzer **B** also reports vuln **X** on component **C**. |
| 47 | + No attribution is created because there already is one for analyzer **A**. |
| 48 | +3. Analyzer **A** stops reporting the finding. Analyzer **B** still reports it. |
| 49 | + We can't safely remove the finding because we never recorded that analyzer **B** also reported it. |
| 50 | + |
| 51 | +Additionally, we can't just *delete* finding records: |
| 52 | + |
| 53 | +* It would achieve the desired effect, but would leave users who check time series metrics |
| 54 | + wondering why findings suddenly disappeared. |
| 55 | +* Findings may already have an audit trail with user comments etc., which would be wiped. |
| 56 | + If a finding is later re-discovered (e.g. by an analyzer being re-enabled), |
| 57 | + the audit trail no longer being there would be confusing. |
| 58 | + |
| 59 | +## Decision |
| 60 | + |
| 61 | +* Modify the `FINDINGATTRIBUTION` table such that all analyzers that reported a finding are |
| 62 | + tracked, not just the first. Modify the `UNIQUE` constraint to include the `ANALYZERIDENTITY`. |
| 63 | +* Use soft-deletion for `FINDINGATTRIBUTION` records. Introduce a new `DELETED_AT` column for this. |
| 64 | + When an analyzer no longer reports a finding, update its attribution's `DELETED_AT` timestamp accordingly. |
| 65 | +* Never delete `COMPONENTS_VULNERABILITIES` records, unless the corresponding `COMPONENT` record is deleted. |
| 66 | + This resembles the status quo and is necessary to retain attributions. |
| 67 | +* When analyzers report a finding again, unset their attribution's `DELETED_AT` column. |
| 68 | +* Consider findings with *at least one* `FINDINGATTRIBUTION` record where `DELETED_AT` |
| 69 | + is `NULL` as *active*. |
| 70 | +* Consider findings with only deleted `FINDINGATTRIBUTION` records as *inactive*. |
| 71 | +* Hide *inactive* findings by default. Eventually add API parameters and UI elements to show them. |
| 72 | +* To avoid breaking changes in the REST API, continue to only report a single attribution per finding. |
| 73 | + The attribution to report is the *first, non-deleted one* (first meaning lowest `ID`), or, |
| 74 | + if all attributions are deleted, the *last deleted* one. |
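The rules above for *active* / *inactive* status and for picking the single attribution to report can be sketched as follows. This is a minimal in-memory model rather than the actual implementation: the `Attribution` class and both function names are hypothetical, and "last deleted" is assumed to mean the most recent `DELETED_AT` value (ties broken by highest `ID`).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Attribution:
    """Hypothetical stand-in for a FINDINGATTRIBUTION row."""
    id: int
    analyzer: str
    deleted_at: Optional[datetime] = None  # None means "still reported"


def is_active(attributions: list[Attribution]) -> bool:
    # A finding is active if at least one attribution is not soft-deleted.
    return any(a.deleted_at is None for a in attributions)


def reported_attribution(attributions: list[Attribution]) -> Optional[Attribution]:
    # Prefer the first (lowest ID) attribution that is not soft-deleted.
    live = [a for a in attributions if a.deleted_at is None]
    if live:
        return min(live, key=lambda a: a.id)
    # Assumption: "last deleted" = most recent DELETED_AT, then highest ID.
    return max(attributions, key=lambda a: (a.deleted_at, a.id), default=None)
```

For example, if analyzer **A** (ID 1) is soft-deleted while **B** (ID 2) still reports the finding, the finding stays *active* and **B** is the attribution that would be reported.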
| 75 | + |
| 76 | +This enables findings to transition between *active* and *inactive* status |
| 77 | +without dropping or otherwise modifying their audit trail. |
| 78 | + |
| 79 | +By using soft-deletion for `FINDINGATTRIBUTION` records, we retain a history of |
| 80 | +which analyzers previously reported a finding but no longer do. We could further |
| 81 | +surface this data to users, enabling them to see where analyzers overlap. |
| 82 | + |
| 83 | +### Considered Alternatives |
| 84 | + |
| 85 | +We considered automatically *suppressing* findings that are no longer reported by |
| 86 | +any analyzer. This was discarded because it made it hard to coordinate who "owns" the analysis |
| 87 | +of a finding. For example: |
| 88 | + |
| 89 | +1. Finding gets reported. |
| 90 | +2. User suppresses finding. |
| 91 | +3. Finding is no longer reported, but already suppressed so no action. |
| 92 | +4. Finding gets reported again, but is it safe / OK to un-suppress without user consent? |
| 93 | + |
| 94 | +It would also have conflated two different concerns: whether a finding is applicable at all, |
| 95 | +and whether it is applicable but suppressed. |
| 96 | + |
| 97 | +## Consequences |
| 98 | + |
| 99 | +* Reconciliation logic of findings and finding attributions must be updated. |
| 100 | +* Queries for listing findings must be updated to not produce duplicate rows |
| 101 | + when more than one attribution exists for a finding. |
| 102 | +* Queries that return attribution data must be updated to only return one |
| 103 | + attribution, not multiple. |
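As a rough illustration of what these changes could look like, the sketch below uses SQLite (via Python's `sqlite3`) as a stand-in for the real database. The tables are trimmed to the columns relevant here, timestamps are stored as text, and the helper names `report`, `retract`, and `active_findings` are hypothetical. It shows the widened `UNIQUE` constraint, an upsert that unsets `DELETED_AT` on re-report, and an `EXISTS` subquery that yields one row per finding regardless of how many attributions exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COMPONENTS_VULNERABILITIES (
    COMPONENT_ID     INTEGER NOT NULL,
    VULNERABILITY_ID INTEGER NOT NULL,
    PRIMARY KEY (COMPONENT_ID, VULNERABILITY_ID)
);
CREATE TABLE FINDINGATTRIBUTION (
    ID               INTEGER PRIMARY KEY,
    COMPONENT_ID     INTEGER NOT NULL,
    VULNERABILITY_ID INTEGER NOT NULL,
    ANALYZERIDENTITY TEXT NOT NULL,
    DELETED_AT       TEXT,  -- NULL means the analyzer still reports the finding
    UNIQUE (COMPONENT_ID, VULNERABILITY_ID, ANALYZERIDENTITY)
);
""")


def report(component_id, vuln_id, analyzer):
    """Record that an analyzer reports a finding (again)."""
    conn.execute(
        "INSERT OR IGNORE INTO COMPONENTS_VULNERABILITIES VALUES (?, ?)",
        (component_id, vuln_id),
    )
    # Upsert: create the attribution, or revive a soft-deleted one.
    conn.execute(
        """
        INSERT INTO FINDINGATTRIBUTION (COMPONENT_ID, VULNERABILITY_ID, ANALYZERIDENTITY)
        VALUES (?, ?, ?)
        ON CONFLICT (COMPONENT_ID, VULNERABILITY_ID, ANALYZERIDENTITY)
        DO UPDATE SET DELETED_AT = NULL
        """,
        (component_id, vuln_id, analyzer),
    )


def retract(component_id, vuln_id, analyzer):
    """Soft-delete the attribution of an analyzer that stopped reporting."""
    conn.execute(
        """
        UPDATE FINDINGATTRIBUTION
        SET DELETED_AT = CURRENT_TIMESTAMP
        WHERE COMPONENT_ID = ? AND VULNERABILITY_ID = ?
          AND ANALYZERIDENTITY = ? AND DELETED_AT IS NULL
        """,
        (component_id, vuln_id, analyzer),
    )


def active_findings():
    """List active findings, one row each, without joining attributions."""
    return conn.execute(
        """
        SELECT cv.COMPONENT_ID, cv.VULNERABILITY_ID
        FROM COMPONENTS_VULNERABILITIES AS cv
        WHERE EXISTS (
            SELECT 1 FROM FINDINGATTRIBUTION AS fa
            WHERE fa.COMPONENT_ID = cv.COMPONENT_ID
              AND fa.VULNERABILITY_ID = cv.VULNERABILITY_ID
              AND fa.DELETED_AT IS NULL
        )
        ORDER BY cv.COMPONENT_ID, cv.VULNERABILITY_ID
        """
    ).fetchall()
```

Filtering with `EXISTS` instead of joining `FINDINGATTRIBUTION` directly is one way to avoid duplicate rows; a join combined with `DISTINCT` or a lateral subquery would be alternatives.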
| 104 | + |