Description
Technically this isn't a bug but tech debt (that is, we knew in advance it might become a problem and postponed solving it), but we're now observing a degradation in user experience because of it.
We're currently creating new containers in the database even when the CVE ID already exists:
nix-security-tracker/src/shared/fetchers.py, lines 287 to 289 at 8e3c303:

```python
if record is not None:
    # TODO: Remove stale data to prevent overgrowth
    pass
```
This leads to new matches being triggered for arbitrarily old CVEs, because matches happen on container insertion. Recently upstream decided to retroactively add microsecond precision (!) to publication dates, which produced >2k redundant items.
We should instead update our existing data in such a case.
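The intended behavior could be sketched as an upsert: update the existing container in place when the CVE ID is already known, and only run matching on first insertion. This is a minimal in-memory sketch, not the tracker's actual code: the `db` dict stands in for the ORM lookup, and `trigger_matches` is a hypothetical hook for the match-on-insertion behavior described above.

```python
def upsert_container(db, cve_id, data, trigger_matches):
    """Insert or update a CVE container.

    `db` maps CVE IDs to stored container data; in the real tracker this
    would be an ORM lookup plus an UPDATE instead of a fresh INSERT.
    """
    record = db.get(cve_id)
    if record is None:
        # Genuinely new CVE: insert it and run matching once.
        db[cve_id] = data
        trigger_matches(cve_id)
        return "inserted"
    # Known CVE (e.g. re-published upstream with microsecond-precision
    # dates): refresh the stored data in place. No new container is
    # created, so no redundant matches are triggered.
    db[cve_id] = data
    return "updated"
```

With this shape, a retroactive upstream re-publication of an old CVE updates the record but produces no new match items.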
Note
We may want to consider dropping the custom data model and ingest JSON into Postgres directly instead. We can still have structured data in application code using the upstream schema with generated Pydantic models. And at the moment we're processing each CVE separately and only once anyway, so there should be no issue with querying aggregate data. Such a change would require quite a bit of rewiring though.
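The alternative could look roughly like this: store the raw upstream JSON (e.g. in a jsonb column) and parse it into structured objects in application code on demand. A minimal sketch, using a dataclass as a stand-in for a model generated from the upstream CVE schema (the real setup would presumably use generated Pydantic models, as noted above; the field names follow the CVE JSON record format):

```python
import json
from dataclasses import dataclass


@dataclass
class CveMetadata:
    """Stand-in for a model generated from the upstream CVE JSON schema."""
    cve_id: str
    date_published: str


def load_metadata(raw: str) -> CveMetadata:
    """Parse a raw CVE record (as stored verbatim in Postgres) into a
    structured object for application code."""
    doc = json.loads(raw)
    meta = doc["cveMetadata"]
    return CveMetadata(
        cve_id=meta["cveId"],
        date_published=meta["datePublished"],
    )
```

Since each CVE is processed separately, parsing one blob at a time like this would be enough; aggregate queries could still go through jsonb operators in Postgres.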