
Don't create new containers for existing CVEs #812

@fricklerhandwerk

Description

Technically this isn't a bug but tech debt (i.e., it was known in advance that this could become a problem, and fixing it was postponed), but we're now observing a degradation in user experience because of it.

We're currently creating new containers in the database even when the CVE ID already exists:

if record is not None:
    # TODO: Remove stale data to prevent overgrowth
    pass

This leads to new matches being triggered for arbitrarily old CVEs, because matching happens on container insertion. Recently upstream decided to retroactively add microsecond precision (!) to publication dates, which produced more than 2,000 redundant items.

We should instead update our existing data in such a case.
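A minimal sketch of what that could look like, assuming a table and column names that are hypothetical here (SQLite stands in for Postgres; the same `INSERT ... ON CONFLICT DO UPDATE` upsert works on Postgres with a unique constraint on the CVE ID):

```python
import sqlite3

# Hypothetical schema: one container row per CVE ID.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cve_container (cve_id TEXT PRIMARY KEY, published TEXT, raw TEXT)"
)

def upsert_container(conn, cve_id, published, raw):
    # Insert a container only for an unseen CVE ID; otherwise update the
    # existing row in place, so no fresh insertion (and thus no fresh
    # match run) is triggered for an already-known CVE.
    conn.execute(
        """
        INSERT INTO cve_container (cve_id, published, raw)
        VALUES (?, ?, ?)
        ON CONFLICT(cve_id) DO UPDATE SET
            published = excluded.published,
            raw = excluded.raw
        """,
        (cve_id, published, raw),
    )

upsert_container(conn, "CVE-2020-0001", "2020-01-01T00:00:00", "{}")
# Upstream retroactively adds microsecond precision to the date:
upsert_container(conn, "CVE-2020-0001", "2020-01-01T00:00:00.000000", "{}")
rows = conn.execute("SELECT count(*) FROM cve_container").fetchone()[0]
# rows is 1: the second ingest updated the row instead of duplicating it.
```

The same effect can be had with a select-then-update in application code, but pushing the conflict handling into the database avoids a race between concurrent ingest runs.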

Note

We may want to consider dropping the custom data model and ingest JSON into Postgres directly instead. We can still have structured data in application code using the upstream schema with generated Pydantic models. And at the moment we're processing each CVE separately and only once anyway, so there should be no issue with querying aggregate data. Such a change would require quite a bit of rewiring though.
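As a rough illustration of the shape this could take (the dataclass below is a hand-written stand-in for a model generated from the upstream schema, not the generated code itself; field names follow the CVE Record Format's `cveMetadata` object):

```python
import json
from dataclasses import dataclass

@dataclass
class CveRecord:
    # Stand-in for a generated model; the real generated classes
    # would cover the full upstream schema.
    cve_id: str
    date_published: str

def parse_cve(raw: str) -> CveRecord:
    # The raw JSON document would be stored as-is (e.g. in a jsonb
    # column); structured access happens only in application code.
    doc = json.loads(raw)
    meta = doc["cveMetadata"]
    return CveRecord(cve_id=meta["cveId"], date_published=meta["datePublished"])

record = parse_cve(
    '{"cveMetadata": {"cveId": "CVE-2024-0001", '
    '"datePublished": "2024-01-01T00:00:00"}}'
)
```

Since each CVE is processed separately, the database never needs to understand the document structure; queries stay simple key lookups on the raw JSON.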

Metadata


Labels

    bug: Something isn't working
    data: something about quality or quantity of ingested data
