Deduplicate Backlog Items #708

@zkdev

Description

Motivation

An increasing number of backlog items does not scale well, as it floods etcd.
Especially in spike situations, e.g. a new product-component release, the scan workers have a lot on their plate.
We see situations where the artefact-enumerator re-runs faster than the workers can process backlog items.
This leads to multiple backlog items existing for the very same scanner and artefact.

These duplicates add no value: the workers will process the first item and then (in most situations) skip the subsequent scans. Given the aforementioned poor scaling behaviour and the noise this causes in the cluster (and for operators), let's consider a concept to deduplicate backlog items.

Proposals

  1. Avoid putting additional load on the API server; the number of requests should be treated as the most important efficiency metric. Using a list operation with a label selector results in only one request, as labels are part of the resource metadata and filtering is done server-side.

  2. Use digest-based backlog item names
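Proposal 2 can be sketched as follows: deriving the backlog-item name from a digest of the scanner and artefact coordinates makes the name deterministic, so a duplicate create is rejected by the API server with 409 AlreadyExists instead of storing a second item. A minimal sketch, assuming a hypothetical `backlog_item_name` helper and `bli-` name prefix (neither is from the codebase):

```python
import hashlib

def backlog_item_name(scanner: str, artefact: str) -> str:
    """Derive a deterministic, DNS-1123-safe resource name.

    Identical (scanner, artefact) pairs always map to the same name, so
    creating a backlog item for work that is already queued fails with
    409 AlreadyExists rather than adding a duplicate.
    """
    digest = hashlib.sha256(f'{scanner}|{artefact}'.encode()).hexdigest()
    # 32 hex chars (~128 bits) is collision-safe in practice and stays
    # well below the 253-character Kubernetes resource-name limit
    return f'bli-{digest[:32]}'
```

With this scheme, the caller no longer needs a read-before-write check: it simply attempts the create and treats 409 AlreadyExists as "already enqueued".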

Metadata

Labels

area/ipcei (Important Project of Common European Interest), kind/task (small task, normally part of a feature or epic)

Projects

Status

🔍 Review
