Description
What steps did you take and what happened:
After a vulnerability report scan job completes successfully, the job is left in the cluster until its TTL expires, due to the changes in PR #2632 and Issue #2362. Prior to those changes, I set the scan job TTL to 24 hours so that I could find failed jobs and investigate why they failed; the TTL applied only to failed jobs, since the operator deleted completed jobs immediately.
This new behavior makes a large TTL unreasonable because of how the operator schedules its work. As far as I know, the operator creates jobs in the cluster until it reaches the concurrent job limit, waits for those jobs to complete, and then deletes them. Since completed jobs are now kept until their TTL expires, they occupy slots under the concurrency limit and block any new jobs from spawning.
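The blocking described above can be sketched with a small simulation. This is illustrative only: the names (`Job`, `pending_slots`) and the fixed limit are assumptions for the sketch, not trivy-operator internals.

```python
from dataclasses import dataclass

CONCURRENT_LIMIT = 3  # hypothetical concurrent scan job limit


@dataclass
class Job:
    name: str
    status: str  # "Running", "Complete", or "Failed"


def pending_slots(jobs, count_finished=True):
    """Slots left for new scan jobs under the concurrency limit.

    count_finished=True models the post-#2632 behavior, where finished
    jobs stay in the cluster until their TTL expires and still count
    toward the limit; False models the old delete-on-completion behavior.
    """
    if count_finished:
        active = len(jobs)  # every job still present in the cluster counts
    else:
        active = sum(j.status == "Running" for j in jobs)
    return max(CONCURRENT_LIMIT - active, 0)


jobs = [
    Job("scan-a", "Complete"),
    Job("scan-b", "Complete"),
    Job("scan-c", "Complete"),
]
print(pending_slots(jobs))                        # 0: completed jobs block new scans
print(pending_slots(jobs, count_finished=False))  # 3: old behavior frees the slots
```

With a 24-hour TTL, the first case means all slots stay occupied by finished jobs for up to a day, so no new scans are scheduled.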
What did you expect to happen:
The trivy operator should properly scan containers in my cluster.
Anything else you would like to add:
It seems like the quickest way to fix this is either reverting the change or adding a flag that lets users opt into the new behavior, since a user did request it. Down the line, I could see the operator ignoring completed and failed jobs when scheduling new scans, but I think that would require a limit on the number of finished jobs kept in the cluster at one time.
Environment:
- Trivy-Operator version (use `trivy-operator version`): 0.29.0
- Kubernetes version (use `kubectl version`): 1.33.3
- OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc): AlmaLinux 9.6