Completed Scan Jobs stop any new jobs from starting #2793

@plnordquist-pnnl

Description

What steps did you take and what happened:

After a vulnerability report scan job completes successfully, the job is now left in the cluster until its TTL expires, due to the changes in PR #2632 and Issue #2362. Before those changes, I had the scan job TTL set to 24 hours so I could find failed jobs and investigate why they failed. The TTL only ever applied to failed jobs, because the operator deleted completed jobs immediately.

With the new behavior, a large TTL becomes impractical because of how the operator schedules work. As far as I know, the operator creates jobs in the cluster until it reaches the concurrent job limit, waits for those jobs to complete, and then deletes them. Since completed jobs are now retained until the TTL expires, they hold concurrency slots and block any new jobs from being scheduled.
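The stall described above can be illustrated with a small simulation. This is a sketch with hypothetical names, not the operator's actual code; it only assumes the reported behavior that every job still present in the cluster, including completed ones awaiting TTL cleanup, counts against the concurrency limit.

```python
# Hypothetical simulation of the reported scheduling behavior.
# Names are illustrative and do not come from trivy-operator's source.

def can_schedule_new_job(jobs_in_cluster, concurrent_limit):
    """If the operator counts every job present in the cluster
    (including Complete jobs waiting out their TTL) toward the
    limit, scheduling stalls once the limit is reached."""
    return len(jobs_in_cluster) < concurrent_limit

# Five completed jobs linger until a 24h TTL expires.
jobs = [{"name": f"scan-{i}", "status": "Complete"} for i in range(5)]
print(can_schedule_new_job(jobs, concurrent_limit=5))  # False: all slots held by finished jobs
```

With a 24-hour TTL and a limit of 5, the first 5 completed scans would occupy every slot for a full day, which matches the blocked-scheduling symptom reported here.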

What did you expect to happen:

The trivy operator should properly scan containers in my cluster.

Anything else you would like to add:

The quickest fix seems to be either reverting the change or adding a flag that lets users opt into the new behavior, since it was requested by a user. Down the line, the operator could instead ignore completed and failed jobs when scheduling new scans, but I think that would also require a cap on the number of finished jobs kept in the cluster at one time.
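The longer-term suggestion above could look roughly like this. Again a hypothetical sketch, not operator code: only jobs that are still running count toward the concurrency limit, so finished jobs waiting out their TTL no longer block new scans.

```python
# Hypothetical sketch of the proposed scheduling change:
# exclude finished jobs from the concurrency count.

def count_active_jobs(jobs_in_cluster):
    """Count only jobs still running; Complete/Failed jobs awaiting
    TTL cleanup no longer hold a concurrency slot."""
    return sum(
        1 for job in jobs_in_cluster
        if job["status"] not in ("Complete", "Failed")
    )

jobs = [
    {"name": "scan-a", "status": "Complete"},  # kept for TTL, ignored here
    {"name": "scan-b", "status": "Failed"},    # kept for TTL, ignored here
    {"name": "scan-c", "status": "Running"},
]
print(count_active_jobs(jobs) < 2)  # True: a new scan can be scheduled
```

As noted above, this approach would still need some limit on how many finished jobs are retained, or they would accumulate until the TTL sweep.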

Environment:

  • Trivy-Operator version (use trivy-operator version): 0.29.0
  • Kubernetes version (use kubectl version): 1.33.3
  • OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc): AlmaLinux 9.6

Labels: kind/bug