
Reconciler error: creating large VulnerabilityReports fails and scan jobs remain #757

@mrtc0

What steps did you take and what happened:

Creation of VulnerabilityReports fails if the number of detected vulnerabilities is too large.

Steps to reproduce

  1. Create a Pod with python:3.5.10-buster image on a cluster with trivy-operator installed

(python:3.5.10-buster has 1252 vulnerabilities.)

$ kubectl create deployment python -n default --image python:3.5.10-buster
  2. The scan finishes successfully, but creation of the VulnerabilityReport fails
# The scan completes successfully, but no VulnerabilityReports are created and the Job is not deleted
❯ kubectl get jobs -n trivy-system
NAME                                  COMPLETIONS   DURATION   AGE
scan-vulnerabilityreport-7d8db4fc84   1/1           73s        5m42s

❯ kubectl get pods -n trivy-system
NAME                                        READY   STATUS      RESTARTS   AGE
scan-vulnerabilityreport-7d8db4fc84-gsztw   0/1     Completed   0          4m12s
trivy-operator-54bc4769db-5djnd             1/1     Running     0          85m

❯ kubectl -n trivy-system logs scan-vulnerabilityreport-7d8db4fc84-gsztw | tr -d "\n" | base64 -d | bunzip2 | jq .ArtifactName
Defaulted container "python" out of: python, 95164937-7089-491c-ab58-7f52bc8a8cce (init)
"python:3.5.10-buster"

❯ kubectl get vulnerabilityreports -n default
No resources found in default namespace.

# Reconciler error is logged in Operator
❯ kubectl logs -n trivy-system trivy-operator-54bc4769db-5djnd
...
{"level":"error","ts":1670231487.1708708,"msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-7d8db4fc84","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-7d8db4fc84","reconcileID":"7e1ccc8e-f6eb-4080-b1dd-25dcdb150e7c","error":"rpc error: code = ResourceExhausted desc = trying to send message larger than max (2633583 vs. 2097152)","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:326\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:234"}
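
The `tr -d "\n" | base64 -d | bunzip2 | jq` pipeline used above suggests that the scan pod's log holds the report as base64-encoded, bzip2-compressed JSON. A minimal Python sketch of the same decoding, assuming that encoding (the function name `decode_scan_log` and the sample payload are made up for illustration, they are not part of trivy-operator):

```python
import base64
import bz2
import json


def decode_scan_log(raw: str) -> dict:
    """Decode a scan log that is assumed to be base64(bzip2(JSON))."""
    # Strip all whitespace; this covers what `tr -d "\n"` does in the shell pipeline.
    compact = "".join(raw.split())
    return json.loads(bz2.decompress(base64.b64decode(compact)))


# Build a small sample in the same assumed encoding, then decode it again.
sample_report = {"ArtifactName": "python:3.5.10-buster"}
sample_log = base64.encodebytes(
    bz2.compress(json.dumps(sample_report).encode("utf-8"))
).decode("ascii")

print(decode_scan_log(sample_log)["ArtifactName"])
```

Decoding the log this way also makes it easy to measure how large the uncompressed report is before the operator tries to store it.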

I am not familiar with the Kubernetes Operator implementation, but from the error message (trying to send message larger than max (2633583 vs. 2097152)) I would guess that the VulnerabilityReport is too large to be sent: the message is 2633583 bytes, while the maximum is 2097152 bytes (2 MiB).
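
To put the two numbers from the error message in perspective, here is a small arithmetic sketch. The constants are copied from the log line above; the helper name `overshoot` is only illustrative.

```python
# Values copied from the operator's "Reconciler error" log line.
MESSAGE_BYTES = 2_633_583  # size of the message that was rejected
LIMIT_BYTES = 2_097_152    # maximum size reported in the error


def overshoot(message_bytes: int, limit_bytes: int):
    """Return (bytes over the limit, message size as a multiple of the limit)."""
    return message_bytes - limit_bytes, message_bytes / limit_bytes


excess, ratio = overshoot(MESSAGE_BYTES, LIMIT_BYTES)
print(f"limit: {LIMIT_BYTES / 2**20:.0f} MiB")
print(f"over the limit by {excess} bytes ({ratio:.2f}x the limit)")
```

The report is roughly a quarter larger than the limit, so trimming a few fields is unlikely to be enough for images with this many findings.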

Normally, reconcileJobs() should delete the Job, but it gets stuck because creating the VulnerabilityReport fails, so the completed Job is never removed.
As a result, once the number of such leftover Jobs reaches scanJobsConcurrentLimit, no new scans are performed.
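
The failure mode described above can be illustrated with a toy model. This is not trivy-operator code: `ScanQueue`, `schedule`, `reconcile` and `ReportTooLarge` are hypothetical names, and the model simply assumes that leftover Jobs count toward the concurrency limit, as observed in this issue.

```python
from dataclasses import dataclass, field


class ReportTooLarge(Exception):
    """Raised when a report exceeds the maximum message size."""


@dataclass
class ScanQueue:
    concurrent_limit: int   # stands in for scanJobsConcurrentLimit
    max_report_bytes: int   # stands in for the 2 MiB message limit
    jobs: dict = field(default_factory=dict)  # job name -> report size in bytes

    def schedule(self, name: str, report_bytes: int) -> bool:
        """Start a scan job unless the concurrency limit is already reached."""
        if len(self.jobs) >= self.concurrent_limit:
            return False
        self.jobs[name] = report_bytes
        return True

    def reconcile(self, name: str) -> None:
        """Store the report, then delete the job; a failed store leaves the job behind."""
        if self.jobs[name] > self.max_report_bytes:
            raise ReportTooLarge(name)  # the job is NOT deleted on this path
        del self.jobs[name]


queue = ScanQueue(concurrent_limit=2, max_report_bytes=2_097_152)
queue.schedule("scan-a", 2_633_583)
queue.schedule("scan-b", 2_633_583)
for job in ("scan-a", "scan-b"):
    try:
        queue.reconcile(job)
    except ReportTooLarge:
        pass  # the real operator logs "Reconciler error" at this point

# Both oversized jobs are still present, so a new (small) scan is refused.
blocked = not queue.schedule("scan-c", 1_000)
print(len(queue.jobs), blocked)
```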

What did you expect to happen:

Either of the following outcomes would resolve this:

  • If creation of the VulnerabilityReport fails, the scan Job is still deleted
  • The VulnerabilityReport is created successfully
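
The first expected outcome could look roughly like the sketch below: the Job is cleaned up whether or not the report could be stored. The names `reconcile_job`, `create_report` and `delete_job` are hypothetical and only illustrate the control flow, not the operator's actual implementation.

```python
def reconcile_job(job_name, create_report, delete_job):
    """Try to store the report, and delete the job regardless of the outcome.

    Returns the exception raised by create_report, or None on success.
    """
    error = None
    try:
        create_report(job_name)
    except Exception as exc:  # keep the error so the caller can log it
        error = exc
    finally:
        delete_job(job_name)  # runs on success and on failure
    return error


remaining_jobs = {"scan-vulnerabilityreport-7d8db4fc84"}


def failing_create(job_name):
    # Simulates the error from this issue.
    raise RuntimeError("trying to send message larger than max (2633583 vs. 2097152)")


result = reconcile_job(
    "scan-vulnerabilityreport-7d8db4fc84", failing_create, remaining_jobs.discard
)
print(type(result).__name__, len(remaining_jobs))
```

One trade-off of this approach is that deleting the Job also discards the scan output held in its logs, so the workload would have to be rescanned once large reports can be stored.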

Environment:

  • Trivy-Operator version (use trivy-operator version): 0.7.1
  • Kubernetes version (use kubectl version): v1.25.0
  • OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc):

Metadata

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.
    priority/backlog: Higher priority than priority/awaiting-more-evidence.
    target/kubernetes: Issues relating to kubernetes cluster scanning
