Description
Hello Contour team!
What happened?
In our Kubernetes environment, we are observing CPU throttling on the shutdown-manager container within the Envoy DaemonSet.
What you expected to happen?
We expected to be able to configure the CPU/memory requests and limits for the shutdown-manager container, in the same way that resources can be configured for the contour and envoy containers.
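For comparison, here is a minimal sketch of how resources can be set for the other two containers today. It assumes the `resources` fields under `spec.contour` and `spec.envoy` of the ContourDeployment API; the values are illustrative only, not our production settings.

```yaml
apiVersion: projectcontour.io/v1alpha1
kind: ContourDeployment
metadata:
  name: internal
  namespace: projectcontour
spec:
  contour:
    # Resources for the contour container (illustrative values).
    resources:
      requests:
        cpu: 100m
      limits:
        cpu: 200m
  envoy:
    # Resources for the envoy container (illustrative values).
    # There is no equivalent field for the shutdown-manager container.
    resources:
      requests:
        cpu: 200m
      limits:
        cpu: 500m
```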
How to reproduce it?
This issue is noticeable in clusters under load, where the shutdown-manager needs more CPU than its default CPU limit allows, so the container is throttled when that limit is enforced on the node.
The core of the issue is that there appears to be no API field to define these resources.
Current Environment
- Contour version: v1.33.1
- Installation method: Contour Operator using the ContourDeployment CRD with Gateway API.
- Kubernetes version: 1.34.x
- Cloud provider or hardware: Self-hosted
- Orchestration: GitOps-based with FluxCD
Proposed Solution
To solve this, we propose adding a new field to the ContourDeployment API specification. A logical location would be under spec.envoy, mirroring the structure of other components.
Example of the desired configuration in the ContourDeployment resource:
    apiVersion: projectcontour.io/v1alpha1
    kind: ContourDeployment
    metadata:
      name: internal
      namespace: projectcontour
    spec:
      # ... other contour/envoy settings
      envoy:
        # ... other envoy settings
        shutdownManager:
          resources:
            requests:
              cpu: 100m
            limits:
              cpu: 200m
        # ... other envoy settings

Workarounds Considered
We are using the Contour Operator in a GitOps workflow, which means the Envoy DaemonSet is created and reconciled at runtime. This prevents us from using a standard Kustomize patch in our Git repository to modify the DaemonSet, as the Operator would likely overwrite any manual changes during reconciliation.
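For illustration, this is roughly the kind of Kustomize patch we would apply if the DaemonSet were a static manifest in our repository. The DaemonSet name `envoy-internal` is an assumption made for this sketch, and the CPU values simply mirror the proposal above. Because the DaemonSet is generated at runtime, there is no resource in Git for this patch to target.

```yaml
# kustomization.yaml (sketch only; not usable with a runtime-generated DaemonSet)
patches:
  - target:
      kind: DaemonSet
      name: envoy-internal        # hypothetical name
      namespace: projectcontour
    patch: |-
      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        name: envoy-internal      # hypothetical name
        namespace: projectcontour
      spec:
        template:
          spec:
            containers:
              - name: shutdown-manager
                resources:
                  requests:
                    cpu: 100m
                  limits:
                    cpu: 200m
```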
Furthermore, we've identified that the container's resource limits are currently hardcoded in the Contour codebase.
Without a way to specify these values through the ContourDeployment CRD, we have no clear path to resolving the CPU throttling issue. If you have any ideas for an interim workaround, they would be very welcome.
Thank you for considering this feature.