Description
Is your feature request related to a problem? Please describe.
When fluent-bit is deployed as a sidecar container in a Kubernetes cluster, there is a short window in which the pod has already received SIGTERM but still receives traffic. This happens because the cluster sends the SIGTERM signal and updates the load balancer (iptables) in parallel, so for a brief period the pod is marked as terminating while requests are still routed to it.
This problem is not limited to fluent-bit; it affects any multi-container pod. However, fluent-bit does not have any option to handle this window gracefully.
Fluent-bit exits almost immediately if there are no pending tasks. Two settings are relevant here, flush and grace (docs).
flush sets the interval at which ingested records are flushed.
grace is the maximum time the service waits for the flush process to finish. If there are no tasks pending, the process ends immediately.
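For reference, a minimal sketch of where these two settings live, assuming the YAML configuration format. The values are illustrative only, not recommendations:

```yaml
# Service section of a fluent-bit configuration (YAML format assumed).
service:
  # Interval, in seconds, at which ingested records are flushed.
  flush: 1
  # Maximum time, in seconds, to wait on exit for pending flushes.
  # With no pending tasks the process ends right away, regardless of this value.
  grace: 5
```

Note that raising grace does not help with the window described above: it only bounds how long pending work may take, it does not keep an idle process alive.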
Describe the solution you'd like
I would like to suggest a delay-shutdown or shutdown-wait-period option in the service configuration section. This option would configure how many seconds fluent-bit waits after receiving SIGTERM before it starts shutting down. I noticed a very similar option in the OPA repository.
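A rough sketch of how the requested option might look in the service section, again assuming the YAML configuration format. The key name shutdown_wait_period is hypothetical: it is the option being proposed in this issue, not an existing setting, and both the spelling and the value are only for illustration:

```yaml
service:
  flush: 1
  grace: 5
  # HYPOTHETICAL, proposed in this issue and not an existing option:
  # keep running for this many seconds after SIGTERM, still accepting
  # input, and only then begin the normal flush/grace shutdown sequence.
  shutdown_wait_period: 10
```

The wait would need to stay below the pod's terminationGracePeriodSeconds, otherwise Kubernetes sends SIGKILL before the normal shutdown can run.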
Describe alternatives you've considered
There are a few alternatives:
- Do not use sidecars.
- Use containers with debug tags. Containers published with debug tags are not distroless and thus have access to the sleep command. With this, it is possible to use a preStop hook in Kubernetes. This alternative is described here and here (taken from the OPA issue mentioned above).
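The preStop workaround could look roughly like the sketch below. The pod name, the application container, the exact image tag, and the durations are assumptions for illustration; the only requirement taken from the alternative above is an image that ships a sleep binary:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-fluent-bit            # hypothetical name
spec:
  # Must be longer than the preStop sleep plus fluent-bit's own grace time.
  terminationGracePeriodSeconds: 30
  containers:
    - name: app
      image: example/app:latest        # hypothetical application image
    - name: fluent-bit
      # Assumed tag; any debug (non-distroless) tag that includes sleep works.
      image: fluent/fluent-bit:latest-debug
      lifecycle:
        preStop:
          exec:
            # Kubernetes runs this hook first and sends SIGTERM to the
            # container only after it returns, which covers the window in
            # which traffic may still arrive.
            command: ["sleep", "10"]
```

The downside is the one already noted: this ties the deployment to debug images just to get a sleep binary.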
Additional context
In our use case, we are forwarding specific logs for business reports. We are not using fluent-bit for operational logs (container stdout / stderr logs), so running fluent-bit as a DaemonSet on the cluster is not desirable. Conceptually, we'd prefer the sidecar setup.
I'm very much open to new views on this issue by the fluent-bit community.