Description
Is your feature request related to a problem? Please describe.
When running FluentBit on transient infrastructure such as EC2 spot instances, logs can be permanently lost when instances are terminated. FluentBit buffers logs to disk, but unless the storage.backlog.flush_on_shutdown option is enabled, the buffered backlog is not flushed during a graceful shutdown. As a result, when spot instances are reclaimed or nodes are drained, the buffered logs are deleted along with the instance and never delivered.
Describe the solution you'd like
Add support for the storage.backlog.flush_on_shutdown configuration parameter in the BufferStorage CRD spec. This would allow users to configure FluentBit to flush buffered logs during graceful shutdown, preventing data loss on transient infrastructure.
The option is already supported in the service section of the FluentBit configuration (see the FluentBit documentation), but it is not exposed in the Logging Operator's FluentbitAgent CRD (see the current spec).
Proposed addition to BufferStorage:
```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
spec:
  bufferStorage:
    storage.backlog.flush_on_shutdown: true # new field
```
Describe alternatives you've considered
- Increasing termination grace periods: This only delays the problem and doesn't guarantee log delivery
- External FluentBit configuration: Managing FluentBit outside the operator defeats the purpose of using the Logging Operator
Additional context
This is particularly critical for:
- Spot instance/preemptible VM workloads
- Kubernetes clusters with frequent node rotations
- Cost-optimized infrastructure with transient compute resources
The FluentBit project already provides this safety mechanism; the Logging Operator just needs to expose it through the CRD.