Describe the bug:
Leftover buffer chunks with no associated config (the flow / output no longer exists) are never drained, so drain-watch never kills fluentd and buffer-volume-sidecar, and the drainer job finishes with an error because of the timeout.
Expected behaviour:
The drainer job skips old chunks with no associated config.
Better: the drainer job is executed on config update so that no orphan chunks stay in the buffers.
Steps to reproduce the bug:
- Delete an output with chunks still in the pipe.
- Once Fluentd is reloaded, these chunks (in /buffers) are unmatched; they will never be sent to the destination and stay in the folder forever.
- If you scale down this fluentd, the drainer job stays stuck forever because this condition in the script never becomes true (see the sketch after this list): https://github.com/banzaicloud/logging-operator/blob/76533182e425e00ade40d8d9d8b3ffadda6c4548/drain-watch-image/drain-watch.sh#L23
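
For reference, the waiting logic in drain-watch.sh amounts to polling the buffer directory until it is empty. The snippet below is only an illustrative sketch of that pattern, not the actual script (the BUFFER_PATH variable, file counting and sleep interval are assumptions); it shows why orphan chunks that no longer match any flow/output keep the loop from ever exiting:

```sh
#!/bin/sh
# Illustrative sketch only -- not the real drain-watch.sh.
# BUFFER_PATH and the polling interval are assumptions for demonstration.
BUFFER_PATH=${BUFFER_PATH:-/buffers}

while true; do
  # Count remaining buffer chunk files left by fluentd
  remaining=$(find "$BUFFER_PATH" -type f | wc -l)
  if [ "$remaining" -eq 0 ]; then
    echo "buffers drained, fluentd and buffer-volume-sidecar can stop"
    break
  fi
  # Orphan chunks with no matching flow/output are never flushed,
  # so this count never reaches zero and the drainer job times out.
  echo "still $remaining chunk(s) in $BUFFER_PATH, waiting..."
  sleep 10
done
```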
Workaround:
For each errored drainer pod (with an associated logging-operator-logging-fluentd-XX pod), as shown in the example after this list:
- remove the drainer pod and wait for it to be recreated
- exec into the drainer pod and empty the /buffers directory
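
A hedged example of that workaround with kubectl; the namespace and pod names are placeholders and must be adjusted to the actual release:

```sh
# Namespace and pod names are placeholders -- adjust to your deployment.
NS=logging

# 1. Delete the errored drainer pod so its controller recreates it
kubectl -n "$NS" delete pod <errored-drainer-pod>

# 2. Once the new drainer pod is Running, exec in and empty the buffer directory
kubectl -n "$NS" exec -it <new-drainer-pod> -- sh -c 'rm -rf /buffers/*'
```

Once /buffers is empty, the drain condition can be satisfied and the job should be able to complete.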
Environment details:
- Kubernetes version (e.g. v1.15.2): v1.20.9
- Cloud-provider/provisioner (e.g. AKS, GKE, EKS, PKE etc): RKE
- logging-operator version (e.g. 2.1.1): 3.17.9
- Install method (e.g. helm or static manifests): helm
- Logs from the misbehaving component (and any other relevant logs): let me know if you need anything
- Resource definition (possibly in YAML format) that caused the issue, without sensitive data: N/A
/kind bug