Fluentd drainer job cannot complete when old chunks with no associated config are present  #1120

@aslafy-z

Description

Describe the bug:
Leftover buffer chunks with no associated config (the flow/output no longer exists) are never drained, so drain-watch does not kill fluentd and buffer-volume-sidecar, and the drainer job finishes with an error because of a timeout.

Expected behaviour:
The drainer job skips old chunks with no associated config.
Better: the drainer job is executed on config update so no orphan chunks stay in the buffers.

Steps to reproduce the bug:

Workaround:

For each errored drainer pod (with an associated logging-operator-logging-fluentd-XX pod):

  • remove the drainer pod and wait for it to be recreated
  • exec into the drainer pod and empty the /buffers directory
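
The two steps above can be sketched as a small shell helper. The `-drainer` pod-name suffix, the namespace, and the `/buffers` mount path are assumptions based on this report; adjust them to your actual release before use:

```shell
#!/usr/bin/env bash
# Hypothetical workaround helper: clear orphaned buffer chunks for one
# fluentd pod by recreating its drainer pod and emptying /buffers.
# Naming scheme (<fluentd-pod>-drainer) is an assumption, not confirmed.
drain_orphaned_buffers() {
  local ns="$1" fluentd_pod="$2"
  local drainer_pod="${fluentd_pod}-drainer"   # assumed naming convention

  # Step 1: remove the errored drainer pod and wait for its replacement.
  kubectl -n "$ns" delete pod "$drainer_pod"
  kubectl -n "$ns" wait --for=condition=Ready pod "$drainer_pod" --timeout=120s

  # Step 2: empty the /buffers directory inside the recreated drainer pod.
  kubectl -n "$ns" exec "$drainer_pod" -- sh -c 'rm -rf /buffers/*'
}
```

Example invocation: `drain_orphaned_buffers logging logging-operator-logging-fluentd-0`, repeated for each errored drainer pod.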

Environment details:

  • Kubernetes version (e.g. v1.15.2): v1.20.9
  • Cloud-provider/provisioner (e.g. AKS, GKE, EKS, PKE etc): RKE
  • logging-operator version (e.g. 2.1.1): 3.17.9
  • Install method (e.g. helm or static manifests): helm
  • Logs from the misbehaving component (and any other relevant logs): Let me know if you need anything
  • Resource definition (possibly in YAML format) that caused the issue, without sensitive data: N/A

/kind bug
