Description
Describe the bug
Fluent Bit emits incomplete (split) log records during container log file rotation managed by containerd.
When containerd splits a log record across two files at rotation time,
Fluent Bit forwards each fragment as a separate log record instead of joining them.
The result is malformed JSON records with missing fields that arrive at the destination silently broken.
The bug appears only under high load, because containerd splits a log record across two files only in that case.
Over 1 hour of testing at 10k logs/sec from 2 Pods, Fluent Bit produced 34 split records.
To Reproduce
- Clone the benchmark repository:
git clone https://github.com/VictoriaMetrics/log-collectors-benchmark
cd log-collectors-benchmark
- Create a kind Kubernetes cluster (requires kubectl, kind, helm, docker, make):
kind create cluster --name log-collectors-bench
- Install VictoriaLogs as the log storage backend:
helm repo add vm https://victoriametrics.github.io/helm-charts/
helm install vls vm/victoria-logs-single --namespace logging --create-namespace
- Configure Fluent Bit to write to VictoriaLogs:
make set-endpoint VLS_HOST='vls-victoria-logs-single-server.logging.svc.cluster.local' VLS_PORT=9428
- Deploy Fluent Bit:
make bench-up-fluent-bit
- Start the load generator:
make bench-up-generator GENERATOR_REPLICAS=1 LOGS_PER_SECOND=10000 RAMP_UP=false
If your machine is fast enough, you can raise the number of load generator replicas (GENERATOR_REPLICAS).
This increases the load and the chance of reproducing the bug.
- Forward the VictoriaLogs port to your local machine:
kubectl port-forward -n logging vls-victoria-logs-single-server-0 9428:9428
- Wait approximately 30 minutes (the bug is intermittent and appears only under sustained load).
- Query VictoriaLogs for malformed records using the expression
sequence_id:""
This finds all records missing the sequence_id field, which are the split fragments.
- Clean up:
make bench-down-all
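The query step above can also be scripted. The following is a minimal sketch, assuming the port-forward to localhost:9428 from the earlier step is still active and that VictoriaLogs is queried through its /select/logsql/query HTTP endpoint, which returns one JSON object per line. The helper names (build_query_url, parse_json_lines, fetch_split_fragments) and the limit of 100 are illustrative and not part of the benchmark repository.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the kubectl port-forward from the steps above is running.
BASE_URL = "http://localhost:9428"


def build_query_url(base_url: str, expr: str, limit: int = 100) -> str:
    """Return the /select/logsql/query URL for a LogsQL expression."""
    params = urlencode({"query": expr, "limit": limit})
    return f"{base_url}/select/logsql/query?{params}"


def parse_json_lines(body: str) -> list[dict]:
    """Parse a JSON-lines response body, skipping blank lines."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def fetch_split_fragments(base_url: str = BASE_URL, limit: int = 100) -> list[dict]:
    """Fetch records that lack the sequence_id field (the split fragments)."""
    url = build_query_url(base_url, 'sequence_id:""', limit)
    with urlopen(url, timeout=30) as resp:
        return parse_json_lines(resp.read().decode("utf-8"))
```

Calling fetch_split_fragments() before the clean-up step returns the malformed records as dictionaries, which makes it easy to count them or inspect which fields are missing.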
Expected behavior
Fluent Bit should detect that a log record was split at a file rotation boundary and reconstruct the complete record before forwarding it.
Screenshots
Your Environment
- Version used: v4.2.3
- Configuration: default config from the official Helm chart. See full configuration here: https://github.com/VictoriaMetrics/log-collectors-benchmark/blob/11e1fa7760a4b53d9ed1f59a61d2195b627bd2f9/values/fluent-bit.yml
- Environment name and version: Kubernetes (kind v0.31.0), single-node cluster
- Server type and version: GCP n2-highcpu-32 (32 vCPU, 32 GiB RAM, local SSD)
- Operating System and version: Ubuntu 22.04
- Filters and plugins: tail input plugin reading from /var/log/pods, JSON parser
Additional context
The root cause appears to be specific to the last log record of a file at rotation time.
The record is split across two files by containerd and is marked with the partial flag (P in CRI format),
even though its size does not exceed the standard 16 KiB threshold at which containerd normally splits long lines.
Fluent Bit forwards each part as a separate record instead of waiting for and joining the continuation from the new file.
We modified our own collector to verify that the issue is rotation-specific. Other collectors do not hit this problem, which confirms that the application writes its logs correctly and is not the source of the truncated or partial log lines.
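To illustrate the expected joining behavior, here is a minimal sketch of reassembling CRI-format lines (timestamp, stream, tag, content) where fragments tagged P are buffered until a line tagged F arrives. It is not Fluent Bit's implementation. The function name join_cri_partials and the sample log lines are made up for illustration, and the sketch assumes the collector reads the old file to its end and then continues with the new file as one ordered stream.

```python
from itertools import chain
from typing import Iterable, Iterator


def join_cri_partials(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (stream, message) for each complete CRI record.

    Fragments tagged P are buffered per stream until a line tagged F
    arrives, so the buffer survives the switch to a rotated file.
    """
    pending: dict[str, list[str]] = {}
    for line in lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue  # not a CRI-formatted line
        _timestamp, stream, tag = parts[:3]
        content = parts[3] if len(parts) == 4 else ""
        pending.setdefault(stream, []).append(content)
        # The tag field may carry extra ":"-separated tags; the first is P or F.
        if tag.split(":")[0] == "F":
            yield stream, "".join(pending.pop(stream))


# Hypothetical data: the last record of the old file is split at rotation.
old_file = [
    '2024-01-01T00:00:00.000000001Z stdout F {"sequence_id":1}\n',
    '2024-01-01T00:00:00.000000002Z stdout P {"sequence_id":2,"ms\n',
]
new_file = [
    '2024-01-01T00:00:00.000000003Z stdout F g":"hello"}\n',
]

records = list(join_cri_partials(chain(old_file, new_file)))
```

With the sample input, records holds two complete JSON messages rather than three lines, because the P fragment from the old file is held back until the F continuation from the new file is read.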