Description
We are using AWS ECS (Fargate, platform version 1.4.0) to transfer application logs. Logs produced by application containers are forwarded via a log container based on Fluent Bit (aws-for-fluent-bit:2.25.1) to multiple destinations such as Datadog and AWS S3 buckets.
Under normal operation, log forwarding works correctly. However, after running for several days, we observed that the log container’s CPU usage dropped drastically (appearing idle), and all log forwarding stopped.
For log input, we are using the awsfirelens log driver, and logs are transferred via a Unix Domain Socket (confirmed to be opened in the running container).
To investigate, we used the strace command to monitor the log container. Immediately after running the following command inside the log container, the log container’s CPU usage suddenly spiked close to 100%, and for several minutes, a large backlog of logs was flushed and delivered to destinations such as the S3 bucket. It appeared that the log container processed data that had accumulated in the socket buffer.
strace -p 1 -f -o strace_0902_log.txt -s 2048
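To make a capture like strace_0902_log.txt easier to read, a small script can tally syscalls per thread. The sketch below assumes the usual `strace -f -o` line layout (a PID, whitespace, then the syscall name and an opening parenthesis). The function name `summarize_strace` is made up for illustration. Signal lines, exit lines, and `<... resumed>` continuations are skipped.

```python
import re
from collections import Counter, defaultdict

# Assumed layout of `strace -f -o FILE` output, for example:
#   "1234  epoll_wait(5, [], 8, 1000) = 0"
# i.e. PID, whitespace, syscall name, "(".
_SYSCALL_LINE = re.compile(r"^(\d+)\s+([a-z_][a-z0-9_]*)\(")


def summarize_strace(lines):
    """Return {pid: Counter({syscall_name: count})} for an iterable of strace lines."""
    per_pid = defaultdict(Counter)
    for line in lines:
        match = _SYSCALL_LINE.match(line)
        if match is None:
            # Signals ("--- SIGCHLD ... ---"), exits ("+++ exited ... +++") and
            # "<... read resumed>" continuation lines do not match and are skipped.
            continue
        pid = int(match.group(1))
        per_pid[pid][match.group(2)] += 1
    return dict(per_pid)
```

For example, `summarize_strace(open("strace_0902_log.txt", errors="replace"))` returns the per-thread counts. A thread that only ever shows a single blocking call before the backlog flush would be a natural place to start looking.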
Note: Because this issue occurred in a production environment, Fluent Bit was not running at the debug or info log level, so we cannot provide more detailed diagnostic information. Apologies for that.
Expected Behavior
- The log container should continue forwarding logs reliably, without becoming idle after running for several days.
Actual Behavior
- After several days of uptime, the log container becomes idle (CPU near 0%), and log forwarding stops completely.
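One way to notice this state earlier than through CPU graphs is to compare two snapshots of Fluent Bit's record counters and flag the case where nothing advanced. The sketch below assumes Fluent Bit's built-in HTTP monitoring server is enabled and that each snapshot is shaped like its `/api/v1/metrics` JSON (`input` and `output` maps with `records` and `proc_records` counters). That shape and the helper name `is_stalled` are assumptions to verify against the running version. Fetching the snapshots is left out.

```python
def _total(snapshot, section, field):
    """Sum one counter across all plugin instances in a section of a snapshot."""
    return sum(plugin.get(field, 0) for plugin in snapshot.get(section, {}).values())


def is_stalled(previous, current):
    """Return True when neither input nor output counters advanced between snapshots.

    Assumed snapshot shape (hypothetical plugin instance names):
      {"input":  {"forward.0": {"records": 10}},
       "output": {"s3.0": {"proc_records": 10}}}
    The result is only meaningful if the application is known to have
    produced logs between the two snapshots.
    """
    read_delta = _total(current, "input", "records") - _total(previous, "input", "records")
    sent_delta = (_total(current, "output", "proc_records")
                  - _total(previous, "output", "proc_records"))
    return read_delta == 0 and sent_delta == 0
```

Checking both sides matters here: in the behavior described above, data seemed to pile up in the socket buffer, so the input counter would presumably stop moving along with the outputs.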
Environment
- Service: AWS ECS Fargate
- Platform version: 1.4.0
- Fluent Bit image: aws-for-fluent-bit:2.25.1
- Log driver: awsfirelens (Unix Domain Socket input)
- Outputs: Datadog, AWS S3
Thank you for your support.