Description
Describe the question/issue
When running AWS for Fluent Bit on the ARM64 platform and outputting logs to S3, multibyte characters (such as Japanese) in the original logs are converted to Unicode escape sequences in the S3 output files. This makes the logs difficult to read and process, especially for applications that expect properly encoded multibyte characters.
Configuration
I am using AWS for Fluent Bit with the S3 output plugin in an ECS environment. The issue occurs when logs containing Japanese or other multibyte characters are processed and written to S3.
Example configuration:
```
[OUTPUT]
    Name                s3
    Match               api.log*
    Retry_Limit         ${RETRY_LIMIT}
    region              ${S3_REGION}
    bucket              ${S3_BUCKET}
    s3_key_format       ${S3_KEY_FORMAT_API}
    total_file_size     ${TOTAL_FILE_SIZE_API}
    upload_timeout      ${UPLOAD_TIMEOUT_API}
    #upload_chunk_size  ${UPLOAD_CHUNK_SIZE}
    use_put_object      ${USE_PUT_OBJECT}
    compression         gzip
    content_type        application/gzip
```
Fluent Bit Log Output
The logs appear normal in Fluent Bit's output, but when examining the files written to S3, multibyte characters are escaped. For example, Japanese text like "こんにちは" appears as "\u3053\u3093\u306b\u3061\u306f" in the S3 output files.
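This is standard ASCII-only JSON escaping of non-ASCII characters. As an illustration only (not Fluent Bit's actual code path), Python's `json` module reproduces both behaviors:

```python
import json

text = "こんにちは"

# ASCII-only JSON serialization escapes each multibyte character,
# matching what appears in the affected S3 files.
escaped = json.dumps(text)  # ensure_ascii=True is the default
print(escaped)              # "\u3053\u3093\u306b\u3061\u306f"

# With ASCII escaping disabled, the UTF-8 text passes through
# unchanged, analogous to the Fluent Bit 1.9.10 behavior.
raw = json.dumps(text, ensure_ascii=False)
print(raw)                  # "こんにちは"
```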
Fluent Bit Version Info
- AWS for Fluent Bit versions tested: arm64-2.28.3 and arm64-2.32.5.20250327
- The issue is present in both versions tested
- I have also confirmed this issue exists in upstream Fluent Bit 3.x and 4.0.0 (both x86 and ARM64 versions)
- Interestingly, older versions of Fluent Bit (1.9.10) do not exhibit this escaping behavior
Cluster Details
- ECS on Fargate
- ARM64 platform
- Standard VPC networking setup
- Sidecar deployment for Fluent Bit
Application Details
- Application generates logs with Japanese and other multibyte characters
Steps to reproduce issue
- Deploy an ECS task with AWS for Fluent Bit as a sidecar on ARM64 platform
- Configure the S3 output plugin to write logs to an S3 bucket
- Generate logs containing multibyte characters (e.g., Japanese text)
- Check the logs written to S3 - multibyte characters will be escaped as Unicode sequences
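For step 4, the uploaded object can be inspected directly; a sketch assuming a hypothetical bucket and key, and that gzip compression is enabled as in the configuration above:

```shell
# Hypothetical bucket/key; adjust to match your s3_key_format.
aws s3 cp "s3://my-bucket/api.log/2025/04/01/00/object.gz" - \
  | gunzip \
  | grep -o '\\u[0-9a-f]\{4\}' \
  | sort | uniq -c | sort -rn | head
# Any output means multibyte characters were written as \uXXXX
# escape sequences instead of raw UTF-8 bytes.
```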
Related Issues
- fluent/fluent-bit#8521: Emoji got UNICODE-escaped for file output when running fluent-bit with Docker
- fluent/fluent-bit#8851: cmake: fix UNICODE-escaped characters on aarch64
This issue appears to be related to how Fluent Bit handles multibyte characters when writing to S3. While there have been attempts to address this in the upstream Fluent Bit project, the problem persists in the latest versions (3.x and 4.0.0).
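Until this is fixed upstream, affected logs can be repaired on the consumer side, since the `\uXXXX` runs are valid JSON string escapes. A sketch (the helper name is mine, not part of any Fluent Bit tooling); consecutive escapes are decoded together so surrogate pairs such as emoji survive:

```python
import json
import re

# Match one or more consecutive \uXXXX escapes so surrogate
# pairs (e.g. emoji) are decoded as a unit.
_ESCAPE_RUN = re.compile(r'(?:\\u[0-9a-fA-F]{4})+')

def unescape_line(line: str) -> str:
    """Decode \\uXXXX escape sequences back to the original characters."""
    return _ESCAPE_RUN.sub(lambda m: json.loads(f'"{m.group(0)}"'), line)

print(unescape_line('msg=\\u3053\\u3093\\u306b\\u3061\\u306f'))  # msg=こんにちは
```

This does not address the root cause in the S3 output path, but it restores readable UTF-8 for downstream consumers of already-written objects.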