Multibyte characters (e.g. Japanese) are escaped as Unicode in S3 output on ARM64 platform #930

@MaezonoTaiki

Describe the question/issue

When running AWS for Fluent Bit on the ARM64 platform and outputting logs to S3, multibyte characters (such as Japanese) in the original logs are written to the S3 output files as \uXXXX Unicode escape sequences instead of as the original characters. This makes the logs difficult to read and process, especially for applications that expect the multibyte characters to appear unescaped.

Configuration

I am using AWS for Fluent Bit with the S3 output plugin in an ECS environment. The issue occurs when logs containing Japanese or other multibyte characters are processed and written to S3.

Example configuration:

[OUTPUT]
    Name s3
    Match api.log*
    Retry_Limit ${RETRY_LIMIT}
    region ${S3_REGION}
    bucket ${S3_BUCKET}
    s3_key_format ${S3_KEY_FORMAT_API}
    total_file_size ${TOTAL_FILE_SIZE_API}
    upload_timeout ${UPLOAD_TIMEOUT_API}
    #upload_chunk_size ${UPLOAD_CHUNK_SIZE}
    use_put_object ${USE_PUT_OBJECT}
    compression gzip
    content_type application/gzip

Fluent Bit Log Output

The logs appear normal in Fluent Bit's output, but when examining the files written to S3, multibyte characters are escaped. For example, Japanese text like "こんにちは" appears as "\u3053\u3093\u306b\u3061\u306f" in the S3 output files.
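
To illustrate the difference between the two forms, here is a minimal Python sketch using only the standard library. It is not Fluent Bit code and says nothing about where the escaping happens internally; it only shows that the escaped form is what an ASCII-only JSON encoder produces, and that a JSON parser decodes both forms back to the same text.

```python
import json

original = "こんにちは"

# ensure_ascii=True (Python's default) escapes every non-ASCII character,
# which produces the same kind of output seen in the S3 objects.
escaped = json.dumps(original)

# ensure_ascii=False keeps the multibyte characters as they are,
# which is the output I expected.
raw = json.dumps(original, ensure_ascii=False)

# Both forms decode back to the same string.
assert json.loads(escaped) == json.loads(raw) == original
```

So no information is lost, but anything that reads the S3 objects as plain text (grep, a text viewer, a non-JSON parser) sees the escape sequences rather than the original characters.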

Fluent Bit Version Info

  • AWS for Fluent Bit versions tested: arm64-2.28.3 and arm64-2.32.5.20250327
  • The issue is present in both versions tested
  • I have also confirmed this issue exists in upstream Fluent Bit 3.x and 4.0.0 (both x86 and ARM64 versions), so it does not appear to be specific to ARM64
  • Interestingly, older versions of Fluent Bit (1.9.10) do not exhibit this escaping behavior

Cluster Details

  • ECS on Fargate
  • ARM64 platform
  • Standard VPC networking setup
  • Sidecar deployment for Fluent Bit

Application Details

  • Application generates logs with Japanese and other multibyte characters

Steps to reproduce issue

  1. Deploy an ECS task with AWS for Fluent Bit as a sidecar on ARM64 platform
  2. Configure the S3 output plugin to write logs to an S3 bucket
  3. Generate logs containing multibyte characters (e.g., Japanese text)
  4. Check the logs written to S3 - multibyte characters will be escaped as Unicode sequences
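
The check in step 4 can be scripted once an object has been downloaded from the bucket. The following is a rough sketch, assuming the object is a gzip-compressed, UTF-8, newline-delimited text file (matching the compression gzip setting above). The function name find_escaped_lines is a hypothetical helper made up for this example, not part of any tool.

```python
import gzip
import re
from typing import List

# Matches a literal backslash, "u", and four hex digits, e.g. \u3053.
ESCAPE_RE = re.compile(r"\\u[0-9a-fA-F]{4}")


def find_escaped_lines(gz_bytes: bytes) -> List[str]:
    """Return the lines of a gzip-compressed log object that contain
    \\uXXXX escape sequences (hypothetical helper for this report)."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    return [line for line in text.splitlines() if ESCAPE_RE.search(line)]


# Example with in-memory data standing in for a downloaded S3 object.
sample = gzip.compress(b'{"log": "\\u3053\\u3093"}\n{"log": "hello"}\n')
for line in find_escaped_lines(sample):
    print(line)
```

A non-empty result means the object contains escaped characters. Note that the pattern also matches logs in which the application itself wrote a literal \uXXXX sequence, so compare the result against the original log line.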

Related Issues

This issue appears to be related to how Fluent Bit handles multibyte characters when writing to S3. While there have been attempts to address this in the upstream Fluent Bit project, the problem persists in the latest versions (3.x and 4.0.0).
