

cosmo0920 commented Jul 31, 2025

This change is needed because RFC 8259 permits both of the following:

  1. Escaping multibyte (non-ASCII) characters as \uXXXX sequences
  2. Emitting them as raw UTF-8

ref: https://datatracker.ietf.org/doc/html/rfc8259#section-8

Currently, we implement only option 1, but the RFC also permits raw UTF-8 strings.
So we need to provide a choice not to escape, especially for multibyte characters.
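For reference, the two forms RFC 8259 allows for the same character can be produced with a small C sketch like the one below. It is not Fluent Bit code: encode_utf8 and escape_codepoint are hypothetical helpers, and escape_codepoint assumes its input is already a valid Unicode scalar value. Code points above U+FFFF are escaped as a UTF-16 surrogate pair, which is how RFC 8259 section 7 represents characters outside the Basic Multilingual Plane.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Raw form: encode a Unicode scalar value as UTF-8.
 * Returns the byte count (1-4), or 0 for surrogates and values
 * above U+10FFFF, which are not valid scalar values.
 */
static size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/*
 * Escaped form: one escape with four hex digits inside the BMP,
 * otherwise a UTF-16 surrogate pair (two escapes).
 * Assumes cp is a valid scalar value. Returns the snprintf() result
 * (6 or 12 characters).
 */
static int escape_codepoint(uint32_t cp, char *out, size_t out_size)
{
    if (cp <= 0xFFFF) {
        return snprintf(out, out_size, "\\u%04x", (unsigned int) cp);
    }
    cp -= 0x10000;
    return snprintf(out, out_size, "\\u%04x\\u%04x",
                    (unsigned int) (0xD800 + (cp >> 10)),
                    (unsigned int) (0xDC00 + (cp & 0x3FF)));
}
```

For example, 应 (U+5E94) is either the three raw bytes E5 BA 94 or the six characters \u5e94, and 😀 (U+1F600) is either F0 9F 98 80 or \ud83d\ude00. A conforming JSON parser decodes each pair to the same string.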

Closes #10631.


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
[SERVICE]
    Flush 1
    Daemon off
    Log_Level debug
    grace   2
    grace_input   1
    json.escape_unicode Off


[INPUT]
    Name         tail
    path          my_app_gbk.log
    Tag           dynamic_logs
    generic.encoding GBK
    Refresh_Interval  5
    Skip_Long_Lines On
    read_from_head true

[OUTPUT]
    Name        file
    Match       *
    File        test3.log

with the GBK-encoded file:

{"message": "应用启动成功", "level": "info"}
  • Debug log output from testing the change
Fluent Bit v4.1.0
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _             ___  _____ 
|  ___| |                | |   | ___ (_) |           /   ||  _  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| || |/' |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| ||  /| |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |\ |_/ /
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)___/ 


[2025/07/31 20:35:30] [ info] Configuration:
[2025/07/31 20:35:30] [ info]  flush time     | 1.000000 seconds
[2025/07/31 20:35:30] [ info]  grace          | 2 seconds
[2025/07/31 20:35:30] [ info]  daemon         | 0
[2025/07/31 20:35:30] [ info] ___________
[2025/07/31 20:35:30] [ info]  inputs:
[2025/07/31 20:35:30] [ info]      tail
[2025/07/31 20:35:30] [ info] ___________
[2025/07/31 20:35:30] [ info]  filters:
[2025/07/31 20:35:30] [ info] ___________
[2025/07/31 20:35:30] [ info]  outputs:
[2025/07/31 20:35:30] [ info]      file.0
[2025/07/31 20:35:30] [ info] ___________
[2025/07/31 20:35:30] [ info]  collectors:
[2025/07/31 20:35:30] [ info] [fluent bit] version=4.1.0, commit=3db7314ecb, pid=3084
[2025/07/31 20:35:30] [debug] [engine] coroutine stack size: 36864 bytes (36.0K)
[2025/07/31 20:35:30] [ info] [storage] ver=1.2.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/07/31 20:35:30] [ info] [simd    ] disabled
[2025/07/31 20:35:30] [ info] [cmetrics] version=1.0.5
[2025/07/31 20:35:30] [ info] [ctraces ] version=0.6.6
[2025/07/31 20:35:30] [ info] [input:tail:tail.0] initializing
[2025/07/31 20:35:30] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2025/07/31 20:35:30] [debug] [tail:tail.0] created event channels: read=25 write=26
[2025/07/31 20:35:30] [debug] [input:tail:tail.0] flb_tail_fs_stat_init() initializing stat tail input
[2025/07/31 20:35:30] [debug] [input:tail:tail.0] scanning path my_app_gbk.log
[2025/07/31 20:35:30] [debug] [input:tail:tail.0] inode=191383172 with offset=0 appended as my_app_gbk.log
[2025/07/31 20:35:30] [debug] [input:tail:tail.0] scan_glob add(): my_app_gbk.log, inode 191383172
[2025/07/31 20:35:30] [debug] [input:tail:tail.0] 1 new files found on path 'my_app_gbk.log'
[2025/07/31 20:35:30] [debug] [file:file.0] created event channels: read=32 write=33
[2025/07/31 20:35:30] [ info] [output:file:file.0] worker #0 started
[2025/07/31 20:35:30] [ info] [sp] stream processor started
[2025/07/31 20:35:30] [ info] [engine] Shutdown Grace Period=2, Shutdown Input Grace Period=1
[2025/07/31 20:35:30] [debug] [input:tail:tail.0] [static files] processed 45b
[2025/07/31 20:35:30] [debug] [input:tail:tail.0] inode=191383172 file=my_app_gbk.log promote to TAIL_EVENT
[2025/07/31 20:35:30] [debug] [input:tail:tail.0] [static files] processed 0b, done
[2025/07/31 20:35:31] [debug] [task] created task=0x600003cbc000 id=0 OK
[2025/07/31 20:35:31] [debug] [output:file:file.0] task_id=0 assigned to thread #0
[2025/07/31 20:35:31] [debug] [out flush] cb_destroy coro_id=0
[2025/07/31 20:35:31] [debug] [task] destroy task=0x600003cbc000 (task_id=0)
^C[2025/07/31 20:35:33] [engine] caught signal (SIGINT)
[2025/07/31 20:35:33] [ info] [input] pausing tail.0
[2025/07/31 20:35:33] [ info] [output:file:file.0] thread worker #0 stopping...
[2025/07/31 20:35:33] [ info] [output:file:file.0] thread worker #0 stopped
[2025/07/31 20:35:33] [debug] [input:tail:tail.0] inode=191383172 removing file name my_app_gbk.log

Then, the generated file contains:

$ cat test3.log
dynamic_logs: [1753961970.958898000, {"log":"{\"message\": \"应用启动成功\", \"level\": \"info\"}"}]
  • Attached Valgrind output that shows no leaks or memory corruption was found

With leaks, the macOS memory-leak detector:

Process:         fluent-bit [3121]
Path:            /Users/USER/*/fluent-bit
Load Address:    0x10215c000
Identifier:      fluent-bit
Version:         0
Code Type:       ARM64
Platform:        macOS
Parent Process:  leaks [3120]
Target Type:     live task

Date/Time:       2025-07-31 20:36:15.809 +0900
Launch Time:     2025-07-31 20:36:09.331 +0900
OS Version:      macOS 15.5 (24F74)
Report Version:  7
Analysis Tool:   /Applications/Xcode.app/Contents/Developer/usr/bin/leaks
Analysis Tool Version:  Xcode 16.4 (16F6)

Physical footprint:         4577K
Physical footprint (peak):  4673K
Idle exit:                  untracked
----

leaks Report Version: 4.0, multi-line stacks
Process 3121: 761 nodes malloced for 79 KB
Process 3121: 0 leaks for 0 total leaked bytes.

From Valgrind:

==41087== 
==41087== HEAP SUMMARY:
==41087==     in use at exit: 0 bytes in 0 blocks
==41087==   total heap usage: 3,328 allocs, 3,328 frees, 1,022,286 bytes allocated
==41087== 
==41087== All heap blocks were freed -- no leaks are possible
==41087== 
==41087== For lists of detected and suppressed errors, rerun with: -s
==41087== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features
    • Added global setting json.escape_unicode to control Unicode escaping in JSON output. Applies across filters, processors, HTTP API responses, and major outputs (e.g., HTTP/TCP/UDP, Kafka, Kinesis/Firehose, S3, Loki, Splunk, Elasticsearch/OpenSearch, CloudWatch, Azure, BigQuery, Datadog, Stackdriver, NATS, File, Stdout, WebSocket, etc.). Enabled by default; disable to emit raw UTF-8.
  • Tests
    • Expanded test coverage to validate both escaped and raw UTF-8 JSON paths.
  • Chores
    • CI: Updated aarch64 workflow to ubuntu-24.04-arm and modern compiler toolchains.

coderabbitai bot commented Jul 31, 2025

Walkthrough

Adds a new config option json.escape_unicode and plumbs it through core APIs and many plugins. Public headers and functions gain an escape_unicode parameter; calls are updated accordingly. Default is enabled. Tests and HTTP endpoints are adjusted. CI updates GCC/Clang versions for aarch64 workflow.

Changes

  • Config surface: include/fluent-bit/flb_config.h, src/flb_config.c
    Add config field json_escape_unicode, macro FLB_CONF_UNICODE_STR_JSON_ESCAPE, service config entry, default init to FLB_TRUE.
  • JSON pack API (public): include/fluent-bit/flb_pack.h
    Add escape_unicode arg to: flb_pack_msgpack_to_json_format, flb_msgpack_to_json, flb_msgpack_to_json_str, flb_msgpack_raw_to_json_sds.
  • String utils API: include/fluent-bit/flb_utils.h, src/flb_utils.c, src/flb_sds.c
    Add escape_unicode to flb_utils_write_str and flb_utils_write_str_buf; implement escaped/raw paths and UTF-8 validation; call updated in flb_sds_cat_utf8(…, FLB_TRUE).
  • Core pack impl: src/flb_pack.c
    Thread escape_unicode through internal msgpack2json and exported functions; all internal calls updated.
  • Help/CLI JSON outputs: src/flb_help.c, src/fluent-bit.c
    Pass FLB_TRUE to flb_msgpack_raw_to_json_sds call sites.
  • HTTP server endpoints: src/http_server/api/v1/*.{c}, src/http_server/api/v2/reload.c, src/http_server/flb_hs_endpoints.c
    Update calls to flb_msgpack_raw_to_json_sds with third arg FLB_TRUE.
  • Filter plugins: plugins/filter_expect/expect.c, plugins/filter_nightfall/nightfall_api.c, plugins/filter_wasm/filter_wasm.c
    Propagate config to use config->json_escape_unicode in JSON conversions; some call sites pass FLB_TRUE.
  • Azure outputs: plugins/out_azure/..., plugins/out_azure_blob/..., plugins/out_azure_kusto/..., plugins/out_azure_logs_ingestion/...
    Add config param to formatters; pass config->json_escape_unicode to JSON conversion.
  • AWS outputs: plugins/out_cloudwatch_logs/*, plugins/out_kinesis_firehose/*, plugins/out_kinesis_streams/*
    Add config param across processing/dispatch; use config->json_escape_unicode in flb_msgpack_to_json and write helpers; headers updated accordingly.
  • Other outputs (1): plugins/out_bigquery/bigquery.c, out_calyptia/calyptia.c, out_chronicle/chronicle.c, out_datadog/datadog.c, out_es/es.c, out_file/file.c
    Add/configure escape flag in JSON serialization; some functions gain config param; plain/file paths updated.
  • Other outputs (2): plugins/out_http/http.c, out_kafka/kafka.c, out_kafka_rest/kafka.c, out_lib/out_lib.c, out_logdna/logdna.c, out_loki/loki.c
    Propagate config where needed; all JSON conversions pass config->json_escape_unicode; multiple internal helpers updated in Loki.
  • Other outputs (3): plugins/out_nats/nats.c, out_nrlogs/newrelic.c, out_null/null.c, out_opensearch/opensearch.c, out_oracle_log_analytics/oci_logan.c, out_pgsql/pgsql.c, out_s3/s3.c, out_skywalking/skywalking.c, out_slack/slack.c, out_splunk/splunk.c, out_stackdriver/stackdriver.c, out_stdout/stdout.c, out_tcp/tcp.c, out_udp/udp.c, out_vivo_exporter/vivo.c, out_websocket/websocket.c
    Update format/flush paths to include escape flag; several functions gain config param; calls adjusted to new pack APIs.
  • Processor: plugins/processor_content_modifier/cm_utils.c
    flb_msgpack_raw_to_json_sds now called with FLB_TRUE.
  • Record accessor: src/flb_record_accessor.c
    flb_msgpack_to_json_str(..., FLB_TRUE) for map-to-JSON case.
  • Tests (unit/runtime/fuzzers): tests/internal/*, tests/runtime/*
    Update all calls to new signatures, generally passing FLB_TRUE; add tests for raw string mode in utils tests.
  • CI: .github/workflows/unit-tests.yaml
    aarch64 workflow: move to ubuntu-24.04-arm (repo-conditional), toolchain bumped to gcc/g++/clang 14.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant User as Config file
    participant Core as flb_config
    participant API as Pack/Utils APIs
    participant Plugin as Output/Filter Plugins

    User->>Core: Set "json.escape_unicode" (bool)
    Core-->>Plugin: Pass struct flb_config*
    Plugin->>API: flb_msgpack_*_to_json(..., escape_unicode=config->json_escape_unicode)
    API-->>Plugin: JSON string (escaped or raw Unicode)
    Plugin-->>User: Output/Send data
    note over API: Default initialized to FLB_TRUE

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • edsiper
  • koleini
  • fujimotos
  • patrick-stephens

Poem

A rabbit taps keys with unicode cheer,
Hops through the bytes: “All glyphs welcome here!”
Escape if you wish, or let them run free—
汉字, emojis, and logs sipping tea. 🐇
Now streams flow smooth, from burrow to sea.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Out of Scope Changes Check: ⚠️ Warning
    Explanation: The pull request includes modifications to the CI workflow (.github/workflows/unit-tests.yaml) updating the aarch64 job target and compiler versions, which are unrelated to providing a choice for raw UTF-8 encoding. These CI environment changes fall outside the scope of the linked issue’s objective to add a Unicode escape configuration.
    Resolution: The CI workflow changes should be separated into a distinct pull request, and this PR should be limited to the implementation of the json.escape_unicode feature to maintain clarity and focus.
  • Docstring Coverage: ⚠️ Warning
    Explanation: Docstring coverage is 25.00%, which is insufficient. The required threshold is 80.00%.
    Resolution: You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed
    Explanation: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed
    Explanation: The pull request title succinctly describes the primary change of adding a choice for raw UTF-8 string encoding in the utilities and plugins, directly reflecting the introduction of the json.escape_unicode configuration option. It is clear, concise, and focuses on the main feature without extraneous details.
  • Linked Issues Check: ✅ Passed
    Explanation: The changes fully implement the objective from issue #10631 by introducing the json.escape_unicode configuration option, adding it to the service_configs table, initializing its default value, and propagating the escape flag through all relevant JSON serialization paths across plugins and utilities. This enables users to disable Unicode escaping and emit raw multibyte UTF-8 in compliance with RFC 8259.

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
plugins/out_bigquery/bigquery.c (2)

850-852: Avoid passing the whole struct flb_config * when only one flag is needed

bigquery_format() now depends on the full global config, although it uses only json_escape_unicode. This tightens coupling and complicates unit-testing. Consider passing the plain int escape_unicode flag instead (mirroring flb_msgpack_raw_to_json_sds) or wrapping it into the existing ctx structure so that only plugin-local state flows through the call chain.
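A minimal sketch of that suggestion, using hypothetical names: demo_config stands in for struct flb_config, demo_format for bigquery_format() and demo_flush for the flush callback. The formatter receives only an int flag, and the caller resolves it once, falling back to the default (escaping enabled) if config is ever NULL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct flb_config: only one field matters here. */
struct demo_config {
    int json_escape_unicode;
};

/*
 * Formatter that takes the plain flag, the same way
 * flb_msgpack_raw_to_json_sds() does. Placeholder body: a real formatter
 * would serialize the records and pass escape_unicode to the JSON call.
 */
static int demo_format(const char *record, int escape_unicode,
                       char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s [%s]", record,
                     escape_unicode ? "escaped" : "raw");

    if (n < 0 || (size_t) n >= out_size) {
        return -1;
    }
    return n;
}

/*
 * Flush-side caller: resolves the flag once and falls back to the
 * default (escaping enabled) if config is NULL.
 */
static int demo_flush(const char *record, const struct demo_config *config,
                      char *out, size_t out_size)
{
    int escape_unicode = (config != NULL) ? config->json_escape_unicode : 1;

    return demo_format(record, escape_unicode, out, out_size);
}
```

With this shape a unit test can call the formatter directly with FLB_TRUE or FLB_FALSE, and the NULL handling raised in the later comments lives in a single place.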


963-966: (void) config; is now misleading

The cast was previously used to silence an “unused parameter” warning, but config is now actively passed to bigquery_format(). Keeping the cast may confuse future readers and automated linters.

-    (void) config;

Comment on lines +941 to 943
out_buf = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size,
config->json_escape_unicode);


⚠️ Potential issue

Missing NULL-check before dereferencing config

config is assumed non-NULL when accessing config->json_escape_unicode. The engine passes a valid pointer in normal paths, but defensive code would make the function robust to accidental NULLs or future refactors.

-    out_buf = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size,
-                                          config->json_escape_unicode);
+    if (!config) {
+        flb_plg_error(ctx->ins, "NULL config passed to bigquery_format()");
+        msgpack_sbuffer_destroy(&mp_sbuf); /* do not leak the packed buffer */
+        return -1;
+    }
+    out_buf = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size,
+                                          config->json_escape_unicode);

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In plugins/out_bigquery/bigquery.c around lines 941 to 943, the code
dereferences the config pointer without checking if it is NULL, which can cause
crashes if config is NULL. Add a NULL check for config before accessing
config->json_escape_unicode, and handle the NULL case appropriately, such as
using a default value or returning an error, to make the function more robust.

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/runtime/in_systemd.c (1)

35-35: LGTM! Function signature update correctly applied.

The addition of the FLB_TRUE parameter maintains the existing Unicode escaping behavior in this test, which is appropriate since this test focuses on duplicated key handling rather than Unicode encoding behavior.

Consider adding a separate test case to verify both Unicode escaping behaviors (FLB_TRUE and FLB_FALSE) if the systemd input plugin is expected to handle Unicode characters, ensuring comprehensive coverage of the new functionality introduced by this PR.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02a902b and 1336593.

📒 Files selected for processing (1)
  • tests/runtime/in_systemd.c (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/runtime/in_systemd.c (1)
src/flb_pack.c (1)
  • flb_msgpack_raw_to_json_sds (814-873)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-centos-7

coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/flb_utils.c (1)

791-1097: Fix two bugs in escaped path: surrogate pairs and index advance.

  • Codepoints > 0xFFFF are emitted as a single “\uXXXX” (wrong) and there’s an unreachable branch checking c > 0xFFFF (c is a byte). Emit surrogate pairs in the accepted decode path.
  • After processing a multibyte sequence, the loop does i += processed_bytes followed by i++ at the end, skipping one extra byte.

Apply this diff:

-            else if (c >= 0x80 && c <= 0xFFFF) {
+            else if ((c & 0x80) != 0) {
                 hex_bytes = flb_utf8_len(&str[i]);
@@
-                if (state == FLB_UTF8_ACCEPT) {
-                    len = snprintf(tmp, sizeof(tmp), "\\u%.4x", codepoint);
+                if (state == FLB_UTF8_ACCEPT) {
+                    if (codepoint <= 0xFFFF) {
+                        len = snprintf(tmp, sizeof(tmp), "\\u%.4x", codepoint);
+                    }
+                    else {
+                        high = 0xD800 + ((codepoint - 0x10000) >> 10);
+                        low  = 0xDC00 + ((codepoint - 0x10000) & 0x3FF);
+                        len = snprintf(tmp, sizeof(tmp), "\\u%.4x\\u%.4x", high, low);
+                    }
                     if (available < len) {
                         return FLB_FALSE;
                     }
                     memcpy(p, tmp, len);
                     p += len;
                     offset += len;
                     available -= len;
-                }
-                else {
-                    flb_warn("[pack] Invalid UTF-8 bytes found, skipping.");
-                }
-
-                i += processed_bytes;
+                }
+                else {
+                    /* replacement char for invalid sequence */
+                    if (available < 3) {
+                        return FLB_FALSE;
+                    }
+                    memcpy(p, "\xEF\xBF\xBD", 3);
+                    p += 3;
+                    offset += 3;
+                    available -= 3;
+                }
+
+                /* we consumed hex_bytes; adjust for the loop's trailing i++ */
+                i += processed_bytes - 1;
                 }
-            }
-            /* Handle sequences beyond 0xFFFF */
-            else if (c > 0xFFFF) {
-                utf_sequence_length = flb_utf8_len(str + i);
-                /* ... unreachable block removed ... */
-            }
+            }
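The index-advance bug can be checked in isolation. This C sketch is not the flb_utils.c code; utf8_decode and decode_all are hypothetical helpers. It decodes a buffer sequence by sequence and moves the index in exactly one place, by the number of bytes consumed. Mapping each invalid byte to U+FFFD and consuming it on its own is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence at s (at most avail bytes). Returns its
 * length (1-4) and stores the code point in *cp, or returns 0 when the
 * bytes are not well-formed UTF-8 (bad lead/continuation byte, truncated,
 * overlong, surrogate, or above U+10FFFF).
 */
static size_t utf8_decode(const unsigned char *s, size_t avail, uint32_t *cp)
{
    size_t len;
    size_t k;
    uint32_t v;

    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    else if ((s[0] & 0xE0) == 0xC0) {
        len = 2;
        v = s[0] & 0x1F;
    }
    else if ((s[0] & 0xF0) == 0xE0) {
        len = 3;
        v = s[0] & 0x0F;
    }
    else if ((s[0] & 0xF8) == 0xF0) {
        len = 4;
        v = s[0] & 0x07;
    }
    else {
        return 0;
    }

    if (len > avail) {
        return 0;
    }
    for (k = 1; k < len; k++) {
        if ((s[k] & 0xC0) != 0x80) {
            return 0;
        }
        v = (v << 6) | (s[k] & 0x3F);
    }
    if ((len == 2 && v < 0x80) || (len == 3 && v < 0x800) ||
        (len == 4 && v < 0x10000) ||
        (v >= 0xD800 && v <= 0xDFFF) || v > 0x10FFFF) {
        return 0;
    }
    *cp = v;
    return len;
}

/*
 * Decode a whole buffer. The index advances in exactly one place, by the
 * number of bytes consumed; invalid bytes become U+FFFD, one byte each.
 */
static size_t decode_all(const unsigned char *s, size_t n,
                         uint32_t *out, size_t out_cap)
{
    size_t i = 0;
    size_t count = 0;
    size_t used;
    uint32_t cp;

    while (i < n && count < out_cap) {
        used = utf8_decode(s + i, n - i, &cp);
        if (used == 0) {
            cp = 0xFFFD;
            used = 1;
        }
        out[count++] = cp;
        i += used;
    }
    return count;
}
```

On the bytes 61 E5 BA 94 62 ("a应b") it yields U+0061, U+5E94, U+0062. In the original loop shape, where the multibyte branch adds the sequence length and the for loop then adds one more, the "b" would be skipped.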
🧹 Nitpick comments (20)
plugins/out_nrlogs/newrelic.c (2)

145-147: Make config parameter const to document immutability.

This helper only reads from config; marking it const tightens the contract and prevents accidental writes. Static scope keeps impact local.

-static flb_sds_t newrelic_compose_payload(struct flb_newrelic *ctx,
-                                          const void *data, size_t bytes,
-                                          struct flb_config *config)
+static flb_sds_t newrelic_compose_payload(struct flb_newrelic *ctx,
+                                          const void *data, size_t bytes,
+                                          const struct flb_config *config)

236-237: Defensive: avoid potential NULL dereference on config.

Core should pass a valid config, but a defensive default prevents crashes in tests/mocks.

-    json = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size,
-                                       config->json_escape_unicode);
+    int escape_unicode = FLB_TRUE;
+    if (config != NULL) {
+        escape_unicode = config->json_escape_unicode;
+    }
+    json = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size,
+                                       escape_unicode);
plugins/out_kinesis_firehose/firehose_api.h (1)

40-41: API addition (config param) is correct — declarations/callers updated; optional: make config const

  • Verified: signature updated in plugins/out_kinesis_firehose/firehose_api.h and definition in firehose_api.c; caller updated in plugins/out_kinesis_firehose/firehose.c (no remaining 4‑arg calls found).
  • Optional refactor: if the config pointer is only read, change the parameter to const across affected signatures: process_and_send_records (firehose_api.h/.c), process_event (firehose_api.c), add_event (firehose_api.c) and any internal helpers that accept config.
plugins/out_loki/loki.c (3)

480-482: Guard config and skip non-string map keys in pack_maps

  • Dereferencing config without a null-check can crash if a caller ever passes NULL.
  • pack_label_key assumes string keys; non-string keys should be skipped to avoid UB.

Apply:

 static void pack_maps(struct flb_loki *ctx,
                       msgpack_packer *mp_pck,
                       char *tag, int tag_len,
                       msgpack_object *map,
                       struct flb_mp_map_header *mh,
-                      struct mk_list *list,
-                      struct flb_config *config)
+                      struct mk_list *list,
+                      struct flb_config *config)
 {
+    /* Defensive: config is expected non-NULL; fall back to the default (escaping on) */
+    int escape = config ? config->json_escape_unicode : FLB_TRUE;
@@
-                        pack_label_key(mp_pck, (char*) accessed_map_kv.key.via.str.ptr,
+                        /* Ensure key is string */
+                        if (accessed_map_kv.key.type != MSGPACK_OBJECT_STR) {
+                            continue;
+                        }
+                        pack_label_key(mp_pck, (char*) accessed_map_kv.key.via.str.ptr,
                                        accessed_map_kv.key.via.str.size);
@@
-                            accessed_map_val_json = flb_msgpack_to_json_str(1024,
-                                                                            &accessed_map_kv.val,
-                                                                            config->json_escape_unicode);
+                            accessed_map_val_json = flb_msgpack_to_json_str(1024,
+                                                                            &accessed_map_kv.val,
+                                                                            escape);

Also applies to: 542-545


1422-1424: Null-safe use of config in pack_record

Avoid unconditional deref of config; use a local escape flag.

 static int pack_record(struct flb_loki *ctx,
                        msgpack_packer *mp_pck, msgpack_object *rec,
                        flb_sds_t *dynamic_tenant_id,
                        struct flb_mp_accessor *remove_mpa,
                        struct flb_config *config)
 {
+    int escape = config ? config->json_escape_unicode : FLB_TRUE;
@@
-        line = flb_msgpack_to_json_str(size_hint, rec, config->json_escape_unicode);
+        line = flb_msgpack_to_json_str(size_hint, rec, escape);

Also applies to: 1514-1514


2252-2254: Avoid leaking dynamic_tenant_id in format test

On success, dynamic_tenant_id isn’t freed.

     *out_data = payload;
     *out_size = flb_sds_len(payload);
 
+    if (dynamic_tenant_id != NULL) {
+        flb_sds_destroy(dynamic_tenant_id);
+    }
     return 0;
tests/internal/mp.c (2)

221-223: Assert JSON conversion success in tests.

Capture and assert the return value from flb_msgpack_to_json to avoid silent failures.

-    flb_msgpack_to_json(&final_json[0], sizeof(final_json), &result_final.data, FLB_TRUE);
+    ret = flb_msgpack_to_json(&final_json[0], sizeof(final_json), &result_final.data, FLB_TRUE);
+    TEST_CHECK(ret > 0);
-    flb_msgpack_to_json(&final_json[0], sizeof(final_json), &result_final.data, FLB_TRUE);
+    ret = flb_msgpack_to_json(&final_json[0], sizeof(final_json), &result_final.data, FLB_TRUE);
+    TEST_CHECK(ret > 0);
-    flb_msgpack_to_json(&final_json[0], sizeof(final_json), &result_final.data, FLB_TRUE);
+    ret = flb_msgpack_to_json(&final_json[0], sizeof(final_json), &result_final.data, FLB_TRUE);
+    TEST_CHECK(ret > 0);

Also applies to: 313-314, 415-416


522-524: Consider adding a test variant with escape disabled.

Add a subtest using FLB_FALSE to validate raw UTF-8 emission path alongside the escaped path.

tests/internal/opentelemetry.c (1)

191-192: Broaden coverage: test with unicode escaping off.

These expectations exercise FLB_TRUE only. Add a parameterized run (FLB_FALSE) for inputs containing non-ASCII to ensure parity and surface differences intentionally introduced by the option.

Also applies to: 212-214, 311-314, 572-572, 606-606, 611-611, 616-616, 621-621

plugins/out_influxdb/influxdb.c (1)

245-248: Minor hygiene: null out freed temp.

Set str to NULL after flb_free to avoid accidental reuse in future edits.

-                flb_free(str);
-                str_size = 0;
+                flb_free(str);
+                str = NULL;
+                str_size = 0;
src/flb_help.c (1)

828-829: Use config->json_escape_unicode instead of hardcoded FLB_TRUE.

Make help schema JSON respect the global setting for consistency with other endpoints.

-    json = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size, FLB_TRUE);
+    json = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size, config->json_escape_unicode);
src/http_server/api/v2/reload.c (1)

116-118: Honor global json_escape_unicode in HTTP responses.

Avoid hardcoded FLB_TRUE to keep behavior consistent with the configured setting.

-    out_buf = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size, FLB_TRUE);
+    out_buf = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size, config->json_escape_unicode);
-    out_buf = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size, FLB_TRUE);
+    out_buf = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size, config->json_escape_unicode);

Also applies to: 151-152

src/http_server/api/v1/storage.c (1)

107-110: Consider propagating json_escape_unicode to storage metrics JSON.

This path always escapes. If feasible, thread config->json_escape_unicode into this callback (e.g., via hs context or by delaying JSON conversion to response time) for consistency.

src/http_server/api/v1/plugins.c (1)

93-93: Use config->json_escape_unicode for consistency with global setting.

This endpoint currently forces escaping. Since you already have config, consider honoring the global flag.

Apply this diff:

-    out_buf = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size, FLB_TRUE);
+    out_buf = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size, config->json_escape_unicode);
src/http_server/api/v1/uptime.c (1)

91-91: Consider honoring json_escape_unicode instead of hard-coding.

Uptime payload is ASCII today, but for consistency with other endpoints, consider using the global flag.

Apply this diff:

-    out_buf = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size, FLB_TRUE);
+    out_buf = flb_msgpack_raw_to_json_sds(mp_sbuf.data, mp_sbuf.size, config->json_escape_unicode);
tests/internal/utils.c (4)

217-231: Pass-through of escape_unicode is correct; add minor robustness checks.

The new parameter is plumbed correctly into flb_utils_write_str. To harden the test helper:

  • Guard against allocation failure.
  • Also assert off equals strlen(buf) to catch accidental NULs or length mismatches.

Apply:

 static void write_str_test_cases_w_buf_size(struct write_str_case *cases, int buf_size,
                                             int escape_unicode)
 {
     char *buf = flb_calloc(buf_size + 1, sizeof(char));
     int size = buf_size + 1;
     int off;
     int ret;
+
+    /* Bail out before the buffer is used if the allocation failed */
+    if (!TEST_CHECK(buf != NULL)) {
+        return;
+    }
@@
         ret = flb_utils_write_str(buf, &off, buf_size, tcase->input, tcase->input_len,
                                   escape_unicode);
@@
         if (!TEST_CHECK(strlen(buf) == strlen(tcase->output))) {
@@
         }
+        if (!TEST_CHECK((size_t) off == strlen(buf))) {
+            TEST_MSG("off (%d) != strlen(buf) (%zu)", off, strlen(buf));
+        }

398-436: Raw UTF‑8 happy-path coverage is solid; add a couple of JSON-escape cases.

Since raw mode must still escape JSON control/meta chars, consider adding newline and backslash cases.

Apply:

         /* Test string with a quote */
         {
             "\"hello\"", 7,
             "\\\"hello\\\"",
             FLB_TRUE
         },
+        /* Newline must be escaped even in raw mode */
+        {
+            "line 1\nline 2", 13,
+            "line 1\\nline 2",
+            FLB_TRUE
+        },
+        /* Backslash must be escaped */
+        {
+            "a\\b", 3,
+            "a\\\\b",
+            FLB_TRUE
+        },

438-503: Minor: duplicated test case; optional extra NUL-char case.

  • Case 2 and Case 6 are identical inputs/expectations; drop one to avoid redundant coverage.
  • Optional: add a raw-mode case containing U+0000 to assert it’s escaped as \u0000 per JSON.

Apply:

@@
-        /*
-         * Case 6: Sequence with invalid continuation byte
-         */
-        {
-            /* Input: "a" + 日(E6 97 A5) + ASCII "b" */
-            "a\xE6\x97""b", 4,
-            "a""\xEF\xBF\xBD""\xEF\xBF\xBD""b",
-            FLB_TRUE
-        },
+        /* (removed duplicate of Case 2) */
@@
         /* End of cases */
         { 0 }
     };

And (optional) append before the terminator:

+        /*
+         * Case X: NUL must be escaped in JSON
+         */
+        {
+            "a\0b", 3,
+            "a\\u0000b",
+            FLB_TRUE
+        },

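Please confirm that flb_utils_write_str_raw replaces each byte of an incomplete sequence with its own U+FFFD. For the truncated E6 97 input, that means two replacement characters (six bytes), which is what your expectations assume. The alternative is a single U+FFFD for the whole invalid span, which is what the Unicode Standard's recommended "maximal subpart" substitution would produce here. Adjust the expected outputs if the implementation differs.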


927-929: Test registration OK; consider aligning names.

Display name "write_raw_str_invalid_bytes" maps to test_write_raw_str_invalid_sequences. For consistency, either rename the test function or the display name.

plugins/processor_content_modifier/cm_utils.c (1)

153-155: Honor config->json_escape_unicode: add opt-in helper and keep BC wrapper.

Hard-coding FLB_TRUE here diverges from config->json_escape_unicode and can produce double-escaped sequences when JSON is re-serialized. Repository search shows no other callers of cm_utils_variant_convert_to_json, so adding an opt-in helper is safe.

Apply this refactor (keep BC wrapper):

+cfl_sds_t cm_utils_variant_convert_to_json_with_opts(struct cfl_variant *value, int escape_unicode)
+{
+    cfl_sds_t      json_result;
+    mpack_writer_t writer;
+    char          *data = NULL;
+    size_t         size = 0;
+
+    mpack_writer_init_growable(&writer, &data, &size);
+    pack_cfl_variant(&writer, value);
+    mpack_writer_destroy(&writer);
+
+    json_result = flb_msgpack_raw_to_json_sds(data, size, escape_unicode);
+    MPACK_FREE(data);
+    return json_result;
+}
 cfl_sds_t cm_utils_variant_convert_to_json(struct cfl_variant *value)
 {
-    cfl_sds_t      json_result;
-    mpack_writer_t writer;
-    char          *data;
-    size_t         size;
-
-    data = NULL;
-    size = 0;
-
-    mpack_writer_init_growable(&writer, &data, &size);
-    pack_cfl_variant(&writer, value);
-    mpack_writer_destroy(&writer);
-
-    /* Using JSON escape here to keep backward compatibility */
-    json_result = flb_msgpack_raw_to_json_sds(data, size, FLB_TRUE);
-    MPACK_FREE(data);
-    return json_result;
+    /* Keep BC: default to escaping unless callers opt in to config-driven behavior */
+    return cm_utils_variant_convert_to_json_with_opts(value, FLB_TRUE);
 }

Example call-site when config is available:

out = cm_utils_variant_convert_to_json_with_opts(val, config->json_escape_unicode);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3cf2ad4 and 4510a2a.

📒 Files selected for processing (77)
  • .github/workflows/unit-tests.yaml (3 hunks)
  • include/fluent-bit/flb_config.h (2 hunks)
  • include/fluent-bit/flb_pack.h (1 hunks)
  • include/fluent-bit/flb_utils.h (1 hunks)
  • plugins/filter_expect/expect.c (10 hunks)
  • plugins/filter_nightfall/nightfall_api.c (1 hunks)
  • plugins/filter_wasm/filter_wasm.c (1 hunks)
  • plugins/out_azure/azure.c (3 hunks)
  • plugins/out_azure_blob/azure_blob.c (1 hunks)
  • plugins/out_azure_kusto/azure_kusto.c (4 hunks)
  • plugins/out_azure_logs_ingestion/azure_logs_ingestion.c (3 hunks)
  • plugins/out_bigquery/bigquery.c (3 hunks)
  • plugins/out_calyptia/calyptia.c (3 hunks)
  • plugins/out_chronicle/chronicle.c (7 hunks)
  • plugins/out_cloudwatch_logs/cloudwatch_api.c (11 hunks)
  • plugins/out_cloudwatch_logs/cloudwatch_api.h (1 hunks)
  • plugins/out_cloudwatch_logs/cloudwatch_logs.c (1 hunks)
  • plugins/out_datadog/datadog.c (1 hunks)
  • plugins/out_es/es.c (1 hunks)
  • plugins/out_file/file.c (3 hunks)
  • plugins/out_http/http.c (4 hunks)
  • plugins/out_influxdb/influxdb.c (1 hunks)
  • plugins/out_kafka/kafka.c (1 hunks)
  • plugins/out_kafka_rest/kafka.c (3 hunks)
  • plugins/out_kinesis_firehose/firehose.c (1 hunks)
  • plugins/out_kinesis_firehose/firehose_api.c (7 hunks)
  • plugins/out_kinesis_firehose/firehose_api.h (1 hunks)
  • plugins/out_kinesis_streams/kinesis.c (1 hunks)
  • plugins/out_kinesis_streams/kinesis_api.c (7 hunks)
  • plugins/out_kinesis_streams/kinesis_api.h (1 hunks)
  • plugins/out_lib/out_lib.c (2 hunks)
  • plugins/out_logdna/logdna.c (4 hunks)
  • plugins/out_loki/loki.c (11 hunks)
  • plugins/out_nats/nats.c (3 hunks)
  • plugins/out_nrlogs/newrelic.c (3 hunks)
  • plugins/out_null/null.c (1 hunks)
  • plugins/out_opensearch/opensearch.c (2 hunks)
  • plugins/out_oracle_log_analytics/oci_logan.c (1 hunks)
  • plugins/out_pgsql/pgsql.c (1 hunks)
  • plugins/out_s3/s3.c (3 hunks)
  • plugins/out_skywalking/skywalking.c (5 hunks)
  • plugins/out_slack/slack.c (1 hunks)
  • plugins/out_splunk/splunk.c (5 hunks)
  • plugins/out_stackdriver/stackdriver.c (4 hunks)
  • plugins/out_stdout/stdout.c (1 hunks)
  • plugins/out_tcp/tcp.c (4 hunks)
  • plugins/out_udp/udp.c (3 hunks)
  • plugins/out_vivo_exporter/vivo.c (5 hunks)
  • plugins/out_websocket/websocket.c (1 hunks)
  • plugins/processor_content_modifier/cm_utils.c (1 hunks)
  • src/flb_config.c (3 hunks)
  • src/flb_help.c (1 hunks)
  • src/flb_pack.c (13 hunks)
  • src/flb_record_accessor.c (1 hunks)
  • src/flb_sds.c (1 hunks)
  • src/flb_utils.c (3 hunks)
  • src/fluent-bit.c (1 hunks)
  • src/http_server/api/v1/metrics.c (1 hunks)
  • src/http_server/api/v1/plugins.c (1 hunks)
  • src/http_server/api/v1/storage.c (1 hunks)
  • src/http_server/api/v1/trace.c (2 hunks)
  • src/http_server/api/v1/uptime.c (1 hunks)
  • src/http_server/api/v2/reload.c (2 hunks)
  • src/http_server/flb_hs_endpoints.c (1 hunks)
  • tests/internal/fuzzers/flb_json_fuzzer.c (3 hunks)
  • tests/internal/fuzzers/msgpack_parse_fuzzer.c (1 hunks)
  • tests/internal/fuzzers/utils_fuzzer.c (1 hunks)
  • tests/internal/log_event_decoder.c (2 hunks)
  • tests/internal/mp.c (4 hunks)
  • tests/internal/msgpack_append_message.c (1 hunks)
  • tests/internal/opentelemetry.c (8 hunks)
  • tests/internal/pack.c (6 hunks)
  • tests/internal/stream_processor.c (1 hunks)
  • tests/internal/utils.c (6 hunks)
  • tests/runtime/filter_lua.c (5 hunks)
  • tests/runtime/in_opentelemetry.c (3 hunks)
  • tests/runtime/in_systemd.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (57)
  • src/http_server/api/v1/trace.c
  • tests/internal/fuzzers/flb_json_fuzzer.c
  • plugins/out_kafka/kafka.c
  • src/flb_sds.c
  • plugins/out_slack/slack.c
  • plugins/out_null/null.c
  • plugins/filter_wasm/filter_wasm.c
  • plugins/out_cloudwatch_logs/cloudwatch_logs.c
  • tests/internal/stream_processor.c
  • plugins/out_cloudwatch_logs/cloudwatch_api.h
  • plugins/out_opensearch/opensearch.c
  • plugins/out_azure/azure.c
  • include/fluent-bit/flb_config.h
  • .github/workflows/unit-tests.yaml
  • plugins/out_file/file.c
  • plugins/out_pgsql/pgsql.c
  • tests/internal/fuzzers/utils_fuzzer.c
  • plugins/out_kafka_rest/kafka.c
  • plugins/out_kinesis_streams/kinesis.c
  • plugins/out_skywalking/skywalking.c
  • include/fluent-bit/flb_pack.h
  • plugins/out_stackdriver/stackdriver.c
  • plugins/out_udp/udp.c
  • plugins/out_kinesis_streams/kinesis_api.h
  • plugins/out_calyptia/calyptia.c
  • plugins/out_kinesis_firehose/firehose.c
  • plugins/out_oracle_log_analytics/oci_logan.c
  • src/flb_record_accessor.c
  • tests/runtime/in_opentelemetry.c
  • plugins/out_nats/nats.c
  • plugins/out_http/http.c
  • plugins/out_kinesis_firehose/firehose_api.c
  • tests/runtime/filter_lua.c
  • tests/internal/pack.c
  • plugins/out_azure_kusto/azure_kusto.c
  • plugins/filter_expect/expect.c
  • plugins/out_logdna/logdna.c
  • plugins/out_es/es.c
  • plugins/out_azure_logs_ingestion/azure_logs_ingestion.c
  • plugins/out_vivo_exporter/vivo.c
  • plugins/filter_nightfall/nightfall_api.c
  • plugins/out_datadog/datadog.c
  • plugins/out_splunk/splunk.c
  • plugins/out_azure_blob/azure_blob.c
  • tests/internal/log_event_decoder.c
  • plugins/out_cloudwatch_logs/cloudwatch_api.c
  • tests/internal/msgpack_append_message.c
  • plugins/out_tcp/tcp.c
  • plugins/out_lib/out_lib.c
  • src/http_server/flb_hs_endpoints.c
  • tests/internal/fuzzers/msgpack_parse_fuzzer.c
  • plugins/out_kinesis_streams/kinesis_api.c
  • plugins/out_bigquery/bigquery.c
  • src/http_server/api/v1/metrics.c
  • plugins/out_chronicle/chronicle.c
  • tests/runtime/in_systemd.c
  • src/flb_config.c
🧰 Additional context used
🧬 Code graph analysis (15)
src/flb_help.c (1)
src/flb_pack.c (1)
  • flb_msgpack_raw_to_json_sds (817-876)
src/http_server/api/v2/reload.c (1)
src/flb_pack.c (1)
  • flb_msgpack_raw_to_json_sds (817-876)
plugins/processor_content_modifier/cm_utils.c (1)
src/flb_pack.c (1)
  • flb_msgpack_raw_to_json_sds (817-876)
plugins/out_nrlogs/newrelic.c (1)
src/flb_pack.c (1)
  • flb_msgpack_raw_to_json_sds (817-876)
src/http_server/api/v1/storage.c (1)
src/flb_pack.c (1)
  • flb_msgpack_raw_to_json_sds (817-876)
src/fluent-bit.c (1)
src/flb_pack.c (1)
  • flb_msgpack_raw_to_json_sds (817-876)
tests/internal/mp.c (1)
src/flb_pack.c (2)
  • flb_msgpack_to_json (802-815)
  • flb_msgpack_raw_to_json_sds (817-876)
src/http_server/api/v1/plugins.c (1)
src/flb_pack.c (1)
  • flb_msgpack_raw_to_json_sds (817-876)
src/flb_utils.c (1)
include/fluent-bit/flb_simd.h (4)
  • flb_vector8_load (166-177)
  • flb_vector8_has_le (301-338)
  • flb_vector8_has (288-299)
  • flb_vector8_is_highbit_set (270-283)
tests/internal/utils.c (1)
src/flb_utils.c (1)
  • flb_utils_write_str (1278-1287)
plugins/out_loki/loki.c (1)
src/flb_pack.c (2)
  • flb_msgpack_to_json_str (1250-1291)
  • flb_msgpack_raw_to_json_sds (817-876)
plugins/out_s3/s3.c (1)
src/flb_pack.c (1)
  • flb_pack_msgpack_to_json_format (960-1241)
tests/internal/opentelemetry.c (2)
src/flb_pack.c (1)
  • flb_msgpack_to_json_str (1250-1291)
src/opentelemetry/flb_opentelemetry_utils.c (1)
  • flb_otel_utils_find_map_entry_by_key (28-73)
include/fluent-bit/flb_utils.h (1)
src/flb_utils.c (1)
  • flb_utils_write_str_buf (1289-1327)
src/http_server/api/v1/uptime.c (1)
src/flb_pack.c (1)
  • flb_msgpack_raw_to_json_sds (817-876)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (79)
  • GitHub Check: PR - container builds / Windows container images (2025)
  • GitHub Check: PR - container builds / Windows container images (2022)
  • GitHub Check: PR - packages build Linux / debian/trixie package build and stage to S3
  • GitHub Check: PR - packages build Linux / debian/bookworm.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / almalinux/8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / raspbian/bookworm package build and stage to S3
  • GitHub Check: PR - packages build Linux / debian/trixie.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / debian/buster package build and stage to S3
  • GitHub Check: PR - packages build Linux / debian/bullseye package build and stage to S3
  • GitHub Check: PR - packages build Linux / debian/bullseye.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / ubuntu/22.04 package build and stage to S3
  • GitHub Check: PR - packages build Linux / ubuntu/24.04 package build and stage to S3
  • GitHub Check: PR - packages build Linux / ubuntu/24.04.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / debian/buster.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / ubuntu/22.04.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / almalinux/9 package build and stage to S3
  • GitHub Check: PR - packages build Linux / debian/bookworm package build and stage to S3
  • GitHub Check: PR - packages build Linux / almalinux/9.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / rockylinux/10 package build and stage to S3
  • GitHub Check: PR - packages build Linux / almalinux/10.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / almalinux/8.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / almalinux/10 package build and stage to S3
  • GitHub Check: PR - packages build Linux / rockylinux/10.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / rockylinux/8.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / centos/10.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / rockylinux/9 package build and stage to S3
  • GitHub Check: PR - packages build Linux / centos/9.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / rockylinux/8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / centos/9 package build and stage to S3
  • GitHub Check: PR - packages build Linux / rockylinux/9.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / centos/8.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / centos/7 package build and stage to S3
  • GitHub Check: PR - packages build Linux / centos/8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / centos/10 package build and stage to S3
  • GitHub Check: PR - packages build Linux / amazonlinux/2023 package build and stage to S3
  • GitHub Check: PR - packages build Linux / amazonlinux/2023.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / amazonlinux/2.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / centos/7.arm64v8 package build and stage to S3
  • GitHub Check: PR - packages build Linux / amazonlinux/2 package build and stage to S3
  • GitHub Check: PR - container builds / arm/v7/production container image build
  • GitHub Check: PR - container builds / arm64/production container image build
  • GitHub Check: PR - container builds / arm/v7/debug container image build
  • GitHub Check: PR - container builds / arm64/debug container image build
  • GitHub Check: PR - container builds / amd64/production container image build
  • GitHub Check: PR - container builds / amd64/debug container image build
  • GitHub Check: PR - packages build Windows / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: PR - packages build Windows / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: PR - packages build Windows / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: PR - packages build MacOS / call-build-macos-package (Apple Silicon macOS runner, macos-14, 3.31.6)
  • GitHub Check: PR - packages build MacOS / call-build-macos-package (Intel macOS runner, macos-14-large, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-centos-7
🔇 Additional comments (24)
plugins/out_nrlogs/newrelic.c (1)

373-375: LGTM: config is correctly plumbed to payload builder.

Pass-through aligns with the global toggle for escaping. Please confirm end-to-end that New Relic accepts raw UTF‑8 when json_escape_unicode is Off.

plugins/out_loki/loki.c (7)

568-571: Config plumb-through looks correct

Forwarding config into pack_maps is consistent.


1643-1645: Config propagation through loki_compose_payload: LGTM

Signature change and usage align with the new API.


1735-1738: Nested calls now receive config: LGTM


1773-1775: Else-path nested calls receive config: LGTM


1781-1783: Final JSON generation honors escape flag: LGTM


1866-1868: cb_loki_flush passes config correctly: LGTM


2141-2145: structured_metadata_map_keys entry: LGTM

No functional concerns.

src/flb_pack.c (2)

630-691: Propagating escape_unicode through STR/BIN is correct; verify BIN with non‑UTF‑8 stays valid JSON.

If MSGPACK_OBJECT_BIN contains non‑UTF‑8 bytes and escape_unicode == FLB_FALSE, ensure flb_utils_write_str still produces valid JSON (e.g., escapes invalid sequences). Add a focused test for this case.

Also applies to: 692-700
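For the focused test suggested above, the key assertion is that raw-mode output remains well-formed UTF-8. Below is a minimal, self-contained checker, offered as a sketch. utf8_is_valid is a hypothetical test helper and not part of Fluent Bit. It follows the well-formed byte ranges of RFC 3629: no overlong forms, no surrogates, and nothing above U+10FFFF.

```c
#include <stddef.h>

/* Hypothetical test helper: returns 1 if s[0..len) is well-formed UTF-8
 * per RFC 3629 (no overlong forms, no surrogates, nothing above
 * U+10FFFF), otherwise 0. */
int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of byte 2 */
        size_t n, k;

        if (c < 0x80) {
            i++;
            continue;
        }
        if (c >= 0xC2 && c <= 0xDF) {
            n = 2;
        }
        else if (c >= 0xE0 && c <= 0xEF) {
            n = 3;
            if (c == 0xE0) lo = 0xA0;   /* reject overlong 3-byte forms */
            if (c == 0xED) hi = 0x9F;   /* reject UTF-16 surrogates */
        }
        else if (c >= 0xF0 && c <= 0xF4) {
            n = 4;
            if (c == 0xF0) lo = 0x90;   /* reject overlong 4-byte forms */
            if (c == 0xF4) hi = 0x8F;   /* reject code points > U+10FFFF */
        }
        else {
            return 0;                   /* stray continuation or bad lead byte */
        }

        /* truncated sequence, or second byte outside its allowed range */
        if (len - i < n || s[i + 1] < lo || s[i + 1] > hi) {
            return 0;
        }
        for (k = 2; k < n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) {
                return 0;
            }
        }
        i += n;
    }
    return 1;
}
```

A test could pack a MSGPACK_OBJECT_BIN that contains a stray 0xFF byte, convert it with flb_msgpack_raw_to_json_sds(..., FLB_FALSE), and then assert utf8_is_valid on the resulting buffer.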


817-876: LGTM. The new parameter is threaded consistently end to end. An ast-grep scan found no remaining calls that use the old arity of flb_msgpack_to_json, flb_msgpack_raw_to_json_sds, flb_pack_msgpack_to_json_format, or flb_msgpack_to_json_str. Approving; external or generated code and out-of-tree plugins still need a manual check.

plugins/out_influxdb/influxdb.c (1)

216-222: Correct: honor json_escape_unicode for string fields.

Passing config->json_escape_unicode to flb_utils_write_str_buf aligns the plugin with the global option.

plugins/out_websocket/websocket.c (1)

253-259: LGTM: websocket output respects global unicode escaping.

Passing config->json_escape_unicode to flb_pack_msgpack_to_json_format is correct and consistent with other outputs.

src/fluent-bit.c (1)

379-381: Explicitly forcing JSON escaping here is fine; just confirming intent.

The inline comment says “Keep backward compatibility,” and passing FLB_TRUE hard-codes escaping regardless of config. If that’s the intended UX for CLI help output, LGTM; otherwise consider threading config->json_escape_unicode for consistency with the rest of the PR.

plugins/out_stdout/stdout.c (1)

260-262: LGTM: stdout JSON respects json_escape_unicode.

Plumbing the config flag into flb_pack_msgpack_to_json_format looks correct.

plugins/out_s3/s3.c (3)

3349-3351: Signature change LGTM.

Adding struct flb_config *config to propagate json_escape_unicode is correct.


3756-3760: LGTM: caller updated to pass config to extract_log_key.

Correctly threads the new parameter.


3766-3768: LGTM: S3 JSON formatting respects json_escape_unicode.

Propagating the global flag here matches the PR objective.

src/flb_utils.c (3)

1161-1273: LGTM: raw UTF‑8 writer path.

Valid UTF‑8 is copied verbatim; essential ASCII escapes preserved; invalid/truncated uses U+FFFD. Matches the feature goal.
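The following standalone sketch makes the reviewed behavior concrete. It implements a raw-mode JSON string writer under the rules summarized above:

- Well-formed multibyte UTF-8 is copied verbatim.
- Double quote, backslash, and control characters are still escaped.
- Each byte of an invalid or truncated sequence becomes U+FFFD. This is the per-byte policy the unit-test expectations assume.

json_write_str_raw and utf8_seq_len are hypothetical names. This is an illustration, not the code of flb_utils_write_str_raw.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Length of the well-formed UTF-8 sequence at s (RFC 3629 ranges),
 * or 0 if the lead byte is invalid or the sequence is truncated. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    unsigned char c = s[0];
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range of byte 2 */
    size_t n, k;

    if (c < 0x80) return 1;
    if (c >= 0xC2 && c <= 0xDF) {
        n = 2;
    }
    else if (c >= 0xE0 && c <= 0xEF) {
        n = 3;
        if (c == 0xE0) lo = 0xA0;       /* no overlong forms */
        if (c == 0xED) hi = 0x9F;       /* no UTF-16 surrogates */
    }
    else if (c >= 0xF0 && c <= 0xF4) {
        n = 4;
        if (c == 0xF0) lo = 0x90;       /* no overlong forms */
        if (c == 0xF4) hi = 0x8F;       /* nothing above U+10FFFF */
    }
    else {
        return 0;                       /* 0x80-0xC1 and 0xF5-0xFF */
    }
    if (avail < n || s[1] < lo || s[1] > hi) return 0;
    for (k = 2; k < n; k++) {
        if ((s[k] & 0xC0) != 0x80) return 0;
    }
    return n;
}

/* Hypothetical raw-mode writer: appends str as JSON string content
 * (no surrounding quotes) at buf + *off. Returns 1 on success, 0 if
 * buf (capacity size) is too small; *off only advances on success. */
int json_write_str_raw(char *buf, size_t *off, size_t size,
                       const char *str, size_t len)
{
    const unsigned char *s = (const unsigned char *) str;
    size_t i = 0;
    size_t o = *off;

    while (i < len) {
        char tmp[8];
        const char *out = tmp;
        size_t out_len = 0;
        unsigned char c = s[i];
        size_t n = utf8_seq_len(s + i, len - i);

        if (n == 0) {                   /* invalid byte: U+FFFD, skip 1 */
            out = "\xEF\xBF\xBD";
            out_len = 3;
            i += 1;
        }
        else if (n > 1) {               /* valid multibyte: copy verbatim */
            out = str + i;
            out_len = n;
            i += n;
        }
        else {                          /* ASCII: JSON escapes still apply */
            i += 1;
            switch (c) {
            case '"':  out = "\\\""; out_len = 2; break;
            case '\\': out = "\\\\"; out_len = 2; break;
            case '\n': out = "\\n";  out_len = 2; break;
            case '\r': out = "\\r";  out_len = 2; break;
            case '\t': out = "\\t";  out_len = 2; break;
            case '\b': out = "\\b";  out_len = 2; break;
            case '\f': out = "\\f";  out_len = 2; break;
            default:
                if (c < 0x20) {         /* other control chars, incl. NUL */
                    snprintf(tmp, sizeof(tmp), "\\u%04x", (unsigned int) c);
                    out_len = 6;
                }
                else {
                    tmp[0] = (char) c;
                    out_len = 1;
                }
            }
        }
        if (o + out_len > size) return 0;
        memcpy(buf + o, out, out_len);
        o += out_len;
    }
    *off = o;
    return 1;
}
```

When the buffer is too small, the writer returns 0 rather than committing a partial result. This mirrors the FLB_TRUE/FLB_FALSE convention the reviewed writers use, so a caller can grow the buffer and retry.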


1278-1287: Wrapper selection by flag looks good.

Clean split between escaped/raw paths via escape_unicode.


1289-1310: Buffer growth loop remains correct with new flag.

Delegates to the 6‑arg writer and retries on FLB_FALSE.
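The grow-and-retry contract can be sketched in isolation. In the sketch below, try_write is a hypothetical stand-in writer that only escapes double quotes and returns 0 when the buffer is too small. write_str_grow doubles the buffer until the write fits. Neither function is part of Fluent Bit's API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in writer: copies s, escaping '"' as \".
 * Returns 1 on success, 0 if buf (capacity size) is too small;
 * *off only advances on success. */
static int try_write(char *buf, size_t *off, size_t size,
                     const char *s, size_t len)
{
    size_t o = *off;
    size_t i;

    for (i = 0; i < len; i++) {
        size_t need = (s[i] == '"') ? 2 : 1;

        if (o + need > size) {
            return 0;
        }
        if (s[i] == '"') {
            buf[o++] = '\\';
        }
        buf[o++] = s[i];
    }
    *off = o;
    return 1;
}

/* Retries try_write, doubling *buf until the output fits.
 * Returns 1 on success, 0 if realloc fails (the old *buf stays valid
 * and is still owned by the caller). */
int write_str_grow(char **buf, size_t *cap, size_t *off,
                   const char *s, size_t len)
{
    while (!try_write(*buf, off, *cap, s, len)) {
        size_t new_cap = (*cap > 0) ? *cap * 2 : 16;
        char *tmp = realloc(*buf, new_cap);

        if (tmp == NULL) {
            return 0;
        }
        *buf = tmp;
        *cap = new_cap;
    }
    return 1;
}
```

Doubling keeps the number of retries logarithmic in the output size. Because the writer only advances *off on success, each retry starts from the same offset.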

include/fluent-bit/flb_utils.h (1)

60-63: Public API change — flb_utils_write_str now has an extra parameter (breaking ABI).

In-repo callers are already updated (e.g. include/fluent-bit/flb_utils.h, src/flb_utils.c, src/flb_sds.c:315, plugins/out_cloudwatch_logs/cloudwatch_api.c:709, tests/internal/utils.c). This is still a breaking change for out-of-tree plugins linking against the shared library.

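Either bump the SONAME and call out the change in the release notes, or keep the old 5-argument flb_utils_write_str exported as a compatibility wrapper. That wrapper would forward to the new function with a sensible default for escape_unicode; FLB_TRUE reproduces the previous, always-escaping behavior. Because C has no function overloading, the new 6-argument variant would then need a different name.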

tests/internal/utils.c (2)

205-210: Good harness split: escaped vs raw paths are explicit.

Adding escape_unicode to the helper and introducing write_raw_str_test_cases keeps existing tests intact and makes raw-mode coverage straightforward.

Also applies to: 212-214


271-273: Call sites updated with the new flag look correct.

All direct flb_utils_write_str calls in this test use FLB_TRUE (escaped mode), preserving prior behavior.

Also applies to: 277-279, 282-284, 288-296

plugins/processor_content_modifier/cm_utils.c (1)

153-155: API update is correct; preserves prior behavior.

Passing FLB_TRUE to the new flb_msgpack_raw_to_json_sds(..., escape_unicode) matches the former default and keeps backward compatibility.

Comment on lines +561 to 563
msgpack_object *map,
struct flb_config *config)
{
🛠️ Refactor suggestion

Fix wrong return type of pack_structured_metadata

Function returns 0 but declares flb_sds_t. Make it int for clarity and to prevent misuse.

-static flb_sds_t pack_structured_metadata(struct flb_loki *ctx,
+static int pack_structured_metadata(struct flb_loki *ctx,
                                           msgpack_packer *mp_pck,
                                           char *tag, int tag_len,
                                           msgpack_object *map,
                                           struct flb_config *config)

Note: pack_labels has the same pattern; consider making it int as well.

🤖 Prompt for AI Agents
In plugins/out_loki/loki.c around lines 561-563, the function
pack_structured_metadata is declared to return flb_sds_t but currently returns
integer constants (0), which is incorrect; change its return type to int, update
any return statements to return appropriate status codes (e.g., 0 on success, -1
on error), and update the function's callers if they expect a string to instead
check the integer status; also apply the same change to pack_labels (which
follows the same pattern) so both functions consistently return int status
codes.

Comment on lines 3456 to 3459
 ret = flb_msgpack_to_json(val_buf + val_offset,
-                          msgpack_size - val_offset, &val);
+                          msgpack_size - val_offset, &val,
+                          config->json_escape_unicode);
 if (ret < 0) {
⚠️ Potential issue

Handle buffer growth when JSON serialization needs more space.

flb_msgpack_to_json returns 0 when the output buffer is too small; a negative value signals invalid arguments. The current code treats only ret < 0 as an error and never retries with a larger buffer, so an undersized buffer can truncate the output.

Apply this diff:

-                        ret = flb_msgpack_to_json(val_buf + val_offset,
-                                                  msgpack_size - val_offset, &val,
-                                                  config->json_escape_unicode);
-                        if (ret < 0) {
-                            break;
-                        }
+                        while (1) {
+                            ret = flb_msgpack_to_json(val_buf + val_offset,
+                                                      msgpack_size - val_offset, &val,
+                                                      config->json_escape_unicode);
+                            if (ret != 0) {
+                                /* > 0: success; < 0: invalid arguments, retrying cannot help */
+                                break;
+                            }
+                            /* ret == 0: buffer too small, grow it and retry */
+                            size_t new_size = msgpack_size * 2;
+                            char *tmp_buf = flb_realloc(val_buf, new_size);
+                            if (!tmp_buf) {
+                                flb_errno();
+                                ret = -1;
+                                break;
+                            }
+                            val_buf = tmp_buf;
+                            msgpack_size = new_size;
+                        }
+                        if (ret < 0) {
+                            break;
+                        }

Labels

docs-required ok-package-test Run PR packaging tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fluent Bit fails to ingest logs containing Chinese characters

5 participants