filter_log_to_metrics: Use optimized memory allocations #11414

Open
cosmo0920 wants to merge 1 commit into master from cosmo0920-use-optimized-memory-allocations-on-log_to_metrics

@cosmo0920 cosmo0920 commented Jan 30, 2026

Currently, filter_log_to_metrics allocates heap memory frequently.
This causes memory fragmentation, and allocations take longer the longer the process has been running.
Instead, we should optimize these heap allocations and avoid CPU stalls spent waiting on memory operations.

Before

Samples: 6K of event 'cpu_core/cycles/P', Event count (approx.): 77560880457, Thread: flb-pipeline
  Children      Self  Command       Shared Object         Symbol
<snip>
+   78.44%     0.06%  flb-pipeline  fluent-bit            [.] cb_log_to_metrics_filter
+   73.09%     0.34%  flb-pipeline  fluent-bit            [.] fill_labels
+   50.95%     0.48%  flb-pipeline  fluent-bit            [.] flb_ra_create
+   30.15%     0.26%  flb-pipeline  fluent-bit            [.] flb_env_create
+   16.83%     0.14%  flb-pipeline  fluent-bit            [.] flb_ra_get_value_object
+   16.40%     0.19%  flb-pipeline  fluent-bit            [.] flb_ra_key_to_value_ext
<snip>

After

Samples: 3K of event 'cpu_core/cycles/P', Event count (approx.): 14971900615
  Children      Self  Command       Shared Object         Symbol
<snip>
+   65.17%     0.84%  flb-pipeline  fluent-bit            [.] cb_log_to_metrics_filter
+   49.61%     0.53%  flb-pipeline  fluent-bit            [.] flb_ra_get_value_object
+   48.75%     1.61%  flb-pipeline  fluent-bit            [.] flb_ra_key_to_value_ext
<snip>

The call stack is simpler, and the main difference is:

+   78.44%     0.06%  flb-pipeline  fluent-bit            [.] cb_log_to_metrics_filter
+   73.09%     0.34%  flb-pipeline  fluent-bit            [.] fill_labels

vs

+   65.17%     0.84%  flb-pipeline  fluent-bit            [.] cb_log_to_metrics_filter

So, this change gives us an optimized version of the filter_log_to_metrics plugin that avoids fragmenting the heap.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

Even with a large number of label_fields, Valgrind reports no memory leaks:

==1363657== 
==1363657== HEAP SUMMARY:
==1363657==     in use at exit: 0 bytes in 0 blocks
==1363657==   total heap usage: 212,195 allocs, 212,195 frees, 95,802,946 bytes allocated
==1363657== 
==1363657== All heap blocks were freed -- no leaks are possible
==1363657== 
==1363657== For lists of detected and suppressed errors, rerun with: -s
==1363657== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Increased label capacity (32 → 128).
    • Two-stage label setup with pre-allocated runtime buffers and pre-created label accessors (Kubernetes-backed accessors prioritized) to reduce per-call allocations.
    • Automatic emitter aliasing when no explicit emitter name is provided.
  • Bug Fixes

    • Stronger label validation to prevent overflow and mismatches.
    • Improved cleanup of runtime label buffers and accessors to avoid memory leaks; removed legacy per-call label pathway for more stable label handling.


coderabbitai bot commented Jan 30, 2026

Walkthrough

Pre-allocates and manages label runtime structures for log_to_metrics: adds label counting and preparation, creates/destroys pre-built record accessors and a contiguous label-values buffer, shifts label resolution to init-time accessors, consolidates emitter aliasing, and centralizes init/cleanup flows.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Header Structure Updates: `plugins/filter_log_to_metrics/log_to_metrics.h` | Added <stddef.h>, increased MAX_LABEL_COUNT (32→128), and extended struct log_to_metrics_ctx with label_ras, label_values_buf, and label_values. |
| Label runtime helpers & init/refactor: `plugins/filter_log_to_metrics/log_to_metrics.c` | Added count_labels(...) and prepare_label_runtime(...); updated set_labels() signature; introduced two-pass label setup (count → allocate/configure) and centralized initialization. |
| Pre-allocated runtime storage & accessors: `plugins/filter_log_to_metrics/log_to_metrics.c` | Replaced per-call allocations with pre-allocated label_values_buf and label_values; introduced persistent ctx->label_ras and pre-created record accessors (Kubernetes-backed accessors placed first). |
| Filter path adjustments: `plugins/filter_log_to_metrics/log_to_metrics.c` | cb_log_to_metrics_filter() and helpers now use ctx->label_ras and ctx->label_values to populate labels via pre-created accessors; removed per-call accessor creation and direct k8s probing. |
| Emitter aliasing & wiring: `plugins/filter_log_to_metrics/log_to_metrics.c` | Emitter alias resolution now prefers explicit emitter_name, derives alias from filter name when absent, and applies alias to emitter input configuration with safe temporary buffers. |
| Teardown & error-path cleanup: `plugins/filter_log_to_metrics/log_to_metrics.c` | Expanded log_to_metrics_destroy() and error paths to free/destroy label_ras, label_values, label_values_buf, and per-label keys/accessors to avoid leaks. |
| Removed legacy per-call logic: `plugins/filter_log_to_metrics/log_to_metrics.c` | Removed former per-call fill_labels and related per-call accessor construction; label population routed exclusively through pre-created runtime accessors. |
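
The two-pass setup summarized above (count the labels first, then allocate once) can be sketched in plain C. This is a minimal illustration, not the code from the PR: `struct label_runtime`, `label_runtime_init()`, and `label_runtime_destroy()` are hypothetical names, the `MAX_LABEL_LENGTH` value is assumed, and standard `calloc`/`free` stand in for the Fluent Bit allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_LABEL_LENGTH 253  /* value assumed for illustration only */

/* Hypothetical stand-in for the label fields added to the filter context. */
struct label_runtime {
    size_t count;      /* number of labels, known after the counting pass */
    char *values_buf;  /* one contiguous block of count * MAX_LABEL_LENGTH bytes */
    char **values;     /* values[i] points at slot i inside values_buf */
};

/* Init time: allocate once, after the labels have been counted. */
static int label_runtime_init(struct label_runtime *rt, size_t count)
{
    size_t i;

    assert(rt != NULL);
    rt->count = 0;
    rt->values_buf = NULL;
    rt->values = NULL;

    if (count == 0) {
        return 0;
    }

    rt->values_buf = calloc(count, MAX_LABEL_LENGTH);
    if (rt->values_buf == NULL) {
        return -1;
    }

    rt->values = calloc(count, sizeof(char *));
    if (rt->values == NULL) {
        free(rt->values_buf);
        rt->values_buf = NULL;
        return -1;
    }

    /* Carve the contiguous buffer into fixed-size slots. */
    for (i = 0; i < count; i++) {
        rt->values[i] = rt->values_buf + i * MAX_LABEL_LENGTH;
    }
    rt->count = count;
    return 0;
}

/* Teardown: two frees, regardless of how many records were processed. */
static void label_runtime_destroy(struct label_runtime *rt)
{
    free(rt->values);
    free(rt->values_buf);
    rt->values = NULL;
    rt->values_buf = NULL;
    rt->count = 0;
}
```

With this layout the per-record path only writes into `values[i]`; it performs no allocation, which is the property the perf comparison in the description is measuring.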

Sequence Diagram(s)

sequenceDiagram
    participant Init as Filter Init
    participant Count as count_labels()
    participant Prep as prepare_label_runtime()
    participant RA as RecordAccessors
    participant Emitter as EmitterSetup
    participant Filter as cb_log_to_metrics_filter

    Init->>Count: compute label_counter and k8s_count
    Count-->>Init: return counts (or error)
    Init->>Prep: allocate label_values_buf, label_values, label_ras
    Prep->>RA: create per-label record accessors (k8s first)
    RA-->>Prep: label_ras[] ready
    Prep-->>Init: runtime structures prepared
    Init->>Emitter: derive/apply emitter alias and configure emitter
    Note right of Filter: Runtime filter processing
    Filter->>RA: use pre-created label_ras to extract values into label_values
    RA-->>Filter: populated ctx->label_values[]
    Filter->>Emitter: emit metrics with populated labels

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

backport to v4.2.x

Suggested reviewers

  • edsiper
  • fujimotos

Poem

🐇 I counted labels, neat and quick,
I knitted buffers, soft and thick.
Accessors wait, all set in rows,
Metrics hop out where data flows.
Hooray — no leaks, just carrot glows! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 27.27% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: optimizing memory allocations in the filter_log_to_metrics plugin by moving from per-call allocations to pre-allocated runtime structures. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@plugins/filter_log_to_metrics/log_to_metrics.c`:
- Around line 810-821: The code assigns the result of flb_sds_printf directly to
emitter_alias_tmp which can return NULL and cause the original SDS to leak;
instead, call flb_sds_printf into a temporary pointer (e.g., tmp), check if tmp
is NULL, and if so call flb_sds_destroy(emitter_alias_tmp), flb_errno(),
log_to_metrics_destroy(ctx) and return -1; on success assign emitter_alias_tmp =
tmp. This uses the existing symbols emitter_alias_tmp, flb_sds_create_size,
flb_sds_printf, flb_sds_destroy and preserves current error handling via
log_to_metrics_destroy(ctx).
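
The temporary-pointer pattern described in this comment can be illustrated with standard C. In this sketch `realloc()` stands in for any call that returns NULL on failure while leaving the original buffer allocated; it is an analogy only and does not use the Fluent Bit SDS API. `append_suffix()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Grow a heap string and append a suffix.
 * On failure the original buffer is freed, *buf is set to NULL,
 * and -1 is returned, so the caller has nothing left to leak.
 */
static int append_suffix(char **buf, const char *suffix)
{
    size_t old_len;
    size_t add_len;
    char *tmp;

    assert(buf != NULL && *buf != NULL && suffix != NULL);
    old_len = strlen(*buf);
    add_len = strlen(suffix);

    /* Keep the result in a temporary so the original pointer is not lost. */
    tmp = realloc(*buf, old_len + add_len + 1);
    if (tmp == NULL) {
        free(*buf);  /* the original is still valid here, so release it */
        *buf = NULL;
        return -1;
    }

    memcpy(tmp + old_len, suffix, add_len + 1);  /* copies the terminator too */
    *buf = tmp;
    return 0;
}
```

Writing `*buf = realloc(*buf, ...)` directly would overwrite the only pointer to the original block when the call fails, which is the same shape of leak the comment describes.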

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
plugins/filter_log_to_metrics/log_to_metrics.c (2)

360-475: ⚠️ Potential issue | 🟡 Minor

Tighten the bounds guard before indexing label arrays.
If the computed total and fill pass ever diverge (e.g., config mutation or unexpected properties), the current > check can still allow one out-of-bounds write before the final mismatch check. Use >= to fail fast before indexing.

🛠️ Proposed fix
-        if (counter > ctx->label_counter) {
+        if (counter >= ctx->label_counter) {
             flb_plg_error(ctx->ins, "internal label counter overflow");
             return -1;
         }
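
To see why `>=` is the right comparison, here is a small self-contained sketch. `store_label()` is a hypothetical helper, not a function from the plugin; it only shows the guard in isolation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Store `key` at index `counter` of an array with `capacity` slots.
 * Returns the next counter value, or -1 if the array is already full.
 */
static int store_label(const char **keys, int capacity, int counter,
                       const char *key)
{
    assert(keys != NULL);

    /*
     * Valid indexes are 0..capacity-1, so counter == capacity must be
     * rejected. A `counter > capacity` check would let that case through
     * and write one slot past the end of the array.
     */
    if (counter < 0 || counter >= capacity) {
        return -1;
    }

    keys[counter] = key;
    return counter + 1;
}
```

Filling a two-slot array succeeds twice and fails on the third call, before any write happens.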

956-1079: ⚠️ Potential issue | 🟠 Major

Confirm thread-safety vulnerability in shared ctx->label_values buffer.

The label_values buffer is allocated once per filter instance and reused across all concurrent invocations from multiple input sources. Since Fluent Bit has input worker threads that independently process chunks and invoke filters (via flb_filter_do), multiple workers can simultaneously call cb_log_to_metrics_filter with the same ctx. The vulnerable window is between writing to label_values (lines 982–999) and passing it to cmt_counter_inc, cmt_gauge_set, or cmt_histogram_observe (lines 1009, 1030, 1051). A concurrent writer can corrupt label values mid-operation.

Locking exists on chunks and tasks but not on filter instances or their context. To fix: either allocate label_values per-invocation (stack or local scope), use thread-local storage, add a mutex around the vulnerable window, or buffer label values before the cmt call completes.
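
If the filter callback can indeed run concurrently against one shared context, as this comment suggests, the first option (per-invocation storage) could look roughly like the sketch below. `struct label_scratch`, `fill_scratch()`, and both size constants are hypothetical and chosen for illustration; they are not names from the plugin.

```c
#include <assert.h>
#include <stdio.h>

#define LABEL_COUNT  2   /* small fixed count, assumed for illustration */
#define LABEL_LENGTH 64  /* per-value capacity, assumed for illustration */

/* Per-invocation scratch space: lives on the caller's stack, never shared. */
struct label_scratch {
    char values[LABEL_COUNT][LABEL_LENGTH];
};

/* Write the label values for one record into caller-owned storage. */
static void fill_scratch(struct label_scratch *s, const char *pod, long code)
{
    assert(s != NULL && pod != NULL);
    snprintf(s->values[0], LABEL_LENGTH, "%s", pod);
    snprintf(s->values[1], LABEL_LENGTH, "%ld", code);
}
```

Each caller declares its own `struct label_scratch`, so two invocations cannot overwrite each other's values. The trade-off is stack usage: with up to 128 labels and long values, a per-call array may be large enough that thread-local storage or a mutex is the more practical choice.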

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
plugins/filter_log_to_metrics/log_to_metrics.c (2)

360-479: ⚠️ Potential issue | 🟠 Major

Fix error‑path leaks when label allocations fail.

set_labels returns on allocation failures without freeing partially allocated label_keys/label_accessors. If init aborts at this point, these allocations leak. Please add a shared cleanup path for partial allocations.

🛠️ Suggested cleanup on allocation failures
@@
-    ctx->label_accessors = flb_calloc(ctx->label_counter, sizeof(char *));
-    if (!ctx->label_accessors) {
-        flb_errno();
-        return -1;
-    }
+    ctx->label_accessors = flb_calloc(ctx->label_counter, sizeof(char *));
+    if (!ctx->label_accessors) {
+        flb_errno();
+        goto error;
+    }
@@
-            if (!ctx->label_keys[counter]) {
-                flb_errno();
-                return -1;
-            }
+            if (!ctx->label_keys[counter]) {
+                flb_errno();
+                goto error;
+            }
@@
-            if (!ctx->label_accessors[counter]) {
-                flb_errno();
-                return -1;
-            }
+            if (!ctx->label_accessors[counter]) {
+                flb_errno();
+                goto error;
+            }
@@
-            if (!ctx->label_keys[counter]) {
-                flb_errno();
-                flb_utils_split_free(split);
-                return -1;
-            }
+            if (!ctx->label_keys[counter]) {
+                flb_errno();
+                flb_utils_split_free(split);
+                goto error;
+            }
@@
-            if (!ctx->label_accessors[counter]) {
-                flb_errno();
-                flb_utils_split_free(split);
-                return -1;
-            }
+            if (!ctx->label_accessors[counter]) {
+                flb_errno();
+                flb_utils_split_free(split);
+                goto error;
+            }
@@
-    return ctx->label_counter;
+    return ctx->label_counter;
+
+error:
+    if (ctx->label_keys) {
+        for (i = 0; i < ctx->label_counter; i++) {
+            flb_free(ctx->label_keys[i]);
+        }
+        flb_free(ctx->label_keys);
+        ctx->label_keys = NULL;
+    }
+    if (ctx->label_accessors) {
+        for (i = 0; i < ctx->label_counter; i++) {
+            flb_free(ctx->label_accessors[i]);
+        }
+        flb_free(ctx->label_accessors);
+        ctx->label_accessors = NULL;
+    }
+    return -1;

940-996: ⚠️ Potential issue | 🟡 Minor

Use PRId64 for portable int64_t formatting.

Formatting rval->val.i64 with %ld and casting to (long) truncates on platforms where long is 32 bits (32‑bit systems and 64‑bit Windows). Add #include <inttypes.h> and use PRId64 with proper type casting.

🛠️ Suggested fix
+#include <inttypes.h>
@@
-                    snprintf(ctx->label_values[i], MAX_LABEL_LENGTH - 1, "%ld",
-                             (long) rval->val.i64);
+                    snprintf(ctx->label_values[i], MAX_LABEL_LENGTH - 1, "%" PRId64,
+                             (int64_t) rval->val.i64);
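
A standalone check of the `PRId64` approach, using only the C standard library. `format_i64()` is a hypothetical wrapper introduced here for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a signed 64-bit value into buf.
 * Returns snprintf's result: the number of characters the full value needs,
 * excluding the terminator.
 */
static int format_i64(char *buf, size_t size, int64_t value)
{
    assert(buf != NULL && size > 0);

    /* PRId64 expands to the correct conversion for int64_t on this platform. */
    return snprintf(buf, size, "%" PRId64, value);
}
```

Because the macro tracks the platform's definition of `int64_t`, the same source formats `INT64_MAX` in full whether `long` is 32 or 64 bits wide.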

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@plugins/filter_log_to_metrics/log_to_metrics.c`:
- Around line 438-441: Several error paths in the label parsing code (e.g.,
inside the label_field and add_label handling) return -1 directly and leak the
allocated arrays label_keys and label_accessors; change those direct returns
(the overflow checks, invalid label split, and counter mismatch checks) to jump
to the existing error cleanup path using goto error so that label_keys and
label_accessors are freed and ctx->ins error logging is preserved; update the
checks around label_field, add_label, the split validation, and the counter
mismatch to use goto error instead of return -1.
- Around line 168-184: The error cleanup path in the function fails to destroy
and free any partially-created record accessors stored in ctx->label_ras,
causing leaks; update the error block to iterate over the successfully created
entries (0..ctx->label_counter-1) and call flb_ra_destroy on each non-NULL
ctx->label_ras[i], then free ctx->label_ras and set it to NULL (similar to how
ctx->label_accessors is handled), and ensure ctx->label_counter is
handled/cleared as needed before returning -1.
- Around line 419-422: The direct return after flb_strdup failure leaks
allocated resources; replace the immediate "return -1" with "goto error" so the
existing cleanup path runs (preserve the flb_errno() call), ensuring
ctx->label_keys and ctx->label_accessors are freed by the error handler at the
function's "error" label (update any nearby error label if needed to free these
arrays).
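
All three items above come down to the same rule: once something has been allocated, every failure exit should go through one cleanup label. A minimal sketch of that pattern with standard allocators follows; `dup_keys()` and `free_keys()` are hypothetical helpers, not functions from the plugin.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Release an array produced by dup_keys(). Safe to call with NULL. */
static void free_keys(char **keys, size_t count)
{
    size_t i;

    if (keys == NULL) {
        return;
    }
    for (i = 0; i < count; i++) {
        free(keys[i]);  /* free(NULL) is a no-op for unfilled slots */
    }
    free(keys);
}

/*
 * Duplicate `count` strings into a newly allocated array.
 * Returns NULL on any failure, with every partial allocation released.
 */
static char **dup_keys(const char *const *src, size_t count)
{
    char **keys;
    size_t i;

    assert(src != NULL);

    /* calloc zeroes the array, so slots not yet filled are NULL. */
    keys = calloc(count ? count : 1, sizeof(char *));
    if (keys == NULL) {
        return NULL;
    }

    for (i = 0; i < count; i++) {
        if (src[i] == NULL) {
            goto error;  /* validation failure: same exit as allocation failure */
        }
        keys[i] = malloc(strlen(src[i]) + 1);
        if (keys[i] == NULL) {
            goto error;
        }
        strcpy(keys[i], src[i]);
    }
    return keys;

error:
    free_keys(keys, count);
    return NULL;
}
```

Validation failures and allocation failures share the `error` label, so adding a new check later cannot reintroduce a direct `return` that skips the cleanup.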

cosmo0920 force-pushed the cosmo0920-use-optimized-memory-allocations-on-log_to_metrics branch from 1ffa521 to 65ff9c8 (February 5, 2026 06:54)
cosmo0920 force-pushed the cosmo0920-use-optimized-memory-allocations-on-log_to_metrics branch from 65ff9c8 to 1530d6a (February 5, 2026 06:56)
@edsiper
Member

edsiper commented Feb 5, 2026

thanks @cosmo0920 !

pls cleanup the commit history so we can get this merged for v5

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
@cosmo0920
Contributor Author

cosmo0920 commented Feb 6, 2026

I cleaned up the commit history.
