Skip to content

Conversation

cosmo0920
Copy link
Contributor

@cosmo0920 cosmo0920 commented Jul 28, 2025

We added an interface for configurable buffer limit for multiline.
Also, we implemented robust processing for multiline concatenations.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Configurable multiline buffer size limit for concatenated log messages.
    • Support for binary-size strings (KiB/MiB/GiB) when specifying sizes.
    • Per-input and per-filter multiline-truncated metrics; truncated events are annotated and emit warnings.
  • Bug Fixes

    • Multiline append/flush logic now propagates truncation status to prevent silent data loss.
  • Tests

    • Added tests for buffer-limit truncation and binary-size parsing.

Copy link

coderabbitai bot commented Aug 1, 2025

Walkthrough

Adds a configurable multiline buffer limit and binary-size parser; threads a FLB_MULTILINE_TRUNCATED status through multiline processing, enforces per-group buffer limits on append, records truncation via metrics/logs (filter and tail), updates parser creation API, and adds tests for truncation and size parsing.

Changes

Cohort / File(s) Change Summary
Config field & default
include/fluent-bit/flb_config.h, src/flb_config.c
Added multiline_buffer_limit config macro and char *multiline_buffer_limit field; registered config entry and initialized it to default string.
Multiline core types & defaults
include/fluent-bit/multiline/flb_ml.h
Added buffer-limit default macros, return codes (FLB_MULTILINE_OK, FLB_MULTILINE_PROCESSED, FLB_MULTILINE_TRUNCATED), buffer_limit in flb_ml, and truncated/stream fields in flb_ml_stream_group.
Group API & impl
include/fluent-bit/multiline/flb_ml_group.h, src/multiline/flb_ml_group.c
Declared and implemented flb_ml_group_cat() to append with buffer-limit enforcement, possible truncation, and return of OK/TRUNCATED status.
Multiline processing logic
src/multiline/flb_ml.c, src/multiline/flb_ml_rule.c, src/multiline/flb_ml_stream.c
Propagated truncation status through append/processing functions; attempt parsers for text without early failure; replaced direct buffer writes with flb_ml_group_cat(); set/clear group truncated flags; annotate flushed events with "multiline_truncated": true; initialize ml->buffer_limit from config.
Multiline parser API (params)
include/fluent-bit/multiline/flb_ml_parser.h, src/multiline/flb_ml_parser.c
Added flb_ml_parser_params, flb_ml_parser_params_default(), and flb_ml_parser_create_params(); legacy create wraps to new params API.
Filter plugin metrics
plugins/filter_multiline/ml.h, plugins/filter_multiline/ml.c
Added truncated metric ID/counter (FLB_MULTILINE_METRIC_TRUNCATED, cmt_truncated), register and increment on truncation alongside existing emitted metric handling.
Tail plugin metrics & instrumentation
plugins/in_tail/tail_config.h, plugins/in_tail/tail_config.c, plugins/in_tail/tail_file.c
Added per-tail truncated metric constant/counter (FLB_TAIL_METRIC_M_TRUNCATED, cmt_multiline_truncated), register legacy metric, log warning and increment metrics on truncation in process_content.
Size parsing utility & tests
include/fluent-bit/flb_utils.h, src/flb_utils.c, tests/internal/utils.c
Added flb_utils_size_to_binary_bytes() (KiB/MiB/GiB parsing with overflow guards) and tests validating binary-size parsing.
Tests - multiline truncation
tests/internal/multiline.c
Added buffer_limit_truncation test exercising truncation and adjusted flush callback expectations/guards.

Sequence Diagram(s)

sequenceDiagram
    participant Config
    participant Tail as TailInput
    participant ML as MultilineCore
    participant Group as StreamGroup
    participant Filter
    participant Metrics

    Config->>ML: init buffer_limit (string -> bytes)
    Tail->>ML: flb_ml_append_text/append_object(data)
    ML->>Group: flb_ml_group_cat(data,len)
    alt appended fully
        Group-->>ML: FLB_MULTILINE_OK
        ML->>Filter: emit / process
        Filter->>Metrics: inc emitted metric
    else truncated or partially appended
        Group-->>ML: FLB_MULTILINE_TRUNCATED
        ML->>Tail: log warning
        ML->>Metrics: inc truncated metric
        ML->>Group: mark truncated flag
    end
    ML->>Group: flush -> include "multiline_truncated": true if set
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • leonardo-albertovich
  • edsiper
  • koleini
  • fujimotos

Poem

I nibble bytes beneath the moon,
I guard the buffer, trim by tune.
When lines overflow, I flag the tide,
Log, count, and tag what could not hide.
Hooray — tests hop in, delighted, wide. 🐇

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch cosmo0920-add-limit-for-multiline-concatenation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

/* Return codes */
#define FLB_MULTILINE_OK 0
#define FLB_MULTILINE_PROCESSED 1 /* Reserved */
#define FLB_MULTILINE_TRUNCATED 2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to status code 2 is needed because status code 1 will be collided for FLB_TRUE status.

Signed-off-by: Hiroshi Hatake <[email protected]>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
plugins/filter_multiline/ml.c (2)

387-391: Typo in metric help string (“occurence” → “occurrences”).

Correct the help text.

         ctx->cmt_truncated = cmt_counter_create(ins->cmt,
                                                 "fluentbit", "filter", "emit_truncated_total",
-                                                "Total number of truncated occurence of multiline",
+                                                "Total number of truncated multiline occurrences",
                                                 1, (char *[]) {"name"});

914-925: Avoid potential double-counting of TRUNCATED (buffered path).

If truncation is also surfaced during later flush or other paths, this may count twice. Please confirm invariants that FLB_MULTILINE_TRUNCATED is signaled exactly once per truncated message, per stream.

You can grep other increments to validate uniqueness:

#!/bin/bash
rg -nC2 -e 'FLB_MULTILINE_METRIC_TRUNCATED|emit_truncated_total|cmt_truncated|FLB_MULTILINE_TRUNCATED' --type=c
🧹 Nitpick comments (2)
plugins/filter_multiline/ml.c (2)

807-810: Keep label name const; cast only at call sites.

Avoid casting away const on the pointer variable; declare as const char* and cast per API call.

 #ifdef FLB_HAVE_METRICS
-    uint64_t ts;
-    char *name;
+    uint64_t ts;
+    const char *name;
 #endif
-                name = (char *) flb_filter_name(ctx->ins);
+                name = flb_filter_name(ctx->ins);
                 ts = cfl_time_now();
-                cmt_counter_inc(ctx->cmt_truncated, ts, 1, (char *[]) {name});
+                cmt_counter_inc(ctx->cmt_truncated, ts, 1, (char *[]) {(char *) name});
-                name = (char *) flb_filter_name(ctx->ins);
+                name = flb_filter_name(ctx->ins);
                 ts = cfl_time_now();
-                cmt_counter_inc(ctx->cmt_truncated, ts, 1, (char *[]) {name});
+                cmt_counter_inc(ctx->cmt_truncated, ts, 1, (char *[]) {(char *) name});
-                name = (char *) flb_filter_name(ctx->ins);
+                name = flb_filter_name(ctx->ins);
                 ts = cfl_time_now();
-                cmt_counter_inc(ctx->cmt_emitted, ts, 1, (char *[]) {name});
+                cmt_counter_inc(ctx->cmt_emitted, ts, 1, (char *[]) {(char *) name});

Also applies to: 855-858, 918-921, 932-935


926-939: Streamline conditional ordering for readability (buffered path).

Check OK first, then fall back to debug; same behavior, clearer flow.

-            else if (ret != FLB_MULTILINE_OK) {
-                flb_plg_debug(ctx->ins,
-                              "could not append object from tag: %s", tag);
-            }
-            else if (ret == FLB_MULTILINE_OK) {
+            else if (ret == FLB_MULTILINE_OK) {
 #ifdef FLB_HAVE_METRICS
                 name = (char *) flb_filter_name(ctx->ins);
                 ts = cfl_time_now();
                 cmt_counter_inc(ctx->cmt_emitted, ts, 1, (char *[]) {name});
 
                 /* old api */
                 flb_metrics_sum(FLB_MULTILINE_METRIC_EMITTED, 1, ctx->ins->metrics);
 #endif
-            }
+            }
+            else {
+                flb_plg_debug(ctx->ins,
+                              "could not append object from tag: %s", tag);
+            }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4788161 and 3575cca.

📒 Files selected for processing (2)
  • plugins/filter_multiline/ml.c (4 hunks)
  • src/multiline/flb_ml_rule.c (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/multiline/flb_ml_rule.c
🧰 Additional context used
🧬 Code graph analysis (1)
plugins/filter_multiline/ml.c (4)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_metrics.c (2)
  • flb_metrics_add (129-178)
  • flb_metrics_sum (180-191)
lib/cmetrics/src/cmt_counter.c (2)
  • cmt_counter_create (26-81)
  • cmt_counter_inc (100-117)
src/flb_filter.c (1)
  • flb_filter_name (490-497)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-centos-7
🔇 Additional comments (2)
plugins/filter_multiline/ml.c (2)

851-862: Good: warn + metrics on truncation (unbuffered path).

The warning plus TRUNCATED metrics update looks correct and matches the new status propagation.


930-939: Confirm EMITTED metric scope (buffered-only).

EMITTED is only created/used when buffered; unbuffered path doesn’t increment it. If parity across modes is desired, mirror it for the unbuffered flow; otherwise, document that emit_records_total reflects buffered emissions only.

Comment on lines +372 to +377
if (!ctx->cmt_emitted) {
flb_errno();
flb_free(ctx);

return -1;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix early-exit leaks when cmt_emitted creation fails.

If cmt_counter_create() fails, the emitter instance created earlier is not torn down. Clean up ins_emitter before freeing ctx to avoid leaking an input instance.

         if (!ctx->cmt_emitted) {
             flb_errno();
-            flb_free(ctx);
-
-            return -1;
+            /* unwind emitter created earlier */
+            if (ctx->ins_emitter) {
+                flb_input_instance_exit(ctx->ins_emitter, ctx->config);
+                flb_input_instance_destroy(ctx->ins_emitter);
+                ctx->ins_emitter = NULL;
+            }
+            flb_free(ctx);
+            return -1;
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (!ctx->cmt_emitted) {
flb_errno();
flb_free(ctx);
return -1;
}
if (!ctx->cmt_emitted) {
flb_errno();
/* unwind emitter created earlier */
if (ctx->ins_emitter) {
flb_input_instance_exit(ctx->ins_emitter, ctx->config);
flb_input_instance_destroy(ctx->ins_emitter);
ctx->ins_emitter = NULL;
}
flb_free(ctx);
return -1;
}
🤖 Prompt for AI Agents
In plugins/filter_multiline/ml.c around lines 372 to 377, the code returns early
when cmt_emitted creation fails but never tears down the previously created
emitter instance, leaking an input instance; before flb_free(ctx) call, destroy
the emitter instance (e.g. call flb_input_instance_destroy(ctx->ins_emitter) or
the repo's matching teardown function), set ctx->ins_emitter = NULL, then free
ctx and return -1.

Comment on lines +391 to +396
if (!ctx->cmt_truncated) {
flb_errno();
flb_free(ctx);

return -1;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix early-exit leaks when cmt_truncated creation fails (both modes).

Failure here can leak either the emitter (buffered mode) or the msgpack sbuffer (unbuffered mode). Unwind before returning.

         if (!ctx->cmt_truncated) {
             flb_errno();
-            flb_free(ctx);
-
-            return -1;
+            /* unwind resources allocated before metrics registration */
+            if (ctx->use_buffer == FLB_TRUE && ctx->ins_emitter) {
+                flb_input_instance_exit(ctx->ins_emitter, ctx->config);
+                flb_input_instance_destroy(ctx->ins_emitter);
+                ctx->ins_emitter = NULL;
+            }
+            else if (ctx->use_buffer == FLB_FALSE) {
+                /* destroy msgpack buffer initialized for unbuffered mode */
+                msgpack_sbuffer_destroy(&ctx->mp_sbuf);
+            }
+            flb_free(ctx);
+            return -1;
         }
🤖 Prompt for AI Agents
In plugins/filter_multiline/ml.c around lines 391 to 396, the early return when
ctx->cmt_truncated creation fails leaks resources; before returning you must
unwind and free mode-specific resources: call flb_errno() as shown, then if
ctx->out_mode is buffered (emitter created), destroy/free the emitter, otherwise
if unbuffered free/destroy the msgpack sbuffer used; finally free ctx and return
-1. Ensure both branches clean up their respective allocated objects before
freeing ctx and returning.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
src/multiline/flb_ml_parser.c (2)

29-41: Consider using designated initializers for better clarity and safety.

The current implementation uses positional field assignments after memset. Using designated initializers would be clearer and more maintainable.

 struct flb_ml_parser_params flb_ml_parser_params_default(const char *name)
 {
-    struct flb_ml_parser_params p;
-    memset(&p, 0, sizeof(p));
-
-    p.size      = sizeof(p);
-    p.name      = (char *) name;
-    p.type      = FLB_ML_REGEX;          /* sane default */
-    p.negate    = 0;
-    p.flush_ms  = FLB_ML_FLUSH_TIMEOUT;  /* header constant */
-    /* other pointers remain NULL by default */
+    struct flb_ml_parser_params p = {
+        .size      = sizeof(struct flb_ml_parser_params),
+        .name      = (char *) name,
+        .type      = FLB_ML_REGEX,          /* sane default */
+        .negate    = 0,
+        .flush_ms  = FLB_ML_FLUSH_TIMEOUT,  /* header constant */
+        /* other fields implicitly zero-initialized */
+    };
     return p;
 }

44-51: Size validation is good, but consider additional parameter validation.

The size check p->size < sizeof(*p) provides version safety. However, consider validating the type field is within expected bounds to prevent potential issues downstream.

 struct flb_ml_parser *flb_ml_parser_create_params(struct flb_config *ctx,
                                                   const struct flb_ml_parser_params *p)
 {
     struct flb_ml_parser *ml_parser;
 
     if (!ctx || !p || p->size < sizeof(*p) || !p->name) {
         return NULL;
     }
+
+    /* Validate type is within expected range */
+    if (p->type < FLB_ML_REGEX || p->type > FLB_ML_EQ) {
+        flb_error("[ml_parser] invalid type: %d", p->type);
+        return NULL;
+    }
include/fluent-bit/multiline/flb_ml_parser.h (1)

39-40: Document the valid values for the type field.

While the comment mentions the valid values, consider adding an explicit note about the valid range for defensive programming.

     /* creation parameters (mirror of old positional args) */
     char *name;
-    int   type;         /* FLB_ML_REGEX / FLB_ML_ENDSWITH / FLB_ML_EQ */
+    int   type;         /* FLB_ML_REGEX / FLB_ML_ENDSWITH / FLB_ML_EQ (valid: 0-2) */
     char *match_str;    /* used for ENDSWITH/EQ; NULL for REGEX */
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3575cca and 18dc299.

📒 Files selected for processing (2)
  • include/fluent-bit/multiline/flb_ml_parser.h (1 hunks)
  • src/multiline/flb_ml_parser.c (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
include/fluent-bit/multiline/flb_ml_parser.h (1)
src/multiline/flb_ml_parser.c (2)
  • flb_ml_parser_params_default (29-41)
  • flb_ml_parser_create_params (44-122)
src/multiline/flb_ml_parser.c (2)
include/fluent-bit/flb_mem.h (2)
  • flb_calloc (84-96)
  • flb_free (126-128)
src/flb_sds.c (2)
  • flb_sds_create (78-90)
  • flb_sds_destroy (389-399)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: PR - fuzzing test
🔇 Additional comments (4)
src/multiline/flb_ml_parser.c (1)

203-216: LGTM! Clean wrapper implementation.

The legacy API wrapper correctly maps all parameters to the new parameter bag structure and delegates to the new implementation. This maintains backward compatibility while leveraging the new code path.

include/fluent-bit/multiline/flb_ml_parser.h (3)

26-27: Good use of forward declaration.

The forward declaration avoids circular dependencies and reduces compilation dependencies.


34-51: Well-designed parameter structure with versioning support.

The structure design with size field for versioning and comprehensive documentation is excellent. This allows for future extensibility while maintaining backward compatibility.


49-50: Reserved flags field for future use.

Good practice to reserve a flags field for future feature toggles without breaking ABI compatibility.

Comment on lines +60 to +69
ml_parser->name = flb_sds_create(p->name);
ml_parser->type = p->type;
if (!ml_parser->name) {
flb_free(ml_parser);
return NULL;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Inconsistent error cleanup pattern.

When flb_sds_create(p->name) fails, the function directly frees ml_parser. However, in subsequent error paths (lines 71-74, 82-84), it calls flb_ml_parser_destroy() for cleanup. This inconsistency could lead to maintenance issues.

Either use flb_ml_parser_destroy() consistently for all cleanup paths after the name is set, or directly free resources consistently. The destroy function already handles NULL pointers safely.

     ml_parser->name = flb_sds_create(p->name);
-    ml_parser->type = p->type;
     if (!ml_parser->name) {
         flb_free(ml_parser);
         return NULL;
     }
+    ml_parser->type = p->type;

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/multiline/flb_ml_parser.c around lines 60-65, the error cleanup is
inconsistent: when flb_sds_create(p->name) fails the code calls
flb_free(ml_parser) but later error paths call flb_ml_parser_destroy(), which is
the agreed safe/complete cleanup. Replace the direct flb_free(ml_parser) with a
call to flb_ml_parser_destroy(ml_parser) (then return NULL) so all post-name-set
failure paths use the same destroy routine that safely handles NULLs and frees
all allocated resources.

Comment on lines +92 to +97
mk_list_init(&ml_parser->regex_rules);
mk_list_add(&ml_parser->_head, &ctx->multiline_parsers);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Potential use-after-free if list operations fail.

The parser is added to the global list before all initialization is complete. If subsequent initialization fails (lines 96-116), flb_ml_parser_destroy() will be called which removes it from the list (line 366). However, if the list operations themselves fail or if there's concurrent access, this could lead to issues.

Consider moving the list addition to the end of the function after all allocations succeed:

-    /* prepare rules list & link into registry */
+    /* prepare rules list */
     mk_list_init(&ml_parser->regex_rules);
-    mk_list_add(&ml_parser->_head, &ctx->multiline_parsers);
 
     /* optional keys */
     if (p->key_content) {
         ml_parser->key_content = flb_sds_create(p->key_content);
         if (!ml_parser->key_content) {
             flb_ml_parser_destroy(ml_parser);
             return NULL;
         }
     }
     // ... other key initializations ...
 
     /* keep back-pointer to config for later rule init */
     ml_parser->config = ctx;
+
+    /* link into registry after all initialization succeeds */
+    mk_list_add(&ml_parser->_head, &ctx->multiline_parsers);
 
     return ml_parser;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
mk_list_init(&ml_parser->regex_rules);
mk_list_add(&ml_parser->_head, &ctx->multiline_parsers);
/* prepare rules list */
mk_list_init(&ml_parser->regex_rules);
/* optional keys */
if (p->key_content) {
ml_parser->key_content = flb_sds_create(p->key_content);
if (!ml_parser->key_content) {
flb_ml_parser_destroy(ml_parser);
return NULL;
}
}
// ... other key initializations ...
/* keep back-pointer to config for later rule init */
ml_parser->config = ctx;
/* link into registry after all initialization succeeds */
mk_list_add(&ml_parser->_head, &ctx->multiline_parsers);
return ml_parser;
🤖 Prompt for AI Agents
In src/multiline/flb_ml_parser.c around lines 92-93, the parser is being added
to the global ctx->multiline_parsers list before the rest of initialization
completes, which risks use-after-free if later init fails or list ops fail; move
the mk_list_add(&ml_parser->_head, &ctx->multiline_parsers) call to the end of
flb_ml_parser_create after all allocations and initializations (lines ~96-116)
have succeeded, and keep mk_list_init(&ml_parser->regex_rules) where it is;
additionally, if the codebase uses concurrent access, acquire the appropriate
mutex/lock around the list insertion or verify the list functions' return status
and on any subsequent failure perform a safe removal and cleanup path that
checks list membership before calling flb_ml_parser_destroy().

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/multiline/flb_ml_parser.c (1)

63-69: Fix unsafe destroy before list insertion; defer list linking; unify cleanup.

Calling flb_ml_parser_destroy() before mk_list_add() invokes mk_list_del() on an uninitialized node, risking memory corruption. Also, error paths mix manual free and destroy. Defer registry insertion until all allocations succeed and use a single fail path.

Apply these diffs:

  1. Use a single fail path instead of early frees/destroy:
@@
-    ml_parser->name = flb_sds_create(p->name);
-    ml_parser->type = p->type;
-    if (!ml_parser->name) {
-        flb_free(ml_parser);
-        return NULL;
-    }
+    ml_parser->name = flb_sds_create(p->name);
+    ml_parser->type = p->type;
+    if (!ml_parser->name) {
+        goto fail;
+    }
@@
-        ml_parser->match_str = flb_sds_create(p->match_str);
-        if (!ml_parser->match_str) {
-            flb_sds_destroy(ml_parser->name);
-            flb_free(ml_parser);
-            return NULL;
-        }
+        ml_parser->match_str = flb_sds_create(p->match_str);
+        if (!ml_parser->match_str) {
+            goto fail;
+        }
@@
-        ml_parser->parser_name = flb_sds_create(p->parser_name);
-        if (!ml_parser->parser_name) {
-            flb_ml_parser_destroy(ml_parser);
-            return NULL;
-        }
+        ml_parser->parser_name = flb_sds_create(p->parser_name);
+        if (!ml_parser->parser_name) {
+            goto fail;
+        }
  1. Don’t link into the registry until success:
@@
-    /* prepare rules list & link into registry */
-    mk_list_init(&ml_parser->regex_rules);
-    mk_list_add(&ml_parser->_head, &ctx->multiline_parsers);
+    /* prepare rules list (linking deferred until success) */
+    mk_list_init(&ml_parser->regex_rules);
  1. Route optional-key failures to the fail path:
@@
-        ml_parser->key_content = flb_sds_create(p->key_content);
-        if (!ml_parser->key_content) {
-            flb_ml_parser_destroy(ml_parser);
-            return NULL;
-        }
+        ml_parser->key_content = flb_sds_create(p->key_content);
+        if (!ml_parser->key_content) {
+            goto fail;
+        }
@@
-        ml_parser->key_group = flb_sds_create(p->key_group);
-        if (!ml_parser->key_group) {
-            flb_ml_parser_destroy(ml_parser);
-            return NULL;
-        }
+        ml_parser->key_group = flb_sds_create(p->key_group);
+        if (!ml_parser->key_group) {
+            goto fail;
+        }
@@
-        ml_parser->key_pattern = flb_sds_create(p->key_pattern);
-        if (!ml_parser->key_pattern) {
-            flb_ml_parser_destroy(ml_parser);
-            return NULL;
-        }
+        ml_parser->key_pattern = flb_sds_create(p->key_pattern);
+        if (!ml_parser->key_pattern) {
+            goto fail;
+        }
  1. Link into registry at the end, then return. Add a fail label for cleanup:
@@
-    /* keep back-pointer to config for later rule init */
-    ml_parser->config = ctx;
-
-    return ml_parser;
+    /* keep back-pointer to config for later rule init */
+    ml_parser->config = ctx;
+
+    /* link into registry after all initialization succeeds */
+    mk_list_add(&ml_parser->_head, &ctx->multiline_parsers);
+    return ml_parser;
+
+fail:
+    if (ml_parser) {
+        if (ml_parser->key_pattern) { flb_sds_destroy(ml_parser->key_pattern); }
+        if (ml_parser->key_group)   { flb_sds_destroy(ml_parser->key_group); }
+        if (ml_parser->key_content) { flb_sds_destroy(ml_parser->key_content); }
+        if (ml_parser->parser_name) { flb_sds_destroy(ml_parser->parser_name); }
+        if (ml_parser->match_str)   { flb_sds_destroy(ml_parser->match_str); }
+        if (ml_parser->name)        { flb_sds_destroy(ml_parser->name); }
+        flb_free(ml_parser);
+    }
+    return NULL;

Also applies to: 83-88, 95-98, 100-120, 122-126

🧹 Nitpick comments (2)
src/multiline/flb_ml_parser.c (2)

31-45: Avoid const-cast; prefer const-correct param fields.

p.name is assigned by casting away const. Consider making all string fields in flb_ml_parser_params const char* to reflect read-only intent and drop the cast here. If changing the public struct is too wide for this PR, at least document that inputs must outlive creation and won’t be modified.


51-55: Size-gate uses hard-coded width; use the actual field’s sizeof.

Replace sizeof(uint32_t) with sizeof(p->flags) to avoid mismatch if the flags type changes.

Apply this diff:

-    size_t min = offsetof(struct flb_ml_parser_params, flags) + sizeof(uint32_t);
+    size_t min = offsetof(struct flb_ml_parser_params, flags) + sizeof(p->flags);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 18dc299 and ee3ff12.

📒 Files selected for processing (2)
  • include/fluent-bit/multiline/flb_ml_parser.h (1 hunks)
  • src/multiline/flb_ml_parser.c (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • include/fluent-bit/multiline/flb_ml_parser.h
🧰 Additional context used
🧬 Code graph analysis (1)
src/multiline/flb_ml_parser.c (2)
include/fluent-bit/flb_mem.h (2)
  • flb_calloc (84-96)
  • flb_free (126-128)
src/flb_sds.c (2)
  • flb_sds_create (78-90)
  • flb_sds_destroy (389-399)
🔇 Additional comments (3)
src/multiline/flb_ml_parser.c (3)

23-23: LGTM: required header include.

Including flb_ml_parser.h is appropriate for the new params-based API.


28-30: LGTM: standard headers for offsetof/memset.

stddef.h and string.h are correctly added for offsetof() and memset().


196-221: Wrapper mapping looks correct.

Legacy flb_ml_parser_create() correctly maps arguments into params and delegates. No functional issues spotted.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/multiline/flb_ml_parser.c (2)

84-88: Bug: destroy called before list linkage can corrupt the list.

flb_ml_parser_destroy() unconditionally mk_list_del(&ml_parser->_head). At this point _head hasn’t been linked/initialized, so del may dereference garbage.

Minimal fix options:

  • Initialize _head right after allocation so del is harmless pre-link, and/or
  • Move mk_list_add to after successful initialization and avoid calling destroy before linking (use local cleanup).

See consolidated patch below.


57-61: Make cleanup safe + consistent; link after full init.

  • Initialize _head immediately to make destructor safe pre-link.
  • Replace early frees with flb_ml_parser_destroy for consistency.
  • Defer registry linkage until the end to avoid partially-initialized entries.
@@
     ml_parser = flb_calloc(1, sizeof(struct flb_ml_parser));
     if (!ml_parser) {
         flb_errno();
         return NULL;
     }
+    /* Make destructor safe before linking */
+    mk_list_init(&ml_parser->_head);
@@
-    if (!ml_parser->name) {
-        flb_free(ml_parser);
-        return NULL;
-    }
+    if (!ml_parser->name) {
+        flb_ml_parser_destroy(ml_parser);
+        return NULL;
+    }
@@
-        if (!ml_parser->match_str) {
-            flb_sds_destroy(ml_parser->name);
-            flb_free(ml_parser);
-            return NULL;
-        }
+        if (!ml_parser->match_str) {
+            flb_ml_parser_destroy(ml_parser);
+            return NULL;
+        }
@@
-    /* prepare rules list & link into registry */
-    mk_list_init(&ml_parser->regex_rules);
-    mk_list_add(&ml_parser->_head, &ctx->multiline_parsers);
+    /* prepare rules list */
+    mk_list_init(&ml_parser->regex_rules);
@@
     /* keep back-pointer to config for later rule init */
     ml_parser->config = ctx;
 
+    /* link into registry after all allocations succeed */
+    mk_list_add(&ml_parser->_head, &ctx->multiline_parsers);
+
     return ml_parser;

Also applies to: 66-69, 75-78, 95-98, 122-125

🧹 Nitpick comments (2)
include/fluent-bit/multiline/flb_ml_parser.h (1)

34-51: Future-proof the parameter bag: widen size field and add const-correctness.

  • Using uint16_t for size risks truncation as the struct evolves; prefer uint32_t.
  • Input pointers should be const char * to reflect read-only semantics and avoid casts.

Apply:

-struct flb_ml_parser_params {
-    uint16_t size;      /* sizeof(struct flb_ml_parser_params) */
+struct flb_ml_parser_params {
+    uint32_t size;      /* sizeof(struct flb_ml_parser_params) */
@@
-    char *name;
+    const char *name;
@@
-    char *match_str;    /* used for ENDSWITH/EQ; NULL for REGEX */
+    const char *match_str;    /* used for ENDSWITH/EQ; NULL for REGEX */
@@
-    char *key_content;
-    char *key_group;
-    char *key_pattern;
+    const char *key_content;
+    const char *key_group;
+    const char *key_pattern;
@@
-    char *parser_name;             /* delayed init */
+    const char *parser_name;       /* delayed init */

Also ensure <stdint.h> is included in this header (either directly here or via a prerequisite header).

src/multiline/flb_ml_parser.c (1)

31-45: Defaults helper looks sane.

Tiny nit if you adopt const-correctness: drop the cast on name.

-    p.name      = (char *) name;
+    p.name      = name;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ee3ff12 and 3384321.

📒 Files selected for processing (2)
  • include/fluent-bit/multiline/flb_ml_parser.h (1 hunks)
  • src/multiline/flb_ml_parser.c (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
include/fluent-bit/multiline/flb_ml_parser.h (1)
src/multiline/flb_ml_parser.c (2)
  • flb_ml_parser_params_default (32-44)
  • flb_ml_parser_create_params (47-126)
src/multiline/flb_ml_parser.c (2)
include/fluent-bit/flb_mem.h (2)
  • flb_calloc (84-96)
  • flb_free (126-128)
src/flb_sds.c (2)
  • flb_sds_create (78-90)
  • flb_sds_destroy (389-399)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-centos-7
🔇 Additional comments (5)
include/fluent-bit/multiline/flb_ml_parser.h (2)

26-28: Forward decl looks good.

Prevents header cycles and keeps the public surface minimal.


53-59: New APIs are clear and composable.

Default helper + params-based creator provide a clean v2 surface.

src/multiline/flb_ml_parser.c (3)

23-23: Correct header include.

Brings the new public APIs into scope.


28-30: Needed C headers included.

Required for offsetof and memset.


196-221: Legacy wrapper delegation LGTM.

Thin mapping to params keeps old API behavior intact.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant