@edsiper
Member

@edsiper edsiper commented Oct 20, 2025

Fixes #11049

Add a safety check for active chunks on deletion. If a chunk has been manually closed or deleted, the fstore interface should validate its state and not crash.


Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced file cleanup to properly handle chunks closed by external processes.
  • Tests

    • Added test coverage for external chunk closure behavior.

@edsiper edsiper changed the title from "Fstore safe delete" to "fstore: safe check on fsf file deletion" Oct 20, 2025
@edsiper edsiper added this to the Fluent Bit v4.2 milestone Oct 20, 2025
@coderabbitai

coderabbitai bot commented Oct 20, 2025

Walkthrough

The changes add defensive checks to prevent segmentation faults when a file's chunk is closed by external entities before cleanup. A new helper function validates chunk linkage to its stream before performing close operations, supported by a new stream reference field in the file structure. A test case verifies behavior when a chunk is closed externally.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Defensive chunk linkage validation: src/flb_fstore.c | Added static helper chunk_is_linked_to_stream() to check if a chunk remains linked to its stream. Modified flb_fstore_file_inactive() and flb_fstore_file_delete() to conditionally close chunks only when linked. Added fsf->stream field initialization in map_chunks() to track stream association. Introduced new public field flb_fstore_file.stream to enable linkage verification. |
| External close scenario test: tests/internal/fstore.c | Added test cb_delete_after_external_close() that verifies file behavior when its chunk is closed externally via CIO, confirming graceful handling without crashes. Updated test suite list to include the new test. Added #include <errno.h>. |
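
To make the linkage check concrete, here is a minimal sketch of the pattern the summary describes. It is not the code from this PR: the `sk_` types and functions are hypothetical stand-ins for the CIO chunk, the CIO stream, and `flb_fstore_file`, and a plain singly linked list is assumed in place of the real list implementation. The idea is that the file only touches its chunk pointer after confirming the chunk is still a member of the stream's list.

```c
#include <stddef.h>

#define SK_TRUE  1
#define SK_FALSE 0

/* Hypothetical stand-ins for the CIO chunk and stream types. */
struct sk_chunk {
    struct sk_chunk *next;
};

struct sk_stream {
    struct sk_chunk *chunks;    /* head of the stream's chunk list */
};

struct sk_file {
    struct sk_chunk *chunk;
    struct sk_stream *stream;   /* plays the role of the new fsf->stream field */
};

/* Return SK_TRUE only if file->chunk is still on its stream's list. */
int sk_chunk_is_linked(const struct sk_file *file)
{
    const struct sk_chunk *it;

    if (file == NULL || file->chunk == NULL || file->stream == NULL) {
        return SK_FALSE;
    }
    for (it = file->stream->chunks; it != NULL; it = it->next) {
        if (it == file->chunk) {
            return SK_TRUE;
        }
    }
    return SK_FALSE;
}

/* Detach a chunk from the stream list, as an external close would. */
void sk_stream_unlink(struct sk_stream *stream, struct sk_chunk *chunk)
{
    struct sk_chunk **pp;

    for (pp = &stream->chunks; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == chunk) {
            *pp = chunk->next;
            chunk->next = NULL;
            return;
        }
    }
}

/* Guarded delete: close the chunk only while it is still linked.
 * Returns 1 if the chunk was closed here, 0 if the close was skipped. */
int sk_file_delete(struct sk_file *file)
{
    if (sk_chunk_is_linked(file) == SK_TRUE) {
        sk_stream_unlink(file->stream, file->chunk);
        file->chunk = NULL;
        return 1;
    }
    return 0;
}
```

In this sketch, calling `sk_file_delete()` after something else has already run `sk_stream_unlink()` on the same chunk returns 0 without dereferencing the chunk, which is the behavior the walkthrough attributes to the patched delete path.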

Sequence Diagram

sequenceDiagram
    participant Caller as External Caller
    participant FStore as flb_fstore
    participant Helper as chunk_is_linked_to_stream
    participant File as flb_fstore_file
    participant Stream as Stream Chunks

    Caller->>FStore: flb_fstore_file_inactive(file)<br/>or flb_fstore_file_delete(file)
    FStore->>Helper: chunk_is_linked_to_stream(file)
    Helper->>Stream: iterate stream->chunks
    alt Chunk is linked
        Stream-->>Helper: found
        Helper-->>FStore: FLB_TRUE
        FStore->>File: close chunk & set to NULL
        File-->>FStore: success
    else Chunk not linked (external close)
        Stream-->>Helper: not found
        Helper-->>FStore: FLB_FALSE
        FStore-->>Caller: skip close (prevent crash)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The changes involve defensive memory management patterns addressing crash scenarios (#11049). Review requires careful validation of the linkage check logic, proper initialization of the new stream field, and understanding how the defensive pattern prevents segmentation faults. The test case provides coverage but demands verification that it authentically reproduces the external close scenario.

Suggested labels

backport to v4.0.x

Suggested reviewers

  • fujimotos
  • koleini

Poem

🐰 A chunk once orphaned, lost to the void,
Now checks its stream before it's destroyed,
External closes no longer breed fear,
With linkage validation keeping crashes clear!
Safe files await, no SIGSEGV today,
Fluent Bit keeps crashing at bay. ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues Check | ✅ Passed | The changes directly address the requirements from linked issue #11049 by implementing state validation before attempting chunk deletion. The new chunk_is_linked_to_stream() helper function validates whether a chunk is actually linked to its stream, preventing deletion attempts on orphaned chunks that may have been closed externally. The modifications to flb_fstore_file_inactive() and flb_fstore_file_delete() use this validation to conditionally close chunks only when safe, eliminating the crash scenario where deletion of non-existent files would trigger a segmentation fault. The accompanying test case cb_delete_after_external_close() validates that the system properly handles the external close scenario that was causing the original crash. |
| Out of Scope Changes Check | ✅ Passed | All changes in the pull request are directly scoped to the objective of fixing the segmentation fault by adding safe state validation during file deletion. The new chunk_is_linked_to_stream() helper function, the conditional deletion logic modifications, the addition of the fsf->stream field to track stream association, and the new test case for external close scenarios are all necessary and directly related to implementing robust state validation before deletion attempts. No unrelated or tangential changes appear to be present in this changeset. |
| Title Check | ✅ Passed | The pull request title "fstore: safe check on fsf file deletion" directly and accurately describes the primary change in the changeset. The modifications across both source files implement a safe check mechanism by introducing the chunk_is_linked_to_stream() helper function and applying it in deletion and deactivation paths to validate chunk state before attempting removal. This directly addresses the core objective of preventing segmentation faults when chunks are externally closed or deleted. The title is concise, uses clear terminology without vague language, and a teammate scanning the history would quickly understand that this PR adds safety validation to the file store deletion logic. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/internal/fstore.c (1)

82-129: Good regression test; add one more assertion and a companion case.

  • Optional: set errno = 0 before stat() to make the ENOENT check crisper.
  • Add a sibling test that externally closes with CIO_TRUE, then calls flb_fstore_file_inactive() (not delete) to assert no crash and proper cleanup.

Example tweaks:

@@
-    ret = stat(FSF_STORE_PATH "/abc/example.txt", &st_data);
+    errno = 0;
+    ret = stat(FSF_STORE_PATH "/abc/example.txt", &st_data);
     TEST_CHECK(ret == -1 && errno == ENOENT);
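
For reference, the same check can be written as a small standalone helper. This is only an illustrative sketch: `path_is_gone()` is a hypothetical name, not part of the Fluent Bit test suite. Since `stat()` sets `errno` when it fails, the reset mainly guards against reading a stale value if the condition is ever restructured, and it makes the intent obvious to the reader.

```c
#include <errno.h>
#include <sys/stat.h>

/* Return 1 if stat() fails with ENOENT for the given path, 0 otherwise. */
int path_is_gone(const char *path)
{
    struct stat st;
    int ret;

    errno = 0;                  /* clear any value left by earlier calls */
    ret = stat(path, &st);

    return (ret == -1 && errno == ENOENT);
}
```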

And consider adding:

void cb_inactive_after_external_close() {
    /* same setup as this test, but call flb_fstore_file_inactive(fs, fsf) */
}
src/flb_fstore.c (1)

242-245: Apply the same defensive check across all CIO operations.

Great fix here. For consistency and safety, gate other methods that dereference fsf->chunk (append, content_copy, meta get/set) with chunk_is_linked_to_stream() to avoid UB after external deletion.

Proposed patches:

@@ int flb_fstore_file_content_copy(struct flb_fstore *fs,
-    ret = cio_chunk_get_content_copy(fsf->chunk, out_buf, out_size);
+    if (chunk_is_linked_to_stream(fsf) == FLB_FALSE) {
+        flb_warn("[fstore] content_copy skipped; chunk unlinked: %s", fsf->name);
+        return -1;
+    }
+    ret = cio_chunk_get_content_copy(fsf->chunk, out_buf, out_size);
@@ int flb_fstore_file_append(struct flb_fstore_file *fsf, void *data, size_t size)
-    /* Check if the chunk is up */
+    /* Validate chunk still linked before any CIO call */
+    if (chunk_is_linked_to_stream(fsf) == FLB_FALSE) {
+        flb_warn("[fstore] append skipped; chunk unlinked: %s", fsf->name);
+        return -1;
+    }
+    /* Check if the chunk is up */
@@ int flb_fstore_file_meta_set(struct flb_fstore *fs,
-    /* Check if the chunk is up */
+    /* Validate chunk still linked before any CIO call */
+    if (chunk_is_linked_to_stream(fsf) == FLB_FALSE) {
+        flb_warn("[fstore] meta_set skipped; chunk unlinked: %s", fsf->name);
+        return -1;
+    }
+    /* Check if the chunk is up */
@@ int flb_fstore_file_meta_get(struct flb_fstore *fs,
-    /* Check if the chunk is up */
+    /* Validate chunk still linked before any CIO call */
+    if (chunk_is_linked_to_stream(fsf) == FLB_FALSE) {
+        flb_warn("[fstore] meta_get skipped; chunk unlinked: %s", fsf->name);
+        return -1;
+    }
+    /* Check if the chunk is up */
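
The shape of the proposed guard can be tried out in isolation. The sketch below is an assumption-laden mock, not Fluent Bit code: the `gd_` structs replace the real fstore and CIO types, the chunk is a fixed 64-byte buffer, and the list is a plain singly linked list. It only shows the control flow suggested above, where an operation bails out with -1 before touching a chunk that is no longer on its stream.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the chunk, stream and fstore file. */
struct gd_chunk {
    struct gd_chunk *next;
    char buf[64];
    size_t len;
};

struct gd_stream {
    struct gd_chunk *chunks;
};

struct gd_file {
    struct gd_chunk *chunk;
    struct gd_stream *stream;
};

/* Return 1 if f->chunk is still a member of f->stream's list. */
int gd_is_linked(const struct gd_file *f)
{
    const struct gd_chunk *it;

    if (f == NULL || f->chunk == NULL || f->stream == NULL) {
        return 0;
    }
    for (it = f->stream->chunks; it != NULL; it = it->next) {
        if (it == f->chunk) {
            return 1;
        }
    }
    return 0;
}

/* Append guarded by the linkage check; -1 means nothing was written. */
int gd_file_append(struct gd_file *f, const void *data, size_t size)
{
    if (!gd_is_linked(f)) {
        return -1;              /* chunk was closed elsewhere: skip */
    }
    if (size > sizeof(f->chunk->buf) - f->chunk->len) {
        return -1;              /* not enough room in the mock buffer */
    }
    memcpy(f->chunk->buf + f->chunk->len, data, size);
    f->chunk->len += size;
    return 0;
}
```

One trade-off worth noting: a list walk on every append costs time proportional to the number of chunks in the stream, so whether the guard belongs on hot paths depends on how large those lists get in practice.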
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d87ec0d and d83191c.

📒 Files selected for processing (2)
  • src/flb_fstore.c (3 hunks)
  • tests/internal/fstore.c (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/internal/fstore.c (1)
src/flb_fstore.c (5)
  • flb_fstore_create (479-525)
  • flb_fstore_stream_create (335-402)
  • flb_fstore_destroy (527-564)
  • flb_fstore_file_create (153-190)
  • flb_fstore_file_delete (259-277)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: PR - fuzzing test
🔇 Additional comments (4)
tests/internal/fstore.c (2)

33-33: LGTM: errno include added.

Required for ENOENT assertion in the new test.


133-133: LGTM: test registered.

Ensures the scenario runs in CI.

src/flb_fstore.c (2)

218-236: Sound guard against UAF by verifying chunk linkage.

The stream-chunk membership check prevents closing an already-detached/freed chunk. Nice.

Are fstore operations and CIO list mutations guaranteed to run on the same thread or under a lock? If not, iterating stream->chunks without synchronization could race. Please confirm the threading model.
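
If it turned out that the list can be mutated from another thread, one possible mitigation is to hold a lock across both the membership check and the close. The following is a purely hypothetical sketch using POSIX threads. Nothing in this thread says that fstore or CIO has such a lock, and all `lk_` names are invented for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical chunk and stream types with a per-stream mutex. */
struct lk_chunk {
    struct lk_chunk *next;
};

struct lk_stream {
    pthread_mutex_t lock;
    struct lk_chunk *chunks;
};

int lk_stream_init(struct lk_stream *s)
{
    s->chunks = NULL;
    return pthread_mutex_init(&s->lock, NULL);
}

void lk_stream_add(struct lk_stream *s, struct lk_chunk *c)
{
    pthread_mutex_lock(&s->lock);
    c->next = s->chunks;
    s->chunks = c;
    pthread_mutex_unlock(&s->lock);
}

/* Check membership and unlink in ONE critical section, so no other
 * thread can detach the chunk between the check and the close.
 * Returns 1 if the chunk was unlinked here, 0 if it was not on the list. */
int lk_close_if_linked(struct lk_stream *s, struct lk_chunk *c)
{
    struct lk_chunk **pp;
    int closed = 0;

    pthread_mutex_lock(&s->lock);
    for (pp = &s->chunks; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == c) {
            *pp = c->next;
            c->next = NULL;
            closed = 1;
            break;
        }
    }
    pthread_mutex_unlock(&s->lock);

    return closed;
}
```

Checking under the lock and then closing outside it would still leave a window between the two steps, which is why the sketch combines them. If all fstore and CIO calls already run on a single thread, none of this is needed.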


442-442: Code change is correct and verified—struct linkage properly initialized.

The assignment at line 442 (fsf->stream = stream;) is essential and correctly implemented. The function chunk_is_linked_to_stream() at line 227 directly accesses fsf->stream->chunks for iteration, confirming that initializing fsf->stream is required. The struct field is properly defined in the header, and all in-tree usages (s3, azure_kusto, azure_blob, calyptia) access the structure through pointers without direct field reinterpretation, so no breaking changes to in-tree code are detected.

Comment on lines +263 to +266
    if (chunk_is_linked_to_stream(fsf) == FLB_TRUE) {
        cio_chunk_close(fsf->chunk, CIO_TRUE);
        fsf->chunk = NULL;
    }
⚠️ Potential issue | 🟡 Minor

Comment says “don’t delete”; code deletes. Fix the comment.

The block uses CIO_TRUE (delete real file), but the comment says “don’t delete”. Correct the comment to avoid confusion.

Apply:

-    /* close the Chunk I/O reference, but don't delete it the real file */
+    /* close the Chunk I/O reference and delete the real file */

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/flb_fstore.c around lines 263 to 266, the surrounding comment incorrectly
states “don’t delete” while the code calls cio_chunk_close(fsf->chunk, CIO_TRUE)
which deletes the underlying file; update the comment to accurately reflect that
the chunk is closed and the real file is deleted (e.g., "close chunk and delete
underlying file") so the comment matches the behavior and avoids confusion.

Development

Successfully merging this pull request may close these issues.

Segmentation fault after [cio file] error deleting file at close on Fluent Bit v4.1.1

3 participants