-
Notifications
You must be signed in to change notification settings - Fork 1.8k
scheduler: engine: output_thread: Plug macOS abort on dry-run #11099
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This is because macOS's TLS of pthread sometimes returns garbage values when uninitailized. So, we need to introduce a flag or a magic number to distinguish whether initailized correctly or not. Signed-off-by: Hiroshi Hatake <[email protected]>
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdded a TLS magic constant and a magic field to the scheduler timer callback params; the field is initialized when params are set and checked before freeing scheduler TLS in engine shutdown and output-thread paths. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Engine
participant TLS as sched_params (TLS)
participant Free as free()/cleanup
rect rgb(211,228,205)
note over Engine,TLS: Shutdown / output-thread path (validation added)
Engine->>TLS: check sched_params != NULL
alt sched_params exists
Engine->>TLS: check magic == FLB_SCHED_TLS_MAGIC
alt magic matches
Engine->>Free: free(sched_params)
Free-->>Engine: freed
else magic invalid
Engine-->>Engine: skip free (no-op)
end
else no sched_params
Engine-->>Engine: no-op
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
include/fluent-bit/flb_scheduler.h (2)
195-195: Consider adding a comment to document the magic constant.The magic constant lacks documentation explaining its purpose and why this specific value was chosen. This would help future maintainers understand the validation mechanism introduced to fix the macOS TLS garbage issue.
Apply this diff to add documentation:
+/* Magic constant to validate TLS initialization state (fixes macOS garbage data issue) */ #define FLB_SCHED_TLS_MAGIC 0x53544350u
202-202: Use unsigned type for the magic field.The magic field is declared as
int, but the constantFLB_SCHED_TLS_MAGICis defined as0x53544350u(unsigned). This type mismatch could lead to sign-related issues or compiler warnings during comparison.Apply this diff to use a consistent unsigned type:
- int magic; + unsigned int magic;or for better portability:
- int magic; + uint32_t magic;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
include/fluent-bit/flb_scheduler.h(2 hunks)src/flb_engine.c(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: PR - fuzzing test
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-centos-7
🔇 Additional comments (2)
include/fluent-bit/flb_scheduler.h (1)
265-265: LGTM: Magic field initialization is correctly placed.The magic constant is set after successful parameter allocation and before storing in TLS, ensuring only properly initialized structures contain the magic value. This validates the TLS state during shutdown and prevents freeing garbage data on macOS.
src/flb_engine.c (1)
1230-1234: No issues found; code validation is correct.The verification confirms that the magic field validation in
flb_engine.c(lines 1231-1234) is appropriate and safe. All code paths that allocate and setsched_timer_coro_cb_paramsvia TLS initialize the magic field unconditionally (inflb_scheduler.h:267, always preceded bymagic = FLB_SCHED_TLS_MAGICat line 265). This makes the memory leak concern invalid—the validation logic correctly prevents freeing invalid or garbage TLS data that lacks the proper magic marker, which is exactly the intended defense against the macOS abort issue.
This is because TLS of macOS's pthread may return garbage values on unset. Signed-off-by: Hiroshi Hatake <[email protected]>
Signed-off-by: Hiroshi Hatake <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
include/fluent-bit/flb_scheduler.h (1)
271-288: Consider adding defensive magic validation.While the normal execution flow ensures
paramsis initialized beforesched_timer_coro_cb_runis called, adding a magic check after the NULL check would provide defense-in-depth against the same TLS garbage-pointer issue on macOS.Apply this diff for added safety:
static FLB_INLINE void sched_timer_coro_cb_run(void) { struct flb_coro *coro; struct flb_sched_timer *timer; struct flb_sched_timer_coro_cb_params *params; params = (struct flb_sched_timer_coro_cb_params *) FLB_TLS_GET(sched_timer_coro_cb_params); - if (!params) { + if (!params || params->magic != FLB_SCHED_TLS_MAGIC) { return; } coro = params->coro;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
include/fluent-bit/flb_scheduler.h(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: pr-compile-centos-7
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: PR - fuzzing test
🔇 Additional comments (4)
include/fluent-bit/flb_scheduler.h (4)
195-195: LGTM! Well-chosen magic constant.The magic value
0x53544350u("STCP" in ASCII) provides a reliable sentinel to distinguish initialized TLS from garbage pointers on macOS. The value is distinct and unlikely to match uninitialized memory.
202-202: LGTM! Magic field properly added.The
uint32_t magicfield correctly matches the type ofFLB_SCHED_TLS_MAGICand integrates cleanly into theflb_sched_timer_coro_cb_paramsstructure.
265-265: LGTM! Magic properly initialized.The magic field is correctly set after allocation and field initialization, but before storing in TLS. This ensures the sentinel is always present in properly initialized params.
195-203: No issues found — cleanup paths properly validate magic sentinel.Verification confirms that all cleanup paths in both
src/flb_output_thread.c(line 405) andsrc/flb_engine.c(line 1231) now correctly validate the magic field before callingflb_free. The past concern aboutsrc/flb_output_thread.chas been resolved.

This is because macOS's TLS of pthread sometimes returns garbage values when uninitialized.
So, we need to introduce a flag or a magic number to distinguish whether initialized correctly or not.
Closes #11089.
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
Before this change, the debug log with backtrace is:
After this change, the debug log became like:
Nothing aborted logs are shown.
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit