Skip to content

generate core dump file for easier debug#266

Merged
xiaoxichen merged 1 commit intoeBay:masterfrom
Besroy:sh_test
Aug 19, 2025
Merged

generate core dump file for easier debug#266
xiaoxichen merged 1 commit intoeBay:masterfrom
Besroy:sh_test

Conversation

@Besroy
Copy link
Contributor

@Besroy Besroy commented Apr 14, 2025

No description provided.

@Besroy Besroy force-pushed the sh_test branch 2 times, most recently from fe94883 to f1eb6e5 Compare April 14, 2025 03:14
@codecov-commenter
Copy link

codecov-commenter commented Apr 14, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 0% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.03%. Comparing base (370c772) to head (171e01a).
⚠️ Report is 32 commits behind head on master.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #266      +/-   ##
==========================================
- Coverage   64.29%   61.03%   -3.27%     
==========================================
  Files          72       74       +2     
  Lines        4406     4863     +457     
  Branches      555      663     +108     
==========================================
+ Hits         2833     2968     +135     
- Misses       1327     1631     +304     
- Partials      246      264      +18     
Components Coverage Δ
AuthManager 77.77% <ø> (ø)
Cache 40.50% <84.69%> (+10.56%) ⬆️
FDS 71.19% <100.00%> (+0.07%) ⬆️
FileWatcher 58.51% <60.00%> (+2.26%) ⬆️
Flip 65.34% <ø> (ø)
gRPC 76.04% <64.70%> (-1.01%) ⬇️
Logging 15.30% <3.62%> (-14.88%) ⬇️
Metrics 80.54% <ø> (+0.33%) ⬆️
Options 100.00% <ø> (ø)
Setting 56.79% <ø> (ø)
StatusObject 73.83% <ø> (ø)
Utility 83.47% <100.00%> (+0.75%) ⬆️
Version 95.83% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Besroy Besroy changed the title [TEST] generate coredump file when crash happend generate core dump file when SIGABRT to easier debug Apr 14, 2025
@Besroy Besroy requested review from szmyd, xiaoxichen and yuwmao April 14, 2025 05:01
Copy link
Contributor

@xiaoxichen xiaoxichen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm but wondering the rational behind original code. waiting for brain to confirm

@szmyd
Copy link
Collaborator

szmyd commented Apr 23, 2025

lgtm but wondering the rational behind original code. waiting for brain to confirm

IIRC Bryan Zimmerman wrote this stacktrace logic. My guess is that the intent was to avoid any kind of circular logic when doing normal process termination (as std::exit() does) using _exit instead. Raising an exception will cause this thread to unwind up to the top and then cause another Signal won't it?

I would rather investigate migrating to libcpptrace (https://github.com/jeremy-rifkin/cpptrace) like the BayDB project is using and just dump a stack from here rather than rely on core files.

@xiaoxichen
Copy link
Contributor

I see we have backtrace in the log? not sure if it is using cpptrace...

Previous code was trying to ignore SIGABORT for some reason :) As the signal handler all reset to default so it wont cause another signal .

@Besroy Besroy force-pushed the sh_test branch 4 times, most recently from 204589b to faa16e4 Compare August 5, 2025 08:11
@Besroy Besroy changed the title generate core dump file when SIGABRT to easier debug generate core dump file for easier debug Aug 5, 2025
@Besroy
Copy link
Contributor Author

Besroy commented Aug 5, 2025

I see we have backtrace in the log? not sure if it is using cpptrace...

Previous code was trying to ignore SIGABORT for some reason :) As the signal handler all reset to default so it wont cause another signal .

Agreed. We have backtrace information in the logs (without using cpptrace), but it's insufficient for thorough debugging. We still require a core dump file to access more detailed information. Raising signals such as SIGABRT, SIGFPE, SIGSEGV, and SIGILL will not lead to circular logic, as the signal handlers are reset to their default behavior, which is to terminate the process and generate a core dump file. @szmyd pls correct me if I'm wrong.

@szmyd
Copy link
Collaborator

szmyd commented Aug 6, 2025

We also at some point starting using minidumps since our cores were absolutely massive (many GiBs). Does this impact that in anyway? @raakella1 knows a little more about this than myself maybe, just want to make sure we don't start blowing out the local PVC on crashes in production.

@raakella1
Copy link
Contributor

raakella1 commented Aug 7, 2025

We disabled collecting the traditional stacktrace and use minidump instead for the release builds because it takes a very long time for the process to exit waiting for all the threads to dump the stacktrace (up to an hour and sometimes stuck even forever) which is not acceptable for release builds.
Only debug builds dump the stacktrace. check https://github.com/eBay/sisl/blob/master/src/logging/stacktrace.cpp#L35
The large size of the core files is also a concern in production. It will compete for space with the log files. Honestly, for release builds, (even if we build Rel with Deb Info) there is no useful information in the core file.
You can maybe just dump the core file for debug builds
@Besroy @xiaoxichen @szmyd

@Besroy
Copy link
Contributor Author

Besroy commented Aug 8, 2025

@raakella1 I completely understand and appreciate your concerns regarding the size and generation time of coredump files. However, during SH's testing process, we observed that even the corefile in RelWithDebugInfo has proven to be very helpful for debugging. This document highlights several issues that were successfully debugged using core files.

Would it be possible to consider adding a configuration option where coredump generation is disabled by default? This way, SM or other users could enable it by modifying the configuration if needed, ensuring minimal impact on other projects.

@raakella1
Copy link
Contributor

@Besroy Yes, config option seems like a good middle ground.

@Besroy Besroy force-pushed the sh_test branch 2 times, most recently from f2c4b8a to ee83bbb Compare August 18, 2025 09:18
@Besroy
Copy link
Contributor Author

Besroy commented Aug 18, 2025

Hi @raakella1 I add an option to enable core dump, please take a look, thanks!

@xiaoxichen xiaoxichen merged commit c71c41e into eBay:master Aug 19, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants