Summary
In storage hammer tests, when the process crashes inside malloc() (with arena lock held), the signal handler's spdlog logging triggers another malloc() call, causing a self-deadlock. The same issue occurs with spdlog::shutdown() waiting for threads that are stuck in allocation.
Deadlock Chain
malloc() → [holds arena lock] → SIGSEGV
→ signal_handler()
→ spdlog::critical()
→ fwrite()
→ _IO_file_doallocate()
→ malloc() → [waits for same arena lock] → DEADLOCK
Stack Trace
Thread 67 (Thread 0x799a70ff9680 (LWP 32) "storage_mgr"):
#0 0x0000799a9eed3f0b in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000799a9eee8920 in malloc () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x0000799a9eec01b5 in _IO_file_doallocate () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x0000799a9eed0524 in _IO_doallocbuf () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000799a9eecdf90 in _IO_file_overflow () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x0000799a9eeceaaf in _IO_file_xsputn () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000799a9eec1a12 in fwrite () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x0000608c00c9462f in spdlog::details::file_helper::write() at file_helper-inl.h:104
#8 0x0000608c00c97318 in spdlog::sinks::rotating_file_sink<std::mutex>::sink_it_() at rotating_file_sink-inl.h:88
#9 0x0000608c00c7e27a in spdlog::sinks::base_sink<std::mutex>::log() at base_sink-inl.h:28
#10 0x0000608c00c79013 in spdlog::logger::sink_it_() at logger-inl.h:138
#11 0x0000608c00c79fba in spdlog::logger::log_it_() at logger-inl.h:128
#12 0x0000608c00c66970 in spdlog::logger::log_() at logger.h:332
#13 0x0000608c00c66330 in spdlog::logger::log() at logger.h:85
#14 0x0000608c00c66330 in spdlog::logger::critical() at logger.h:155
#15 sisl::logging::crash_handler (signal_number=<optimized out>) at stacktrace.cpp:133
#16 <signal handler called>
#17 0x0000799a9eee67fc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x0000799a9eee86f4 in malloc () from /lib/x86_64-linux-gnu/libc.so.6 ← Original crash location
#19 0x0000799a9f226904 in operator new(unsigned long) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#20 0x0000608c0076cdee in folly::futures::detail::Core<bool>::make() at Core.h:563
#21 0x0000608c0076cdee in folly::makeFuture<bool>() at Future-inl.h:1357
#22 0x0000608c0076cdee in folly::makeFuture<bool>() at Future-inl.h:1306
#23 homestore::DataSvcCPCallbacks::cp_flush() at data_svc_cp.cpp:36
#24 0x0000608c007414ca in homestore::CPManager::cp_start_flush() at cp_mgr.cpp:280
#25 0x0000608c00741e14 in homestore::CPGuard::~CPGuard() at cp_mgr.cpp:400
#26 0x0000608c0074293a in homestore::CPManager::do_trigger_cp_flush() at cp_mgr.cpp:270
#27 0x0000608c007435c6 in homestore::CPManager::trigger_cp_flush() at cp_mgr.cpp:198
#28 0x0000608c0056089e in homeobject::HSHomeObject::destroy_pg_superblk() at unique_ptr.h:199
Root Cause
- Primary issue: Unknown crash in malloc() during checkpoint flush (frames #18-#28), possibly memory corruption
- Immediate cause of hang: Signal handler invokes non-async-signal-safe spdlog, which calls malloc() and deadlocks on the arena lock already held by the interrupted malloc()
Suggested Fix
Use async logging or async-signal-safe alternatives in signal handlers to avoid malloc-dependent functions.
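One possible shape for the async-signal-safe option is sketched below. It is only an illustration, not the sisl crash handler: the names `format_signal_message`, `safe_crash_handler` and `install_safe_crash_handler` are hypothetical, and writing to stderr (rather than to the rotating log file) is an assumption. The handler formats a short message into a stack buffer, emits it with `write(2)`, and re-raises the signal so the default action (core dump) still runs. It never touches spdlog, stdio, or the heap, and it does not call `spdlog::shutdown()`.

```cpp
#include <csignal>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: builds "fatal signal <N>\n" in a caller-supplied buffer.
// Uses no heap, no stdio and no locale, so it is safe to call from a handler.
// Returns the number of bytes placed in buf.
std::size_t format_signal_message(char* buf, std::size_t cap, int signo) {
    static const char prefix[] = "fatal signal ";
    std::size_t n = 0;
    for (std::size_t i = 0; i + 1 < sizeof(prefix) && n < cap; ++i) {
        buf[n++] = prefix[i];
    }
    // Convert the signal number to decimal by hand (digits come out reversed).
    char digits[12];
    std::size_t d = 0;
    unsigned v = signo < 0 ? 0u : static_cast<unsigned>(signo);
    do {
        digits[d++] = static_cast<char>('0' + v % 10);
        v /= 10;
    } while (v != 0 && d < sizeof(digits));
    while (d > 0 && n < cap) {
        buf[n++] = digits[--d];
    }
    if (n < cap) {
        buf[n++] = '\n';
    }
    return n;
}

// Hypothetical handler: only async-signal-safe calls (write, raise).
extern "C" void safe_crash_handler(int signo) {
    char buf[64];
    const std::size_t len = format_signal_message(buf, sizeof(buf), signo);
    // Assumption: stderr is an acceptable destination for the crash line.
    const ssize_t ignored = write(STDERR_FILENO, buf, len);
    (void)ignored;
    // SA_RESETHAND has already restored the default disposition, so the
    // re-raised signal terminates the process once the handler returns.
    raise(signo);
}

// Hypothetical installer, meant to be called once at startup.
void install_safe_crash_handler() {
    struct sigaction sa {};
    sa.sa_handler = safe_crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;  // one-shot: default action on the next delivery
    sigaction(SIGSEGV, &sa, nullptr);
    sigaction(SIGABRT, &sa, nullptr);
}
```

If the crash line has to land in the log file, a file descriptor opened ahead of time could replace `STDERR_FILENO`. Any richer reporting (stack traces, flushing spdlog sinks) would need to happen outside the handler, since those paths may allocate.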
Notes
This is a secondary issue tracking the hang behavior. Priority is low as the root cause of the initial crash remains unknown. Recording here for future investigation.