
Commit 353cb94

[fix](be deadlock) avoid BE deadlock caused by MemTracker (#51321)
### What problem does this PR solve?

In branch-2.0, this code was already refactored in PR #18590.

Deadlock stack:

1. During load, `NodeChannel` allocates a `MemTracker`, and `MemTracker::bind_parent` takes `TrackerGroup.group_lock`:

```
NodeChannel::NodeChannel
  _node_channel_tracker = std::make_shared<MemTracker>
    MemTracker::bind_parent
      std::lock_guard<std::mutex> l(mem_tracker_pool[_parent_group_num].group_lock);
      _tracker_group_it = mem_tracker_pool[_parent_group_num].trackers.insert(
              mem_tracker_pool[_parent_group_num].trackers.end(), this);
```

2. While that lock is held, `std::list::insert` has to allocate a `std::_List_node`. The allocation triggers `new_hook` (in `tcmalloc_hook.h`), and on the memory-limit-exceeded path the hook tries to lock the same `TrackerGroup.group_lock` on the same thread, so the thread deadlocks on itself:

```
new_hook
  doris::ThreadMemTrackerMgr::consume
    doris::ThreadMemTrackerMgr::flush_untracked_mem<true, true>
      doris::ThreadMemTrackerMgr::exceeded
        doris::MemTrackerLimiter::print_log_usage
          doris::MemTracker::make_group_snapshot
            std::lock_guard<std::mutex> l(mem_tracker_pool[group_num].group_lock);
```

Full stack info:

```
(gdb) bt
#0  0x00007f219772454d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f219771fe9b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f219771fd68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000563ca7a8a824 in __gthread_mutex_lock (__mutex=0x563cb18cbc58) at /var/local/ldb-toolchain/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749
#4  std::mutex::lock (this=0x563cb18cbc58) at /var/local/ldb-toolchain/include/c++/11/bits/std_mutex.h:100
#5  std::lock_guard<std::mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /var/local/ldb-toolchain/include/c++/11/bits/std_mutex.h:229
#6  doris::MemTracker::make_group_snapshot (snapshots=0x7f1f7fe06ea0, group_num=<optimized out>, parent_label=...) at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:115
#7  0x0000563ca7a7eeb4 in doris::MemTrackerLimiter::print_log_usage (this=0x5640ab7956c0, msg=...) at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker_limiter.cpp:198
#8  0x0000563ca7a8d1e4 in doris::ThreadMemTrackerMgr::exceeded (this=this@entry=0x563ce6a2c820, size=1048584) at /data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.cpp:59
#9  0x0000563ca78cfeb4 in doris::ThreadMemTrackerMgr::flush_untracked_mem<true, true> (this=0x563ce6a2c820) at /data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:223
#10 doris::ThreadMemTrackerMgr::consume (size=<optimized out>, this=0x563ce6a2c820) at /data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:188
#11 doris::ThreadMemTrackerMgr::consume (size=<optimized out>, this=0x563ce6a2c820) at /data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:178
#12 new_hook (ptr=<optimized out>, size=24) at /data/TCHouse-D-1.2/be/src/runtime/memory/tcmalloc_hook.h:39
#13 0x0000563caf3dfa78 in MallocHook::InvokeNewHookSlow (p=p@entry=0x56422d713b40, s=s@entry=24) at src/malloc_hook.cc:498
#14 0x0000563caf55f2c1 in MallocHook::InvokeNewHook (s=24, p=0x56422d713b40) at src/malloc_hook-inl.h:127
#15 tcmalloc::do_allocate_full<tcmalloc::cpp_throw_oom> (size=size@entry=24) at src/tcmalloc.cc:1805
#16 tcmalloc::allocate_full_cpp_throw_oom (size=size@entry=24) at src/tcmalloc.cc:1815
#17 0x0000563caf55f429 in tcmalloc::dispatch_allocate_full<tcmalloc::cpp_throw_oom> (size=24) at src/tcmalloc.cc:1822
#18 0x0000563ca7a8ab9a in __gnu_cxx::new_allocator<std::_List_node<doris::MemTracker*> >::allocate (__n=1, this=0x563cb18cbc40) at /var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:103
#19 std::allocator_traits<std::allocator<std::_List_node<doris::MemTracker*> > >::allocate (__n=1, __a=...) at /var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:460
#20 std::__cxx11::_List_base<doris::MemTracker*, std::allocator<doris::MemTracker*> >::_M_get_node (this=0x563cb18cbc40) at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:442
#21 std::__cxx11::list<doris::MemTracker*, std::allocator<doris::MemTracker*> >::_M_create_node<doris::MemTracker*> (this=0x563cb18cbc40) at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:634
#22 std::__cxx11::list<doris::MemTracker*, std::allocator<doris::MemTracker*> >::emplace<doris::MemTracker*> (__position=..., this=0x563cb18cbc40) at /var/local/ldb-toolchain/include/c++/11/bits/list.tcc:92
#23 std::__cxx11::list<doris::MemTracker*, std::allocator<doris::MemTracker*> >::insert (__x=<optimized out>, __position=..., this=0x563cb18cbc40) at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:1309
#24 doris::MemTracker::bind_parent (this=0x563eff46ad10, parent=<optimized out>) at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:79
#25 0x0000563ca7a8b608 in doris::MemTracker::MemTracker (this=this@entry=0x563eff46ad10, label=..., parent=parent@entry=0x0) at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:66
#26 0x0000563ca7967467 in __gnu_cxx::new_allocator<doris::MemTracker>::construct<doris::MemTracker, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__p=0x563eff46ad10, this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:154
#27 std::allocator_traits<std::allocator<doris::MemTracker> >::construct<doris::MemTracker, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__p=0x563eff46ad10, __a=...) at /var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:512
#28 std::_Sp_counted_ptr_inplace<doris::MemTracker, std::allocator<doris::MemTracker>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__a=..., this=0x563eff46ad00) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:519
#29 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<doris::MemTracker, std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__a=..., __p=<optimized out>, this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:650
#30 std::__shared_ptr<doris::MemTracker, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__tag=..., this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1337
#31 std::shared_ptr<doris::MemTracker>::shared_ptr<std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__tag=..., this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:409
#32 std::allocate_shared<doris::MemTracker, std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__a=...) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:861
#33 std::make_shared<doris::MemTracker, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > () at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:877
#34 doris::stream_load::NodeChannel::NodeChannel (this=this@entry=0x563d0ca7e110, parent=<optimized out>, index_channel=index_channel@entry=0x563d4b43f200, node_id=<optimized out>) at /data/TCHouse-D-1.2/be/src/exec/tablet_sink.cpp:49
#35 0x0000563cac74a82d in doris::stream_load::VNodeChannel::VNodeChannel (this=this@entry=0x563d0ca7e110, parent=<optimized out>, index_channel=index_channel@entry=0x563d4b43f200, node_id=<optimized out>) at /data/TCHouse-D-1.2/be/src/vec/sink/vtablet_sink.cpp:37
#36 0x0000563ca7967eaa in __gnu_cxx::new_allocator<doris::stream_load::VNodeChannel>::construct<doris::stream_load::VNodeChannel, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__p=0x563d0ca7e110, this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:154
#37 std::allocator_traits<std::allocator<doris::stream_load::VNodeChannel> >::construct<doris::stream_load::VNodeChannel, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__p=0x563d0ca7e110, __a=...) at /var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:512
#38 std::_Sp_counted_ptr_inplace<doris::stream_load::VNodeChannel, std::allocator<doris::stream_load::VNodeChannel>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__a=..., this=0x563d0ca7e100) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:519
#39 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<doris::stream_load::VNodeChannel, std::allocator<doris::stream_load::VNodeChannel>, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__a=..., __p=<optimized out>, this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:650
#40 std::__shared_ptr<doris::stream_load::VNodeChannel, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<doris::stream_load::VNodeChannel>, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__tag=..., this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1337
#41 std::shared_ptr<doris::stream_load::VNodeChannel>::shared_ptr<std::allocator<doris::stream_load::VNodeChannel>, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__tag=..., this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:409
#42 std::allocate_shared<doris::stream_load::VNodeChannel, std::allocator<doris::stream_load::VNodeChannel>, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__a=...) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:861
#43 std::make_shared<doris::stream_load::VNodeChannel, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> () at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:877
#44 doris::stream_load::IndexChannel::init (this=this@entry=0x563d4b43f200, state=state@entry=0x563e027a3500, tablets=...) at /data/TCHouse-D-1.2/be/src/exec/tablet_sink.cpp:705
#45 0x0000563ca79698d8 in doris::stream_load::OlapTableSink::prepare (this=this@entry=0x5644b9efe880, state=state@entry=0x563e027a3500) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1290
#46 0x0000563cac74db65 in doris::stream_load::VOlapTableSink::prepare (this=0x5644b9efe880, state=0x563e027a3500) at /data/TCHouse-D-1.2/be/src/vec/sink/vtablet_sink.cpp:450
#47 0x0000563ca7946c21 in doris::PlanFragmentExecutor::prepare (this=this@entry=0x563f55efd280, request=..., fragments_ctx=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/bits/unique_ptr.h:173
#48 0x0000563ca791f17c in doris::FragmentExecState::prepare (this=this@entry=0x563f55efd200, params=...) at /data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:238
#49 0x0000563ca7926629 in doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::PlanFragmentExecutor*)>) (this=this@entry=0x563cb6e93400, params=..., cb=...) at /data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:720
#50 0x0000563ca7928dab in doris::FragmentMgr::exec_plan_fragment (this=0x563cb6e93400, params=...) at /data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:564
#51 0x0000563ca7aaf507 in doris::PInternalServiceImpl::_exec_plan_fragment_impl (this=this@entry=0x563cb577b880, ser_request=..., version=<optimized out>, compact=<optimized out>) at /data/TCHouse-D-1.2/be/src/service/internal_service.cpp:480
#52 0x0000563ca7aaf703 in doris::PInternalServiceImpl::_exec_plan_fragment_in_pthread (this=0x563cb577b880, controller=<optimized out>, request=0x563d6c5bb9b0, response=0x564203b92ce0, done=0x563d885a60c0) at /data/TCHouse-D-1.2/be/src/service/internal_service.cpp:254
#53 0x0000563ca78d6dbd in std::function<void ()>::operator()() const (this=0x7f1f7fe090c8) at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
#54 doris::PriorityThreadPool::work_thread (this=0x563cb577ba38, thread_id=<optimized out>) at /data/TCHouse-D-1.2/be/src/util/priority_thread_pool.hpp:145
#55 0x0000563caf526b00 in std::execute_native_thread_routine (__p=0x563cbfc81d10) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#56 0x00007f219771dea5 in start_thread () from /lib64/libpthread.so.0
#57 0x00007f2197a309fd in clone () from /lib64/libc.so.6
```

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
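The self-deadlock in steps 1 and 2 can be reproduced in miniature without tcmalloc. The sketch below is a simplified model under stated assumptions, not Doris code: `HookedAllocator` stands in for tcmalloc's new hook by calling a hypothetical `new_hook()` after every allocation, `TrackerGroup` keeps only a `std::mutex` and a `std::list`, and every allocation is assumed to exceed the memory limit. Locking a `std::mutex` that the calling thread already owns is undefined behaviour (in the BE it blocked forever), so the model records the second lock attempt in a thread-local flag instead of actually taking the lock.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <new>

// All names below are hypothetical stand-ins for the BE code in the stack trace.
thread_local bool t_holds_group_lock = false;  // true while bind_parent holds group_lock
bool g_print_log_on_exceed = true;             // true: old exceeded(); false: after this commit
int g_relock_attempts = 0;                     // times the hook tried to re-take group_lock

void new_hook(std::size_t size);

// Calls new_hook after each allocation, like MallocHook::InvokeNewHook (frames #12-#14).
template <typename T>
struct HookedAllocator {
    using value_type = T;
    HookedAllocator() = default;
    template <typename U>
    HookedAllocator(const HookedAllocator<U>&) noexcept {}
    T* allocate(std::size_t n) {
        T* p = static_cast<T*>(::operator new(n * sizeof(T)));
        new_hook(n * sizeof(T));
        return p;
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const HookedAllocator<T>&, const HookedAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const HookedAllocator<T>&, const HookedAllocator<U>&) noexcept { return false; }

struct TrackerGroup {
    std::mutex group_lock;
    std::list<int, HookedAllocator<int>> trackers;  // ints stand in for MemTracker*
};
TrackerGroup g_group;

// Old print_log_usage -> make_group_snapshot path: needs group_lock.
void make_group_snapshot() {
    if (t_holds_group_lock) {
        // The real code blocks here forever on a mutex its own thread holds.
        ++g_relock_attempts;
        return;
    }
    std::lock_guard<std::mutex> l(g_group.group_lock);
    // ... copy per-tracker usage into a snapshot ...
}

// Model of ThreadMemTrackerMgr::exceeded(); the limit is assumed to be exceeded.
void exceeded() {
    if (g_print_log_on_exceed) {
        make_group_snapshot();  // the call this commit comments out
    }
}

void new_hook(std::size_t /*size*/) { exceeded(); }

// Model of MemTracker::bind_parent: list::insert allocates a node while
// group_lock is held, so new_hook runs on the same thread inside the lock.
void bind_parent(int tracker_id) {
    assert(!t_holds_group_lock && "bind_parent must not be re-entered under group_lock");
    std::lock_guard<std::mutex> l(g_group.group_lock);
    t_holds_group_lock = true;
    g_group.trackers.insert(g_group.trackers.end(), tracker_id);
    t_holds_group_lock = false;
}

// Returns how many times one bind_parent call tried to re-take group_lock.
int count_relocks(bool print_log_on_exceed) {
    g_print_log_on_exceed = print_log_on_exceed;
    g_relock_attempts = 0;
    bind_parent(static_cast<int>(g_group.trackers.size()));
    return g_relock_attempts;
}
```

Calling `count_relocks(true)` models the old `exceeded()` and reports at least one attempt to re-take `group_lock` from inside the allocation hook; `count_relocks(false)` models the code after this commit and reports none, because nothing on the hook path touches the group lock any more.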
1 parent 1deb9d1, commit 353cb94


1 file changed (+3, -1 lines)


be/src/runtime/memory/thread_mem_tracker_mgr.cpp

Lines changed: 3 additions & 1 deletion

```diff
@@ -56,7 +56,9 @@ void ThreadMemTrackerMgr::exceeded(int64_t size) {
     if (_cb_func != nullptr) {
         _cb_func();
     }
-    _limiter_tracker_raw->print_log_usage(_exceed_mem_limit_msg);
+
+    // avoid deadlock, do not print log here:
+    // _limiter_tracker_raw->print_log_usage(_exceed_mem_limit_msg);
 
     if (is_attach_query()) {
         if (_is_process_exceed && _wait_gc) {
```
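Commenting out `print_log_usage` removes the only lock taken on this hook path, at the cost of losing the usage log when the limit is exceeded. If that log were wanted back, one common alternative (not what this commit does) is a thread-local re-entry guard that every code path holding `group_lock` sets, so hook-driven code can tell that it must not take the lock. `GroupLockReentryGuard` and `exceeded_with_guard()` below are hypothetical illustrations, not Doris APIs.

```cpp
#include <cassert>

// Hypothetical RAII guard, not a Doris API. Code that holds
// TrackerGroup::group_lock would create one for the duration of the lock.
class GroupLockReentryGuard {
public:
    GroupLockReentryGuard() : _prev(t_active) { t_active = true; }
    ~GroupLockReentryGuard() {
        assert(t_active);  // a live guard always leaves the flag set
        t_active = _prev;
    }
    GroupLockReentryGuard(const GroupLockReentryGuard&) = delete;
    GroupLockReentryGuard& operator=(const GroupLockReentryGuard&) = delete;

    // True if this thread is currently inside a guarded section.
    static bool active() { return t_active; }

private:
    static thread_local bool t_active;
    bool _prev;  // restores the outer state, so guards can nest
};

thread_local bool GroupLockReentryGuard::t_active = false;

int g_usage_logs = 0;  // counts calls that would reach print_log_usage()

// Hypothetical variant of exceeded(): log only when group_lock is not held.
void exceeded_with_guard() {
    if (GroupLockReentryGuard::active()) {
        return;  // inside group_lock: skip the log instead of re-locking
    }
    ++g_usage_logs;  // stands in for _limiter_tracker_raw->print_log_usage(...)
}
```

The trade-off is that such a guard only protects the paths that remember to set it; any other caller that holds `group_lock` and allocates would still deadlock. Removing the lock from the hook path altogether, as this commit does, avoids relying on that discipline.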
