Commit 353cb94
authored
[fix](be deadlock) avoid be deadlock because of MemTracker (#51321)
### What problem does this PR solve?
In branch-2.0, we already refractor these codes in pr:
#18590
Deadlock stack:
1、While load, we alloc MemTracker and need lock TrackerGroup.group_lock
```
NodeChannel::NodeChannel
_node_channel_tracker = std::make_shared<MemTracker>
MemTracker::bind_parent
std::lock_guard<std::mutex> l(mem_tracker_pool[_parent_group_num].group_lock);
_tracker_group_it = mem_tracker_pool[_parent_group_num].trackers.insert(
mem_tracker_pool[_parent_group_num].trackers.end(), this);
```
2、but while we try to call std::list::insert, we need alloc
std::_List_node,here new_hook (in file tcmalloc_hook.h) is triggered,
then we lock the same TrackerGroup.group_lock, make it deadlock
```
new_hook
doris::ThreadMemTrackerMgr::consume
doris::ThreadMemTrackerMgr::flush_untracked_mem<true, true>
doris::ThreadMemTrackerMgr::exceeded
doris::MemTrackerLimiter::print_log_usage
doris::MemTracker::make_group_snapshot
std::lock_guard<std::mutex> l(mem_tracker_pool[group_num].group_lock);
```
Full stack info:
```
(gdb) bt
#0 0x00007f219772454d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f219771fe9b in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007f219771fd68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000563ca7a8a824 in __gthread_mutex_lock (__mutex=0x563cb18cbc58)
at /var/local/ldb-toolchain/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749
#4 std::mutex::lock (this=0x563cb18cbc58) at /var/local/ldb-toolchain/include/c++/11/bits/std_mutex.h:100
#5 std::lock_guard<std::mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /var/local/ldb-toolchain/include/c++/11/bits/std_mutex.h:229
#6 doris::MemTracker::make_group_snapshot (snapshots=0x7f1f7fe06ea0, group_num=<optimized out>, parent_label=...)
at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:115
#7 0x0000563ca7a7eeb4 in doris::MemTrackerLimiter::print_log_usage (this=0x5640ab7956c0, msg=...)
at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker_limiter.cpp:198
#8 0x0000563ca7a8d1e4 in doris::ThreadMemTrackerMgr::exceeded (this=this@entry=0x563ce6a2c820, size=1048584)
at /data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.cpp:59
#9 0x0000563ca78cfeb4 in doris::ThreadMemTrackerMgr::flush_untracked_mem<true, true> (this=0x563ce6a2c820)
at /data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:223
#10 doris::ThreadMemTrackerMgr::consume (size=<optimized out>, this=0x563ce6a2c820)
at /data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:188
#11 doris::ThreadMemTrackerMgr::consume (size=<optimized out>, this=0x563ce6a2c820)
at /data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:178
#12 new_hook (ptr=<optimized out>, size=24) at /data/TCHouse-D-1.2/be/src/runtime/memory/tcmalloc_hook.h:39
#13 0x0000563caf3dfa78 in MallocHook::InvokeNewHookSlow (p=p@entry=0x56422d713b40, s=s@entry=24) at src/malloc_hook.cc:498
#14 0x0000563caf55f2c1 in MallocHook::InvokeNewHook (s=24, p=0x56422d713b40) at src/malloc_hook-inl.h:127
#15 tcmalloc::do_allocate_full<tcmalloc::cpp_throw_oom> (size=size@entry=24) at src/tcmalloc.cc:1805
#16 tcmalloc::allocate_full_cpp_throw_oom (size=size@entry=24) at src/tcmalloc.cc:1815
#17 0x0000563caf55f429 in tcmalloc::dispatch_allocate_full<tcmalloc::cpp_throw_oom> (size=24) at src/tcmalloc.cc:1822
#18 0x0000563ca7a8ab9a in __gnu_cxx::new_allocator<std::_List_node<doris::MemTracker*> >::allocate (__n=1, this=0x563cb18cbc40)
at /var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:103
#19 std::allocator_traits<std::allocator<std::_List_node<doris::MemTracker*> > >::allocate (__n=1, __a=...)
at /var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:460
#20 std::__cxx11::_List_base<doris::MemTracker*, std::allocator<doris::MemTracker*> >::_M_get_node (this=0x563cb18cbc40)
at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:442
#21 std::__cxx11::list<doris::MemTracker*, std::allocator<doris::MemTracker*> >::_M_create_node<doris::MemTracker*> (this=0x563cb18cbc40)
at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:634
#22 std::__cxx11::list<doris::MemTracker*, std::allocator<doris::MemTracker*> >::emplace<doris::MemTracker*> (__position=..., this=0x563cb18cbc40)
at /var/local/ldb-toolchain/include/c++/11/bits/list.tcc:92
#23 std::__cxx11::list<doris::MemTracker*, std::allocator<doris::MemTracker*> >::insert (__x=<optimized out>, __position=..., this=0x563cb18cbc40)
at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:1309
#24 doris::MemTracker::bind_parent (this=0x563eff46ad10, parent=<optimized out>) at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:79
#25 0x0000563ca7a8b608 in doris::MemTracker::MemTracker (this=this@entry=0x563eff46ad10, label=..., parent=parent@entry=0x0)
at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:66
#26 0x0000563ca7967467 in __gnu_cxx::new_allocator<doris::MemTracker>::construct<doris::MemTracker, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__p=0x563eff46ad10, this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:154
#27 std::allocator_traits<std::allocator<doris::MemTracker> >::construct<doris::MemTracker, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__p=0x563eff46ad10, __a=...) at /var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:512
#28 std::_Sp_counted_ptr_inplace<doris::MemTracker, std::allocator<doris::MemTracker>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__a=..., this=0x563eff46ad00)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:519
#29 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<doris::MemTracker, std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__a=..., __p=<optimized out>, this=<optimized out>)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:650
#30 std::__shared_ptr<doris::MemTracker, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__tag=..., this=<optimized out>)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1337
#31 std::shared_ptr<doris::MemTracker>::shared_ptr<std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__tag=..., this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:409
#32 std::allocate_shared<doris::MemTracker, std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__a=...) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:861
#33 std::make_shared<doris::MemTracker, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > ()
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:877
#34 doris::stream_load::NodeChannel::NodeChannel (this=this@entry=0x563d0ca7e110, parent=<optimized out>,
index_channel=index_channel@entry=0x563d4b43f200, node_id=<optimized out>) at /data/TCHouse-D-1.2/be/src/exec/tablet_sink.cpp:49
#35 0x0000563cac74a82d in doris::stream_load::VNodeChannel::VNodeChannel (this=this@entry=0x563d0ca7e110, parent=<optimized out>,
index_channel=index_channel@entry=0x563d4b43f200, node_id=<optimized out>) at /data/TCHouse-D-1.2/be/src/vec/sink/vtablet_sink.cpp:37
#36 0x0000563ca7967eaa in __gnu_cxx::new_allocator<doris::stream_load::VNodeChannel>::construct<doris::stream_load::VNodeChannel, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__p=0x563d0ca7e110, this=<optimized out>)
at /var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:154
#37 std::allocator_traits<std::allocator<doris::stream_load::VNodeChannel> >::construct<doris::stream_load::VNodeChannel, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__p=0x563d0ca7e110, __a=...) at /var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:512
#38 std::_Sp_counted_ptr_inplace<doris::stream_load::VNodeChannel, std::allocator<doris::stream_load::VNodeChannel>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__a=..., this=0x563d0ca7e100)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:519
#39 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<doris::stream_load::VNodeChannel, std::allocator<doris::stream_load::VNodeChannel>, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__a=..., __p=<optimized out>, this=<optimized out>)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:650
#40 std::__shared_ptr<doris::stream_load::VNodeChannel, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<doris::stream_load::VNodeChannel>, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__tag=..., this=<optimized out>)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1337
#41 std::shared_ptr<doris::stream_load::VNodeChannel>::shared_ptr<std::allocator<doris::stream_load::VNodeChannel>, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__tag=..., this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:409
#42 std::allocate_shared<doris::stream_load::VNodeChannel, std::allocator<doris::stream_load::VNodeChannel>, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__a=...) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:861
#43 std::make_shared<doris::stream_load::VNodeChannel, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> ()
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:877
#44 doris::stream_load::IndexChannel::init (this=this@entry=0x563d4b43f200, state=state@entry=0x563e027a3500, tablets=...)
at /data/TCHouse-D-1.2/be/src/exec/tablet_sink.cpp:705
#45 0x0000563ca79698d8 in doris::stream_load::OlapTableSink::prepare (this=this@entry=0x5644b9efe880, state=state@entry=0x563e027a3500)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1290
#46 0x0000563cac74db65 in doris::stream_load::VOlapTableSink::prepare (this=0x5644b9efe880, state=0x563e027a3500)
at /data/TCHouse-D-1.2/be/src/vec/sink/vtablet_sink.cpp:450
#47 0x0000563ca7946c21 in doris::PlanFragmentExecutor::prepare (this=this@entry=0x563f55efd280, request=..., fragments_ctx=<optimized out>)
at /var/local/ldb-toolchain/include/c++/11/bits/unique_ptr.h:173
#48 0x0000563ca791f17c in doris::FragmentExecState::prepare (this=this@entry=0x563f55efd200, params=...)
--Type <RET> for more, q to quit, c to continue without paging--
at /data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:238
#49 0x0000563ca7926629 in doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::PlanFragmentExecutor*)>) (this=this@entry=0x563cb6e93400, params=..., cb=...) at /data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:720
#50 0x0000563ca7928dab in doris::FragmentMgr::exec_plan_fragment (this=0x563cb6e93400, params=...)
at /data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:564
#51 0x0000563ca7aaf507 in doris::PInternalServiceImpl::_exec_plan_fragment_impl (this=this@entry=0x563cb577b880, ser_request=...,
version=<optimized out>, compact=<optimized out>) at /data/TCHouse-D-1.2/be/src/service/internal_service.cpp:480
#52 0x0000563ca7aaf703 in doris::PInternalServiceImpl::_exec_plan_fragment_in_pthread (this=0x563cb577b880, controller=<optimized out>,
request=0x563d6c5bb9b0, response=0x564203b92ce0, done=0x563d885a60c0) at /data/TCHouse-D-1.2/be/src/service/internal_service.cpp:254
#53 0x0000563ca78d6dbd in std::function<void ()>::operator()() const (this=0x7f1f7fe090c8)
at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
#54 doris::PriorityThreadPool::work_thread (this=0x563cb577ba38, thread_id=<optimized out>)
at /data/TCHouse-D-1.2/be/src/util/priority_thread_pool.hpp:145
#55 0x0000563caf526b00 in std::execute_native_thread_routine (__p=0x563cbfc81d10) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#56 0x00007f219771dea5 in start_thread () from /lib64/libpthread.so.0
#57 0x00007f2197a309fd in clone () from /lib64/libc.so.6
```
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [ ] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->1 parent 1deb9d1 commit 353cb94
1 file changed
+3
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
56 | 56 | | |
57 | 57 | | |
58 | 58 | | |
59 | | - | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
60 | 62 | | |
61 | 63 | | |
62 | 64 | | |
| |||
0 commit comments