Skip to content

Commit 76ae70b

Browse files
nsrip-ddtaegyunkim
andauthored
fix(profiling): fix data race when accessing span for thread [backport 2.15] (#11211)
Backport 64b3374 from #11167 to 2.15. The ThreadSpanLinks singleton holds the active span (if one exists) for a given thread ID. The `get_active_span_from_thread_id` member function returns a pointer to the active span for a thread. The `link_span` member function sets the active span for a thread. `get_active_span_from_thread_id` accesses the map of spans under a mutex, but returns the pointer after releasing the mutex, meaning `link_span` can modify the members of the Span while the caller of `get_active_span_from_thread_id` is reading them. Fix this by returning a copy of the `Span`. Use a `std::optional` to wrap the return value of `get_active_span_from_thread_id`, rather than returning a pointer. We want to tell whether or not there actually was a span associated with the thread, but returning a pointer would require us to heap allocate the copy of the Span. I added a simplistic regression test which fails reliably without this fix when built with the thread sanitizer enabled. Output like: ``` WARNING: ThreadSanitizer: data race (pid=2971510) Read of size 8 at 0x7b2000004080 by thread T2: #0 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:823 (libtsan.so.0+0x42313) #1 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:815 (libtsan.so.0+0x42313) #2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) <null> (libstdc++.so.6+0x1432b4) #3 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::__invoke_impl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>(std::__invoke_other, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*&&)()) <null> (thread_span_links+0xe46e) #4 std::__invoke_result<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>::type std::__invoke<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*&&)()) <null> (thread_span_links+0xe2fe) #5 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (thread_span_links+0xe1cf) #6 std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> >::operator()() <null> (thread_span_links+0xe0f6) #7 std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> > >::_M_run() <null> (thread_span_links+0xdf40) #8 <null> <null> (libstdc++.so.6+0xd6df3) Previous write of size 8 at 0x7b2000004080 by thread T1 (mutexes: write M47): #0 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:823 (libtsan.so.0+0x42313) #1 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:815 (libtsan.so.0+0x42313) #2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocato r<char> > const&) <null> (libstdc++.so.6+0x1432b4) #3 get() <null> (thread_span_links+0xb570) #4 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) <null> (thread_span_links+0xe525) #5 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) <null> (thread_span_links+0xe3b5) #6 void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (thread_span_links+0xe242) #7 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() <null> (thread_span_links+0xe158) [ ... etc ... ] ``` (cherry picked from commit 64b3374) ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting) Co-authored-by: Taegyun Kim <[email protected]>
1 parent 8a28a5d commit 76ae70b

File tree

7 files changed

+95
-8
lines changed

7 files changed

+95
-8
lines changed

ddtrace/internal/datadog/profiling/stack_v2/CMakeLists.txt

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -117,3 +117,9 @@ if(LIB_INSTALL_DIR)
117117
ARCHIVE DESTINATION ${LIB_INSTALL_DIR}
118118
RUNTIME DESTINATION ${LIB_INSTALL_DIR})
119119
endif()
120+
121+
if(BUILD_TESTING)
122+
enable_testing()
123+
add_subdirectory(test)
124+
endif()
125+

ddtrace/internal/datadog/profiling/stack_v2/include/thread_span_links.hpp

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@
22

33
#include <memory>
44
#include <mutex>
5+
#include <optional>
56
#include <stdint.h>
67
#include <string>
78
#include <unordered_map>
@@ -36,8 +37,7 @@ class ThreadSpanLinks
3637
ThreadSpanLinks& operator=(ThreadSpanLinks const&) = delete;
3738

3839
void link_span(uint64_t thread_id, uint64_t span_id, uint64_t local_root_span_id, std::string span_type);
39-
40-
const Span* get_active_span_from_thread_id(uint64_t thread_id);
40+
const std::optional<Span> get_active_span_from_thread_id(uint64_t thread_id);
4141

4242
static void postfork_child();
4343

ddtrace/internal/datadog/profiling/stack_v2/src/stack_renderer.cpp

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -55,8 +55,8 @@ StackRenderer::render_thread_begin(PyThreadState* tstate,
5555
ddup_push_threadinfo(sample, static_cast<int64_t>(thread_id), static_cast<int64_t>(native_id), name);
5656
ddup_push_walltime(sample, thread_state.wall_time_ns, 1);
5757

58-
const Span* active_span = ThreadSpanLinks::get_instance().get_active_span_from_thread_id(thread_id);
59-
if (active_span != nullptr) {
58+
const std::optional<Span> active_span = ThreadSpanLinks::get_instance().get_active_span_from_thread_id(thread_id);
59+
if (active_span) {
6060
ddup_push_span_id(sample, active_span->span_id);
6161
ddup_push_local_root_span_id(sample, active_span->local_root_span_id);
6262
ddup_push_trace_type(sample, std::string_view(active_span->span_type));

ddtrace/internal/datadog/profiling/stack_v2/src/thread_span_links.cpp

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@
22

33
#include <iostream>
44
#include <mutex>
5+
#include <optional>
56
#include <stdint.h>
67
#include <string>
78

@@ -19,15 +20,17 @@ ThreadSpanLinks::link_span(uint64_t thread_id, uint64_t span_id, uint64_t local_
1920
thread_id_to_span[thread_id]->span_type = span_type;
2021
}
2122

22-
const Span*
23+
const std::optional<Span>
2324
ThreadSpanLinks::get_active_span_from_thread_id(uint64_t thread_id)
2425
{
2526
std::lock_guard<std::mutex> lock(mtx);
2627

27-
if (thread_id_to_span.find(thread_id) == thread_id_to_span.end()) {
28-
return nullptr;
28+
std::optional<Span> span;
29+
auto it = thread_id_to_span.find(thread_id);
30+
if (it != thread_id_to_span.end()) {
31+
span = *(it->second);
2932
}
30-
return thread_id_to_span[thread_id].get();
33+
return span;
3134
}
3235

3336
void
Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
FetchContent_Declare(
2+
googletest
3+
GIT_REPOSITORY https://github.com/google/googletest.git
4+
GIT_TAG release-1.11.0)
5+
set(gtest_force_shared_crt
6+
ON
7+
CACHE BOOL "" FORCE)
8+
set(INSTALL_GTEST
9+
OFF
10+
CACHE BOOL "" FORCE)
11+
FetchContent_MakeAvailable(googletest)
12+
include(GoogleTest)
13+
include(AnalysisFunc)
14+
15+
function(dd_wrapper_add_test name)
16+
add_executable(${name} ${ARGN})
17+
target_include_directories(${name} PRIVATE ../include)
18+
target_link_libraries(${name} PRIVATE gmock gtest_main _stack_v2)
19+
add_ddup_config(${name})
20+
21+
gtest_discover_tests(${name})
22+
endfunction()
23+
24+
# Add the tests
25+
dd_wrapper_add_test(thread_span_links thread_span_links.cpp)
Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
1+
#include <gtest/gtest.h>
2+
3+
#include <string>
4+
#include <thread>
5+
6+
#include "thread_span_links.hpp"
7+
8+
static void
9+
get()
10+
{
11+
for (int i = 0; i < 100; i++) {
12+
std::string span_type;
13+
for (int j = 0; j < i; j++) {
14+
span_type.append("a");
15+
}
16+
Datadog::ThreadSpanLinks::get_instance().link_span(42, 1, 2, span_type);
17+
}
18+
}
19+
20+
static std::string
21+
set()
22+
{
23+
std::string s;
24+
for (int i = 0; i < 100; i++) {
25+
auto thing = Datadog::ThreadSpanLinks::get_instance().get_active_span_from_thread_id(42);
26+
if (!thing) {
27+
continue;
28+
}
29+
s = thing->span_type;
30+
}
31+
return s;
32+
}
33+
34+
TEST(ThreadSpanLinksConcurrency, GetSetRace)
35+
{
36+
std::thread t1(get);
37+
std::thread t2(set);
38+
t1.join();
39+
t2.join();
40+
}
41+
42+
int
43+
main(int argc, char** argv)
44+
{
45+
::testing::InitGoogleTest(&argc, argv);
46+
(void)(::testing::GTEST_FLAG(death_test_style) = "threadsafe");
47+
return RUN_ALL_TESTS();
48+
}
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
---
2+
fixes:
3+
- |
4+
profiling: fix a data race where span information associated with a thread
5+
was read and updated concurrently, leading to segfaults

0 commit comments

Comments
 (0)