Skip to content

Commit c11edb8

Browse files
fix(profiling): reset all profiling c++ mutexes on fork [backport 2.19] (#11993)
Backport d855c4a from #11768 to 2.19. I'm not sure why it took so long to surface this defect, but it turns out that stack v2 can deadlock applications because not all mutices are reset. The repro in #11762 appears to be pretty durable. I need to investigate it a bit more in order to distill it down into a native stress test we can use moving forward. In practice, this patch suppresses the noted behavior in the repro. ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting) Co-authored-by: David Sanchez <[email protected]>
1 parent c9ea15c commit c11edb8

File tree

10 files changed

+25
-8
lines changed

10 files changed

+25
-8
lines changed

ddtrace/internal/datadog/profiling/dd_wrapper/include/uploader_builder.hpp

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,8 @@ class UploaderBuilder
4343
static void set_output_filename(std::string_view _output_filename);
4444

4545
static std::variant<Uploader, std::string> build();
46+
47+
static void postfork_child();
4648
};
4749

4850
} // namespace Datadog

ddtrace/internal/datadog/profiling/dd_wrapper/src/code_provenance.cpp

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,9 +14,8 @@ namespace Datadog {
1414
void
1515
Datadog::CodeProvenance::postfork_child()
1616
{
17-
get_instance().mtx.~mutex(); // Destroy the mutex
17+
// NB placement-new to re-init and leak the mutex because doing anything else is UB
1818
new (&get_instance().mtx) std::mutex(); // Recreate the mutex
19-
get_instance().reset(); // Reset the state
2019
}
2120

2221
void

ddtrace/internal/datadog/profiling/dd_wrapper/src/ddup_interface.cpp

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@ ddup_postfork_child()
2424
Datadog::Uploader::postfork_child();
2525
Datadog::SampleManager::postfork_child();
2626
Datadog::CodeProvenance::postfork_child();
27+
Datadog::UploaderBuilder::postfork_child();
2728
}
2829

2930
void

ddtrace/internal/datadog/profiling/dd_wrapper/src/profile.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -203,6 +203,6 @@ Datadog::Profile::collect(const ddog_prof_Sample& sample, int64_t endtime_ns)
203203
void
204204
Datadog::Profile::postfork_child()
205205
{
206-
profile_mtx.unlock();
206+
new (&profile_mtx) std::mutex();
207207
cycle_buffers();
208208
}

ddtrace/internal/datadog/profiling/dd_wrapper/src/sample.cpp

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -408,7 +408,6 @@ Datadog::Sample::push_absolute_ns(int64_t _timestamp_ns)
408408
return true;
409409
}
410410

411-
412411
bool
413412
Datadog::Sample::push_monotonic_ns(int64_t _monotonic_ns)
414413
{

ddtrace/internal/datadog/profiling/dd_wrapper/src/uploader.cpp

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -175,5 +175,6 @@ Datadog::Uploader::postfork_parent()
175175
void
176176
Datadog::Uploader::postfork_child()
177177
{
178-
unlock();
178+
// NB placement-new to re-init and leak the mutex because doing anything else is UB
179+
new (&upload_lock) std::mutex();
179180
}

ddtrace/internal/datadog/profiling/dd_wrapper/src/uploader_builder.cpp

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -186,3 +186,10 @@ Datadog::UploaderBuilder::build()
186186

187187
return Datadog::Uploader{ output_filename, ddog_exporter };
188188
}
189+
190+
void
191+
Datadog::UploaderBuilder::postfork_child()
192+
{
193+
// NB placement-new to re-init and leak the mutex because doing anything else is UB
194+
new (&tag_mutex) std::mutex();
195+
}

ddtrace/internal/datadog/profiling/stack_v2/src/sampler.cpp

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -67,6 +67,11 @@ _stack_v2_atfork_child()
6767
// so we don't even reveal this function to the user
6868
_set_pid(getpid());
6969
ThreadSpanLinks::postfork_child();
70+
71+
// `thread_info_map_lock` and `task_link_map_lock` are global locks held in echion
72+
// NB placement-new to re-init and leak the mutex because doing anything else is UB
73+
new (&thread_info_map_lock) std::mutex;
74+
new (&task_link_map_lock) std::mutex;
7075
}
7176

7277
__attribute__((constructor)) void

ddtrace/internal/datadog/profiling/stack_v2/src/thread_span_links.cpp

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -53,10 +53,8 @@ ThreadSpanLinks::reset()
5353
void
5454
ThreadSpanLinks::postfork_child()
5555
{
56-
// Explicitly destroy and reconstruct the mutex to avoid undefined behavior
57-
get_instance().mtx.~mutex();
56+
// NB placement-new to re-init and leak the mutex because doing anything else is UB
5857
new (&get_instance().mtx) std::mutex();
59-
6058
get_instance().reset();
6159
}
6260

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
---
2+
fixes:
3+
- |
4+
profiling: Fixes a bug where profiling mutexes were not cleared on fork in the child process. This could
5+
cause deadlocks in certain configurations.

0 commit comments

Comments
 (0)