Skip to content

Commit 7c23716

Browse files
lateralusXUnityAlex
authored andcommitted
Wait on runtime threads to park on joinable thread list during shutdown.
mono#5599 fixed a race condition during shutdown when runtime threads have come parts of their way through detach, but still depend on runtime resources, like GC memory. The fix added runtime threads to the joinable thread list just before they vanished from mono_thread_manage radar making sure shutdown waited upon the thread before cleaning up. The above fix slightly changed the behavior of the finalizer thread since it waits on joinable threads and will now potential block on threads still executing code (that involves runtime resources). There’s was an assumption around the threads on the joinable thread list that they should be very close to complete when added, so join calls coming from the finalizer thread should almost never block and if it does, the code that remains to execute should not involve runtime operations risking deadlock situations. Adding the thread to the list earlier than previously done expose the shutdown to some potential theoretical problems. To mitigate the risk and still solve the race condition this commit adds a mechanism to keep track of active runtime threads until they park on joinable thread list. The pending counter will be waited upon by the shutdown thread, just before it does its regular wait on all joinable threads (after finalizer thread has stopped) to make sure all runtime threads have been added to the joinable thread list before waiting upon them. Threads are added to the joinable thread as late as possible, exactly how it’s been done in the past by sgen_client_thread_detach_with_lock. Shutdown thread will wait on runtime threads to appear on the list for a short time and if timeout (pending runtime thread count not reaching 0 before timeout), it will just print a warning and continue shutdown. Getting into a wait state during shutdown due to runtime threads not yet added to joinable threads list should be very rare (hitting previous race condition that was rare), triggering the timeout should be even more rare, and if that ever happens, we are exposed to shutdown race condition as we have had in the past, but now we at least get a warning in the log making it simpler to analyze further. This commit also fixes a problem with the debugger thread hitting the same race condition as above. The shutdown thread stopping the debugger thread didn't completely wait for it to stop using runtime resources before continue shutdown sequence. This triggers the same race condition as when shutting down regular runtime threads. This commit makes sure stop_debugger_thread waits on the debugger thread handle to become signaled (happens at the very end of thread lifetime) before continuing the shutdown logic.
1 parent 2823f4f commit 7c23716

File tree

3 files changed

+125
-25
lines changed

3 files changed

+125
-25
lines changed

mono/metadata/threads.c

Lines changed: 123 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -160,6 +160,14 @@ static MonoGHashTable *threads_starting_up = NULL;
160160
static GHashTable *joinable_threads;
161161
static gint32 joinable_thread_count;
162162

163+
static GHashTable *pending_joinable_threads;
164+
static gint32 pending_joinable_thread_count;
165+
166+
static mono_cond_t zero_pending_joinable_thread_event;
167+
168+
static void threads_add_pending_joinable_runtime_thread (MonoThreadInfo *mono_thread_info);
169+
static gboolean threads_wait_pending_joinable_threads (uint32_t timeout);
170+
163171
#define SET_CURRENT_OBJECT(x) mono_tls_set_thread (x)
164172
#define GET_CURRENT_OBJECT() (MonoInternalThread*) mono_tls_get_thread ()
165173

@@ -763,6 +771,17 @@ mono_thread_detach_internal (MonoInternalThread *thread)
763771

764772
MONO_PROFILER_RAISE (thread_stopping, (thread->tid));
765773

774+
/*
775+
* Prevent race condition between thread shutdown and runtime shutdown.
776+
* Including all runtime threads in the pending joinable count will make
777+
* sure shutdown will wait for it to get onto the joinable thread list before
778+
* critical resources have been cleanup (like GC memory). Threads getting onto
779+
* the joinable thread list should just about to exit and not blocking a potential
780+
* join call. Owner of threads attached to the runtime but not identified as runtime
781+
* threads needs to make sure thread detach calls won't race with runtime shutdown.
782+
*/
783+
threads_add_pending_joinable_runtime_thread (info);
784+
766785
#ifndef HOST_WIN32
767786
mono_w32mutex_abandon ();
768787
#endif
@@ -775,17 +794,6 @@ mono_thread_detach_internal (MonoInternalThread *thread)
775794
thread->abort_exc = NULL;
776795
thread->current_appcontext = NULL;
777796

778-
/*
779-
* Prevent race condition between execution of this method and runtime shutdown.
780-
* Adding runtime thread to the joinable threads list will make sure runtime shutdown
781-
* won't complete until added runtime thread have exited. Owner of threads attached to the
782-
* runtime but not identified as runtime threads needs to make sure thread detach calls won't
783-
* race with runtime shutdown.
784-
*/
785-
#ifdef HOST_WIN32
786-
mono_threads_add_joinable_runtime_thread (info);
787-
#endif
788-
789797
/*
790798
* thread->synch_cs can be NULL if this was called after
791799
* ves_icall_System_Threading_InternalThread_Thread_free_internal.
@@ -3031,6 +3039,8 @@ void mono_thread_init (MonoThreadStartCB start_cb,
30313039

30323040
mono_os_event_init (&background_change_event, FALSE);
30333041

3042+
mono_os_cond_init (&zero_pending_joinable_thread_event);
3043+
30343044
mono_init_static_data_info (&thread_static_info);
30353045
mono_init_static_data_info (&context_static_info);
30363046

@@ -3122,6 +3132,13 @@ mono_thread_callbacks_init (void)
31223132
void
31233133
mono_thread_cleanup (void)
31243134
{
3135+
/* Wait for pending threads to park on joinable threads list */
3136+
/* NOTE, waiting on this should be extremely rare and will only happen */
3137+
/* under certain specific conditions. */
3138+
gboolean wait_result = threads_wait_pending_joinable_threads (2000);
3139+
if (!wait_result)
3140+
g_warning ("Waiting on threads to park on joinable thread list timed out.");
3141+
31253142
mono_threads_join_threads ();
31263143

31273144
#if !defined(RUN_IN_SUBTHREAD) && !defined(HOST_WIN32)
@@ -3143,6 +3160,7 @@ mono_thread_cleanup (void)
31433160
mono_os_mutex_destroy (&interlocked_mutex);
31443161
mono_os_mutex_destroy (&delayed_free_table_mutex);
31453162
mono_os_mutex_destroy (&small_id_mutex);
3163+
mono_os_cond_destroy (&zero_pending_joinable_runtime_thread_event);
31463164
mono_os_event_destroy (&background_change_event);
31473165
#endif
31483166
}
@@ -3367,6 +3385,7 @@ mono_thread_manage (void)
33673385
mono_threads_unlock ();
33683386
return;
33693387
}
3388+
33703389
mono_threads_unlock ();
33713390

33723391
do {
@@ -5221,15 +5240,105 @@ threads_add_joinable_thread_nolock (gpointer tid)
52215240
}
52225241
#endif
52235242

5243+
static void
5244+
threads_add_pending_joinable_thread (gpointer tid)
5245+
{
5246+
joinable_threads_lock ();
5247+
5248+
if (!pending_joinable_threads)
5249+
pending_joinable_threads = g_hash_table_new (NULL, NULL);
5250+
5251+
gpointer orig_key;
5252+
gpointer value;
5253+
5254+
if (!g_hash_table_lookup_extended (pending_joinable_threads, tid, &orig_key, &value)) {
5255+
g_hash_table_insert (pending_joinable_threads, tid, tid);
5256+
UnlockedIncrement (&pending_joinable_thread_count);
5257+
}
5258+
5259+
joinable_threads_unlock ();
5260+
}
5261+
5262+
static void
5263+
threads_add_pending_joinable_runtime_thread (MonoThreadInfo *mono_thread_info)
5264+
{
5265+
g_assert (mono_thread_info);
5266+
5267+
if (mono_thread_info->runtime_thread) {
5268+
threads_add_pending_joinable_thread ((gpointer)(MONO_UINT_TO_NATIVE_THREAD_ID (mono_thread_info_get_tid (mono_thread_info))));
5269+
}
5270+
}
5271+
5272+
static void
5273+
threads_remove_pending_joinable_thread_nolock (gpointer tid)
5274+
{
5275+
gpointer orig_key;
5276+
gpointer value;
5277+
5278+
if (pending_joinable_threads && g_hash_table_lookup_extended (pending_joinable_threads, tid, &orig_key, &value)) {
5279+
g_hash_table_remove (pending_joinable_threads, tid);
5280+
if (UnlockedDecrement (&pending_joinable_thread_count) == 0)
5281+
mono_os_cond_broadcast (&zero_pending_joinable_thread_event);
5282+
}
5283+
}
5284+
5285+
static gboolean
5286+
threads_wait_pending_joinable_threads (uint32_t timeout)
5287+
{
5288+
if (UnlockedRead (&pending_joinable_thread_count) > 0) {
5289+
joinable_threads_lock ();
5290+
if (timeout == MONO_INFINITE_WAIT) {
5291+
while (UnlockedRead (&pending_joinable_thread_count) > 0)
5292+
mono_os_cond_wait (&zero_pending_joinable_thread_event, &joinable_threads_mutex);
5293+
} else {
5294+
gint64 start = mono_msec_ticks ();
5295+
gint64 elapsed = 0;
5296+
while (UnlockedRead (&pending_joinable_thread_count) > 0 && elapsed < timeout) {
5297+
mono_os_cond_timedwait (&zero_pending_joinable_thread_event, &joinable_threads_mutex, timeout - (uint32_t)elapsed);
5298+
elapsed = mono_msec_ticks () - start;
5299+
}
5300+
}
5301+
joinable_threads_unlock ();
5302+
}
5303+
5304+
return UnlockedRead (&pending_joinable_thread_count) == 0;
5305+
}
5306+
5307+
static void
5308+
threads_add_unique_joinable_thread_nolock (gpointer tid)
5309+
{
5310+
if (!joinable_threads)
5311+
joinable_threads = g_hash_table_new (NULL, NULL);
5312+
5313+
gpointer orig_key;
5314+
gpointer value;
5315+
5316+
if (!g_hash_table_lookup_extended (joinable_threads, tid, &orig_key, &value)) {
5317+
threads_add_joinable_thread_nolock (tid);
5318+
UnlockedIncrement (&joinable_thread_count);
5319+
}
5320+
}
5321+
52245322
void
52255323
mono_threads_add_joinable_runtime_thread (gpointer thread_info)
52265324
{
52275325
g_assert (thread_info);
52285326
MonoThreadInfo *mono_thread_info = (MonoThreadInfo*)thread_info;
52295327

52305328
if (mono_thread_info->runtime_thread) {
5231-
if (mono_atomic_cas_i32 (&mono_thread_info->thread_pending_native_join, TRUE, FALSE) == FALSE)
5232-
mono_threads_add_joinable_thread ((gpointer)(MONO_UINT_TO_NATIVE_THREAD_ID (mono_thread_info_get_tid (mono_thread_info))));
5329+
gpointer tid = (gpointer)(MONO_UINT_TO_NATIVE_THREAD_ID (mono_thread_info_get_tid (mono_thread_info)));
5330+
5331+
joinable_threads_lock ();
5332+
5333+
// Add to joinable thread list, if not already included.
5334+
threads_add_unique_joinable_thread_nolock (tid);
5335+
5336+
// Remove thread from pending joinable list, if present.
5337+
threads_remove_pending_joinable_thread_nolock (tid);
5338+
5339+
joinable_threads_unlock ();
5340+
5341+
mono_gc_finalize_notify ();
52335342
}
52345343
}
52355344

@@ -5248,15 +5357,7 @@ mono_threads_add_joinable_thread (gpointer tid)
52485357
* we have time (in the finalizer thread).
52495358
*/
52505359
joinable_threads_lock ();
5251-
if (!joinable_threads)
5252-
joinable_threads = g_hash_table_new (NULL, NULL);
5253-
5254-
gpointer orig_key;
5255-
gpointer value;
5256-
if (!g_hash_table_lookup_extended (joinable_threads, tid, &orig_key, &value)) {
5257-
threads_add_joinable_thread_nolock (tid);
5258-
UnlockedIncrement (&joinable_thread_count);
5259-
}
5360+
threads_add_unique_joinable_thread_nolock (tid);
52605361
joinable_threads_unlock ();
52615362

52625363
mono_gc_finalize_notify ();

mono/mini/debugger-agent.c

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1861,7 +1861,8 @@ stop_debugger_thread (void)
18611861
mono_coop_mutex_unlock (&debugger_thread_exited_mutex);
18621862
} while (!debugger_thread_exited);
18631863

1864-
mono_native_thread_join (debugger_thread_id);
1864+
if (debugger_thread_handle)
1865+
mono_thread_info_wait_one_handle (debugger_thread_handle, MONO_INFINITE_WAIT, TRUE);
18651866
}
18661867

18671868
transport_close2 ();

mono/utils/mono-threads.h

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -231,8 +231,6 @@ typedef struct {
231231
*/
232232
gint32 profiler_signal_ack;
233233

234-
gint32 thread_pending_native_join;
235-
236234
#ifdef USE_WINDOWS_BACKEND
237235
gint32 win32_apc_info;
238236
gpointer win32_apc_info_io_handle;

0 commit comments

Comments
 (0)