Commit 0950778

MINOR: debug: add a function to dump a stuck thread
There's currently no way to just emit a warning informing that a thread is stuck without crashing. This is a problem because sometimes users would benefit from this info to clean up their configuration (e.g. abuse of map_regm, lua-load, etc.).

This commit adds a new function ha_stuck_warning() that will emit a warning indicating that the designated thread has been stuck for XX milliseconds, with a number of streams blocked, and will make that thread dump its own state. The warning will then be sent to stderr, along with some reminders about the impacts of such situations to encourage users to fix their configuration.

In order not to disrupt operations, a local 4kB buffer is allocated on the stack. This should be quite sufficient. For now the function is not used.
1 parent 3f4d646 commit 0950778
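
The buffer handling described in the commit message can be illustrated outside of HAProxy with a minimal sketch. It assumes plain `snprintf()` and `write()` in place of HAProxy's `chunk_printf()` and `struct buffer`, and it assumes the CPU-time samples are in nanoseconds, which is what the division by 1000000 to obtain milliseconds suggests. The helper names `format_stuck_warning()` and `emit_stuck_warning()` are hypothetical and do not exist in HAProxy.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of HAProxy: formats a stuck-thread warning
 * into a caller-provided buffer. <thr> is a 0-based thread index, <prev_ns>
 * and <now_ns> are CPU-time samples assumed to be in nanoseconds. Returns
 * the number of bytes stored (excluding the trailing NUL), truncated so
 * that it always fits in <cap>.
 */
static size_t format_stuck_warning(char *out, size_t cap, int thr,
                                   unsigned long long prev_ns,
                                   unsigned long long now_ns, int streams)
{
	int len;

	assert(out != NULL && cap > 0);
	len = snprintf(out, cap,
	               "WARNING! thread %d has stopped processing traffic for %llu milliseconds\n"
	               "    with %d streams currently blocked.\n",
	               thr + 1,                            /* threads are shown 1-based */
	               (now_ns - prev_ns) / 1000000ULL,    /* ns to ms */
	               streams);
	if (len < 0)
		return 0;
	/* snprintf() reports the untruncated length: clamp it to what was stored */
	return (size_t)len < cap ? (size_t)len : cap - 1;
}

/* Hypothetical helper: emits the warning on <fd> using only stack storage,
 * so that no allocation is needed while the process is in trouble.
 */
static ssize_t emit_stuck_warning(int fd, int thr, unsigned long long prev_ns,
                                  unsigned long long now_ns, int streams)
{
	char msg_buf[4096]; /* same size as in the commit, lives on the stack */
	size_t len = format_stuck_warning(msg_buf, sizeof(msg_buf), thr,
	                                  prev_ns, now_ns, streams);

	return write(fd, msg_buf, len);
}
```

Calling `emit_stuck_warning(2, thr, prev, now, streams)` would send the message to stderr. Keeping the formatting step separate from the `write()` call is only a choice made here to keep the sketch easy to check; the commit does both inside a single function.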

2 files changed, +79 -0 lines changed

include/haproxy/debug.h

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ void ha_thread_dump_one(int thr, int from_signal);
 void ha_dump_backtrace(struct buffer *buf, const char *prefix, int dump);
 void ha_backtrace_to_stderr(void);
 void ha_panic(void);
+void ha_stuck_warning(int thr);
 
 void post_mortem_add_component(const char *name, const char *version,
                                const char *toolchain, const char *toolchain_opts,

src/debug.c

Lines changed: 78 additions & 0 deletions
@@ -729,6 +729,84 @@ void ha_panic()
 	abort();
 }
 
+/* Dumps a state of the current thread on fd #2 and returns. It takes a great
+ * care about not using any global state variable so as to gracefully recover.
+ */
+void ha_stuck_warning(int thr)
+{
+	char msg_buf[4096];
+	struct buffer buf;
+	ullong n, p;
+
+	if (get_tainted() & TAINTED_PANIC) {
+		/* a panic dump is already in progress, let's not disturb it,
+		 * we'll be called via signal DEBUGSIG. By returning we may be
+		 * able to leave a current signal handler (e.g. WDT) so that
+		 * this will ensure more reliable signal delivery.
+		 */
+		return;
+	}
+
+	buf = b_make(msg_buf, sizeof(msg_buf), 0, 0);
+
+	p = HA_ATOMIC_LOAD(&ha_thread_ctx[thr].prev_cpu_time);
+	n = now_cpu_time_thread(thr);
+
+	chunk_printf(&buf,
+	             "\nWARNING! thread %u has stopped processing traffic for %llu milliseconds\n"
+	             " with %d streams currently blocked, prevented from making any progress.\n"
+	             " While this may occasionally happen with inefficient configurations\n"
+	             " involving excess of regular expressions, map_reg, or heavy Lua processing,\n"
+	             " this must remain exceptional because the system's stability is now at risk.\n"
+	             " Timers in logs may be reported incorrectly, spurious timeouts may happen,\n"
+	             " some incoming connections may silently be dropped, health checks may\n"
+	             " randomly fail, and accesses to the CLI may block the whole process. Please\n"
+	             " check the trace below for any clues about configuration elements that need\n"
+	             " to be corrected:\n\n",
+	             thr + 1, (n - p) / 1000000ULL,
+	             HA_ATOMIC_LOAD(&ha_thread_ctx[thr].stream_cnt));
+
+	DISGUISE(write(2, buf.area, buf.data));
+
+	/* Note below: the target thread will dump itself */
+	chunk_reset(&buf);
+	if (ha_thread_dump_fill(&buf, thr)) {
+		DISGUISE(write(2, buf.area, buf.data));
+		/* restore the thread's dump pointer for easier post-mortem analysis */
+		ha_thread_dump_done(NULL, thr);
+	}
+
+	chunk_printf(&buf, " => Trying to gracefully recover now.\n");
+	DISGUISE(write(2, buf.area, buf.data));
+
+#ifdef USE_LUA
+	if (get_tainted() & TAINTED_LUA_STUCK_SHARED && global.nbthread > 1) {
+		chunk_printf(&buf,
+		             "### Note: at least one thread was stuck in a Lua context loaded using the\n"
+		             " 'lua-load' directive, which is known for causing heavy contention\n"
+		             " when used with threads. Please consider using 'lua-load-per-thread'\n"
+		             " instead if your code is safe to run in parallel on multiple threads.\n");
+		DISGUISE(write(2, buf.area, buf.data));
+	}
+	else if (get_tainted() & TAINTED_LUA_STUCK) {
+		chunk_printf(&buf,
+		             "### Note: at least one thread was stuck in a Lua context in a way that suggests\n"
+		             " heavy processing inside a dependency or a long loop that can't yield.\n"
+		             " Please make sure any external code you may rely on is safe for use in\n"
+		             " an event-driven engine.\n");
+		DISGUISE(write(2, buf.area, buf.data));
+	}
+#endif
+	if (get_tainted() & TAINTED_MEM_TRIMMING_STUCK) {
+		chunk_printf(&buf,
+		             "### Note: one thread was found stuck under malloc_trim(), which can run for a\n"
+		             " very long time on large memory systems. You may want to disable this\n"
+		             " memory reclaiming feature by setting 'no-memory-trimming' in the\n"
+		             " 'global' section of your configuration to avoid this in the future.\n");
+		DISGUISE(write(2, buf.area, buf.data));
+	}
+}
+
 /* Complain with message <msg> on stderr. If <counter> is not NULL, it is
  * atomically incremented, and the message is only printed when the counter
  * was zero, so that the message is only printed once. <taint> is only checked
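
The way the trailing notes are selected from the taint flags can be summarized with a small standalone sketch. The flag values and the `ex_`-prefixed names below are illustrative assumptions: the real `TAINTED_*` constants are defined in HAProxy's headers and are not shown in this commit, and the `#ifdef USE_LUA` guard is left out for brevity. What the sketch preserves is the ordering: the shared-Lua note wins over the generic Lua note, but only when more than one thread is configured, and the memory-trimming note is evaluated independently.

```c
#include <assert.h>

/* Illustrative bit values only: the real TAINTED_* constants live in
 * HAProxy's headers and their values are not shown in this commit.
 */
enum ex_taint {
	EX_TAINTED_LUA_STUCK          = 1 << 0,
	EX_TAINTED_LUA_STUCK_SHARED   = 1 << 1,
	EX_TAINTED_MEM_TRIMMING_STUCK = 1 << 2,
};

/* Hypothetical identifiers for the three advisory notes. */
enum ex_note {
	EX_NOTE_LUA_SHARED  = 1 << 0, /* suggests 'lua-load-per-thread' */
	EX_NOTE_LUA_GENERIC = 1 << 1, /* heavy Lua code that cannot yield */
	EX_NOTE_MEM_TRIM    = 1 << 2, /* suggests 'no-memory-trimming' */
};

/* Returns the set of notes that would follow the warning for a given
 * combination of taint flags and configured thread count.
 */
static unsigned int ex_select_notes(unsigned int tainted, int nbthread)
{
	unsigned int notes = 0;

	assert(nbthread >= 1);

	/* '&' binds tighter than '&&', so the unparenthesized test in the
	 * diff is evaluated the same way as the explicit form used here.
	 */
	if ((tainted & EX_TAINTED_LUA_STUCK_SHARED) && nbthread > 1)
		notes |= EX_NOTE_LUA_SHARED;
	else if (tainted & EX_TAINTED_LUA_STUCK)
		notes |= EX_NOTE_LUA_GENERIC;

	/* independent of the Lua notes above */
	if (tainted & EX_TAINTED_MEM_TRIMMING_STUCK)
		notes |= EX_NOTE_MEM_TRIM;

	return notes;
}
```

One consequence visible in this sketch is that a process with a single thread and only the shared-Lua flag set gets no Lua note at all, since the contention advice would not apply to it.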
