Description
We observe a situation where UDT hangs completely, with many threads stuck waiting for m_ControlLock.
At this point the lock is held by the garbage collection thread (in checkBrokenSockets), which is blocked in pthread_join waiting for a receive queue worker thread to terminate:
(gdb) bt
#0 0x00007f5b9f593ef7 in pthread_join (threadid=140028744247040, thread_return=0x0) at pthread_join.c:92
#1 0x00007f5b5c3b6221 in CRcvQueue::~CRcvQueue() () from /tmp/udt_jndi_lib/lib/amd64-Linux-gpp/jni/libbarchart-udt-core-2.3.0-SNAPSHOT.so
#2 0x00007f5b5c39b0bd in CUDTUnited::removeSocket(int) () from /tmp/udt_jndi_lib/lib/amd64-Linux-gpp/jni/libbarchart-udt-core-2.3.0-SNAPSHOT.so
#3 0x00007f5b5c39baa2 in CUDTUnited::checkBrokenSockets() () from /tmp/udt_jndi_lib/lib/amd64-Linux-gpp/jni/libbarchart-udt-core-2.3.0-SNAPSHOT.so
#4 0x00007f5b5c39bc64 in CUDTUnited::garbageCollect(void*) () from /tmp/udt_jndi_lib/lib/amd64-Linux-gpp/jni/libbarchart-udt-core-2.3.0-SNAPSHOT.so
#5 0x00007f5b9f592dc5 in start_thread (arg=0x7f5b17fff700) at pthread_create.c:308
#6 0x00007f5b9eea628d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) frame 0
#0 0x00007f5b9f593ef7 in pthread_join (threadid=140028744247040, thread_return=0x0) at pthread_join.c:92
92 lll_wait_tid (pd->tid);
(gdb) print pd->tid
$3 = 17122
The worker thread being joined (LWP 17122, matching pd->tid above) seems to be stuck in recvmsg:
Thread 7 (Thread 0x7f5afb8f2700 (LWP 17122)):
#0 0x00007f5b9f59967d in recvmsg () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f5b5c3a0b2b in CChannel::recvfrom(sockaddr*, CPacket&) const () from /tmp/udt_jndi_lib/lib/amd64-Linux-gpp/jni/libbarchart-udt-core-2.3.0-SNAPSHOT.so
#2 0x00007f5b5c3b6fee in CRcvQueue::worker(void*) () from /tmp/udt_jndi_lib/lib/amd64-Linux-gpp/jni/libbarchart-udt-core-2.3.0-SNAPSHOT.so
#3 0x00007f5b9f592dc5 in start_thread (arg=0x7f5afb8f2700) at pthread_create.c:308
#4 0x00007f5b9eea628d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
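This doesn't seem to be a classic deadlock. It looks more like a problem with the blocking recvmsg call: the worker never returns from it, so the pthread_join in ~CRcvQueue never completes and m_ControlLock is never released.
Does anyone have an idea how this could happen?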
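To illustrate the general pattern in question, here is a minimal standalone sketch (not UDT code) of a worker that blocks in recvmsg but can still be joined. It assumes the socket has a receive timeout set via SO_RCVTIMEO and that the loop re-checks a closing flag after each wakeup. The names open_udp_with_timeout, worker, and shutdown_completes are hypothetical, and whether the UDT build above sets such a timeout, or why it might not take effect, is not established by the traces.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <thread>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper: open a UDP socket on loopback with a receive timeout.
// A zero timeout would mean "block forever", which is the failure mode here.
int open_udp_with_timeout(long timeout_usec) {
    assert(timeout_usec > 0);
    int fd = ::socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port

    timeval tv{};
    tv.tv_sec = timeout_usec / 1000000;
    tv.tv_usec = timeout_usec % 1000000;

    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        ::setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// Worker loop: recvmsg returns with EAGAIN/EWOULDBLOCK when the timeout
// expires, so the closing flag is re-checked even if no packet ever arrives.
void worker(int fd, const std::atomic<bool>* closing) {
    char buf[1500];
    while (!closing->load()) {
        iovec iov{buf, sizeof(buf)};
        msghdr mh{};
        mh.msg_iov = &iov;
        mh.msg_iovlen = 1;

        ssize_t n = ::recvmsg(fd, &mh, 0);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;  // timeout or signal: go back and check the flag
            break;         // unexpected socket error: stop the worker
        }
        // A real receive queue would process the packet here.
    }
}

// Start the worker, request shutdown, and join. With the timeout in place
// the join returns even though no datagram is ever sent to the socket.
bool shutdown_completes() {
    int fd = open_udp_with_timeout(100000);  // 100 ms, an arbitrary choice
    if (fd < 0) return false;

    std::atomic<bool> closing{false};
    std::thread t(worker, fd, &closing);
    closing.store(true);
    t.join();
    ::close(fd);
    return true;
}
```

If the timeout were missing or zero, the same join would block until a datagram happened to arrive, which would match the traces above. Checking the SO_RCVTIMEO value on the affected socket with getsockopt might therefore be a useful first diagnostic.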