[UPDATED: TclX patch available] Potential solution to "Make TclX's signal trap handlers safe to use with threaded Tcl"

This issue is not TclX-specific: it is a bug (or shortcoming) in the Tcl core, and TclX just happens to be the simplest way to expose it at script level. The ticket below provides such a script, but the phenomenon can also be reproduced using the C API alone (more specifically, by combining Tcl async handlers with the low-level signal handlers on which TclX relies).

https://core.tcl.tk/tcl/tktview/f4f44174

I was eventually able to reproduce the problem after several hours, and the deadlock is indeed caused by the async thread deadlocking itself while attempting to lock its mutex twice.

The bug is difficult to reproduce because it is very timing-sensitive. However, one can force the hand of destiny by artificially slowing down the async thread. That is what I did, and it makes the process deadlock immediately: `Tcl_AsyncInvoke` has a mutex-protected loop over its registered handlers, so I just added a `sleep(0)` within the loop, and the test script now hangs immediately for the exact same reason (double locking).
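
To illustrate the double-locking pattern in isolation, here is a minimal sketch that does not use Tcl at all. The names `handler_lock`, `on_signal` and `demo_double_lock` are hypothetical, and the mutex is deliberately created with the `PTHREAD_MUTEX_ERRORCHECK` type so that the second lock attempt reports `EDEADLK` instead of hanging, which is what a default mutex may do. Calling `pthread_mutex_lock` from a signal handler is not async-signal-safe; it is done here only to mimic the situation described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for a mutex protecting a handler list. */
static pthread_mutex_t handler_lock;
static volatile sig_atomic_t relock_result = -1;

/* Signal handler that tries to take the lock its own thread already holds. */
static void on_signal(int signo)
{
    (void)signo;
    int rc = pthread_mutex_lock(&handler_lock);
    if (rc == 0) {
        pthread_mutex_unlock(&handler_lock);
    }
    relock_result = rc;
}

/* Returns the result of the nested lock attempt made by the handler. */
int demo_double_lock(void)
{
    pthread_mutexattr_t attr;
    struct sigaction sa;

    /* Error-checking type: a relock by the owner fails instead of blocking. */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&handler_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    pthread_mutex_lock(&handler_lock);   /* "mutex-protected operation" begins */
    raise(SIGUSR1);                      /* handler runs on this same thread */
    pthread_mutex_unlock(&handler_lock);

    pthread_mutex_destroy(&handler_lock);
    return relock_result;
}
```

With the default mutex type, the nested `pthread_mutex_lock` call in the handler would be the point where the thread blocks on itself.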

Tcl uses pthreads on Unix and Win32 critical sections on Windows. Win32 critical sections are reentrant, but pthread mutexes are not by default (you have to request the `PTHREAD_MUTEX_RECURSIVE` mutex type, and this feature is not available on all systems). So when the async thread is interrupted by a signal while in the middle of a mutex-protected operation, it deadlocks itself on Unix but not on Windows.
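
For reference, requesting a recursive pthread mutex looks roughly like the sketch below. The helper name `init_recursive_mutex` is hypothetical and this is not code from the Tcl core; it only shows the standard POSIX attribute calls involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/*
 * Initialize *m as a recursive mutex.
 * Returns 0 on success, or the error number reported by pthreads.
 */
int init_recursive_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }

    /* Ask for the recursive type; this can fail where it is unsupported. */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) {
        rc = pthread_mutex_init(m, &attr);
    }

    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

A mutex initialized this way can be locked several times by its owner, and must be unlocked the same number of times before another thread can acquire it.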

The Tcl docs say: 

> The result of locking a mutex twice from the same thread is undefined. On some platforms it will result in a deadlock.

So this is the expected behavior. However, nothing prevents the core from using reentrant mutexes on all platforms that Tcl supports. Implementing reentrancy on top of non-reentrant mutexes is straightforward, so OS support is a non-issue. I'm going to write a TIP to make `Tcl_Mutex` reentrant on all platforms starting with version 8.7.

I've already implemented a quick hack to make mutexes reentrant on Unix:

- a first version uses the native `PTHREAD_MUTEX_RECURSIVE`,
- a second version uses regular mutexes with a per-thread call counter.

Both versions fix the deadlock problem with or without my extra `sleep(0)`, and with no impact on Tcl tests.
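
To give an idea of what the second approach can look like, here is a minimal sketch of a reentrant lock built from a regular pthread mutex and a per-thread nesting counter. It is not the actual patch: the `ReMutex` type and the `remutex_*` functions are hypothetical names, and storing the counter in thread-specific data via `pthread_key_t` is an assumption about one possible design.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical reentrant lock: plain mutex plus a per-thread depth counter. */
typedef struct {
    pthread_mutex_t lock;      /* regular, non-recursive mutex */
    pthread_key_t   depth_key; /* per-thread nesting depth, stored as an integer */
} ReMutex;

int remutex_init(ReMutex *m)
{
    int rc = pthread_mutex_init(&m->lock, NULL);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_key_create(&m->depth_key, NULL);
    if (rc != 0) {
        pthread_mutex_destroy(&m->lock);
    }
    return rc;
}

int remutex_lock(ReMutex *m)
{
    /* A thread that never locked this mutex reads NULL, i.e. depth 0. */
    uintptr_t depth = (uintptr_t)pthread_getspecific(m->depth_key);
    int rc;

    if (depth == 0) {
        /* Outermost acquisition: take the real mutex. */
        rc = pthread_mutex_lock(&m->lock);
        if (rc != 0) {
            return rc;
        }
    }
    rc = pthread_setspecific(m->depth_key, (void *)(depth + 1));
    if (rc != 0 && depth == 0) {
        pthread_mutex_unlock(&m->lock);
    }
    return rc;
}

int remutex_unlock(ReMutex *m)
{
    uintptr_t depth = (uintptr_t)pthread_getspecific(m->depth_key);
    int rc;

    if (depth == 0) {
        return EPERM; /* calling thread does not hold the lock */
    }
    rc = pthread_setspecific(m->depth_key, (void *)(depth - 1));
    if (rc != 0) {
        return rc;
    }
    /* Only the outermost release gives up the real mutex. */
    return depth == 1 ? pthread_mutex_unlock(&m->lock) : 0;
}
```

Because the counter is private to each thread, it needs no extra synchronization. Note that this sketch only covers nested calls made by ordinary code on the same thread: a signal arriving between `pthread_mutex_lock` and the counter update would still see a depth of 0, so a version meant to tolerate signal handlers needs more care.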
