Many threads locking two mutexes can cause a crash #89331

**Describe the bug**
In scenarios involving several threads and a few mutexes, we have seen intermittent crashes that depend on system timing. I've narrowed this down to the scalable waitq implementation not re-inserting a pended thread when its priority changes, which can leave the rb tree invalid (no longer sorted). Without the scalable waitq, the wrong thread may be run, which may or may not cause a problem depending on the application.



**To Reproduce**
This is not limited to mutexes, but the issue was most easily reproduced with mutexes and the scalable waitq enabled. Consider four threads in decreasing priority order, A, B, C, and D, along with two mutexes, m0 and m1:
1. D locks m1
2. C locks m0
3. C pends on m1
4. B pends on m1
5. A pends on m0, which boosts C's priority. The tree on m1 is now no longer sorted.
6. D unlocks m1. The left-most thread on the tree is B. Removing B from the tree fails because the search goes to the right of C (due to C's boosted priority) while B's node is actually on the left. rb_remove silently fails.
7. B unlocks m1. The left-most thread on the tree is still B, so B tries to unpend itself, resulting in a NULL pointer dereference on B->base.pended_on.


**Expected behavior**
System does not crash.


**Impact**
Intermittent but fairly consistent crashes on our system.


**Logs and console output**
See test: https://github.com/zephyrproject-rtos/zephyr/pull/87235/files#diff-8adc688dcc6c66e2f0a064f4ed385d3ff51e325b66ab8e4e9b7570cf80d1bf22


**Environment (please complete the following information):**

 - OS: WSL
 - Toolchain: gcc-arm-none-eabi
 - Version: v3.3, v3.7

**Additional context**
Fixed in https://github.com/zephyrproject-rtos/zephyr/pull/87235
Backports proposed in:
https://github.com/zephyrproject-rtos/zephyr/pull/87840
https://github.com/zephyrproject-rtos/zephyr/pull/87839
https://github.com/zephyrproject-rtos/zephyr/pull/87841
