Many threads locking two mutexes can cause a crash #89331

@dewitt-garmin

Describe the bug
In complex scenarios involving several threads and a few mutexes, we have seen intermittent, timing-dependent crashes. I've narrowed this down to the scalable waitq implementation not re-inserting a thread when its priority changes, which can leave the rb tree unsorted and therefore invalid. Without the scalable waitq, the wrong thread may be run, which may or may not cause a problem depending on the application.

To Reproduce
This is not limited to mutexes, but the issue was most easily reproduced with mutexes and the scalable waitq enabled. Consider four threads in decreasing priority order (A, B, C, and D) and two mutexes, m0 and m1:

  1. D locks m1
  2. C locks m0
  3. C pends on m1
  4. B pends on m1
  5. A pends on m0, which boosts C's priority. C is not re-inserted, so the tree on m1 is no longer sorted.
  6. D unlocks m1. The left-most thread on the tree is B. Removing B fails because the search goes to the right of C (due to C's boosted priority) when B's node is actually on the left. rb_remove silently fails.
  7. B unlocks m1. The left-most thread on the tree is still B, so it tries to unpend itself, resulting in a NULL pointer dereference on B->base.pended_on.
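
The failure in steps 3 to 6 can be sketched outside the kernel. The code below is only an illustration under stated assumptions: it uses a plain unbalanced binary search tree rather than Zephyr's red/black tree, and every name in it (`struct waiter`, `bst_insert`, `bst_remove`, and so on) is hypothetical and not a kernel API. It shows that changing a key in place makes a comparison-based removal miss a node that is still in the tree, and that removing the node before the change and re-inserting it afterwards keeps the tree searchable. It is not necessarily how #87235 fixes the problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pended thread; lower prio value = higher priority. */
struct waiter {
	const char *name;
	int prio;
	struct waiter *left, *right;
};

/* Ordering callback: true if a sorts before b. */
static bool lessthan(const struct waiter *a, const struct waiter *b)
{
	return a->prio < b->prio;
}

static void bst_insert(struct waiter **root, struct waiter *n)
{
	assert(n != NULL);
	n->left = n->right = NULL;
	while (*root != NULL) {
		root = lessthan(n, *root) ? &(*root)->left : &(*root)->right;
	}
	*root = n;
}

/* Remove n by walking the tree with key comparisons; false if n is not reached. */
static bool bst_remove(struct waiter **root, struct waiter *n)
{
	while (*root != NULL && *root != n) {
		root = lessthan(n, *root) ? &(*root)->left : &(*root)->right;
	}
	if (*root != n) {
		return false; /* the search took the wrong branch: nothing removed */
	}
	if (n->left == NULL) {
		*root = n->right;
	} else if (n->right == NULL) {
		*root = n->left;
	} else {
		/* Two children: splice in the in-order successor. */
		struct waiter **s = &n->right;
		while ((*s)->left != NULL) {
			s = &(*s)->left;
		}
		struct waiter *succ = *s;
		*s = succ->right;
		succ->left = n->left;
		succ->right = n->right;
		*root = succ;
	}
	n->left = n->right = NULL;
	return true;
}

static struct waiter *bst_min(struct waiter *root)
{
	while (root != NULL && root->left != NULL) {
		root = root->left;
	}
	return root;
}

/* Steps 3-6 with C's priority changed in place: B stays stuck as left-most. */
bool inplace_boost_leaves_stale_b(void)
{
	struct waiter b = { .name = "B", .prio = 1 };
	struct waiter c = { .name = "C", .prio = 2 };
	struct waiter *m1_waitq = NULL;

	bst_insert(&m1_waitq, &c); /* step 3: C pends on m1 */
	bst_insert(&m1_waitq, &b); /* step 4: B pends on m1, lands left of C */
	c.prio = 0;                /* step 5: boost without re-inserting */
	bool removed = bst_remove(&m1_waitq, &b); /* step 6 */
	return !removed && bst_min(m1_waitq) == &b;
}

/* Same steps, but C is removed before the boost and re-inserted after it. */
bool requeue_boost_removes_b(void)
{
	struct waiter b = { .name = "B", .prio = 1 };
	struct waiter c = { .name = "C", .prio = 2 };
	struct waiter *m1_waitq = NULL;

	bst_insert(&m1_waitq, &c);
	bst_insert(&m1_waitq, &b);
	bst_remove(&m1_waitq, &c);
	c.prio = 0;
	bst_insert(&m1_waitq, &c);
	return bst_remove(&m1_waitq, &b) && bst_min(m1_waitq) == &c;
}
```

Calling `inplace_boost_leaves_stale_b()` models the broken sequence, where the removal of B is a no-op and B remains the left-most node. `requeue_boost_removes_b()` models the remove, update, re-insert pattern, where B is found and removed and the boosted C is left at the front of the queue.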

Expected behavior
System does not crash.

Impact
Intermittent but fairly consistent crashes on our system.

Logs and console output
See test: https://github.com/zephyrproject-rtos/zephyr/pull/87235/files#diff-8adc688dcc6c66e2f0a064f4ed385d3ff51e325b66ab8e4e9b7570cf80d1bf22

Environment (please complete the following information):

  • OS: WSL
  • Toolchain: gcc-arm-none-eabi
  • Zephyr version: v3.3, v3.7

Additional context
Fixed in #87235
Looking to get backported:
#87840
#87839
#87841

Labels: Backport, area: Kernel, bug, priority: medium
