kernel: thread: race condition between create and join #58116

@cfriedt

Describe the bug
While investigating #56163 and digging through pthreads, testing showed that k_thread_create() and k_thread_join() also exhibit a race condition when the same struct k_thread objects are re-used over and over again.

It's not something regularly seen in production at the moment, and was only detected by accident. Originally reported here.

On the kernel side, this mainly seems to be an issue on SMP platforms.
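
To make the reuse pattern concrete, here is a minimal host-side sketch using POSIX pthread_create() and pthread_join(). It is not the test suite from #58115 and will not reproduce the Zephyr failure on a desktop OS; it only illustrates the shape of the workload, in which the same thread objects are created and joined back to back. The names `worker` and `create_join_pressure` are hypothetical, and `NUM_THREADS` is set to 2 only to mirror the log below. In the Zephyr case the equivalent loop would call k_thread_create() and k_thread_join() on the same struct k_thread each round.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 2 /* assumption: mirrors NUM_THREADS in the log output */

/* Trivial thread body: exits at once so that join follows create closely. */
static void *worker(void *arg)
{
    return arg;
}

/*
 * Hypothetical helper: create and join NUM_THREADS threads per round,
 * re-using the same pthread_t slots every round.
 * Returns the number of rounds that completed without an error.
 */
int create_join_pressure(int rounds)
{
    pthread_t slots[NUM_THREADS];
    int done = 0;

    for (int r = 0; r < rounds; r++) {
        int created = 0;
        int ok = 1;

        for (int i = 0; i < NUM_THREADS; i++) {
            if (pthread_create(&slots[i], NULL, worker, NULL) != 0) {
                ok = 0;
                break;
            }
            created++;
        }

        /* Join every thread that was actually created, even on failure. */
        for (int i = 0; i < created; i++) {
            if (pthread_join(slots[i], NULL) != 0) {
                ok = 0;
            }
        }

        if (!ok) {
            break;
        }
        done++;
    }

    return done;
}
```

Calling `create_join_pressure(n)` from a test and checking that it returns `n` is one way to drive it. Because the worker returns immediately, each join can overlap with the thread's exit path, which is the kind of window this issue is about.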

  • What target platform are you using? qemu_x86_64, qemu_cortex_a53_smp, qemu_riscv64_smp, qemu_riscv32_smp
  • What have you tried to diagnose or work around this issue? Wrote a test suite ([DNM]: tests: posix: stress test for pthread_create and pthread_join #58115). Note that pthreads are currently disabled in this suite, so all failures so far are in k_thread. It happens with all libc configurations and on most SMP platforms.
  • Is this a regression? Probably not, although it's hard to say.

To Reproduce
Steps to reproduce the behavior:

  1. twister -i -T tests/posix/pthread_pressure
  2. See errors such as the assertion failure under "Logs and console output"

Expected behavior
Tests pass, ideally 100% of the time, on all platforms.

Impact
The behavior seems to be the opposite of what several contributors and maintainers expect. It is possibly just a corner case that has not received a lot of traffic.

Logs and console output
E.g.

ERROR   - *** Booting Zephyr OS build zephyr-v3.3.0-4124-gf67ff9c38640 ***
Running TESTSUITE pthread_pressure
===================================================================
START - test_k_thread_create_join
I: NUM_THREADS: 2
I: TEST_NUM_CPUS: 2
I: TEST_DURATION_S: 10
I: TEST_DELAY_US: 0
ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/kernel/sched.c:1785
	aborted _current back from dead
E:      a0: 0000000000000004    t0: 0000000000000000
E:      a1: 00000000000006f9    t1: 0000000000000009
E:      a2: 0000000080009d98    t2: 0000000000000000
E:      a3: 0000000000000000    t3: 0000000000000001
E:      a4: 0000000000000000    t4: 0000000000000023
E:      a5: 0000000000000001    t5: 000000008000b2b0
E:      a6: 0000000000000001    t6: 0000000080006514
E:      a7: 0000000000000001
E:      ra: 00000000800063c4
E:    mepc: 00000000800017e0
E: mstatus: 0000000a00021880
E: 
E: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
E: Current thread: (nil) (unknown)
E: Halting system

Environment (please complete the following information):

  • OS: Linux
  • Toolchain: Zephyr SDK v0.16.1
  • Commit SHA or Version used: d01780f (main), v2.7.4

Additional context
#56163
#57637

Labels

  • area: Kernel
  • bug (The issue is a bug, or the PR is fixing a bug)
  • priority: medium (Medium impact/importance bug)