
Conversation

@npitre commented Oct 17, 2025

Disable a few samples that aren't behaving well on SMP.

Nicolas Pitre added 2 commits October 17, 2025 14:41

The "_too_small" test variants intentionally configure an isolated buffer
pool with only 2 buffers to validate proper error handling when the pool
is exhausted during notification of 16 msg_subscriber observers. These
tests are likely to fail on SMP systems due to a buffer recycling race
condition, illustrated by the sketch and traces below.
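
For reference, the publisher side of such a test looks roughly like this
(a minimal sketch, not the actual test source: the message type, channel
name, observer names and the exact assertion are assumptions, and the
2-buffer pool itself is configured through the test's Kconfig, which is
omitted here):

  #include <zephyr/ztest.h>
  #include <zephyr/zbus/zbus.h>

  struct bar_msg {                        /* illustrative message type */
          int field;
  };

  ZBUS_MSG_SUBSCRIBER_DEFINE(bar_msg_sub1);
  ZBUS_MSG_SUBSCRIBER_DEFINE(bar_msg_sub2);
  ZBUS_MSG_SUBSCRIBER_DEFINE(bar_msg_sub3);
  /* ... up to bar_msg_sub16 ... */

  ZBUS_CHAN_DEFINE(bar_chan, struct bar_msg,
                   NULL, NULL,            /* no validator, no user data */
                   ZBUS_OBSERVERS(bar_msg_sub1, bar_msg_sub2,
                                  bar_msg_sub3 /* , ... all 16 ... */),
                   ZBUS_MSG_INIT(0));

  /* ZTEST_SUITE() definition omitted for brevity. */
  ZTEST(publish_pool_too_small, test_notification_exhausts_pool)
  {
          struct bar_msg msg = {0};

          /* With only 2 buffers in the pool, notifying 16 msg_subscriber
           * observers is expected to fail with -ENOMEM once the pool is
           * exhausted.
           */
          zassert_equal(zbus_chan_pub(&bar_chan, &msg, K_MSEC(100)), -ENOMEM);
  }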

Expected behavior (uniprocessor):
  Publisher Thread:
    1. Allocate buf1 for bar_msg_sub1 ✓
    2. k_fifo_put(sub1_fifo, buf1)
    3. Allocate buf2 for bar_msg_sub2 ✓
    4. k_fifo_put(sub2_fifo, buf2)
    5. Try allocate buf3 for bar_msg_sub3 → FAIL -ENOMEM

  Subscriber threads process messages after notification completes,
  pool exhausts at subscriber #3 as expected.

SMP race condition:
  CPU 0 (Publisher):         CPU 1 (bar_msg_sub1):  CPU 2 (bar_msg_sub2):
  ------------------         ---------------------  ---------------------
  Alloc buf1 ✓
  k_fifo_put(sub1, buf1)
                             k_fifo_get() → buf1
                             zbus_sub_wait_msg()
                             net_buf_unref()
                             → buf1 FREED!
  Alloc buf2 ✓
  (reuses buf1!)
  k_fifo_put(sub2, buf2)
                                                    k_fifo_get() → buf2
                                                    net_buf_unref()
                                                    → buf2 FREED!
  Alloc buf3 ✓
  (reuses buf1 again!)
  ...continues...
  All 16 allocations succeed!

On SMP systems, subscriber threads on other CPUs may consume and free
buffers quickly enough that they are recycled back to the pool before the
publisher's notification loop can exhaust it. The test's assumption that
notification completes before subscribers run does not hold with parallel
execution.
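
The subscriber side that makes this recycling possible is just the ordinary
msg_subscriber loop; a sketch, reusing the illustrative names from above:

  static void bar_msg_sub_thread(void *sub, void *unused1, void *unused2)
  {
          const struct zbus_channel *chan;
          struct bar_msg msg;

          /* zbus_sub_wait_msg() pops the pending notification, copies the
           * message out of the net_buf and unrefs it, returning the buffer
           * to the pool.  On SMP this runs concurrently with the publisher's
           * notification loop on another CPU, so the pool keeps getting
           * refilled.
           */
          while (1) {
                  if (zbus_sub_wait_msg((const struct zbus_observer *)sub,
                                        &chan, &msg, K_FOREVER) == 0) {
                          /* consume msg */
                  }
          }
  }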

Since this is a test design limitation (not a zbus bug), filter SMP
configurations from these specific test variants rather than attempt to
artificially slow down subscribers or change thread priorities.

Signed-off-by: Nicolas Pitre <[email protected]>

The benchmark_sync test produces incorrect results on SMP systems due to
a race condition between the producer thread and subscriber threads that
only occurs with parallel execution.

Thread configuration:
- Producer thread: priority 5 (lower priority, runs later)
- Subscriber threads (8): priority 3 (higher priority, run first)
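
In Zephyr a numerically lower priority value is the higher priority, so on
a single CPU a ready subscriber always preempts the producer. A sketch of
that setup (stack sizes and entry point names are placeholders, not the
benchmark source):

  #include <zephyr/kernel.h>

  #define PRODUCER_PRIO   5       /* runs only when no subscriber is ready */
  #define SUBSCRIBER_PRIO 3       /* preempts the producer on a single CPU */

  static void producer_thread(void *p1, void *p2, void *p3);
  static void subscriber_thread(void *p1, void *p2, void *p3);

  K_THREAD_DEFINE(producer, 2048, producer_thread, NULL, NULL, NULL,
                  PRODUCER_PRIO, 0, 0);
  K_THREAD_DEFINE(sub1, 2048, subscriber_thread, NULL, NULL, NULL,
                  SUBSCRIBER_PRIO, 0, 0);
  /* ... seven more subscriber threads at SUBSCRIBER_PRIO ... */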

Expected behavior on uniprocessor:
  1. Producer publishes message
  2. Subscriber immediately preempts producer (priority 3 < 5)
  3. Subscriber processes message, calls atomic_add(&count, 256)
  4. Producer resumes, continues to next message

  Result: All messages counted before producer checks final count.

Race condition on SMP:
  CPU 0 (Producer):              CPU 1-3 (Subscribers):
  -----------------              ----------------------
  Publish msg 1
  k_msgq_put(sub1) ✓
                                 k_msgq_get() → processing
  Publish msg 2
  k_msgq_put(sub2) ✓
                                 k_msgq_get() → processing
  ...continues...
  Publish msg 128 ✓
  atomic_get(&count)             Still processing...
  → Reports incomplete count!    atomic_add() comes later

On SMP, the producer doesn't get preempted since it runs on a different
CPU from the subscribers. It races ahead and checks atomic_get(&count)
while subscriber threads on other CPUs are still processing messages.
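
Stripped of the zbus plumbing, the counting pattern that races looks
roughly like this (a sketch: the queue depth, message count and names are
illustrative, the 256-byte credit per message is taken from the trace
above, and the entry points are the ones declared in the setup sketch):

  #include <zephyr/kernel.h>

  #define MSG_SIZE  256           /* bytes credited per message */
  #define MSG_COUNT 128           /* illustrative */

  struct bm_msg {
          uint8_t payload[MSG_SIZE];
  };

  K_MSGQ_DEFINE(sub_msgq, sizeof(struct bm_msg), 4, 4);
  static atomic_t count;

  static void subscriber_thread(void *p1, void *p2, void *p3)
  {
          struct bm_msg msg;

          while (k_msgq_get(&sub_msgq, &msg, K_FOREVER) == 0) {
                  atomic_add(&count, MSG_SIZE);
          }
  }

  static void producer_thread(void *p1, void *p2, void *p3)
  {
          struct bm_msg msg = {0};

          for (int i = 0; i < MSG_COUNT; i++) {
                  k_msgq_put(&sub_msgq, &msg, K_FOREVER);
          }

          /* On a uniprocessor the priority-3 subscriber preempts the
           * producer after every put, so the counter is complete here.
           * On SMP the subscriber drains the queue on another CPU and
           * this read can come up short.
           */
          printk("bytes received: %ld of %d\n",
                 (long)atomic_get(&count), MSG_COUNT * MSG_SIZE);
  }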

Observed results:
- Non-SMP: Bytes sent = 262144, received = 262144 ✓
- SMP: Bytes sent = 262144, received = 260352 ✗ (7 messages lost)

This is a benchmark test design issue, not a zbus bug. The test assumes
subscribers complete before the producer finishes, which doesn't hold on
SMP systems. Filter SMP configurations from this test variant for now.

Signed-off-by: Nicolas Pitre <[email protected]>

@npitre closed this Oct 18, 2025
@npitre reopened this Oct 18, 2025
@npitre added this to the v4.3.0 milestone Oct 20, 2025