alwa-nordic
Contributor

@alwa-nordic alwa-nordic commented Oct 9, 2025

This PR moves tx_processor off the system workqueue to a dedicated workqueue to prevent deadlocks. See commits for details.

Core changes

  • New bt_taskq workqueue - "for quick non-blocking Bluetooth tasks"
  • Move tx_processor to bt_taskq - Now it's safer to block on system work queue (which the Host does unfortunately)

Fallout fixes

  • Defer ATT user cb - write_cmd_cb banished from tx_processor to rx_queue to fix deadlock
  • Grab some RAM - BT_MAX_CONN reduced from 62 to 61 in peripheral_identity sample to fit bt_taskq

Cleanups

  • Less workaround - bt_cmd_send_sync workaround disabled when tx_processor uses dedicated thread

@alwa-nordic alwa-nordic force-pushed the bt-taskq branch 4 times, most recently from 8a8f17c to ade7122 Compare October 10, 2025 15:06
There are other places in the Host that would make sense to run on
bt_workq.

This change exposes `bt_workq_chosen` in `hci_core.h` for use in other
parts of the Host. `bt_workq_chosen` is set according to the
`BT_RECV_CONTEXT` choice.

Signed-off-by: Aleksander Wasaznik <[email protected]>
ATT is invoking user callbacks in its net_buf destroy function. It is
common practice that these callbacks can block on bt_hci_cmd_alloc(), so
we must consider these callbacks as blocking. This is a deadlock when
the net_buf_unref() happens inside the HCI driver, which is invoked from
tx_processor. Blocking callbacks like this appear in our own samples.
See further down about how this problem was detected.

Currently, tx_processor does not protect against blocking callbacks, so
they are de facto forbidden. The Host should not equip net_bufs with
dangerous destroy callbacks.

This commit makes ATT defer its net_buf destruction and user callback
invocation to the system workqueue, so that net_buf_unref is safe to
call from non-blocking threads. Unsafe code is banished to the system
workqueue wild west. Future improvement may be to allow the user to
provide their own workqueue for ATT callbacks.
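
The deferral described above can be sketched in plain C. This is a toy
model, not the Host code: `struct buf_model`, `buf_unref_model()`,
`drain_deferred()` and `count_cb()` are hypothetical names invented for
illustration, and the real `struct net_buf` and ATT internals look
different. The point is only that the destroy hook links the buffer into
a list and returns, so dropping the last reference never runs user code
on the caller's thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified buffer. This is NOT the real struct net_buf. */
struct buf_model {
	int ref;
	void (*user_cb)(void *user_data); /* may block in real applications */
	void *user_data;
	struct buf_model *next_deferred;
};

/* FIFO of buffers whose destruction has been deferred. */
static struct buf_model *deferred_head;
static struct buf_model *deferred_tail;

/* Destroy hook: only links the buffer into the list, never calls user
 * code, so it is safe to run on a thread that must not block.
 */
static void destroy_deferred(struct buf_model *b)
{
	b->next_deferred = NULL;
	if (deferred_tail != NULL) {
		deferred_tail->next_deferred = b;
	} else {
		deferred_head = b;
	}
	deferred_tail = b;
}

/* Drop one reference; the last reference triggers the destroy hook. */
void buf_unref_model(struct buf_model *b)
{
	assert(b->ref > 0);
	if (--b->ref == 0) {
		destroy_deferred(b);
	}
}

/* Runs later, in a context that is allowed to block (the system
 * workqueue in this commit). Returns the number of buffers processed.
 */
int drain_deferred(void)
{
	int n = 0;

	while (deferred_head != NULL) {
		struct buf_model *b = deferred_head;

		deferred_head = b->next_deferred;
		if (deferred_head == NULL) {
			deferred_tail = NULL;
		}
		if (b->user_cb != NULL) {
			b->user_cb(b->user_data);
		}
		n++;
	}
	return n;
}

/* Example user callback: counts how many times it was invoked. */
void count_cb(void *user_data)
{
	(*(int *)user_data)++;
}
```

In this model, a caller standing in for tx_processor can call
`buf_unref_model()` freely; `count_cb()` only runs once
`drain_deferred()` is called from the context that tolerates blocking.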

This deadlock was detected because the following test was failing while
moving tx_processor to the bt_taskq:

    tests/bsim/bluetooth/ll/throughput/tests_scripts/gatt_write.sh

The above test has an ATT callback `write_cmd_cb` that invokes
`bt_conn_le_param_update`, which can block waiting for `tx_processor`.

Signed-off-by: Aleksander Wasaznik <[email protected]>
Reduce BT_MAX_CONN from 62 to 61 to make it build on the integration
platform qemu_cortex_m3/ti_lm3s6965 when we add bt_taskq in a subsequent
commit.

Signed-off-by: Aleksander Wasaznik <[email protected]>
@alwa-nordic alwa-nordic force-pushed the bt-taskq branch 2 times, most recently from a9fc732 to 03c365b Compare October 10, 2025 16:56
@nashif
Member

nashif commented Oct 10, 2025

please set a proper title

@alwa-nordic alwa-nordic changed the title Bt taskq Bluetooth: Host: Move tx_processor to bt_taskq Oct 10, 2025
Add a new workqueue bt_taskq specifically designed for quick
non-blocking work items in the Bluetooth subsystem. This workqueue is
always available and does not depend on any Kconfig option.

Signed-off-by: Aleksander Wasaznik <[email protected]>
It's not safe for the tx_processor to share the system workqueue with
work items that block the thread until tx_processor runs. This is a
deadlock.

The Bluetooth Host itself performs these operations, usually involving
bt_hci_cmd_alloc(), on the system workqueue.

This change effectively gives tx_processor its own thread, like the BT
TX thread that used to exist. But, this time the thread is intended to
be shared with any other non-blocking Bluetooth Host tasks.
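
The deadlock and its fix can be illustrated with a small, deterministic
model in plain C. It is a sketch under simplifying assumptions, not
Zephyr code: each workqueue is modeled as one thread that runs items
strictly in FIFO order, and an item that is blocked stalls everything
queued behind it. All names here (`struct workq`, `submit()`, `step()`,
`tx_processor_model()`, `blocking_item_model()`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_ITEMS = 4 };

/* A work item returns true when it has finished, false while blocked. */
typedef bool (*work_fn)(void);

/* One queue == one thread that runs items strictly in FIFO order. */
struct workq {
	work_fn items[MAX_ITEMS];
	size_t head;
	size_t tail;
};

static bool tx_done;

/* Stand-in for tx_processor: never blocks. */
static bool tx_processor_model(void)
{
	tx_done = true;
	return true;
}

/* Stand-in for a Host work item that blocks until TX has run. */
static bool blocking_item_model(void)
{
	return tx_done;
}

static void submit(struct workq *q, work_fn fn)
{
	assert(q->tail < MAX_ITEMS);
	q->items[q->tail++] = fn;
}

/* Let the queue's thread run once; returns true if it made progress. */
static bool step(struct workq *q)
{
	if (q->head == q->tail) {
		return false; /* idle */
	}
	if (!q->items[q->head]()) {
		return false; /* head item is blocked, so the thread is stuck */
	}
	q->head++;
	return true;
}

/* Run until no queue makes progress; true if every queue drained. */
static bool run_until_idle(struct workq **qs, size_t n)
{
	bool progress = true;

	while (progress) {
		progress = false;
		for (size_t i = 0; i < n; i++) {
			if (step(qs[i])) {
				progress = true;
			}
		}
	}
	for (size_t i = 0; i < n; i++) {
		if (qs[i]->head != qs[i]->tail) {
			return false;
		}
	}
	return true;
}

/* Both items on the system workqueue: the blocked item is ahead of TX. */
bool shared_queue_completes(void)
{
	struct workq sysq = {0};
	struct workq *qs[] = { &sysq };

	tx_done = false;
	submit(&sysq, blocking_item_model);
	submit(&sysq, tx_processor_model);
	return run_until_idle(qs, 1);
}

/* TX on its own queue: it runs regardless of what the other queue does. */
bool dedicated_queue_completes(void)
{
	struct workq sysq = {0};
	struct workq taskq = {0};
	struct workq *qs[] = { &sysq, &taskq };

	tx_done = false;
	submit(&sysq, blocking_item_model);
	submit(&taskq, tx_processor_model);
	return run_until_idle(qs, 2);
}
```

In the model, `shared_queue_completes()` returns false because the
blocked item sits ahead of the TX item on the only thread, while
`dedicated_queue_completes()` returns true because the TX item runs on
its own queue and unblocks the waiter.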

Under the bt_taskq rules, tx_processor is supposed to be non-blocking
and to have only code under our control on the thread stack.
Unfortunately, this is
not entirely true currently. But we consider it close enough for now and
will ensure it starts adhering to the rules in the future. Examples of
problems:

 - The tx_processor invokes bt_hci_send(), driver code which has no
   rules limiting what it can do on our thread.
 - The tx_processor invokes net_buf_unref() on stack-external net_buf
   which executes user code on our thread.

Signed-off-by: Aleksander Wasaznik <[email protected]>
This commit disables the deadlock workaround in bt_cmd_send_sync when
it's not needed, that is, when tx_processor runs on bt_taskq and not on
the system workqueue.

Signed-off-by: Aleksander Wasaznik <[email protected]>
