Skip to content

Conversation

jhedberg
Copy link
Member

@jhedberg jhedberg commented Jul 11, 2025

Avoid the system workqueue, and introduce a general purpose Bluetooth workqueue that can be used in most situations.

@jhedberg jhedberg requested a review from Copilot July 11, 2025 16:09
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a dedicated Bluetooth workqueue and migrates existing SMP, L2CAP, RFCOMM, GATT, ATT, and other host timers and work submissions from the system workqueue to the new bt_work_* API.

  • Added bt_work_submit, bt_work_schedule, and bt_work_reschedule wrappers and initialized a Bluetooth workqueue in hci_core.c.
  • Replaced all uses of k_work_submit, k_work_schedule, and k_work_reschedule with the new bt_work_* variants across host subsystems.
  • Updated Kconfig to introduce CONFIG_BT_WQ, deprecate the old BT receive workqueue option, and adjusted various sample configuration files.

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
subsys/bluetooth/host/smp.c Replaced SMP work reschedules with bt_work_reschedule
subsys/bluetooth/host/settings.c Updated settings store work submissions to bt_work_submit
subsys/bluetooth/host/l2cap.c Converted L2CAP work resubmit/submit to bt_work_*
subsys/bluetooth/host/keys.c Changed ID add work submission to bt_work_submit
subsys/bluetooth/host/id.c Swapped RPA update scheduling to bt_work_schedule
subsys/bluetooth/host/hci_core.h Added prototypes for bt_work_submit, bt_work_schedule, bt_work_reschedule
subsys/bluetooth/host/hci_core.c Implemented the Bluetooth workqueue and API wrappers
subsys/bluetooth/host/gatt.c Migrated service‐changed and notify work calls to bt_work_*
subsys/bluetooth/host/conn.c Updated auto-init work submission (one schedule call still uses k_work_schedule)
subsys/bluetooth/host/classic/rfcomm.c Converted various RFCOMM timers to use bt_work_reschedule
subsys/bluetooth/host/classic/l2cap_br.c Updated BR L2CAP monitoring and retransmission timers to bt_work_*
subsys/bluetooth/host/classic/br.c Replaced BR discoverable timeout scheduling with bt_work_reschedule
subsys/bluetooth/host/att.c Swapped ATT timeout and EATT connection scheduling to bt_work_*
subsys/bluetooth/host/adv.c Changed advertising timeout to bt_work_reschedule
subsys/bluetooth/host/Kconfig Introduced CONFIG_BT_WQ, deprecated BT_RECV_WORKQ_SYS
subsys/bluetooth/controller/Kconfig.ll_sw_split Added conditional default stack sizes for nRF51 controller RX threads
samples/bluetooth/peripheral_hr/prj_minimal.conf Enabled console, UART, serial and adjusted workqueue stack sizes
samples/bluetooth/mesh_demo/src/microbit.c Refactored button/PWM setup and removed device‐ready checks
samples/bluetooth/mesh_demo/boards/bbc_microbit.conf Updated ISR, main, and WQ stack sizes and mesh buffer counts
samples/bluetooth/central_hr/prj_minimal.conf Enabled console/UART and tuned stack sizes for BT and system workqueues
Comments suppressed due to low confidence (3)

subsys/bluetooth/host/Kconfig:131

  • [nitpick] Add a help description for CONFIG_BT_WQ to explain its purpose and default behavior.
config BT_WQ

samples/bluetooth/peripheral_hr/prj_minimal.conf:73

  • [nitpick] Lowering the system workqueue stack size significantly may impact sample stability; add a validation test or runtime check for stack usage.
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=400


k_work_schedule(&conn->deferred_work,
CONN_UPDATE_TIMEOUT);
k_work_schedule(&conn->deferred_work, CONN_UPDATE_TIMEOUT);
Copy link
Preview

Copilot AI Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call should use the new bt_work_schedule API to schedule on the Bluetooth workqueue instead of k_work_schedule.

Suggested change
k_work_schedule(&conn->deferred_work, CONN_UPDATE_TIMEOUT);
bt_work_schedule(&conn->deferred_work, CONN_UPDATE_TIMEOUT);

Copilot uses AI. Check for mistakes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conn->deferred_work related code still uses the system workqueue, since there are application callbacks involved in several cases. The goal is to try to keep the new Bluetooth workqueue purely for Bluetooth host stack internal functionality, and anything that might reach over to the application side should be in another context.

Copy link

@Yagoor
Copy link
Contributor

Yagoor commented Jul 16, 2025

Hi @jhedberg,

This is a great improvement, however, I believe you will trigger a deadlock that is not possible with the current design.

Suppose the following scenario: A simple connection from the peripheral view.

A message is received and will be handled in the "BT WQ", it will then trigger the connected_cb, which will then block the "BT WQ" because it will defer some functionality to a higher priority context (let's call it "APP WQ"), the functionality in the "APP WQ" will call some function that will then send some hci which will then block until the "BT WQ" handles the tx_work. We have a deadlock.

With the current design this would not happen.

A message is received and will be handled in the "BT RX WQ", it will then trigger the connected_cb, which will then block the "BT RX WQ" because it will defer some functionality to a higher priority context (let's call it "APP WQ"), the functionality in the "APP WQ" will call some function that will then send some hci which will then block until the "syswq" handles the tx_work. We don't have a deadlock.

I still believe we should move away from the syswq, that is for sure an improvement, but having TX/RX in the same context will be a challenge with a synchronous stack.

@jhedberg
Copy link
Member Author

@Yagoor the next step, which I haven't gotten around to implementing yet since it's more complicated (and may come with more memory overhead) is to ensure that we don't do any callbacks within the BT WQ context. Or to put it another way, we ensure that there is no way for the application side to get access to the BT WQ context. My thinking is that the callbacks could be done from the system workqueue. Do you think that should take care of the deadlock you describe?

@jhedberg
Copy link
Member Author

@Yagoor another change I'd like to do is to move away from doing synchronous HCI (bt_hci_cmd_send_sync()) everywhere else except the HCI initialization procedure. That would however mean substantial redesign of our APIs, since many of them are designed to return an immediate error if an operation wasn't successful.

@Yagoor
Copy link
Contributor

Yagoor commented Jul 17, 2025

@jhedberg thanks for the reply.

@Yagoor the next step, which I haven't gotten around to implementing yet since it's more complicated (and may come with more memory overhead) is to ensure that we don't do any callbacks within the BT WQ context. Or to put it another way, we ensure that there is no way for the application side to get access to the BT WQ context. My thinking is that the callbacks could be done from the system workqueue. Do you think that should take care of the deadlock you describe?

I see, that would solve the deadlock if the callbacks are asynchronous, meaning that the BT WQ is not waiting for the callback result. I think the challenge here will be on the att/gatt layers (and maybe others that I am not familiar with), it is also synchronous, for example the att_write_rsp will call a write cb that can be in the application and it needs the response from that callback to send the response to the remote side.
One solution would be for the stack to not send the response and leave it for the application since the host layer has a much more relaxed timing but that will certainly have the expense of net bufs because now they need to survive for longer and if the application does something wronge, they will leak. It will be a trade off and this is certainly open for discussion on what is the best way.

@Yagoor another change I'd like to do is to move away from doing synchronous HCI (bt_hci_cmd_send_sync()) everywhere else except the HCI initialization procedure. That would however mean substantial redesign of our APIs, since many of them are designed to return an immediate error if an operation wasn't successful.

I believe this would be a good place to start in making the stack asynchronous because the stack is in full control here (there is no application interaction) so api changes can be done with less issues but the solution chosen here on how to make this operation asynchronous can be used to solve the above challenge as well.

Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Sep 16, 2025
@jhedberg jhedberg removed the Stale label Sep 19, 2025
These haven't really been maintained ever since they were introduced, and
with lack of understanding them (among Bluetooth maintainers) they're more
of a burden for development than a benefit.

Signed-off-by: Johan Hedberg <[email protected]>
Update the minimal configurations for peripheral_hr and central_hr to match
what the Bluetooth stack current needs & uses. Also enable the serial
console, since that's possible to do while still fitting within 16k RAM
and is very helpful for basic verification and debugging.

Signed-off-by: Johan Hedberg <[email protected]>
The actual usage is substantially less than the default when using minimal
configurations on nRF51:

 BT CTLR RX          : STACK: unused 32 usage 288 / 320 (90 %); CPU: 0 %
 BT CTLR RX pri      : STACK: unused 32 usage 344 / 376 (91 %); CPU: 0 %

Signed-off-by: Johan Hedberg <[email protected]>
Fine tune the mesh_demo app for bbc_microbit.

Signed-off-by: Johan Hedberg <[email protected]>
Avoid the system workqueue, and introduce a general purpose Bluetooth
workqueue that can be used in most situations.

Signed-off-by: Johan Hedberg <[email protected]>
Copy link

sonarqubecloud bot commented Oct 6, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants