Skip to content

Conversation

@KyraLengfeld
Copy link
Contributor

NOTE:
This is testing upstream changes I'd like to merge downstream. This PR should NOT be merged. Added DNM label.

This commit alignes the timeout value for allocating buffers within att on the BT RX thread, making it consistent within att.c, see bt_att_req_alloc.

We are inferring in many bt_gatt_* functions that if called from a BT RX thread (which is inherently the case if called from a callback when running a Bluetooth application), we don't block and instead return -ENOMEM when the ATT request queue is full, avoiding a deadlock. This promise is fulfilled within bt_att_req_alloc, where the timeout for allocation of the request slab is set to K_NO_WAIT if we are on the BT RX thread. Unfortunately, we break this promise in bt_att_chan_create_pdu, where the timeout for allocation of the att pool is still K_FOREVER and deadlocks can (and do) occur when too many requests are sent yet the pool is depleted.

Note: Both req_slab and att_pool sizes are defined by CONFIG_BT_ATT_TX_COUNT. If applications start getting -ENOMEM with this change, they were at risk of such a deadlock, and may increase CONFIG_BT_ATT_TX_COUNT to allocate the att pool for their requests.

Note: This possible deadlock has been flying under the radar, as att_pools are freed when the HCI driver has sent it to the controller (instead of when receiving the response, as it happens with req_slabs) and due to the att_pool and the req_slab being both sized by CONFIG_BT_ATT_TX_COUNT, and req_slab being allocated before and returning -ENOMEM already if there is no space, it takes a more specific situation to deplete the att_pool but not the req_slab pool at this point.

Note: Ideally, we don't want functions to behave differently depending on which thread they are running, and while this commit makes it more consistent, it should be considered a workaround solution.

@KyraLengfeld KyraLengfeld force-pushed the fix_bt_att_chan_create_pdu_deadlock_test branch 4 times, most recently from a8729b3 to ecc1900 Compare February 27, 2025 12:33
@KyraLengfeld KyraLengfeld changed the title Bluetooth: Host: Fix deadlock when failing to alloc on BT RX thread TESTING: Bluetooth: Host: Fix deadlock when failing to alloc on BT RX thread Feb 27, 2025
@KyraLengfeld KyraLengfeld deleted the fix_bt_att_chan_create_pdu_deadlock_test branch March 3, 2025 09:33
@KyraLengfeld KyraLengfeld restored the fix_bt_att_chan_create_pdu_deadlock_test branch March 3, 2025 09:45
@KyraLengfeld KyraLengfeld reopened this Mar 3, 2025
This is just to test a still open PR on RTOS Zephyr.

Signed-off-by: Kyra Lengfeld <[email protected]>
@KyraLengfeld KyraLengfeld force-pushed the fix_bt_att_chan_create_pdu_deadlock_test branch from ecc1900 to b2eb415 Compare March 3, 2025 09:48
@KyraLengfeld KyraLengfeld deleted the fix_bt_att_chan_create_pdu_deadlock_test branch March 4, 2025 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant