TESTING: Bluetooth: Host: Fix deadlock when failing to alloc on BT RX thread #2563

KyraLengfeld · 2025-02-27T12:14:39Z

NOTE:
This is testing upstream changes I'd like to merge downstream. This PR should NOT be merged. Added DNM label.

This commit alignes the timeout value for allocating buffers within att on the BT RX thread, making it consistent within att.c, see bt_att_req_alloc.

We are inferring in many bt_gatt_* functions that if called from a BT RX thread (which is inherently the case if called from a callback when running a Bluetooth application), we don't block and instead return -ENOMEM when the ATT request queue is full, avoiding a deadlock. This promise is fulfilled within bt_att_req_alloc, where the timeout for allocation of the request slab is set to K_NO_WAIT if we are on the BT RX thread. Unfortunately, we break this promise in bt_att_chan_create_pdu, where the timeout for allocation of the att pool is still K_FOREVER and deadlocks can (and do) occur when too many requests are sent yet the pool is depleted.

Note: Both req_slab and att_pool sizes are defined by CONFIG_BT_ATT_TX_COUNT. If applications start getting -ENOMEM with this change, they were at risk of such a deadlock, and may increase CONFIG_BT_ATT_TX_COUNT to allocate the att pool for their requests.

Note: This possible deadlock has been flying under the radar, as att_pools are freed when the HCI driver has sent it to the controller (instead of when receiving the response, as it happens with req_slabs) and due to the att_pool and the req_slab being both sized by CONFIG_BT_ATT_TX_COUNT, and req_slab being allocated before and returning -ENOMEM already if there is no space, it takes a more specific situation to deplete the att_pool but not the req_slab pool at this point.

Note: Ideally, we don't want functions to behave differently depending on which thread they are running, and while this commit makes it more consistent, it should be considered a workaround solution.

This is just to test a still open PR on RTOS Zephyr. Signed-off-by: Kyra Lengfeld <[email protected]>

KyraLengfeld added the DNM label Feb 27, 2025

NordicBuilder mentioned this pull request Feb 27, 2025

TESTING: manifest: sdk-zephyr: Bluetooth: Host: Fix deadlock when failing to alloc on BT RX thread nrfconnect/sdk-nrf#20654

Closed

KyraLengfeld force-pushed the fix_bt_att_chan_create_pdu_deadlock_test branch 4 times, most recently from a8729b3 to ecc1900 Compare February 27, 2025 12:33

KyraLengfeld changed the title ~~Bluetooth: Host: Fix deadlock when failing to alloc on BT RX thread~~ TESTING: Bluetooth: Host: Fix deadlock when failing to alloc on BT RX thread Feb 27, 2025

KyraLengfeld closed this Mar 3, 2025

KyraLengfeld deleted the fix_bt_att_chan_create_pdu_deadlock_test branch March 3, 2025 09:33

KyraLengfeld restored the fix_bt_att_chan_create_pdu_deadlock_test branch March 3, 2025 09:45

KyraLengfeld reopened this Mar 3, 2025

[nrf noup] Bluetooth: Host: deadlock fix testing.

b2eb415

This is just to test a still open PR on RTOS Zephyr. Signed-off-by: Kyra Lengfeld <[email protected]>

KyraLengfeld force-pushed the fix_bt_att_chan_create_pdu_deadlock_test branch from ecc1900 to b2eb415 Compare March 3, 2025 09:48

KyraLengfeld closed this Mar 4, 2025

KyraLengfeld deleted the fix_bt_att_chan_create_pdu_deadlock_test branch March 4, 2025 08:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

TESTING: Bluetooth: Host: Fix deadlock when failing to alloc on BT RX thread #2563

TESTING: Bluetooth: Host: Fix deadlock when failing to alloc on BT RX thread #2563

Uh oh!

KyraLengfeld commented Feb 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

TESTING: Bluetooth: Host: Fix deadlock when failing to alloc on BT RX thread #2563

TESTING: Bluetooth: Host: Fix deadlock when failing to alloc on BT RX thread #2563

Uh oh!

Conversation

KyraLengfeld commented Feb 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant