Skip to content

Conversation

dottspina
Copy link
Contributor

@dottspina dottspina commented May 8, 2025

This PR is mostly about updating the API documentation for some connection callbacks: commits titled "Amend xxx() callback documentation", we can also squash them once reviewed.

Additionally, I came across the use of a local variable as an atomic target in bt_conn_unref(): either I miss something obvious or it's typo in the source code, may be not deserving a dedicated PR or issue. Since this involves a Zephyr commit that we'll already mention, I've taken the liberty to discuss it briefly here as well.

EDIT: added a trivial commit to fix the MISRA "All if ... else if constructs shall be terminated
with an else clause" (c:M23_112) issues reported by SonarQube and blocking this PR.

Update connection callbacks API documentation

The API documentation for some connection callbacks appears to not reflect some changes in the Bluetooth Host stack [1] [2] [3].

  • bt_conn_cb.disconnected(): warns that the connection object is not actually reusable at this stage, but some of the advice given may be deprecated (auto-resume of advertising), and does not mention recycled() which IIUC is now the most reliable callback to track free connections and/or initiate operations that depend on them
  • bt_conn_cb.recycled(): still warns users to "treat this callback as an ISR", although it now runs on the system work-queue thread, as does disconnected()
  • bt_conn_cb.connected(): surprisingly, no special precautions are indicated, while it runs on the same thread that also invokes GATT callbacks such as bt_gatt_attr.read() or bt_gatt_attr.write() (usually the BT RX WQ thread)

The respective commit messages provide some further details.
My wording may not be ideal, edit requests are welcome.

Use of a local variable as an atomic target

While reading subsys/bluetooth/host/conn.c I came across something that I can't quite grasp, and I'm taking advantage of this PR to address it.

In bt_conn_unref(), the local, non shared, variable old is used as an atomic target:

atomic_val_t old = atomic_dec(&conn->ref);
conn = NULL;
bool deallocated = (atomic_get(&old) == 1);

I can't see what data race or value obsolescence the above call to atomic_get() could prevent ?
I considered that this might be a trick to force the compiler to generate some additional memory barrier, but I don't see the need for that either (if so, that might deserve a comment).

If pedantic, one could also note that atomic_get() expects an atomic_t* argument (called a target in Zephyr API), not an atomic_val_t* (value).
This compiles without even a warning since Zephyr Atomic defines both to be the same integer type.
The equivalent C11 code, where _Atomic(T) and T are different types, wouldn't compile.

Either I miss something obvious, or there's some kind of typo introduced with [1].
Please enlighten me ;-)

Thanks.

[1] efb5d83: Bluetooth: Host: Added Recycled evt notifying conn object is available
[2] 2ca59e7: Bluetooth: Host: Defer conn.recycled() to the syswq
[3] 8cfad44: Bluetooth: Deprecate adv auto-resume

Comment on lines 1663 to 1666
* @note This callback will run on the same, non-preemptible, work-queue thread
* that processes incoming low priority HCI packets and invokes GATT callbacks
* such as @ref bt_gatt_attr.read and @ref bt_gatt_attr.write.
* Blocking operations are therefore discouraged.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add this, but it's going to be pretty short lived. I'm working on moving all host-internal processing to a single internal work queue (or rather renaming and extending the purpose of the existing rx workqueue), while at the same time deferring any application callbacks to the system workqueue. Once that change happens the above comment will not be correct anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on moving all host-internal processing to a single internal work queue [...] while at the same time deferring any application callbacks to the system workqueue.

This will indeed be a welcome improvement: we'd better wait until you have made progress with this refactoring before updating the related API documentation (is there already an RFC or some draft ?).

Copy link
Member

@jhedberg jhedberg May 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(is there already an RFC or some draft ?)

Not yet. I have a bt_wq branch in my Zephyr fork, but it's still pretty messy. I'll create a draft PR once it's a bit cleaned up. Also, I don't think I can do all the wanted changes in one go. First step is to create the generic internal workqueue and move as much as possible of internal processing to it. Moving all callbacks then to the system workqueue will need to be a separate (later) step, since that will require some more intrusive changes.

@jhedberg jhedberg requested a review from ubieda May 9, 2025 07:22
@dottspina dottspina force-pushed the pr-upd-bt-conn-callbacks-doc branch from e85c7de to 791dd18 Compare May 12, 2025 14:39
Copy link

@dottspina
Copy link
Contributor Author

Sorry @Thalley , our messages from Friday crossed.
I think we all agree on preventing access to a possibly recycled connection object, I just reworded the comment which I too found misleading: either it is unnecessary, assuming that the purpose of the conn = NULL; statement is obvious, or we should be careful not to suggest that this statement is, in some opaque way, about the actual connection object.

@jhedberg , understood, move forward quietly, we keep my proposals for the moment, and we'll review the documentation of all related callbacks when you're ready.

I also added a trivial commit to fix the MISRA "All if ... else if constructs shall be terminated
with an else clause" (c:M23_112) issues reported by SonarQube and blocking this PR.

Thanks.

@dottspina dottspina changed the title Update connection callbacks API documentation Blutooth: Host: Update connection callbacks API documentation and minor fixes May 12, 2025
Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Jul 12, 2025
@Thalley Thalley removed the Stale label Jul 14, 2025
Comment on lines 1709 to 1868
* This callback runs on the system-worqueue thread,
* the usual precautions apply.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tricky specifying the threads that certain callbacks come on, as that means that if the BT subsystem starts using it's own workqueue, then there aren't any real functional changes, but the documentation would be wrong.

Not 100% opposed to this, but not sure it should be added either

Copy link
Contributor Author

@dottspina dottspina Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tricky specifying the threads that certain callbacks come on, as that means that if the BT subsystem starts using it's own workqueue, then there aren't any real functional changes, but the documentation would be wrong.

Absolutely right.

However:

  • today recycled() runs on the system work-queue: knowing this can help, e.g. to avoid inversion of priority issues
  • AFAIK @jhedberg is working on deferring all application callbacks to the system work-queue

Not 100% opposed to this, but not sure it should be added either

Not sure either.
My two cents: it's worth having documentation that's faithful to current behavior, to better guide users, and it will make it easier to update next time after @jhedberg's work is complete.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, my ongoing work is over here: #93033 however it's still in quite immature state, and I can't promise when it'll be stable enough to be promoted to a normal PR from a Draft.

@dottspina dottspina force-pushed the pr-upd-bt-conn-callbacks-doc branch 2 times, most recently from 386320c to a6ee0ad Compare August 8, 2025 15:04
@dottspina dottspina force-pushed the pr-upd-bt-conn-callbacks-doc branch 2 times, most recently from b3c4dfb to aac405a Compare August 8, 2025 18:03
Copy link

sonarqubecloud bot commented Aug 8, 2025

Comment on lines 1867 to 1868
* This callback runs on the system workqueue thread,
* the usual precautions apply.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the usual precautions? I think that if we want to write such as statement, we should define those and refer to it. If you don't know the usual precautions, then this isn't helpful :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree, I too would like to be able to refer to some sort of summary of the recommendations and limitations for each of the system threads (e.g. syswq, BT RX WQ).

For the system workqueue, usual precautions might be:

  • avoid "blocking operations that would delay other system workqueue processing to an unacceptable degree", which is emphasized in Workqueue Threads, System Workqueue
  • inversion of priority issues, which are well explained in Workqueue Threads, Workqueue Best Practices

Perhaps the most coherent thing to do is to briefly redirect to the references above, rather than duplicating or paraphrasing the content. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think references to the 2 pages you linked makes sense if it can be done in a brief way.

One concern about this is that we have many callbacks that run in the system workqueue, and where they are not documented to be as such, and thus documenting just a few of these cases may actually be hurtful rather than helpful, as it somewhat implies that the other callbacks do not have these same precautions. (i.e. if there is a only a written "warning" for 2 out of 10 things, people will assume that the "warning" does not apply the to remaining 8 things).

If we move towards documenting this for a few callbacks here, then I think we should move towards doing it for all Bluetooth callbacks. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I share your concerns: there are many callbacks, and documenting the execution context of only a few of them can be misleading. We should do that consistently, either documenting all callbacks that run on a specific thread (e.g. the system workqueue), or all callbacks within a subsystem/API.

We also need to find a way to avoid copy-pasting these warnings across the API documentation, perhaps using Doxygen aliases, text or even qualifiers like "isr-ok".

I think this deserves a distinct PR (may be an RFC), and thus removed all related changes from this one.

@Thalley Thalley requested review from Copilot and removed request for theob-pro August 11, 2025 08:22
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR primarily updates the API documentation for Bluetooth connection callbacks to reflect recent changes in the Bluetooth Host stack, particularly around connection lifecycle management and threading context. The updates clarify when callbacks run, deprecate certain behaviors, and provide better guidance on connection management patterns.

  • Updates connection callback documentation to reflect current threading behavior and best practices
  • Fixes a subtle bug in bt_conn_unref() where a local variable was incorrectly used as an atomic target
  • Adds missing else clauses to fix MISRA compliance issues

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
include/zephyr/bluetooth/conn.h Updates API documentation for connection callbacks with accurate threading context and usage guidance
subsys/bluetooth/host/conn.c Fixes atomic variable usage bug, improves comments, and adds MISRA-compliant else clauses

IF_ENABLED(CONFIG_BT_CONN_TX,
(__ASSERT(!(deallocated && k_work_is_pending(&conn->tx_complete_work)),
"tx_complete_work is pending when conn is deallocated")));
(bool conn_tx_is_pending = k_work_is_pending(&conn->tx_complete_work);));
Copy link
Preview

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semicolon after the closing parenthesis creates an empty statement. The line should end with just )).

Suggested change
(bool conn_tx_is_pending = k_work_is_pending(&conn->tx_complete_work);));
(bool conn_tx_is_pending = k_work_is_pending(&conn->tx_complete_work)));

Copilot uses AI. Check for mistakes.

IF_ENABLED(CONFIG_BT_CONN_TX,
(__ASSERT(!(deallocated && k_work_is_pending(&conn->tx_complete_work)),
"tx_complete_work is pending when conn is deallocated")));
(bool conn_tx_is_pending = k_work_is_pending(&conn->tx_complete_work);));
Copy link
Preview

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Variable declaration inside IF_ENABLED macro is unusual and reduces readability. Consider declaring conn_tx_is_pending before the IF_ENABLED block for better clarity.

Copilot uses AI. Check for mistakes.

* a connection object has actually been freed.
*
* This callback runs on the system workqueue thread,
* the usual precautions apply.
Copy link
Preview

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The documentation should be more specific about which precautions apply when running on the system workqueue thread. Consider referencing specific documentation or providing examples of the limitations.

Suggested change
* the usual precautions apply.
* This callback runs on the system workqueue thread.
* When running in this context, avoid blocking operations,
* long-running computations, or using APIs that are not safe
* to call from the system workqueue thread. For more details,
* see the Zephyr documentation on system workqueues:
* https://docs.zephyrproject.org/latest/kernel/services/threads/workqueue.html

Copilot uses AI. Check for mistakes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the next version of the documentation will be fine and address most of the requested improvements and fixes.

An important question we have left to decide relates to "Bluetooth: Host: Fix use of local variable as atomic target", on which @Thalley and I differ (see #89661 (comment)).

I have a last concern, I'm not sure how to cleanly integrate the "tx_complete_work is pending when conn is deallocated" assertion:

  • to be consistent, we should not access conn->tx_complete_work after decrementing the reference count
  • whatever we do, we will end up with two IF_ENABLED(CONFIG_BT_CONN_TX,...) blocks, one before atomic_dec() to store some state, and another after with the assertion itself, which is already a bit ugly
  • the test can't happen within the __ASSERT() statement: when assertions are disabled, the underlying variable will be unused, possibly causing build failures depending on compiler flags (see CI failures); I couldn't figure out how to fix that elegantly, or at least without adding even more ugliness (e.g. a dummy statement using the unused variable outside of the assertion)

I will include what I think is the least ugly approach in the next version of this PR. In the meantime, does anyone have any ideas?

Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • the test can't happen within the __ASSERT() statement: when assertions are disabled, the underlying variable will be unused, possibly causing build failures depending on compiler flags (see CI failures); I couldn't figure out how to fix that elegantly, or at least without adding even more ugliness (e.g. a dummy statement using the unused variable outside of the assertion)

Hmm, it's odd that we see a CI issue here because ASSERT should be enabled in those, and even then we have many other cases where local variables are used only for __ASSERT without causing CI issues (sonarqube and coverity sometimes complain about those though).

Some not-very-nice-looking solutions could be to modify the IF_ENABLED(CONFIG_BT_CONN_TX to also check CONFIG_ASSERT or to apply __maybe_unused to the local variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up using a standard #if directive:

  • I could'nt figure out how to test both CONFIG_BT_CONN_TX and CONFIG_ASSERT in a single IF_ENABLED()
  • checkpatch will parse anything like IF_ENABLED(CONFIG_BT_CONN_TX, (if (IS_ENABLED(CONFIG_ASSERT)) {assignment})) as an assignment in if condition (error ASSIGN_IN_IF), plus various warnings depending on formatting

Applying __maybe_unused is then necessary since the preprocessor can actually remove all uses of the variable.

It's not that ugly when you read it, and we don't call k_work_is_pending() unnecessarily.

Do you have a better idea?
Thanks for your help.

In bt_conn_unref(), a local variable is used as atomic target:

    atomic_val_t old = atomic_dec(&conn->ref);
    /* Prevent from accessing connection object */
    bool deallocated = (atomic_get(&old) == 1);

Using atomic_get() to access a non-shared local variable
cannot prevent any data race on that variable,
and only causes confusion.

Moreover, this call to atomic_get() is incorrect: the API expects
an atomic_t* argument (target), not an atomic_val_t* (value).
This compiles and /works/ only because Zephyr defines both to be
the same integer type, and thus: atomic_get(&old) == old.
The equivalent C11 code, where _Atomic(T) and T are different types,
wouldn't compile.

Signed-off-by: Christophe Dufaza <[email protected]>
@dottspina dottspina force-pushed the pr-upd-bt-conn-callbacks-doc branch 2 times, most recently from 1211868 to 365df2f Compare September 19, 2025 12:54
An assertion in bt_conn_unref() accesses the connection's state
after decrementing its reference count.
This is not consistent since, if we removed the last reference,
the Bluetooth Host stack may reuse the connection object
before the assertion is checked.
Instead, retrieve the connection property tested by the assertion
before decrementing the counter, as we do for other properties.

Simplify the code path by returning early when we did not remove
the last reference.

Remind that automatic advertiser resumption is deprecated.

Signed-off-by: Christophe Dufaza <[email protected]>
The API documentation for the recycled() callback predates [1],
and still warns users to "treat this callback as an ISR",
although it now runs on the system workqueue thread,
as does disconnected().
"Making Bluetooth API calls" to "re-start connectable advertising or
scanning" should no longer be "strongly discouraged".

On the contrary, we can emphasize that this is the right event
to listen for to initiate operations that will try to re-allocate
a freed connection object.

Mention that BT_MAX_CONN configures the size of the connection pool.

Refs:
- [1] efb5d83: Bluetooth: Host: Defer `conn.recycled()` to the syswq

Signed-off-by: Christophe Dufaza <[email protected]>
The API documentation for the disconnected() callback warns that
the listener can't assume that the corresponding connection object
has been freed and may me available to the application.

The recommendations given to still start a new connection
or connectable advertiser are outdated or misleading:
- "start connectable advertising": the options that "will attempt
  to resume the advertiser under some conditions" are deprecated
  since Zephyr 4.0 (BT_LE_ADV_OPT_CONNECTABLE and related)
- "using k_work_submit()": assuming everything will be fine
  when the work is actually processed is not reliable
- "increase CONFIG_BT_MAX_CONN": setting BT_MAX_CONN to N+1
  when planning N simultaneous connections is a work-around
  that users may have gotten used to (despite its footprint),
  but there is no longer any reason to advise it

Stop documenting creating new connections or restarting advertising
from the disconnected() callback and instead recommend relying
on recycled() for these use cases.

Signed-off-by: Christophe Dufaza <[email protected]>
Fix some "All if ... else if constructs shall be terminated
with an else clause" (c:M23_112) issues reported by SonarQube.

Signed-off-by: Christophe Dufaza <[email protected]>
@dottspina dottspina force-pushed the pr-upd-bt-conn-callbacks-doc branch from 365df2f to 48c5722 Compare September 19, 2025 15:29
Copy link

@dottspina
Copy link
Contributor Author

@Thalley, thank you for your review.

This PR should now only contain changes we agree on:

  • In bt_conn_unref() (2 commits): fix use of local variable as atomic target, fix connection usage after decrementing its reference count
  • API documentation update (2 commits): clearly state that recycled() is now the event to listen for to start a new connection or connectable advertiser, do not recommend workarounds for disconnected() (confirmed by @alwa-nordic)

Other considerations that emerged from our discussion:

  • It is useful to know the execution context of a callback that one implements, and the API documentation could help with this. Perhaps it's enough to tell users which thread, and assume they do their homework to understand the implications. What to do and how (e.g. Doxygen aliases ?) is beyond the scope of this PR and may deserve an RFC (e.g. starting with the Bluetooth subsystem).
  • Automatic advertiser resumption is deprecated since Zephyr 4.0: I considered removing the relevant part from bt_conn_unref(), but it wouldn't be consistent if we didn't remove its support altogether (configuration options and related API). I simply added a comment so we don't forget, but it might be worth opening an RFC (AFAICT samples and tests have already been updated).

Thanks.

Copy link
Contributor

@Thalley Thalley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

If you want to follow up with another PR, I noticed that our ADV API does not mentioned that it reserves a bt_conn object if starting connectable advertising, and that it is also not mentioned that if you stop connectable advertising, then the recycled callback is likewise called :)

@cfriedt cfriedt merged commit 69cab3f into zephyrproject-rtos:main Sep 22, 2025
28 checks passed
@dottspina dottspina deleted the pr-upd-bt-conn-callbacks-doc branch September 24, 2025 03:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth Host Bluetooth Host (excluding BR/EDR) area: Bluetooth
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants