-
Notifications
You must be signed in to change notification settings - Fork 8k
Blutooth: Host: Update connection callbacks API documentation and minor fixes #89661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Blutooth: Host: Update connection callbacks API documentation and minor fixes #89661
Conversation
include/zephyr/bluetooth/conn.h
Outdated
* @note This callback will run on the same, non-preemptible, work-queue thread | ||
* that processes incoming low priority HCI packets and invokes GATT callbacks | ||
* such as @ref bt_gatt_attr.read and @ref bt_gatt_attr.write. | ||
* Blocking operations are therefore discouraged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add this, but it's going to be pretty short lived. I'm working on moving all host-internal processing to a single internal work queue (or rather renaming and extending the purpose of the existing rx workqueue), while at the same time deferring any application callbacks to the system workqueue. Once that change happens the above comment will not be correct anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm working on moving all host-internal processing to a single internal work queue [...] while at the same time deferring any application callbacks to the system workqueue.
This will indeed be a welcome improvement: we'd better wait until you have made progress with this refactoring before updating the related API documentation (is there already an RFC or some draft ?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(is there already an RFC or some draft ?)
Not yet. I have a bt_wq
branch in my Zephyr fork, but it's still pretty messy. I'll create a draft PR once it's a bit cleaned up. Also, I don't think I can do all the wanted changes in one go. First step is to create the generic internal workqueue and move as much as possible of internal processing to it. Moving all callbacks then to the system workqueue will need to be a separate (later) step, since that will require some more intrusive changes.
e85c7de
to
791dd18
Compare
|
Sorry @Thalley , our messages from Friday crossed. @jhedberg , understood, move forward quietly, we keep my proposals for the moment, and we'll review the documentation of all related callbacks when you're ready. I also added a trivial commit to fix the MISRA "All if ... else if constructs shall be terminated Thanks. |
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
include/zephyr/bluetooth/conn.h
Outdated
* This callback runs on the system-worqueue thread, | ||
* the usual precautions apply. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tricky specifying the threads that certain callbacks come on, as that means that if the BT subsystem starts using it's own workqueue, then there aren't any real functional changes, but the documentation would be wrong.
Not 100% opposed to this, but not sure it should be added either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tricky specifying the threads that certain callbacks come on, as that means that if the BT subsystem starts using it's own workqueue, then there aren't any real functional changes, but the documentation would be wrong.
Absolutely right.
However:
- today
recycled()
runs on the system work-queue: knowing this can help, e.g. to avoid inversion of priority issues - AFAIK @jhedberg is working on deferring all application callbacks to the system work-queue
Not 100% opposed to this, but not sure it should be added either
Not sure either.
My two cents: it's worth having documentation that's faithful to current behavior, to better guide users, and it will make it easier to update next time after @jhedberg's work is complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, my ongoing work is over here: #93033 however it's still in quite immature state, and I can't promise when it'll be stable enough to be promoted to a normal PR from a Draft.
386320c
to
a6ee0ad
Compare
b3c4dfb
to
aac405a
Compare
|
include/zephyr/bluetooth/conn.h
Outdated
* This callback runs on the system workqueue thread, | ||
* the usual precautions apply. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the usual precautions? I think that if we want to write such as statement, we should define those and refer to it. If you don't know the usual precautions, then this isn't helpful :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree, I too would like to be able to refer to some sort of summary of the recommendations and limitations for each of the system threads (e.g. syswq, BT RX WQ).
For the system workqueue, usual precautions might be:
- avoid "blocking operations that would delay other system workqueue processing to an unacceptable degree", which is emphasized in Workqueue Threads, System Workqueue
- inversion of priority issues, which are well explained in Workqueue Threads, Workqueue Best Practices
Perhaps the most coherent thing to do is to briefly redirect to the references above, rather than duplicating or paraphrasing the content. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think references to the 2 pages you linked makes sense if it can be done in a brief way.
One concern about this is that we have many callbacks that run in the system workqueue, and where they are not documented to be as such, and thus documenting just a few of these cases may actually be hurtful rather than helpful, as it somewhat implies that the other callbacks do not have these same precautions. (i.e. if there is a only a written "warning" for 2 out of 10 things, people will assume that the "warning" does not apply the to remaining 8 things).
If we move towards documenting this for a few callbacks here, then I think we should move towards doing it for all Bluetooth callbacks. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, I share your concerns: there are many callbacks, and documenting the execution context of only a few of them can be misleading. We should do that consistently, either documenting all callbacks that run on a specific thread (e.g. the system workqueue), or all callbacks within a subsystem/API.
We also need to find a way to avoid copy-pasting these warnings across the API documentation, perhaps using Doxygen aliases, text or even qualifiers like "isr-ok".
I think this deserves a distinct PR (may be an RFC), and thus removed all related changes from this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR primarily updates the API documentation for Bluetooth connection callbacks to reflect recent changes in the Bluetooth Host stack, particularly around connection lifecycle management and threading context. The updates clarify when callbacks run, deprecate certain behaviors, and provide better guidance on connection management patterns.
- Updates connection callback documentation to reflect current threading behavior and best practices
- Fixes a subtle bug in
bt_conn_unref()
where a local variable was incorrectly used as an atomic target - Adds missing else clauses to fix MISRA compliance issues
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
include/zephyr/bluetooth/conn.h |
Updates API documentation for connection callbacks with accurate threading context and usage guidance |
subsys/bluetooth/host/conn.c |
Fixes atomic variable usage bug, improves comments, and adds MISRA-compliant else clauses |
subsys/bluetooth/host/conn.c
Outdated
IF_ENABLED(CONFIG_BT_CONN_TX, | ||
(__ASSERT(!(deallocated && k_work_is_pending(&conn->tx_complete_work)), | ||
"tx_complete_work is pending when conn is deallocated"))); | ||
(bool conn_tx_is_pending = k_work_is_pending(&conn->tx_complete_work);)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semicolon after the closing parenthesis creates an empty statement. The line should end with just ))
.
(bool conn_tx_is_pending = k_work_is_pending(&conn->tx_complete_work);)); | |
(bool conn_tx_is_pending = k_work_is_pending(&conn->tx_complete_work))); |
Copilot uses AI. Check for mistakes.
subsys/bluetooth/host/conn.c
Outdated
IF_ENABLED(CONFIG_BT_CONN_TX, | ||
(__ASSERT(!(deallocated && k_work_is_pending(&conn->tx_complete_work)), | ||
"tx_complete_work is pending when conn is deallocated"))); | ||
(bool conn_tx_is_pending = k_work_is_pending(&conn->tx_complete_work);)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Variable declaration inside IF_ENABLED macro is unusual and reduces readability. Consider declaring conn_tx_is_pending
before the IF_ENABLED block for better clarity.
Copilot uses AI. Check for mistakes.
include/zephyr/bluetooth/conn.h
Outdated
* a connection object has actually been freed. | ||
* | ||
* This callback runs on the system workqueue thread, | ||
* the usual precautions apply. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The documentation should be more specific about which precautions apply when running on the system workqueue thread. Consider referencing specific documentation or providing examples of the limitations.
* the usual precautions apply. | |
* This callback runs on the system workqueue thread. | |
* When running in this context, avoid blocking operations, | |
* long-running computations, or using APIs that are not safe | |
* to call from the system workqueue thread. For more details, | |
* see the Zephyr documentation on system workqueues: | |
* https://docs.zephyrproject.org/latest/kernel/services/threads/workqueue.html |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the next version of the documentation will be fine and address most of the requested improvements and fixes.
An important question we have left to decide relates to "Bluetooth: Host: Fix use of local variable as atomic target", on which @Thalley and I differ (see #89661 (comment)).
I have a last concern, I'm not sure how to cleanly integrate the "tx_complete_work is pending when conn is deallocated" assertion:
- to be consistent, we should not access
conn->tx_complete_work
after decrementing the reference count - whatever we do, we will end up with two
IF_ENABLED(CONFIG_BT_CONN_TX,...)
blocks, one beforeatomic_dec()
to store some state, and another after with the assertion itself, which is already a bit ugly - the test can't happen within the
__ASSERT()
statement: when assertions are disabled, the underlying variable will be unused, possibly causing build failures depending on compiler flags (see CI failures); I couldn't figure out how to fix that elegantly, or at least without adding even more ugliness (e.g. a dummy statement using the unused variable outside of the assertion)
I will include what I think is the least ugly approach in the next version of this PR. In the meantime, does anyone have any ideas?
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- the test can't happen within the
__ASSERT()
statement: when assertions are disabled, the underlying variable will be unused, possibly causing build failures depending on compiler flags (see CI failures); I couldn't figure out how to fix that elegantly, or at least without adding even more ugliness (e.g. a dummy statement using the unused variable outside of the assertion)
Hmm, it's odd that we see a CI issue here because ASSERT should be enabled in those, and even then we have many other cases where local variables are used only for __ASSERT without causing CI issues (sonarqube and coverity sometimes complain about those though).
Some not-very-nice-looking solutions could be to modify the IF_ENABLED(CONFIG_BT_CONN_TX
to also check CONFIG_ASSERT
or to apply __maybe_unused
to the local variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up using a standard #if
directive:
- I could'nt figure out how to test both
CONFIG_BT_CONN_TX
andCONFIG_ASSERT
in a singleIF_ENABLED()
- checkpatch will parse anything like
IF_ENABLED(CONFIG_BT_CONN_TX, (if (IS_ENABLED(CONFIG_ASSERT)) {assignment}))
as an assignment in if condition (error ASSIGN_IN_IF), plus various warnings depending on formatting
Applying __maybe_unused
is then necessary since the preprocessor can actually remove all uses of the variable.
It's not that ugly when you read it, and we don't call k_work_is_pending()
unnecessarily.
Do you have a better idea?
Thanks for your help.
In bt_conn_unref(), a local variable is used as atomic target: atomic_val_t old = atomic_dec(&conn->ref); /* Prevent from accessing connection object */ bool deallocated = (atomic_get(&old) == 1); Using atomic_get() to access a non-shared local variable cannot prevent any data race on that variable, and only causes confusion. Moreover, this call to atomic_get() is incorrect: the API expects an atomic_t* argument (target), not an atomic_val_t* (value). This compiles and /works/ only because Zephyr defines both to be the same integer type, and thus: atomic_get(&old) == old. The equivalent C11 code, where _Atomic(T) and T are different types, wouldn't compile. Signed-off-by: Christophe Dufaza <[email protected]>
1211868
to
365df2f
Compare
An assertion in bt_conn_unref() accesses the connection's state after decrementing its reference count. This is not consistent since, if we removed the last reference, the Bluetooth Host stack may reuse the connection object before the assertion is checked. Instead, retrieve the connection property tested by the assertion before decrementing the counter, as we do for other properties. Simplify the code path by returning early when we did not remove the last reference. Remind that automatic advertiser resumption is deprecated. Signed-off-by: Christophe Dufaza <[email protected]>
The API documentation for the recycled() callback predates [1], and still warns users to "treat this callback as an ISR", although it now runs on the system workqueue thread, as does disconnected(). "Making Bluetooth API calls" to "re-start connectable advertising or scanning" should no longer be "strongly discouraged". On the contrary, we can emphasize that this is the right event to listen for to initiate operations that will try to re-allocate a freed connection object. Mention that BT_MAX_CONN configures the size of the connection pool. Refs: - [1] efb5d83: Bluetooth: Host: Defer `conn.recycled()` to the syswq Signed-off-by: Christophe Dufaza <[email protected]>
The API documentation for the disconnected() callback warns that the listener can't assume that the corresponding connection object has been freed and may me available to the application. The recommendations given to still start a new connection or connectable advertiser are outdated or misleading: - "start connectable advertising": the options that "will attempt to resume the advertiser under some conditions" are deprecated since Zephyr 4.0 (BT_LE_ADV_OPT_CONNECTABLE and related) - "using k_work_submit()": assuming everything will be fine when the work is actually processed is not reliable - "increase CONFIG_BT_MAX_CONN": setting BT_MAX_CONN to N+1 when planning N simultaneous connections is a work-around that users may have gotten used to (despite its footprint), but there is no longer any reason to advise it Stop documenting creating new connections or restarting advertising from the disconnected() callback and instead recommend relying on recycled() for these use cases. Signed-off-by: Christophe Dufaza <[email protected]>
Fix some "All if ... else if constructs shall be terminated with an else clause" (c:M23_112) issues reported by SonarQube. Signed-off-by: Christophe Dufaza <[email protected]>
365df2f
to
48c5722
Compare
|
@Thalley, thank you for your review. This PR should now only contain changes we agree on:
Other considerations that emerged from our discussion:
Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
If you want to follow up with another PR, I noticed that our ADV API does not mentioned that it reserves a bt_conn
object if starting connectable advertising, and that it is also not mentioned that if you stop connectable advertising, then the recycled
callback is likewise called :)
This PR is mostly about updating the API documentation for some connection callbacks: commits titled "Amend xxx() callback documentation", we can also squash them once reviewed.
Additionally, I came across the use of a local variable as an atomic target in
bt_conn_unref()
: either I miss something obvious or it's typo in the source code, may be not deserving a dedicated PR or issue. Since this involves a Zephyr commit that we'll already mention, I've taken the liberty to discuss it briefly here as well.EDIT: added a trivial commit to fix the MISRA "All if ... else if constructs shall be terminated
with an else clause" (c:M23_112) issues reported by SonarQube and blocking this PR.
Update connection callbacks API documentation
The API documentation for some connection callbacks appears to not reflect some changes in the Bluetooth Host stack [1] [2] [3].
bt_conn_cb.disconnected()
: warns that the connection object is not actually reusable at this stage, but some of the advice given may be deprecated (auto-resume of advertising), and does not mentionrecycled()
which IIUC is now the most reliable callback to track free connections and/or initiate operations that depend on thembt_conn_cb.recycled()
: still warns users to "treat this callback as an ISR", although it now runs on the system work-queue thread, as doesdisconnected()
bt_conn_cb.connected()
: surprisingly, no special precautions are indicated, while it runs on the same thread that also invokes GATT callbacks such asbt_gatt_attr.read()
orbt_gatt_attr.write()
(usually theBT RX WQ
thread)The respective commit messages provide some further details.
My wording may not be ideal, edit requests are welcome.
Use of a local variable as an atomic target
While reading
subsys/bluetooth/host/conn.c
I came across something that I can't quite grasp, and I'm taking advantage of this PR to address it.In
bt_conn_unref()
, the local, non shared, variableold
is used as an atomic target:I can't see what data race or value obsolescence the above call to
atomic_get()
could prevent ?I considered that this might be a trick to force the compiler to generate some additional memory barrier, but I don't see the need for that either (if so, that might deserve a comment).
If pedantic, one could also note that
atomic_get()
expects anatomic_t*
argument (called a target in Zephyr API), not anatomic_val_t*
(value).This compiles without even a warning since Zephyr Atomic defines both to be the same integer type.
The equivalent C11 code, where
_Atomic(T)
andT
are different types, wouldn't compile.Either I miss something obvious, or there's some kind of typo introduced with [1].
Please enlighten me ;-)
Thanks.
[1] efb5d83: Bluetooth: Host: Added Recycled evt notifying conn object is available
[2] 2ca59e7: Bluetooth: Host: Defer
conn.recycled()
to the syswq[3] 8cfad44: Bluetooth: Deprecate adv auto-resume