bluetooth: gatt: add err param to discover cb
#81432
err param to discover cb
I recognize this needs to go through the stable API change process; I'm looking for an initial indication that the change is acceptable before sending it to the dev list.
e9317a1 to
d4afa91
Compare
But that begs the question: what is the use case for this? If no one is checking the connection state, do they care?
So there are a lot of different errors is what you are saying? So the way I see it there are 3 overall cases:
In cases 2) and 3) an application cannot do anything anyways. That's why I am arguing that we need to have a use case for the |
I have a wrapper around
I'm saying there are a lot of code paths that can result in errors.
The correct action to take to "move on" depends on why it failed, which cannot currently be determined.
I disagree, in situation 2) there are many things the application can do, depending on whether or not the characteristic you were searching for is a requirement or just a nice to have.
My discovery process has failed, do I as a user need to call |
I am confused, what is the purpose of the API lifecycle then? |
Update all instances of `bt_gatt_discover_func_t` in the tree. Signed-off-by: Jordan Yates <[email protected]>
baa1f66 to
29803c5
Compare
If we change it so that the stack will initiate an ATT disconnect in case of invalid GATT values, then we are left with 2 cases:
Both are currently possible to determine without any API changes. However if we do not decide to disconnect in case of invalid response from the GATT server, then I agree that there's a use case for the |
You are right. I did not mean we can't do it, just that we shouldn't. I think it's very feasible to do this change with a smooth transition. |
It is not useful if the application just needs to know when "the current connection has this specific service" and react to that. It is necessary when the application needs to know that "the remote definitively did not have this specific service". For example if this is cached. |
	memset(fixture, 0, sizeof(*fixture));

-	err = bt_bap_unicast_server_register(&param);
+	(void)bt_bap_unicast_server_register(&param);
This isn't the correct solution. If you want to change this in this PR (IMO it should be a separate PR), then it should be something like `zassert_equal(bt_bap_unicast_server_register(&param), 0);` or `zassert_equal(err, 0);`
I only made the change to pass CI. Since this is currently failing on main, I assumed it would be properly fixed for the release.
What was the CI error? Unused variable?
yes
Thanks. I'm having a look at how to properly fix this, but it seems non-trivial. Will let you know when I have a PR
Here's the non-trivial and proper fix: #81515
To me this is just a breaking API change with extra steps. At some stage you are going to hit the definition of "A breaking API change is defined as one that forces users to modify their existing code in order to maintain the current behavior of their application". Forcing a change now vs later is the same thing from the project perspective, and it turns one PR into multiple PRs with additional implementation complexities. Unless there is some other advantage I am missing.
There are definitely benefits for a large project: It turns some errors into warnings, which allows a downstream project to chug along as normal and plan better. The downstream project can easily upgrade to a new Zephyr version, get the new features and bug fixes, and make a plan to upgrade each individual use site of the deprecated APIs. It makes it easy for downstream users to run their CI with both their adopted version of Zephyr and the next one, to get advance warning about deprecations and find regressions. There is a smooth upgrade path by using only APIs that exist in both versions, so there is no ifdef-version-soup around the breakage. For this, the deprecated API "A" and the new API "B" have to coexist for at least two releases, so that downstream can upgrade from API A to API B at a point when their CI is testing two Zephyr versions that both have A and B. It also means that upgrading Zephyr is not a "stop the world" period, where merges may have to be stopped for some time to allow someone to upgrade and fix all breakage without several rounds of "Oh. A new use of the old API just got merged, you have to fix that too.", or worse, a broken main branch for some period. For large projects, people stepping on each other's feet is not unlikely.
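As a rough illustration of the coexistence described above, here is a minimal sketch in which the old callback (API "A") and a new callback with an error code (API "B") live side by side. All names here (`struct attr`, `struct discover_params`, `discover_report`, the two callback typedefs) are hypothetical stand-ins, not the Zephyr types or the signature proposed in this PR.

```c
#include <stddef.h>

/* Hypothetical stand-in for an attribute; not the real Zephyr type. */
struct attr {
	unsigned short handle;
};

/* API "A" (old): an error and "not found" both arrive as attr == NULL. */
typedef void (*discover_cb_a_t)(const struct attr *attr);
/* API "B" (new): same information plus an explicit error code. */
typedef void (*discover_cb_b_t)(const struct attr *attr, int err);

struct discover_params {
	discover_cb_a_t func;     /* old; a real tree would tag this deprecated */
	discover_cb_b_t func_err; /* new; preferred when set */
};

/* Deliver one discovery result through whichever callback the user set. */
static void discover_report(const struct discover_params *params,
			    const struct attr *attr, int err)
{
	if (params->func_err != NULL) {
		params->func_err(attr, err);
	} else if (params->func != NULL) {
		/* Old behaviour preserved: the error is collapsed into NULL. */
		params->func(err == 0 ? attr : NULL);
	}
}

/* Example callbacks that only record what they were told. */
static const struct attr *last_attr_a;
static int calls_a;

static void legacy_cb(const struct attr *attr)
{
	last_attr_a = attr;
	calls_a++;
}

static int last_err_b;
static int calls_b;

static void modern_cb(const struct attr *attr, int err)
{
	(void)attr;
	last_err_b = err;
	calls_b++;
}
```

Code written against `func` keeps compiling and behaving as before, while new code can opt in to `func_err` one call site at a time.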
@JordanYates Do you have a response for this comment? If we disconnect the ATT bearer on receiving invalid ATT PDUs, then we have ways to handle the other scenarios with the existing API |
Sounds reasonable, I am not able to make that change though. |
Right, so we are missing value checks on responses. E.g. if we are discovering from handle 100 to 200 and the response contains a handle 50, then we assume the remote device or the link is broken. In that case we can either (as suggested) disconnect the ATT bearer and the ACL, or we can consider that similar to "attribute not found". Which makes the most sense? (@alwa-nordic please chime in here as well) Once that is resolved, then we are down to either "Not found" or "disconnected", both of which are possible to handle without modifying the API.
Disconnecting is fine. I don't think we can treat it as "not found", since the remote never said that. We can also be a little more robust. We can skip over handles that we did not request. For example, as long as we got at least one handle we were interested in, we can increment the handle range to discover and continue. |
Aren't you disagreeing with yourself here? :D Let's take a simple case where we send a discovery request from handle 100 to handle 200 for some specific type, and the server responds with handles You seem to both suggest that we disconnect given the invalid value In the case that the server only responds with So the way I see it, we have the following options when discovering for handle 100 to 200:
So we have the options to 1) always disconnect due to invalid values, 2) only disconnect if we only get invalid values or 3) treat invalid values as no values
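The three options can be sketched as a small classifier over the handles in one response. This is a simplified model under stated assumptions: the enum names and `check_response()` are hypothetical, the response is reduced to a plain array of handles, and an empty array stands in for the server reporting that nothing was found.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy names for the three options discussed above. */
enum invalid_policy {
	POLICY_DISCONNECT_ON_ANY,         /* 1) any out-of-range handle disconnects */
	POLICY_DISCONNECT_IF_ALL_INVALID, /* 2) disconnect only if nothing is valid */
	POLICY_IGNORE_INVALID,            /* 3) invalid handles count as no values */
};

enum outcome {
	OUTCOME_CONTINUE,   /* at least one usable handle, keep discovering */
	OUTCOME_NOT_FOUND,  /* nothing usable in the requested range */
	OUTCOME_DISCONNECT, /* treat the remote or the link as broken */
};

/* Classify a response to a discovery request covering [start, end]. */
static enum outcome check_response(const uint16_t *handles, size_t count,
				   uint16_t start, uint16_t end,
				   enum invalid_policy policy)
{
	size_t valid = 0;
	size_t invalid = 0;

	for (size_t i = 0; i < count; i++) {
		if (handles[i] >= start && handles[i] <= end) {
			valid++;
		} else {
			invalid++;
		}
	}

	switch (policy) {
	case POLICY_DISCONNECT_ON_ANY:
		if (invalid > 0) {
			return OUTCOME_DISCONNECT;
		}
		break;
	case POLICY_DISCONNECT_IF_ALL_INVALID:
		if (invalid > 0 && valid == 0) {
			return OUTCOME_DISCONNECT;
		}
		break;
	case POLICY_IGNORE_INVALID:
		break;
	}

	return valid > 0 ? OUTCOME_CONTINUE : OUTCOME_NOT_FOUND;
}
```

With a request for 100 to 200 and a response containing handles 50 and 120, option 1 disconnects while options 2 and 3 continue; with only handle 50, options 1 and 2 disconnect and option 3 reports not found.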
I've changed my mind. All three options are fine by me. |
I'll check the core spec to see if it mentions what the expected behavior is :) A server receiving a request with invalid values shall just reject it, but I'm unsure if it mentions what happens if a client receives an invalid response.
The core spec does not specify any defined behavior for the GATT client if it receives invalid responses. The cleanest way to deal with this is, IMO, simply to disconnect. If only a part of the response is valid, we cannot assume that the other part is valid.
@jhedberg @alwa-nordic (cc @Vudentz ) I'm not 100% sure which direction to take here, so would like some input from you. Currently we have:
This seems to have been the behavior of ATT and GATT ever since it was added to Zephyr. I suggest one of the 2 following solutions:
Overall, we just disconnect with any invalid response.
Solution 2: Relaxed
Overall, we ignore what we can and always provide application a callback with |
@Thalley my initial feeling is that not disconnecting may result in better interoperability (or rather, user experience) in case of buggy devices, in which case I'd be in favor of solution 2.
I was thinking the same, but then we cannot fix the issue raised by @JordanYates - We will just treat invalid responses similar to "Not found". My concern is that if the server does provide invalid responses, and we treat them as "Not found" (during discovery), then how much can we trust the server at all? If we discover from handle 100 to 200 and we get back a characteristic at handle 50, can we trust that the response to a read of e.g. handle 150 is really the value of handle 150? As mentioned the core spec does not specify any behavior here, so we have to either treat invalid responses as "Not found" or errors, and hope that future requests have valid responses, or we treat the entire remote GATT server (or the link) as invalid and disconnect.
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
I have not forgotten this btw, and will provide a PR for disconnecting ACL/GATT if we receive invalid responses (possibly guarded by a Kconfig) |
Having attempted to drop these commits in my upgrade to Zephyr v4.2, I'm not convinced that actually solves my problem. Handling invalid responses is not the problem, the problem is the callback doesn't know whether discovery failed or whether the attribute doesn't exist. Discovery can fail for reasons other than an invalid response (the remote disconnecting) and unless you are planning to remove the callback being run on a disconnected connection (breaking the current model), it doesn't help the callback determine whether to continue or not. |
In the case of a disconnect, then you can (and possibly should?) check the connection state of the provided
Do you think we are still missing a case here? |
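A minimal sketch of the workaround suggested here, using the callback as it exists today: when `attr` is NULL, look at the connection state to tell a lost link apart from a completed search. The types below are hypothetical stand-ins so the snippet is self-contained; in Zephyr the state would presumably come from `bt_conn_get_info()` and the `state` field of `struct bt_conn_info`, which is an assumption about how this would be wired up.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the real Zephyr connection or attribute types. */
enum conn_state {
	CONN_DISCONNECTED,
	CONN_CONNECTED,
};

struct conn {
	enum conn_state state;
};

struct attr {
	unsigned short handle;
};

enum discover_result {
	DISCOVER_FOUND,
	DISCOVER_NOT_FOUND,
	DISCOVER_LINK_LOST,
};

/* Interpret one discovery callback invocation without an error parameter. */
static enum discover_result classify_result(const struct conn *conn,
					    const struct attr *attr)
{
	if (attr != NULL) {
		return DISCOVER_FOUND;
	}
	if (conn->state != CONN_CONNECTED) {
		/* The link is gone, so "not found" cannot be concluded. */
		return DISCOVER_LINK_LOST;
	}
	/* Still connected: any other failure cause also ends up here. */
	return DISCOVER_NOT_FOUND;
}
```

The last branch is the open question in this thread: a failure that leaves the connection up is indistinguishable from the attribute not existing.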
Add an error parameter to the discovery callback so that users can differentiate between an error occurring in the discovery process and the attribute simply not existing.
Currently both situations are provided to the user as `attr == NULL`.
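To show why the distinction matters, here is a minimal sketch of a callback that caches whether the remote has a service, which is only safe when "not found" is definitive. The signature (an `int err` alongside `attr`), `struct service_cache`, and `on_discover()` are hypothetical illustrations, not the signature introduced by this PR.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an attribute; not the real Zephyr type. */
struct attr {
	unsigned short handle;
};

/* Hypothetical per-peer cache entry for one service. */
struct service_cache {
	bool known;            /* true once discovery gave a definitive answer */
	bool present;          /* only meaningful when known is true */
	unsigned short handle; /* only meaningful when present is true */
};

/* Discovery callback with an assumed extra error argument (0 means success). */
static void on_discover(const struct attr *attr, int err,
			struct service_cache *cache)
{
	if (err != 0) {
		/* Discovery failed: record nothing so it can be retried later. */
		cache->known = false;
		return;
	}

	cache->known = true;
	cache->present = (attr != NULL);
	cache->handle = (attr != NULL) ? attr->handle : 0;
}
```

Without the error argument, the failure case would be cached as "service absent", which is the situation the description above is trying to avoid.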