bluetooth: gatt: add `err` param to discover cb #81432

JordanYates · 2024-11-15T09:38:15Z

Add an error parameter to the discovery callback so that users can differentiate between an error occurring in the discovery process and the attribute simply not existing.

Currently both situations are provided to the user as attr == NULL.

JordanYates · 2024-11-15T09:39:08Z

I recognize this needs to go through the stable API change process, looking for an initial indication that the change is acceptable before sending through to the dev list.
Release notes and migration guide will be added after 4.0 release.

Thalley · 2024-11-15T11:51:57Z

In theory the disconnect error could be detected, but I have not seen a single instance in tree that treats attr == NULL as anything other than "Discovery complete, move on to the next step". Checking the state of the Bluetooth connection does not appear to be an obvious step based on this.

But that's begs the question: What is the use case for this?

If no-one is checking for the connection state, they do care?
Those that may care can check the state.

In theory, every single goto done in functions touched in the first commit is an error that cannot be detected through the state of the Bluetooth connection. I can count at least 20 instances of those.

So there are a lot of different errors is what you are saying?
I think the API has worked so far because from the application perspective, the discovery procedure either found what it was looking for, or it didn't, and in the case that it didn't the application could not really do anything besides just move on. In the case that we receive invalid data, as @alwa-nordic suggested, then we probably should just disconnect.

So the way I see it there are 3 overall cases:

Attribute found and attr != NULL and everyone is happy
Attribute is not found and attr == NULL
Invalid response from remote => We should just disconnect.

In cases 2) and 3) an application cannot do anything anyways. That's why I am arguing that we need to have a use case for the err value for this change. From a Zephyr contributor perspective I think it's a good idea to always provide information to the upper layers, but as a Zephyr GATT user I have yet to understand a use case to warrant a breaking API change

JordanYates · 2024-11-15T12:19:35Z

But that's begs the question: What is the use case for this?

I have a wrapper around bt_conn_create that also performs MTU updates and characteristic discovery. Trying to continue discovery if the existing discovery has failed because of a disconnect or protocol error is pointless. I'm surprised that trying to determine "did my discovery succeed or fail" is controversial.

So there are a lot of different errors is what you are saying?

I'm saying there are a lot of code paths that can result in errors.

the discovery procedure either found what it was looking for, or it didn't, and in the case that it didn't the application could not really do anything besides just move on.

The correct action to take to "move on" depends on why it failed, which cannot currently be determined.

In cases 2) and 3) an application cannot do anything anyways

I disagree, in situation 2) there are many things the application can do, depending on whether or not the characteristic you were searching for is a requirement or just a nice to have.

That's why I am arguing that we need to have a use case for the err value for this change.

My discovery process has failed, do I as a user need to call bt_disconnect?

JordanYates · 2024-11-15T12:20:41Z

To preserve backwards compatibility we cannot add a new parameter to bt_gatt_discover_func_t.

I am confused, what is the purpose of the API lifecycle then?

Update all instances of `bt_gatt_discover_func_t` in the tree. Signed-off-by: Jordan Yates <[email protected]>

Thalley · 2024-11-15T13:21:26Z

My discovery process has failed, do I as a user need to call bt_disconnect?

If we change it so that the stack will initiate a ATT disconnect in case of invalid GATT values, then we are left with 2 cases:

Either the attribute is not found and attr == NULL
The connection has failed

Both are currently possible to determine without any API changes.

However if we do not decide to disconnect in case of invalid response from the GATT server, then I agree that there's a use case for the err parameter, so the application can make that decision.

alwa-nordic · 2024-11-15T14:09:48Z

To preserve backwards compatibility we cannot add a new parameter to bt_gatt_discover_func_t.

I am confused, what is the purpose of the API lifecycle then?

You are right. I did not mean we can't do it, just that we shouldn't. I think it's very feasible to do this change with a smooth transition.

alwa-nordic · 2024-11-15T14:19:09Z

But that's begs the question: What is the use case for this?

It is not useful if the application just needs to know when "the current connection has this specific service" and react to that.

It is necessary when the application needs to know that "the remote definitively did not have this specific service". For example if this is cached.

Thalley · 2024-11-15T14:19:44Z

tests/bluetooth/audio/ascs/src/main.c

 	memset(fixture, 0, sizeof(*fixture));

-	err = bt_bap_unicast_server_register(&param);
+	(void)bt_bap_unicast_server_register(&param);


This isn't the correct solution. If you want to change this in this PR (IMO should be a separate PR), then it should be something like zassert_equal(bt_bap_unicast_server_register(&param), 0); or zassert_equal(err, 0);`

I only made the change to pass CI, since this is currently failing on main I assumed it would be properly fixed for the release.

What was the CI error? Unused variable?

Thanks. I'm having a look at how to properly fix this, but it seems non-trivial. Will let you know when I have a PR

Here's the non-trivial and proper fix: #81515

JordanYates · 2024-11-15T23:07:20Z

We could make bt_gatt_discover_params.func a union of bt_gatt_discover_func_t and a new type and add a bit somewhere to select between them. We deprecate bt_gatt_discover_func_t at the same time.

You are right. I did not mean we can't do it, just that we shouldn't. I think it's very feasible to do this change with a smooth transition.

To me this is just a breaking API change with extra steps. At some stage you are going to hit the definition of "A breaking API change is defined as one that forces users to modify their existing code in order to maintain the current behavior of their application". Forcing a change now vs later is the same thing from the project perspective, and it turns one PR into multiple PR's with additional implementation complexities. Unless there is some other advantage I am missing.

alwa-nordic · 2024-11-16T10:58:35Z

We could make bt_gatt_discover_params.func a union of bt_gatt_discover_func_t and a new type and add a bit somewhere to select between them. We deprecate bt_gatt_discover_func_t at the same time.

You are right. I did not mean we can't do it, just that we shouldn't. I think it's very feasible to do this change with a smooth transition.

To me this is just a breaking API change with extra steps. At some stage you are going to hit the definition of "A breaking API change is defined as one that forces users to modify their existing code in order to maintain the current behavior of their application". Forcing a change now vs later is the same thing from the project perspective, and it turns one PR into multiple PR's with additional implementation complexities. Unless there is some other advantage I am missing.

There are definitely benefits for a large project:

It turns some errors into warnings, which allows a downstream project chug along as normal and plan better. The downstream project can easily upgrade to a new Zephyr version, get the new features and bug-fixes and make a plan to upgrade each individual use site of the deprecated APIs.

It makes it easy for downstream users to run their CI with both their adopted version of Zephyr and the next to get advance warning about deprecation and find regressions. There is a smooth upgrade path by using only APIs that exists in both versions, so there is no ifdef-version-soup around the breakage. For this the deprecated "A" and new API "B" have to coexist for at least two releases, so that downstream can upgrade from API A to API B at a point when their CI is testing two Zephyr versions both with both A and B.

It makes it so that upgrading Zephyr is not a "stop the world" period, where merges may have to be stopped for some time to allow someone to upgrade and fix all breakage without several rounds of "Oh. A new use of the old API just got merged, you have to fix that too.", or worse, a broken main branch for some period. For large projects people stepping on each other's feet is not unlikely.

Thalley · 2024-12-12T12:30:05Z

My discovery process has failed, do I as a user need to call bt_disconnect?

If we change it so that the stack will initiate a ATT disconnect in case of invalid GATT values, then we are left with 2 cases:
1. Either the attribute is not found and `attr == NULL`

2. The connection has failed
Both are currently possible to determine without any API changes.

However if we do not decide to disconnect in case of invalid response from the GATT server, then I agree that there's a use case for the err parameter, so the application can make that decision.

@JordanYates Do you have a response for this comment?

If we disconnect the ATT bearer on receiving invalid ATT PDUs, then we have ways to handle the other scenarios with the existing API

JordanYates · 2024-12-12T22:02:37Z

My discovery process has failed, do I as a user need to call bt_disconnect?

If we change it so that the stack will initiate a ATT disconnect in case of invalid GATT values, then we are left with 2 cases:
1. Either the attribute is not found and `attr == NULL`

2. The connection has failed
Both are currently possible to determine without any API changes.
However if we do not decide to disconnect in case of invalid response from the GATT server, then I agree that there's a use case for the err parameter, so the application can make that decision.
@JordanYates Do you have a response for this comment?

If we disconnect the ATT bearer on receiving invalid ATT PDUs, then we have ways to handle the other scenarios with the existing API

Sounds reasonable, I am not able to make that change though.

Thalley · 2024-12-13T13:53:29Z

My discovery process has failed, do I as a user need to call bt_disconnect?

If we change it so that the stack will initiate a ATT disconnect in case of invalid GATT values, then we are left with 2 cases:
1. Either the attribute is not found and `attr == NULL`

2. The connection has failed
Both are currently possible to determine without any API changes.
However if we do not decide to disconnect in case of invalid response from the GATT server, then I agree that there's a use case for the err parameter, so the application can make that decision.
@JordanYates Do you have a response for this comment?
If we disconnect the ATT bearer on receiving invalid ATT PDUs, then we have ways to handle the other scenarios with the existing API
Sounds reasonable, I am not able to make that change though.

Right, so we are missing value check on responses (e.g. if we are discovering from handle 100 to 200 and the response contains a handle 50, then we assume the remote device or the link is broken.

In that case we can either (as suggested) disconnect the ATT bearer and the ACL, or we can consider that similar to "attribute not found". Which makes the most sense? (@alwa-nordic please chime in here as well)

Once that is resolved, then we are down to either "Not found" or "disconnected", both of which is possible to handle without modifying the API.

alwa-nordic · 2024-12-13T14:11:28Z

In that case we can either (as suggested) disconnect the ATT bearer and the ACL, or we can consider that similar to "attribute not found". Which makes the most sense? (@alwa-nordic please chime in here as well)

Disconnecting is fine. I don't think we can treat it as "not found", since the remote never said that. We can also be a little more robust. We can skip over handles that we did not request. For example, as long as we got at least one handle we were interested in, we can increment the handle range to discover and continue.

Thalley · 2024-12-16T10:37:34Z

Disconnecting is fine. I don't think we can treat it as "not found", since the remote never said that. We can also be a little more robust. We can skip over handles that we did not request. For example, as long as we got at least one handle we were interested in, we can increment the handle range to discover and continue.

Aren't you disagreeing with yourself here? :D

Let's take a simple case where we send a discovery request from handle 100 to handle 200 for some specific type, and the server responds with handles [50, 150].

You seem to both suggest that we disconnect given the invalid value 50, both also that we just ignore that and only provide 150 to the upper layers.

In the case that the server only respond with 50, then we cannot provide anything to the upper layers.

So the way I see it, we have the following options when discovering for handle 100 to 200:

Server responds with [50, 150], and we disconnect due to invalid value 50. Server responds with
[50] and we also disconnect.
Server responds with [50, 150], and we ignore 50 and provide 150 to the upper layer due to invalid value 50. Server responds with [50] and we disconnect as there is nothing to provide to the upper layer.
Server responds with [50, 150], and we ignore 50 and provide 150 to the upper layer due to invalid value 50. Server responds with [50] and we provide attr = NULL (not found) to the upper layer.

So we have the options to 1) always disconnect due to invalid values, 2) only disconnect if we only get invalid values or 3) treat invalid values are no values

alwa-nordic · 2024-12-16T15:53:31Z

I've changed my mind. All three options are fine by me.

Thalley · 2024-12-16T15:58:31Z

I've changed my mind. All three options are fine by me.

I'll check the core spec if it mentions what the expected behavior is :) A server receiving a request with invalid values shall just be rejected, but unsure if it mentions what happens if a client receives an invalid response

Thalley · 2024-12-19T10:47:24Z

I've changed my mind. All three options are fine by me.

I'll check the core spec if it mentions what the expected behavior is :) A server receiving a request with invalid values shall just be rejected, but unsure if it mentions what happens if a client receives an invalid response

The core spec does not specify any defined behavior for the GATT client it if receives invalid responses. The cleanest way to deal with this is, IMO, simply to disconnect. If only a part of the response is valid, we cannot assume that the other part is valid.

Thalley · 2024-12-19T13:41:48Z

@jhedberg @alwa-nordic (cc @Vudentz )

I'm not 100% which direction to take here, so would like some input from you.

Current we have:

If we receive an ATT response without having sent a request, we simply ignore it
If we receive an ATT response with invalid format (e.g. incorrect length), we call the application callback with attr == NULL
If we receive an ATT response with invalid handles (currently we only check for handle == 0), then we parse all handles up to the invalid handle (e.g. if we get [50, 100, 0], then the application callback is called for attr with handle 50 and 100, but for 0 we call it with attr == NULL and stop parsing

This seems to have been the behavior of ATT and GATT ever since it was added to Zephyr.

I suggest one of the 2 following solutions:
Solution 1: Strict

If we receive an ATT response without having sent a request, we disconnect
If we receive an ATT response with invalid format (e.g. incorrect length), we disconnect
If we receive an ATT response with invalid handles (0, or handles not in the range of the request), we disconnect

Overall, we just disconnect with any invalid response.

Solution 2: Relaxed

If we receive an ATT response without having sent a request, we simply ignore it
If we receive an ATT response with invalid format (e.g. incorrect length), we call the application callback with attr == NULL
If we receive an ATT response with invalid handles (currently we only check for handle == 0), then we parse all handles up to the invalid handles (0, or handles not in the range of the request), we call the application callback with attr == NULL without calling it with any valid handles in the response

Overall, we ignore what we can and always provide application a callback with attr == NULL on invalid responses.

jhedberg · 2024-12-19T15:15:36Z

@Thalley my initial feeling is that not disconnecting may result to better interoperability (or rather, user expereience) in case of buggy devices, in which case I'd be in favor of solution 2.

Thalley · 2024-12-21T11:50:45Z

@Thalley my initial feeling is that not disconnecting may result to better interoperability (or rather, user expereience) in case of buggy devices, in which case I'd be in favor of solution 2.

I was thinking the same, but then we cannot fix the issue raised by @JordanYates - We will just treat invalid responses similar to "Not found".

My concern is that if the server does provide invalid responses, and we treat them as "Not found" (during discovery), then how much can we trust the server at all? If we discover from handle 100 to 200 and we get back a characteristic at handle 50, can we trust that a read of handle e.g. 150 then the response is really the value of handle 150?

As mentioned the core spec does not specify any behavior here, so we have to either treat invalid responses as "Not found" or errors, and hope that future requests have valid responses, or we treat the entire remote GATT server (or the link) as invalid and disconnect.

github-actions · 2025-02-20T00:34:02Z

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

Thalley · 2025-03-06T08:28:54Z

I have not forgotten this btw, and will provide a PR for disconnecting ACL/GATT if we receive invalid responses (possibly guarded by a Kconfig)

JordanYates · 2025-09-12T03:14:21Z

I have not forgotten this btw, and will provide a PR for disconnecting ACL/GATT if we receive invalid responses (possibly guarded by a Kconfig)

Having attempted to drop these commits in my upgrade to Zephyr v4.2, I'm not convinced that actually solves my problem. Handling invalid responses is not the problem, the problem is the callback doesn't know whether discovery failed or whether the attribute doesn't exist. Discovery can fail for reasons other than an invalid response (the remote disconnecting) and unless you are planning to remove the callback being run on a disconnected connection (breaking the current model), it doesn't help the callback determine whether to continue or not.

Thalley · 2025-09-12T08:10:18Z

Discovery can fail for reasons other than an invalid response (the remote disconnecting) and unless you are planning to remove the callback being run on a disconnected connection (breaking the current model), it doesn't help the callback determine whether to continue or not.

In the case of a disconnect, then you can (and possibly should?) check the connection state of the provided bt_conn. As pointed out by my previous comment (#81432 (comment)), discover can only "fail" in 3 ways:

Invalid values (seems we decided to drop the connection on this, so this will never reach the application callback)
Attribute not found (attr == NULL)
Disconnect (either triggered from 1), or something else) - Verifiable via the connection state

Do you think we are still missing a case here?

JordanYates changed the title ~~241115 gatt disc err~~ bluetooth: gatt: add err param to discover cb Nov 15, 2024

zephyrbot added area: Bluetooth Qualification Bluetooth Qualification -related issues and pull requests area: Bluetooth area: Bluetooth Mesh area: Bluetooth Audio area: Bluetooth Controller area: Samples Samples area: Bluetooth Host Bluetooth Host (excluding BR/EDR) labels Nov 15, 2024

zephyrbot requested review from Casper-Bonde-Bose, MariuszSkamra, Thalley, alwa-nordic, asbjornsabo, fredrikdanebjer, hermabe, jhedberg, kruithofa, larsgk, niym-ot, pin-zephyr, rugeGerritsen, sjanc and theob-pro November 15, 2024 09:39

zephyrbot assigned jhedberg and alwa-nordic Nov 15, 2024

JordanYates force-pushed the 241115_gatt_disc_err branch 4 times, most recently from e9317a1 to d4afa91 Compare November 15, 2024 10:40

bluetooth: update bt_gatt_discover_func_t usage

29803c5

Update all instances of `bt_gatt_discover_func_t` in the tree. Signed-off-by: Jordan Yates <[email protected]>

JordanYates force-pushed the 241115_gatt_disc_err branch from baa1f66 to 29803c5 Compare November 15, 2024 12:23

Thalley reviewed Nov 15, 2024

View reviewed changes

Thalley mentioned this pull request Jan 8, 2025

Bluetooth: ATT: Disconnect ATT and ACL when receiving inval rsp #83222

Merged

github-actions bot added the Stale label Feb 20, 2025

github-actions bot closed this Mar 6, 2025

github-project-automation bot moved this from In Review to Done in Bluetooth LE Audio Mar 6, 2025

bluetooth: gatt: add err param to discover cb #81432

bluetooth: gatt: add err param to discover cb #81432

Uh oh!

Conversation

JordanYates commented Nov 15, 2024

Uh oh!

JordanYates commented Nov 15, 2024

Uh oh!

Thalley commented Nov 15, 2024

Uh oh!

JordanYates commented Nov 15, 2024

Uh oh!

JordanYates commented Nov 15, 2024

Uh oh!

Thalley commented Nov 15, 2024

Uh oh!

alwa-nordic commented Nov 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alwa-nordic commented Nov 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Thalley Nov 15, 2024

Choose a reason for hiding this comment

Uh oh!

JordanYates Nov 15, 2024

Choose a reason for hiding this comment

Uh oh!

Thalley Nov 17, 2024

Choose a reason for hiding this comment

Uh oh!

JordanYates Nov 17, 2024

Choose a reason for hiding this comment

Uh oh!

Thalley Nov 18, 2024

Choose a reason for hiding this comment

Uh oh!

Thalley Nov 18, 2024

Choose a reason for hiding this comment

Uh oh!

JordanYates commented Nov 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alwa-nordic commented Nov 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Thalley commented Dec 12, 2024

Uh oh!

JordanYates commented Dec 12, 2024

Uh oh!

Thalley commented Dec 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alwa-nordic commented Dec 13, 2024

Uh oh!

Thalley commented Dec 16, 2024

Uh oh!

alwa-nordic commented Dec 16, 2024

Uh oh!

Thalley commented Dec 16, 2024

Uh oh!

Thalley commented Dec 19, 2024

Uh oh!

Thalley commented Dec 19, 2024

Uh oh!

jhedberg commented Dec 19, 2024

Uh oh!

Thalley commented Dec 21, 2024

Uh oh!

github-actions bot commented Feb 20, 2025

Uh oh!

Thalley commented Mar 6, 2025

Uh oh!

JordanYates commented Sep 12, 2025

Uh oh!

Thalley commented Sep 12, 2025

Uh oh!

Reviewers

Assignees

Labels

bluetooth: gatt: add `err` param to discover cb #81432

bluetooth: gatt: add `err` param to discover cb #81432

alwa-nordic commented Nov 15, 2024 •

edited

Loading

alwa-nordic commented Nov 15, 2024 •

edited

Loading

JordanYates commented Nov 15, 2024 •

edited

Loading

alwa-nordic commented Nov 16, 2024 •

edited

Loading

Thalley commented Dec 13, 2024 •

edited

Loading