
Conversation

ludvigsj (Contributor)

According to MshPrt 5.4.4, the Provisioner, upon receiving the Provisioning Failed PDU, shall assume that the provisioning failed and immediately disconnect the provisioning bearer.

Also changes the link close upon success to use the `prov_link_close` helper function instead of doing it manually, as a minor cleanup.

@PavelVPV (Contributor)

The "Also" part should be moved into a separate commit. Otherwise looks fine.

According to MshPrt 5.4.4, the Provisioner, upon receiving the
Provisioning Failed PDU, shall assume that the provisioning failed and
immediately disconnect the provisioning bearer.

Signed-off-by: Ludvig Jordet <[email protected]>
Changes the link close upon success to use the `prov_link_close` helper
function instead of doing it manually, as minor cleanup.

Signed-off-by: Ludvig Jordet <[email protected]>
@ludvigsj ludvigsj force-pushed the develop/prov_close_link_on_failed branch from 210402f to a93a15a Compare October 14, 2025 07:47
```c
static void prov_failed(const uint8_t *data)
{
	LOG_WRN("Error: 0x%02x", data[0]);
	reset_state();
```
Contributor

I didn't get why the provisioner does not clear the CDB and state flags, and doesn't release the Diffie-Hellman key pair, upon receiving a Provisioning Failed frame.
Seems incorrect to me.
It should be:

```c
prov_fail(PROV_BEARER_LINK_STATUS_FAIL);
reset_state();
```
ludvigsj (Contributor, Author)

If you look in the `prov_link_closed` callback, the reset is done there.

@alxelax (Contributor), Oct 14, 2025

There is no guarantee this callback is called. For PB-GATT this will never trigger this callback at all.
Cleaning data in this callback is mostly meant for unpredictable link closing, like a Host timeout on the GATT connection, or the application closing the link over the API.

Contributor

Even a simple case with PB-ADV will cause a zombie node in the CDB and a heap leak through mbedTLS once mesh cannot find a free adv structure; that might happen at any time and depends on the customer configuration.

Contributor

`reset_state` will be called after the link is closed.
`prov_link_close` -> `bearer->link_close`:

For PB-ADV:
`prov_link_close` (pb_adv.c) -> send Link Close PDU -> `buf_sent` -> `close_link` -> `role->link_closed` -> `prov_link_closed` (provisioner.c) -> `reset_state`.

For PB-GATT:
`prov_link_close` (pb_gatt.c) -> `bt_conn_disconnect` -> `gatt_disconnected` (pb_gatt_srv.c) -> `bt_mesh_pb_gatt_close` (pb_gatt.c) -> `link_closed` -> `role->link_closed` -> `prov_link_closed` (provisioner.c) -> `reset_state`.

Contributor

> Also, I'm pretty sure this change breaks cleanup for a provisioner that works on top of the RPR Client. When it receives a link report "Closed by device", it doesn't clear the CDB and keys anymore.

"Pretty sure" -> can you elaborate? Because I don't really see this. As I showed above, for PB-ADV, reset_state is eventually called after sending the Link Close PDU. For PB-GATT, the link is a regular BLE link and is thus closed through the BLE API; once it is closed, reset_state is called. I don't see how it can not be called. The only difference between the old and the new behavior is message sending (in the case of PB-ADV) and link close (in the case of PB-GATT).

Also, PB-GATT is not supported by the RPR Client at the moment.

You described it correctly, but this is applicable only to the RPR Server. Further, the RPR Server handles pb_link_closed, which will initiate a link report with status BT_MESH_RPR_ERR_LINK_CLOSED_BY_DEVICE (https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/mesh/rpr_srv.c#L478).
Then the RPR Client receives it and calls the closed callback for its own provisioner: https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/mesh/rpr_cli.c#L74
That will cause the same code to run, but on the RPR Client side: https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/mesh/provisioner.c#L698-L706
You removed the reset functionality here and rely on the RPR Client bearer callback, which for some reason is called again after this PR (that seems weird by itself). But follow further:
https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/mesh/rpr_cli.c#L742-L750

This implementation does not assume any callbacks at all, neither about success nor failure.
Finally, a provisioner based on the RPR Client gets a zombie node and lost keys.

Contributor

> You described correctly, but this is applicable only to rpr server

Why? provisioner.c runs on the RPR Client, not the RPR Server.

> you removed reset functionality here and rely on rpr client bearer callback that is called for some reason again after this PR (it seems weird by self).

This I don't get. This discussion points to the reset_state -> prov_link_close change in the prov_failed callback, which is called when the Provisioning Failed PDU is received. I don't really understand your sentence: what does "that is called for some reason again after this PR" mean (where is the "again")?

In the case of RPR, when the Provisioning Failed PDU is received, the RPR Client will, with this new change, have this call flow:
`prov_link_close` (provisioner.c) -> `pb_link_close` (rpr_cli.c) -> `link_close` -> `link_timeout` / `handle_link_status` -> `bearer.cb->link_closed`, which is `prov_link_closed` (prov.c) -> `role->link_closed`, which is `prov_link_closed` (provisioner.c) -> `reset_state`.

@PavelVPV (Contributor), Oct 15, 2025

And note, this PR aligns the behavior with other cases where processing a provisioning PDU ends up in failure: any PDU handler --(error)--> `prov_fail()` -> `prov_link_close()`. No orphaned CDB entries in those cases, right?

Even the missing error handling of the bt_conn_disconnect call is not a big problem, as there is a protocol timeout. It is a separate problem that the bt_conn_disconnect error is not handled in the timeout handler either:

```c
/* If connection failed or timeout, not allow establish connection */
if (IS_ENABLED(CONFIG_BT_MESH_PB_GATT_CLIENT) &&
    atomic_test_bit(bt_mesh_prov_link.flags, PROVISIONER)) {
	if (link.conn) {
		(void)bt_conn_disconnect(link.conn,
					 BT_HCI_ERR_REMOTE_USER_TERM_CONN);
```

But IMHO, this is a different problem and can be addressed in a separate task.

Contributor

I give up arguing about this. I still think it is not a good idea to rely on the Host's reliability to clean up internal mesh resources. Will the Host call it in 100% of cases or not? I do not know.

Contributor

> Will Host call it in 100% cases or will not? I do not know.

It is guaranteed by the Host to call the disconnected callback, yes:

```c
 * This callback notifies the application that a connection
 * has been disconnected.
```

If this doesn't work, then it is a Host bug and should be fixed in the Host, not in Mesh.

I agree with the point that we should check the return value of bt_conn_disconnect. But still, IMO this can be done separately, as it is neither handled here nor in the timeout handler. I'd say it is up to @ludvigsj whether to fix this error handling in this PR or not. If not, let's just create a follow-up task.


@ludvigsj ludvigsj requested a review from alxelax October 14, 2025 08:29
@PavelVPV (Contributor) left a comment

Though this is not stated in the contribution guidelines, it is generally better to start the commit subject with a verb. Approving anyway.

```c
{
	LOG_WRN("Error: 0x%02x", data[0]);
	reset_state();
	prov_link_close(PROV_BEARER_LINK_STATUS_FAIL);
```
Contributor

I guess the provisioner should analyze the Error Code value before closing the link.

> 5.4.4
> ...
> When the Provisionee or the Provisioner receives a message with a field set to a value that is Prohibited, or with a bit set to 1 within a bitfield indicated as Prohibited, the provisioning protocol shall fail and the message shall be treated as an error in the provisioning protocol.
> ...

If the Error Code is not from Table 5.41, it should call the error callback before closing the link.

Contributor

Ok, it will be `prov_fail`.

@cfriedt cfriedt merged commit f998357 into zephyrproject-rtos:main Oct 15, 2025
29 checks passed
@ludvigsj ludvigsj deleted the develop/prov_close_link_on_failed branch October 17, 2025 08:36