@SebastienRabaudPaCotte

While running an improved BLE project on the nRF5340, we switched to a new-generation ULTRA7 (Meteor Lake SoC with integrated Bluetooth 5.4) and hit an issue with BLE connections: the link is disconnected with reason 35 (BT_HCI_ERR_LL_PROC_COLLISION) right after the connection is established.

According to the BLUETOOTH CORE SPECIFICATION Version 5.3, it seems that the central should reject instead of disconnecting. This fixes the disconnection issue.

@github-actions

github-actions bot commented Nov 7, 2024

Hello @SebastienRabaudPaCotte, and thank you very much for your first pull request to the Zephyr project!
Our Continuous Integration pipeline will execute a series of checks on your Pull Request commit messages and code, and you are expected to address any failures by updating the PR. Please take a look at our commit message guidelines to find out how to format your commit messages, and at our contribution workflow to understand how to update your Pull Request. If you haven't already, please make sure to review the project's Contributor Expectations and update (by amending and force-pushing the commits) your pull request if necessary.
If you are stuck or need help please join us on Discord and ask your question there. Additionally, you can escalate the review when applicable. 😊

Fixed comments from CI check and commit format.

Signed-off-by: SebastienRabaudPaCotte <[email protected]>
@cvinayak
Contributor

cvinayak commented Nov 8, 2024

@thoh-ot and @erbr-ot, is there a cross-over bug that this PR fixes? It seems there has not been any test coverage for this scenario.

@thoh-ot
Contributor

thoh-ot commented Nov 8, 2024

Without an air-trace I cannot really comment on anything. The modified code block handles this part of the specification:

BLUETOOTH CORE SPECIFICATION Version 6.0 | Vol 6, Part B | Page 3212

5.3 Procedure collisions

Since LL Control PDUs are not interpreted in real time, collisions can occur where the Link Layer of the Central and the Link Layer of the Peripheral initiate incompatible procedures. Two procedures are incompatible in the following cases:
• The two procedures both involve an instant.

...

A device shall not initiate a procedure after responding to a PDU that had initiated an incompatible procedure until that procedure is complete.

If device initiates a procedure A and, while that procedure is not complete, receives a PDU from its peer that initiates an incompatible procedure B, then:

• If the peer has already sent at least one PDU as part of procedure A, the device should immediately exit the Connection State and transition to the Standby State.

INCOMPAT_RESERVED means that the local procedure (procedure A) has already received a response PDU from its peer, so the peer cannot initiate a new instant-based procedure (procedure B). But as stated initially, I would need an air-trace to see if that is the case.

@nordicjm
Contributor

nordicjm commented Nov 8, 2024

This is the trace they provided on Discord:
newUltra7BleDisconnectionIssue_07112024.zip

@thoh-ot
Contributor

thoh-ot commented Nov 8, 2024

Unfortunately that's a HCI trace, so it will not show the relevant LL procedure details.

@SebastienRabaudPaCotte
Author

Please find the full trace (with LL procedures), captured with the same adapter but with the device de:1f:86:ad:c6:b4. I can't find any protocol error in it. Hope this helps.

newUltra7BleDisconnectionIssueFull_08112024.pcapng.zip

Fixed a trailing space from CI check.
Signed-off-by: SebastienRabaudPaCotte <[email protected]>
@cvinayak
Contributor

@nashif Hi, is there a possibility to involve someone from the Intel Bluetooth Controller team to help resolve this interoperability discussion?

@thoh-ot
Contributor

thoh-ot commented Nov 11, 2024

The trace doesn't contain any relevant connection information, only ADV and SCAN. Are you sure this isn't an "Only advertising packets" trace?

@SebastienRabaudPaCotte
Author

Sorry for the bad capture. No filter was set except the BLE address of my nRF device (de:1f:86:ad:c6:b4), and the trace was captured with Wireshark 4.2.2. I repeated the capture from another PC with Wireshark 3.2.3 and got much more data (with exactly the same settings, profile and dongle), but I see no error in the trace. Hope this helps.

connectIntelNusFailed_fromWireshark323.pcapng.zip

@thoh-ot
Contributor

thoh-ot commented Nov 12, 2024

Much better trace.

Packet Summary

Using "C" for central device (NUS)
Using "P" for peripheral device (Zephyr)

Packet 116 (event 2): P->C: LL_PHY_REQ
Packet 117 (event 3): C->P: LL_PHY_UPDATE_IND Instant=10
Packet 125 (event 6): C->P: LL_PHY_REQ

Explanation

This is a case of procedure collision.

P starts a PHY Update procedure in packet 116 (event 2) with a LL_PHY_REQ

C responds in packet 117 (event 3) with a LL_PHY_UPDATE_IND that has Instant=10 (the point in time the PHY change takes effect). Until event 10 has been reached the PHY Update procedure is not complete.

C starts another PHY Update procedure in packet 125 (event 6).

5.3 Procedure collisions
Since LL Control PDUs are not interpreted in real time, collisions can occur where
the Link Layer of the Central and the Link Layer of the Peripheral initiate incompatible
procedures. Two procedures are incompatible in the following cases:
• The two procedures both involve an instant.

...

A device shall not initiate a procedure after responding to a PDU that had initiated an
incompatible procedure until that procedure is complete.

If device initiates a procedure A and, while that procedure is not complete, receives a
PDU from its peer that initiates an incompatible procedure B, then:
• If the peer has already sent at least one PDU as part of procedure A, the device
should immediately exit the Connection State and transition to the Standby State.

From P's point of view, the last paragraph reads:

If device (P) initiates a procedure A (1st PHY Update) and, while that procedure (1st PHY Update) is not complete, receives a PDU (LL_PHY_REQ) from its peer (C) that initiates an incompatible procedure B (2nd PHY Update), then:
• If the peer (C) has already sent at least one PDU (LL_PHY_UPDATE_IND) as part of procedure A (1st PHY Update), the device (P) should immediately exit the Connection State and transition to the Standby State.
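
The reasoning above can be replayed as a small model. This is only an illustrative sketch, not Zephyr controller code: the `Pdu` type, the `peripheral_view` function and the result labels are hypothetical names, and the model assumes the connection event counter does not wrap. It walks the three PDUs from the packet summary and reports what P concludes when the second LL_PHY_REQ arrives.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pdu:
    """Hypothetical record of one LL control PDU from the air-trace."""
    event: int                     # connection event in which the PDU was sent
    sender: str                    # "P" (peripheral) or "C" (central)
    opcode: str                    # e.g. "LL_PHY_REQ"
    instant: Optional[int] = None  # only meaningful for LL_PHY_UPDATE_IND


def peripheral_view(trace, local="P"):
    """Replay the trace from the local device's side and report the outcome."""
    started = False      # local device has initiated procedure A
    peer_joined = False  # peer has sent at least one PDU as part of A
    instant = None       # event at which procedure A completes

    for pdu in trace:
        # Procedure A is complete once its instant has been reached.
        if instant is not None and pdu.event >= instant:
            started = peer_joined = False
            instant = None

        if pdu.sender == local and pdu.opcode == "LL_PHY_REQ":
            started = True
        elif pdu.sender != local and started:
            if pdu.opcode == "LL_PHY_UPDATE_IND":
                peer_joined = True
                instant = pdu.instant
            elif pdu.opcode == "LL_PHY_REQ":
                # Peer initiates incompatible procedure B while A is pending.
                if peer_joined:
                    return ("disconnect", pdu.event)
                # Not covered by the paragraph quoted in this thread.
                return ("collision_before_response", pdu.event)
    return ("ok", None)


# The three PDUs from the packet summary (packets 116, 117 and 125).
trace = [
    Pdu(event=2, sender="P", opcode="LL_PHY_REQ"),
    Pdu(event=3, sender="C", opcode="LL_PHY_UPDATE_IND", instant=10),
    Pdu(event=6, sender="C", opcode="LL_PHY_REQ"),
]
print(peripheral_view(trace))
```

With the trace from this PR, the second LL_PHY_REQ arrives at event 6, before Instant=10, so the model reports a disconnect. Moving that request to an event at or after the instant would make the same function report no collision.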

@cvinayak
Contributor

Previously used workarounds #13396 (comment)

@SebastienRabaudPaCotte
Author

Thanks for the analysis. It seems the Intel chip is at fault.

@SebastienRabaudPaCotte
Author

Previously used workarounds #13396 (comment)

Thanks for the workaround, it seems to work. I will wait to confirm before closing the PR.

@SebastienRabaudPaCotte
Author

Many thanks for the support. It works fine with BT_AUTO_PHY_UPDATE disabled, so I am closing the pull request.
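
For readers who want to try the same workaround, a minimal sketch of the application configuration follows. It assumes the project is configured through a standard `prj.conf` and that the option is written with the usual `CONFIG_` prefix; neither detail is stated in this thread.

```
# prj.conf: stop the Zephyr side from automatically initiating a PHY Update
CONFIG_BT_AUTO_PHY_UPDATE=n
```

This only prevents the local device from starting the PHY Update procedure on its own after connecting; the peer can still initiate one.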

@cvinayak
Contributor

@thoh-ot

Disabling BT_AUTO_PHY_UPDATE is only a workaround for the symptom.

We are observing many products on the market that show these symptoms. For better interoperability, which of the following is a reasonable workaround?

  1. Do as in this PR: send a REJECT_IND PDU (a change from the spec behavior).
  2. "Assume" the peer-initiated PHY_REQ is being retransmitted, i.e. defer responding to the peer and pause reception until the PHY update instant elapses.

Any thoughts?
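
To make the two options easier to compare, here is a hedged sketch of them as a policy function. It is not controller code: `resolve`, the policy names and the returned dictionaries are hypothetical, and the only value taken from this thread is the error code 0x23 (decimal 35, BT_HCI_ERR_LL_PROC_COLLISION). The model assumes the event counter does not wrap.

```python
LL_PROC_COLLISION = 0x23  # decimal 35, the disconnect reason reported in this PR


def resolve(policy, current_event, instant):
    """Return what the local device would do when the peer's second
    LL_PHY_REQ arrives at current_event while procedure A, whose instant
    is `instant`, may still be pending."""
    if current_event >= instant:
        # Procedure A has completed, so there is nothing to collide with.
        return {"action": "accept"}
    if policy == "spec":
        # Current behavior: exit the Connection State.
        return {"action": "disconnect", "reason": LL_PROC_COLLISION}
    if policy == "reject":
        # Option 1: answer with a reject PDU instead of disconnecting.
        return {"action": "reject", "error": LL_PROC_COLLISION}
    if policy == "defer":
        # Option 2: hold the response until the PHY update instant elapses.
        return {"action": "defer", "resume_at_event": instant}
    raise ValueError(f"unknown policy: {policy}")


# Values from the trace in this PR: second LL_PHY_REQ at event 6, Instant=10.
for policy in ("spec", "reject", "defer"):
    print(policy, resolve(policy, current_event=6, instant=10))
```

The sketch only captures the decision point. It says nothing about how a peer that already violated the collision rule would react to a reject or to a delayed response, which is the open interoperability question.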

@cvinayak reopened this Nov 14, 2024
@github-actions

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions bot added the Stale label Jan 14, 2025
@github-actions bot closed this Jan 28, 2025