- 
                Notifications
    You must be signed in to change notification settings 
- Fork 8.2k
Bluetooth: host: Fix ATT indication deadlock in RX thread #31184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bluetooth: host: Fix ATT indication deadlock in RX thread #31184
Conversation
| Thanks for the PR @joerchan. This is quite a big change, so I'd like to ask you, what kind of regression tests have you performed on it? | 
| 
 I have run the EDTT test suite locally, and reproduced and verified the issues I'm fixing. The PR can be split into smaller pieces, if that is preferable. | 
| 
 Sounds good, would you mind also running the throughput sample of nRF Connect SDK, as well as any others that might be applicable? I know that some of the samples there put the stack under heavy stress. 
 I don't see any particular reason to do that in this case. | 
| @Vudentz we'd like to try and get this in for 2.5, please take a look. | 
        
          
                subsys/bluetooth/host/gatt.c
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this assumption that the response cannot be bigger than then a single buffer is incorrect for EATT, EATT can in fact have channels which MTU is bigger then MPS which is why net_buf_append_bytes was used so it can allocate multiple buffers if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Vudentz As far as I can tell the buf allocated is sized according to the MTU, not the MPS, and when sending the ATT PDU then L2CAP will handle splitting the MTU up into MPS sized fragment if needed. To me this looks like something that is handled properly and does not include GATT/ATT.
The reason why I removed the net_buf_append_bytes was because allocate_cb argument was set to NULL, which means that if this call would have had to allocate more buffers it would have crashed, because the allocate_cb is not checked for non-NULL.
And by not having the return value of net_buf_append_bytes the write_buf callback could be simplified since it does not have to handle failures at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joerchan no it is not allocated according to the remote MTU, in EATT MTUs are not required to be symmetric, that would mean that one cannot take advantage of having bigger MTU because the code can only use a single buffer.
That said if net_buf_append_bytes does require a callback we should provide one, @asbjornsabo this might explain our crashes with EATT with some platforms as they might have bigger MTU then our buffer size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should take care of the allocation callback being NULL: #31493
        
          
                subsys/bluetooth/host/att_internal.h
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it wouldn't be a better idea to push the resend logic to gatt.c then, since now we would need to callbacks back to it to encode the request again it might be better to have everything there and perhaps have something like bt_att_resend to prepend the request on the queue, anyway we can make this change later as well if we find it necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets just drop the commit that removes the use of net_buf_append_bytes then Im good with this set.
3b4b9c6    to
    329c2e9      
    Compare
  
    | @Vudentz I have put back the use of net_buf_append_bytes. I also found another issue where the buf for the ATT request was not always freed. In order to fix this issue I put released of the ATT request buffer into the  The changes since last view can bee looked at here: Regarding the use of net_buf_append_bytes, I still don't understand why this should be there, and I think maybe you misunderstood what I said? | 
329c2e9    to
    f0ef59c      
    Compare
  
            
          
                subsys/bluetooth/host/smp.c
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SMP_FLAG_KEYS_DISTR is set from bt_smp_encrypt_change() so this depends on order of registering channels
since channels are registered via BT_L2CAP_CHANNEL_DEFINE macro order seems to depend on linker action
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into the finished zephyr.elf and this is the order.
<att_fixed_chan>:
 <le_fixed_chan>:
<smp_fixed_chan>:
I'm pretty sure that the order is alphabetical, because the we seem to rely on this behavior for the static GATT service declararions, e.g:
<_1_gatt_svc>:
<_2_gap_svc>:
The request for a higher security level is not done immediately in the att_encrypt_change handler, instead the att_encrypt_change handler retransmits the ATT request that trigger security escalation. And then on a new ATT error PDU with error indication higher security level required do we request the next security level. But this is happening before the key distribution is finished.
That being said I'm fine with taking this particular fix out and fixing it separately, because I'm fairly confident we have other cases where this would fail, for example ATT and L2CAP both initiating security at the same time, or with EATT on multiple enhanced ATT bearers at the same time.
This bug was only reproducible if both devices have the BT_SMP_ENFORCE_MITM option disabled (or the peer is acting in a similar way). This option is default on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough :-)
| 
 @carlescufi Done some smoke-tests with the throughput sample and the nrf desktop application, and didn't find any issues. | 
| 
 The chan.tx.mtu doesn't need to match by our L2CAP TX MTU, that is precisely why net_buf_append_bytes is used, though it looks like we could use net_buf_append_bytes in other places like: | 
        
          
                subsys/bluetooth/host/gatt.c
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't quite understand why are you skipping the offset? The original code used params->data directly, the offset is meant for the server where to start writing not for the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I changed this code was because data pointer should not be modified, otherwise the encode function cannot properly re-encode it when resending.
But you are right, my change would not work properly in the case where the application has set an offset.
I have fixed this now, the data pointer is manipulated, as well as the length and offset like it was in the original code, except it is now done once the prepare_write_rsp has been received.
There was also an off-by one error, and the EDTT tests did not catch this, I have tested it manually with the shell now.
Also added a fix to the shell for when the hex convert failed.
        
          
                subsys/bluetooth/host/gatt.c
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, we could in theory just encode when the request is actually about to be sent, meaning we don't have to allocate a buffer instead att.c would allocate one just as it does when resending and call write_buf, actually perhaps we should use the term encode instead of write_buf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the name to 'encode'. However the change to just encode once sent I think is best to leave until after the release.
        
          
                subsys/bluetooth/host/gatt.c
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be better off chaning bt_att_req_alloc to take all these parameters intead of having a wrapper on gatt.c that just initializes the structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why that wasn't possible at the moment was the bt_gatt_indicate. In order to make indications fit we will have to add a stack internal variable in the indication params to hold the resolved handle, since finding the handle is a rather complicated piece of code and should be kept out of the encode function IMO.
Fix gatt write command returned "write in progress" when either hex2bin or bt_gatt_write returned an error. The write_params.func should not be set if the write command was not successful. Signed-off-by: Joakim Andersson <[email protected]>
Fix the update keys check allowing to overwrite the keys when using debug keys. Instead the check disallowed overwriting keys made using debug keys. Signed-off-by: Joakim Andersson <[email protected]>
When ATT resends an ATT request it is sent as a "response" instead of as a request. This causes the ATT request buffer to be released and the ATT request cannot be resent one more time. This causes a problem when the ATT request requires authentication but the elevation of security is not enforcing MITM protection. In this case the ATT will first require security level 2 and then resend the request once this has been reached. This will lead to a new ATT error response and ATT will require security level L3. Signed-off-by: Joakim Andersson <[email protected]>
Allow to request a higher security level during the key distribution phase. This is required by ATT and L2CAP since they only react to the encrypt change event where they resend the current request. The current request might require a higher security level still and might have to request a higher security level before the pairing procedure has been finished. Signed-off-by: Joakim Andersson <[email protected]>
Refactor SMP to have a conn pointer where this pointer is used multiple times. Signed-off-by: Joakim Andersson <[email protected]>
ATT channels do support queueing buffer so it no longer need to block waiting the tx_sem besides the buffer allocation already serves the same purpose as the application will not be able to have more requests than there are buffers available. Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Fix indicate without func not working properly, when sent as a non-req by GATT this has two propblems: - The indicate would not be treated as a transaction, and back to back indicate would be sent without waiting for the confirm - The destroy callback would not be called on the indicate parameters since the indicate_rsp callback would not be called. Signed-off-by: Joakim Andersson <[email protected]>
Remove the ATT request destroy callback which is never assigned by any of the ATT requests. Signed-off-by: Joakim Andersson <[email protected]>
The ATT request buffers are held until the ATT response has been received. This means that the ATT request buffers are released by the RX thread, instead of the from the RX priority context of num_complete. This can cause a deadlock in the RX thread when we allocate buffers and all the available buffers are ATT requests, since the RX thread is the only thread that can release buffers. Release the ATT request buffers once they have been sent and instead handle ATT request resending by reconstructing the buffer from the GATT parameters. Also re-order the order of resource allocation by allocating the request context before the buffer. This ensures that we cannot allocate more buffers for ATT requests than there are ATT requests. Fixed a buf reference leak that could occur when the ATT request buffer has been allocated, but GATT returns an error before handing the responsebility of the buffer to ATT, for example when bt_att_req_alloc fails. This is fixed by moving the functionality of att_req_destroy to bt_att_req_free. Signed-off-by: Joakim Andersson <[email protected]>
f0ef59c    to
    21131e2      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Fix deadlock in RX thread when ATT all buffers are taken by ATT requests and the RX thread blocks waiting for an buffer.
Solve this by not having a reference to the ATT request buffer until the ATT response is received, instead get a new ATT buffer in the case where it has to be resent.
Fixes: #28685
Fix ATT transaction violation in indications if the application has not set a callback function. In this case the indication was not treated as a transaction, leading to transaction violation.
Fix security elevation not working properly when the elevation is done in a two-step, this can happen when MITM is not enforced and security level 3 is required on the attribute.
Fix SMP debug keys overwrite handling not allowing to overwrite debug keys with debug keys, and allowing debug keys to overwrite existing bond information.