Bluetooth: L2CAP: Queue packets when a segment could not be allocated #20661
|
This is still crashing so more work is needed. |
|
@Vudentz Is this a continuation of #20544? It is, but it took fairly drastic measures to transfer a payload that big without deadlocking. The end result is that bt_l2cap_chan_send no longer blocks: it just queues the buffer if it could send at least one segment, so that we can resume on TX complete. |
|
@jfischer-phytec-iot With this set I was able to ping -s 1024. I've also added some tuning so it doesn't take forever to reply, at the cost of increasing our frag_pool when NET_L2_BT is set. |
|
@Vudentz thanks, looks better. It does not lock anymore. I will test it for a while. |
subsys/bluetooth/host/conn.c
What's the difference here? NET_BUF_POOL_DEFINE() is the same as NET_BUF_POOL_FIXED_DEFINE(). In fact, you shouldn't use NET_BUF_POOL_DEFINE() anymore, as it's essentially deprecated (even though it might not be formally marked so).
It was just for setting the user_data size; the fixed variant doesn't seem to take a user_data size.
Take a look at the actual definition. All it does with it is a build assert. Ensuring BT_BUF_USER_DATA_MIN is already covered by "range 8 65535 if BT" in subsys/net/Kconfig.
If that still wasn't clear: for quite a while now, all net_buf objects have shared the same user data size, which is set using CONFIG_NET_BUF_USER_DATA_SIZE. I.e. you can't set it per pool.
OK, now I realize that it doesn't actually assume a 0 size with the fixed version. Will fix that.
Fixed.
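As a rough illustration of the point made in this review thread, here is a minimal, self-contained sketch of a pool-definition macro whose user-data-size argument only feeds a compile-time check, while every buffer uses one shared size. All names (SKETCH_USER_DATA_SIZE, SKETCH_POOL_DEFINE, sketch_pool, sketch_buf) are hypothetical stand-ins; this is not the actual net_buf implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_NET_BUF_USER_DATA_SIZE:
 * one size shared by every buffer in every pool.
 */
#define SKETCH_USER_DATA_SIZE 8

struct sketch_buf {
    unsigned char user_data[SKETCH_USER_DATA_SIZE];
};

struct sketch_pool {
    struct sketch_buf *bufs;
    size_t count;
};

/* The ud_size argument is only checked at build time; it does not
 * change how much user data each buffer actually carries.
 */
#define SKETCH_POOL_DEFINE(name, cnt, ud_size)                        \
    _Static_assert((ud_size) <= SKETCH_USER_DATA_SIZE,                \
                   "requested user data exceeds the shared size");    \
    static struct sketch_buf name##_bufs[cnt];                        \
    static struct sketch_pool name = { name##_bufs, (cnt) }

SKETCH_POOL_DEFINE(small_pool, 4, 4); /* asks for 4 bytes */
SKETCH_POOL_DEFINE(large_pool, 2, 8); /* asks for 8 bytes */

/* Reports the user data size of buffers in a pool: the same for all pools. */
size_t sketch_user_data_size(const struct sketch_pool *pool)
{
    assert(pool != NULL);
    return sizeof(pool->bufs[0].user_data);
}
```

In this sketch both pools report the same user data size regardless of what they asked for, and a request larger than the shared size fails the build rather than silently growing the buffer.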
21746b4 to 7ce7658
|
@Vudentz 7ce7658c936e688f9fac3dd05de94d43c017940e does not survive a flood. The test which must be passed:
#!/bin/bash
# Usage: pass the target IPv6 address as the first argument.
IP=$1
for (( n=0; n<1; n++ ))
do
    # Sweep the payload size from 0 to 1499 bytes.
    for (( packetsize=0; packetsize<1500; packetsize++ ))
    do
        echo "payload $packetsize"
        ping6 -w 1 -i 0.002 -c 100 -s "$packetsize" -q "$IP"
    done
done
|
So I managed to fix the lockup, though sometimes I experience crashes with qemu which I don't experience with native_posix. The backtrace points to buf.c:207, so I guess it is some sort of corruption. |
Fixed these issues. The channel will now shut down and eventually be disconnected if credits cannot be sent due to congestion caused by flooding. |
|
Depends on #20951 |
|
@dleach02 I'm not sure why this was pushed back to 2.2, but there is a high chance that these changes will need to be backported to 2.1 even after the release, because this also fixes the regression where TX errors on bt_conn_send_cb are not dealt with, so IPSP will not work as before. |
|
@jhedberg Is there anything blocking us from merging this now that 2.1 has been released? |
|
@joerchan I've updated the comment regarding cleaning the outstanding tx_buf and added a define for the RTX timeout. |
I think this one could be changed to just the last return now that BT_WARN is gone.
Will fix that. This change was not actually intended, so I'm keeping the warning. |
|
@Vudentz Great. Sorry for the questions. I'm not that familiar with this part of the code :) |
When a segment could not be allocated, it should be possible to resume sending it later once previous segments complete. The only exception is when there is no previous activity and we are unable to allocate even the very first segment, which should indicate to the caller that it would block. Since that only happens on syswq, the caller might need to defer to another thread or resubmit the work. Fixes zephyrproject-rtos#20640 Signed-off-by: Luiz Augusto von Dentz <[email protected]>
This enables chaining the function and line number, making it easier to debug where a buffer allocation is blocking. Signed-off-by: Luiz Augusto von Dentz <[email protected]>
When NET_L2_BT is enabled, the memory pressure for fragments can be quite high, since that would be transferring IP packets, which are considerably big, so this makes our frag_pool the same size as NET_BUF_TX_COUNT. Signed-off-by: Luiz Augusto von Dentz <[email protected]>
If NET_L2_BT is enabled, acl_in_pool needs to be big enough to contain a full IP packet: since that is no longer processed by the RX thread, the buffer would be queued to syswq to reassemble the SDU. Signed-off-by: Luiz Augusto von Dentz <[email protected]>
This prevents disconnect request packets from not being sent due to a lack of buffers, normally caused by flooding or congestion. Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Drop packets received while disconnecting: since they would most likely be flushed once the peer responds, there is no gain in keeping them on a queue. Signed-off-by: Luiz Augusto von Dentz <[email protected]>
This offloads the processing of tx_queue to a work item so the callbacks calling resume don't start sending packets directly, which can cause a stack overflow. Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Packets shall never fail to be sent now that they are queued, so if an error occurred there is no point in keeping the channel connected. Signed-off-by: Luiz Augusto von Dentz <[email protected]>
This can actually block system critical threads like the syswq. Signed-off-by: Luiz Augusto von Dentz <[email protected]>
This introduces BT_L2CAP_STATUS_SHUTDOWN, which is used to indicate when a channel has been shut down. Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Now that bt_l2cap_send_cb can fail, the buffer state needs to be saved and restored, otherwise the data stored in it would be lost. Signed-off-by: Luiz Augusto von Dentz <[email protected]>
This documents the special cases where -EAGAIN is returned which leads the buffer to be queued. Signed-off-by: Luiz Augusto von Dentz <[email protected]>
This adds a define for the so called RTX timeout. Signed-off-by: Luiz Augusto von Dentz <[email protected]>
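To make the stack-overflow concern from the tx_queue offloading commit above more concrete, here is a small self-contained simulation. It is only a sketch under stated assumptions: the struct, the function names, and the "controller completes immediately" behavior are all hypothetical, and the loop in sketch_run merely stands in for a workqueue thread. It compares resuming directly from the sent callback against deferring the resume to a work item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical TX queue state, not the real L2CAP channel structure. */
struct sketch_txq {
    size_t head, tail;   /* packets sent so far / packets queued in total */
    bool offload;        /* true: defer resume to the work loop */
    bool work_pending;   /* set by the callback when offloading */
    int depth, max_depth; /* current and peak call nesting of send_one() */
};

static void sent_cb(struct sketch_txq *q);

/* Sends one packet; the simulated controller completes it at once
 * and invokes the sent callback from within this call.
 */
static void send_one(struct sketch_txq *q)
{
    q->depth++;
    if (q->depth > q->max_depth) {
        q->max_depth = q->depth;
    }
    q->head++;
    sent_cb(q);
    q->depth--;
}

static void sent_cb(struct sketch_txq *q)
{
    if (q->head == q->tail) {
        return; /* queue drained */
    }
    if (q->offload) {
        q->work_pending = true; /* let the work loop send the next one */
    } else {
        send_one(q); /* direct resume: one nested frame per queued packet */
    }
}

/* Stand-in for the workqueue thread processing the TX work item. */
static void sketch_run(struct sketch_txq *q)
{
    q->work_pending = (q->head != q->tail);
    while (q->work_pending) {
        q->work_pending = false;
        send_one(q);
    }
}

/* Returns the peak nesting depth needed to drain `packets` packets. */
int sketch_max_depth(size_t packets, bool offload)
{
    struct sketch_txq q = { .tail = packets, .offload = offload };

    sketch_run(&q);
    assert(q.head == q.tail); /* everything was sent either way */
    return q.max_depth;
}
```

In this simulation the direct variant nests one call per queued packet, while the offloaded variant keeps the nesting constant because each resume returns to the work loop before the next send starts.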
When a segment could not be allocated, it should be possible to resume
sending it later once previous segments complete. The only exception is
when there is no previous activity and we are unable to allocate even the
very first segment, which should indicate to the caller that it would
block. Since that only happens on syswq, the caller might need to defer to
another thread or resubmit the work.
Fixes #20640
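The behavior described above can be sketched as a small self-contained model. This is a simplified illustration, not the code from this PR: the struct, SEG_SIZE, and the function names are hypothetical, the model handles one SDU at a time, and using -EAGAIN as the caller-visible "would block" result is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SEG_SIZE 64 /* hypothetical segment payload size */

/* Hypothetical channel model, not the real bt_l2cap channel. */
struct sketch_chan {
    size_t free_segs; /* segments currently available in the pool */
    size_t in_flight; /* segments sent but not yet completed */
    size_t pending;   /* bytes of the queued SDU still to be sent */
};

/* Sends as many segments as the pool allows; returns bytes left unsent. */
static size_t send_segments(struct sketch_chan *ch, size_t len)
{
    while (len > 0 && ch->free_segs > 0) {
        size_t n = len < SEG_SIZE ? len : SEG_SIZE;

        ch->free_segs--;
        ch->in_flight++;
        len -= n;
    }
    return len;
}

/* Non-blocking send: queues whatever could not be segmented right now. */
int sketch_chan_send(struct sketch_chan *ch, size_t len)
{
    size_t left;

    assert(ch->pending == 0); /* sketch limitation: one SDU at a time */

    left = send_segments(ch, len);
    if (len > 0 && left == len && ch->in_flight == 0) {
        /* Not even the first segment was allocated and nothing is in
         * flight, so no completion would ever resume this SDU.
         */
        return -EAGAIN;
    }
    ch->pending = left; /* resumed from sketch_tx_complete() */
    return 0;
}

/* Called when one previously sent segment completes. */
void sketch_tx_complete(struct sketch_chan *ch)
{
    assert(ch->in_flight > 0);
    ch->in_flight--;
    ch->free_segs++;
    ch->pending = send_segments(ch, ch->pending);
}
```

With two free segments, sending 200 bytes in this model succeeds immediately with 72 bytes left pending, and each completion pushes out one more segment. With no free segments and nothing in flight, the send reports that it would block, which is the case where the caller has to defer or resubmit.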
The patch grew a bit large, but the overall impact on memory is quite small:
IPSP sample board=qemu_x86
Before:
make ram_report
make rom_report
After:
make ram_report
make rom_report
Signed-off-by: Luiz Augusto von Dentz [email protected]