-
-
Notifications
You must be signed in to change notification settings - Fork 78
Try to set TCP_QUICKACK where support for that option is available #932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Warning Rate limit exceeded@bdraco has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 0 minutes and 49 seconds before requesting another review. How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughThe changes involve a modification to the Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
This probably makes sense now that we group the writes and don't have cases where we send a lot of writes anymore where waiting for them all to be acked would make send since most of the time the messages we are sending are small and they get sent together. |
We group on reconnect aioesphomeapi/aioesphomeapi/connection.py Line 465 in 87cb7c3
|
These might be a bit slower with this change though since they aren't grouped and we probably get 3 acks instead of 1 here |
@bdraco This fix addresses a behavior much further down the network stack at the TCP protocol layer, it doesn't have anything to do with messages being sent over that connection. If you're on the discord, DM me or start a thread and @ me (nkinnan) and I can explain better. |
@bdraco It will always be faster to ack immediately than to wait, because of how TCP is implemented on the ESP. While the ESP has sent a packet (or packets, comprising a single send buffer) out, it can not service any more events or messages into the send buffer until it gets an ack back on the packet(s) it already sent. The faster that ACK gets back, the faster it can release the buffer and start processing more data to send. |
@bdraco The primary limiting factor in ESP throughput over the API is ack delay. Max throughput in bytes/sec that the ESP can send out is (tcp_send_buffer_size_in_bytes * number_of_acks_per_second) since the buffer is locked until an ack comes back. And number_of_acks is limited by how fast the other end can send them back. By default systems will wait 50-200ms to send an ack, limiting the ESP to only being able to send 5-20 tcp_send_buffer's worth of data. Fortunately HAOS has TCP_QUICKACK set system-wide so it's already responding as fast as possible, which is lucky for us. But other OSes don't, and will be forced to communicate more slowly. Now if the API always sends back a message once it receives a message, that mitigates it somewhat as the TCP stack will piggyback an ack onto the outgoing data... And assuming TCP_NODELAY is set that will be immediate. This change just means that is no longer a requirement for full speed. |
I'm worried about additional Wi-Fi traffic with Bluetooth proxies. In this case the ESP is sending a group of raw advertisements every few ms to Home Assistant. If it can send multiple with only having to get back one ACK that this change will generate more traffic. If the ESP going to have wait for the ACK on every send than this change should not impact the amount of traffic. |
Currently, no matter how full or empty the tcp send buffer is, once send() is called, that buffer is locked until an ack comes back, and no further data can be sent. |
I'm not sure |
@bdraco I tested that, it's why there is a try/catch around it. In fact I linked a PR to cpython where I added support for it to the windows python runtime. But the try/catch handles the situation now before that eventually gets released. It is supported on all Linux runtimes. And I think MacOS but I didn't try it since the exception I'd get back if it wasn't would be the same as the one I get on windows anyway. |
I'm going to test this on my mac laptop as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested with MacOS 👍
Confirmed its already turned on with HAOS 👍
Thanks @nkinnan
Thanks for getting it in so quickly :) |
Looking at network traces (on windows), the ESP responds with more data within 20ms to every TCP ACK of an API packet, but the python API consumer doesn't set the proper socket option to disable delayed ack and so windows delays for 40-50ms before sending an ack back.
Delayed ack waits for more data so multiple packets can be acked at the same time, which in most circumstances would be more efficient, except the esp has limited buffer space and must retain the tcp send buffer in case a retry is required if the packet doesn't get acked at all, so by disabling delayed ack on the other side of the API connection, packets can be acked immediately and this loop can be reduced to 1/2 to 1/3rd of it's current latency allowing 2-3x the send buffers to be dispatched over the TCP connection
This will result in less dropped states/events over the API
In testing on a HAOS virtualbox, I found that TCP_QUICKACK is already set system-wide and so this change is redundant there, but it is helpful for anywhere else the service might run which doesn't have that same default. It is also useful for the esphome commandline tool which uses the API to get logs after an OTA push.
Turns out this setting isn't supported in python on windows, only on linux, so I went ahead and fixed that too: python/cpython#123478