

nkinnan
Contributor

@nkinnan nkinnan commented Aug 29, 2024

Looking at network traces (on Windows), the ESP responds with more data within 20 ms of every TCP ACK of an API packet, but the Python API consumer doesn't set the socket option that disables delayed ACK, so Windows waits 40-50 ms before sending an ACK back.

Delayed ACK waits for more data so that multiple packets can be ACKed at once, which in most circumstances is more efficient. However, the ESP has limited buffer space and must retain its TCP send buffer in case a retransmission is needed because a packet never gets ACKed. By disabling delayed ACK on the other side of the API connection, packets are ACKed immediately, which cuts the latency of this loop to 1/2 to 1/3 of its current value and allows 2-3x as many send buffers to be dispatched over the TCP connection.

This will result in fewer dropped states/events over the API.

In testing on HAOS in VirtualBox, I found that TCP_QUICKACK is already set system-wide, so this change is redundant there, but it helps anywhere else the service might run that doesn't have the same default. It is also useful for the esphome command-line tool, which uses the API to get logs after an OTA push.

It turns out this option isn't exposed by Python on Windows, only on Linux, so I went ahead and fixed that too: python/cpython#123478

Contributor

coderabbitai bot commented Aug 29, 2024


Commits

Files that changed from the base of the PR and between e5f6882 and ddae90e.

Walkthrough

The changes involve a modification to the _connect_socket_connect asynchronous method in the aioesphomeapi/connection.py file. A new code block attempts to set the TCP socket option TCP_QUICKACK to 1 to enable quicker acknowledgment of TCP packets. This addition is safeguarded with a try block to handle potential AttributeError exceptions, ensuring the code remains robust if the option is unsupported.

Changes

Files and change summary

  • aioesphomeapi/connection.py: Added code to set the TCP socket option TCP_QUICKACK to 1 within _connect_socket_connect, with error handling for unsupported systems.
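The walkthrough describes the change only in prose. As a minimal sketch (not the actual diff in aioesphomeapi/connection.py), setting the option on a plain `socket.socket` with a graceful fallback might look like this. The helper name `try_enable_quickack` is hypothetical, and also catching `OSError` (in case the OS rejects the option) is an extra assumption of this sketch beyond the `AttributeError` handling the walkthrough mentions.

```python
import socket


def try_enable_quickack(sock: socket.socket) -> bool:
    """Hypothetical helper: ask the OS to ACK incoming TCP segments immediately.

    Returns True if the option was applied, False if this Python build does
    not expose TCP_QUICKACK or the OS rejected it for this socket.
    """
    try:
        # AttributeError: the constant is missing from this build's socket
        # module, as on the Windows builds discussed in this PR.
        quickack = socket.TCP_QUICKACK
    except AttributeError:
        return False
    try:
        # Level IPPROTO_TCP, option TCP_QUICKACK, value 1 (enable).
        sock.setsockopt(socket.IPPROTO_TCP, quickack, 1)
    except OSError:
        # Assumption: treat an OS-level refusal as "unsupported" too.
        return False
    return True


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print("TCP_QUICKACK applied:", try_enable_quickack(s))
    finally:
        s.close()
```

Returning a bool instead of raising keeps the connect path working on platforms where the constant is missing, which is the same intent as the try block described above.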


@bdraco
Member

bdraco commented Aug 29, 2024

This probably makes sense now that we group the writes. We no longer have cases where we send many writes at once, where waiting for them all to be ACKed would make sense; most of the time the messages we send are small and go out together.

@bdraco
Member

bdraco commented Aug 29, 2024

We group on reconnect

msg_types.append(ConnectResponse)

@bdraco
Member

bdraco commented Aug 29, 2024

These might be a bit slower with this change, though, since they aren't grouped, and we probably get 3 ACKs instead of 1 here:
https://github.com/home-assistant/core/blob/ff9937f942828c18c82e8e6ff66ca9f0004fcea8/homeassistant/components/esphome/manager.py#L536

@nkinnan
Contributor Author

nkinnan commented Aug 29, 2024

@bdraco This fix addresses behavior much further down the network stack, at the TCP protocol layer; it doesn't depend on which messages are sent over that connection. If you're on the Discord, DM me or start a thread and @ me (nkinnan), and I can explain better.

@nkinnan
Contributor Author

nkinnan commented Aug 29, 2024

@bdraco It will always be faster to ACK immediately than to wait, because of how TCP is implemented on the ESP. While the ESP has a packet (or the packets comprising a single send buffer) in flight, it cannot put any more events or messages into the send buffer until it gets an ACK back for what it already sent. The faster that ACK comes back, the faster it can release the buffer and start processing more data to send.

@nkinnan
Contributor Author

nkinnan commented Aug 29, 2024

@bdraco The primary limiting factor in ESP throughput over the API is ACK delay. The maximum throughput (bytes/s) the ESP can send is tcp_send_buffer_size_in_bytes * number_of_acks_per_second, since the buffer is locked until an ACK comes back, and number_of_acks_per_second is limited by how fast the other end sends them. By default, systems wait 50-200 ms before sending an ACK, which limits the ESP to 5-20 send buffers' worth of data per second. Fortunately, HAOS has TCP_QUICKACK set system-wide, so it already responds as fast as possible, which is lucky for us. Other OSes don't, and will be forced to communicate more slowly.

Now, if the API always sends a message back whenever it receives one, that mitigates the problem somewhat, because the TCP stack will piggyback an ACK onto the outgoing data, and, assuming TCP_NODELAY is set, that will be immediate. This change just means that is no longer a requirement for full speed.
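The throughput bound described above (send buffer size times ACKs per second) can be checked with a little arithmetic. This sketch assumes a pure stop-and-wait model (one send buffer in flight, released only when its ACK arrives) and ignores network round-trip time and ESP processing time. The function names and the 4096-byte buffer size are hypothetical illustration values, not the ESP's actual TCP send buffer size.

```python
def acks_per_second(ack_delay_ms: float) -> float:
    """ACK cycles per second if every send buffer waits for one ACK."""
    if ack_delay_ms <= 0:
        raise ValueError("ack_delay_ms must be positive")
    return 1000.0 / ack_delay_ms


def max_throughput_bytes_per_s(send_buffer_bytes: int, ack_delay_ms: float) -> float:
    """Upper bound: send buffer size times ACK cycles per second."""
    return send_buffer_bytes * acks_per_second(ack_delay_ms)


if __name__ == "__main__":
    SEND_BUFFER_BYTES = 4096  # hypothetical value, not the ESP's real buffer size
    for delay_ms in (200, 50):  # delayed-ACK range quoted in the comment
        print(
            f"{delay_ms} ms delay: {acks_per_second(delay_ms):.0f} buffers/s, "
            f"{max_throughput_bytes_per_s(SEND_BUFFER_BYTES, delay_ms):.0f} B/s"
        )
```

At 200 ms per ACK the bound is 5 buffers per second, and at 50 ms it is 20, which is where the 5-20 range in the comment comes from.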

@bdraco
Member

bdraco commented Aug 29, 2024

I'm worried about additional Wi-Fi traffic with Bluetooth proxies. In this case the ESP sends a group of raw advertisements to Home Assistant every few ms. If it can currently send several of them and get back only one ACK, then this change will generate more traffic. If the ESP has to wait for an ACK on every send anyway, then this change should not affect the amount of traffic.

@nkinnan
Contributor Author

nkinnan commented Aug 29, 2024

Currently, no matter how full or empty the tcp send buffer is, once send() is called, that buffer is locked until an ack comes back, and no further data can be sent.

@bdraco
Member

bdraco commented Aug 29, 2024

I'm not sure TCP_QUICKACK is going to be defined on all platforms we support. Will need to check that.

@nkinnan
Contributor Author

nkinnan commented Aug 29, 2024

@bdraco I tested that; it's why there is a try/except around it. In fact, I linked a PR to CPython where I added support for it to the Windows Python runtime, but the try/except handles the situation until that change is released. It is supported on all Linux runtimes, and I think on macOS as well, but I didn't try it, since if it weren't, the exception would be the same one I get on Windows anyway.

@bdraco
Member

bdraco commented Aug 29, 2024

I'm going to test this on my mac laptop as well

Member

@bdraco bdraco left a comment


Tested with macOS 👍
Confirmed it's already turned on with HAOS 👍

Thanks @nkinnan

@bdraco bdraco merged commit ec08c49 into esphome:main Aug 29, 2024
@nkinnan
Contributor Author

nkinnan commented Aug 29, 2024

Thanks for getting it in so quickly :)

