Replies: 9 comments
---
Hi @pha3z you've definitely been doing your homework :)
Earlier in my career I did a lot of work on application acceleration over wide area networks (https://a.co/d/hx9h0AD), where latency (protocol inefficiencies), TCP's inefficient give-and-take (sliding window and acknowledgements), and data redundancy all hinder performance. That technology has since become largely irrelevant in a high-bandwidth, low-latency, cloud-first world, but the core learnings were:
Personally I'm not investing much time at all into raw TCP, because several 1) framing and 2) overlay transport platforms are available (such as WebSockets, AMQP, etc.). WebSockets holds a ton of promise because of its integrated framing and its ability to run over HTTP/2, which uses parallel (semi-non-blocking) streams. My personal best practices:
Looking over the link you sent: this could be a nice proxy implementation, but 1) it appears to still use TCP's standard congestion control, and 2) even if the acknowledgement is tiny (1 byte), it still incurs latency. Multiple connections don't really solve the overall throughput issue unless you address congestion control (memory/buffer vs. acknowledgement pacing) - see BIC-TCP, CUBIC, H-TCP, HSTCP, et al. I guess I'd ask: what are you looking to build? And sorry for rambling... Cheers, Joel
---
Wow, lots of thoughts Joel, thank you! I'm not sure I wholly agree with your points. You said "The technology is since irrelevant in a high-bandwidth, low-latency cloud-first world..." I want to make a game that relies on timely delivery of short text updates. It is critical that the UI is able to show the latest messages as soon as it can get them -- latency is a killer. Furthermore, if any message is lost (does not reach the receiver), this needs to be detected and the message resent (without head-of-line blocking for the other messages), and when the receiver does get the message, it should insert it in the proper visual order in a queue so that the user always sees messages in correct time placement -- even if they don't necessarily arrive when expected. So it's a situation where correct order is paramount, but out-of-order arrival should not inhibit timely delivery.

Honestly, I don't think my particular situation is so special. I can imagine a lot of real-world scenarios where you want a live feed that is always as immediate as possible, but you still want the UI to display late deliveries in sent order instead of received order. TCP seems like an obstacle in trying to achieve such a requirement. I also read that UDP is not a reliable choice if you want to serve a wide audience, where some of your users are likely going to be on weird networks that won't even permit UDP, so UDP is out for me on this.

I was reading about how you need to consider switches like TCP_NODELAY on your socket to make sure Nagle's algorithm doesn't buffer your data (because you have chosen to send a 1-byte message). This is critical if your intent is to send data immediately. I think the main gist of what you're saying is the part about "A library shouldn't put the burden on the developer for framing his data." Fair enough. I suppose a scenario where the developer wants fine-grained control over messages, because his intent is to send lots of small, precisely crafted messages, is a scenario in which a developer shouldn't be using a library!

I am reading the .NET documentation on TcpListener.AcceptSocket() and the methods on Socket, and to me it seems very straightforward and like exactly what I want. But I want to point out, I don't think my design motivation is a small class of problem. TCP is designed with the idea that the developer wants to think of his data as streams or arbitrary-sized messages. In reality, those conditions are themselves a particular subclass.

Something interesting to me is that IPv6 doesn't even allow routers to fragment packets in flight. Packets sent over IPv6 cannot exceed the end-to-end path MTU on any segment, so the data must be fully pre-fragmented into completely self-contained packets by the sender before invoking send(). From my vantage point, it looks like what the TCP stack is supposed to offer me is the promise that it will chop the data into packets and then guarantee their delivery for me. That becomes problematic the moment I want some control over which packets are guaranteed and which ones aren't... or which packets are guaranteed with order and which ones aren't. I am still trying to dig into the socket options -- which seems to be where the goodies are at.
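Since TCP_NODELAY came up: here's a minimal sketch of flipping that switch, written in Python for brevity (in .NET the same option is exposed as the Socket.NoDelay property):

```python
import socket

# A loopback listener so the client socket has something to connect to.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle's algorithm: small writes go out on the wire immediately
# instead of being coalesced while earlier data is still unacknowledged.
client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
client.connect(server.getsockname())
conn, _ = server.accept()

nodelay_enabled = client.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
print("TCP_NODELAY enabled:", nodelay_enabled)  # → True

for s in (conn, client, server):
    s.close()
```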
---
I have a very good question about bandwidth usage: is 1 packet equal to 1 packet? If I keep invoking socket.send() at some consistent pace, always sending 1 byte of data, is that going to perform exactly the same as sending 500 bytes of data per packet (assuming the same number of packets sent at the same rate, with only the volume of bytes in each packet changing)?
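To put rough numbers on the question: at the same packet rate, both cases cost the same packets per second, but the fraction of each packet that is useful data differs enormously. A back-of-the-envelope sketch, counting only the IPv4 and TCP headers and ignoring Ethernet framing, TCP options, and ACK traffic:

```python
# Per-packet overhead for TCP over IPv4 (no options): 20-byte IP header
# + 20-byte TCP header = 40 bytes, before any link-layer framing.
HEADER_BYTES = 40

def wire_efficiency(payload_bytes: int) -> float:
    """Fraction of each packet's bytes that is actual application data."""
    return payload_bytes / (payload_bytes + HEADER_BYTES)

for payload in (1, 500):
    # A 1-byte payload rides in a 41-byte packet (~2.4% useful data);
    # a 500-byte payload rides in a 540-byte packet (~92.6%).
    print(f"{payload:>3}-byte payload: {wire_efficiency(payload):.1%} efficient")
```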
---
Hi @pha3z thanks for your thoughtful response. For displaying in sent order, I'd consider including that metadata in the message/data itself, so it can be sorted on the receiver's side. Yes, you're right, mobile networks have much higher latency and more bandwidth constraints - I generally deal with server-side apps and heavier clients, which highlights the bias in my response :)

For your use case I'd definitely go with TCP. UDP would force you to handle delivery guarantees within your own code. TCP will guarantee in-order delivery of data to your app - that's the intent/purpose behind sequence and acknowledgement numbers. An app's attempt to read 4 KB from the socket won't be served until that 4 KB of data - based on its sequence numbers - has actually arrived in the receive buffer.

Re: your second question - bandwidth usage is measured in bits per second. A 1-byte payload is going to create less bandwidth usage than a 500-byte payload. In TCP/UDP (or heck, even IP/Ethernet) load testing, the higher the payload size, the fewer packets/sec you typically see over a given link, assuming you have the hardware available to saturate it. However, with a higher payload size, even though it's fewer packets/sec, you're going to see higher transmission rates. It makes sense, because each transmission requires one (or more) acknowledgements (TCP ACK, SACK), each incurring a one-way latency cost. With sufficient buffer memory, you may never encounter a situation where you can't "fill the pipe" because you have too many outstanding acknowledgements.

Going back to what you wrote: TCP will not let your app read past a gap in the byte stream. The example I'm about to share is not how TCP works, but assume 1 message = 1 send to the socket = 1 read from the socket (just to make it easy). If I send messages 1, 2, 3, ... 10, and message 4 is lost, my app will block until the sender's TCP implementation retransmits 4 to me. If you have parallel connections and you multiplex across them, all bets are off, and you'd need some metadata (a sent timestamp) in the message itself for sorting.
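The "sort on the receiver's side" idea is simple to sketch. Here's a hypothetical Python reorder buffer keyed on a sender-assigned sequence number (the class and method names are my own illustration, not from any library):

```python
import bisect

# Receiver-side reorder buffer: each message carries a sender sequence
# number, and the display list is kept sorted by it regardless of the
# order (or connection) the messages arrived on.
class ReorderBuffer:
    def __init__(self):
        self.messages = []  # list of (seq, payload), kept sorted by seq

    def insert(self, seq: int, payload: str) -> None:
        # bisect.insort keeps the list sorted as items arrive out of order.
        bisect.insort(self.messages, (seq, payload))

    def in_order(self):
        return [payload for _, payload in self.messages]

buf = ReorderBuffer()
for seq, text in [(1, "a"), (4, "d"), (2, "b"), (3, "c")]:  # arrives out of order
    buf.insert(seq, text)
print(buf.in_order())  # → ['a', 'b', 'c', 'd']
```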
---
Thank you for your response! You confirmed a lot of my understanding. The information about how bandwidth vs. packets-per-second works is very enlightening. So TCP is in-order always, which is what I thought. It fits with the model of thinking about TCP as a "stream" of data, and it makes sense when that's what you want. Let's suppose I had this case:
My desire is that I could send all four messages exactly 50 milliseconds apart and have all of them arrive as soon as possible -- out-of-order is fine. There should be no blocking just because one message is delayed or lost. I could include a timestamp within the messages so the application can sort them as they arrive. To achieve this, parallel connections seem like the natural solution.
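A hypothetical sketch of that parallel-connections scheme, modeling only the bookkeeping (each connection carries at most one unacknowledged message at a time; the actual sockets are omitted):

```python
# Toy model of a pool of N parallel TCP connections where each new message
# goes out on the next idle connection, so a stall (lost packet, slow ACK)
# on one connection never blocks sends on the others.
class ConnectionPool:
    def __init__(self, n: int):
        self.busy = [False] * n  # True while a send awaits its ACK

    def pick_idle(self):
        """Mark and return the index of an idle connection, or None if all busy."""
        for i, in_flight in enumerate(self.busy):
            if not in_flight:
                self.busy[i] = True
                return i
        return None  # caller must queue the message or add a connection

    def ack(self, i: int) -> None:
        """Called when connection i's in-flight message is acknowledged."""
        self.busy[i] = False

pool = ConnectionPool(3)
assert pool.pick_idle() == 0
assert pool.pick_idle() == 1
pool.ack(0)
assert pool.pick_idle() == 0  # the acknowledged connection is reused
```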
---
I think I found the answer to my question from the last post. The responder said:
|
---
OK... so I knew I was probing into a problem area. And sure enough, a 22-upvote question on SO discusses the TCP delayed acknowledgment issue: https://stackoverflow.com/questions/22583941/what-is-the-workaround-for-tcp-delayed-acknowledgment
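One commonly cited mitigation for delayed ACK is Linux's TCP_QUICKACK socket option, which asks the kernel to acknowledge immediately rather than waiting. It is Linux-specific and the kernel may reset it, so it has to be re-applied around reads; treat this as a best-effort sketch:

```python
import socket

def enable_quickack(sock: socket.socket) -> bool:
    """Best-effort: ask the kernel to ACK immediately (Linux only).

    Returns True if the option was applied, False if unavailable.
    """
    if not hasattr(socket, "TCP_QUICKACK"):
        return False  # not exposed on this platform (e.g. macOS, Windows)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
    return True

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
applied = enable_quickack(s)
print("TCP_QUICKACK applied:", applied)
s.close()
```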
---
Hi @pha3z going back to your question: we're getting into one (very important) implementation detail we haven't discussed yet - memory. When an app opens a socket, a certain amount of memory is allocated for both send and receive operations. The send memory is for data that the app wishes to send, and the receive memory is for data taken from the network that is awaiting a read by the application.

When an app sends data, the data is added to the send memory pool (buffer) and sent across the wire. That data is not removed from memory until the acknowledgement is received. When data is received, it is added to the receive memory pool (buffer), and when an app reads it, an acknowledgement is sent (generally - this is not always the case, it's implementation specific) and the data is removed from the receive memory pool.

These memory pools provide two key capabilities: 1) they act as a temporary (not durable) staging area until the data can either be sent (send buffer) or be read by the app (receive buffer); 2) they inform flow and congestion control - the size of the memory pool impacts the TCP window size and other parameters, and can be scaled up or down.

So to your question: assuming the implementation feels it has leeway to continue sending, it will, i.e. it will send 1, 2, 3, and 4 in succession. NoDelay typically tells the underlying TCP implementation not to wait for more data before sending (you can read more about the PSH flag on this topic too). But if the memory pool is exactly the same size as a single message, then your .Send() call will block until the previous message has been sent and acknowledged. As much as I hate to say it, ... it depends ... :)
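That buffer memory is visible (and tunable) from userspace. A quick Python sketch using SO_SNDBUF / SO_RCVBUF - the 256 KB request below is purely illustrative, and the kernel may round or clamp whatever you ask for (Linux, for instance, doubles the requested value to leave room for bookkeeping):

```python
import socket

# The kernel's per-socket send and receive buffers Joel describes above.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

default_snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
default_rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"default send buffer: {default_snd} bytes, receive: {default_rcv} bytes")

# Request a larger send buffer; read it back to see what was actually granted.
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 256 * 1024)
print("send buffer now:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
s.close()
```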
---
Hope you don't mind, I'm moving this to a discussion, btw.
---
Hello again friend!
After a few years I have returned again to your wonderful suite of TCP libraries. This time more armed with knowledge than before!
So I have read numerous articles off and on about the performance woes of TCP... and about how UDP comes with its own connectivity/firewall woes.
I found an article proposing parallel TCP connections where each connection has only a single in-flight message at any given moment. The mechanism increases overall reliability and gives more consistent latency. It can even be used with websockets, which is slick.
Here it is:
http://ithare.com/almost-zero-additional-latency-udp-over-tcp/
Building upon that, when a person begins to scrutinize all of the metrics for deliverability, the issue of MTU sizing also comes up. If you send a TCP segment that exceeds the path MTU, it can cause problems (fragmentation on IPv4, drops on IPv6). So what you end up wanting, in a perfect world where latency and reliability are prized foremost, is:
1. The application takes responsibility for framing its data into messages small enough that one message fits in one packet and is guaranteed not to be auto-fragmented in flight by a network device.
2. The application can then guarantee that when it performs a network send, the data is received all-or-nothing with maximum deliverability, and it can continue to send other packets on parallel connections if it chooses to do so.
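A hypothetical sketch of point 1 - the 1200-byte budget and the 2-byte length prefix are my own illustrative choices, picked to stay comfortably under a 1500-byte Ethernet MTU minus IP/TCP headers:

```python
# Refuse any message that would not fit in a single packet, so nothing is
# ever auto-fragmented in flight. 1200 bytes is a conservative budget.
MAX_PAYLOAD = 1200

def frame(message: bytes) -> bytes:
    """Return the message with a 2-byte big-endian length prefix, or raise."""
    if len(message) > MAX_PAYLOAD:
        raise ValueError(f"message is {len(message)} bytes; max is {MAX_PAYLOAD}")
    # The length prefix lets the receiver delimit messages in the byte stream.
    return len(message).to_bytes(2, "big") + message

packet = frame(b"hello")
print(packet)  # → b'\x00\x05hello'
```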
So I went back to looking at off-the-shelf TCP tooling and it looks like Caveman gets the closest to allowing some kind of implementation like this without having to go to bare .NET BCL tooling.
I wonder what your thoughts are on an implementation that accomplishes the objectives above?