BREAKING-CHANGE Are Chunked Responses Being Handled Wrong? #818

A `Route` consists of a `Vec` of `CryptData` objects. Most of those `CryptData` objects, when decrypted, are `LiveHop`s. Each contains the public key of the destination, a `Payer` object, and a `Component` enum indicating which actor in the target Node will handle the data. However, the very last `CryptData` object in a `Route` can't be decrypted into a `LiveHop`, because it's actually just a single `u32` known as the Return Route ID.
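To make that layout concrete, here is a simplified, hypothetical model of a `Route`. It is not MASQ's real code. Decryption is replaced by a plain `Decrypted` enum that stands for "what this `CryptData` decrypts to", and the field types of `LiveHop`, the contents of `Payer`, and the `Component` variants are assumptions chosen for illustration.

```rust
/// Which actor in the target Node handles the data (variants are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Component {
    Hopper,
    ProxyClient,
    ProxyServer,
}

/// Stand-in for the payer information carried by each hop (hypothetical field).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Payer {
    wallet: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct LiveHop {
    public_key: Vec<u8>,
    payer: Payer,
    component: Component,
}

/// What a `CryptData` decrypts to in this model: either a hop or, only in the
/// last position, the bare `u32` Return Route ID.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Decrypted {
    Hop(LiveHop),
    ReturnRouteId(u32),
}

/// An ordered list of hops with the Return Route ID at the very end.
struct Route {
    hops: Vec<Decrypted>,
}

impl Route {
    /// The Return Route ID, if the last element is one.
    fn return_route_id(&self) -> Option<u32> {
        match self.hops.last() {
            Some(Decrypted::ReturnRouteId(id)) => Some(*id),
            _ => None,
        }
    }

    /// Every live hop, i.e. everything except the trailing Return Route ID.
    fn live_hops(&self) -> Vec<&LiveHop> {
        self.hops
            .iter()
            .filter_map(|d| match d {
                Decrypted::Hop(hop) => Some(hop),
                Decrypted::ReturnRouteId(_) => None,
            })
            .collect()
    }
}
```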

When a `ClientResponsePayload_0v1` comes back to the originating Node from the exit Node (refer to `ProxyServer::handle_client_response_payload()`), we use the Return Route ID at the very end of the `Route` to look up the return route information in the Proxy Server. This return route information was stored when we sent the request that resulted in the `ClientResponsePayload_0v1`, so that we'd know what to do when the response came back. Storing it gave us an ID, which we put at the end of the `Route` where we'd find it again.
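A hypothetical sketch of that round trip: store the information when the request goes out, put the returned ID at the end of the `Route`, and resolve it when the response arrives. The names `ReturnRouteStore`, `store`, `lookup`, and `ReturnRouteInfo` are invented for illustration, and a plain `HashMap` stands in for the real `TtlHashMap`. The error text is copied from the log lines quoted below.

```rust
use std::collections::HashMap;

/// Hypothetical return route information (the real type holds much more).
#[derive(Debug, Clone, PartialEq)]
struct ReturnRouteInfo {
    hop_count: usize,
}

/// Hands out an ID when the request goes out; resolves it when the response returns.
#[derive(Default)]
struct ReturnRouteStore {
    next_id: u32,
    entries: HashMap<u32, ReturnRouteInfo>,
}

impl ReturnRouteStore {
    /// Called when the request is sent; the returned ID goes at the end of the Route.
    fn store(&mut self, info: ReturnRouteInfo) -> u32 {
        let id = self.next_id;
        self.next_id = self.next_id.wrapping_add(1);
        self.entries.insert(id, info);
        id
    }

    /// Called when a response carrying `id` arrives.
    fn lookup(&self, id: u32) -> Result<&ReturnRouteInfo, String> {
        self.entries.get(&id).ok_or_else(|| {
            format!(
                "Can't report services consumed: received response with bogus return-route ID {} for client response. Ignoring",
                id
            )
        })
    }
}
```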

The return route information contains almost everything we need to pay the Nodes along the return route what we owe them, including each Node's `RatePack`; but we can't compute an actual amount of money until we know how much data they processed for us, which we find out only when the `ClientResponsePayload_0v1` arrives. So after we use the Return Route ID in the `Route` to look up the return route information in the Proxy Server's `HashMap` (actually a `TtlHashMap`, more on that later), we combine the two pieces of information and report the services consumed to the `Accountant`.
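The combination step can be sketched as follows. Everything here is hypothetical: the `RatePack` field names, the `ReturnHop` and `ServiceConsumed` types, and especially the charge formula (a flat per-service fee plus a per-byte fee times the response size) are assumptions for illustration, not the Accountant's actual arithmetic.

```rust
/// Hypothetical per-Node pricing; field names are assumptions for this sketch.
#[derive(Debug, Clone, Copy)]
struct RatePack {
    routing_service_rate: u64,
    routing_byte_rate: u64,
}

/// One Node on the return route, as remembered in the return route information.
struct ReturnHop {
    wallet: String,
    rate_pack: RatePack,
}

/// What we owe one Node for routing one response back to us.
#[derive(Debug, PartialEq)]
struct ServiceConsumed {
    wallet: String,
    amount: u64,
}

/// Combine the stored return-route hops with the size of the response that
/// just arrived (assumed formula: service fee + byte rate * bytes).
fn report_return_route(hops: &[ReturnHop], response_bytes: u64) -> Vec<ServiceConsumed> {
    hops.iter()
        .map(|hop| ServiceConsumed {
            wallet: hop.wallet.clone(),
            amount: hop.rate_pack.routing_service_rate
                + hop.rate_pack.routing_byte_rate * response_bytes,
        })
        .collect()
}
```

Because `response_bytes` is known only once a response arrives, each chunk of a chunked response would need its own lookup of the same stored hops.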

So here's the issue: when should the Proxy Server forget the return route information it has stored? When _does_ the Proxy Server forget the return route information it has stored? Are those two the same?

At a high level, here are the answers to the first question:
1. When we're done receiving the response to the request that resulted in the return route information being stored
1. When the stream over which the request was sent is terminated, even if no response has yet arrived
1. After a decent amount of time has elapsed, in case the server never responded or the response (or request) got lost along the way, so that we don't endlessly accumulate useless data in the Proxy Server
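Rules 1 and 2 can be sketched as follows, with hypothetical names throughout (`RriStore`, `on_response`, `on_stream_terminated`, and a plain `u64` standing in for the real `StreamKey`). The sketch assumes that whoever handles a response can tell whether it is the last one, which is precisely what is in doubt for chunked HTTP and other multi-packet protocols. Rule 3 is left out here; it corresponds to the timeout.

```rust
use std::collections::HashMap;

/// Stand-in for the real `StreamKey` (hypothetical).
type StreamId = u64;

/// Sketch of forgetting rules 1 and 2.
#[derive(Default)]
struct RriStore {
    /// Return Route ID -> stream the request was sent over.
    entries: HashMap<u32, StreamId>,
}

impl RriStore {
    fn store(&mut self, id: u32, stream: StreamId) {
        self.entries.insert(id, stream);
    }

    /// Rule 1: forget only when the *last* part of the response has arrived.
    /// Returns whether the entry was found, i.e. whether this part can be paid for.
    fn on_response(&mut self, id: u32, is_last: bool) -> bool {
        let found = self.entries.contains_key(&id);
        if found && is_last {
            self.entries.remove(&id);
        }
        found
    }

    /// Rule 2: forget every entry belonging to a stream once that stream is terminated.
    fn on_stream_terminated(&mut self, stream: StreamId) {
        self.entries.retain(|_, s| *s != stream);
    }
}
```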

However, #1 is deceptive in the face of chunked HTTP responses (and possibly similar behavior in protocols other than HTTP). If we remove the return route information (RRI) immediately after receiving the first chunk of a response, every later chunk will arrive to find no RRI, and we will have no idea how to pay the return-route bill for it.
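The failure mode is easy to reproduce in a hypothetical sketch (not the Proxy Server's code) that removes the return route information as soon as any response for it arrives: of a five-chunk response, only the first chunk can be paid for.

```rust
use std::collections::HashMap;

/// Handle one response chunk, removing the entry on first sight (the behavior
/// described above). Returns whether this chunk could be paid for.
fn handle_chunk_remove_on_first(rri: &mut HashMap<u32, &'static str>, id: u32) -> bool {
    rri.remove(&id).is_some()
}

/// Count how many of `chunk_count` chunks sharing one Return Route ID get paid for.
fn paid_chunks(chunk_count: usize, id: u32) -> usize {
    let mut rri = HashMap::new();
    rri.insert(id, "return route info");
    (0..chunk_count)
        .filter(|_| handle_chunk_remove_on_first(&mut rri, id))
        .count()
}
```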

We're not sure whether #2 is implemented or not; we've looked briefly at the `StreamKey`-retirement code and haven't found anything about RRI, but we haven't searched exhaustively.

#3 is implemented by the fact that we're using a `TtlHashMap`, which keeps its data around for only two minutes, and then removes it. #3 may be taking care of #2 as well.
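For readers unfamiliar with TTL maps, here is a minimal stand-in, `SimpleTtlMap` (hypothetical), with the clock passed in explicitly so the two-minute boundary is testable. It does not reproduce MASQ's `TtlHashMap`; in particular, it makes no assumption about whether the real type refreshes an entry's timestamp when the entry is read.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Each entry expires `ttl` after it was inserted (fixed, not sliding, expiry).
struct SimpleTtlMap<K, V> {
    ttl: Duration,
    data: HashMap<K, (V, Instant)>,
}

impl<K: Eq + Hash, V> SimpleTtlMap<K, V> {
    fn new(ttl: Duration) -> Self {
        SimpleTtlMap { ttl, data: HashMap::new() }
    }

    fn insert(&mut self, key: K, value: V, now: Instant) {
        self.data.insert(key, (value, now));
    }

    /// Look up `key` as of `now`, first purging entries at least `ttl` old.
    fn get(&mut self, key: &K, now: Instant) -> Option<&V> {
        let ttl = self.ttl;
        self.data.retain(|_, (_, inserted)| now.duration_since(*inserted) < ttl);
        self.data.get(key).map(|(v, _)| v)
    }
}
```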

What we do know is that we're receiving too many log entries like these from Netflix:
```
2025-05-25 23:13:02.380 Thd8: ERROR: ProxyServer: Can't report services consumed: received response with bogus return-route ID 2257 for client response. Ignoring
2025-05-25 23:13:02.404 Thd8: ERROR: ProxyServer: Can't report services consumed: received response with bogus return-route ID 2255 for client response. Ignoring
2025-05-25 23:13:02.448 Thd8: ERROR: ProxyServer: Can't report services consumed: received response with bogus return-route ID 2256 for client response. Ignoring
2025-05-25 23:13:02.504 Thd8: ERROR: ProxyServer: Can't report services consumed: received response with bogus return-route ID 2254 for client response. Ignoring
2025-05-25 23:13:03.195 Thd8: ERROR: ProxyServer: Can't report services consumed: received response with bogus return-route ID 2257 for client response. Ignoring
2025-05-25 23:13:03.221 Thd8: ERROR: ProxyServer: Can't report services consumed: received response with bogus return-route ID 2255 for client response. Ignoring
2025-05-25 23:13:03.266 Thd8: ERROR: ProxyServer: Can't report services consumed: received response with bogus return-route ID 2256 for client response. Ignoring
2025-05-25 23:13:03.336 Thd8: ERROR: ProxyServer: Can't report services consumed: received response with bogus return-route ID 2254 for client response. Ignoring
```
That's what happens when a response arrives with a Return Route ID in its `Route` but the Proxy Server has already forgotten the corresponding return route information. Notice that these eight logs form four pairs: each of the IDs 2254 through 2257 is searched for twice in vain, with the two attempts roughly 0.82 seconds apart (between 815 and 832 ms). This suggests that it's not just a random problem with single late responses arriving after the 120-second timeout; there's probably something deeper going on here.
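To check the pairing mechanically, here is a small, hypothetical log-scraping sketch that groups the error lines by Return Route ID and records each line's time of day in milliseconds. It assumes the line format shown above (date, then `HH:MM:SS.mmm`, then the message containing `return-route ID <n>`).

```rust
use std::collections::BTreeMap;

/// Parse "HH:MM:SS.mmm" into milliseconds since midnight.
fn parse_millis(time: &str) -> Option<u64> {
    let (hms, ms) = time.split_once('.')?;
    let mut parts = hms.split(':').map(|p| p.parse::<u64>().ok());
    let (h, m, s) = (parts.next()??, parts.next()??, parts.next()??);
    Some(((h * 60 + m) * 60 + s) * 1000 + ms.parse::<u64>().ok()?)
}

/// Group "bogus return-route ID" lines by ID, collecting timestamps in log order.
fn bogus_id_times(log: &str) -> BTreeMap<u32, Vec<u64>> {
    let mut by_id: BTreeMap<u32, Vec<u64>> = BTreeMap::new();
    for line in log.lines() {
        // The second whitespace-separated field is the time of day.
        let time = line.split_whitespace().nth(1).and_then(parse_millis);
        let id = line
            .split("return-route ID ")
            .nth(1)
            .and_then(|rest| rest.split_whitespace().next())
            .and_then(|s| s.parse::<u32>().ok());
        if let (Some(t), Some(id)) = (time, id) {
            by_id.entry(id).or_default().push(t);
        }
    }
    by_id
}
```

Run over the eight lines above, the gaps between the two attempts come out to 815 ms (2257), 817 ms (2255), 818 ms (2256), and 832 ms (2254).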

__Task:__ Investigate the `ProxyServer` and related code to find out how return route information is _actually_ removed from the Proxy Server, and determine how the design should change to eliminate logs like the ones above. We shouldn't use loads and loads of memory remembering RRI forever, but we should be able to handle HTTP chunked responses and other protocols that send multiple response packets for a single request packet.

Do not fix the code as part of this card; create a card or series of cards to solve whatever problems you find.
