
Conversation

@SebastienGllmt (Contributor) commented Jan 3, 2026

wRPC deadlocks (hangs indefinitely) when a client calls invoke on a function served via wrpc-runtime-wasmtime (ex: serve_function_shared) with insufficient encoded data for the expected params type. This can occur when:

  1. You send an insufficient number of params (ex: send one param instead of two)
  2. You send a param of the wrong type (that happens to have fewer bytes than the true type)

Why this happens

In the call function in wrpc-runtime-wasmtime:

  1. The client sends params as bytes over a stream (rx). These are untyped raw bytes.
  2. The server tries to parse the bytes according to the type it expects (params_ty).

You can see the loop that reads from rx according to params_ty inside call.

However, if the contents of the stream don't match the expected params (ex: fewer params than expected, or not enough bytes to finish the loop because the wrong type was passed), this loop never terminates.
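To make the failure mode concrete, here is a self-contained sketch in plain tokio (not wrpc code; the duplex pipe, the param values, and the one-second timeout are all just for the demo). A reader expecting two one-byte params but only ever receiving one awaits the second read forever, because the connection stays open and never signals EOF:

```rust
use std::time::Duration;
use tokio::io::{duplex, AsyncReadExt, AsyncWriteExt};

#[tokio::main]
async fn main() {
    // In-memory stand-in for the client/server connection.
    let (mut client, mut server) = duplex(64);

    // The "client" encodes only one param, but the "server" expects two.
    client.write_all(&[5u8]).await.unwrap();
    // The client keeps its side open, so the server never sees EOF either.

    let read_two_params = async {
        let a = server.read_u8().await.unwrap(); // succeeds: 5
        let b = server.read_u8().await.unwrap(); // never completes
        (a, b)
    };

    // Without an external timeout this future hangs forever,
    // which is exactly what happens inside call.
    match tokio::time::timeout(Duration::from_secs(1), read_two_params).await {
        Ok((a, b)) => println!("got params {a} and {b}"),
        Err(_) => println!("timed out waiting for the second param"),
    }
}
```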

Why did this never get caught?

Besides being an edge case, this bug only appears in the wrpc-runtime-wasmtime crate, while most tests use serve_values/invoke_values_blocking, which do not go through this code path.

Why this matters

This is a denial-of-service (DoS) vulnerability. Malicious actors can exploit this to exhaust server resources (connections, memory, file descriptors) by sending malformed requests that hang indefinitely, making the server unavailable to legitimate clients.

How should it be fixed?

Internally, read_value checks which type it should read, then tries to read it (ex: read_u8().await). This in turn calls poll_read on the Incoming stream. If there aren't enough bytes left to parse the value, poll_read stays stuck in the Pending state inside the ready! macro.
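A minimal illustration of that shape, assuming the frames arrive over a channel (this is not the actual Incoming implementation, only a hypothetical stand-in to show where the Pending comes from):

```rust
use std::pin::Pin;
use std::task::{ready, Context, Poll};
use tokio::io::{AsyncRead, ReadBuf};
use tokio::sync::mpsc;

// Hypothetical stand-in for the Incoming stream: frames arrive over a channel.
struct Incoming {
    rx: mpsc::Receiver<Vec<u8>>,
}

impl AsyncRead for Incoming {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<std::io::Result<()>> {
        let this = self.get_mut();
        // If no frame is available yet, ready! short-circuits with
        // Poll::Pending. With a malformed request no further frame ever
        // arrives, so every poll ends here and read_u8().await never resolves.
        match ready!(this.rx.poll_recv(cx)) {
            Some(frame) => {
                buf.put_slice(&frame); // sketch: assumes the frame fits in buf
                Poll::Ready(Ok(()))
            }
            // The channel only closes when the sender is dropped; a hung
            // ingress never drops it, so this EOF branch is never reached.
            None => Poll::Ready(Ok(())),
        }
    }
}
```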

Example trace

```
TRACE ThreadId(31) read_value: reading struct field value i=0
TRACE ThreadId(31) ingress: reading path length
TRACE ThreadId(31) ingress: read path length n=0
TRACE ThreadId(31) ingress: reading data length
TRACE ThreadId(31) ingress: read data length n=1
TRACE ThreadId(31) ingress: reading data
TRACE ThreadId(31) ingress: read data buf=b"\x05"
TRACE ThreadId(31) read_value:read_value:poll_read: read buffer buf=[5]
TRACE ThreadId(31) read_value: reading struct field value i=1
TRACE ThreadId(31) read_value:read_value:poll_read: reading
TRACE ThreadId(31) read_value:read_value:poll_read: return=Pending
# ... hangs here indefinitely
```

Some solutions that don't work:

  1. We can't modify read_u8 (or similar functions) to fail: read_value has no way to know it has reached the end of the data, and some types have multiple encodings, so it can't magically tell when it has received enough data. (This is the wrong place to try to fix it.)
  2. We can't rely on the client to nicely end its stream or include an "end of stream" mark, because that wouldn't stop a malicious client from attacking our server as mentioned earlier (although it's not necessarily a bad idea in order to have better errors in the happy path).
  3. We can't rely solely on a timeout, because that's fragile on poor connections (it can be part of the solution, but ideally not the only part).

Possible solution A: modify ingress/egress behavior

Goal: somehow indicate we've reached the end of the data

call does not actually receive data from the network directly. Rather, the data is first read by an ingress in the server, which processes it before call sees it.

Notably, the connection is established as follows:
a. serve_function_shared serves a connection via Conn
b. invoke also connects via Conn

They then communicate via an egress in the client to an ingress in the server.

It may be possible to make the ingress on the server side more aware of the communication protocol so it can properly indicate EOF (and perhaps add other protections such as payload size limits, timeouts, or whatever other safeguards people prefer). Currently it is just an infinite loop that waits until it receives enough data, which in turn means call also stays blocked waiting for data from the ingress that never arrives.
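As a rough illustration of what solution A could look like (this is not the actual wrpc ingress; the length-prefixed framing, the zero-length end marker, and the size limit are all assumptions for the sketch), the ingress could translate an explicit end-of-data marker into EOF and reject oversized payloads:

```rust
use tokio::io::{AsyncRead, AsyncReadExt};
use tokio::sync::mpsc;

const MAX_PAYLOAD: usize = 1 << 20; // 1 MiB, arbitrary limit for the sketch

async fn ingress(
    mut conn: impl AsyncRead + Unpin,
    tx: mpsc::Sender<Vec<u8>>,
) -> std::io::Result<()> {
    let mut total = 0usize;
    loop {
        // Hypothetical framing: u32 length prefix, zero meaning "end of data".
        let len = conn.read_u32().await? as usize;
        if len == 0 {
            // Dropping tx closes the channel, which the reader then
            // observes as EOF instead of an indefinite Pending.
            return Ok(());
        }
        total += len;
        if total > MAX_PAYLOAD {
            return Err(std::io::Error::new(
                std::io::ErrorKind::InvalidData,
                "payload exceeds size limit",
            ));
        }
        let mut frame = vec![0u8; len];
        conn.read_exact(&mut frame).await?;
        if tx.send(frame).await.is_err() {
            return Ok(()); // reader went away
        }
    }
}
```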

Possible solution B (used by this PR): modify poll_read

Instead of redesigning the ingress (which is probably too big a decision for me to make), I implemented a heuristic in poll_read to avoid the deadlock (sketched below).

Core idea:

  1. If we have been receiving data and then suddenly there is no data anymore, we assume it's not a slow connection but that we've simply run out of data to read, so we error (the *this.has_read_data branch).
  2. If we never receive any data at all, so we keep getting Poll::Pending no matter how many times we poll, we assume no data is ever coming, so we error (the *this.pending_count > 1 branch).

Note: this is not perfect, because on slow connections this could cause an error even though the data really was coming. I think this is not too likely, and I'd prefer to accidentally error on slow connections than deadlock on improperly formatted calls.
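A hedged sketch of that heuristic (the field names has_read_data and pending_count come from the description above; the surrounding struct, the channel, and the error message are assumptions, not the actual patch):

```rust
use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::{AsyncRead, ReadBuf};
use tokio::sync::mpsc;

struct Incoming {
    rx: mpsc::Receiver<Vec<u8>>,
    has_read_data: bool,
    pending_count: u32,
}

impl AsyncRead for Incoming {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<std::io::Result<()>> {
        let this = self.get_mut();
        match this.rx.poll_recv(cx) {
            Poll::Ready(Some(frame)) => {
                this.has_read_data = true;
                this.pending_count = 0;
                buf.put_slice(&frame); // sketch: assumes the frame fits in buf
                Poll::Ready(Ok(()))
            }
            Poll::Ready(None) => Poll::Ready(Ok(())), // sender dropped: clean EOF
            Poll::Pending if this.has_read_data || this.pending_count > 1 => {
                // Either the data dried up mid-value, or we have polled
                // repeatedly without ever seeing any data: error out instead
                // of parking the read future forever.
                Poll::Ready(Err(std::io::Error::new(
                    std::io::ErrorKind::UnexpectedEof,
                    "insufficient data for the expected params type",
                )))
            }
            Poll::Pending => {
                this.pending_count += 1;
                Poll::Pending
            }
        }
    }
}
```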

@SebastienGllmt (Contributor, Author) commented:

note: I wasn't sure where to put this test, so I just put it in a standalone file at the root of tests so it's easy to move in case you want it in a specific place

Comment on lines +385 to +388
```rust
if diff_len == 0 {
    trace!("consumed empty frame, closing receiver");
    self.as_mut().get_mut().rx.take();
}
```
@SebastienGllmt (Contributor, Author) commented:

note: the old code did

```rust
if buf.filled().is_empty() {
    self.rx.take();
}
```

but theoretically, it's possible for buf to not be empty when this function is called. I think this is a subtle bug, so I updated it to check the diff (i.e. if we have new data) instead

@SebastienGllmt (Contributor, Author) commented Jan 7, 2026

Similarly, I think wRPC does not error on extra bytes. This makes some sense from a streaming approach (just stop reading data from the stream when you no longer need it), but it's a bit awkward if you want to proactively avoid errors by ensuring you process everything the user sent. We'd probably get a fix for this for free if we had a way to denote the end of the stream (basically, if we finish parsing and haven't seen an "end of stream" marker, we error).
