avoid deadlock on insufficient param bytes in wrpc_runtime_wasmtime
#1045
wRPC deadlocks/hangs indefinitely when a client calls `invoke` on a function served via `wrpc-runtime-wasmtime` (ex: `serve_function_shared`) with insufficient encoded data for the expected `params` type. This can occur when the client encodes fewer params than expected, or when the encoded bytes don't match the expected types.

## Why this happens
In the `call` function in `wrpc-runtime-wasmtime`, `params` arrive as bytes over a stream (`rx`). These are untyped raw bytes that must be decoded according to `params_ty`. You can see the reading from `rx` in a loop according to `params_ty` inside `call` here. However, if the contents of the stream don't match the expected params (ex: fewer params than expected, or there aren't enough bytes to finish the loop because the wrong type was passed), this loop will never terminate.
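To make the failure mode concrete, here is a minimal sketch of such a decode loop (an illustrative stand-in, not the actual wrpc code; `read_params` and the string-based type list are hypothetical):

```rust
use tokio::io::{AsyncRead, AsyncReadExt};

// Hypothetical stand-in for the typed decode loop in `call`: read one encoded
// value from `rx` for each expected parameter type.
async fn read_params(
    rx: &mut (impl AsyncRead + Unpin),
    params_ty: &[&str],
) -> std::io::Result<Vec<u8>> {
    let mut params = Vec::with_capacity(params_ty.len());
    for _ty in params_ty {
        // If the client encoded fewer params than `params_ty` expects, this
        // read never resolves: the stream is still open, so the future stays
        // pending forever and the loop never terminates.
        params.push(rx.read_u8().await?);
    }
    Ok(params)
}
```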
## Why did this never get caught?

Other than it being an edge case, this bug only appears in the `wrpc-runtime-wasmtime` crate, while most tests use `serve_values`/`invoke_values_blocking`, which do not go through this code path.

## Why this matters
This is a denial-of-service (DoS) vulnerability. Malicious actors can exploit this to exhaust server resources (connections, memory, file descriptors) by sending malformed requests that hang indefinitely, making the server unavailable to legitimate clients.
## How should it be fixed?
Internally, `read_value` checks which type it should read, then tries to read it (ex: `read_u8().await`). This in turn calls `poll_read` on the `Incoming` stream. If it's ever unable to parse the value because there aren't enough bytes left, `poll_read` deadlocks in the `Pending` state inside the `ready!` macro.

### Example trace
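As a rough illustration of where the hang manifests, here is a simplified `Incoming` wrapper (an assumed shape, not the exact wrpc source):

```rust
use std::pin::Pin;
use std::task::{ready, Context, Poll};
use tokio::io::{AsyncRead, ReadBuf};

// Simplified stand-in for the real `Incoming` stream.
struct Incoming<T> {
    inner: T,
}

impl<T: AsyncRead + Unpin> AsyncRead for Incoming<T> {
    fn poll_read(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<std::io::Result<()>> {
        // If the underlying stream has no bytes and none ever arrive, `ready!`
        // propagates `Poll::Pending` out of this function on every poll, so
        // the `read_u8().await` above never completes and `call` hangs.
        ready!(Pin::new(&mut self.inner).poll_read(cx, buf))?;
        Poll::Ready(Ok(()))
    }
}
```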
Some solutions that don't work:

- Modifying `read_u8` (or similar functions) to fail: there is no way for `read_value` to know it has reached the end of the data, and certain types can have multiple encodings, so it's hard for it to magically know when it has received enough data for certain data types. (Wrong place to try to fix this.)

## Possible solution A: modify `ingress`/`egress` behavior

Goal: somehow indicate we've reached the end of the data.
`call` does not actually receive data from the network directly. Rather, the data is first read in by an `ingress` in the server, which processes it before `call` sees it. Notably, the connection is established as follows:

a. `serve_function_shared` serves a connection via `Conn`
b. `invoke` also connects via `Conn`

They then communicate via an egress in the client to an ingress in the server.
It's possible there is a way to make the `ingress` on the server side more aware of what is going on in the communication protocol so it can properly indicate EOF (and maybe add other protections, like payload size limits, timeouts, or whatever other safeguards people prefer for their servers). Currently the ingress just loops until it receives enough data, which in turn means `call` also stays blocked waiting for more data from the ingress, data that never arrives.

## Possible solution B (used by this PR): modify `poll_read`
Instead of trying to redesign the `ingress` (which is probably too big a decision for me to make), I instead implemented a heuristic in `poll_read` to try to avoid deadlocks.

Core idea:

- Track whether we have started receiving data for the current read (the `*this.has_read_data` branch).
- If the stream keeps returning `Poll::Pending` no matter how many times we try to get data, we assume it's because no data is ever coming, so we error (the `*this.pending_count > 1` branch).

Note: this is not perfect, because on slow connections this could cause an error even though the data really was coming. I think this is not too likely, and I'd prefer to accidentally error on slow connections than deadlock on improperly formatted calls.
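A minimal sketch of the heuristic, assuming a simplified `Incoming` (the `has_read_data`/`pending_count` names follow the PR text; everything else here is illustrative, not the exact implementation):

```rust
use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::{AsyncRead, ReadBuf};

struct Incoming<T> {
    inner: T,
    // Whether any bytes have been received for the current read.
    has_read_data: bool,
    // Consecutive polls that returned `Poll::Pending`.
    pending_count: u32,
}

impl<T: AsyncRead + Unpin> AsyncRead for Incoming<T> {
    fn poll_read(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<io::Result<()>> {
        let this = &mut *self;
        match Pin::new(&mut this.inner).poll_read(cx, buf) {
            Poll::Ready(res) => {
                // Progress (data, EOF, or a real error): reset the heuristic.
                this.has_read_data = true;
                this.pending_count = 0;
                Poll::Ready(res)
            }
            // We started reading a value but the stream has stayed pending
            // across repeated polls: assume the rest of the data is never
            // coming and fail instead of deadlocking.
            Poll::Pending if this.has_read_data && this.pending_count > 1 => {
                Poll::Ready(Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "insufficient param bytes",
                )))
            }
            Poll::Pending => {
                this.pending_count += 1;
                Poll::Pending
            }
        }
    }
}
```

In practice the threshold and reset rules would need tuning (and, as noted above, a slow peer could trip the error), but the shape shows how the two fields break the infinite `Pending` loop.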