Skip to content

account for RangeError: Array buffer allocation failed#303

Merged
jackyzha0 merged 6 commits intomainfrom
jackyzha0/account-for-failed-to-allocate
Apr 2, 2025
Merged

account for RangeError: Array buffer allocation failed#303
jackyzha0 merged 6 commits intomainfrom
jackyzha0/account-for-failed-to-allocate

Conversation

@jackyzha0
Copy link
Copy Markdown
Member

@jackyzha0 jackyzha0 commented Apr 1, 2025

Why

function replicateArrayBufferError() {
  try {
    const size = 2 * 1024 * 1024 * 1024;
    const buf = new ArrayBuffer(size);
	conn.send(buf);
  } catch (e) {
    console.error(e);
  }
}

replicateArrayBufferError();
console.log('you can try catch this!!!')

tldr; a failed allocation can be try-caught by a top-level handler (e.g. in react) and river can pretend nothing went wrong and construct another message

by the time the next message comes around, its possible theres enough memory to allocate again and whoops!!! we skipped sending a message

some more context on this error: https://groups.google.com/a/chromium.org/g/blink-dev/c/pZ9kld0LehA/m/Ho-SCn8tBAAJ

What changed

  • assume codecs are fallible: wrap codec in an adapter (CodecMessageAdapter) to try catch toBuffer and fromBuffer operations
  • delete the session if codec fails, this should gc the message buffer for that session which should help memory pressure
  • we also expose ProtocolError.MessageSendFailure so the consumer can decide to shut down the world in this case too
  • modify ws connection to not throw when sending on CLOSING or CLOSED states (safe to discard as we have the sendbuffer)
  • actually read the result of .send()
  • added test to prevent regression
  • delete unused messageFraming code (we axed UDS a while ago)

Versioning

  • Breaking protocol change
  • Breaking ts/js API change

@jackyzha0 jackyzha0 changed the title make client-side heartbeat timeout instead of interval based account for RangeError: Array buffer allocation failed Apr 1, 2025
@jackyzha0 jackyzha0 marked this pull request as ready for review April 1, 2025 22:38
@jackyzha0 jackyzha0 requested a review from a team as a code owner April 1, 2025 22:38
@jackyzha0 jackyzha0 requested review from blast-hardcheese and masad-frost and removed request for a team April 1, 2025 22:38
Base automatically changed from jackyzha0/timeout-instead-of-interval to main April 1, 2025 22:39
@jackyzha0 jackyzha0 force-pushed the jackyzha0/account-for-failed-to-allocate branch from 703a7c9 to d179a97 Compare April 1, 2025 22:40

return true;
} catch {
return false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how i feel about absolutely swallowing everything here. Should we log the error?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this only ever fails for one reason, sending on CONNECTING state, imo its fine? connection doesnt have direct access to a logger and i think threading it through just for this doesnt super make sense

we can alternatively trigger this.ws.onerror and close

import { DeserializeResult, SerializeResult } from '../transport/results';
import { coerceErrorString } from '../transport/stringifyError';

export class CodecMessageAdapter {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should leave doc strings on all of our classes!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why an adapter instead of making the implementation conform to this

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codec is user substitutable and we cant expect every consumer to remember to try/catch toBuffer fromBuffer, so we do it at this layer

}

// transition
this.pendingSessions.delete(session);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the significance of moving this up? maybe worth a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the point where we 'consume' the pending session and turn it into a connected session

now that there is a fail point, we want to clear the old pending session earlier rather than later

Copy link
Copy Markdown
Member

@masad-frost masad-frost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIVE BIG BOOMS

💥
💥
💥
💥
💥

tiny comments

@jackyzha0 jackyzha0 merged commit 6575aa5 into main Apr 2, 2025
6 checks passed
@jackyzha0 jackyzha0 deleted the jackyzha0/account-for-failed-to-allocate branch April 2, 2025 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants