
fix(base64): chunk Uint8Array encoding to prevent stack overflow for large inputs #59

Open

terminalchai wants to merge 1 commit into unjs:main from terminalchai:fix/base64-large-array-stack-overflow

Conversation

@terminalchai commented Mar 14, 2026

Closes #41

Problem

_base64Encode converts a Uint8Array to a base64 string with:

btoa(String.fromCodePoint(...data))

The spread operator passes every byte as a separate function argument. JavaScript engines limit how many arguments a single call can receive; the threshold is engine- and stack-dependent, on the order of tens of thousands (JavaScriptCore caps it at 65,536, and V8/SpiderMonkey fail at similar stack-dependent thresholds). For large inputs such as PDF pages or images, this reliably throws:

RangeError: Maximum call stack size exceeded

The bug only manifests in resource-constrained environments (small cloud containers, edge workers) where the engine's argument limit is hit sooner, which is why it works locally but crashes on DigitalOcean's smallest app tier.
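The engine- and environment-dependence is easy to see directly. This small probe (illustrative only, not part of this PR) binary-searches for the largest argument count a spread call will accept; it reports different numbers on different engines and at different stack depths:

```javascript
// Illustrative probe (not part of this PR): binary-search the largest
// argument count that String.fromCodePoint accepts via spread. The
// result varies by engine and by how deep the call stack already is,
// which is why the crash can appear only in constrained environments.
function maxSpreadArgs() {
  let lo = 1; // known to succeed
  let hi = 1 << 24; // 16M arguments, safely past any engine's limit
  while (lo < hi) {
    const mid = Math.floor((lo + hi + 1) / 2);
    try {
      String.fromCodePoint(...new Uint8Array(mid));
      lo = mid; // succeeded: the limit is at least mid
    } catch {
      hi = mid - 1; // RangeError: the limit is below mid
    }
  }
  return lo;
}

console.log(`largest working spread here: ${maxSpreadArgs()} arguments`);
```

Running this at the top level of a script versus deep inside a framework's call stack gives noticeably different answers, which matches the "works locally, crashes in production" symptom.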

Fix

Process the array in 65,535-byte chunks, building the binary string incrementally before passing it to btoa:

const CHUNK_SIZE = 0xffff;
let str = "";
for (let i = 0; i < data.length; i += CHUNK_SIZE) {
  str += String.fromCodePoint(...data.subarray(i, i + CHUNK_SIZE));
}
let encoded = btoa(str);

This is the standard pattern recommended by MDN for this exact scenario. It is fully cross-platform — no Buffer, no Node.js-only APIs — so it works in browsers, Deno, Cloudflare Workers, and Bun too.

Test

Added a regression test that encodes a 200 kB Uint8Array (well above the 65,535-argument threshold) and asserts:

  1. it does not throw
  2. the output length matches the expected base64 size for the input

Summary by CodeRabbit

  • New Features

    • Added a public function to encode Uint8Array data to base64.
  • Bug Fixes

    • Reworked the encoder to process large binary payloads safely, preventing stack overflow issues and ensuring correct output length for very large inputs.


coderabbitai bot commented Mar 14, 2026

📝 Walkthrough

The _base64Encode implementation was changed to encode Uint8Array input in fixed-size chunks rather than using a single spread into String.fromCodePoint, and a test was added to ensure large arrays (>65535 bytes) encode without throwing and produce the expected base64 length.

Changes

Base64 encoding implementation (src/data-types/_utils.ts):
Replaced the String.fromCodePoint(...data) approach with a chunked loop (CHUNK_SIZE) that concatenates code points per slice and then base64-encodes the result; preserves URL-safe and dataURL options.

Tests & public export (test/index.test.ts):
Added uint8ArrayToBase64 usage and a test that constructs a 200,000-byte Uint8Array, asserts encoding does not throw, and checks the resulting base64 length matches the expected value.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibbled bytes in careful chunks,
No more spreads that made me flunk.
Big arrays hum, encodings light,
Hopped through base64 — sleep tight. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
Title Check ✅ Passed: the title accurately describes the main change, fixing base64 encoding by chunking to prevent stack overflow on large inputs.
Linked Issues Check ✅ Passed: the PR implements the core requirement from #41, chunking the Uint8Array to avoid stack overflow with large inputs while maintaining cross-platform compatibility without Node-only APIs.
Out of Scope Changes Check ✅ Passed: all changes are directly related to the base64 stack overflow fix and its test coverage, with no unrelated modifications.


terminalchai force-pushed the fix/base64-large-array-stack-overflow branch from 9bb0942 to 93c9b88 on March 14, 2026 at 20:54

coderabbitai bot left a comment

🧹 Nitpick comments (1)
test/index.test.ts (1)

125-133: Consider adding a round-trip assertion for stronger regression coverage.

The current check proves “no throw” and expected size, but not exact content correctness. Adding a decode-and-compare assertion would make this test much harder to false-pass.

✅ Suggested test enhancement
-it("should encode large Uint8Array (>65535 bytes) without stack overflow", () => {
+it("should encode large Uint8Array (>65535 bytes) without stack overflow", async () => {
   // String.fromCodePoint(...data) crashes when data.length exceeds the JS
   // engine's max argument count. Use a 200 kB array to reliably reproduce.
   const large = new Uint8Array(200_000).fill(42);
   expect(() => uint8ArrayToBase64(large, { dataURL: false })).not.toThrow();
   const encoded = uint8ArrayToBase64(large, { dataURL: false });
   // btoa of 200 000 bytes of 0x2A ('*') → predictable base64 output length
   expect(encoded.length).toBe(Math.ceil(200_000 / 3) * 4);
+  const decoded = await convertTo("Uint8Array", encoded, "Base64");
+  expect(decoded).toEqual(large);
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/index.test.ts` around lines 125 - 133, Add a round-trip assertion to
verify content correctness: after calling uint8ArrayToBase64(large, { dataURL:
false }) decode the resulting base64 back to bytes (e.g. Buffer.from(encoded,
"base64") in Node tests or a project helper like base64ToUint8Array if
available) and assert the decoded Uint8Array equals the original `large`; update
the test inside the same it block that references uint8ArrayToBase64 to include
this equality check to prevent false-positive passes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c5ab5a1-94a8-45ca-832c-c5e8d0d3c1f9

📥 Commits

Reviewing files that changed from the base of the PR and between 7d3f730 and 9bb0942.

📒 Files selected for processing (2)
  • src/data-types/_utils.ts
  • test/index.test.ts


coderabbitai bot left a comment

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f72343ca-47dd-4e05-b332-eacd9ccc344e

📥 Commits

Reviewing files that changed from the base of the PR and between 9bb0942 and 93c9b88.

📒 Files selected for processing (2)
  • src/data-types/_utils.ts
  • test/index.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/index.test.ts

Comment on lines +19 to +23
const CHUNK_SIZE = 0xff_ff;
let str = "";
for (let i = 0; i < data.length; i += CHUNK_SIZE) {
  str += String.fromCodePoint(...data.subarray(i, i + CHUNK_SIZE));
}

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For converting large Uint8Array values to strings in JavaScript, what chunk size is broadly recommended across engines (V8, SpiderMonkey, JavaScriptCore) when using String.fromCodePoint/String.fromCharCode with spread or apply?

💡 Result:

A broadly used cross-engine “safe” chunk size is 0x8000 (32,768) elements per call when doing String.fromCharCode(...chunk) / .apply(null, chunk) (and similarly for fromCodePoint), because engines have an unspecified max-arguments limit (often failing somewhere in the “tens of thousands”), with JavaScriptCore explicitly noted as having a 65,536 argument cap, and MDN’s recommended hybrid strategy uses QUANTUM = 32768. [1], [2]

Sources: [1] MDN Function.prototype.apply() (argument-limit note + QUANTUM=32768 example), [2] MDN spread syntax (points to the same argument-length limit guidance).


🏁 Script executed:

# First, find and read the file
cat -n src/data-types/_utils.ts | head -40

Repository: unjs/undio

Length of output: 1349


🏁 Script executed:

# Search for tests of _base64Encode
rg "_base64Encode" --type ts --type js

Repository: unjs/undio

Length of output: 1673


🏁 Script executed:

# Search for test files
fd ".*test.*" -e ts -e js | head -20

Repository: unjs/undio

Length of output: 72


🏁 Script executed:

# Check the test file for tests related to _base64Encode or large data
cat -n test/index.test.ts | grep -A 5 -B 5 -i "base64\|chunk\|large"

Repository: unjs/undio

Length of output: 2753


Use a more conservative chunk size and optimize string concatenation.

At line 19, 0xff_ff (65,535) operates at the edge of JavaScript engines' argument limits—JavaScriptCore has an explicit 65,536 argument cap, leaving only 1-argument margin before failure. Additionally, the str += pattern in the loop causes repeated string allocations, creating unnecessary overhead for large payloads. Use 0x8000 (32,768) as the chunk size, which is the broadly recommended safe threshold across V8, SpiderMonkey, and JavaScriptCore, and collect results in an array to join at the end.

Proposed refactor
-  const CHUNK_SIZE = 0xff_ff;
-  let str = "";
+  const CHUNK_SIZE = 0x8000;
+  const parts: string[] = [];
   for (let i = 0; i < data.length; i += CHUNK_SIZE) {
-    str += String.fromCodePoint(...data.subarray(i, i + CHUNK_SIZE));
+    parts.push(String.fromCodePoint(...data.subarray(i, i + CHUNK_SIZE)));
   }
-  let encoded = btoa(str);
+  let encoded = btoa(parts.join(""));
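For reference, the refactor the bot proposes can be sketched as a standalone function (the name encodeBase64 and the surrounding code are illustrative, not the library's actual internals, which also handle the urlSafe and dataURL options):

```javascript
// Runnable sketch of the suggested refactor: conservative 0x8000 chunks
// (MDN's cross-engine safe "quantum") and a single join instead of
// repeated string concatenation.
function encodeBase64(data) {
  const CHUNK_SIZE = 0x8000; // 32,768 arguments per fromCodePoint call
  const parts = [];
  for (let i = 0; i < data.length; i += CHUNK_SIZE) {
    parts.push(String.fromCodePoint(...data.subarray(i, i + CHUNK_SIZE)));
  }
  return btoa(parts.join(""));
}

// Round-trip sanity check across several chunk boundaries:
const bytes = new Uint8Array(100_000).map((_, i) => i % 256);
const back = Uint8Array.from(atob(encodeBase64(bytes)), (c) => c.codePointAt(0));
console.log(back.length === bytes.length && back[99_999] === bytes[99_999]); // true
```

Joining once keeps peak work linear in the input size, whereas repeated += can trigger quadratic copying in engines without rope-style string optimization.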
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/data-types/_utils.ts` around lines 19 - 23, The current chunking and
concatenation in the loop is unsafe and inefficient: reduce CHUNK_SIZE from
0xff_ff to 0x8000 and avoid repeated string allocations by accumulating pieces
in an array and joining once; specifically, replace the loop that uses
CHUNK_SIZE, String.fromCodePoint(...data.subarray(...)) and str += ... with
building an array (e.g., parts.push(String.fromCodePoint(...data.subarray(i, i +
CHUNK_SIZE)))) and after the loop set str = parts.join('') so fromCodePoint
never receives too many arguments and string concatenation is optimized.



Development

Successfully merging this pull request may close these issues.

base64Encode causes RangeError: Maximum call stack size exceeded
