fix(base64): chunk Uint8Array encoding to prevent stack overflow for large inputs #59
🧹 Nitpick comments (1)

test/index.test.ts (1)

125-133: Consider adding a round-trip assertion for stronger regression coverage.

The current check proves "no throw" and the expected size, but not exact content correctness. Adding a decode-and-compare assertion would make this test much harder to false-pass.
✅ Suggested test enhancement

```diff
-it("should encode large Uint8Array (>65535 bytes) without stack overflow", () => {
+it("should encode large Uint8Array (>65535 bytes) without stack overflow", async () => {
   // String.fromCodePoint(...data) crashes when data.length exceeds the JS
   // engine's max argument count. Use a 200 kB array to reliably reproduce.
   const large = new Uint8Array(200_000).fill(42);
   expect(() => uint8ArrayToBase64(large, { dataURL: false })).not.toThrow();
   const encoded = uint8ArrayToBase64(large, { dataURL: false });
   // btoa of 200 000 bytes of 0x2A ('*') → predictable base64 output length
   expect(encoded.length).toBe(Math.ceil(200_000 / 3) * 4);
+  const decoded = await convertTo("Uint8Array", encoded, "Base64");
+  expect(decoded).toEqual(large);
 });
```
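The round-trip idea can also be checked standalone. A minimal sketch using Node's `Buffer` in place of the project's `convertTo` helper (`Buffer` here is only an illustrative stand-in, not the project API):

```typescript
// Minimal round-trip sketch (not project code): encode 200 kB of 0x2A,
// decode it back with Node's Buffer, and compare byte-for-byte.
const large = new Uint8Array(200_000).fill(42);
const encoded = Buffer.from(large).toString("base64");
const decoded = new Uint8Array(Buffer.from(encoded, "base64"));
const roundTripOk =
  decoded.length === large.length && decoded.every((b, i) => b === large[i]);
console.log(roundTripOk); // true
```

A length-only assertion would pass even if bytes were reordered or corrupted in a length-preserving way; the byte-for-byte comparison is what rules that out.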
```ts
const CHUNK_SIZE = 0xff_ff;
let str = "";
for (let i = 0; i < data.length; i += CHUNK_SIZE) {
  str += String.fromCodePoint(...data.subarray(i, i + CHUNK_SIZE));
}
```
🛠️ Refactor suggestion | 🟠 Major
Use a more conservative chunk size and optimize string concatenation.

At line 19, `0xff_ff` (65,535) operates at the edge of JavaScript engines' argument limits: JavaScriptCore has an explicit 65,536-argument cap, leaving a margin of only one argument before failure. Additionally, the `str +=` pattern in the loop causes repeated string allocations, creating unnecessary overhead for large payloads. Use `0x8000` (32,768) as the chunk size, which is the broadly recommended safe threshold across V8, SpiderMonkey, and JavaScriptCore, and collect results in an array to join at the end.
Proposed refactor

```diff
-  const CHUNK_SIZE = 0xff_ff;
-  let str = "";
+  const CHUNK_SIZE = 0x8000;
+  const parts: string[] = [];
   for (let i = 0; i < data.length; i += CHUNK_SIZE) {
-    str += String.fromCodePoint(...data.subarray(i, i + CHUNK_SIZE));
+    parts.push(String.fromCodePoint(...data.subarray(i, i + CHUNK_SIZE)));
   }
-  let encoded = btoa(str);
+  let encoded = btoa(parts.join(""));
```
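Applied standalone, the suggested pattern can be sketched as follows (an illustrative function name, not the project's; assumes a runtime with a global `btoa`, e.g. browsers, workers, or Node >= 16):

```typescript
// Sketch of the array-accumulate-then-join chunking pattern.
function base64EncodeChunked(data: Uint8Array): string {
  const CHUNK_SIZE = 0x8000; // 32,768: safely under all engines' argument limits
  const parts: string[] = [];
  for (let i = 0; i < data.length; i += CHUNK_SIZE) {
    // subarray returns a view, so no bytes are copied per chunk
    parts.push(String.fromCodePoint(...data.subarray(i, i + CHUNK_SIZE)));
  }
  // Single join avoids the repeated allocations of str += in a loop.
  return btoa(parts.join(""));
}

// 300,000 bytes of 0x2A ('*') encodes without throwing;
// base64 output length is 4 * ceil(n / 3).
const encoded = base64EncodeChunked(new Uint8Array(300_000).fill(0x2a));
console.log(encoded.length === Math.ceil(300_000 / 3) * 4); // true
```

Because each chunk's bytes fit in one `btoa`-compatible binary string, the behavior is identical to the unchunked version for all inputs; only the call-argument count per `fromCodePoint` invocation changes.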
Closes #41
Problem

`_base64Encode` converts a `Uint8Array` to a base64 string by spreading every byte into a single `String.fromCodePoint(...data)` call. The spread operator passes every byte as a separate function argument, and JavaScript engines have a hard limit on the number of arguments a function call can receive (~65,536 in V8/SpiderMonkey). For large inputs such as PDF pages or images, this reliably throws a `RangeError`.
The bug only manifests in resource-constrained environments (small cloud containers, edge workers) where the engine's argument limit is hit sooner, which is why it works locally but crashes on DigitalOcean's smallest app tier.
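The failure mode is easy to reproduce standalone; a minimal sketch (illustrative only, not the project's code path):

```typescript
// Spreading a large Uint8Array into String.fromCodePoint passes every byte
// as a separate call argument, overflowing the engine's argument/stack limit.
const data = new Uint8Array(200_000).fill(42);

let error: unknown;
try {
  String.fromCodePoint(...data); // crashes for large inputs
} catch (e) {
  error = e;
}

console.log(error instanceof RangeError); // true on V8 (Node, Chrome)
```

The exact threshold depends on the engine and available stack, which is why smaller containers fail on inputs that pass locally.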
Fix

Process the array in 65,535-byte chunks, building the binary string incrementally before passing it to `btoa`. This is the standard pattern recommended by MDN for this exact scenario. It is fully cross-platform (no `Buffer`, no Node.js-only APIs), so it works in browsers, Deno, Cloudflare Workers, and Bun too.

Test

Added a regression test that encodes a 200 kB `Uint8Array` (well above the 65,535-argument threshold) and asserts that encoding does not throw and that the output has the expected base64 length.