fix: add buffered DMMF API to bypass V8 string length limit #5757
chris-tophers wants to merge 1 commit into prisma:main from
Add get_dmmf_buffered(), read_dmmf_chunk(), and free_dmmf_buffer() to prisma-schema-wasm. These allow the DMMF JSON to be returned as chunked Uint8Array data instead of a single JS string, bypassing V8's hard limit of ~536MB (0x1fffffe8 characters).

For schemas generating DMMF larger than ~536MB, the existing get_dmmf() throws 'Cannot create a string longer than 0x1fffffe8 characters' because wasm-bindgen converts the Rust String to a JS string. The new buffered API keeps the JSON as bytes in WASM linear memory and lets JS read chunks as Uint8Array (which has no such limit).

Changes:
- query-compiler/dmmf/src/lib.rs: add dmmf_json_bytes_from_validated_schema() using serde_json::to_vec() instead of to_string()
- prisma-fmt/src/get_dmmf.rs: add get_dmmf_bytes() returning Vec<u8>
- prisma-fmt/src/lib.rs: export get_dmmf_bytes()
- prisma-schema-wasm/src/lib.rs: add 3 wasm_bindgen exports:
  - get_dmmf_buffered(params) -> byte count
  - read_dmmf_chunk(offset, length) -> Uint8Array
  - free_dmmf_buffer()

Tested with a 1,600+ model schema producing 571MB DMMF:
- Original get_dmmf: FAILS (V8 string limit)
- Buffered API: PASSES (35 chunks x 16MB, streamed successfully)

Fixes: prisma/prisma#29111
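The store / read-chunk / free cycle described in this commit can be sketched in plain Rust as follows. This is a minimal illustration, not the code from the PR: the names store_buffered, read_chunk, free_buffer, and read_all are hypothetical, the wasm_bindgen attributes are omitted so the block compiles with the standard library alone, and store_buffered takes ready-made bytes where the real get_dmmf_buffered(params) would serialize the DMMF.

```rust
use std::sync::Mutex;

// Global byte buffer, mirroring the static Mutex<Vec<u8>> used in the PR.
static BUFFER: Mutex<Vec<u8>> = Mutex::new(Vec::new());

/// Hypothetical stand-in for get_dmmf_buffered(): stores the serialized
/// bytes and returns how many bytes are available to read.
pub fn store_buffered(bytes: Vec<u8>) -> usize {
    let mut buf = BUFFER.lock().unwrap();
    *buf = bytes;
    buf.len()
}

/// Hypothetical stand-in for read_dmmf_chunk(): copies out at most `length`
/// bytes starting at `offset`, clamping both ends to the buffer bounds.
pub fn read_chunk(offset: usize, length: usize) -> Vec<u8> {
    let buf = BUFFER.lock().unwrap();
    let start = offset.min(buf.len());
    let end = start.saturating_add(length).min(buf.len());
    buf[start..end].to_vec()
}

/// Hypothetical stand-in for free_dmmf_buffer(): replaces the buffer with an
/// empty Vec so the old allocation is dropped.
pub fn free_buffer() {
    let mut buf = BUFFER.lock().unwrap();
    *buf = Vec::new();
}

/// Caller-side loop: reads the whole buffer back in fixed-size chunks.
pub fn read_all(total: usize, chunk_size: usize) -> Vec<u8> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut out = Vec::with_capacity(total);
    let mut offset = 0;
    while offset < total {
        let chunk = read_chunk(offset, chunk_size);
        if chunk.is_empty() {
            break; // buffer was freed or is shorter than `total`
        }
        offset += chunk.len();
        out.extend_from_slice(&chunk);
    }
    out
}
```

On the JS side, the equivalent of read_all would feed each Uint8Array chunk to a streaming decoder or parser rather than concatenating into one string, since a single string is exactly what V8 rejects.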
When prisma generate processes schemas that produce DMMF larger than ~536MB, the existing get_dmmf() WASM call fails with V8's hard-coded string length limit (0x1fffffe8 characters).

This adds automatic fallback to the buffered DMMF API (get_dmmf_buffered + read_dmmf_chunk) which returns data as chunked Uint8Array, bypassing the V8 string limit. The fallback is transparent: it only activates when the V8 string limit error is detected, so there is no behavior change for schemas that work with the existing API.

Companion to: prisma/prisma-engines#5757
Fixes: prisma#29111
Add a `get-dmmf` subcommand to the prisma-fmt CLI that streams DMMF JSON directly to stdout via serde_json::to_writer().

This approach has no memory ceiling: unlike WASM (limited to ~4GB linear memory), the native binary can stream arbitrarily large DMMF with only 1x peak memory (the in-memory DMMF struct, no serialized buffer).

Changes:
- dmmf crate: add dmmf_json_to_writer() using serde_json::to_writer()
- prisma-fmt: add get_dmmf_to_writer() that validates + streams
- prisma-fmt: export get_dmmf_to_writer() from lib.rs
- prisma-fmt: add GetDmmf CLI variant, reads stdin params, streams to stdout

Alternative approach to the buffered WASM API in prisma#5757.

Fixes: prisma/prisma#29111
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
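To illustrate why writing to a writer avoids the serialized buffer, here is a small std-only sketch of the streaming idea. It is not the dmmf_json_to_writer() from this commit and does not use serde_json; the function names and the tiny {"models":[...]} shape are assumptions chosen for illustration. Each element is written to the sink as it is produced, so no complete JSON document is ever held in memory.

```rust
use std::io::{self, Write};

/// Writes `s` as a JSON string literal, escaping quotes, backslashes,
/// and control characters below U+0020.
fn write_json_string<W: Write>(out: &mut W, s: &str) -> io::Result<()> {
    out.write_all(b"\"")?;
    for c in s.chars() {
        match c {
            '"' => out.write_all(b"\\\"")?,
            '\\' => out.write_all(b"\\\\")?,
            c if (c as u32) < 0x20 => write!(out, "\\u{:04x}", c as u32)?,
            c => write!(out, "{}", c)?,
        }
    }
    out.write_all(b"\"")
}

/// Hypothetical example: streams a list of model names as
/// {"models":["A","B",...]} straight into any writer.
pub fn stream_model_names<W: Write>(mut out: W, names: &[&str]) -> io::Result<()> {
    out.write_all(b"{\"models\":[")?;
    for (i, name) in names.iter().enumerate() {
        if i > 0 {
            out.write_all(b",")?;
        }
        write_json_string(&mut out, name)?;
    }
    out.write_all(b"]}")?;
    out.flush()
}
```

Passing io::stdout().lock() (ideally wrapped in a std::io::BufWriter) as the writer gives the stdout streaming behavior the commit describes; passing a Vec<u8> reproduces the buffered behavior for comparison.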
Alternative approach submitted: #5761

I've also submitted a companion binary streaming approach as an alternative to this WASM buffered API. The key difference:

The binary approach uses

Both approaches solve the immediate V8 string limit issue. The Prisma team can choose whichever fits best, or combine both (WASM buffered as primary fallback, binary streaming as secondary for schemas that exceed WASM32's ~4GB limit).

Companion TypeScript PRs:
/// This allows JS to read the DMMF in chunks via Uint8Array, bypassing
/// V8's string length limit of ~536MB.
/// See: https://github.com/prisma/prisma/issues/29111
static DMMF_BUFFER: Mutex<Vec<u8>> = Mutex::new(Vec::new());
I think instead of using a static the API should allow allocating and freeing your own buffer (and passing it as an argument), so that there's no implicit global state
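One possible reading of this suggestion is a handle type that owns its bytes, so each caller allocates, reads from, and frees its own buffer. The sketch below is an assumption about what such an API could look like, not code from the PR: DmmfBuffer, from_bytes, and read_chunk are hypothetical names, and the wasm_bindgen attributes are left out so the block builds with the standard library alone.

```rust
/// Hypothetical owned buffer handle replacing the global static.
pub struct DmmfBuffer {
    bytes: Vec<u8>,
}

impl DmmfBuffer {
    /// Takes ownership of already serialized DMMF bytes. In the real crate
    /// this constructor would presumably run the serialization itself.
    pub fn from_bytes(bytes: Vec<u8>) -> Self {
        DmmfBuffer { bytes }
    }

    /// Total number of bytes available to read.
    pub fn len(&self) -> usize {
        self.bytes.len()
    }

    /// True when the buffer holds no bytes.
    pub fn is_empty(&self) -> bool {
        self.bytes.is_empty()
    }

    /// Copies out at most `length` bytes starting at `offset`,
    /// clamped to the buffer bounds.
    pub fn read_chunk(&self, offset: usize, length: usize) -> Vec<u8> {
        let start = offset.min(self.bytes.len());
        let end = start.saturating_add(length).min(self.bytes.len());
        self.bytes[start..end].to_vec()
    }
}
// No explicit free function is needed in Rust: the allocation is released
// when the DmmfBuffer value is dropped.
```

With this shape, two buffers can exist at once without interfering, and the lifetime of each allocation is tied to a value the caller holds. When a struct is exported through wasm_bindgen, the generated JS class carries a free() method, which would play the role of free_dmmf_buffer().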
Thanks for submitting the PR! I think we can merge this change (along with the
Merging this PR will not alter performance
Summary

Adds three new wasm_bindgen exports to prisma-schema-wasm that allow DMMF JSON to be returned as chunked Uint8Array data instead of a single JS string, bypassing V8's hard string length limit of 0x1fffffe8 characters (~536MB).

- get_dmmf_buffered(params) -> usize: serializes DMMF to an internal buffer using serde_json::to_vec(), returns total byte count
- read_dmmf_chunk(offset, length) -> Vec<u8>: returns a chunk as Uint8Array (no V8 string limit)
- free_dmmf_buffer(): releases the internal buffer

The existing get_dmmf() is unchanged for backward compatibility.

Root Cause

dmmf_json_from_validated_schema() in query-compiler/dmmf/src/lib.rs uses serde_json::to_string(), producing a single Rust String. When wasm-bindgen converts this to a JS string across the WASM FFI boundary, V8 rejects it if the string exceeds 0x1fffffe8 characters. No Node.js flags can change this limit; it's a V8 engine constant.

Changes

- query-compiler/dmmf/src/lib.rs: dmmf_json_bytes_from_validated_schema() using serde_json::to_vec()
- prisma-fmt/src/get_dmmf.rs: get_dmmf_bytes() returning Vec<u8>
- prisma-fmt/src/lib.rs: get_dmmf_bytes()
- prisma-schema-wasm/src/lib.rs: wasm_bindgen exports with Mutex<Vec<u8>> buffer

+76 lines, 4 files changed. No existing behavior modified.

Test Results

Tested with a production schema (1,600+ models, 1,100+ enums, 111K lines):

- get_dmmf(): Cannot create a string longer than 0x1fffffe8 characters
- get_dmmf_buffered() + chunked read

Limitations

This fix addresses the V8 string length limit but has an upper bound: WASM32 linear memory is capped at ~4GB. For schemas producing DMMF larger than ~4GB, the buffered approach would also fail. An alternative for extreme cases would be using the native schema engine binary path (which streams over stdio with no memory limit), but the buffered API should cover schemas well into the tens of thousands of models.

Companion Change Required

The TypeScript side (prisma/prisma, getDmmfWasm in packages/internals) needs to detect the V8 string limit error and fall back to the buffered API with a streaming JSON parser.

Both this PR and the companion TypeScript PR are required for a complete fix. We will submit the companion PR to prisma/prisma as well.

Fixes: prisma/prisma#29111
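The size figures quoted in this PR can be checked with a little arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, the byte count of 571,000,000 is an assumed reading of "571MB", the 16 MiB chunk size is an assumed reading of "16MB", and byte length is compared against the character limit on the assumption that the JSON is mostly ASCII.

```rust
/// V8's maximum string length as quoted in the error message.
pub const V8_MAX_STRING_LENGTH: u64 = 0x1fff_ffe8;

/// Upper bound of a 32-bit WASM linear memory address space (4 GiB).
pub const WASM32_MAX_MEMORY: u64 = 4 * 1024 * 1024 * 1024;

/// True when a payload of `len` characters cannot become a single JS string.
pub fn exceeds_v8_string_limit(len: u64) -> bool {
    len > V8_MAX_STRING_LENGTH
}

/// True when a payload of `len` bytes cannot fit in WASM32 linear memory,
/// ignoring everything else that also has to live there.
pub fn exceeds_wasm32_memory(len: u64) -> bool {
    len > WASM32_MAX_MEMORY
}

/// Number of chunks needed to read `total` bytes in pieces of `chunk` bytes
/// (ceiling division).
pub fn chunk_count(total: u64, chunk: u64) -> u64 {
    assert!(chunk > 0, "chunk must be non-zero");
    total / chunk + u64::from(total % chunk != 0)
}
```

Under those assumptions, 0x1fffffe8 is 536,870,888 characters, a 571,000,000-byte payload is over the string limit but far below the WASM32 bound, and reading it in 16 MiB chunks takes 35 reads, which is consistent with the reported "35 chunks x 16MB".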