Description
When minifying an SVG file that contains elements with
4-byte UTF-8 characters (U+20000 and above) in the unicode attribute, the output gets corrupted.
Environment:
The issue is reproducible via the Deno (2.7.1) bindings, but not via the CLI.
It was introduced in version 2.24.8 — version 2.24.7 does not have this issue.
To reproduce:
Minify the attached SVG file using the following Deno code:
// problem (2.24.8)
import { minify } from "npm:@tdewolff/minify@2.24.8";
const svg = Deno.readTextFileSync("test.svg");
const minified = await minify("image/svg+xml", svg);
console.log(minified);
The following works correctly:
// works (2.24.7)
import { string } from "npm:@tdewolff/minify@2.24.7";
const svg = Deno.readTextFileSync("test.svg");
const minified = await string("image/svg+xml", svg);
console.log(minified);
The output of 2.24.8 will be corrupted around the following attribute:
unicode="𨮓"
Observed behavior:
The issue seems to be related to the byte offset of the 4-byte character within the file:
- Adding or removing 3 or more characters before unicode="𨮓" makes the error disappear
- Adding characters after unicode="𨮓" (e.g. unicode="𨮓 aaaaaaaaa") does not fix the issue
- The issue cannot be reproduced with a single element in isolation. It only occurs when the overall file size puts the 4-byte character at a specific offset
This suggests the problem is tied to the specific byte position of the 4-byte UTF-8 character in the file, not the surrounding context.
Since the issue was introduced in 2.24.8, it may be related to changes made in that version.
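To help narrow down which byte position triggers the corruption, the offsets of all 4-byte characters in the input can be listed. The following is a minimal sketch, not part of the minify package: `fourByteOffsets` is a hypothetical helper name, and it assumes the file is read as a string and re-encoded as UTF-8.

```typescript
// Hypothetical diagnostic helper (not part of @tdewolff/minify):
// returns the UTF-8 byte offset of every character that encodes to 4 bytes.
function fourByteOffsets(text: string): { char: string; offset: number }[] {
  const encoder = new TextEncoder();
  const hits: { char: string; offset: number }[] = [];
  let offset = 0;
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const char of text) {
    const size = encoder.encode(char).length;
    if (size === 4) {
      hits.push({ char, offset });
    }
    offset += size;
  }
  return hits;
}

// Possible usage in Deno (assumes test.svg is in the working directory):
// console.log(fourByteOffsets(Deno.readTextFileSync("test.svg")));
```

Comparing the reported offsets between a failing file and a variant with a few characters added before the attribute may show whether the failure lines up with a particular boundary. That is only a guess about the cause.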
Workaround:
Replacing the character with a numeric character reference before minifying avoids the issue:
unicode="𨦓"
The SVG file that reproduces the issue is attached below.
Note that it is an SVG font file and will not render visually in a browser.
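The workaround can also be applied programmatically before calling the minifier. This is a minimal sketch under the assumption that every character above U+FFFF should be escaped; `escapeAstral` is a hypothetical helper name, not an API of the minify package.

```typescript
// Hypothetical pre-processing helper (not part of @tdewolff/minify):
// replaces every character above U+FFFF with a hexadecimal numeric
// character reference such as &#x20000;
function escapeAstral(svg: string): string {
  // The "u" flag makes the character class match whole code points.
  return svg.replace(/[\u{10000}-\u{10FFFF}]/gu, (ch) => {
    const codePoint = ch.codePointAt(0)!;
    return `&#x${codePoint.toString(16).toUpperCase()};`;
  });
}

// Possible usage with the failing version (assumes the same setup as above):
// const minified = await minify("image/svg+xml", escapeAstral(svg));
```

Note that this replaces matching characters everywhere in the input, including comments and CDATA sections, where character references are not interpreted. For an SVG font file that only carries such characters in attributes, that should not matter.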