Use String.prototype.isWellFormed() once it's widely available #1243

@timostamm

Protobuf requires strings to be valid UTF-8. When serializing, we check strings by passing them to encodeURIComponent, which throws on lone surrogates. This is far from ideal for performance.
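For illustration, here is a minimal sketch of that style of check. The helper name `isEncodableAsUtf8` is hypothetical and not taken from the library; the sketch only relies on the standard behavior that `encodeURIComponent` throws a `URIError` for a lone surrogate.

```typescript
// Hypothetical helper name, used only for this illustration.
function isEncodableAsUtf8(str: string): boolean {
  try {
    // encodeURIComponent throws a URIError if str contains a lone surrogate,
    // which is exactly the case that cannot be encoded as UTF-8.
    encodeURIComponent(str);
    return true;
  } catch {
    return false;
  }
}

console.log(isEncodableAsUtf8("012345678¼")); // true
console.log(isEncodableAsUtf8("\uD83D\uDE00")); // true: a complete surrogate pair
console.log(isEncodableAsUtf8("abc\uD800")); // false: lone high surrogate
```

The cost comes from building a percent-encoded copy of the whole string just to discard it.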

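String.prototype.isWellFormed is a suitable alternative. On Node.js, it shows significantly better performance, especially for longer strings (in the run below, roughly 5x faster for 10-character strings and almost 60x faster for 1000-character strings):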

$ node --version
v24.5.0
$ node ./t.ts encodeURIComponent 10
node ./t.ts isWellFormed 10
node ./t.ts encodeURIComponent 100
node ./t.ts isWellFormed 100
node ./t.ts encodeURIComponent 1000
node ./t.ts isWellFormed 1000

encodeURIComponent with string length 10: 77.23291699999999 ms
isWellFormed with string length 10: 16.621958 ms
encodeURIComponent with string length 100: 34.113417 ms
isWellFormed with string length 100: 3.5224170000000044 ms
encodeURIComponent with string length 1000: 29.431917 ms
isWellFormed with string length 1000: 0.5034169999999989 ms
Benchmark script
// t.ts
const type = process.argv[2];
let checkUtf8: (str: string) => boolean;
switch (type) {
  case "encodeURIComponent":
    checkUtf8 = function checkUtf8(str: string) {
      try {
        encodeURIComponent(str);
        return true;
      } catch (_) {
        return false;
      }
    };
    break;
  case "isWellFormed":
    checkUtf8 = function checkUtf8(str: string) {
      // @ts-expect-error isWellFormed may be missing from the configured lib typings
      return str.isWellFormed();
    };
    break;
  default:
    throw new Error("Unknown type: " + type);
}

const strLen = process.argv[3];
let strings: string[];
switch (strLen) {
  case "10":
    strings = new Array(1_000_000).fill("012345678¼");
    break;
  case "100":
    strings = new Array(100_000).fill("012345678¼".repeat(10));
    break;
  case "1000":
    strings = new Array(10_000).fill("012345678¼".repeat(100));
    break;
  default:
    throw new Error("Unknown strLen: " + strLen);
}

const start = performance.now();
for (const str of strings) {
  if (!checkUtf8(str)) {
    throw new Error(`Unexpected invalid utf-8 ${str}`);
  }
}
const elapsed = performance.now() - start;
console.log(`${type} with string length ${strLen}: ${elapsed} ms`);

isWellFormed is not widely available yet, but it will be in April 2026. See the definition of "widely available" on MDN.
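Until then, one option would be to feature-detect the method and fall back to the current check. The following is only a sketch under that assumption: the `MaybeWellFormed` type and the function names are hypothetical, and the casts are there so that the code compiles whether or not the configured TypeScript lib declares `isWellFormed`.

```typescript
// Hypothetical type for a string that may or may not have the method.
type MaybeWellFormed = { isWellFormed?: () => boolean };

// Detect support once, at module load.
const hasIsWellFormed =
  typeof (String.prototype as unknown as MaybeWellFormed).isWellFormed ===
  "function";

// Fallback: the encodeURIComponent check from the benchmark script.
function checkViaEncodeURIComponent(str: string): boolean {
  try {
    encodeURIComponent(str);
    return true;
  } catch {
    return false;
  }
}

// Preferred path: the native check, when the runtime provides it.
function checkViaIsWellFormed(str: string): boolean {
  return (str as unknown as Required<MaybeWellFormed>).isWellFormed();
}

const checkUtf8: (str: string) => boolean = hasIsWellFormed
  ? checkViaIsWellFormed
  : checkViaEncodeURIComponent;

console.log(checkUtf8("012345678¼")); // true
console.log(checkUtf8("abc\uDC00def")); // false: lone low surrogate
```

Both branches give the same answer for any input, so the choice only affects speed. Once the method is widely available, the fallback can be dropped.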

Related to: #333
