Commit c77e7e4

Change output from names to labels; use new underlying package
Per discussions in jsdom/whatwg-encoding#22 (comment), whatwg-encoding will be deprecated and replaced with the `@exodus/bytes` package. Replace our usage of whatwg-encoding with that package. As a consequence, the outputs of this package are now encoding labels, not encoding names. (Practically speaking, they are now lowercased versions of what they were previously.) Additionally, this raises the minimum Node.js version requirements, since the `@exodus/bytes` package relies on JavaScript modules.
1 parent d655e5a commit c77e7e4

15 files changed, with 49 additions and 54 deletions.

.github/workflows/build.yml

Lines changed: 7 additions & 2 deletions
@@ -12,9 +12,14 @@ jobs:
       fail-fast: false
       matrix:
         node-version:
-          - 18
+          # Explicitly test minimum Node.js versions. Keep in sync with package.json.
+          - 20.19.0
           - 20
-          - latest
+          - 22.12.0
+          - 22
+          - 24.0.0
+          - lts/* # currently 24
+          - latest # currently 25
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-node@v4

README.md

Lines changed: 3 additions & 3 deletions
@@ -12,11 +12,11 @@ const sniffedEncoding = htmlEncodingSniffer(htmlBytes);
 
 The passed bytes are given as a `Uint8Array`; the Node.js `Buffer` subclass of `Uint8Array` will also work, as shown above.
 
-The returned value will be a canonical [encoding name](https://encoding.spec.whatwg.org/#names-and-labels) (not a label). You might then combine this with the [whatwg-encoding](https://github.com/jsdom/whatwg-encoding) package to decode the result:
+The returned value will be an [encoding label](https://encoding.spec.whatwg.org/#names-and-labels), and in particular, the label which is a lowercased version of the encoding's name. You might then combine this with the [`@exodus/bytes`](https://github.com/ExodusOSS/bytes/) package to decode the result:
 
 ```js
-const whatwgEncoding = require("whatwg-encoding");
-const htmlString = whatwgEncoding.decode(htmlBytes, sniffedEncoding);
+const { TextDecoder } = require("@exodus/bytes");
+const htmlString = (new TextDecoder(sniffedEncoding)).decode(htmlBytes);
 ```
 
 ## Options
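
Note for callers migrating across this change: code that stored or compared the old uppercase names needs to switch to the lowercase labels. The sketch below is a minimal illustration, assuming the returned values are simply the lowercased form of the previous names, as the commit message states. It uses the WHATWG `TextDecoder` global built into Node.js instead of `@exodus/bytes` so that it runs without extra dependencies; `legacyNameToLabel` and `decodeWithLabel` are hypothetical helper names, not part of either package.

```javascript
"use strict";

// Hypothetical helper: convert a stored pre-change encoding name (e.g. "UTF-8")
// into the lowercase label form that the package now returns.
function legacyNameToLabel(name) {
  return name.toLowerCase();
}

// Hypothetical helper: decode bytes using a sniffed label.
// Assumption: the runtime's TextDecoder supports the given label.
function decodeWithLabel(bytes, label) {
  return new TextDecoder(label).decode(bytes);
}

// The bytes of "<p>é</p>" encoded as UTF-8.
const htmlBytes = new Uint8Array([0x3C, 0x70, 0x3E, 0xC3, 0xA9, 0x3C, 0x2F, 0x70, 0x3E]);
const label = legacyNameToLabel("UTF-8");
const htmlString = decodeWithLabel(htmlBytes, label);
```

Comparisons such as `sniffedEncoding === "UTF-8"` in calling code would need the same lowercase treatment.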

lib/html-encoding-sniffer.js

Lines changed: 8 additions & 8 deletions
@@ -1,12 +1,12 @@
 "use strict";
-const whatwgEncoding = require("whatwg-encoding");
+const { getBOMEncoding, normalizeEncoding: labelToName } = require("@exodus/bytes/encoding-lite.js");
 
 // https://html.spec.whatwg.org/#encoding-sniffing-algorithm
 module.exports = (uint8Array, { transportLayerEncodingLabel, defaultEncoding = "windows-1252" } = {}) => {
-  let encoding = whatwgEncoding.getBOMEncoding(uint8Array);
+  let encoding = getBOMEncoding(uint8Array);
 
   if (encoding === null && transportLayerEncodingLabel !== undefined) {
-    encoding = whatwgEncoding.labelToName(transportLayerEncodingLabel);
+    encoding = labelToName(transportLayerEncodingLabel);
   }
 
   if (encoding === null) {
@@ -69,7 +69,7 @@ function prescanMetaCharset(uint8Array) {
                 needPragma = true;
               }
             } else if (attrRes.attr.name === "charset") {
-              charset = whatwgEncoding.labelToName(attrRes.attr.value);
+              charset = labelToName(attrRes.attr.value);
               needPragma = false;
             }
           }
@@ -86,8 +86,8 @@ function prescanMetaCharset(uint8Array) {
           continue;
         }
 
-        if (charset === "UTF-16LE" || charset === "UTF-16BE") {
-          charset = "UTF-8";
+        if (charset === "utf-16le" || charset === "utf-16be") {
+          charset = "utf-8";
         }
         if (charset === "x-user-defined") {
           charset = "windows-1252";
@@ -271,7 +271,7 @@ function extractCharacterEncodingFromMeta(string) {
     const nextIndex = string.indexOf(string[position], position + 1);
 
     if (nextIndex !== -1) {
-      return whatwgEncoding.labelToName(string.substring(position + 1, nextIndex));
+      return labelToName(string.substring(position + 1, nextIndex));
     }
 
     // It is an unmatched quotation mark
@@ -287,7 +287,7 @@ function extractCharacterEncodingFromMeta(string) {
     string.length :
     position + indexOfASCIIWhitespaceOrSemicolon + 1;
 
-  return labelToName(string.substring(position, end));
+  return labelToName(string.substring(position, end));
 }
 
 function isSpaceCharacter(c) {
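
To illustrate what the lowercase comparisons in this file operate on, here is a rough, self-contained sketch of two steps: byte order mark detection and the override applied to a `<meta>`-declared charset. `sniffBOM` is a hypothetical stand-in for `getBOMEncoding` (its return values are assumed to be lowercase labels or `null`), and `adjustMetaCharset` is a hypothetical name for the override logic shown in the `prescanMetaCharset` hunk. Neither is the package's actual implementation.

```javascript
"use strict";

// Hypothetical stand-in for getBOMEncoding: inspect the leading bytes and
// return a lowercase label, or null when no byte order mark is present.
function sniffBOM(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return "utf-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return "utf-16be";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return "utf-16le";
  }
  return null;
}

// Mirrors the overrides in the prescan hunk: a <meta>-declared UTF-16 encoding
// is treated as utf-8, and x-user-defined is treated as windows-1252.
function adjustMetaCharset(charset) {
  if (charset === "utf-16le" || charset === "utf-16be") {
    return "utf-8";
  }
  if (charset === "x-user-defined") {
    return "windows-1252";
  }
  return charset;
}
```

Because both functions deal only in lowercase strings, a comparison against `"UTF-16LE"` would silently never match, which is why the string literals in the hunk had to change along with the dependency.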

package-lock.json

Lines changed: 19 additions & 29 deletions
Some generated files are not rendered by default.

package.json

Lines changed: 2 additions & 2 deletions
@@ -18,13 +18,13 @@
     "lint": "eslint ."
   },
   "dependencies": {
-    "whatwg-encoding": "^3.1.1"
+    "@exodus/bytes": "^1.0.0"
   },
   "devDependencies": {
     "@domenic/eslint-config": "^3.0.0",
     "eslint": "^8.53.0"
   },
   "engines": {
-    "node": ">=18"
+    "node": "^20.19.0 || ^22.12.0 || >=24.0.0"
   }
 }
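
The new `engines` range is easy to misread, since it excludes Node.js 21 and 23 entirely as well as early 20.x and 22.x releases. As a rough sketch of what the range accepts, the hypothetical `satisfiesEngines` function below hand-codes the three alternatives rather than using a semver library; it assumes a plain `major.minor.patch` version string and ignores prerelease tags.

```javascript
"use strict";

// Hypothetical helper: does a Node.js version satisfy
// "^20.19.0 || ^22.12.0 || >=24.0.0"?
// Assumption: the input looks like "22.12.0" or "v22.12.0".
function satisfiesEngines(version) {
  const [major, minor] = version.replace(/^v/, "").split(".").map(Number);

  if (major >= 24) {
    return true; // >=24.0.0
  }
  if (major === 22) {
    return minor >= 12; // ^22.12.0 means >=22.12.0 and <23.0.0
  }
  if (major === 20) {
    return minor >= 19; // ^20.19.0 means >=20.19.0 and <21.0.0
  }
  return false; // 21.x, 23.x, and anything below 20 are excluded
}

// Check the runtime executing this script.
const currentNodeIsSupported = satisfiesEngines(process.versions.node);
```

The three lower bounds match the explicit minimum versions added to the CI matrix in `.github/workflows/build.yml`.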
File renamed without changes.

test/fixtures/normal/charset-short-comment_ISO-8859-2.html renamed to test/fixtures/normal/charset-short-comment_iso-8859-2.html

File renamed without changes.
