Description
Even the most basic encoding, windows-1252 (which latin1 and ascii alias to), is decoded incorrectly:
```js
const express = require('express')
const bodyParser = require('body-parser')

const app = express()
app.use(bodyParser.urlencoded())
app.use(bodyParser.json())
app.use(bodyParser.text())
app.use(function (req, res) {
  res.setHeader('Content-Type', 'text/plain')
  res.write('you posted:\n')
  res.write(`${escape(req.body)}\n`)
  res.end(String(req.body))
})

app.listen(8080, async () => {
  const res = await fetch('http://localhost:8080/', {
    method: 'POST',
    headers: { 'content-type': 'text/plain; charset=windows-1252' },
    body: Uint8Array.of(0x80, 0x81, 0x82, 0x83, 0x8d, 0x9e, 0x9f)
  })
  console.log(await res.text())
})
```

Results in:

```
you posted:
%u20AC%uFFFD%u201A%u0192%uFFFD%u017E%u0178
€�‚ƒ�žŸ
```
But it should be `€\x81‚ƒ\x8DžŸ` instead, with no replacement characters, i.e. `%u20AC%81%u201A%u0192%8D%u017E%u0178` when escaped.
See Encoding Standard: https://encoding.spec.whatwg.org/
All characters are mapped in https://encoding.spec.whatwg.org/index-windows-1252.txt, including 0x81 and 0x8D.
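
For reference, here is a minimal sketch of the mapping the spec index describes. It is not body-parser's or iconv-lite's code: `decodeWindows1252` and `WINDOWS_1252_HIGH` are hypothetical names used only for illustration, and the table is transcribed by hand from index-windows-1252.txt. Only bytes 0x80..0x9F need a table; every other byte maps to the code point with the same value.

```javascript
// Code points for bytes 0x80..0x9F, transcribed from index-windows-1252.txt.
// Note that 0x81, 0x8D, 0x8F, 0x90 and 0x9D map to C1 controls, not to U+FFFD.
const WINDOWS_1252_HIGH = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178
]

// Hypothetical helper: decode a byte sequence as windows-1252.
function decodeWindows1252 (bytes) {
  let out = ''
  for (const byte of bytes) {
    const inTableRange = byte >= 0x80 && byte <= 0x9f
    out += String.fromCharCode(inTableRange ? WINDOWS_1252_HIGH[byte - 0x80] : byte)
  }
  return out
}

const decoded = decodeWindows1252(Uint8Array.of(0x80, 0x81, 0x82, 0x83, 0x8d, 0x9e, 0x9f))
console.log(escape(decoded)) // %u20AC%81%u201A%u0192%8D%u017E%u0178
```

The escaped output matches the expected value quoted above: no byte of this encoding ever decodes to U+FFFD.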
The same goes for other encodings. About half of the single-byte encodings are mapped incorrectly and contradict the spec: the whole windows-* family except windows-1256, as well as koi8-u and macintosh.
All of the supported legacy multi-byte encodings also behave incorrectly.
UTF-16 also behaves incorrectly:

```js
const express = require('express')
const bodyParser = require('body-parser')

const app = express()
app.use(bodyParser.urlencoded())
app.use(bodyParser.json())
app.use(bodyParser.text())
app.use(function (req, res) {
  res.setHeader('Content-Type', 'text/plain')
  res.write('you posted:\n')
  res.write(`Is well formed: ${req.body.isWellFormed()}\n`)
  res.write(`${escape(req.body)}\n`)
  res.end(String(req.body))
})

app.listen(8080, async () => {
  const res = await fetch('http://localhost:8080/', {
    method: 'POST',
    headers: { 'content-type': 'text/plain; charset=utf-16le' },
    body: Uint8Array.of(0, 0xd8, 0, 0xd8)
  })
  console.log(await res.text())
})
```

Results in:

```
you posted:
Is well formed: false
%uD800%uD800
��
```
But per the spec, the decoder should never produce non-well-formed strings. It should have produced replacement characters instead, i.e. `%uFFFD%uFFFD` when escaped.
See spec: https://encoding.spec.whatwg.org/#shared-utf-16-decoder
This could have a security impact.
These decoders are enabled in the default configuration.
The default utf-8 decoder never produces non-well-formed strings, but a client can force ill-formed output by declaring a utf-16 charset. Per the spec that should not be possible: decoded strings should always be well-formed.