Changes from 2 commits
28 changes: 28 additions & 0 deletions e2e/react-start/basic/tests/charset-encoding.spec.ts
@@ -0,0 +1,28 @@
+import { expect } from '@playwright/test'
+import { test } from './fixture'
+
+test.describe('Charset Encoding', () => {
+  test('asserts charset meta tag appears before dehydration script', async ({
+    page,
+  }) => {
+    // Navigate to a server-rendered page to get the HTML with dehydration scripts
+    const response = await page.goto('/')
+    const html = await response?.text()
+
+    expect(html).toBeDefined()
+    if (!html) return
+
+    // Case-insensitive search for charset meta and TSR script
+    const htmlLower = html.toLowerCase()
+    const charsetIndex = htmlLower.search(/<meta\s+charset=(["'])utf-8\1/)
+    const tsrScriptIndex = htmlLower.search(/<script\s+class=(["'])\$tsr\1/)
+

🛠️ Refactor suggestion

Harden selectors: support optional quotes/spacing and avoid lowercasing.

Match <meta charset> with optional quotes and whitespace while allowing other attributes, and match the $tsr script class among other attributes. With the i flag on both regexes, the HTML no longer needs to be lowercased.

-    // Case-insensitive search for charset meta and TSR script
-    const htmlLower = html.toLowerCase()
-    const charsetIndex = htmlLower.search(/<meta\s+charset=(["'])utf-8\1/)
-    const tsrScriptIndex = htmlLower.search(/<script\s+class=(["'])\$tsr\1/)
+    // Case-insensitive search for charset meta and TSR script (robust to spacing/attr order)
+    const charsetRe = /<meta[^>]*\bcharset\s*=\s*(["'])?utf-8\1[^>]*>/i
+    const tsrRe = /<script[^>]*\bclass\s*=\s*(["'])\$tsr\1[^>]*>/i
+    const charsetIndex = html.search(charsetRe)
+    const tsrScriptIndex = html.search(tsrRe)
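One way to sanity-check the suggestion is to run both sets of regexes over a few tag variants. The snippet below is a standalone sketch, not part of the spec: the sample tags are hypothetical inputs, and originalMatches/suggestedMatches are illustrative helpers. The unquoted case relies on JavaScript treating a backreference to a group that did not participate in the match as matching the empty string.

```typescript
// Original test regexes (applied to lowercased HTML) and the reviewer's suggestion.
const originalCharset = /<meta\s+charset=(["'])utf-8\1/
const originalTsr = /<script\s+class=(["'])\$tsr\1/
const charsetRe = /<meta[^>]*\bcharset\s*=\s*(["'])?utf-8\1[^>]*>/i
const tsrRe = /<script[^>]*\bclass\s*=\s*(["'])\$tsr\1[^>]*>/i

// Hypothetical tag variants, not taken from the app under test.
const charsetVariants: string[] = [
  '<meta charset="utf-8">',
  "<META CHARSET='UTF-8'>",
  '<meta charset=utf-8>',
  '<meta data-x="1" charset = "utf-8">',
]
const tsrVariants: string[] = [
  '<script class="$tsr">',
  '<script nonce="abc" class="$tsr">',
]

// Original approach: lowercase first, then search with a case-sensitive regex.
function originalMatches(html: string, re: RegExp): boolean {
  return html.toLowerCase().search(re) > -1
}

// Suggested approach: search the raw HTML with an i-flagged regex.
function suggestedMatches(html: string, re: RegExp): boolean {
  return html.search(re) > -1
}

for (const html of charsetVariants) {
  console.log(html, originalMatches(html, originalCharset), suggestedMatches(html, charsetRe))
}
for (const html of tsrVariants) {
  console.log(html, originalMatches(html, originalTsr), suggestedMatches(html, tsrRe))
}
```

On these inputs the original search accepts only the first two charset variants and the first script variant, while the suggested patterns accept all of them.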
🤖 Prompt for AI Agents
In e2e/react-start/basic/tests/charset-encoding.spec.ts around lines 15-19, the
current code lowercases the HTML and uses brittle exact-match regexes; replace
that with case-insensitive, attribute-aware regexes that do not mutate the HTML.
Specifically, search the raw html string with patterns that locate a <meta ...
charset=utf-8 ...> allowing optional quotes, optional whitespace, and other
attributes, and similarly locate a <script ... class=...> that contains the
token "$tsr" among other classes/attributes; use the regex i-flag and \b/[^>]*
style matching so selectors are robust to attribute order and spacing and you
can remove the html.toLowerCase() step.

+    // Both should exist in server-rendered HTML with dehydration
+    expect(charsetIndex).toBeGreaterThan(-1)
+    expect(tsrScriptIndex).toBeGreaterThan(-1)
+
+    // With the fix, the charset meta tag should now appear BEFORE the TSR dehydration script.
+    // This ensures correct character encoding and compliance with the HTML5 spec.
+    expect(charsetIndex).toBeLessThan(tsrScriptIndex)
+  })
+})
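The spec's three assertions reduce to one ordering predicate. As a sketch, assuming the same lowercase-and-search approach the test uses, the check can be pulled into a hypothetical pure function, charsetPrecedesTsr, and exercised on hand-written HTML strings (illustrative only, not captured output from the example app):

```typescript
// Hypothetical helper mirroring the spec's assertions: both tags must be
// present, and the charset <meta> must come before the $tsr script.
function charsetPrecedesTsr(html: string): boolean {
  const htmlLower = html.toLowerCase()
  const charsetIndex = htmlLower.search(/<meta\s+charset=(["'])utf-8\1/)
  const tsrScriptIndex = htmlLower.search(/<script\s+class=(["'])\$tsr\1/)
  return charsetIndex > -1 && tsrScriptIndex > -1 && charsetIndex < tsrScriptIndex
}

// Illustrative documents, not captured from the example app.
const charsetFirst =
  '<html><head><meta charset="utf-8"><script class="$tsr">/* state */</script></head></html>'
const scriptFirst =
  '<html><head><script class="$tsr">/* state */</script><meta charset="utf-8"></head></html>'
const noCharset = '<html><head><script class="$tsr">/* state */</script></head></html>'

console.log(charsetPrecedesTsr(charsetFirst)) // true
console.log(charsetPrecedesTsr(scriptFirst)) // false
console.log(charsetPrecedesTsr(noCharset)) // false
```

Treating a missing tag as a failure matches the spec, which asserts that both indices are greater than -1 before comparing them.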
44 changes: 32 additions & 12 deletions packages/router-core/src/ssr/transformStreamWithRouter.ts
@@ -24,6 +24,8 @@ const patternBodyStart = /(<body)/
 const patternBodyEnd = /(<\/body>)/
 const patternHtmlEnd = /(<\/html>)/
 const patternHeadStart = /(<head.*?>)/
+const patternHeadEnd = /(<\/head>)/
+const patternCharset = /(<meta\s+charset=(["']?)[^"'>\s]+\2.*?>)/i
 // regex pattern for matching closing tags
Comment on lines +27 to 29

🛠️ Refactor suggestion

Broaden charset detection to handle spacing and additional attributes.

The current pattern only matches when charset= is the first attribute after <meta and has no whitespace around =. Make it tolerant of both.

-const patternHeadEnd = /(<\/head>)/
-const patternCharset = /(<meta\s+charset=(["']?)[^"'>\s]+\2.*?>)/i
+const patternHeadEnd = /(<\/head>)/
+const patternCharset = /(<meta[^>]*\bcharset\s*=\s*(["'])?[^"'>\s]+\2[^>]*>)/i
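To see what the broadened pattern buys, both patterns can be run against the tag shapes mentioned in this review. This standalone sketch copies the two regexes verbatim from the diff and the suggestion; the sample head chunks and the capturedTag helper are hypothetical. In both versions capture group 1 wraps the whole <meta> tag, so the PR's slicing on charsetMatch[0].length keeps working.

```typescript
// Both patterns copied from the diff: the one added in this PR and the
// reviewer's broadened suggestion.
const oldPatternCharset = /(<meta\s+charset=(["']?)[^"'>\s]+\2.*?>)/i
const newPatternCharset = /(<meta[^>]*\bcharset\s*=\s*(["'])?[^"'>\s]+\2[^>]*>)/i

// Hypothetical head chunks covering the shapes mentioned in the review.
const plainChunk = '<head><meta charset="utf-8"><title>App</title></head>'
const spacedChunk = '<head><meta http-equiv="X" charset = "utf-8"></head>'
const upperChunk = "<head><META charset='utf-8' data-x></head>"

// Returns the captured <meta> tag (group 1), or null when nothing matched.
function capturedTag(chunk: string, pattern: RegExp): string | null {
  const match = chunk.match(pattern)
  return match ? (match[1] ?? null) : null
}

for (const chunk of [plainChunk, spacedChunk, upperChunk]) {
  console.log(capturedTag(chunk, oldPatternCharset), capturedTag(chunk, newPatternCharset))
}
```

Only the spaced, non-leading form differs: the original pattern returns null for it, while the suggested one captures the full tag.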
🤖 Prompt for AI Agents
In packages/router-core/src/ssr/transformStreamWithRouter.ts around lines 27 to
29, the current patternCharset only matches when "charset=" immediately follows
"<meta " with no spaces or other attributes; update the regex to tolerate
arbitrary spacing around "=", allow other attributes before or after "charset",
and be case-insensitive so it matches tags like <meta http-equiv="X" charset =
"utf-8"> or <META charset='utf-8' data-x>. Use a pattern that finds a <meta ...>
element containing a charset attribute (e.g., allow any attributes, optional
quotes, \s* around =, word boundary for charset) and still captures the entire
meta tag for replacement. Ensure the new regex remains global/anchored as needed
and preserves existing capture groups used elsewhere.

 const patternClosingTag = /(<\/[a-zA-Z][\w:.-]*?>)/g

@@ -98,6 +100,7 @@ export function transformStreamWithRouter(
   let pendingClosingTags = ''
   let bodyStarted = false as boolean
   let headStarted = false as boolean
+  let headScriptInjected = false as boolean
   let leftover = ''
   let leftoverHtml = ''

@@ -181,18 +184,35 @@ export function transformStreamWithRouter(
         }
       }
 
-      if (!headStarted) {
-        const headStartMatch = chunkString.match(patternHeadStart)
-        if (headStartMatch) {
-          headStarted = true
-          const index = headStartMatch.index!
-          const headTag = headStartMatch[0]
-          const remaining = chunkString.slice(index + headTag.length)
-          finalPassThrough.write(
-            chunkString.slice(0, index) + headTag + getBufferedRouterStream(),
-          )
-          // make sure to only write `remaining` until the next closing tag
-          chunkString = remaining
+      if (!headScriptInjected && !bodyStarted) {
+        if (!headStarted) {
+          const headStartMatch = chunkString.match(patternHeadStart)
+          if (headStartMatch) {
+            headStarted = true
+          }
+        }
+
+        if (headStarted) {
+          const charsetMatch = chunkString.match(patternCharset)
+
+          if (charsetMatch) {
+            headScriptInjected = true
+            const index = charsetMatch.index! + charsetMatch[0]!.length
+            finalPassThrough.write(
+              chunkString.slice(0, index) + getBufferedRouterStream(),
+            )
+            chunkString = chunkString.slice(index)
+          } else {
+            const headEndMatch = chunkString.match(patternHeadEnd)
+            if (headEndMatch) {
+              headScriptInjected = true
+              const index = headEndMatch.index!
+              finalPassThrough.write(
+                chunkString.slice(0, index) + getBufferedRouterStream(),
+              )
+              chunkString = chunkString.slice(index)
+            }
+          }
         }
       }

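The new injection logic places the buffered router stream right after the charset <meta> tag when one is found, and otherwise falls back to just before </head>. A simplified, single-chunk model of that decision is sketched below. injectRouterScript and InjectionResult are hypothetical names, the script string stands in for getBufferedRouterStream() output, and the sketch assumes <head> has already started, leaving out the headStarted, bodyStarted, and headScriptInjected bookkeeping that the real transform carries across chunks. It uses patternCharset and patternHeadEnd exactly as added in this diff.

```typescript
// Patterns as added in this diff.
const patternHeadEnd = /(<\/head>)/
const patternCharset = /(<meta\s+charset=(["']?)[^"'>\s]+\2.*?>)/i

// Hypothetical result shape for this sketch.
interface InjectionResult {
  written: string // text flushed downstream, ending with the injected script
  remaining: string // rest of the chunk, left for later processing
}

// Hypothetical single-chunk model of the PR's head injection. Assumes <head>
// has already started; omits the streaming state flags of the real code.
function injectRouterScript(chunk: string, script: string): InjectionResult | null {
  const charsetMatch = chunk.match(patternCharset)
  if (charsetMatch && charsetMatch.index !== undefined) {
    // Preferred spot: immediately after the charset <meta> tag.
    const index = charsetMatch.index + charsetMatch[0].length
    return { written: chunk.slice(0, index) + script, remaining: chunk.slice(index) }
  }
  const headEndMatch = chunk.match(patternHeadEnd)
  if (headEndMatch && headEndMatch.index !== undefined) {
    // Fallback: no charset tag in this chunk, so inject just before </head>.
    const index = headEndMatch.index
    return { written: chunk.slice(0, index) + script, remaining: chunk.slice(index) }
  }
  // Neither marker found in this chunk.
  return null
}

// Stand-in for getBufferedRouterStream() output; illustrative only.
const script = '<script class="$tsr">/* state */</script>'

console.log(injectRouterScript('<head><meta charset="utf-8"><title>App</title></head>', script))
console.log(injectRouterScript('<head><title>App</title></head>', script))
console.log(injectRouterScript('<head><title>App', script))
```

Returning null when neither marker is present corresponds to the PR leaving headScriptInjected false, so a later chunk gets another chance as long as the body has not started.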