Fix code viewer syntax highlighting corruption #21

ejc3 · 2025-11-28T05:49:19Z

Fixes HTML class attribute corruption in the code viewer.

The bug

The syntax highlighter was applying regex replacements sequentially, which caused later patterns to match content inside already-generated HTML class attributes. For example, "text-purple-400" would get matched by the number pattern, producing broken output like:

class="text-green-class="text-purple-400">400">"text"

The fix

Replaced sequential regex with a single-pass tokenizer that:

Matches all syntax patterns in one pass
Only applies highlighting to the matched tokens
Escapes HTML on matched content only

Also fixed escapeHtml in formatters.ts to not use document.createElement (fails in SSR).

Changes

web/app/components/CodeViewer.tsx: Single-pass tokenizer for syntax highlighting
web/app/utils/formatters.ts: SSR-safe HTML escaping

Copilot

Pull request overview

This PR fixes HTML class attribute corruption in the code viewer's syntax highlighting by replacing sequential regex replacements with a single-pass tokenizer. The issue occurred when later patterns matched content inside already-generated HTML attributes (e.g., matching "400" inside class="text-purple-400"). Additionally, the PR makes escapeHtml SSR-compatible by removing the document.createElement dependency.

Key changes:

Single-pass tokenization prevents regex from matching inside generated HTML attributes
SSR-safe HTML escaping using manual character replacement
Simplified formatLargeText with reduced complexity (though with some issues identified)

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File	Description
`web/app/utils/formatters.ts`	Replaced DOM-based HTML escaping with SSR-compatible character replacement; simplified `formatLargeText` formatting logic
`web/app/components/CodeViewer.tsx`	Implemented single-pass tokenizer for syntax highlighting using combined regex pattern and iterative matching to prevent HTML corruption

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-03T16:20:16Z

web/app/utils/formatters.ts

  return escaped
-    // Preserve existing double line breaks
-    .replace(/\n\n/g, '<br><br>')
-    // Convert single line breaks to single <br> tags
+    // Preserve existing double line breaks as paragraph breaks
+    .replace(/\n\n/g, '</p><p class="mt-3">')
+    // Convert single line breaks to <br> tags
    .replace(/\n/g, '<br>')
-    // Format bullet points with modern styling
-    .replace(/^(\s*)([-*•])\s+(.+)$/gm, '$1<span class="inline-flex items-center space-x-2"><span class="w-1.5 h-1.5 bg-blue-500 rounded-full flex-shrink-0"></span><span>$3</span></span>')
-    // Format numbered lists with modern styling
-    .replace(/^(\s*)(\d+)\.\s+(.+)$/gm, '$1<span class="inline-flex items-center space-x-2"><span class="w-5 h-5 bg-blue-100 text-blue-700 rounded-full flex items-center justify-center text-xs font-semibold">$2</span><span>$3</span></span>')
-    // Format headers with better typography
-    .replace(/^([A-Z][^<\n]*:)(<br>|$)/gm, '<div class="font-semibold text-gray-900 mt-4 mb-2 border-b border-gray-200 pb-1">$1</div>$2')
-    // Format code blocks with better styling
-    .replace(/\b([A-Z_]{3,})\b/g, '<code class="bg-gradient-to-r from-gray-100 to-blue-50 border border-gray-200 px-2 py-0.5 rounded-md text-xs text-blue-700 font-mono font-medium">$1</code>')
-    // Format file paths and technical terms
-    .replace(/\b([a-zA-Z0-9_-]+\.[a-zA-Z]{2,4})\b/g, '<span class="bg-slate-100 text-slate-700 px-1.5 py-0.5 rounded text-xs font-mono border border-slate-200">$1</span>')
-    // Format URLs with modern link styling
-    .replace(/(https?:\/\/[^\s<]+)/g, '<a href="$1" class="text-blue-600 hover:text-blue-800 underline underline-offset-2 decoration-blue-300 hover:decoration-blue-500 transition-colors font-medium" target="_blank" rel="noopener noreferrer">$1</a>')
-    // Format quoted text
-    .replace(/^(\s*)([""](.+?)[""])/gm, '$1<blockquote class="border-l-4 border-blue-200 bg-blue-50 pl-4 py-2 my-2 italic text-gray-700 rounded-r">$3</blockquote>')
-    // Add proper spacing around paragraphs
-    .replace(/(<br><br>)/g, '<div class="my-4"></div>')
-    // Clean up any excessive spacing
-    .replace(/(<br>\s*){3,}/g, '<br><br>')
-    // Format emphasis patterns
-    .replace(/\*\*([^*]+)\*\*/g, '<strong class="font-semibold text-gray-900">$1</strong>')
-    .replace(/\*([^*]+)\*/g, '<em class="italic text-gray-700">$1</em>')
-    // Format inline code
-    .replace(/`([^`]+)`/g, '<code class="bg-gray-100 text-gray-800 px-1.5 py-0.5 rounded text-sm font-mono border border-gray-200">$1</code>');
+    // Format inline code (backticks)
+    .replace(/`([^`]+)`/g, '<code class="bg-gray-100 text-gray-800 px-1.5 py-0.5 rounded text-sm font-mono">$1</code>')
+    // Format bold text
+    .replace(/\*\*([^*]+)\*\*/g, '<strong>$1</strong>')
+    // Format italic text
+    .replace(/\*([^*]+)\*/g, '<em>$1</em>')
+    // Wrap in paragraph
+    .replace(/^(.*)$/, '<p>$1</p>');


The final regex .replace(/^(.*)$/, '$1') will wrap the entire result in a single paragraph, creating malformed HTML. Since line 60 already adds  for double line breaks, this will produce invalid nesting like .......

Instead, the formatting should either:

Split by double newlines first, wrap each section in  tags, then apply the other replacements

Or wrap the entire result in a container div without the final paragraph wrap

Example: Input "Line1\n\nLine2" will become:
Line1Line2 (missing closing tag and invalid nesting)

Copilot · 2025-12-03T16:20:16Z