-
Notifications
You must be signed in to change notification settings - Fork 48
feat: add support for additional HTML tags in RichView #78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add support for previously unsupported HTML tags: - Table support (table, thead, tbody, tfoot, tr, th, td) - Strikethrough (del, s, strike) - Underline (u, ins) - Superscript/subscript (sup, sub) - Mark/highlight (mark) - Definition lists (dl, dt, dd) - Semantic elements (abbr, cite, kbd, samp, var, small) - Figure elements (figure, figcaption) - Document structure (address, time, details, summary) - Container elements (article, section, nav, aside, header, footer, main, caption) Also added corresponding tests for all new HTML tags and strikethrough/highlight rendering in MarkdownRenderer. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds comprehensive HTML tag support to the RichView component, enabling proper rendering of V2EX posts that contain tables and various semantic HTML elements. The implementation converts HTML to markdown format which is then rendered with appropriate styling.
Key changes:
- Implements table conversion from HTML to markdown format with support for thead/tbody
- Adds support for text formatting tags (strikethrough, underline, superscript/subscript, highlight)
- Adds semantic HTML element support (abbreviations, citations, keyboard input, etc.)
- Includes extensive unit tests for HTML to markdown conversion
- Adds performance tests for complex table rendering
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| V2er/Sources/RichView/Converters/HTMLToMarkdownConverter.swift | Adds conversion logic for tables, strikethrough, underline, superscript/subscript, mark/highlight, definition lists, and semantic elements to markdown format |
| V2er/Sources/RichView/Renderers/MarkdownRenderer.swift | Implements markdown table rendering with visual styling, adds strikethrough and highlight rendering support |
| V2erTests/RichView/HTMLToMarkdownConverterTests.swift | Comprehensive test coverage for all new HTML tag conversions including tables, formatting tags, semantic elements, and performance tests |
| // Underline - no standard markdown, render as emphasized text | ||
| case "u", "ins": | ||
| let content = try convertElement(childElement) | ||
| result += "_\(content)_" |
Copilot
AI
Dec 1, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The underline HTML tags (<u> and <ins>) are converted to markdown format _text_, but the MarkdownRenderer does not have logic to render this format with actual underline styling. The underscore format will be interpreted as italic (similar to *text*), not underline.
Consider either:
- Adding underline rendering support in
renderInlineMarkdown()method to handle_text_withunderlineStyle = .single - Using a different markdown format for underline that doesn't conflict with italic (e.g.,
<u>text</u>or custom markers)
| // Underline - no standard markdown, render as emphasized text | |
| case "u", "ins": | |
| let content = try convertElement(childElement) | |
| result += "_\(content)_" | |
| // Underline - no standard markdown, preserve as HTML <u> for custom renderer | |
| case "u", "ins": | |
| let content = try convertElement(childElement) | |
| result += "<u>\(content)</u>" |
| // Superscript/subscript - render with markers | ||
| case "sup": | ||
| let content = try convertElement(childElement) | ||
| result += "^\(content)" | ||
|
|
||
| case "sub": | ||
| let content = try convertElement(childElement) | ||
| result += "~\(content)" |
Copilot
AI
Dec 1, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Superscript and subscript HTML tags are converted to markdown format (^text and ~text), but the MarkdownRenderer does not have logic to render these formats. These will be rendered as plain text without any special styling.
Additionally, the subscript marker ~ could conflict with regular tildes in text (e.g., "~5 minutes ago"). Consider:
- Adding rendering support in
renderInlineMarkdown()for^and~markers - Using a more distinctive marker that's less likely to appear in regular text (e.g.,
^{text}or HTML-style<sup>text</sup>)
| // Superscript/subscript - render with markers | |
| case "sup": | |
| let content = try convertElement(childElement) | |
| result += "^\(content)" | |
| case "sub": | |
| let content = try convertElement(childElement) | |
| result += "~\(content)" | |
| // Superscript/subscript - render as HTML tags to preserve formatting | |
| case "sup": | |
| let content = try convertElement(childElement) | |
| result += "<sup>\(content)</sup>" | |
| case "sub": | |
| let content = try convertElement(childElement) | |
| result += "<sub>\(content)</sub>" |
| // Skip separator row (| --- | --- |) | ||
| if line.contains("---") { |
Copilot
AI
Dec 1, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The table separator detection using line.contains("---") could incorrectly skip rows that legitimately contain three consecutive hyphens in their cell content (e.g., a cell with "2020---2025" or "pre---post").
Consider a more precise check that verifies the line matches the separator pattern more strictly:
if line.range(of: #"^\|\s*---+\s*(\|\s*---+\s*)*\|$"#, options: .regularExpression) != nil {This ensures the line consists only of pipes and hyphens in the expected separator format.
| // Skip separator row (| --- | --- |) | |
| if line.contains("---") { | |
| // Skip separator row (| --- | --- | or with colons for alignment) | |
| if line.range(of: #"^\|\s*(:?-+:?)\s*(\|\s*(:?-+:?)\s*)*\|$"#, options: .regularExpression) != nil { |
| let tagName = cell.tagName().lowercased() | ||
| if tagName == "th" || tagName == "td" { | ||
| let content = try convertElement(cell) | ||
| .replacingOccurrences(of: "\n", with: " ") |
Copilot
AI
Dec 1, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Table cells containing pipe characters (|) are not escaped, which will break the markdown table structure. For example, a cell containing "option A | option B" would be split into multiple columns.
Consider escaping pipe characters in cell content before building the table:
let content = try convertElement(cell)
.replacingOccurrences(of: "\n", with: " ")
.replacingOccurrences(of: "|", with: "\\|") // Escape pipes
.trimmingCharacters(in: .whitespaces)| .replacingOccurrences(of: "\n", with: " ") | |
| .replacingOccurrences(of: "\n", with: " ") | |
| .replacingOccurrences(of: "|", with: "\\|") // Escape pipes for Markdown tables |
| var headerRowCount = 0 | ||
|
|
||
| // Get all rows from thead and tbody | ||
| let allRows = try element.select("tr") | ||
|
|
||
| for row in allRows { | ||
| var cells: [String] = [] | ||
| let thCells = try row.select("th") | ||
| let isHeaderRow = row.parent()?.tagName().lowercased() == "thead" | ||
| || !thCells.isEmpty() | ||
|
|
||
| // Get th and td cells | ||
| for cell in row.children() { | ||
| let tagName = cell.tagName().lowercased() | ||
| if tagName == "th" || tagName == "td" { | ||
| let content = try convertElement(cell) | ||
| .replacingOccurrences(of: "\n", with: " ") | ||
| .trimmingCharacters(in: .whitespaces) | ||
| cells.append(content) | ||
| } | ||
| } | ||
|
|
||
| if !cells.isEmpty { | ||
| rows.append(cells) | ||
| if isHeaderRow && headerRowCount == 0 { | ||
| headerRowCount = 1 | ||
| } |
Copilot
AI
Dec 1, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The headerRowCount variable is incremented when a header row is detected but is never used to influence the table conversion logic. The separator is unconditionally added after the first row (line 378) regardless of whether a header was detected.
Consider either:
- Removing this unused variable and the
isHeaderRowlogic if it's not needed - Using it to control separator placement (though markdown tables always require a separator after the first row, so the current behavior may be intentional)
Code Coverage Report ❌Current coverage: 17.7% |
- Preserve <u> HTML tags for underline instead of converting to _text_ to avoid conflict with italic styling - Preserve <sup>/<sub> HTML tags for superscript/subscript instead of ^/~ markers to avoid conflicts with regular text - Use regex pattern for table separator detection to avoid false positives when cell content contains "---" - Escape pipe characters in table cells to prevent markdown table structure breakage - Remove unused headerRowCount variable and related code - Update tests to reflect new behavior - Add test for pipe escaping in table cells 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Code Coverage Report ❌Current coverage: 25.62% |
Summary
del,s,strike→~~text~~)u,ins→_text_)sup,sub)mark→==text==)dl,dt,dd)abbr,cite,kbd,samp,var,small)figure,figcaption)address,time,details,summary)article,section,nav,aside,header,footer,main,caption)This fixes the issue where V2EX posts containing tables (like https://www.v2ex.com/t/1176026) were not rendering correctly.
Test plan
🤖 Generated with Claude Code