Display tables from other clients as formatted text#6201
Open
olivierlambert wants to merge 1 commit intoelement-hq:developfrom
Open
Display tables from other clients as formatted text#6201olivierlambert wants to merge 1 commit intoelement-hq:developfrom
olivierlambert wants to merge 1 commit intoelement-hq:developfrom
Conversation
The wysiwyg library (io.element.android:wysiwyg) does not support <table> tags — its Safelist strips them during Jsoup.clean(), leaving only flattened text with no structure. This pre-processes the raw HTML *before* it reaches the wysiwyg parser, replacing <table> elements with <pre><code> blocks containing a pipe-based text representation that the wysiwyg library already renders as styled code blocks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor
|
Thank you for your contribution! Here are a few things to check in the PR to ensure it's reviewed as quickly as possible:
|
Author
|
FYI, my goal is to provide a first idea/possibility to solve the fact tables aren't displayed correctly on Element-X (on my Android phone). I'm not experienced enough myself to provide a solution, it was done via Claude Code. I would understand if you are not willing to merge this, I'm simply hopeful it could bring some ideas or make things easier to solve that functional limitation. I tried my best that the result fits with the existing tests and code base. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Markdown tables sent from other clients (e.g. Element Web) arrive as HTML
<table>elements in theformatted_body. Currently, the wysiwyg library (io.element.android:wysiwygv2.41.1) silently strips these — leaving only flattened text with no structure whatsoever.This PR pre-processes tables into
<pre><code>blocks containing a pipe-based text representation:Why this approach
The core constraint: the wysiwyg Safelist
HtmlToDomParser.document()callsJsoup.clean()with a hardcodedSafelistthat only allows:a, b, strong, i, em, u, del, code, ul, ol, li, pre, blockquote, p, br. All<table>,<thead>,<tbody>,<tr>,<td>,<th>tags are stripped before the DOM is even constructed. This means any table-aware processing must happen beforeHtmlToDomParser.document()runs — not after.Why
<pre><code>as the output formatWe need a replacement that:
<pre>and<code>are allowed)<pre><code>satisfies all three. The wysiwyg library already renders these as styled code blocks with a monospace font, which is exactly what pipe-formatted tables need for proper column alignment.Why pre-process the HTML string rather than the DOM
The initial implementation called
dom.convertTablesToText()afterHtmlToDomParser.document(). This didn't work becauseJsoup.clean()(insideHtmlToDomParser) had already stripped all table tags. The fix is to:Jsoup.parse()(no safelist)<table>elements to<pre><code>in that DOMdoc.body().html()HtmlToDomParser.document()A fast-path check (
"<table" !in html) avoids the extra parse for the vast majority of messages that contain no tables.Separator style:
-+-vs|The separator line uses
-+-as the column joiner (e.g.------+------), which visually aligns the+with the|in data rows. This is intentional: in a monospace font, the+sits exactly under each|, giving a clean grid appearance.Header detection heuristic
<thead>exists → its rows are headers (separator placed after them)<tr>contains only<th>elements → treated as a single header rowThis covers the two common patterns: explicit
<thead>/<tbody>structure (Element Web) and simple<th>-first-row tables.Edge cases handled
.text()on cells naturally flattens nested contentjoinToString(" | ")on a single element produces no separator)element.text().trim()<tbody>→ jsoup always wraps bare<tr>elements in a<tbody>during parsing; the extraction logic handles this correctly through thetbody != nullbranchFiles changed
HtmlTableToText.kt— standaloneDocument.convertTablesToText()extensionToHtmlDocument.kt— pre-processes raw HTML beforeHtmlToDomParser.document()HtmlTableToTextTest.kt— 10 unit tests (simple table, thead, th detection, unequal cols, empty, surrounding content, multiple tables, single column, whitespace, integration)ToPlainTextTest.kt— 1 additional test for the plain-text pipelineLimitations and future considerations
toPlainText()path collapses formatting —PlainTextNodeVisitorusesTextNode.text()which normalizes whitespace, so the pipe table loses its newlines in plain-text output. The primary rendering path (HTML → wysiwyg) preserves formatting correctly.Test plan
HtmlTableToTextTest— 10 tests all passingToPlainTextTest— all tests passing including the new table testToHtmlDocumentTest— all existing tests still passing (no regression)🤖 Generated with Claude Code