Display tables from other clients as formatted text by olivierlambert · Pull Request #6201 · element-hq/element-x-android

olivierlambert · 2026-02-13T12:31:30Z

Summary

Markdown tables sent from other clients (e.g. Element Web) arrive as HTML <table> elements in the formatted_body. Currently, the wysiwyg library (io.element.android:wysiwyg v2.41.1) silently strips the table tags, leaving only the flattened cell text with no structure.

This PR pre-processes tables into <pre><code> blocks containing a pipe-based text representation:

Header A | Header B
---------+---------
Cell 1   | Cell 2
Cell 3   | Cell 4

Why this approach

The core constraint: the wysiwyg Safelist

HtmlToDomParser.document() calls Jsoup.clean() with a hardcoded Safelist that only allows: a, b, strong, i, em, u, del, code, ul, ol, li, pre, blockquote, p, br. All <table>, <thead>, <tbody>, <tr>, <td>, <th> tags are stripped before the DOM is even constructed. This means any table-aware processing must happen before HtmlToDomParser.document() runs — not after.

Why `<pre><code>` as the output format

We need a replacement that:

Survives the wysiwyg Safelist (both <pre> and <code> are allowed)
Is already rendered well by the wysiwyg library (styled monospace code block with background)
Preserves the visual structure of tabular data (alignment requires a monospace font)

<pre><code> satisfies all three. The wysiwyg library already renders these as styled code blocks with a monospace font, which is exactly what pipe-formatted tables need for proper column alignment.

Why pre-process the HTML string rather than the DOM

The initial implementation called dom.convertTablesToText() after HtmlToDomParser.document(). This didn't work because Jsoup.clean() (inside HtmlToDomParser) had already stripped all table tags. The fix is to:

Parse the raw HTML with Jsoup.parse() (no safelist)
Convert <table> elements to <pre><code> in that DOM
Serialize back to HTML via doc.body().html()
Pass the processed HTML to HtmlToDomParser.document()

A fast-path check ("<table" !in html) avoids the extra parse for the vast majority of messages that contain no tables.

Separator style: `-+-` vs `|`

The separator line uses -+- as the column joiner (e.g. ------+------), which visually aligns the + with the | in data rows. This is intentional: in a monospace font, the + sits exactly under each |, giving a clean grid appearance.
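The alignment argument can be checked with a small stdlib-only sketch. The function name `renderPipeTable` and its `headerRowCount` parameter are hypothetical and not taken from the PR. The sketch assumes rows have already been padded to equal length, and that trailing spaces on each line are trimmed, which the PR does not state.

```kotlin
// Hypothetical helper, not the PR's code: renders rows of cell text as a
// pipe-separated table. headerRowCount = number of leading header rows
// (0 means no separator line is emitted).
fun renderPipeTable(rows: List<List<String>>, headerRowCount: Int = 0): String {
    if (rows.isEmpty()) return ""
    val columnCount = rows.first().size
    require(rows.all { it.size == columnCount }) { "rows must already be padded to equal length" }
    // Each column is as wide as its longest cell.
    val widths = List(columnCount) { col -> rows.maxOf { it[col].length } }
    val lines = mutableListOf<String>()
    rows.forEachIndexed { index, row ->
        lines += row.mapIndexed { col, cell -> cell.padEnd(widths[col]) }
            .joinToString(" | ")
            .trimEnd() // assumption: trailing padding is dropped
        if (index == headerRowCount - 1) {
            // "-+-" is exactly as wide as " | ", so each '+' lands under a '|'.
            lines += widths.joinToString("-+-") { "-".repeat(it) }
        }
    }
    return lines.joinToString("\n")
}
```

Called with the rows from the Summary example and `headerRowCount = 1`, this produces the same four lines shown there, with a separator of nine dashes on each side of the `+`.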

Header detection heuristic

If <thead> exists → its rows are headers (separator placed after them)
Otherwise, if the first <tr> contains only <th> elements → treated as a single header row
Otherwise → no header, no separator line

This covers the two common patterns: explicit <thead>/<tbody> structure (Element Web) and simple <th>-first-row tables.
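The three rules can be sketched as a single `when` expression. The `Cell` and `Table` types below are a hypothetical minimal model for illustration only; the PR itself works on jsoup elements. Treating an empty <thead> the same as a missing one is also an assumption.

```kotlin
// Hypothetical minimal model of a parsed table (the PR uses jsoup elements).
data class Cell(val text: String, val isHeader: Boolean) // isHeader = came from <th>
data class Table(val theadRows: List<List<Cell>>, val bodyRows: List<List<Cell>>)

// Returns how many leading rows should be treated as headers.
fun headerRowCount(table: Table): Int = when {
    // Rule 1: rows inside <thead> are headers.
    table.theadRows.isNotEmpty() -> table.theadRows.size
    // Rule 2: a first row made up only of <th> cells is a single header row.
    table.bodyRows.firstOrNull()?.let { it.isNotEmpty() && it.all(Cell::isHeader) } == true -> 1
    // Rule 3: otherwise there is no header and no separator line.
    else -> 0
}
```

A first row that mixes <th> and <td> cells falls through to the last branch, so it gets no separator line.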

Edge cases handled

Empty tables → removed from the DOM (no crash, no empty code block)
Unequal column counts → shorter rows padded with empty cells to the max column count
Nested tables → .text() on cells naturally flattens nested content
Single-column tables → rendered as plain lines (no pipes needed since joinToString(" | ") on a single element produces no separator)
Cell whitespace → trimmed via element.text().trim()
Multiple tables → each converted independently (list is snapshotted before iteration to avoid concurrent modification)
jsoup auto-wraps <tbody> → jsoup always wraps bare <tr> elements in a <tbody> during parsing; the extraction logic handles this correctly through the tbody != null branch
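A stdlib-only sketch of three of these cases: whitespace trimming, padding of short rows, and empty tables. The function name `normalizeRows` and the use of `null` to signal "remove this table" are assumptions made for illustration, not the PR's actual API.

```kotlin
// Hypothetical normalization step: trims every cell and pads short rows with
// empty cells so all rows share the maximum column count.
// Returns null when there is nothing to render (no rows, or only empty rows).
fun normalizeRows(rawRows: List<List<String>>): List<List<String>>? {
    val columnCount = rawRows.maxOfOrNull { it.size } ?: 0
    if (columnCount == 0) return null // caller would drop the table from the DOM
    return rawRows.map { row ->
        val trimmed = row.map { it.trim() }
        trimmed + List(columnCount - trimmed.size) { "" }
    }
}
```

The single-column case needs no special handling: `listOf("only").joinToString(" | ")` returns `"only"`, because the separator is only inserted between elements.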

Files changed

New: HtmlTableToText.kt — standalone Document.convertTablesToText() extension
Modified: ToHtmlDocument.kt — pre-processes raw HTML before HtmlToDomParser.document()
New: HtmlTableToTextTest.kt — 10 unit tests (simple table, thead, th detection, unequal cols, empty, surrounding content, multiple tables, single column, whitespace, integration)
Modified: ToPlainTextTest.kt — 1 additional test for the plain-text pipeline

Limitations and future considerations

No colspan/rowspan support — cells spanning multiple columns or rows are treated as single cells. This could be improved but adds significant complexity for a rare case.
The toPlainText() path collapses formatting — PlainTextNodeVisitor uses TextNode.text() which normalizes whitespace, so the pipe table loses its newlines in plain-text output. The primary rendering path (HTML → wysiwyg) preserves formatting correctly.
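The effect of whitespace normalization on a pipe table can be illustrated without jsoup. The function below is only an approximation of what a normalizing text accessor does (collapsing each run of whitespace to one space); it is not the code used by PlainTextNodeVisitor.

```kotlin
// Approximation of whitespace normalization: every run of whitespace,
// newlines included, collapses to a single space.
fun normalizeWhitespace(text: String): String = text.replace(Regex("\\s+"), " ")
```

For the input `"A | B\n--+--\n1 | 2"` this returns `"A | B --+-- 1 | 2"`: the rows survive, but they end up on a single line.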
Ideally the wysiwyg library would support tables natively — this is a pragmatic workaround. If the wysiwyg library adds table support in the future, this pre-processing can be removed.

Test plan

HtmlTableToTextTest — 10 tests all passing
ToPlainTextTest — all tests passing including the new table test
ToHtmlDocumentTest — all existing tests still passing (no regression)
Manual testing: send a table from Element Web, verify it renders as a code block with aligned columns on Android

🤖 Generated with Claude Code

The wysiwyg library (io.element.android:wysiwyg) does not support <table> tags: its Safelist strips them during Jsoup.clean(), leaving only flattened text with no structure. This pre-processes the raw HTML *before* it reaches the wysiwyg parser, replacing <table> elements with <pre><code> blocks containing a pipe-based text representation that the wysiwyg library already renders as styled code blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-02-13T12:31:41Z

Thank you for your contribution! Here are a few things to check in the PR to ensure it's reviewed as quickly as possible:

Your branch should be based on origin/develop, at least when it was created.
The title of the PR will be used for release notes, so it needs to describe the change visible to the user.
The tests pass locally when running ./gradlew test.
The code quality check suite passes locally when running ./gradlew runQualityChecks.
If you modified anything related to the UI, including previews, you'll have to run the Record screenshots GH action in your forked repo: that will generate compatible new screenshots. However, given GitHub Actions limitations, it will prevent the CI from running temporarily, until you upload a new commit after that one. To do so, just pull the latest changes and push an empty commit.

CLAassistant · 2026-02-13T12:31:41Z

All committers have signed the CLA.

olivierlambert · 2026-02-13T12:49:39Z

FYI, my goal is to offer a first idea for fixing the fact that tables aren't displayed correctly in Element X (on my Android phone). I'm not experienced enough to write a solution myself, so it was done via Claude Code.

I would understand if you are not willing to merge this. I'm simply hopeful it could bring some ideas or make it easier to solve that functional limitation. I did my best to make the result fit the existing tests and code base.

olivierlambert requested a review from a team as a code owner February 13, 2026 12:31

olivierlambert requested review from ganfra and removed request for a team February 13, 2026 12:31

github-actions bot added the Z-Community-PR Issue is solved by a community member's PR label Feb 13, 2026

olivierlambert changed the title ~~Render HTML tables as pipe-formatted text in code blocks~~ Display tables from other clients as formatted text Feb 13, 2026

olivierlambert wants to merge 1 commit into element-hq:develop from olivierlambert:feature/render-markdown-tables-as-text
