Softer HTML table processing? #11464
There are a lot of corner cases in that code base, and I'm not particularly keen to change it anytime soon. With that said, if you're willing to contribute a PR that doesn't break backwards compatibility and includes tests and documentation, we'd be happy to take it.
Alright, I'll see what I can do. I'm not sure about the tests, though: which cases should be covered to check backwards compatibility?
Description
I recently dug a bit deeper into Quarto's HTML table processing to understand what exactly is supported and what isn't. It seems that part of it is implemented in `parsehtml.lua`. The function `handle_raw_html_as_table` contains a comment which isn't wrong but also isn't completely correct.

It is true that Pandoc's table model does not distinguish between cells which are `td` and cells which are `th`. However, Pandoc doesn't completely ignore the difference in the input either.

Within a `tbody`,

- if one or more of the uppermost rows contain only `th` elements, they are assigned to the `TableBody` property `head` ("intermediate head") instead of remaining in the property `body` ("table body rows") with the other rows, and
- if one or more of the leftmost columns contain only `th` elements, this is translated into the `TableBody` property `row_head_columns` ("number of columns taken up by the row head of each row").

Basically, Pandoc imposes a stricter semantic model, where `th` elements cannot be used arbitrarily within a `tbody`, but can only be used to indicate "intermediate head" rows as well as "row head" or stub columns. If a `Table` element is written to HTML, cells in `TableHead`, in `TableBody.head`, and within the `TableBody.row_head_columns` leftmost columns are written as `th` and all others as `td`. That means that if an HTML table conforms to Pandoc's stricter semantic model, the distinction between `td` and `th` is actually preserved in HTML output.

To my knowledge, this behavior is not documented anywhere; I was pointed to `row_head_columns`, and the above is the result of my experiments. Maybe @jgm can confirm?

Why is this important for Quarto's HTML table processing? I believe the current processing is too radical.

It is very useful to be able to include tables in HTML format in Markdown documents and have them parsed by Pandoc, such that they are output to all formats. However, the replacement of all `th` elements by `td data-quarto-table-cell-role="th"` prevents Pandoc from detecting the described semantic structure, which at least potentially degrades the structure of tables in output formats other than HTML.

It may make sense to want to preserve the distinction between `td` and `th` more generally, but wouldn't it be enough to replace `th` elements by `th data-quarto-table-cell-role="th"`?

Moreover, the linked-to `gt` issue on `th` elements for accessibility probably could have been solved without this special processing, because Pandoc does preserve `th` elements in stub columns.

Finally, the "HTML postprocessor" mentioned in the comment doesn't seem to work as intended; at least I get `td data-quarto-table-cell-role="th"` in Quarto's HTML output.

My feature request is to modify Quarto's HTML table processing such that it does preserve Pandoc's semantic table structure.
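
To make the stricter model described above more concrete, here is a minimal sketch of an HTML table that should conform to it. The table contents are made up for illustration, and the annotations reflect only the behavior reported from my experiments above, not documented Pandoc behavior:

```html
<table>
  <thead>
    <!-- Ordinary header row: ends up in TableHead -->
    <tr><th>City</th><th>2022</th><th>2023</th></tr>
  </thead>
  <tbody>
    <!-- Uppermost row containing only th: expected to land in
         TableBody.head ("intermediate head"), not TableBody.body -->
    <tr><th colspan="3">Europe (illustrative data)</th></tr>
    <!-- In the remaining rows the leftmost column contains only th:
         expected to yield row_head_columns = 1 -->
    <tr><th>Berlin</th><td>1</td><td>2</td></tr>
    <tr><th>Paris</th><td>3</td><td>4</td></tr>
  </tbody>
</table>
```

If the observations above hold, Pandoc on its own would write these `th` cells back as `th` in HTML output. Once every `th` is rewritten to `td data-quarto-table-cell-role="th"` before parsing, the `tbody` no longer contains any `th` elements, so presumably neither the intermediate head nor the stub column can be detected.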