Convert adoc tables to html tables#138
Conversation
Signed-off-by: James Ball <jameball@qti.qualcomm.com>
There was a problem hiding this comment.
Pull request overview
This PR converts AsciiDoc tables embedded in normative rule tags from a custom text format to proper HTML tables. The conversion handles tables with optional headers and truncates large tables to a maximum of 12 rows. The implementation modifies the tags backend to prefix table cells with pipes and introduces new conversion functions to parse and transform these tables into HTML.
- Changes the "tags" Asciidoctor backend to prefix table cells with
|instead of separating them - Adds
convert_tags_tables_to_html()function to parse the pipe-prefixed format and generate HTML tables - Renames
html_convert_newlines()toconvert_newlines_to_html()for consistency - Fixes test data typo (Name7 paired with Color7 instead of Color8)
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tools/create_normative_rules.rb | Adds table-to-HTML conversion functions, renames newline conversion function, introduces MAX_TABLE_ROWS constant |
| converters/tags.rb | Updates table cell formatting to prefix with pipes instead of separating |
| tests/norm-rule/test.adoc | Fixes test data inconsistency and updates comment for clarity |
| tests/norm-rule/expected/. | Updates expected outputs to reflect new HTML table format |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # This pattern matches strings that: | ||
| # - Start with a non-pipe, non-whitespace character | ||
| # - Then contain zero or more non-pipe characters (can include internal spaces) | ||
| # - End with a non-pipe, non-whitespace character | ||
| # | ||
| # All leading/trailing whitespace is removed. | ||
| # | ||
| # Examples: | ||
| # "| H1 | H2".scan(/[^|\s][^|]*[^|\s]/) | ||
| # => ["H1", "H2"] | ||
| # | ||
| # "| ABC | DEF GHI |".scan(/[^|\s][^|]*[^|\s]/) | ||
| # => ["ABC", "DEF GHI"] # Note: internal space preserved | ||
| # | ||
| # "| Name | Value |".scan(/[^|\s][^|]*[^|\s]/) | ||
| # => ["Name", "Value"] # Leading/trailing spaces removed | ||
| text.scan(/[^|\s][^|]*[^|\s]/) |
There was a problem hiding this comment.
The regex pattern /[^|\s][^|]*[^|\s]/ requires at least two non-whitespace, non-pipe characters, which means it will fail to match single-character cell content. For example, a cell containing just "A" or "1" would not be extracted.
To fix this, the pattern should be changed to /[^|\s]([^|]*[^|\s])?/ to match either:
- A single non-pipe, non-whitespace character, OR
- A non-pipe, non-whitespace character followed by zero or more non-pipe characters, ending with a non-pipe, non-whitespace character
Alternatively, use a simpler approach: /\|([^|]*)/ and then trim whitespace from each captured group.
| # This pattern matches strings that: | |
| # - Start with a non-pipe, non-whitespace character | |
| # - Then contain zero or more non-pipe characters (can include internal spaces) | |
| # - End with a non-pipe, non-whitespace character | |
| # | |
| # All leading/trailing whitespace is removed. | |
| # | |
| # Examples: | |
| # "| H1 | H2".scan(/[^|\s][^|]*[^|\s]/) | |
| # => ["H1", "H2"] | |
| # | |
| # "| ABC | DEF GHI |".scan(/[^|\s][^|]*[^|\s]/) | |
| # => ["ABC", "DEF GHI"] # Note: internal space preserved | |
| # | |
| # "| Name | Value |".scan(/[^|\s][^|]*[^|\s]/) | |
| # => ["Name", "Value"] # Leading/trailing spaces removed | |
| text.scan(/[^|\s][^|]*[^|\s]/) | |
| # This pattern matches everything between pipes, including single-character and empty cells. | |
| # All leading/trailing whitespace is removed from each cell. | |
| # | |
| # Examples: | |
| # "| H1 | H2".scan(/\|([^|]*)/) | |
| # => [["H1 "], [" H2"]] | |
| # .map(&:strip) => ["H1", "H2"] | |
| # | |
| # "| A | 1 |".scan(/\|([^|]*)/) | |
| # => [["A "], [" 1"]] | |
| # .map(&:strip) => ["A", "1"] | |
| # | |
| # "| Name | Value |".scan(/\|([^|]*)/) | |
| # => [[" Name "], [" Value "]] | |
| # .map(&:strip) => ["Name", "Value"] | |
| text.scan(/\|([^|]*)/).map { |m| m[0].strip } |
| ret = text # Default to input | ||
|
|
||
| text.gsub(/(.*?)===\n(.+)\n===/m) do | ||
| # Found a "tags" formatted table | ||
| heading = $1.chomp # Remove trailing newline | ||
| rows = $2.split("\n") # Split into array of rows | ||
|
|
||
| ret = "<table>".dup # Start html table | ||
|
|
||
| # Add heading if present | ||
| heading_cells = extract_tags_table_cells(heading) | ||
| unless heading_cells.empty? | ||
| ret << "<thead>" | ||
| ret << "<tr>" | ||
| ret << heading_cells.map { |cell| "<th>#{cell}</th>" }.join("") | ||
| ret << "</tr>" | ||
| ret << "</thead>" | ||
| end | ||
|
|
||
| # Add each row | ||
| ret << "<tbody>" | ||
| rows.each_with_index do |row,index| | ||
| if index < MAX_TABLE_ROWS | ||
| ret << "<tr>" | ||
| row_cells = extract_tags_table_cells(row) | ||
| ret << row_cells.map { |cell| "<td>#{cell}</td>" }.join("") | ||
| ret << "</tr>" | ||
| elsif index == MAX_TABLE_ROWS | ||
| ret << "<tr>" | ||
| row_cells = extract_tags_table_cells(row) | ||
| ret << row_cells.map { |cell| "<td>...</td>" }.join("") | ||
| ret << "</tr>" | ||
| end | ||
| end | ||
|
|
||
| ret << "</tbody>" | ||
| ret << "</table>" # End html table |
There was a problem hiding this comment.
The function does not return the result of gsub. The variable ret is assigned inside the gsub block (line 1175), but this assignment is scoped to the block. The function returns the original ret (initialized to text on line 1168), not the transformed HTML table.
To fix this, assign the result of gsub to ret:
ret = text.gsub(/(.*?)===\n(.+)\n===/m) do
html = "<table>".dup
# ... build up html instead of ret ...
html
end
return ret| ret = text # Default to input | |
| text.gsub(/(.*?)===\n(.+)\n===/m) do | |
| # Found a "tags" formatted table | |
| heading = $1.chomp # Remove trailing newline | |
| rows = $2.split("\n") # Split into array of rows | |
| ret = "<table>".dup # Start html table | |
| # Add heading if present | |
| heading_cells = extract_tags_table_cells(heading) | |
| unless heading_cells.empty? | |
| ret << "<thead>" | |
| ret << "<tr>" | |
| ret << heading_cells.map { |cell| "<th>#{cell}</th>" }.join("") | |
| ret << "</tr>" | |
| ret << "</thead>" | |
| end | |
| # Add each row | |
| ret << "<tbody>" | |
| rows.each_with_index do |row,index| | |
| if index < MAX_TABLE_ROWS | |
| ret << "<tr>" | |
| row_cells = extract_tags_table_cells(row) | |
| ret << row_cells.map { |cell| "<td>#{cell}</td>" }.join("") | |
| ret << "</tr>" | |
| elsif index == MAX_TABLE_ROWS | |
| ret << "<tr>" | |
| row_cells = extract_tags_table_cells(row) | |
| ret << row_cells.map { |cell| "<td>...</td>" }.join("") | |
| ret << "</tr>" | |
| end | |
| end | |
| ret << "</tbody>" | |
| ret << "</table>" # End html table | |
| ret = text.gsub(/(.*?)===\n(.+)\n===/m) do | |
| # Found a "tags" formatted table | |
| heading = $1.chomp # Remove trailing newline | |
| rows = $2.split("\n") # Split into array of rows | |
| html = "<table>".dup # Start html table | |
| # Add heading if present | |
| heading_cells = extract_tags_table_cells(heading) | |
| unless heading_cells.empty? | |
| html << "<thead>" | |
| html << "<tr>" | |
| html << heading_cells.map { |cell| "<th>#{cell}</th>" }.join("") | |
| html << "</tr>" | |
| html << "</thead>" | |
| end | |
| # Add each row | |
| html << "<tbody>" | |
| rows.each_with_index do |row,index| | |
| if index < MAX_TABLE_ROWS | |
| html << "<tr>" | |
| row_cells = extract_tags_table_cells(row) | |
| html << row_cells.map { |cell| "<td>#{cell}</td>" }.join("") | |
| html << "</tr>" | |
| elsif index == MAX_TABLE_ROWS | |
| html << "<tr>" | |
| row_cells = extract_tags_table_cells(row) | |
| html << row_cells.map { |cell| "<td>...</td>" }.join("") | |
| html << "</tr>" | |
| end | |
| end | |
| html << "</tbody>" | |
| html << "</table>" # End html table | |
| html |
| elsif index == MAX_TABLE_ROWS | ||
| ret << "<tr>" | ||
| row_cells = extract_tags_table_cells(row) | ||
| ret << row_cells.map { |cell| "<td>...</td>" }.join("") |
There was a problem hiding this comment.
On line 1197, row_cells is extracted but then not used. Instead, line 1198 creates ... for every cell, which means the number of ... placeholders will match the column count. However, this defeats the purpose of extracting row_cells on line 1197.
The code should either:
- Remove line 1197 entirely since
row_cellsis not used, OR - Use
row_cells.sizeto determine how many...cells to create
Currently, the unused variable may cause confusion about the code's intent.
| ret << row_cells.map { |cell| "<td>...</td>" }.join("") | |
| ret << row_cells.size.times.map { "<td>...</td>" }.join("") |
No description provided.