diff --git a/tests/norm-rule/expected/test-norm-rules.adoc b/tests/norm-rule/expected/test-norm-rules.adoc index d2489cd..e4e4a72 100644 --- a/tests/norm-rule/expected/test-norm-rules.adoc +++ b/tests/norm-rule/expected/test-norm-rules.adoc @@ -13,14 +13,18 @@ | inside inline a| link:test.html#norm:inline[norm:inline] .2+| no_tag -| Normative rule without tag/tags | Rule's 'summary' property -| This normative rule has no references to the standard. This should only be used in extraordinary circumstances. | Rule's 'note' property +| Normative rule *without* tag/tags | Rule's 'summary' property +| This normative rule has no references to the standard. This should only be used in extraordinary circumstances. +It does include a link to <> (another normative rule). +Has basic adoc formatting such as *bold*, ita__lics__, `monospace`, 2^superscript^, ~subscript~, [.underline]#underline#, +and ≤ (Unicode text for less-than-equals-to) and ≠ (Unicode decimal value for not-equal-to). + | Rule's 'note' property .1+| inline-with-hash | includes a hash # symbol. a| link:test.html#norm:inline-with-hash[norm:inline-with-hash] .2+| paragraph-with-a-really-wide-rule-name -| Here's a description. +| Here's a [.underline]#description#. It's got 2 lines. | Rule's 'description' property | Paragraph without inline anchors a| link:test.html#norm:paragraph:no-inline-anchors[norm:paragraph:no-inline-anchors] @@ -40,11 +44,17 @@ It's got 2 lines. .1+| double_tags | This paragraph has two tags but we only ever get a tag for norm:def. 
a| link:test.html#norm:def[norm:def] +.1+| bold +| ABC is a network - Bold is removed by tags backend so I don't see it a| link:test.html#norm:bold[norm:bold] + +.1+| italics +| Let's have fun today - Italics is removed by tags backend so I don't see it a| link:test.html#norm:italics[norm:italics] + .1+| superscript -| xyz 2^32^ 123 a| link:test.html#norm:superscript[norm:superscript] +| both 2^32^ and ^32^ work a| link:test.html#norm:superscript[norm:superscript] .1+| subscript -| xyz X~i~ 123 a| link:test.html#norm:subscript[norm:subscript] +| both ~log~ and log~2~ work a| link:test.html#norm:subscript[norm:subscript] .1+| inline-underline | ABC [.underline]#inside tag# GHI a| link:test.html#norm:inline-underline[norm:inline-underline] diff --git a/tests/norm-rule/expected/test-norm-rules.html b/tests/norm-rule/expected/test-norm-rules.html index 35c97f7..4fcbf68 100644 --- a/tests/norm-rule/expected/test-norm-rules.html +++ b/tests/norm-rule/expected/test-norm-rules.html @@ -131,11 +131,11 @@

my-chapter_name

no_tag - Normative rule without tag/tags + Normative rule without tag/tags Rule's "summary" property - This normative rule has no references to the standard. This should only be used in extraordinary circumstances. + This normative rule has no references to the standard. This should only be used in extraordinary circumstances.
It does include a link to table1 (another normative rule).
Has basic adoc formatting such as bold, italics, monospace, 2superscript, subscript, underline,
and ≤ (Unicode text for less-than-equals-to) and ≠ (Unicode decimal value for not-equal-to).
Rule's "note" property @@ -145,7 +145,7 @@

my-chapter_name

paragraph-with-a-really-wide-rule-name - Here's a description.
It's got 2 lines.
+ Here's a description.
It's got 2 lines.
Rule's "description" property @@ -177,14 +177,24 @@

my-chapter_name

This paragraph has two tags but we only ever get a tag for norm:def. norm:def + + bold + ABC is a network - Bold is removed by tags backend so I don't see it + norm:bold + + + italics + Let's have fun today - Italics is removed by tags backend so I don't see it + norm:italics + superscript - xyz 232 123 + both 232 and 32 work norm:superscript subscript - xyz Xi 123 + both log and log2 work norm:subscript diff --git a/tests/norm-rule/expected/test-norm-rules.json b/tests/norm-rule/expected/test-norm-rules.json index f5a5e48..100ad05 100644 --- a/tests/norm-rule/expected/test-norm-rules.json +++ b/tests/norm-rule/expected/test-norm-rules.json @@ -23,8 +23,8 @@ "name": "no_tag", "def_filename": "tests/norm-rule/test.yaml", "chapter_name": "my-chapter_name", - "summary": "Normative rule without tag/tags", - "note": "This normative rule has no references to the standard. This should only be used in extraordinary circumstances.", + "summary": "Normative rule *without* tag/tags", + "note": "This normative rule has no references to the standard. 
This should only be used in extraordinary circumstances.\nIt does include a link to <> (another normative rule).\nHas basic adoc formatting such as *bold*, ita__lics__, `monospace`, 2^superscript^, ~subscript~, [.underline]#underline#,\nand ≤ (Unicode text for less-than-equals-to) and ≠ (Unicode decimal value for not-equal-to).\n", "tags": [] }, { @@ -44,7 +44,7 @@ "name": "paragraph-with-a-really-wide-rule-name", "def_filename": "tests/norm-rule/test.yaml", "chapter_name": "my-chapter_name", - "description": "Here's a description.\nIt's got 2 lines.\n", + "description": "Here's a [.underline]#description#.\nIt's got 2 lines.\n", "tags": [ { "name": "norm:paragraph:no-inline-anchors", @@ -119,6 +119,32 @@ } ] }, + { + "name": "bold", + "def_filename": "tests/norm-rule/test.yaml", + "chapter_name": "my-chapter_name", + "tags": [ + { + "name": "norm:bold", + "text": "ABC is a network - Bold is removed by tags backend so I don't see it", + "tag_filename": "/build/test-norm-tags.json", + "stds_doc_url": "test.html" + } + ] + }, + { + "name": "italics", + "def_filename": "tests/norm-rule/test.yaml", + "chapter_name": "my-chapter_name", + "tags": [ + { + "name": "norm:italics", + "text": "Let's have fun today - Italics is removed by tags backend so I don't see it", + "tag_filename": "/build/test-norm-tags.json", + "stds_doc_url": "test.html" + } + ] + }, { "name": "superscript", "def_filename": "tests/norm-rule/test.yaml", @@ -126,7 +152,7 @@ "tags": [ { "name": "norm:superscript", - "text": "xyz 2^32^ 123", + "text": "both 2^32^ and ^32^ work", "tag_filename": "/build/test-norm-tags.json", "stds_doc_url": "test.html" } @@ -139,7 +165,7 @@ "tags": [ { "name": "norm:subscript", - "text": "xyz X~i~ 123", + "text": "both ~log~ and log~2~ work", "tag_filename": "/build/test-norm-tags.json", "stds_doc_url": "test.html" } diff --git a/tests/norm-rule/expected/test-norm-rules.xlsx b/tests/norm-rule/expected/test-norm-rules.xlsx index 3fd27de..5fcc1f8 100644 Binary files 
a/tests/norm-rule/expected/test-norm-rules.xlsx and b/tests/norm-rule/expected/test-norm-rules.xlsx differ diff --git a/tests/norm-rule/expected/test-norm-tags.json b/tests/norm-rule/expected/test-norm-tags.json index 1292f4e..21f3b89 100644 --- a/tests/norm-rule/expected/test-norm-tags.json +++ b/tests/norm-rule/expected/test-norm-tags.json @@ -8,8 +8,10 @@ "norm:paragraph:tag_with_newlines": "Here’s the first line. Here’s the second line.", "norm:def": "This paragraph has two tags but we only ever get a tag for norm:def.", "norm:formulae": "This paragraph looks like a formulae to Excel because it has this < sign in it. Make sure this gets written as a string, not a formulae in the XLSX or else it will create an error in Excel.", - "norm:superscript": "xyz 2^32^ 123", - "norm:subscript": "xyz X~i~ 123", + "norm:bold": "ABC is a network - Bold is removed by tags backend so I don't see it", + "norm:italics": "Let's have fun today - Italics is removed by tags backend so I don't see it", + "norm:superscript": "both 2^32^ and ^32^ work", + "norm:subscript": "both ~log~ and log~2~ work", "norm:inline-underline": "ABC [.underline]#inside tag# GHI", "norm:paragraph-underline": "Paragraph underlined outside.", "norm:standalone_ampersand": "ABC & DEF", @@ -86,6 +88,8 @@ "id": "_chapter_1_3_asciidoc_formatting", "children": [], "tags": [ + "norm:bold", + "norm:italics", "norm:superscript", "norm:subscript", "norm:inline-underline", diff --git a/tests/norm-rule/test.adoc b/tests/norm-rule/test.adoc index faa7211..9d876fc 100644 --- a/tests/norm-rule/test.adoc +++ b/tests/norm-rule/test.adoc @@ -64,9 +64,13 @@ Make sure this gets written as a string, not a formulae in the XLSX or else it w These are present to test conversation of normative text that contains AsciiDoc formatting to output formats such as HTML. -Superscript [#norm:superscript]#xyz 2^32^ 123# outside. +Bold and italics are stripped by the tags backend somehow so they aren't present in the tags JSON file. 
+Bold [#norm:bold]#*ABC* is a n**et**work - Bold is removed by tags backend so I don't see it# outside. +Italics [#norm:italics]#Let's have _fun_ to__day__ - Italics is removed by tags backend so I don't see it# outside. -Subscript [#norm:subscript]#xyz X~i~ 123# outside. +Superscript [#norm:superscript]#both 2^32^ and ^32^ work# outside. + +Subscript [#norm:subscript]#both ~log~ and log~2~ work# outside. Inline underline [.underline]#before tag# [#norm:inline-underline]+ABC [.underline]#inside tag# GHI+ outside. diff --git a/tests/norm-rule/test.yaml b/tests/norm-rule/test.yaml index cd91a39..792f0ea 100644 --- a/tests/norm-rule/test.yaml +++ b/tests/norm-rule/test.yaml @@ -14,15 +14,19 @@ normative_rule_definitions: instances: [Zicsr, ABC] tag: "norm:inline" - name: no_tag - summary: Normative rule without tag/tags - note: This normative rule has no references to the standard. This should only be used in extraordinary circumstances. + summary: Normative rule *without* tag/tags + note: | + This normative rule has no references to the standard. This should only be used in extraordinary circumstances. + It does include a link to <> (another normative rule). + Has basic adoc formatting such as *bold*, ita__lics__, `monospace`, 2^superscript^, ~subscript~, [.underline]#underline#, + and ≤ (Unicode text for less-than-equals-to) and ≠ (Unicode decimal value for not-equal-to). - name: inline-with-hash tag: "norm:inline-with-hash" # Paragraph - name: paragraph-with-a-really-wide-rule-name description: | - Here's a description. + Here's a [.underline]#description#. It's got 2 lines. 
tag: "norm:paragraph:no-inline-anchors" - name: inline-anchors-in-paragraph-entire @@ -37,6 +41,10 @@ normative_rule_definitions: tag: "norm:def" # AsciiDoc formatting + - name: bold + tag: "norm:bold" + - name: italics + tag: "norm:italics" - name: superscript tag: "norm:superscript" - name: subscript diff --git a/tools/create_normative_rules.rb b/tools/create_normative_rules.rb index 3f2d05c..c295fca 100644 --- a/tools/create_normative_rules.rb +++ b/tools/create_normative_rules.rb @@ -566,22 +566,99 @@ def validate_defs_and_tags(defs, tags, warn_if_tags_no_rules) module Adoc2HTML extend self + # Apply constrained formatting pair transformation + # Single delimiter, bounded by whitespace/punctuation + # Matches: *text*, _text_, ^text^, ~text~ + # Example: "That is *strong* stuff!" or "This is *strong*!" + # + # @param text [String] The text to transform + # @param delimiter [String] The formatting delimiter (e.g., '*', '_', '^', '~') + # @yield [content] Block that transforms the captured content + # @yieldparam content [String] The text between the delimiters + # @yieldreturn [String] The transformed content + # @return [String] The text with formatting applied + def constrained_format_pattern(text, delimiter, &block) + escaped_delimiter = Regexp.escape(delimiter) + # (?:^|\s) - start of line or space before + # \K - keep assertion (excludes preceding pattern from match) + # #{escaped_delimiter} - single opening mark + # (\S(?:(?!\s).*?(? 
foo + def convert_bold(text) + text = constrained_format_pattern(text, "*") { |content| "#{content}" } + text = unconstrained_format_pattern(text, "*") { |content| "#{content}" } + end + + # Convert italics notation: _bar_ -> bar + def convert_italics(text) + text = constrained_format_pattern(text, "_") { |content| "#{content}" } + text = unconstrained_format_pattern(text, "_") { |content| "#{content}" } + end + + # Convert monospace notation: `zort` -> zort + def convert_monospace(text) + text = constrained_format_pattern(text, "`") { |content| "#{content}" } + text = unconstrained_format_pattern(text, "`") { |content| "#{content}" } + end + # Convert superscript notation: 2^32^ -> 232 - # Uses non-greedy matching and allows various content types + # ^32^ -> 32 + # Superscript uses continuous formatting (no spaces allowed in content) def convert_superscript(text) - # Match word followed by ^content^, where content doesn't contain ^ - text.gsub(/(\w+)\^([^\^]+?)\^/) do - "#{$1}#{$2}" - end + text = continuous_format_pattern(text, "^") { |content| "#{content}" } end # Convert subscript notation: X~i~ -> Xi - # Uses non-greedy matching and allows various content types + # ~i~ -> i + # Subscript uses continuous formatting (no spaces allowed in content) def convert_subscript(text) - # Match word followed by ~content~, where content doesn't contain ~ - text.gsub(/(\w+)~([^~]+?)~/) do - "#{$1}#{$2}" - end + text = continuous_format_pattern(text, "~") { |content| "#{content}" } end # Convert underline notation: [.underline]#text# -> text @@ -652,6 +729,9 @@ def convert_unicode_names(text) # Apply all format conversions (keeping numeric entities). 
def convert(text) result = text.dup + result = convert_bold(result) + result = convert_italics(result) + result = convert_monospace(result) result = convert_superscript(result) result = convert_subscript(result) result = convert_underline(result) @@ -1032,24 +1112,30 @@ def html_chapter_table(f, table_num, chapter_name, nr_defs, tags, tag_fname2url) f.puts(%Q{ #{nr.name}}) unless nr.summary.nil? + text = convert_adoc_links_to_html(convert_newlines_to_html(Adoc2HTML::convert(nr.summary))) + f.puts(%Q{ }) unless row_started - f.puts(%Q{ #{nr.summary}}) + f.puts(%Q{ #{text}}) f.puts(%Q{ Rule's "summary" property}) f.puts(%Q{ }) row_started = false end unless nr.note.nil? + text = convert_adoc_links_to_html(convert_newlines_to_html(Adoc2HTML::convert(nr.note))) + f.puts(%Q{ }) unless row_started - f.puts(%Q{ #{nr.note}}) + f.puts(%Q{ #{text}}) f.puts(%Q{ Rule's "note" property}) f.puts(%Q{ }) row_started = false end unless nr.description.nil? + text = convert_adoc_links_to_html(convert_newlines_to_html(Adoc2HTML::convert(nr.description))) + f.puts(%Q{ }) unless row_started - f.puts(%Q{ #{convert_newlines_to_html(nr.description)}}) + f.puts(%Q{ #{text}}) f.puts(%Q{ Rule's "description" property}) f.puts(%Q{ }) row_started = false @@ -1082,38 +1168,17 @@ def html_chapter_table(f, table_num, chapter_name, nr_defs, tags, tag_fname2url) tag = tags.get_tag(tag_ref) fatal("Normative rule #{nr.name} defined in file #{nr.def_filename} references non-existent tag #{tag_ref}") if tag.nil? - html_fname = tag_fname2url[tag.tag_filename] - fatal("No fname tag to HTML mapping (-tag2url cmd line arg) for tag fname #{tag.tag_filename} for tag name #{tag.name}") if html_fname.nil? + target_html_fname = tag_fname2url[tag.tag_filename] + fatal("No fname tag to HTML mapping (-tag2url cmd line arg) for tag fname #{tag.tag_filename} for tag name #{tag.name}") if target_html_fname.nil? 
tag_text = convert_newlines_to_html(convert_tags_tables_to_html(Adoc2HTML::convert(tag.text))) - # Convert adoc links to normative text in tag text to html links. - # - # Supported formats: - # <> - # <> - # + # Convert adoc links to HTML links. # Can assume that the link is to the same HTML standards document as the - # tag text that it is found in because these kind of links only link within their document. - # - # Note that I'm using the non-greedy regular expression (? after +) otherwise the regular expression - # will return multiple <> in the same text as one. - tag_text.gsub!(/#{LT_UNICODE_STR}#{LT_UNICODE_STR}(.+?)#{GT_UNICODE_STR}#{GT_UNICODE_STR}/) do - # Look to see if custom text has been provided. - split_texts = $1.split(",").map(&:strip) - - if split_texts.length == 0 - fail("Hyperlink '#{$1}' is empty") - elsif split_texts.length == 1 - tag2html_link(split_texts[0], split_texts[0], html_fname) - elsif split_texts.length == 2 - tag2html_link(split_texts[0], split_texts[1], html_fname) - else - fail("Hyperlink '#{$1}' contains too many commas") - end - end + # tag text that it is found in because these kind of adoc links only link within their document. + tag_text = convert_adoc_links_to_html(tag_text, target_html_fname) - tag_link = tag2html_link(tag_ref, tag_ref, html_fname) + tag_link = tag2html_link(tag_ref, tag_ref, target_html_fname) f.puts(%Q{ }) unless row_started f.puts(%Q{ #{tag_text}}) @@ -1128,12 +1193,17 @@ def html_chapter_table(f, table_num, chapter_name, nr_defs, tags, tag_fname2url) f.puts(%Q{ }) end -def tag2html_link(tag_ref, link_text, html_fname) +# If no target_html_fname is provided, assumes anchor is in same HTML file as link (i.e., an HTML "fragment" link). 
+def tag2html_link(tag_ref, link_text, target_html_fname = nil) fatal("Expected String for tag_ref but was passed a #{tag_ref.class}") unless tag_ref.is_a?(String) fatal("Expected String for link_text but was passed a #{link_text.class}") unless link_text.is_a?(String) - fatal("Expected String for html_fname but was passed a #{html_fname.class}") unless html_fname.is_a?(String) + unless target_html_fname.nil? + fatal("Expected String for target_html_fname but was passed a #{target_html_fname.class}") unless target_html_fname.is_a?(String) + end - return %Q{#{link_text}} + target_html_fname = "" if target_html_fname.nil? + + return %Q{#{link_text}} end def html_script(f) @@ -1299,6 +1369,37 @@ def convert_newlines_to_html(text) text.gsub(/\n/, '
') end +# Convert adoc links to HTML links. +# +# Supported adoc link formats: +# <> +# <> +# +# If target_html_fname is not provided, link will assume anchor is in the same HTML file as the link. +def convert_adoc_links_to_html(text, target_html_fname = nil) + raise ArgumentError, "Passed class #{text.class} for text but require String" unless text.is_a?(String) + unless target_html_fname.nil? + raise ArgumentError, "Passed class #{target_html_fname.class} for target_html_fname but require String" unless target_html_fname.is_a?(String) + end + + # Note that I'm using the non-greedy regular expression (? after +) otherwise the regular expression + # will return multiple <> in the same text as one. + text.gsub(/(<<|#{LT_UNICODE_STR}#{LT_UNICODE_STR})(.+?)(>>|#{GT_UNICODE_STR}#{GT_UNICODE_STR})/) do + # Look to see if custom text has been provided. + split_texts = $2.split(",").map(&:strip) + + if split_texts.length == 0 + fail("Hyperlink '#{$2}' is empty") + elsif split_texts.length == 1 + tag2html_link(split_texts[0], split_texts[0], target_html_fname) + elsif split_texts.length == 2 + tag2html_link(split_texts[0], split_texts[1], target_html_fname) + else + fail("Hyperlink '#{$2}' contains too many commas") + end + end +end + #main() info("Passed command-line: #{ARGV.join(' ')}")
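Reviewer note on `convert_adoc_links_to_html`: the cross-reference handling added above can be sketched in isolation as follows. This is a minimal sketch under stated assumptions, not the code in `tools/create_normative_rules.rb`. The names `sketch_link` and `sketch_convert_adoc_links` are hypothetical, the `<a href=...>` markup is an assumption (the body of `tag2html_link` is not legible in this diff), and the sketch matches only literal `<<`/`>>` rather than the `LT_UNICODE_STR`/`GT_UNICODE_STR` alternatives.

```ruby
# Hypothetical stand-in for tag2html_link. The exact markup is an assumption.
# A nil target_html_fname interpolates to "", giving a same-file fragment link.
def sketch_link(anchor, link_text, target_html_fname = nil)
  %Q{<a href="#{target_html_fname}##{anchor}">#{link_text}</a>}
end

# Converts <<anchor>> and <<anchor, custom text>> to HTML links.
# The non-greedy (.+?) keeps two links in one string from matching as one.
def sketch_convert_adoc_links(text, target_html_fname = nil)
  raise ArgumentError, "text must be a String" unless text.is_a?(String)

  text.gsub(/<<(.+?)>>/) do
    inner = $1
    parts = inner.split(",").map(&:strip)
    raise ArgumentError, "Hyperlink '#{inner}' is empty" if parts.empty?
    raise ArgumentError, "Hyperlink '#{inner}' contains too many commas" if parts.length > 2

    # With no custom text, the anchor name doubles as the link text.
    sketch_link(parts[0], parts[1] || parts[0], target_html_fname)
  end
end

puts sketch_convert_adoc_links("See <<norm:def>> and <<norm:inline, the inline rule>>.", "test.html")
```

Making the target file name optional lets the same helper serve both tag text (links into the standards document) and a rule's own summary, note, and description (links within the generated file).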