Commit b92986a

Change comment directive parsing (#1149)
Fix comment directive parsing problem

# Problem of comment parsing

The main problem is that `@preprocess.handle` parses the comment, removes directives, and processes the code object all at the same time.
This pull request changes RDoc to parse the comment and extract directives first, and then apply the directives to the code object.

## Flow of legacy RDoc parsing method

For example, consider parsing this code:

```ruby
class A
  # :yields: x, y
  # :args: a, b
  # :call-seq:
  #--
  # :not-new:
  # :category: foobar
  #++
  #   initialize(x, y, z)
  def initialize(*args, &block); end
end
```

### Step 1

RDoc performs `@preprocess.handle` on RDoc::NormalClass.

- `:category:` is applied to klass and replaced with a blank line.
- `:not-new:` and `:yields:` are replaced with blank lines. This may be a bug.
- `:args: a, b` is replaced with `:args: a, b` (kept in place for the next pass).

### Step 2

RDoc performs `@preprocess.handle` on RDoc::AnyMethod.
`:args: a, b` is applied to `meth.params`.

### Step 3

RDoc removes the private section that starts with `#--` and ends with `#++`.

### Step 4

RDoc normalizes the comment by removing `#` and indentation.

### Step 5

RDoc extracts `":call-seq:\n initialize(x, y, z)"` from the comment and applies it to the method object.

## Problems

RDoc removes directives and expands `:include:` twice in some cases, and once in other cases.
To avoid having all directives removed in the first `@preprocess.handle`, preprocess needs a directive-replace mechanism, which makes things complex.

Private sections and call-seq are processed later. This makes RDoc accept weird comments, such as a directive inside a private section or a private section inside a call-seq.

Handling metaprogramming methods is also hard. `@preprocess.handle(comment, code_object)` requires the code object to be already created. We need to parse the comment to know the code object type (method or attribute). Only after that can we finally parse the comment with the code object.

C comments are also complicated. `:include:` can include text containing `*/`.
Removing directive lines and private sections from the comment might remove `/*` and `*/`, which makes normalize_comment fail.
The original implementation avoided this by using a different processing order than the Ruby parser. This is not consistent.

# Solution

We need to parse the comment first, and only once, to extract directives.
Expand `:include:`, read directives (including `:call-seq:`), and remove private sections at the same time.
The comment parser should return the normalized comment text and the directives as an attribute hash. Each directive should also carry its line number.

# Changed things

## :call-seq:

A new type of directive called "multiline directive" is introduced to make `:call-seq:` a directive as well.

```
# :multiline-directive:
#   html
#     head
#       title
#
#     body
#       header
#       footer
```

A multiline directive ends with a blank line. This restriction is for compatibility with old RDoc.
Some invalid multiline directives (unindented, or ending with another directive) are also accepted with a warning.

The result of parsing this call-seq has changed. I think it is better.

```
# :call-seq:
#   STDIN.getc() -> string # Only this line was call-seq
#
#   STDIN.getc(a) -> string
#
#   STDIN.getc(a, b) -> string
#   $stdin.getc(c) -> string # It's now call-seq until this line
#
# :other:
```

## Private section

`#----foobar` was accepted as a private section start.
`#++++foobar` was decomposed into `#++` (private end) and `++foobar` (normal comment).
The start is now `/^#-{2,}$/` (two or more `-`), and the end is now `/^#\+{2}$/` (exactly two `+`).

## Unhandled directives

In old RDoc, an unhandled directive such as `# :unknown: foo` remained in the normal comment. Now it is removed just like other directives.
An unhandled directive is appended to the code object's metadata, so it does not make sense to leave it in the comment as well. I think the old behavior was just a side effect of avoiding the double parsing problem.
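As a rough illustration of the private-section delimiters described above, here is a minimal standalone sketch. It uses the two regexps quoted in the commit message; the constant names and the `visible_lines` helper are hypothetical and not part of RDoc, and the real parser does more work (directives, indentation) in the same pass.

```ruby
# Delimiters as quoted in the commit message, for Ruby comments.
PRIVATE_START = /^#-{2,}$/ # two or more dashes, nothing else on the line
PRIVATE_END   = /^#\+{2}$/ # exactly two pluses, nothing else on the line

# Hypothetical helper: returns the comment lines outside private sections.
def visible_lines(comment)
  in_private = false
  comment.lines(chomp: true).reject do |line|
    if in_private
      # The closing delimiter itself is also dropped.
      in_private = false if line.match?(PRIVATE_END)
      true
    elsif line.match?(PRIVATE_START)
      in_private = true
      true
    end
  end
end

sample = <<~COMMENT
  # Public text
  #--
  # hidden note
  #++
  # More public text
  #----foobar
  #++++foobar
COMMENT

p visible_lines(sample)
# => ["# Public text", "# More public text", "#----foobar", "#++++foobar"]
```

With these patterns, decorated lines such as `#----foobar` and `#++++foobar` stay in the visible comment instead of opening or closing a private section.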
## Normalize and remove private section

Everything is done in the parse phase.

## C and Simple parser

C used to accept `/*\n# :directive:\n*/` but now only accepts `* :directive:`.
The changes for call-seq, private section and unhandled directives described above are also applied to the C and Simple parsers.

# Old comment parsing

`RDoc::Markup::PreProcess#handle`, `RDoc::Comment#extract_call_seq` and `RDoc::Comment#remove_private` are only used from `RDoc::Parser::Ruby`. We can remove them in the future.

# Diff (updated: 2025/02/02)

I compared the generated HTML files of rdoc itself and of `ruby/ruby`.

## HTML meta tag content (ruby/ruby)

Files:

```
Date/Error.html
Enumerator/Generator.html
Enumerator/Producer.html
Enumerator/Yielder.html
Fiddle/Pointer.html
UnicodeNormalize.html
```

Example diff

```html
<meta name="description" content="class Date::Error: Exception for invalid date/time ">
↓
<meta name="description" content="class Date::Error: Exception for invalid date/time">
```

## OpenSSL/Timestamp/Factory.html (ruby/ruby)

This invalid document is parsed differently.

```c
/* Document-class: OpenSSL::Timestamp::Factory
 * Document for default_policy_id
 * call-seq:
 *   factory.default_policy_id = "string" -> string
 * Document for serial_number
 * call-seq:
 *   factory.serial_number = number -> number
 * Document for gen_time
 * call-seq:
 *   factory.gen_time = Time -> Time
 */
```

## Win32.html (ruby/ruby, RDOC_USE_PRISM_PARSER)

This will no longer be considered a private section (an invisible comment surrounded by `--` and `++`).

```
--- info
--- num_keys
```

## History_rdoc.html (ruby/rdoc)

Parsing of this part is improved.

```md
* Bug fixes
  * `ri []` and other special methods now work properly. Issue #52 by ddebernardy.
  * `ri` now has space between class comments from multiple files.
  * :stopdoc: no longer creates Object references. Issue #55 by Simon Chiang
  * :nodoc: works on class aliases now. Issue #51 by Steven G. Harms
  * Remove tokenizer restriction on header lengths for verbatim sections.
    Issue #49 by trans
```

In the [current document](https://ruby.github.io/rdoc/History_rdoc.html#label-3.9+-2F+2011-07-30), it looks like `* :stopdoc:` and `* :nodoc:` were processed as directives.

## lib/rdoc/markdown_kpeg.html (ruby/rdoc)

Maybe it shouldn't be documented. https://ruby.github.io/rdoc/lib/rdoc/markdown_kpeg.html

## RDoc/MarkupReference.html (ruby/rdoc, RDOC_USE_PRISM_PARSER)

`<pre>:call-seq: ` → `<pre>:call-seq:` (trailing space removed)

## RDoc/Parser/Ruby.html (ruby/rdoc, RDOC_USE_PRISM_PARSER)

Escaping in `# \:method: or :attr: directives in +comment+.` now works.

Note that this is related to an old bug in the master branch:

```
class Foo
  # A string constant with
  # \:nodoc: (this is documented. :nodoc: is escaped)
  A = ':nodoc:'
  # Prints the word
  # \:nodoc: (this method is not documented. :nodoc: is not escaped)
  def print_colon_nodoc = puts(':nodoc:')
end
```
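To make the escaping rule concrete, the following is a minimal standalone sketch of how a single, already de-prefixed comment line could be classified. It reuses the regexp and the colon-less directive list that appear in the `lib/rdoc/comment.rb` diff of this commit; the shortened constant names and the `classify` helper are hypothetical, and the real `RDoc::Comment.parse` additionally handles `:include:`, multiline directives and line numbers.

```ruby
# Same pattern as DIRECTIVE_OR_ESCAPED_DIRECTIV_REGEXP in the diff.
DIRECTIVE_REGEXP = /\A(?<colon>\\?:|:?)(?<directive>[\w-]+):(?<param>.*)/
# Directives that may be written without a leading colon.
COLON_LESS = %w[call-seq Document-method].freeze

# Hypothetical helper: returns [:text, line] or [:directive, name, param].
def classify(line)
  m = DIRECTIVE_REGEXP.match(line)
  return [:text, line] unless m

  if m[:colon] == '\\:'
    # Escaped directive: keep it as text, dropping the backslash.
    [:text, line.sub('\\:', ':')]
  elsif m[:param].start_with?(':') ||
        (m[:colon].empty? && !COLON_LESS.include?(m[:directive]))
    # Something like `:toto::` or an ordinary `Note: ...` line is plain text.
    [:text, line]
  else
    param = m[:param].strip
    [:directive, m[:directive], param.empty? ? nil : param]
  end
end

p classify(':category: foobar')   # => [:directive, "category", "foobar"]
p classify('\\:nodoc: stays put') # => [:text, ":nodoc: stays put"]
p classify(':toto::')             # => [:text, ":toto::"]
```

The escaped form survives as visible text with the backslash stripped, which is the behavior the `\:nodoc:` example above relies on.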
1 parent afca4c1 commit b92986a

15 files changed, +650 -213 lines changed

History.rdoc

Lines changed: 1 addition & 1 deletion
@@ -163,7 +163,7 @@
 * Moved old DEVELOPERS file to CONTRIBUTING to match github conventions.
 * TomDoc output now has a "Returns" heading. Issue #234 by Brian Henderson
 * Metaprogrammed methods can now use the :args: directive in addition to the
-  :call-seq: directive. Issue #236 by Mike Moore.
+  \:call-seq: directive. Issue #236 by Mike Moore.
 * Sections can be linked to using "@" like labels. If a section and a label
   have the same name the section will be preferred. Issue #233 by Brian
   Henderson.

doc/rdoc/markup_reference.rb

Lines changed: 1 addition & 1 deletion
@@ -1264,7 +1264,7 @@ def dummy_instance_method(foo, bar); end;
   #
   # Here is the <tt>:call-seq:</tt> directive given for the method:
   #
-  #   :call-seq:
+  #   \:call-seq:
   #     call_seq_directive(foo, bar)
   #     Can be anything -> bar
   #     Also anything more -> baz or bat

lib/rdoc/comment.rb

Lines changed: 190 additions & 8 deletions
@@ -162,6 +162,12 @@ def normalize
     self
   end
 
+  # Change normalized, when creating already normalized comment.
+
+  def normalized=(value)
+    @normalized = value
+  end
+
   ##
   # Was this text normalized?
 
@@ -223,14 +229,190 @@ def tomdoc?
     @format == 'tomdoc'
   end
 
-  ##
-  # Create a new parsed comment from a document
+  MULTILINE_DIRECTIVES = %w[call-seq].freeze # :nodoc:
 
-  def self.from_document(document) # :nodoc:
-    comment = RDoc::Comment.new('')
-    comment.document = document
-    comment.location = RDoc::TopLevel.new(document.file) if document.file
-    comment
-  end
+  # There are more, but already handled by RDoc::Parser::C
+  COLON_LESS_DIRECTIVES = %w[call-seq Document-method].freeze # :nodoc:
+
+  DIRECTIVE_OR_ESCAPED_DIRECTIV_REGEXP = /\A(?<colon>\\?:|:?)(?<directive>[\w-]+):(?<param>.*)/
+
+  private_constant :MULTILINE_DIRECTIVES, :COLON_LESS_DIRECTIVES, :DIRECTIVE_OR_ESCAPED_DIRECTIV_REGEXP
+
+  class << self
+
+    ##
+    # Create a new parsed comment from a document
 
+    def from_document(document) # :nodoc:
+      comment = RDoc::Comment.new('')
+      comment.document = document
+      comment.location = RDoc::TopLevel.new(document.file) if document.file
+      comment
+    end
+
+    # Parse comment, collect directives as an attribute and return [normalized_comment_text, directives_hash]
+    # This method expands include and removes everything not needed in the document text, such as
+    # private section, directive line, comment characters `# /* * */` and indent spaces.
+    #
+    # RDoc comment consists of include, directive, multiline directive, private section and comment text.
+    #
+    # Include
+    #   # :include: filename
+    #
+    # Directive
+    #   # :directive-without-value:
+    #   # :directive-with-value: value
+    #
+    # Multiline directive (only :call-seq:)
+    #   # :multiline-directive:
+    #   #   value1
+    #   #   value2
+    #
+    # Private section
+    #   #--
+    #   # private comment
+    #   #++
+
+    def parse(text, filename, line_no, type, &include_callback)
+      case type
+      when :ruby
+        text = text.gsub(/^#+/, '') if text.start_with?('#')
+        private_start_regexp = /^-{2,}$/
+        private_end_regexp = /^\+{2}$/
+        indent_regexp = /^\s*/
+      when :c
+        private_start_regexp = /^(\s*\*)?-{2,}$/
+        private_end_regexp = /^(\s*\*)?\+{2}$/
+        indent_regexp = /^\s*(\/\*+|\*)?\s*/
+        text = text.gsub(/\s*\*+\/\s*\z/, '')
+      when :simple
+        # Unlike other types, this implementation only looks for two dashes at
+        # the beginning of the line. Three or more dashes are considered to be
+        # a rule and ignored.
+        private_start_regexp = /^-{2}$/
+        private_end_regexp = /^\+{2}$/
+        indent_regexp = /^\s*/
+      end
+
+      directives = {}
+      lines = text.split("\n")
+      in_private = false
+      comment_lines = []
+      until lines.empty?
+        line = lines.shift
+        read_lines = 1
+        if in_private
+          # If `++` appears in a private section that starts with `--`, private section ends.
+          in_private = false if line.match?(private_end_regexp)
+          line_no += read_lines
+          next
+        elsif line.match?(private_start_regexp)
+          # If `--` appears in a line, private section starts.
+          in_private = true
+          line_no += read_lines
+          next
+        end
+
+        prefix = line[indent_regexp]
+        prefix_indent = ' ' * prefix.size
+        line = line.byteslice(prefix.bytesize..)
+
+        if (directive_match = DIRECTIVE_OR_ESCAPED_DIRECTIV_REGEXP.match(line))
+          colon = directive_match[:colon]
+          directive = directive_match[:directive]
+          raw_param = directive_match[:param]
+          param = raw_param.strip
+        else
+          colon = directive = raw_param = param = nil
+        end
+
+        if !directive
+          comment_lines << prefix_indent + line
+        elsif colon == '\\:'
+          # If directive is escaped, unescape it
+          comment_lines << prefix_indent + line.sub('\\:', ':')
+        elsif raw_param.start_with?(':') || (colon.empty? && !COLON_LESS_DIRECTIVES.include?(directive))
+          # Something like `:toto::` is not a directive
+          # Only few directives allows to start without a colon
+          comment_lines << prefix_indent + line
+        elsif directive == 'include'
+          filename_to_include = param
+          include_callback.call(filename_to_include, prefix_indent).lines.each { |l| comment_lines << l.chomp }
+        elsif MULTILINE_DIRECTIVES.include?(directive)
+          value_lines = take_multiline_directive_value_lines(directive, filename, line_no, lines, prefix_indent.size, indent_regexp, !param.empty?)
+          read_lines += value_lines.size
+          lines.shift(value_lines.size)
+          unless param.empty?
+            # Accept `:call-seq: first-line\n second-line` for now
+            value_lines.unshift(param)
+          end
+          value = value_lines.join("\n")
+          directives[directive] = [value.empty? ? nil : value, line_no]
+        else
+          directives[directive] = [param.empty? ? nil : param, line_no]
+        end
+        line_no += read_lines
+      end
+
+      normalized_comment = String.new(encoding: text.encoding) << normalize_comment_lines(comment_lines).join("\n")
+      [normalized_comment, directives]
+    end
+
+    # Remove preceding indent spaces and blank lines from the comment lines
+
+    private def normalize_comment_lines(lines)
+      blank_line_regexp = /\A\s*\z/
+      lines = lines.dup
+      lines.shift while lines.first&.match?(blank_line_regexp)
+      lines.pop while lines.last&.match?(blank_line_regexp)
+
+      min_spaces = lines.map do |l|
+        l.match(/\A *(?=\S)/)&.end(0)
+      end.compact.min
+      if min_spaces && min_spaces > 0
+        lines.map { |l| l[min_spaces..] || '' }
+      else
+        lines
+      end
+    end
+
+    # Take value lines of multiline directive
+
+    private def take_multiline_directive_value_lines(directive, filename, line_no, lines, base_indent_size, indent_regexp, has_param)
+      return [] if lines.empty?
+
+      first_indent_size = lines.first.match(indent_regexp).end(0)
+
+      # Blank line or unindented line is not part of multiline-directive value
+      return [] if first_indent_size <= base_indent_size
+
+      if has_param
+        # :multiline-directive: line1
+        #   line2
+        #   line3
+        #
+        value_lines = lines.take_while do |l|
+          l.rstrip.match(indent_regexp).end(0) > base_indent_size
+        end
+        min_indent = value_lines.map { |l| l.match(indent_regexp).end(0) }.min
+        value_lines.map { |l| l[min_indent..] }
+      else
+        # Take indented lines accepting blank lines between them
+        value_lines = lines.take_while do |l|
+          l = l.rstrip
+          indent = l[indent_regexp]
+          if indent == l || indent.size >= first_indent_size
+            true
+          end
+        end
+        value_lines.map! { |l| (l[first_indent_size..] || '').chomp }
+
+        if value_lines.size != lines.size && !value_lines.last.empty?
+          warn "#{filename}:#{line_no} Multiline directive :#{directive}: should end with a blank line."
+        end
+        value_lines.pop while value_lines.last&.empty?
+        value_lines
+      end
+    end
+  end
 end

lib/rdoc/markup/pre_process.rb

Lines changed: 34 additions & 10 deletions
@@ -97,18 +97,15 @@ def initialize(input_file_name, include_path)
   # RDoc::CodeObject#metadata for details.
 
   def handle(text, code_object = nil, &block)
-    first_line = 1
     if RDoc::Comment === text then
       comment = text
       text = text.text
-      first_line = comment.line || 1
     end
 
     # regexp helper (square brackets for optional)
     # $1 $2 $3 $4 $5
     # [prefix][\]:directive:[spaces][param]newline
-    text = text.lines.map.with_index(first_line) do |line, num|
-      next line unless line =~ /\A([ \t]*(?:#|\/?\*)?[ \t]*)(\\?):([\w-]+):([ \t]*)(.+)?(\r?\n|$)/
+    text = text.gsub(/^([ \t]*(?:#|\/?\*)?[ \t]*)(\\?):([\w-]+):([ \t]*)(.+)?(\r?\n|$)/) do
       # skip something like ':toto::'
       next $& if $4.empty? and $5 and $5[0, 1] == ':'
 
@@ -122,21 +119,48 @@ def handle(text, code_object = nil, &block)
         comment.format = $5.downcase
         next "#{$1.strip}\n"
       end
-
-      handle_directive $1, $3, $5, code_object, text.encoding, num, &block
-    end.join
+      handle_directive $1, $3, $5, code_object, text.encoding, &block
+    end
 
     if comment then
       comment.text = text
     else
       comment = text
     end
 
+    run_post_processes(comment, code_object)
+
+    text
+  end
+
+  # Apply directives to a code object
+
+  def run_pre_processes(comment_text, code_object, start_line_no, type)
+    comment_text, directives = parse_comment(comment_text, start_line_no, type)
+    directives.each do |directive, (param, line_no)|
+      handle_directive('', directive, param, code_object)
+    end
+    if code_object.is_a?(RDoc::AnyMethod) && (call_seq, = directives['call-seq']) && call_seq
+      code_object.call_seq = call_seq.lines.map(&:chomp).reject(&:empty?).join("\n")
+    end
+    format, = directives['markup']
+    [comment_text, format]
+  end
+
+  # Perform post preocesses to a code object
+
+  def run_post_processes(comment, code_object)
     self.class.post_processors.each do |handler|
       handler.call comment, code_object
     end
+  end
 
-    text
+  # Parse comment and return [normalized_comment_text, directives_hash]
+
+  def parse_comment(text, line_no, type)
+    RDoc::Comment.parse(text, @input_file_name, line_no, type) do |filename, prefix_indent|
+      include_file(filename, prefix_indent, text.encoding)
+    end
   end
 
   ##
@@ -151,7 +175,7 @@ def handle(text, code_object = nil, &block)
   # When 1.8.7 support is ditched prefix can be defaulted to ''
 
   def handle_directive(prefix, directive, param, code_object = nil,
-                       encoding = nil, line = nil)
+                       encoding = nil)
     blankline = "#{prefix.strip}\n"
     directive = directive.downcase
 
@@ -244,7 +268,7 @@ def handle_directive(prefix, directive, param, code_object = nil,
 
       blankline
     else
-      result = yield directive, param, line if block_given?
+      result = yield directive, param if block_given?
 
       case result
       when nil then
0 commit comments

Comments
 (0)