Fix comment directive parsing problem
# Problem of comment parsing
The main problem is that `@preprocess.handle` parses the comment, removes
directives, and processes the code object all at the same time.
This pull request changes RDoc to parse the comment and extract directives
first, and then apply the directives to the code object.
## Flow of legacy RDoc parsing method
For example, consider parsing this code:
```ruby
class A
# :yields: x, y
# :args: a, b
# :call-seq:
#--
# :not-new:
# :category: foobar
#++
# initialize(x, y, z)
def initialize(*args, &block); end
end
```
### Step 1
RDoc runs `@preprocess.handle` on the `RDoc::NormalClass`.
- `:category:` is applied to the class and replaced with a blank line
- `:not-new:` and `:yields:` are replaced with a blank line. This may be a bug.
- `:args: a, b` is replaced with `:args: a, b`
### Step 2
RDoc runs `@preprocess.handle` on the `RDoc::AnyMethod`.
`:args: a, b` is applied to `meth.params`.
### Step 3
RDoc removes the private section that starts with `#--` and ends with `#++`.
### Step 4
RDoc normalizes the comment by removing `#` and indentation.
### Step 5
RDoc extracts `":call-seq:\n initialize(x, y, z)"` from the comment and
applies it to the method object.
## Problems
RDoc removes directives and expands `:include:` twice in some cases and
once in others.
To keep the first `@preprocess.handle` from removing all directives,
preprocess needs a directive-replacement mechanism, which makes things
complex.
Private sections and call-seq are processed later. As a result, RDoc
accepts odd comments such as a directive inside a private section or a
private section inside a call-seq.
Handling metaprogramming methods is also hard.
`@preprocess.handle(comment, code_object)` requires the code object to
already exist.
We need to parse the comment to know the code object type (method or
attribute). Only after that can we parse the comment with the code
object.
C comments are also complicated. `:include:` can include text containing
`*/`.
Removing directive lines and private sections from the comment might
remove `/*` and `*/`, which makes `normalize_comment` fail.
The original implementation avoided this by using a different
processing order from the Ruby parser, which is inconsistent.
# Solution
We need to parse the comment first, and only once, to extract directives.
Expanding `:include:`, reading directives (including `:call-seq:`), and
removing private sections all happen in that same pass.
The comment parser should return the normalized comment text and the
directives as an attribute hash. Each directive should also carry its
line number.
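As a rough illustration of the single-pass idea, here is a minimal sketch. It is not the implementation in this pull request: the method name `parse_comment_sketch`, the `DIRECTIVE_LINE` pattern, and the `[value, line_number]` return shape are all assumptions made for the example, and `:include:` expansion and multiline directives are left out.

```ruby
# Hypothetical sketch, not RDoc's actual API.
# Matches a single-line directive such as "# :yields: x, y".
DIRECTIVE_LINE = /\A#\s*:([\w-]+):\s*(.*)\z/

# Walks the comment once and returns [normalized_text, directives].
# directives maps a directive name to [value, line_number].
def parse_comment_sketch(comment)
  directives = {}
  text_lines = []
  in_private = false

  comment.each_line(chomp: true).with_index(1) do |line, lineno|
    line = line.strip
    if in_private
      # Everything inside the private section is dropped, directives included.
      in_private = false if line.match?(/\A#\+{2}\z/)
    elsif line.match?(/\A#-{2,}\z/)
      in_private = true
    elsif (m = DIRECTIVE_LINE.match(line))
      directives[m[1]] = [m[2], lineno]
    else
      # Normalize: strip the leading "#" and one following space.
      text_lines << line.sub(/\A# ?/, '')
    end
  end

  [text_lines.join("\n"), directives]
end

comment = "# Creates a point.\n#--\n# :not-new:\n#++\n# :yields: x, y\n# More text."
text, directives = parse_comment_sketch(comment)
# text       => "Creates a point.\nMore text."
# directives => {"yields"=>["x, y", 5]}
```

Because the private section is skipped in the same loop that collects directives, a directive inside `#--` ... `#++` never reaches the code object in this sketch.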
# Changed things
## :call-seq:
A new type of directive called a "multiline directive" is introduced so
that `:call-seq:` is also a directive.
```
# :multiline-directive:
# html
# head
# title
#
# body
# header
# footer
```
A multiline directive ends with a blank line. This restriction is for
compatibility with old RDoc.
Some invalid multiline directives (unindented, or ended by another
directive) are also accepted with a warning.
The result of parsing this call-seq has changed. I think it is better now.
```
# :call-seq:
# STDIN.getc() -> string # Only this line was call-seq
#
# STDIN.getc(a) -> string
#
# STDIN.getc(a, b) -> string
# $stdin.getc(c) -> string # It's now call-seq until this line
#
# :other:
```
## Private section
Previously, `#----foobar` was accepted as a private section start, and
`#++++foobar` was decomposed into `#++` (private end) and `++foobar`
(normal comment).
The start is now `/^#-{2,}$/` (two or more `-`), and the end is now
`/^#\+{2}$/` (exactly two `+`).
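The two patterns can be checked directly in Ruby. The constant and method names below are hypothetical and only wrap the regular expressions quoted above.

```ruby
# Hypothetical names; only the two regexps come from the description above.
PRIVATE_START = /^#-{2,}$/  # "#" followed by two or more "-" and nothing else
PRIVATE_END   = /^#\+{2}$/  # "#" followed by exactly two "+" and nothing else

def private_start?(line)
  PRIVATE_START.match?(line)
end

def private_end?(line)
  PRIVATE_END.match?(line)
end

private_start?("#--")         # => true
private_start?("#----")       # => true
private_start?("#----foobar") # => false (was accepted before)
private_end?("#++")           # => true
private_end?("#++++foobar")   # => false (was split into "#++" and "++foobar" before)
```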
## Unhandled directives
In old RDoc, an unhandled directive such as `# :unknown: foo` remained in
the normal comment.
Now it is removed just like other directives. Unhandled directives are
appended to the code object's metadata. It does not make sense to leave
metadata in the comment. I think this was just a side effect of avoiding
the double parsing problem.
## Normalize and remove private section
Everything is now done in the parse phase.
## C and Simple parser
C used to accept `/*\n# :directive:\n*/` but now only accepts
`* :directive:`.
The changes to call-seq, private sections, and unhandled directives
described above also apply to the C and Simple parsers.
# Old comment parsing
`RDoc::Markup::PreProcess#handle`, `RDoc::Comment#extract_call_seq`, and
`RDoc::Comment#remove_private` are only used from `RDoc::Parser::Ruby`.
We can remove them in the future.
# Diff (updated: 2025/02/02)
I compared the generated HTML files for rdoc itself and for `ruby/ruby`.
## HTML meta tag content (ruby/ruby)
Files:
```
Date/Error.html
Enumerator/Generator.html
Enumerator/Producer.html
Enumerator/Yielder.html
Fiddle/Pointer.html
UnicodeNormalize.html
```
Example diff
```html
<meta name="description" content="class Date::Error: Exception for invalid date/time ">
↓
<meta name="description" content="class Date::Error: Exception for invalid date/time">
```
## OpenSSL/Timestamp/Factory.html (ruby/ruby)
This invalid document is now parsed differently.
```c
/* Document-class: OpenSSL::Timestamp::Factory
* Document for default_policy_id
* call-seq:
* factory.default_policy_id = "string" -> string
* Document for serial_number
* call-seq:
* factory.serial_number = number -> number
* Document for gen_time
* call-seq:
* factory.gen_time = Time -> Time
*/
```
## Win32.html (ruby/ruby, RDOC_USE_PRISM_PARSER)
This will no longer be considered a private section (an invisible comment
surrounded by `--` and `++`).
```
--- info
--- num_keys
```
## History_rdoc.html (ruby/rdoc)
Parsing this part is improved.
```md
* Bug fixes
* `ri []` and other special methods now work properly. Issue #52 by
ddebernardy.
* `ri` now has space between class comments from multiple files.
* :stopdoc: no longer creates Object references. Issue #55 by Simon Chiang
* :nodoc: works on class aliases now. Issue #51 by Steven G. Harms
* Remove tokenizer restriction on header lengths for verbatim sections.
Issue #49 by trans
```
The [current
document](https://ruby.github.io/rdoc/History_rdoc.html#label-3.9+-2F+2011-07-30)
looks as if `* :stopdoc:` and `* :nodoc:` were processed as directives.
## lib/rdoc/markdown_kpeg.html (ruby/rdoc)
Maybe it shouldn't be documented.
https://ruby.github.io/rdoc/lib/rdoc/markdown_kpeg.html
## RDoc/MarkupReference.html (ruby/rdoc, RDOC_USE_PRISM_PARSER)
`<pre>:call-seq: ` → `<pre>:call-seq:` (trailing space removed)
## RDoc/Parser/Ruby.html (ruby/rdoc, RDOC_USE_PRISM_PARSER)
Escaping in `# \:method: or :attr: directives in +comment+.` now works.
Note that this is related to an old bug in the master branch:
```
class Foo
# A string constant with
# \:nodoc: (this is documented. :nodoc: is escaped)
A = ':nodoc:'
# Prints the word
# \:nodoc: (this method is not documented. :nodoc: is not escaped)
def print_colon_nodoc = puts(':nodoc:')
end
```