Commit 443f2f5

fix: codeblock without "<" consumes extra char
Problem: If a codeblock does not have a terminating "<" char, it consumes the first char of the next token.

Solution: Define (codeblock) only in terms of its lines; it doesn't need to look for its "end". Instead, add $._line_end_codeblock to the list of things that can terminate a (block).
1 parent eee1c58 commit 443f2f5
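
The problem described in the commit message can be reproduced outside tree-sitter with plain JavaScript regexes. The sketch below is a simplified model, not the generated parser: `consumeCodeblock` is a hypothetical helper, and only the three patterns are copied from `grammar.js`. It compares the old rule (which ends with a terminator token) against the new rule (which ends after its last line).

```javascript
// Simplified model of the (codeblock) rule; NOT the real tree-sitter parser.
// The patterns are copied from grammar.js; the helper name is hypothetical.
const OPEN = /^>[\t ]*\n/;
const LINE_CODE = /^(?:\n|[\t ]+[^\n]+\n)/;
const OLD_TERMINATOR = /^(?:<[\t ]*\n|[^\t\n ])/;

// Returns how many chars of `text` a codeblock starting at offset 0 consumes.
function consumeCodeblock(text, terminator) {
  const open = OPEN.exec(text);
  if (!open) return 0;
  let pos = open[0].length;
  let lines = 0;
  let m;
  while ((m = LINE_CODE.exec(text.slice(pos)))) {
    pos += m[0].length;
    lines++;
  }
  if (lines === 0) return 0; // repeat1() needs at least one code line
  if (terminator) {
    const end = terminator.exec(text.slice(pos));
    if (end) pos += end[0].length; // old rule: the terminator is part of the node
  }
  return pos;
}

const input = '>\n  code line\nnext paragraph\n';
const oldEnd = consumeCodeblock(input, OLD_TERMINATOR);
const newEnd = consumeCodeblock(input, null);
console.log(JSON.stringify(input.slice(oldEnd))); // "ext paragraph\n"
console.log(JSON.stringify(input.slice(newEnd))); // "next paragraph\n"
```

With the old terminator the "n" of "next" is swallowed by the codeblock; without it, the following line is left intact for the parent rule to handle.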

7 files changed, +150 -65 lines changed

README.md

Lines changed: 28 additions & 30 deletions
@@ -7,48 +7,46 @@ well-formed; the _input_ (vimdoc) is secondary. The first step should always be
 to try to fix the input (within reason) rather than insist on a grammar that
 handles vimdoc's endless quirks.

-Notes
------
+Overview
+--------

 - vimdoc format "spec":
     - [:help help-writing](https://neovim.io/doc/user/helphelp.html#help-writing)
     - https://github.com/nanotee/vimdoc-notes
-- whitespace is intentionally captured in `(word)`, because it is often necessary to be
-  able to correctly layout vim help files (especially old/legacy).
-- `(codeblock)` is contained by `(line)` because `>` can start a code block at the end of a line.
-- `(column_heading)` is contained by `(line)` because `>` (to close
-  a `(codeblock)` can appear at the start of `(column_heading)`.
-- `h1` ("Heading 1"): `======` followed by text and optional `*tags*`.
-- `h2` ("Heading 2"): `------` followed by text and optional `*tags*`.
-- `h3` ("Heading 3"): only UPPERCASE WORDS, followed by optional `*tags*`.
+- whitespace is intentionally captured in all atoms, because it is often used
+  for "layout" and ascii art in legacy help files.
+- `block` is the main top-level node which contains `line` nodes.
+    - ends at blank line(s) or a line starting with `<`.
+- `line`:
+    - contains atoms (words, tags, taglinks, …).
+    - contains `codeblock` because `>` can start a codeblock at the end of a line.
+    - contains `column_heading` because `<` (the `codeblock` terminating char)
+      can appear at the start of a `column_heading`.
+- `codeblock`:
+    - contains `line` nodes which do not contain `word` nodes, it's just the full
+      raw text line including whitespace. This is somewhat dictated by its
+      "preformatted" nature; parsing the contents would require loading a "child"
+      language (injection). See [#2](https://github.com/neovim/tree-sitter-vimdoc/issues/2).
+    - the terminating `<` (and any following whitespace) is discarded (anonymous).
+- `h1` = "Heading 1": `======` followed by text and optional `*tags*`.
+- `h2` = "Heading 2": `------` followed by text and optional `*tags*`.
+- `h3` = "Heading 3": only UPPERCASE WORDS, followed by optional `*tags*`.

 Known issues
 ------------

-- `line_li` ("list item") is _experimental_. It doesn't support nesting yet and
-  it may not work well; you can treat it as a normal `line` for layout purposes.
-- `codeblock` ">" must not be preceded only by tabs, a space char is required (" >").
-  See `:help lcs-tab` for example. Currently the grammar doesn't enforce this.
-- `codeblock` terminated by an "implicit stop" (i.e. no terminating `<`)
-  consumes the first char of the terminating line, and continues the parent
-  `block`, preventing top-level forms like `h1`, `h2` from being recognized
-  until a blank line is encountered.
-- `line` in a `codeblock` does not contain `word` atoms, it's just the full
-  raw text line including whitespace. This is somewhat dictated by its
-  "preformatted" nature; parsing the contents would require loading a "child"
-  language (injection). See [#2](https://github.com/vigoux/tree-sitter-vimdoc/issues/2).
+- `line_li` ("list item") is experimental. It doesn't support nesting yet.
+- Spec requires that `codeblock` delimiter ">" must be preceded by a space
+  (" >"), not a tab. But currently the grammar doesn't enforce this. Example:
+  `:help lcs-tab`.
+- `codeblock` terminated by an "implicit stop" (no terminating `<`) consumes
+  blank lines, preventing top-level forms like `h1` from being recognized.
 - `url` doesn't handle _surrounding_ parens. E.g. `(https://example.com/#yay)` yields `word`
 - `url` doesn't handle _nested_ parens. E.g. `(https://example.com/(foo)#yay)`
-- Ideally `block_end` should consume the last block of the document _only_ if that
-  block is missing a trailing blank line or EOL ("\n").
-    - TODO: consider simply _not supporting_ docs without EOL?
-- Ideally `line_noeol` should consume the last line of the document _only_ if
-  that line is missing EOL ("\n").
-    - TODO: consider simply _not supporting_ docs without EOL?

 TODO
 ----

 - `line_noeol` is a special-case to support documents that don't end in EOL.
-  Grammar could be a bit simpler if we just require EOL at end of document.
-- `line_modeline` (only at EOF)
+  Grammar could be simpler if we require EOL at end of document.
+- `line_modeline` ?
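
The new Overview says a `block` "ends at blank line(s) or a line starting with `<`", and that the terminating `<` plus any following whitespace is discarded. A minimal sketch of that rule in plain JavaScript might look like the following. `splitBlocks` is a hypothetical helper that ignores codeblock contents, headings, and every other node type; it is not how the generated parser works.

```javascript
// Hypothetical helper: approximates the README's description of `block`.
// It ignores codeblock contents, headings, list items, and all atoms.
function splitBlocks(text) {
  const blocks = [];
  let current = [];
  const flush = () => {
    if (current.length > 0) blocks.push(current);
    current = [];
  };
  for (const line of text.split('\n')) {
    if (/^[\t ]*$/.test(line)) {
      flush(); // blank line(s) end the block
    } else if (line.startsWith('<')) {
      flush(); // "<" ends the block; it and the following whitespace are dropped
      const rest = line.slice(1).replace(/^[\t ]+/, '');
      if (rest !== '') current.push(rest); // remaining text starts a new block
    } else {
      current.push(line);
    }
  }
  flush();
  return blocks;
}

console.log(JSON.stringify(splitBlocks('a\nb\n\nc\n< d\ne\n')));
// [["a","b"],["c"],["d","e"]]
```

The third block starts on the same line as the `<`, which mirrors the "codeblock stop and start on same line" corpus test added in this commit.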

corpus/arguments.txt

Lines changed: 4 additions & 1 deletion
@@ -43,7 +43,10 @@ NOT an argument
     (line
       (argument
         (word)
-        (ERROR))
+        (MISSING "}"))
+      (word)
+      (argument
+        (word))
       (word)
       (codespan
         (word))

corpus/codeblock.txt

Lines changed: 82 additions & 3 deletions
@@ -28,7 +28,8 @@ block3:
       (word))
     (line
       (codeblock
-        (line)))
+        (line))))
+  (block
     (line
       (word)))
   (block
@@ -100,6 +101,9 @@ codeblock with implicit stop (FIXME)

 -------------------------------
 h1-headline *foo*
+line1
+
+line2

 -------------------------------
 h1-headline *foo*
@@ -118,7 +122,12 @@ h1-headline *foo*
     (line
       (word)
       (tag
-        (word))))
+        (word)))
+    (line
+      (word)))
+  (block
+    (line
+      (word)))
   (h2
     (word)
     (tag
@@ -155,7 +164,9 @@ x
         (line)
         (line)
         (line)
-        (line)))))
+        (line)))
+    (line
+      (word))))

 ================================================================================
 tricky codeblock
@@ -166,7 +177,17 @@ tricky codeblock
 < line3
 <

+Example: >
+
+    vim.spell.check()
+    -->
+    {
+        {'quik', 'bad', 4}
+    }
+<
+
 tricky
+
 --------------------------------------------------------------------------------

 (help_file
@@ -176,6 +197,16 @@ tricky
         (line)
         (line)
         (line))))
+  (block
+    (line
+      (word)
+      (codeblock
+        (line)
+        (line)
+        (line)
+        (line)
+        (line)
+        (line))))
   (block
     (line
       (word))))
@@ -243,3 +274,51 @@ To test for a non-empty string, use empty(): >
       (word)
       (codeblock
         (line)))))
+
+================================================================================
+codeblock stop and start on same line
+================================================================================
+Examples: >
+    :lua vim.api.nvim_command('echo "Hello, Nvim!"')
+< LuaJIT: >
+    :lua =jit.version
+<
+*:lua-heredoc*
+:lua << [endmarker]
+{script}
+
+Example: >
+    lua << EOF
+    EOF
+<
+
+--------------------------------------------------------------------------------
+
+(help_file
+  (block
+    (line
+      (word)
+      (codeblock
+        (line))))
+  (block
+    (line
+      (word)
+      (codeblock
+        (line))))
+  (block
+    (line
+      (tag
+        (word)))
+    (line
+      (word)
+      (word)
+      (word))
+    (line
+      (argument
+        (word))))
+  (block
+    (line
+      (word)
+      (codeblock
+        (line)
+        (line)))))
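
The "codeblock stop and start on same line" test relies on one line both closing the previous codeblock (`<` in column 1) and opening the next one (`>` at end of line). As a rough illustration only, the hypothetical helper below classifies a single input line that way. The `strict` flag models the README's note that the spec wants a space before `>`, which the grammar itself does not currently enforce.

```javascript
// Hypothetical helper; the patterns paraphrase the README and grammar.js and
// are not taken from the generated parser. `line` has no trailing newline.
function codeblockMarkers(line, strict = false) {
  // ">" at end of line (optionally followed by tabs/spaces) opens a codeblock.
  // In strict mode the ">" must be preceded by a space, per the README note.
  const opens = strict ? / >[\t ]*$/.test(line) : />[\t ]*$/.test(line);
  // "<" in column 1 closes the current codeblock.
  const closes = line.startsWith('<');
  return { opens, closes };
}

for (const line of ['Examples: >', '< LuaJIT: >', '<', ':lua << [endmarker]']) {
  console.log(JSON.stringify(line), codeblockMarkers(line));
}
```

Only `< LuaJIT: >` reports both `opens` and `closes`, while `:lua << [endmarker]` reports neither because its `<` is not in column 1 and it does not end with `>`.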

corpus/codespan.txt

Lines changed: 3 additions & 12 deletions
@@ -46,11 +46,10 @@ an error`.
       (word))))

 ================================================================================
-NOT a codespan
+NOT codespan
 ================================================================================
-*'* *'a* *`* *`a*
-'{a-z} `{a-z} Jump to the mark.
-*g'* *g'a* *g`* *g`a*
+*'* *'a* *`* *`a*
+*g'* *g'a* *g`* *g`a*
 g'{mark} g`{mark}

 --------------------------------------------------------------------------------
@@ -66,14 +65,6 @@ g'{mark} g`{mark}
         (word))
       (tag
         (word)))
-    (ERROR)
-    (line
-      (argument
-        (word))
-      (word)
-      (word)
-      (word)
-      (word))
     (line
       (tag
         (word))

corpus/heading3-column_heading.txt

Lines changed: 2 additions & 1 deletion
@@ -133,7 +133,8 @@ column_heading should NOT parse atoms (links, tags, etc.) (FIXME)
     (line
       (word)
       (codeblock
-        (line)))
+        (line))))
+  (block
     (line
       (column_heading
         (optionlink

corpus/optionlink.txt

Lines changed: 12 additions & 1 deletion
@@ -29,7 +29,7 @@ world 'hello' world
       (word))))

 ================================================================================
-NOT an optionlink #7 #14
+NOT optionlink #7 #14
 ================================================================================

 Let's see if that works.
@@ -85,3 +85,14 @@ number: '04' 'ISO-10646-1' 'python3'
       (word)
       (word)
       (word))))
+
+================================================================================
+NOT optionlink (FIXME)
+================================================================================
+
+'{a-z} `{a-z} Jump to the mark.
+
+--------------------------------------------------------------------------------
+
+(help_file
+  (ERROR))

grammar.js

Lines changed: 19 additions & 17 deletions
@@ -1,3 +1,6 @@
+// https://tree-sitter.github.io/tree-sitter/creating-parsers
+// - Rules starting with underscore are hidden in the syntax tree.
+
 const _uppercase_word = /[A-Z0-9.()][-A-Z0-9.()_]+/;

 module.exports = grammar({
@@ -70,39 +73,42 @@ module.exports = grammar({
     )),

     // Text block/paragraph: adjacent lines followed by blank line(s).
-    block: ($) => prec.right(seq(
-      repeat1(choice($.line, $.line_li)),
-      repeat1(_blank()),
-      )
+    block: ($) => seq(
+      repeat1(choice($.line, $.line_li)),
+      choice(
+        token.immediate('<'), // Eat codeblock terminating char.
+        $._blank),
+      repeat($._blank),
     ),
     // Special case: last block in the document may not end with blank line (nor even EOL).
-    block_end: ($) => prec.right(choice(
+    block_end: ($) => choice(
       choice(
         alias($.line_noeol, $.line),
         alias($.line_li_noeol, $.line_li)),
       seq(repeat1(choice($.line, $.line_li)),
         choice(
           alias($.line_noeol, $.line),
-          alias($.line_li_noeol, $.line_li))))
+          alias($.line_li_noeol, $.line_li)))
     ),

-    // Code block: preformatted lines delimited by ">" and "<".
+    // Codeblock: preformatted block of lines starting with ">".
     codeblock: ($) => prec.right(seq(
       />[\t ]*\n/,
       repeat1(alias($.line_code, $.line)),
-      // Code block ends if a line starts with "<" or a non-empty line starts with a visible char.
-      token.immediate(choice(/<[\t ]*\n/, /[^\t\n ]/)),
+      // Codeblock ends if a line starts with non-whitespace.
+      // The terminating "<" is consumed in other rules.
     )),

     // Lines.
+    _blank: () => field('blank', /[\t ]*\n/),
     line: ($) => _line($, true),
     line_noeol: ($) => _line($, false),
     // Listitem line.
     line_li: ($) => seq(/[-*+][ ]+/, repeat1($._atom), '\n'),
     line_li_noeol: ($) => seq(/[-*+][ ]+/, repeat1($._atom)),
     // Codeblock lines: must be indented by at least 1 space/tab.
     // Line content (incl. whitespace) is captured as a single atom.
-    line_code: () => choice('\n', seq(/[\t ]+[^\n]+/, /\n/)),
+    line_code: () => choice('\n', /[\t ]+[^\n]+\n/),

     // "Column heading": plaintext followed by "~".
     // Intended for table column names per `:help help-writing`.
@@ -117,15 +123,15 @@ module.exports = grammar({
         token.immediate(field('delimiter', /============+[\t ]*\n/)),
         repeat1($._atom),
         '\n',
-        repeat(_blank()),
+        repeat($._blank),
       ),

     h2: ($) =>
       seq(
         token.immediate(field('delimiter', /------------+[\t ]*\n/)),
         repeat1($._atom),
         '\n',
-        repeat(_blank()),
+        repeat($._blank),
       ),

     // Heading 3: UPPERCASE NAME, followed by optional *tags*.
@@ -134,7 +140,7 @@ module.exports = grammar({
         field('name', $.uppercase_name),
         repeat($.tag),
         '\n',
-        repeat(_blank()),
+        repeat($._blank),
       ),

     tag: ($) => _word($,
@@ -185,7 +191,3 @@ function _line($, require_eol) {
     seq(optional($.uppercase_words), repeat1($._atom), choice($.codeblock, eol)),
   );
 }
-
-function _blank() {
-  return field('blank', /[\t ]*\n/);
-}
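
The token patterns touched by this diff can be tried in isolation with ordinary JavaScript `RegExp` objects. The snippet below is only a sanity check of the regexes themselves, anchored with `^` and `$` so each call tests one whole line. It does not exercise tree-sitter's lexer, precedence, or `token.immediate` handling, and the variable names are illustrative.

```javascript
// Patterns copied from grammar.js and anchored for standalone testing.
const lineCode = /^(?:\n|[\t ]+[^\n]+\n)$/;          // new `line_code` pattern
const blank = /^[\t ]*\n$/;                          // `_blank`
const uppercaseWord = /^[A-Z0-9.()][-A-Z0-9.()_]+$/; // `_uppercase_word`

console.log(lineCode.test('    :lua =jit.version\n')); // true: indented code line
console.log(lineCode.test(':lua =jit.version\n'));     // false: not indented
console.log(lineCode.test('\n'));                      // true: empty line inside a codeblock
console.log(blank.test(' \t\n'));                      // true: whitespace-only line
console.log(uppercaseWord.test('NOTE'));               // true
console.log(uppercaseWord.test('Note'));               // false: lowercase letters
console.log(uppercaseWord.test('A'));                  // false: needs at least two chars
```

Because an empty line matches `line_code`, a codeblock without a terminating `<` keeps absorbing blank lines, which is consistent with the "implicit stop" entry under Known issues in the README.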
