refactor(markdown-parser): promote list structural tokens from skipped trivia to explicit CST nodes #9274
c65ad0f to
c31f169
Compare
…d trivia to explicit CST nodes

Introduce MdListMarkerPrefix to wrap list marker structure (pre-marker indent, marker, post-marker space, content indent) as real CST nodes instead of skipped trivia. This mirrors the MdQuotePrefix pattern from Phase 1 and makes list structure visible to the formatter harness.

- Add MdListMarkerPrefix, MdIndentToken, MdIndentTokenList to grammar
- Replace skip_list_marker_indent() with emit_indent_char_list()
- Emit MD_LIST_POST_MARKER_SPACE as an explicit token
- Promote marker-only line newline to MdNewline node
- Remove trim_range(); use raw node ranges for metadata recording
- Fix to_html.rs renderer to handle new CST shape correctly
- Add verbatim formatter stubs for new node types
c31f169 to
eeedde8
Compare
Walkthrough
This pull request introduces a structured representation of Markdown indentation and list marker prefixes. The parser changes convert previously skipped range trivia into explicit CST nodes (MdIndentToken, MdListMarkerPrefix, MdIndentTokenList) and refactor list item parsing to emit these nodes directly. The grammar is updated to decompose list bullets into prefix structures containing pre-marker indent, marker, post-marker space, and content indent. The corresponding formatter implementations and HTML generation logic are updated to handle the new node types.
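To make the four-part decomposition concrete, here is a minimal standalone sketch that splits the first line of a bullet item into pre-marker indent, marker, post-marker space, and content indent. It is not the Biome parser: `ListMarkerPrefix` and `split_bullet_prefix` are hypothetical names, the function works on plain strings rather than CST nodes, and it ignores ordered lists, thematic breaks, and the CommonMark rule for content indented by five or more spaces.

```rust
/// Hypothetical, simplified model of the four prefix parts named in the
/// walkthrough. The real CST nodes are generated from the grammar; this
/// struct only mirrors their roles.
#[derive(Debug, PartialEq, Eq)]
pub struct ListMarkerPrefix<'a> {
    pub pre_marker_indent: &'a str,
    pub marker: &'a str,
    pub post_marker_space: &'a str,
    pub content_indent: &'a str,
}

/// Splits the first line of a bullet list item into its prefix and the
/// remaining content. Returns `None` if the line does not look like a
/// bullet item under this simplified model.
pub fn split_bullet_prefix(line: &str) -> Option<(ListMarkerPrefix<'_>, &str)> {
    // Leading spaces before the marker.
    let indent_len = line.len() - line.trim_start_matches(' ').len();
    // CommonMark allows at most three spaces of indentation before a marker.
    if indent_len > 3 {
        return None;
    }
    let (pre_marker_indent, rest) = line.split_at(indent_len);

    // Only the single-character bullet markers are handled here.
    let first = rest.chars().next()?;
    if !matches!(first, '-' | '*' | '+') {
        return None;
    }
    let (marker, rest) = rest.split_at(1);

    // The marker must be followed by a space or by the end of the line.
    let space_len = if rest.starts_with(' ') { 1 } else { 0 };
    if space_len == 0 && !(rest.is_empty() || rest.starts_with('\n')) {
        return None;
    }
    let (post_marker_space, rest) = rest.split_at(space_len);

    // Any further spaces are kept as content indent instead of being dropped.
    let extra = rest.len() - rest.trim_start_matches(' ').len();
    let (content_indent, content) = rest.split_at(extra);

    Some((
        ListMarkerPrefix {
            pre_marker_indent,
            marker,
            post_marker_space,
            content_indent,
        },
        content,
    ))
}
```

Because every character of the line ends up in exactly one part, concatenating the four prefix fields and the returned content reproduces the input, which is the property that makes the structure visible to a formatter.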
Pre-merge checks: 2 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_markdown_parser/src/syntax/list.rs`:
- Around line 750-757: The code currently skips emitting content indent when
first_line_empty is true, which leaves extra spaces before NEWLINE (e.g. "-  \n",
a marker followed by two or more spaces) and prevents
handle_first_line_marker_only from seeing NEWLINE; update the
condition in the block around emit_indent_char_list so that whenever
!setext_marker && spaces_after_marker > 1 you call emit_indent_char_list(p, 0)
(remove the first_line_empty requirement), and make the same change in the
corresponding block around lines 1015-1020; this ensures remaining whitespace is
emitted as MD_INDENT_TOKEN_LIST tokens so handle_first_line_marker_only (and
NEWLINE detection) works correctly.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (17)
- crates/biome_markdown_factory/src/generated/node_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
- crates/biome_markdown_factory/src/generated/syntax_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/lazy_continuation.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/list_continuation_edge_cases.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/list_indentation.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/list_interrupt_bullet.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/list_interrupt_ordered.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/list_tightness.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/multiline_list.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/ordered_list.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph_interruption.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_edge_cases.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_syntax/src/generated/kind.rs is excluded by !**/generated/**, !**/generated/** and included by **
- crates/biome_markdown_syntax/src/generated/macros.rs is excluded by !**/generated/**, !**/generated/** and included by **
- crates/biome_markdown_syntax/src/generated/nodes.rs is excluded by !**/generated/**, !**/generated/** and included by **
- crates/biome_markdown_syntax/src/generated/nodes_mut.rs is excluded by !**/generated/**, !**/generated/** and included by **
📒 Files selected for processing (11)
- crates/biome_markdown_formatter/src/generated.rs
- crates/biome_markdown_formatter/src/markdown/auxiliary/indent_token.rs
- crates/biome_markdown_formatter/src/markdown/auxiliary/list_marker_prefix.rs
- crates/biome_markdown_formatter/src/markdown/auxiliary/mod.rs
- crates/biome_markdown_formatter/src/markdown/lists/indent_token_list.rs
- crates/biome_markdown_formatter/src/markdown/lists/mod.rs
- crates/biome_markdown_parser/src/parser.rs
- crates/biome_markdown_parser/src/syntax/list.rs
- crates/biome_markdown_parser/src/to_html.rs
- xtask/codegen/markdown.ungram
- xtask/codegen/src/markdown_kinds_src.rs
💤 Files with no reviewable changes (1)
- crates/biome_markdown_parser/src/parser.rs
    // Content indent (remaining whitespace tokens on first line)
    if !setext_marker && !first_line_empty && spaces_after_marker > 1 {
        emit_indent_char_list(p, 0);
    } else {
        // Empty first line or no content indent -- emit empty MdIndentTokenList
        let empty_m = p.start();
        empty_m.complete(p, MD_INDENT_TOKEN_LIST);
    }
Marker-only lines with extra spaces can miss the newline path.
On lines 751 and 1015, content indent is skipped when first_line_empty is true. For an input such as "-  \n" (a marker followed by two or more spaces), only the single post-marker space is consumed, so the remaining whitespace still sits before NEWLINE. Because handle_first_line_marker_only (line 1540) requires the parser to be at NEWLINE, the marker-only handling can be bypassed.
💡 Suggested fix
First site (around lines 750-757):
    - if !setext_marker && !first_line_empty && spaces_after_marker > 1 {
    + if !setext_marker && spaces_after_marker > 1 {
          emit_indent_char_list(p, 0);
      } else {
          // Empty first line or no content indent -- emit empty MdIndentTokenList
          let empty_m = p.start();
          empty_m.complete(p, MD_INDENT_TOKEN_LIST);
      }
Second site (around lines 1015-1020):
    - if !first_line_empty && spaces_after_marker > 1 {
    + if spaces_after_marker > 1 {
          emit_indent_char_list(p, 0);
      } else {
          let empty_m = p.start();
          empty_m.complete(p, MD_INDENT_TOKEN_LIST);
      }
Also applies to: 1015-1020
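The finding above can be reproduced with a toy model that tracks only the cursor position after the marker. This is a sketch under stated assumptions, not the parser's code: `at_newline_after_prefix` and its `skip_indent_when_empty` flag are hypothetical, the marker is assumed to be a single bullet character at column 0, and the setext_marker condition is left out.

```rust
/// Toy model of the prefix-consumption logic discussed in the review.
/// Returns true if, after consuming the marker, the post-marker space and
/// (optionally) the content indent, the cursor sits on a newline.
///
/// `skip_indent_when_empty == true` models the reviewed condition, which
/// includes `!first_line_empty`; `false` models the suggested fix.
pub fn at_newline_after_prefix(line: &str, skip_indent_when_empty: bool) -> bool {
    // Assumption: the line starts with a one-character bullet marker.
    let after_marker = match line.strip_prefix(&['-', '*', '+'][..]) {
        Some(rest) => rest,
        None => return false,
    };

    // Count every space between the marker and the next non-space character.
    let spaces_after_marker =
        after_marker.len() - after_marker.trim_start_matches(' ').len();
    let tail = &after_marker[spaces_after_marker..];
    let first_line_empty = tail.is_empty() || tail.starts_with('\n');

    // Exactly one space is consumed as the post-marker space, if present.
    let mut pos = if spaces_after_marker >= 1 { 1 } else { 0 };

    // Content indent is emitted only when the condition allows it.
    let emit_content_indent =
        spaces_after_marker > 1 && !(skip_indent_when_empty && first_line_empty);
    if emit_content_indent {
        pos = spaces_after_marker;
    }

    after_marker[pos..].starts_with('\n')
}
```

With the reviewed condition, a marker followed by three spaces and a newline leaves two spaces unconsumed, so the model reports that the cursor is not at the newline; dropping the `first_line_empty` requirement consumes them.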
dyc3
left a comment
At a glance this looks like you're going in the right direction.
Note
AI Assistance Disclosure: This PR was developed with assistance from Claude Code.
Summary
- Promotes list structural tokens from skipped trivia to explicit CST nodes, mirroring the `MdQuotePrefix` pattern established in Phase 1.
- Adds `MdListMarkerPrefix`, `MdIndentToken`, and `MdIndentTokenList` to the markdown grammar, making list indentation structure visible and traversable in the CST.
- Replaces `skip_list_marker_indent()` (which discarded whitespace as trivia) with `emit_indent_char_list()`, which wraps each indent character in a proper node.
- Emits `MD_LIST_POST_MARKER_SPACE` as an explicit token instead of silently consuming it.
- Removes `trim_range()`, a legacy workaround that tried to normalize node ranges by stripping leading/trailing whitespace. With structural tokens now in the CST, raw node ranges are correct by construction.
- Fixes `to_html.rs` to navigate the new CST shape (`bullet.prefix().marker()` instead of `bullet.bullet()`) and to correctly handle leading newlines in list item rendering.

Test Plan
- `cargo test -p biome_markdown_parser`
- `just test-markdown-conformance`

Results:
- 129 total, 0 failures
- 652/652, 100%

Docs
N/A: internal parser refactor with no user-facing behavior change.
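The claim in the Summary that raw node ranges become "correct by construction" once whitespace is tokenized can be illustrated with a small sketch. Both functions are hypothetical: `trim_whitespace_range` only approximates what a trim-style helper might do (the removed `trim_range()` is not shown in this PR page), and `token_ranges` stands in for the offsets a CST assigns to consecutive tokens.

```rust
use std::ops::Range;

/// Hypothetical trim-style helper: shrinks `range` so that it excludes
/// leading and trailing spaces or tabs of `text[range]`.
pub fn trim_whitespace_range(text: &str, range: Range<usize>) -> Range<usize> {
    let is_ws = |c: char| c == ' ' || c == '\t';
    let slice = &text[range.clone()];
    let lead = slice.len() - slice.trim_start_matches(is_ws).len();
    if lead == slice.len() {
        // The slice is entirely whitespace: collapse to an empty range.
        return range.start..range.start;
    }
    let trail = slice.len() - slice.trim_end_matches(is_ws).len();
    (range.start + lead)..(range.end - trail)
}

/// Assigns consecutive byte ranges to a sequence of token texts, the way
/// explicit tokens occupy adjacent spans in a lossless tree.
pub fn token_ranges(tokens: &[&str]) -> Vec<Range<usize>> {
    let mut offset = 0;
    tokens
        .iter()
        .map(|t| {
            let r = offset..offset + t.len();
            offset += t.len();
            r
        })
        .collect()
}
```

For the line `  -   item`, tokenizing it as pre-marker indent, marker, post-marker space, content indent, and content gives the content token a range that already starts at the first non-space character, so no after-the-fact trimming of a wider range is needed.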