Skip to content

refactor(markdown-parser): promote list structural tokens from skipped trivia to explicit CST nodes#9274

Open
jfmcdowell wants to merge 1 commit intobiomejs:mainfrom
jfmcdowell:refactor/list-marker-prefix
Open

refactor(markdown-parser): promote list structural tokens from skipped trivia to explicit CST nodes#9274
jfmcdowell wants to merge 1 commit intobiomejs:mainfrom
jfmcdowell:refactor/list-marker-prefix

Conversation

@jfmcdowell
Copy link
Contributor

@jfmcdowell jfmcdowell commented Feb 28, 2026

Note

AI Assistance Disclosure: This PR was developed with assistance from Claude Code.

Summary

  • Promote list structural tokens (pre-marker indent, marker, post-marker space, content indent) from skipped trivia to explicit CST nodes, mirroring the MdQuotePrefix pattern established in Phase 1.
  • Introduce MdListMarkerPrefix, MdIndentToken, and MdIndentTokenList to the markdown grammar, making list indentation structure visible and traversable in the CST.
  • Replace skip_list_marker_indent() (which discarded whitespace as trivia) with emit_indent_char_list() that wraps each indent character in a proper node.
  • Emit MD_LIST_POST_MARKER_SPACE as an explicit token instead of silently consuming it.
  • Remove trim_range() — a legacy workaround that tried to normalize node ranges by stripping leading/trailing whitespace. With structural tokens now in the CST, raw node ranges are correct by construction.
  • Fix to_html.rs to navigate the new CST shape (bullet.prefix().marker() instead of bullet.bullet()) and correctly handle leading newlines in list item rendering.
  • Add verbatim formatter stubs for all new node types.

Test Plan

  • cargo test -p biome_markdown_parser
  • just test-markdown-conformance

Results:

  • Parser tests pass (129 total, 0 failures)
  • CommonMark conformance passes (652/652, 100%)

Docs

N/A — internal parser refactor with no user-facing behavior change.

@changeset-bot
Copy link

changeset-bot bot commented Feb 28, 2026

⚠️ No Changeset found

Latest commit: eeedde8

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions github-actions bot added A-Parser Area: parser A-Formatter Area: formatter A-Tooling Area: internal tools labels Feb 28, 2026
@jfmcdowell jfmcdowell force-pushed the refactor/list-marker-prefix branch 3 times, most recently from c65ad0f to c31f169 Compare February 28, 2026 14:53
@jfmcdowell jfmcdowell changed the title fix(markdown-parser): restore metadata range contract for list/quote rendering refactor(markdown-parser): promote list structural tokens from skipped trivia to explicit CST nodes Feb 28, 2026
…d trivia to explicit CST nodes

Introduce MdListMarkerPrefix to wrap list marker structure (pre-marker
indent, marker, post-marker space, content indent) as real CST nodes
instead of skipped trivia. This mirrors the MdQuotePrefix pattern from
Phase 1 and makes list structure visible to the formatter harness.

- Add MdListMarkerPrefix, MdIndentToken, MdIndentTokenList to grammar
- Replace skip_list_marker_indent() with emit_indent_char_list()
- Emit MD_LIST_POST_MARKER_SPACE as an explicit token
- Promote marker-only line newline to MdNewline node
- Remove trim_range(); use raw node ranges for metadata recording
- Fix to_html.rs renderer to handle new CST shape correctly
- Add verbatim formatter stubs for new node types
@jfmcdowell jfmcdowell force-pushed the refactor/list-marker-prefix branch from c31f169 to eeedde8 Compare February 28, 2026 15:39
@jfmcdowell jfmcdowell marked this pull request as ready for review February 28, 2026 15:41
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 28, 2026

Walkthrough

This pull request introduces structured representation of Markdown indentation and list marker prefixes. The parser changes convert previously skipped range trivia into explicit CST nodes (MdIndentToken, MdListMarkerPrefix, MdIndentTokenList), refactoring list item parsing to emit these nodes directly. The grammar is updated to decompose list bullets into prefix structures containing pre-marker indent, marker, post-marker space, and content indent. Corresponding formatter implementations and HTML generation logic are updated to handle these new node types.

Possibly related PRs

Suggested reviewers

  • ematipico
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: promoting list structural tokens from trivia to explicit CST nodes, which is reflected throughout the changeset.
Description check ✅ Passed The description comprehensively relates to the changeset, detailing the promotion of structural tokens, new grammar nodes, refactored parsing logic, and testing results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/list.rs`:
- Around line 750-757: The code currently skips emitting content indent when
first_line_empty is true, which leaves extra spaces before NEWLINE (e.g. "-  
\n") and prevents handle_first_line_marker_only from seeing NEWLINE; update the
condition in the block around emit_indent_char_list so that whenever
!setext_marker && spaces_after_marker > 1 you call emit_indent_char_list(p, 0)
(remove the first_line_empty requirement), and make the same change in the
corresponding block around lines 1015-1020; this ensures remaining whitespace is
emitted as MD_INDENT_TOKEN_LIST tokens so handle_first_line_marker_only (and
NEWLINE detection) works correctly.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 412a08d and eeedde8.

⛔ Files ignored due to path filters (17)
  • crates/biome_markdown_factory/src/generated/node_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_factory/src/generated/syntax_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/lazy_continuation.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/list_continuation_edge_cases.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/list_indentation.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/list_interrupt_bullet.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/list_interrupt_ordered.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/list_tightness.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/multiline_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/ordered_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph_interruption.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_edge_cases.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_syntax/src/generated/kind.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/macros.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/nodes.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/nodes_mut.rs is excluded by !**/generated/**, !**/generated/** and included by **
📒 Files selected for processing (11)
  • crates/biome_markdown_formatter/src/generated.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/indent_token.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/list_marker_prefix.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/mod.rs
  • crates/biome_markdown_formatter/src/markdown/lists/indent_token_list.rs
  • crates/biome_markdown_formatter/src/markdown/lists/mod.rs
  • crates/biome_markdown_parser/src/parser.rs
  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/to_html.rs
  • xtask/codegen/markdown.ungram
  • xtask/codegen/src/markdown_kinds_src.rs
💤 Files with no reviewable changes (1)
  • crates/biome_markdown_parser/src/parser.rs

Comment on lines +750 to +757
// Content indent (remaining whitespace tokens on first line)
if !setext_marker && !first_line_empty && spaces_after_marker > 1 {
emit_indent_char_list(p, 0);
} else {
// Empty first line or no content indent -- emit empty MdIndentTokenList
let empty_m = p.start();
empty_m.complete(p, MD_INDENT_TOKEN_LIST);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Marker-only lines with extra spaces can miss the newline path.

On Line 751 and Line 1015, content indent is skipped when first_line_empty is true. For inputs like - \n, only one post-marker space is consumed, leaving remaining whitespace before NEWLINE. Because handle_first_line_marker_only on Line 1540 requires being at NEWLINE, marker-only handling can be bypassed.

💡 Suggested fix
-    if !setext && !first_line_empty && spaces_after_marker > 1 {
+    if !setext && spaces_after_marker > 1 {
         emit_indent_char_list(p, 0);
     } else {
         // Empty first line or no content indent -- emit empty MdIndentTokenList
         let empty_m = p.start();
         empty_m.complete(p, MD_INDENT_TOKEN_LIST);
     }
-    if !first_line_empty && spaces_after_marker > 1 {
+    if spaces_after_marker > 1 {
         emit_indent_char_list(p, 0);
     } else {
         let empty_m = p.start();
         empty_m.complete(p, MD_INDENT_TOKEN_LIST);
     }

Also applies to: 1015-1020

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/list.rs` around lines 750 - 757, The
code currently skips emitting content indent when first_line_empty is true,
which leaves extra spaces before NEWLINE (e.g. "-   \n") and prevents
handle_first_line_marker_only from seeing NEWLINE; update the condition in the
block around emit_indent_char_list so that whenever !setext_marker &&
spaces_after_marker > 1 you call emit_indent_char_list(p, 0) (remove the
first_line_empty requirement), and make the same change in the corresponding
block around lines 1015-1020; this ensures remaining whitespace is emitted as
MD_INDENT_TOKEN_LIST tokens so handle_first_line_marker_only (and NEWLINE
detection) works correctly.

Copy link
Contributor

@dyc3 dyc3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a glance this looks like you're going in the right direction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Formatter Area: formatter A-Parser Area: parser A-Tooling Area: internal tools

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants