refactor(markdown-parser): promote list structural tokens from skipped trivia to explicit CST nodes by jfmcdowell · Pull Request #9274 · biomejs/biome

jfmcdowell · 2026-02-28T14:24:45Z

Note

AI Assistance Disclosure: This PR was developed with assistance from Claude Code.

Summary

Promote list structural tokens (pre-marker indent, marker, post-marker space, content indent) from skipped trivia to explicit CST nodes, mirroring the MdQuotePrefix pattern established in Phase 1.
Introduce MdListMarkerPrefix, MdIndentToken, and MdIndentTokenList to the markdown grammar, making list indentation structure visible and traversable in the CST.
Replace skip_list_marker_indent() (which discarded whitespace as trivia) with emit_indent_char_list() that wraps each indent character in a proper node.
Emit MD_LIST_POST_MARKER_SPACE as an explicit token instead of silently consuming it.
Remove trim_range() — a legacy workaround that tried to normalize node ranges by stripping leading/trailing whitespace. With structural tokens now in the CST, raw node ranges are correct by construction.
Fix to_html.rs to navigate the new CST shape (bullet.prefix().marker() instead of bullet.bullet()) and correctly handle leading newlines in list item rendering.
Add verbatim formatter stubs for all new node types.
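
As a rough illustration of the four structural pieces named above, the sketch below splits a single bullet line into pre-marker indent, marker, post-marker space, and content indent. It is not the Biome parser: `ListMarkerPrefix`, `split_bullet_prefix`, and `reassemble` are hypothetical names invented for this example, and the sketch assumes that all whitespace after the first post-marker character counts as content indent. It also ignores ordered markers, thematic breaks, and the four-space code-block rule.

```rust
/// Hypothetical, simplified view of the pieces of a bullet list prefix.
/// These field names are illustrative, not the generated Biome node API.
#[derive(Debug, PartialEq)]
struct ListMarkerPrefix<'a> {
    pre_marker_indent: &'a str,
    marker: &'a str,
    post_marker_space: &'a str,
    content_indent: &'a str,
    content: &'a str,
}

impl<'a> ListMarkerPrefix<'a> {
    /// Concatenates every piece in source order. Because nothing is
    /// skipped, the result equals the original line.
    fn reassemble(&self) -> String {
        format!(
            "{}{}{}{}{}",
            self.pre_marker_indent,
            self.marker,
            self.post_marker_space,
            self.content_indent,
            self.content
        )
    }
}

fn is_indent(c: char) -> bool {
    c == ' ' || c == '\t'
}

/// Splits one line into its prefix pieces, or returns `None` when the line
/// does not start with a bullet marker followed by whitespace or end of line.
fn split_bullet_prefix(line: &str) -> Option<ListMarkerPrefix<'_>> {
    // Leading whitespace before the marker.
    let rest = line.trim_start_matches(is_indent);
    let pre_marker_indent = &line[..line.len() - rest.len()];

    // The marker itself: one of '-', '*', '+'.
    let marker_char = rest.chars().next()?;
    if !matches!(marker_char, '-' | '*' | '+') {
        return None;
    }
    let (marker, rest) = rest.split_at(1);

    // Marker-only line: no post-marker space and no content.
    if rest.is_empty() {
        return Some(ListMarkerPrefix {
            pre_marker_indent,
            marker,
            post_marker_space: "",
            content_indent: "",
            content: "",
        });
    }

    // Text directly after the marker (e.g. "-foo") is not a list item.
    if !rest.starts_with(is_indent) {
        return None;
    }

    // Exactly one whitespace character is the post-marker space; any
    // further whitespace is treated as content indent in this sketch.
    let (post_marker_space, rest) = rest.split_at(1);
    let content = rest.trim_start_matches(is_indent);
    let content_indent = &rest[..rest.len() - content.len()];

    Some(ListMarkerPrefix {
        pre_marker_indent,
        marker,
        post_marker_space,
        content_indent,
        content,
    })
}
```

Calling `split_bullet_prefix("  -   foo")` yields a two-space pre-marker indent, the marker `-`, a single post-marker space, a two-space content indent, and the content `foo`. Since every character belongs to exactly one piece, `reassemble()` returns the input unchanged, which is the property that makes a range-trimming workaround unnecessary in this toy model.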

Test Plan

cargo test -p biome_markdown_parser
just test-markdown-conformance

Results:

Parser tests pass (129 total, 0 failures)
CommonMark conformance passes (652/652, 100%)

Docs

N/A — internal parser refactor with no user-facing behavior change.

changeset-bot · 2026-02-28T14:24:49Z

⚠️ No Changeset found

Latest commit: 2bbbff0

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


…d trivia to explicit CST nodes

Introduce MdListMarkerPrefix to wrap list marker structure (pre-marker indent, marker, post-marker space, content indent) as real CST nodes instead of skipped trivia. This mirrors the MdQuotePrefix pattern from Phase 1 and makes list structure visible to the formatter harness.

- Add MdListMarkerPrefix, MdIndentToken, MdIndentTokenList to grammar
- Replace skip_list_marker_indent() with emit_indent_char_list()
- Emit MD_LIST_POST_MARKER_SPACE as an explicit token
- Promote marker-only line newline to MdNewline node
- Remove trim_range(); use raw node ranges for metadata recording
- Fix to_html.rs renderer to handle new CST shape correctly
- Add verbatim formatter stubs for new node types

coderabbitai · 2026-02-28T15:48:45Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a68fa06 and 2bbbff0.

⛔ Files ignored due to path filters (1)

crates/biome_markdown_parser/tests/md_test_suite/ok/list_marker_trailing_spaces.md.snap is excluded by !**/*.snap and included by **

📒 Files selected for processing (1)

crates/biome_markdown_parser/tests/md_test_suite/ok/list_marker_trailing_spaces.md

✅ Files skipped from review due to trivial changes (1)

crates/biome_markdown_parser/tests/md_test_suite/ok/list_marker_trailing_spaces.md

Walkthrough

This PR restructures Markdown list indentation handling by introducing a dedicated indent token system. Three new syntax nodes (MdIndentToken, MdListMarkerPrefix, MdIndentTokenList) are added to the grammar, along with corresponding formatter implementations and parser logic. The list parser now explicitly emits these nodes instead of relying on implicit trivia handling, and HTML generation is adjusted to handle the new marker structure and edge cases involving leading newlines.

Possibly related PRs

PR #8962 — Provides the formatter wiring foundation (generated.rs) that this PR directly extends with new trait implementations
PR #9228 — Modifies the same list parser file (syntax/list.rs) to refactor marker and indentation emission logic
PR #9224 — Implements an analogous indent token pattern for block quote prefixes, establishing parity across Markdown constructs

Suggested reviewers

ematipico
dyc3

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarises the primary change: promoting list structural tokens from trivia to explicit CST nodes, which aligns perfectly with the substantial refactoring across parser, formatter and grammar files.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, outlining the refactoring goals, implementation strategy, test results and the removal of legacy workarounds.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/list.rs`:
- Around line 750-757: The code currently skips emitting content indent when
first_line_empty is true, which leaves extra spaces before NEWLINE (e.g. "-  
\n") and prevents handle_first_line_marker_only from seeing NEWLINE; update the
condition in the block around emit_indent_char_list so that whenever
!setext_marker && spaces_after_marker > 1 you call emit_indent_char_list(p, 0)
(remove the first_line_empty requirement), and make the same change in the
corresponding block around lines 1015-1020; this ensures remaining whitespace is
emitted as MD_INDENT_TOKEN_LIST tokens so handle_first_line_marker_only (and
NEWLINE detection) works correctly.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 412a08d and eeedde8.

⛔ Files ignored due to path filters (17)

crates/biome_markdown_factory/src/generated/node_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_markdown_factory/src/generated/syntax_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/lazy_continuation.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/list_continuation_edge_cases.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/list_indentation.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/list_interrupt_bullet.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/list_interrupt_ordered.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/list_tightness.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/multiline_list.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/ordered_list.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph_interruption.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_edge_cases.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_syntax/src/generated/kind.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_markdown_syntax/src/generated/macros.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_markdown_syntax/src/generated/nodes.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_markdown_syntax/src/generated/nodes_mut.rs is excluded by !**/generated/**, !**/generated/** and included by **

📒 Files selected for processing (11)

crates/biome_markdown_formatter/src/generated.rs
crates/biome_markdown_formatter/src/markdown/auxiliary/indent_token.rs
crates/biome_markdown_formatter/src/markdown/auxiliary/list_marker_prefix.rs
crates/biome_markdown_formatter/src/markdown/auxiliary/mod.rs
crates/biome_markdown_formatter/src/markdown/lists/indent_token_list.rs
crates/biome_markdown_formatter/src/markdown/lists/mod.rs
crates/biome_markdown_parser/src/parser.rs
crates/biome_markdown_parser/src/syntax/list.rs
crates/biome_markdown_parser/src/to_html.rs
xtask/codegen/markdown.ungram
xtask/codegen/src/markdown_kinds_src.rs

💤 Files with no reviewable changes (1)

crates/biome_markdown_parser/src/parser.rs

crates/biome_markdown_parser/src/syntax/list.rs

dyc3

At a glance this looks like you're going in the right direction.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

crates/biome_markdown_parser/src/syntax/list.rs (1)

739-747: Keep bullet and ordered first_line_empty detection aligned.

The ordered path treats MD_HARD_LINE_LITERAL as an empty first line, but the bullet path does not. Aligning them avoids subtle divergence in marker-only handling.

Suggested fix

     let first_line_empty = if setext_marker {
         true
     } else {
         p.lookahead(|p| {
             while p.at(MD_TEXTUAL_LITERAL) && is_whitespace_only(p.cur_text()) {
                 p.bump(MD_TEXTUAL_LITERAL);
             }
-            p.at(NEWLINE) || p.at(T![EOF])
+            p.at(NEWLINE) || p.at(T![EOF]) || p.at(MD_HARD_LINE_LITERAL)
         })
     };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/list.rs` around lines 739 - 747, The
bullet-list branch computing first_line_empty diverges from the ordered-list
path by overlooking MD_HARD_LINE_LITERAL; update the lookahead in the bullet
path (the code setting first_line_empty) to treat MD_HARD_LINE_LITERAL as empty
the same way the ordered path does — i.e., inside the p.lookahead closure,
additionally consider p.at(MD_HARD_LINE_LITERAL) as a terminating/empty
condition (or skip/bump it similarly to MD_TEXTUAL_LITERAL) so that
is_whitespace_only and subsequent p.bump calls handle marker-only lines
consistently; reference the variable first_line_empty, the p.lookahead closure,
MD_TEXTUAL_LITERAL, MD_HARD_LINE_LITERAL, is_whitespace_only, and p.bump when
making the change.
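
The alignment requested in this nitpick can be pictured with a toy token stream. The sketch below is an assumption-laden model, not the parser's code: the `Tok` enum and the `first_line_empty` function over a slice are hypothetical stand-ins for the real token kinds and the `p.lookahead` closure.

```rust
/// Hypothetical token kinds standing in for the parser's real ones
/// (whitespace-only MD_TEXTUAL_LITERAL, MD_HARD_LINE_LITERAL, NEWLINE, EOF).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Whitespace,
    Text,
    HardLine,
    Newline,
    Eof,
}

/// Returns true when nothing but whitespace sits between the list marker
/// and the end of the line. A hard line break counts as an empty first
/// line, which is the behavior the review asks both list paths to share.
fn first_line_empty(after_marker: &[Tok]) -> bool {
    // Skip whitespace-only tokens, then inspect the first remaining token.
    // Running out of tokens is treated like reaching EOF.
    let next = after_marker
        .iter()
        .copied()
        .find(|t| *t != Tok::Whitespace)
        .unwrap_or(Tok::Eof);
    matches!(next, Tok::Newline | Tok::Eof | Tok::HardLine)
}
```

With this model, `[Whitespace, HardLine]` and `[Whitespace, Newline]` both count as empty first lines, while `[Whitespace, Text, Newline]` does not. Sharing one such predicate between the bullet and ordered paths would be one way to keep them from drifting apart.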

crates/biome_markdown_parser/tests/spec_test.rs (1)

214-236: Consider extracting the shared parse/assert harness.

check and the bullet test body repeat the same parse → validate → render pattern. A tiny helper would keep future edge-case tests easier to add and maintain.

Also applies to: 263-290

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/tests/spec_test.rs` around lines 214 - 236,
Extract the repeated parse→validate→render logic into a small helper (e.g., fn
render_checked(input: &str) -> String) that runs parse_markdown(input), checks
!root.syntax().descendants().any(|n| n.kind().is_bogus()), asserts
root.diagnostics().is_empty(), casts with MdDocument::cast(root.syntax()), calls
document_to_html(&doc, root.list_tightness(), root.list_item_indents(),
root.quote_indents()), and returns the resulting HTML string; then update the
existing check function and the other test bodies to call this helper and only
perform the assert_eq!(expected_html, html, "...") so the parsing/validation
code isn't duplicated.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/list.rs`:
- Around line 163-182: emit_indent_char_list currently counts each '\t' as a
fixed TAB_STOP_SPACES which is wrong when preceding spaces exist; change the tab
width calculation so each tab expands to the next tab stop relative to the
current column (tab_width = TAB_STOP_SPACES - (consumed % TAB_STOP_SPACES)), use
that computed width when checking the max_columns cap and when adding to
consumed, and leave the token emission (p.bump_remap/M...complete) logic
unchanged; update the width computation in the loop inside emit_indent_char_list
accordingly.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eeedde8 and 726e321.

📒 Files selected for processing (2)

crates/biome_markdown_parser/src/syntax/list.rs
crates/biome_markdown_parser/tests/spec_test.rs

coderabbitai · 2026-02-28T20:01:15Z

crates/biome_markdown_parser/src/syntax/list.rs

+fn emit_indent_char_list(p: &mut MarkdownParser, max_columns: usize) -> usize {
+    let list_m = p.start();
+    let mut consumed = 0usize;
+    while p.at(MD_TEXTUAL_LITERAL) && is_whitespace_only(p.cur_text()) {
+        let text = p.cur_text();
+        let width: usize = text
+            .chars()
+            .map(|c| if c == '\t' { TAB_STOP_SPACES } else { 1 })
+            .sum();
+        if max_columns > 0 && consumed + width > max_columns {
+            break;
+        }
+        consumed += width;
+        let char_m = p.start();
+        p.bump_remap(MD_INDENT_CHAR);
+        char_m.complete(p, MD_INDENT_TOKEN);
+    }
+    list_m.complete(p, MD_INDENT_TOKEN_LIST);
+    consumed
+}


⚠️ Potential issue | 🟡 Minor

emit_indent_char_list miscomputes tab columns after preceding spaces.

On Line 170, each tab is counted as a fixed TAB_STOP_SPACES, but tab expansion depends on the current column. For inputs like " \t", this overcounts columns and can break max_columns gating when a cap is used.

Suggested fix

 fn emit_indent_char_list(p: &mut MarkdownParser, max_columns: usize) -> usize {
     let list_m = p.start();
     let mut consumed = 0usize;
     while p.at(MD_TEXTUAL_LITERAL) && is_whitespace_only(p.cur_text()) {
         let text = p.cur_text();
-        let width: usize = text
-            .chars()
-            .map(|c| if c == '\t' { TAB_STOP_SPACES } else { 1 })
-            .sum();
+        let mut width = 0usize;
+        for c in text.chars() {
+            width += if c == '\t' {
+                TAB_STOP_SPACES - ((consumed + width) % TAB_STOP_SPACES)
+            } else {
+                1
+            };
+        }
         if max_columns > 0 && consumed + width > max_columns {
             break;
         }
         consumed += width;
         let char_m = p.start();
         p.bump_remap(MD_INDENT_CHAR);
         char_m.complete(p, MD_INDENT_TOKEN);
     }
     list_m.complete(p, MD_INDENT_TOKEN_LIST);
     consumed
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/list.rs` around lines 163 - 182, emit_indent_char_list currently counts each '\t' as a fixed TAB_STOP_SPACES which is wrong when preceding spaces exist; change the tab width calculation so each tab expands to the next tab stop relative to the current column (tab_width = TAB_STOP_SPACES - (consumed % TAB_STOP_SPACES)), use that computed width when checking the max_columns cap and when adding to consumed, and leave the token emission (p.bump_remap/M...complete) logic unchanged; update the width computation in the loop inside emit_indent_char_list accordingly.
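
The difference between fixed-width and column-aware tab counting described in this comment can be checked in isolation. The following is a minimal standalone sketch, not the parser's code: `advance_columns` and `naive_width` are hypothetical helper names, and `TAB_STOP_SPACES` is assumed to be 4, matching CommonMark's tab stop.

```rust
/// Tab stop width assumed for this sketch (CommonMark uses 4).
const TAB_STOP_SPACES: usize = 4;

/// Fixed-width counting: every tab is worth a full tab stop,
/// regardless of the column it starts in.
fn naive_width(text: &str) -> usize {
    text.chars()
        .map(|c| if c == '\t' { TAB_STOP_SPACES } else { 1 })
        .sum()
}

/// Column-aware counting: a tab advances to the next multiple of
/// TAB_STOP_SPACES, so its width depends on the current column.
/// Returns the column reached after consuming `text` from `start_col`.
fn advance_columns(start_col: usize, text: &str) -> usize {
    let mut col = start_col;
    for c in text.chars() {
        col += if c == '\t' {
            TAB_STOP_SPACES - (col % TAB_STOP_SPACES)
        } else {
            1
        };
    }
    col
}
```

For the input `" \t"` from the comment, `naive_width` reports 5 columns while `advance_columns(0, " \t")` reports 4: the space moves to column 1 and the tab then covers only the 3 columns remaining before the next stop. With a `max_columns` cap of 4, the fixed-width count would stop early even though the whitespace fits.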

Replace programmatic assertions with a proper snapshot fixture for marker-only list items with trailing spaces. This aligns with the project's existing test convention where CST shape is validated via insta snapshots rather than ad-hoc assertions.

github-actions bot added A-Parser Area: parser A-Formatter Area: formatter A-Tooling Area: internal tools labels Feb 28, 2026

jfmcdowell force-pushed the refactor/list-marker-prefix branch 3 times, most recently from c65ad0f to c31f169 Compare February 28, 2026 14:53

jfmcdowell changed the title ~~fix(markdown-parser): restore metadata range contract for list/quote rendering~~ refactor(markdown-parser): promote list structural tokens from skipped trivia to explicit CST nodes Feb 28, 2026

jfmcdowell force-pushed the refactor/list-marker-prefix branch from c31f169 to eeedde8 Compare February 28, 2026 15:39

jfmcdowell marked this pull request as ready for review February 28, 2026 15:41

coderabbitai bot reviewed Feb 28, 2026


dyc3 approved these changes Feb 28, 2026


fix(markdown-parser): handle hard-line marker-only list items

726e321

coderabbitai bot reviewed Feb 28, 2026


autofix-ci bot and others added 2 commits February 28, 2026 20:03

[autofix.ci] apply automated fixes

a68fa06

jfmcdowell wants to merge 4 commits into biomejs:main from jfmcdowell:refactor/list-marker-prefix
