Fix MarkdownElementNodeParser to extract code blocks by Br1an67 · Pull Request #20840 · run-llama/llama_index

Br1an67 · 2026-03-01T17:54:45Z

Description

Fix MarkdownElementNodeParser.extract_elements() to properly extract code blocks (fenced with ```) as code type elements instead of merging them into surrounding text.

Two issues prevented code blocks from being extracted:

1. Parsing: Opening backtick fences that weren't ending an existing code block fell through to a branch that either appended the line to existing text or created a new text element, instead of starting a new code block.
2. Post-processing: After parsing, the post-processing loop (lines 269-275) converted all non-table elements to type="text", erasing the code type from correctly parsed code elements. These were then merged with adjacent text in the consecutive-text merge step.
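The post-processing issue can be illustrated with a minimal sketch. This is not the actual llama_index code: `Element` is a simplified stand-in, and `normalize` and `merge_consecutive_text` are hypothetical helpers written only to show how forcing every non-table element to "text" lets the merge step swallow a code element, while preserving the parsed type keeps it separate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """Hypothetical, simplified stand-in for the parser's element type."""
    type: str      # e.g. "text", "code", "table"
    element: str   # raw content of the element


def normalize(elements: List[Element], preserve_type: bool) -> List[Element]:
    """Post-process parsed elements.

    preserve_type=False mimics the described bug: every non-table element
    is rewritten as "text". preserve_type=True keeps the parsed type.
    """
    out: List[Element] = []
    for el in elements:
        if el.type == "table" or preserve_type:
            out.append(Element(el.type, el.element))
        else:
            out.append(Element("text", el.element))
    return out


def merge_consecutive_text(elements: List[Element]) -> List[Element]:
    """Merge runs of adjacent "text" elements into a single element."""
    merged: List[Element] = []
    for el in elements:
        if merged and merged[-1].type == "text" and el.type == "text":
            merged[-1].element += "\n" + el.element
        else:
            merged.append(Element(el.type, el.element))
    return merged


parsed = [
    Element("text", "Intro"),
    Element("code", "print('hi')"),
    Element("text", "Outro"),
]

buggy = merge_consecutive_text(normalize(parsed, preserve_type=False))
fixed = merge_consecutive_text(normalize(parsed, preserve_type=True))

print([e.type for e in buggy])  # ['text']
print([e.type for e in fixed])  # ['text', 'code', 'text']
```

In the buggy path the code element is relabelled before merging, so all three elements collapse into one text element; with the type preserved, the code element acts as a boundary between the two text runs.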

Fixes #19085

New Package?

N/A

Version Bump?

N/A — bug fix only.

Type of Change

Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Added two tests:

test_code_block_extraction: Verifies a simple fenced code block is extracted as a code element
test_code_block_with_language: Verifies code blocks with language identifiers (```python) are handled

All 9 tests (7 existing + 2 new) pass.

Suggested Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Two issues prevented code blocks from being parsed:

1. Opening backtick fences (```) that weren't ending an existing code block were treated as text instead of starting a new code block.
2. The post-processing loop converted all non-table elements to 'text' type, erasing the 'code' type from correctly parsed code blocks.

Fix both by starting a code element on unmatched backtick fences and preserving the original element type in post-processing.

AstraBert · 2026-03-02T10:04:22Z

Hey @Br1an67 just as a heads up: you opened 5 PRs in (presumably) less than 1 hour. This behavior is borderline spamming and makes me think that there might be some crawling and AI automation behind all of these PRs (also judging from the commit messages).
We are a community of human developers and maintainers and, while AI-assisted code is always welcome, human oversight is fundamental: crawling and spamming PRs is not acceptable behavior, and consequences might follow if this behavior continues on your side.

AstraBert · 2026-03-02T10:07:22Z

llama-index-core/llama_index/core/node_parser/relational/markdown_element.py

-                elif currentElement is not None and currentElement.type == "text":
-                    currentElement.element += "\n" + line


Why was this eliminated?

This elif branch was the root cause of the bug. When encountering an opening backtick fence (```), if there was already a text element being accumulated, this branch would append the fence line to the text element instead of starting a new code block. By removing it, opening fences now correctly fall through to the else branch, which saves the current element and starts a new code element. This is what allows code blocks to be properly extracted rather than being swallowed into surrounding text.
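The fence handling described above can be sketched as a small line scanner. This is a simplified illustration under stated assumptions, not the code in markdown_element.py: `Element` and this `extract_elements` function are hypothetical stand-ins, tables and titles are ignored, and the language identifier after an opening fence is simply discarded. The point is that an opening fence always flushes the element being accumulated and starts a code element, with no branch that appends the fence line to existing text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Three backticks, built programmatically so the example stays fence-safe.
FENCE = "`" * 3


@dataclass
class Element:
    """Hypothetical, simplified stand-in for the parser's element type."""
    type: str
    element: str


def extract_elements(text: str) -> List[Element]:
    """Split markdown text into "text" and "code" elements (sketch only)."""
    elements: List[Element] = []
    kind: Optional[str] = None   # type of the element being accumulated
    buffer: List[str] = []       # lines of the element being accumulated

    def flush() -> None:
        """Save the current element, if any, and reset the state."""
        nonlocal kind, buffer
        if kind is not None:
            elements.append(Element(kind, "\n".join(buffer)))
        kind, buffer = None, []

    for line in text.split("\n"):
        if line.startswith(FENCE):
            closing = kind == "code"
            flush()              # save whatever came before the fence
            if not closing:
                kind = "code"    # opening fence: start a new code element
        elif kind is None:
            if line.strip():     # skip blank lines between elements
                kind = "text"
                buffer.append(line)
        else:
            buffer.append(line)  # continue the current text or code element
    flush()
    return elements


doc = "\n".join(["Intro", FENCE + "python", "print('hi')", FENCE, "Outro"])
result = extract_elements(doc)
print([e.type for e in result])  # ['text', 'code', 'text']
```

One design choice in this sketch: an unterminated fence at the end of the input is still emitted as a code element rather than being folded back into text.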

dosubot bot added the size:M label (this PR changes 30-99 lines, ignoring generated files) Mar 1, 2026

Br1an67 mentioned this pull request Mar 1, 2026

[Bug]: MarkdownElementParser does not extract code blocks #19085

Open

Br1an67 wants to merge 1 commit into run-llama:main from Br1an67:fix/issue-19085-code-block-extraction
