fix: extract content from div elements inside code blocks #6
Merged
abimaelmartell merged 3 commits into master on Feb 3, 2026
Previously, the `inlineCodeContent` walker would return early when encountering a div element, writing only a newline and skipping all children. This caused content inside syntax-highlighted code blocks to be completely lost.

Many syntax highlighters wrap code in structures like: `<pre><code><div class="highlight">actual code</div></code></pre>`

The fix allows div elements to fall through to child processing while still treating them as block-level elements that introduce line breaks.

Also adds `testhtml/` to `.gitignore` for local test files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
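The early return described above can be illustrated with a minimal sketch. This is not the code from `utils.go`: the `node` type is a hypothetical, simplified stand-in for a real HTML node, `collectCode`, `extract`, and `sample` are illustrative names, and the exact position of the newline write is an assumption. The `skipDivChildren` flag exists only to show the old and new behavior side by side.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, simplified stand-in for an HTML DOM node.
// It is NOT the type used by the real walker.
type node struct {
	tag      string // element name; empty for text nodes
	text     string // content of a text node
	children []*node
}

// collectCode walks n depth-first and appends code text to sb.
// With skipDivChildren=true it mimics the described bug: a div writes
// a newline and returns, so everything inside it is dropped.
// With skipDivChildren=false the div still writes a newline but then
// falls through to child processing.
func collectCode(n *node, sb *strings.Builder, skipDivChildren bool) {
	if n.tag == "" {
		sb.WriteString(n.text)
		return
	}
	if n.tag == "div" {
		sb.WriteString("\n") // div is block-level: introduce a line break
		if skipDivChildren {
			return // old behavior: children are never visited
		}
	}
	for _, c := range n.children {
		collectCode(c, sb, skipDivChildren)
	}
}

// extract runs the walker over root and returns the collected text.
func extract(root *node, skipDivChildren bool) string {
	var sb strings.Builder
	collectCode(root, &sb, skipDivChildren)
	return sb.String()
}

// sample builds <pre><code><div>actual code</div></code></pre>.
func sample() *node {
	return &node{tag: "pre", children: []*node{
		{tag: "code", children: []*node{
			{tag: "div", children: []*node{{text: "actual code"}}},
		}},
	}}
}

func main() {
	fmt.Printf("%q\n", extract(sample(), true))  // prints "\n"
	fmt.Printf("%q\n", extract(sample(), false)) // prints "\nactual code"
}
```

In this sketch the only difference between the two runs is whether the div branch returns before the child loop, which matches the shape of the change the description reports.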
Pull request overview
This PR fixes a bug where content inside <div> elements within code blocks was being lost during HTML to Markdown conversion. The fix allows div children to be processed while still preserving newline behavior, which is critical for syntax-highlighted code blocks that wrap content in div elements.
Changes:
- Modified `inlineCodeContent` walker to process div children instead of returning early
- Updated test golden files to reflect corrected code block extraction
- Added comprehensive test coverage for various syntax highlighter patterns
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| utils.go | Changed div handling to process children while preserving newline behavior |
| testdata/TestRealWorld/snippets/pre_code/output.default.golden | Updated expected output to include previously lost code content |
| testdata/TestRealWorld/snippets/pre_code/goldmark.golden | Updated expected output to include previously lost code content |
| code_block_test.go | Added tests verifying div content extraction for multiple syntax highlighter patterns |
Summary
- Content inside `<div>` elements within code blocks was being completely lost
- The `inlineCodeContent` walker was returning early on div elements instead of processing children
- Adds `testhtml/` to `.gitignore` for local test HTML files

Test plan
- `TestCodeBlockDivContent` tests verify the fix with multiple HTML patterns

🤖 Generated with Claude Code