fix: extract content from div elements inside code blocks #6

Merged: abimaelmartell merged 3 commits into master from fix/code-block-div-content on Feb 3, 2026

@abimaelmartell (Member)

Summary

  • Fixed bug where content inside <div> elements within code blocks was being completely lost
  • The inlineCodeContent walker was returning early on div elements instead of processing children
  • Added tests covering various syntax highlighter patterns (ReadMe, token-line divs, nested divs)
  • Added testhtml/ to .gitignore for local test HTML files

Test plan

  • All existing tests pass
  • New TestCodeBlockDivContent tests verify the fix with multiple HTML patterns
  • Verified json.html now correctly extracts JSON content from syntax-highlighted code blocks
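The patterns named in the test plan can be illustrated with a small table-driven check. This is only a sketch: it does not use the project's API, the snippets and expected strings are hypothetical examples rather than the actual cases in TestCodeBlockDivContent, and `encoding/xml` stands in for a real HTML parser because the snippets happen to be well-formed. The function `extractCode` and its rule of starting a new line for each div after the first text are assumptions made for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// extractCode is a hypothetical helper: it collects all text inside the
// snippet and starts a new line whenever a <div> opens after some text
// has already been written. Children of the div are still visited.
func extractCode(snippet string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(snippet))
	var sb strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", fmt.Errorf("parse snippet: %w", err)
		}
		switch t := tok.(type) {
		case xml.StartElement:
			// Treat div as block-level, but keep reading its children.
			if t.Name.Local == "div" && sb.Len() > 0 {
				sb.WriteString("\n")
			}
		case xml.CharData:
			sb.Write(t)
		}
	}
	return sb.String(), nil
}

// cases are illustrative inputs, not the fixtures used in the PR.
var cases = []struct {
	name, html, want string
}{
	{
		name: "single wrapper div",
		html: `<pre><code><div class="highlight">actual code</div></code></pre>`,
		want: "actual code",
	},
	{
		name: "token-line divs",
		html: `<pre><code><div class="token-line">{</div><div class="token-line">}</div></code></pre>`,
		want: "{\n}",
	},
	{
		name: "nested divs",
		html: `<pre><code><div><div>a</div><div>b</div></div></code></pre>`,
		want: "a\nb",
	},
}

func main() {
	for _, c := range cases {
		got, err := extractCode(c.html)
		if err != nil {
			fmt.Println(c.name, "error:", err)
			continue
		}
		fmt.Printf("%s: %q (ok=%v)\n", c.name, got, got == c.want)
	}
}
```

Suppressing the newline before the first div is a choice of this sketch, not necessarily what the library does.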

🤖 Generated with Claude Code

Previously, the inlineCodeContent walker would return early when
encountering a div element, writing only a newline and skipping all
children. This caused content inside syntax-highlighted code blocks
to be completely lost.

Many syntax highlighters wrap code in structures like:
<pre><code><div class="highlight">actual code</div></code></pre>

The fix allows div elements to fall through to child processing while
still treating them as block-level elements that introduce line breaks.
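The difference between returning early and falling through can be sketched with a toy tree walker. This is not the code in utils.go: the `node` type, `writeCodeContent`, and the `skipDivChildren` flag are hypothetical names introduced only to contrast the two behaviours described above, and the exact position of the newline in the real walker is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, minimal stand-in for a parsed HTML node.
type node struct {
	tag      string // element name; empty for text nodes
	text     string // text content; used only when tag is empty
	children []*node
}

// writeCodeContent appends the text found under n to sb.
// With skipDivChildren set it mimics the old early return on div.
func writeCodeContent(sb *strings.Builder, n *node, skipDivChildren bool) {
	if n.tag == "" {
		sb.WriteString(n.text)
		return
	}
	if n.tag == "div" {
		// div is block-level, so it introduces a line break.
		sb.WriteString("\n")
		if skipDivChildren {
			return // old behaviour: children are never visited
		}
		// fixed behaviour: fall through to child processing
	}
	for _, c := range n.children {
		writeCodeContent(sb, c, skipDivChildren)
	}
}

// codeContent returns the extracted text for the tree rooted at root.
func codeContent(root *node, skipDivChildren bool) string {
	var sb strings.Builder
	writeCodeContent(&sb, root, skipDivChildren)
	return sb.String()
}

// sample builds <pre><code><div class="highlight">actual code</div></code></pre>.
func sample() *node {
	text := &node{text: "actual code"}
	div := &node{tag: "div", children: []*node{text}}
	code := &node{tag: "code", children: []*node{div}}
	return &node{tag: "pre", children: []*node{code}}
}

// describe formats both behaviours side by side for the same tree.
func describe(root *node) string {
	return fmt.Sprintf("old=%q fixed=%q",
		codeContent(root, true), codeContent(root, false))
}

func main() {
	fmt.Println(describe(sample())) // old="\n" fixed="\nactual code"
}
```

In the sketch, the only change between the two modes is whether the div branch returns before the child loop runs.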

Also adds testhtml/ to .gitignore for local test files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR fixes a bug where content inside <div> elements within code blocks was being lost during HTML to Markdown conversion. The fix allows div children to be processed while still preserving newline behavior, which is critical for syntax-highlighted code blocks that wrap content in div elements.

Changes:

  • Modified inlineCodeContent walker to process div children instead of returning early
  • Updated test golden files to reflect corrected code block extraction
  • Added comprehensive test coverage for various syntax highlighter patterns

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| utils.go | Changed div handling to process children while preserving newline behavior |
| testdata/TestRealWorld/snippets/pre_code/output.default.golden | Updated expected output to include previously lost code content |
| testdata/TestRealWorld/snippets/pre_code/goldmark.golden | Updated expected output to include previously lost code content |
| code_block_test.go | Added tests verifying div content extraction for multiple syntax highlighter patterns |

@abimaelmartell abimaelmartell merged commit 25e9840 into master Feb 3, 2026
1 of 4 checks passed
@abimaelmartell abimaelmartell deleted the fix/code-block-div-content branch February 3, 2026 17:40