Trim links and include images #2011
Merged (+954 −17)

14 commits by jonathanKingston:

- 0f6f690 Trim links and include images
- b6fc2ea Update img tag to use alt attribute for markdown
- 71bf4e1 Update image rendering to use getAttributeOrBlank
- cf215bc Update comments on whitespace handling
- 1900169 Clarify handling of excluded inert elements
- 9fcf84b Update parameter type from Node to Element
- 8c771ff Lint fixes
- 152cef4 Add fallback to body collection if main is empty
- c176603 Avoid triggering security errors for iframes
- 0041b8c Ignore blank sandbox attr
- 9710ac3 Update comment for excluded inert elements handling
- ce3c984 Fixes sandbox attr check
- 86a6039 Add test cases
- 047130f Lint fix

# Page Context DOM-to-Markdown Tests

This directory contains test fixtures for the `domToMarkdown` function from `page-context.js`.

## Directory Structure

- `output/` - Markdown files generated by test runs (temporary; regenerated on each run)
- `expected/` - Expected Markdown output files (committed to git)

## How It Works

The test suite (`page-context-dom.spec.js`) does the following:

1. **Creates test cases** with HTML snippets and settings for `domToMarkdown`.
2. **Converts HTML to Markdown**, using jsdom to simulate a browser environment.
3. **Writes the output** to the `output/` directory for inspection.
4. **Compares the output** with the matching file in the `expected/` directory.
5. **Fails on any difference** - if the output differs from the expected file in any way, the test fails.
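
The write-then-compare logic in steps 3 to 5 can be sketched in plain Node.js. This is a minimal illustration, not the code in `page-context-dom.spec.js`: the helper name `compareWithBaseline` and its signature are hypothetical, and the jsdom conversion step is left out so the example runs with the standard library alone.

```javascript
// Hypothetical sketch (CommonJS); not the actual code in page-context-dom.spec.js.
const fs = require('fs');
const path = require('path');

/**
 * Write `actual` to <outputDir>/<name>.md for inspection, then compare it
 * with <expectedDir>/<name>.md. `expected` is null when no baseline exists.
 */
function compareWithBaseline(name, actual, outputDir, expectedDir) {
    fs.mkdirSync(outputDir, { recursive: true });
    fs.writeFileSync(path.join(outputDir, `${name}.md`), actual);

    const expectedPath = path.join(expectedDir, `${name}.md`);
    const expected = fs.existsSync(expectedPath) ? fs.readFileSync(expectedPath, 'utf8') : null;

    // Exact string comparison: any difference, including whitespace, is a failure.
    return { pass: expected === actual, actual, expected };
}
```

A test would then assert that `pass` is true and report `actual` against `expected` on failure. Comparing whole strings with `===` keeps the rule as strict as step 5 describes: even a trailing space counts as a difference.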

## Test Cases

The suite includes 20 test cases covering:

- Basic HTML elements (paragraphs, headings, lists, links, images)
- Formatting (bold, italic, mixed formatting)
- Complex structures (nested lists, articles, blog posts)
- Edge cases (hidden content, empty links, whitespace handling)
- Configuration options (max length truncation, excluded selectors, trim blank links)

## Updating Expected Output

When the behavior of `domToMarkdown` changes:

1. Review the changes in the `output/` directory.
2. If the changes are correct, copy them to `expected/`:
   ```bash
   cp unit-test/fixtures/page-context/output/*.md unit-test/fixtures/page-context/expected/
   ```
3. Commit the updated expected files.
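
Where `cp` is unavailable (for example, in Windows `cmd`), the copy in step 2 could be done from Node.js instead. The sketch below is hypothetical (`acceptOutputs` is not part of the repository); it copies only `.md` files and returns the names of files whose contents actually changed, which gives a short list to double-check against step 1 before committing.

```javascript
// Hypothetical helper (CommonJS); a cross-platform alternative to the cp command above.
const fs = require('fs');
const path = require('path');

/**
 * Copy every .md file from outputDir into expectedDir.
 * Returns the sorted names of files that were new or had different contents.
 */
function acceptOutputs(outputDir, expectedDir) {
    fs.mkdirSync(expectedDir, { recursive: true });
    const changed = [];
    const names = fs.readdirSync(outputDir).filter((f) => f.endsWith('.md')).sort();
    for (const name of names) {
        const next = fs.readFileSync(path.join(outputDir, name), 'utf8');
        const dest = path.join(expectedDir, name);
        const prev = fs.existsSync(dest) ? fs.readFileSync(dest, 'utf8') : null;
        // Only rewrite baselines whose contents differ, so unchanged files stay untouched.
        if (prev !== next) {
            fs.writeFileSync(dest, next);
            changed.push(name);
        }
    }
    return changed;
}
```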

## Running Tests

```bash
npm run test-unit -- unit-test/page-context-dom.spec.js
```

## Why This Approach?

- **Visibility**: Output files make it easy to review the generated Markdown.
- **Regression detection**: Tests fail on any unintended change.
- **Documentation**: Expected files serve as examples of the function's behavior.
- **Easy updates**: Baselines are simple to update when behavior changes intentionally.

`injected/unit-test/fixtures/page-context/expected/article-structure.md` (15 additions, 0 deletions)

```markdown
# Article Title
By **Author Name**
This is the introduction paragraph with some *emphasis*.

## First Section
Content of the first section.


- Point one

- Point two


## Second Section
Content with a [link](https://example.com).
```

`injected/unit-test/fixtures/page-context/expected/blog-post.md` (16 additions, 0 deletions)

```markdown
# Blog Post Title
Published on January 1, 2024

![…](…)
Lorem ipsum dolor sit amet, consectetur adipiscing elit.

## Key Takeaways


- First takeaway

- Second takeaway

- Third takeaway

Read more on [our blog](https://blog.example.com).
```

`injected/unit-test/fixtures/page-context/expected/bold-and-italic.md` (1 addition, 0 deletions)

```markdown
This is **bold** and this is *italic*.
```

`injected/unit-test/fixtures/page-context/expected/complex-nested.md` (5 additions, 0 deletions)

```markdown
# Article Title
Introduction paragraph.

## Section 1
Section content with **bold** text.
```
Empty file.

`injected/unit-test/fixtures/page-context/expected/empty-link-without-trim.md` (1 addition, 0 deletions)

```markdown
[](https://example.com)
```
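
This fixture shows that, with trimming disabled, a link with no text is kept as `[](https://example.com)`. The README lists "trim blank links" among the configuration options; one plausible shape for that option is sketched below. The function name `linkToMarkdown` and the `trimBlankLinks` flag are assumptions for illustration, not the names used in `page-context.js`.

```javascript
// Hypothetical sketch; the real option name and logic in page-context.js may differ.
/**
 * Render a link as Markdown. When trimBlankLinks is true, a link whose text is
 * empty or whitespace-only is dropped instead of being emitted as `[](href)`.
 */
function linkToMarkdown(text, href, { trimBlankLinks = false } = {}) {
    const label = text.trim();
    if (label === '' && trimBlankLinks) {
        return '';
    }
    return `[${label}](${href})`;
}
```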

`injected/unit-test/fixtures/page-context/expected/excluded-selectors.md` (2 additions, 0 deletions)

```markdown
Keep this
Keep this too
```

`injected/unit-test/fixtures/page-context/expected/headings.md` (5 additions, 0 deletions)

```markdown
# Main Heading

## Subheading

### Sub-subheading
```

`injected/unit-test/fixtures/page-context/expected/hidden-content.md` (1 addition, 0 deletions)

```markdown
Visible text
```

```markdown
![…](…)
```

`injected/unit-test/fixtures/page-context/expected/line-breaks.md` (3 additions, 0 deletions)

```markdown
First line
Second line
Third line
```

```markdown
Visit [our website](https://example.com) for more info.
```

`injected/unit-test/fixtures/page-context/expected/max-length-truncation.md` (1 addition, 0 deletions)

```markdown
This is a very long paragraph ...
```
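
The truncated fixture ends with `...`. One simple rule that yields output of this shape is to cut at a fixed character count and append `...`; with a 30-character limit, any input beginning with `This is a very long paragraph ` produces exactly the fixture text. Both the rule and the limit are assumptions, since the actual setting used by the test is not shown, and `truncateMarkdown` is a hypothetical name.

```javascript
// Hypothetical sketch: the actual truncation rule in domToMarkdown is not shown here.
/**
 * Return `text` unchanged if it fits within maxLength characters; otherwise
 * keep the first maxLength characters and append '...'.
 */
function truncateMarkdown(text, maxLength) {
    if (text.length <= maxLength) {
        return text;
    }
    return text.slice(0, maxLength) + '...';
}
```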

`injected/unit-test/fixtures/page-context/expected/mixed-formatting.md` (1 addition, 0 deletions)

```markdown
This has ***bold and italic*** together.
```
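
Assuming the input nests `<em>` inside `<strong>` (the HTML for this case is not shown), the `***` in the fixture is just the bold marker wrapped around the italic one. The toy serializer below illustrates that composition on a hypothetical `{ tag, children }` tree rather than real DOM nodes; it is not how `domToMarkdown` walks the page.

```javascript
// Hypothetical sketch: a tiny tree serializer, not the real domToMarkdown.
/**
 * A node is either a string (text) or { tag, children }.
 * strong/b wrap their content in **, em/i in *, and other tags pass content through,
 * so <strong><em>x</em></strong> becomes ***x***.
 */
function toMarkdown(node) {
    if (typeof node === 'string') {
        return node;
    }
    const inner = node.children.map(toMarkdown).join('');
    switch (node.tag) {
        case 'strong':
        case 'b':
            return `**${inner}**`;
        case 'em':
        case 'i':
            return `*${inner}*`;
        default:
            return inner;
    }
}
```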

`injected/unit-test/fixtures/page-context/expected/multiple-paragraphs.md` (3 additions, 0 deletions)

```markdown
First paragraph.
Second paragraph.
Third paragraph.
```

`injected/unit-test/fixtures/page-context/expected/nested-lists.md` (3 additions, 0 deletions)

```markdown
- Item 1 - Subitem 1.1 - Subitem 1.2

- Item 2
```

`injected/unit-test/fixtures/page-context/expected/ordered-list.md` (5 additions, 0 deletions)

```markdown
- First step

- Second step

- Third step
```

`injected/unit-test/fixtures/page-context/expected/simple-paragraph.md` (1 addition, 0 deletions)

```markdown
This is a simple paragraph.
```

`injected/unit-test/fixtures/page-context/expected/unordered-list.md` (5 additions, 0 deletions)

```markdown
- First item

- Second item

- Third item
```

`injected/unit-test/fixtures/page-context/expected/whitespace-handling.md` (1 addition, 0 deletions)

```markdown
Text with multiple spaces
```
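
The fixture name suggests input containing repeated whitespace, while the expected output uses single spaces. A minimal sketch of that normalization, assuming it follows HTML's rule of collapsing ASCII whitespace (space, tab, LF, FF, CR) and then trimming the ends. `collapseWhitespace` is a hypothetical name, and the real function may treat whitespace differently, for example around inline elements.

```javascript
// Hypothetical sketch of HTML-style whitespace collapsing; the real rule may differ.
/**
 * Replace each run of ASCII whitespace (space, tab, LF, FF, CR) with one space,
 * then trim leading and trailing whitespace. Interior non-breaking spaces (U+00A0)
 * are left alone, as browsers do not collapse them.
 */
function collapseWhitespace(text) {
    return text.replace(/[ \t\n\f\r]+/g, ' ').trim();
}
```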