Describe the bug
I am using go-trafilatura to extract text from web pages.
Its output is an html.Node tree that sometimes contains empty text nodes.
These cause a panic in the collapse step when converting to Markdown:
runtime error: index out of range [0] with length 0
/Users/bmartinez/go/pkg/mod/github.com/!johannes!kaufmann/html-to-markdown/v2@v2.5.0/collapse/collapse.go:125 +0x628
github.com/JohannesKaufmann/html-to-markdown/v2/plugin/base.(*base).preRenderCollapse(0x1020fdb60, {0x101993228, 0x1400180a618}, 0x140034ec540)
/Users/bmartinez/go/pkg/mod/github.com/!johannes!kaufmann/html-to-markdown/v2@v2.5.0/plugin/base/base.go:88 +0xb8
github.com/JohannesKaufmann/html-to-markdown/v2/converter.(*Converter).ConvertNode(0x140019bea80, 0x140034ec540, {0x14001b05af8, 0x1, 0x1})
/Users/bmartinez/go/pkg/mod/github.com/!johannes!kaufmann/html-to-markdown/v2@v2.5.0/converter/convert.go:103
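For context, the message on the first line of the trace is what the Go runtime reports when element 0 of an empty string or slice is indexed. The sketch below is not the library's code; `firstByte`, `firstByteSafe`, and `panicMessage` are hypothetical names used only to reproduce the message and show the usual length guard. It assumes, without having confirmed it, that collapse.go:125 does something comparable with the text node's data.

```go
package main

import "fmt"

// firstByte indexes without a length check, so it panics when s is empty.
func firstByte(s string) byte {
	return s[0]
}

// firstByteSafe checks the length first and reports whether a byte exists.
func firstByteSafe(s string) (byte, bool) {
	if len(s) == 0 {
		return 0, false
	}
	return s[0], true
}

// panicMessage runs f and returns the text of the panic it raises,
// or "" if f returns normally.
func panicMessage(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}
```

Calling `panicMessage(func() { firstByte("") })` yields the same "index out of range [0] with length 0" text as the report, while `firstByteSafe("")` returns `false` instead of panicking.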
As a workaround, I render the node to an HTML string and then parse it again.
I proposed a PR for the fix:
#196
Code Snippet
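// Lives in the collapse package; needs "strings", "testing", and "golang.org/x/net/html".
func TestCollapse_EmptyTextNode(t *testing.T) {
	input := `<html><body> <span>Hello </span> <span> World </span></body></html>`
	doc, err := html.Parse(strings.NewReader(input))
	if err != nil {
		// Stop here: doc is not usable if parsing failed.
		t.Fatal(err)
	}
	// Blank the first text node to mimic the empty nodes described above.
	for d := range doc.Descendants() {
		if d.Type == html.TextNode {
			d.Data = ""
			break
		}
	}
	// Panics with "index out of range [0] with length 0" in v2.5.0.
	Collapse(doc, nil)
}
Generated Markdown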
None. The conversion panics with the stack trace shown under "Describe the bug".
Expected Markdown
-
What plugins did you use?
Base plugin