[Bug]: Panic with empty text node #197

@brunomartinez-lmi

Describe the bug

I am using go-trafilatura to extract text from web pages.
The output is an html.Node tree, and it sometimes contains empty text nodes.

An empty text node causes a panic in the collapse step when converting to Markdown:

runtime error: index out of range [0] with length 0

	/Users/bmartinez/go/pkg/mod/github.com/!johannes!kaufmann/html-to-markdown/v2@v2.5.0/collapse/collapse.go:125 +0x628
github.com/JohannesKaufmann/html-to-markdown/v2/plugin/base.(*base).preRenderCollapse(0x1020fdb60, {0x101993228, 0x1400180a618}, 0x140034ec540)
	/Users/bmartinez/go/pkg/mod/github.com/!johannes!kaufmann/html-to-markdown/v2@v2.5.0/plugin/base/base.go:88 +0xb8
github.com/JohannesKaufmann/html-to-markdown/v2/converter.(*Converter).ConvertNode(0x140019bea80, 0x140034ec540, {0x14001b05af8, 0x1, 0x1})
	/Users/bmartinez/go/pkg/mod/github.com/!johannes!kaufmann/html-to-markdown/v2@v2.5.0/converter/convert.go:103
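
For context, `index out of range [0] with length 0` is the message Go reports when code reads index 0 of an empty string or slice. The sketch below reproduces that failure mode in isolation and shows a length guard. Both `trimLeadingSpaceUnsafe` and `trimLeadingSpace` are hypothetical helpers written for illustration; they are an assumption about the kind of access involved, not the actual code at collapse.go:125 or the change in the PR.

```go
package main

import "fmt"

// trimLeadingSpaceUnsafe is a hypothetical helper (not library code).
// It reads text[0] without checking the length, so it panics on "".
func trimLeadingSpaceUnsafe(text string) string {
	if text[0] == ' ' {
		return text[1:]
	}
	return text
}

// trimLeadingSpace is the same hypothetical helper with a length guard,
// so an empty string is returned unchanged instead of panicking.
func trimLeadingSpace(text string) string {
	if len(text) > 0 && text[0] == ' ' {
		return text[1:]
	}
	return text
}

// panics reports whether calling f panics.
func panics(f func()) (didPanic bool) {
	defer func() {
		if r := recover(); r != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	// The unguarded version panics on an empty string.
	fmt.Println(panics(func() { trimLeadingSpaceUnsafe("") }))
	// The guarded version handles empty and non-empty input.
	fmt.Printf("%q\n", trimLeadingSpace(""))
	fmt.Printf("%q\n", trimLeadingSpace(" World"))
}
```

A guard of this shape treats an empty text node as having nothing to trim, which matches the expectation that empty nodes should simply produce no output.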

As a workaround, I render the node to an HTML string and then parse it again.

I proposed a PR for the fix:
#196

Code Snippet

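// Reproduces the panic: Collapse is called on a tree that contains an empty text node.
func TestCollapse_EmptyTextNode(t *testing.T) {
	input := `<html><body>  <span>Hello </span> <span> World </span></body></html>`

	doc, err := html.Parse(strings.NewReader(input))
	if err != nil {
		// Stop here: continuing with a nil document would fail for an unrelated reason.
		t.Fatal(err)
	}

	// Empty the first text node to mimic the tree produced by go-trafilatura.
	for d := range doc.Descendants() {
		if d.Type == html.TextNode {
			d.Data = ""
			break
		}
	}

	// Panics with "index out of range [0] with length 0" in v2.5.0.
	Collapse(doc, nil)
}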

Generated Markdown

/Users/bmartinez/go/pkg/mod/github.com/!johannes!kaufmann/html-to-markdown/v2@v2.5.0/collapse/collapse.go:125 +0x628
github.com/JohannesKaufmann/html-to-markdown/v2/plugin/base.(*base).preRenderCollapse(0x1020fdb60, {0x101993228, 0x1400180a618}, 0x140034ec540)
	/Users/bmartinez/go/pkg/mod/github.com/!johannes!kaufmann/html-to-markdown/v2@v2.5.0/plugin/base/base.go:88 +0xb8
github.com/JohannesKaufmann/html-to-markdown/v2/converter.(*Converter).ConvertNode(0x140019bea80, 0x140034ec540, {0x14001b05af8, 0x1, 0x1})
	/Users/bmartinez/go/pkg/mod/github.com/!johannes!kaufmann/html-to-markdown/v2@v2.5.0/converter/convert.go:103

Expected Markdown

-

What plugins did you use?

Base plugin

Labels: bug, v2
