Hi,
Thanks for your contribution; it's really useful to see evaluations on real-world data! There are further extraction tools for Python that this repository doesn't cover yet and that could be more efficient than some of those you mention. You might have a look at:
- goose3
- jusText (especially with a custom configuration)
- inscriptis (HTML-to-text conversion)
- trafilatura (disclaimer: I'm the author)
Or is there a reason why you didn't use them in the first place? I'd be curious to hear about it.
For more details please refer to the evaluation I've performed. The code including baselines is available here.
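
In case it helps, here is a rough sketch of how the four tools listed above could be run on the same HTML input. It is not the code from my evaluation, just a minimal illustration: `SAMPLE_HTML`, `EXTRACTORS`, `extract_all` and the `with_*` helpers are names made up for this example, and each tool is used with its default settings. A tool that is not installed (or that fails on the input) simply yields `None`.

```python
from typing import Callable, Dict, Optional

# Hypothetical sample page: navigation and footer are the boilerplate,
# the <article> paragraph is the text we would like to keep.
SAMPLE_HTML = """<html><head><title>Example</title></head>
<body>
<nav><a href="/">Home</a> <a href="/about">About</a></nav>
<article><h1>Example article</h1>
<p>This is the main text of the page. It is long enough to look like
real content to a heuristic extractor, unlike the navigation links.</p>
</article>
<footer>Copyright notice</footer>
</body></html>"""


def with_trafilatura(html: str) -> Optional[str]:
    import trafilatura
    # extract() returns the main text, or None if nothing was found
    return trafilatura.extract(html)


def with_inscriptis(html: str) -> Optional[str]:
    from inscriptis import get_text
    # Plain HTML-to-text conversion, no boilerplate removal
    return get_text(html)


def with_justext(html: str) -> Optional[str]:
    import justext
    # Default configuration with the English stoplist
    paragraphs = justext.justext(html, justext.get_stoplist("English"))
    return "\n".join(p.text for p in paragraphs if not p.is_boilerplate)


def with_goose3(html: str) -> Optional[str]:
    from goose3 import Goose
    with Goose() as g:
        return g.extract(raw_html=html).cleaned_text


EXTRACTORS: Dict[str, Callable[[str], Optional[str]]] = {
    "goose3": with_goose3,
    "jusText": with_justext,
    "inscriptis": with_inscriptis,
    "trafilatura": with_trafilatura,
}


def extract_all(html: str) -> Dict[str, Optional[str]]:
    """Run every extractor on the same input; None marks a missing or failing tool."""
    results: Dict[str, Optional[str]] = {}
    for name, extractor in EXTRACTORS.items():
        try:
            results[name] = extractor(html)
        except Exception:
            # Covers ImportError (tool not installed) as well as parsing
            # errors, so one tool cannot abort the whole comparison.
            results[name] = None
    return results


if __name__ == "__main__":
    for name, text in extract_all(SAMPLE_HTML).items():
        print(f"--- {name} ---")
        print(text if text is not None else "(not available)")
```

The outputs could then be scored against the gold-standard text with whatever metric the benchmark already uses.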