A big part of the reason is that the provided trained pipelines aren't trained on texts that include XML tags like this, so you'll get fairly unpredictable results. In general it would be better to store this information in some other form rather than inserting special tokens into the text, especially if you want to use the provided English pipelines and not a custom model trained on texts containing these kinds of tokens.
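To illustrate "some other form", here is a minimal sketch (assuming spaCy v3 and the `en_core_web_sm` pipeline): strip the XML-like tags before running the pipeline, remember the character offsets they covered, and attach them back onto the parsed `Doc` as spans. The example sentence, the `<em>` tag, and the `"markup"` span key are made up for illustration, not taken from the discussion.

```python
import re
import spacy

raw = "I really <em>love</em> this library."

# Remove the tags and record the character range each tagged region
# covers in the cleaned text.
clean_parts = []
regions = []  # (start_char, end_char, tag_name) in the cleaned text
pos = 0
for match in re.finditer(r"<(\w+)>(.*?)</\1>", raw):
    clean_parts.append(raw[pos:match.start()])
    start = sum(len(p) for p in clean_parts)
    clean_parts.append(match.group(2))
    regions.append((start, start + len(match.group(2)), match.group(1)))
    pos = match.end()
clean_parts.append(raw[pos:])
clean_text = "".join(clean_parts)

nlp = spacy.load("en_core_web_sm")
doc = nlp(clean_text)  # the pipeline only ever sees plain English text

# Align the recorded character offsets back onto tokens; char_span
# returns None if the offsets don't line up with token boundaries.
spans = [doc.char_span(s, e, label=tag) for s, e, tag in regions]
doc.spans["markup"] = [sp for sp in spans if sp is not None]

print([(sp.text, sp.label_) for sp in doc.spans["markup"]])
```

This keeps the tokenizer, tagger, and parser working on ordinary text while the markup information stays available as spans on the `Doc`.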

Beyond that, there are a couple of things going on:

Answer selected by adrianeboyd
This discussion was converted from issue #12123 on January 23, 2023 09:25.