This isn't true for coref yet (we should take a look at this), but in general all components should be able to support training from misaligned tokenization. The get_loss methods should basically ignore instances where the tokens can't be aligned. (Be aware, though, that the alignment code is full of attribute-specific special cases that try to keep as much annotation as possible. For example, only the token start char matters for SENT_START, and it will align AB/TAG1 to A/TAG2 B/TAG2 as long as the tags are the same for the whole sequence.)
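To make the "ignore what can't be aligned" idea concrete, here is a minimal pure-Python sketch. It is not spaCy's actual alignment code: `char_spans`, `align_tags`, and `masked_error_count` are hypothetical helpers, and the sketch assumes both tokenizations cover the same text once whitespace is dropped. A predicted token receives a gold tag only if every gold token it overlaps carries the same tag; otherwise it gets `None` and is skipped when counting errors.

```python
def char_spans(tokens):
    """Return (start, end) character offsets of each token in the joined text."""
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def align_tags(pred_tokens, gold_tokens, gold_tags):
    """Project gold tags onto predicted tokens; None marks 'cannot be aligned'."""
    if "".join(pred_tokens) != "".join(gold_tokens):
        raise ValueError("tokenizations must cover the same text")
    gold_spans = char_spans(gold_tokens)
    aligned = []
    for p_start, p_end in char_spans(pred_tokens):
        # Collect the tags of every gold token overlapping this predicted token.
        overlapping = {
            tag
            for (g_start, g_end), tag in zip(gold_spans, gold_tags)
            if g_start < p_end and p_start < g_end
        }
        # Keep the tag only if it is unambiguous across the whole overlap.
        aligned.append(overlapping.pop() if len(overlapping) == 1 else None)
    return aligned


def masked_error_count(pred_tags, aligned_gold):
    """Count errors over alignable tokens only, as a stand-in for a masked loss."""
    pairs = [(p, g) for p, g in zip(pred_tags, aligned_gold) if g is not None]
    errors = sum(p != g for p, g in pairs)
    return errors, len(pairs)


# Same tag on both gold pieces: the merged predicted token keeps the tag.
same = align_tags(["NewYork", "is", "big"],
                  ["New", "York", "is", "big"],
                  ["PROPN", "PROPN", "AUX", "ADJ"])

# Conflicting tags on the gold pieces: the predicted token is left unannotated.
conflict = align_tags(["cannot"], ["can", "not"], ["AUX", "PART"])
```

In this sketch `same` comes out as `["PROPN", "AUX", "ADJ"]` and `conflict` as `[None]`, so the second case contributes nothing to `masked_error_count`. The real alignment code has more attribute-specific rules than this (such as the SENT_START case mentioned above).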

Not all components can train from partial annotation, though, mainly because there's not always a way to mark partial annotation, like for spancat traini…

Replies: 1 comment 1 reply

1 reply
@KennethEnevoldsen
Answer selected by KennethEnevoldsen
Labels
feat / coref Feature: Coreference resolution
2 participants