Parser not annotating sentence boundaries during training #11369
-
I'm trying to create a pipeline for relation extraction. For this I've modified the relation extractor in this spaCy project to use different features: the words in the sentence between the two entities. Because of this I'm using doc.sents in the get_instances() function. I've added a parser to the pipeline and added However I'm getting this error after running
I suspect it might have something to do with the custom data reader, but I don't see how it interferes. My understanding of the reader is that it also loads the entities defined in the training data (gold data) because the relation component needs them to make predictions. However, this gives no clue as to why the parser is not annotating the examples being yielded here.
Entire config:
Entire error:
Replies: 3 comments 9 replies
-
I've also tried training with just the sentencizer, and with both the parser and the sentencizer. Both result in the same error.
-
I tried to code a workaround by adding the sentence boundaries in the file loader: I made these changes to
But this results in the same error.
-
`annotating_components` only sets annotation in the `predicted` doc, not in the `reference` docs, so if you need sentence boundaries in `get_instances` for the reference docs, you have to set them separately before training, either directly in the saved `.spacy` annotation or with a custom corpus reader.

For the custom reader, the tokenization for `blank:en` may not match the saved tokenization, so it would be better to process the `gold` `Doc` object with the `sentencizer` rather than `gold.text`.

For testing, you can also have the corpus reader add the sentence boundaries to the `predicted` docs, but in practice you would want a component in the pipeline that adds this or you wouldn't be able to run the comp…
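
The suggestion above about the custom reader (run the sentencizer on the gold Doc itself rather than on gold.text) could look roughly like the sketch below. This is only an illustration under some assumptions: `make_example` is a hypothetical helper name, not part of the project, and the `gold` Doc is built by hand here as a stand-in for one loaded from a saved `.spacy` file.

```python
import spacy
from spacy.tokens import Doc
from spacy.training import Example

nlp = spacy.blank("en")
# The sentencizer is rule-based, so it can be used without any training.
sentencizer = nlp.add_pipe("sentencizer")


def make_example(gold: Doc) -> Example:
    """Hypothetical helper: set sentence boundaries on the reference doc."""
    # Call the component on the existing Doc so the saved tokenization is kept,
    # instead of re-tokenizing gold.text with the blank pipeline.
    gold = sentencizer(gold)
    # Build the predicted doc from the same tokens, without any annotation.
    pred = Doc(
        gold.vocab,
        words=[t.text for t in gold],
        spaces=[bool(t.whitespace_) for t in gold],
    )
    return Example(pred, gold)


# Hand-built stand-in for a Doc read from a .spacy file (an assumption).
gold = Doc(
    nlp.vocab,
    words=["Anna", "met", "Ben", ".", "They", "talked", "."],
    spaces=[True, True, False, True, True, False, False],
)
example = make_example(gold)
print([sent.text for sent in example.reference.sents])
print(example.predicted.has_annotation("SENT_START"))
```

In this sketch only the reference doc carries sentence boundaries. The predicted doc would still need them from a component in the pipeline during training.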