Attribute ruler and Parser inconsistent state #11008
How to reproduce the behaviour
This is a follow-up on issue 9782. It's closed, but it's still unresolved in the new spaCy release, 3.3.0. The fix mentioned in the link here by @adrianeboyd works, but it can be a problem if we need to use the dependency tags in a later component of the model's pipeline.
Your Environment
This issue should be fixed in the v3.3.0 model releases. Double-check that you've also upgraded the models and not just spacy itself by using With spacy v3.3.1 and
The v3.2.0 models are not going to change in terms of this behavior, because it's a problem with the attribute ruler rules in the model rather than a bug in the spaCy library itself.
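Because the fix ships in the models rather than in the library, a quick sanity check is to compare the installed spaCy version with the installed model package version. Below is a minimal stdlib-only sketch. It assumes the usual convention that a pipeline package is compatible with spaCy releases sharing its major.minor version (e.g. 3.3.x models with spaCy 3.3.x). The package name `en_core_web_sm` is only an assumed example, since the thread doesn't say which model is used, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def same_minor(spacy_version: str, model_version: str) -> bool:
    """True if both versions share major.minor (e.g. 3.3.1 and 3.3.0)."""
    return spacy_version.split(".")[:2] == model_version.split(".")[:2]


if __name__ == "__main__":
    # "en_core_web_sm" is an assumed example; use the pipeline package you load.
    spacy_v = installed_version("spacy")
    model_v = installed_version("en_core_web_sm")
    if spacy_v is None or model_v is None:
        print("spacy or the model package is not installed")
    elif same_minor(spacy_v, model_v):
        print(f"OK: spacy {spacy_v} with model {model_v}")
    else:
        print(f"Mismatch: spacy {spacy_v} but model {model_v}; upgrade the model too")
```

A mismatch here (for example spaCy 3.3.x with a 3.2.x model) would mean the old attribute ruler rules are still in use.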
Thank you for your response. I'm using the latest version of spaCy, as well as the latest version of the model. Actually, the problem is more about using the parser together with the attribute ruler. When "parser" is enabled (I need "DEP" for a matcher in another pipeline component), along with the attribute ruler and a sentence segmentation component, I get this error:
If I add this to the attribute ruler:
I no longer get the error, but then the document is not parsed, and I can't get dependency tags anymore. I made a reproducible example here: https://github.com/databill86/spacy-tests If you comment out this line: https://github.com/databill86/spacy-tests/blob/3a33c03931e95d26195df10ed7fb7df9aebfae7a/src/my_pipelines.py#L38 you will see the error.
It's a restriction in spaCy `Doc` objects that the sentence boundaries have to correspond to the dependency parses. So if your doc includes any dependency annotation (even a partial parse), the only way to modify the sentence boundaries is to modify the parses. If you delete all the dependency annotation (as in your example rule), then you can set the sentence boundaries again with `token.is_sent_start`.
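The restriction can be seen without any trained pipeline. Below is a minimal sketch, assuming spaCy v3 is installed. The dependency parse is hand-written in the `Doc` constructor, and the words, labels, and helper names are made up for illustration. Clearing every label with `token.dep_ = ""` is one assumed way to "delete all the dependency annotation". The attribute ruler rule from the question isn't shown in the thread, so this is not a copy of it.

```python
import spacy
from spacy.tokens import Doc


def make_parsed_doc() -> Doc:
    """Build a small Doc with a hand-written dependency parse (no trained model needed)."""
    nlp = spacy.blank("en")
    return Doc(
        nlp.vocab,
        words=["Hello", "world", "goodbye", "world"],
        heads=[0, 0, 0, 2],  # absolute head indices; token 0 is the root
        deps=["ROOT", "dep", "dep", "dep"],  # made-up labels for illustration
    )


def try_set_sent_start(doc: Doc, i: int) -> bool:
    """Try to mark token i as a sentence start; return False if spaCy refuses."""
    try:
        doc[i].is_sent_start = True
        return True
    except ValueError:
        # Raised while the doc has any DEP annotation, because a new
        # boundary could contradict the existing parse.
        return False


def clear_dependency_annotation(doc: Doc) -> None:
    """Unset every dependency label so that doc.has_annotation("DEP") is False."""
    for token in doc:
        token.dep_ = ""


if __name__ == "__main__":
    doc = make_parsed_doc()
    print("parsed, can split:", try_set_sent_start(doc, 2))
    clear_dependency_annotation(doc)
    print("unparsed, can split:", try_set_sent_start(doc, 2))
    print([sent.text for sent in doc.sents])
```

Clearing the labels is exactly the trade-off described in the question: the boundaries become editable, but later components can no longer rely on `DEP`.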