NER differences in spaCy v2 and v3. #8804
Replies: 1 comment 5 replies
-
My understanding is that the NER architecture didn't change much between v2 and v3. However, some data augmentation (case modification) was accidentally left out of the 3.0 models (see #8380). This has been fixed in the 3.1 models, so I would suggest trying those. It's not obvious to me that your issues have anything to do with case augmentation, though I would note that some of your entities have titles ("Rev Dr Hkalam Samson"), and titles are sometimes annotated inconsistently (annotators may be unclear about whether to include the title in a PERSON entity or not). I think this is resolved in the version of OntoNotes we're using, but it's still worth keeping in mind.
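One way to probe whether casing plays a role is to run the same sentence through the pipeline in a few different casings and see whether the PERSON spans change. The following is only a rough sketch: the helper names (`case_variants`, `person_entities`, `run_demo`) and the sample sentence are made up for illustration and are not part of spaCy or of the original question, and it assumes `en_core_web_md` is installed.

```python
def case_variants(text):
    """Return the text in a few casings, keyed by a short label."""
    return {
        "original": text,
        "lower": text.lower(),
        "upper": text.upper(),
        "title": text.title(),
    }


def person_entities(nlp, text):
    """Run a loaded pipeline and return the text of each PERSON entity."""
    return [ent.text for ent in nlp(text).ents if ent.label_ == "PERSON"]


def run_demo(model_name="en_core_web_md"):
    """Print PERSON entities for each casing (hypothetical helper)."""
    import spacy  # imported here so the helpers above work without spaCy

    nlp = spacy.load(model_name)  # assumes the model package is installed
    # Placeholder sentence, not the snippet from the original question.
    sample = "The statement mentioned Min Aung Hlaing by name."
    for label, variant in case_variants(sample).items():
        print(label, person_entities(nlp, variant))
```

Calling `run_demo()` under 3.0 and again under 3.1 would show whether the missing case augmentation matters for a given sentence. If the results are the same in every casing, the cause is probably elsewhere.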
-
I've recently noticed some differences in spaCy NER performance when recognizing person names with three tokens. One example is this snippet. The entity of interest here is "Min Aung Hlaing":
spaCy NER v2 (2.3.7):
spaCy NER v3 (3.0.6):
I think v2 is doing a better job compared to v3 in general.
My main question is: what are the main differences (if any) between v2 and v3 NER? Is this documented somewhere?
FYI: The outputs of the en_core_web_lg models are more "consistent"/"equivalent" across spaCy v2 and v3. I'm not sure why there is so much difference between the en_core_web_md models.
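Since v2 and v3 can't be installed side by side in one environment, a comparison like the one above is easier to reproduce by saving each model's entities to a file and diffing the files afterwards. Here is a minimal sketch of that idea. The function names and the JSON layout are assumptions made up for this example, and it uses only `spacy.load`, `doc.ents`, and the entity attributes that exist in both v2 and v3.

```python
import json


def extract_entities(nlp, text):
    """Return (text, label, start_char, end_char) for each predicted entity."""
    return [(e.text, e.label_, e.start_char, e.end_char) for e in nlp(text).ents]


def diff_entities(left, right):
    """Split two entity lists into shared, left-only, and right-only items."""
    left_set, right_set = set(left), set(right)
    return {
        "both": sorted(left_set & right_set),
        "left_only": sorted(left_set - right_set),
        "right_only": sorted(right_set - left_set),
    }


def dump_entities(model_name, text, path):
    """Run one model and save its entities plus the spaCy version as JSON."""
    import spacy  # imported lazily; run this once per environment

    nlp = spacy.load(model_name)  # assumes the model package is installed
    record = {
        "spacy": spacy.__version__,
        "model": model_name,
        "entities": extract_entities(nlp, text),
    }
    with open(path, "w", encoding="utf8") as f:
        json.dump(record, f, indent=2)
    return record


def load_entities(path):
    """Read a dump back, turning JSON lists into hashable tuples again."""
    with open(path, encoding="utf8") as f:
        return [tuple(item) for item in json.load(f)["entities"]]
```

The idea would be to call `dump_entities("en_core_web_md", text, "v2_md.json")` in the v2 environment, do the same with a different file name in the v3 environment, and then pass the two `load_entities(...)` results to `diff_entities`. Keeping the character offsets in the tuples means a name that is split into two entities shows up as a difference rather than being hidden.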