Norwegian tagging is wrong (both large and small model)? #12103
The result is: …

So basically it tags "Det er så kaldt ute." as: …

I think the correct tagging should be "PRON AUX ADV ADJ ADV". But I am not exactly a linguist, so can someone validate whether I got it wrong, spaCy got it wrong, both, or neither?

Thank you a lot! I appreciate working with spaCy, but right now I am confused about how "right" it is in various cases.
Replies: 1 comment
I don't know enough about Norwegian to know for sure what the best analysis is for this sentence, but if you take a look at the training data (UD_Norwegian-Bokmaal), you can see that the counts for UPOS / DEP labels for the token "ute" look like this: …

Given this, it's not surprising that the spaCy model predicts ADP or obl. The wide range of possible dependency labels would suggest that this word might have more than one usage or meaning, and with so much variation in the annotation the model is going to have difficulty getting this c…

The tag … Looking at similar cases in UD_English-EWT, you see "outside" frequently as …

In general, you can see how well the model performs on the UD dev data in the evaluations provided in the model meta and in the expandable "Accuracy Evaluation" tables for each model under https://spacy.io/models.
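For anyone who wants to check label counts like these for a given token, here is a minimal sketch that tallies UPOS / DEPREL pairs for one word form in CoNLL-U text. It is not the method used in the reply above, only one plausible way to get similar counts. The function name `count_labels` and the tiny sample are hypothetical, and the sample annotations are invented for illustration, not copied from UD_Norwegian-Bokmaal.

```python
from collections import Counter

# Hand-written CoNLL-U sample, invented for illustration only.
# These annotations are NOT taken from the real treebank.
SAMPLE_CONLLU = (
    "# text = Det er kaldt ute .\n"
    "1\tDet\tdet\tPRON\t_\t_\t3\texpl\t_\t_\n"
    "2\ter\tvære\tAUX\t_\t_\t3\tcop\t_\t_\n"
    "3\tkaldt\tkald\tADJ\t_\t_\t0\troot\t_\t_\n"
    "4\tute\tute\tADP\t_\t_\t3\tobl\t_\t_\n"
    "5\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
    "# text = Vi spiser ute .\n"
    "1\tVi\tvi\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tspiser\tspise\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tute\tute\tADV\t_\t_\t2\tadvmod\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
    "# text = Ute regner det .\n"
    "1\tUte\tute\tADP\t_\t_\t2\tobl\t_\t_\n"
    "2\tregner\tregne\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tdet\tdet\tPRON\t_\t_\t2\texpl\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)


def count_labels(conllu_text, form):
    """Count (UPOS, DEPREL) pairs for one word form, ignoring case."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # A CoNLL-U token line has exactly 10 tab-separated columns.
        if len(cols) != 10:
            continue
        token_id, token_form, upos, deprel = cols[0], cols[1], cols[3], cols[7]
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not token_id.isdigit():
            continue
        if token_form.lower() == form.lower():
            counts[(upos, deprel)] += 1
    return counts


if __name__ == "__main__":
    for (upos, deprel), n in count_labels(SAMPLE_CONLLU, "ute").most_common():
        print(upos, deprel, n)
```

To run this on real data, read the treebank's training `.conllu` file as UTF-8 text and pass its contents in place of `SAMPLE_CONLLU`. A spread of counts across several label pairs is the kind of annotation variation described above.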