Difference in performance of POS tags between small and large models of Portuguese #8633
Besides being larger, medium and large models typically incorporate word vectors that aren't used in the small models, which can improve accuracy. The architecture is otherwise the same.
A large model is not a small model with some of the errors fixed. Large models are expected to make fewer errors on average, but the distribution of those errors will be different, so there will be cases where the small model is right and the large model is wrong. If there is a strong pattern to those cases or they seem very frequent, that could indicate an issue, but the fact that such cases exist is neither surprising nor a cause for concern. Also see #3052 for notes on model errors in general.
It might be helpful to think of models like students. If you have a student with high grades (Alice) and a student with low grades (Bob), the student with high grades will make fewer mistakes, but there's no guarantee that on any particular question it's impossible for Bob to get it right and Alice to get it wrong. Grades, like accuracy scores, are aggregate statistics.
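To make the "aggregate statistics" point concrete, here is a minimal sketch using made-up data. The labels, the 100-item test set, and the specific items each "student" gets wrong are all hypothetical and chosen only so the scores come out to 0.97 and 0.96. It shows that the higher-scoring set of predictions can still contain errors the lower-scoring one avoids.

```python
# Hypothetical gold labels for 100 test items (made-up data, not model output).
gold = ["A"] * 100

# "Alice" (the stronger model) is wrong on items 0, 1 and 2.
alice = list(gold)
for i in (0, 1, 2):
    alice[i] = "B"

# "Bob" (the weaker model) is wrong on items 2, 3, 4 and 5.
bob = list(gold)
for i in (2, 3, 4, 5):
    bob[i] = "B"


def accuracy(pred, gold):
    """Fraction of items where the prediction matches the gold label."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def errors(pred, gold):
    """Set of item indices where the prediction is wrong."""
    return {i for i, (p, g) in enumerate(zip(pred, gold)) if p != g}


# Items that only one of the two gets wrong.
alice_only = errors(alice, gold) - errors(bob, gold)
bob_only = errors(bob, gold) - errors(alice, gold)

print(accuracy(alice, gold))  # 0.97
print(accuracy(bob, gold))    # 0.96
print(sorted(alice_only))     # [0, 1]
print(sorted(bob_only))       # [3, 4, 5]
```

Alice scores higher overall, yet items 0 and 1 are cases where Bob is right and Alice is wrong. The two error sets overlap only partially, which is what you would generally expect from two separately trained models.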
I am using spaCy version 3.0.6. As shown at https://spacy.io/models/pt, the small model has a POS_ACC of 0.96 while the large model has 0.97. Is this due solely to the size of the models, or are there other differences between them for this task?
I have seen cases where the small model performs better than the large one. One example is the pair "Não vou falar com ele nem que ele peça" ("I won't talk to him even if he asks") and "Não vou falar com ele nem que ele me peça" ("I won't talk to him even if he asks me"). As can be seen, the only difference between the two sentences is the pronoun "me". The small model is unaffected: it correctly tags "peça" as a verb in both cases. The large model, however, incorrectly tags "peça" as a noun once the pronoun is inserted.
Is it expected that a model scoring 0.97 makes errors that a model scoring 0.96 does not? Intuitively, the answer should be 'no', since one seems to be an improvement on the other.
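For anyone who wants to reproduce the comparison, here is a rough sketch. It assumes the small and large Portuguese pipelines are installed under the names `pt_core_news_sm` and `pt_core_news_lg`, and that both tokenize the sentence identically. The helper names `tag_with` and `diff_tags` are just illustrative, not part of spaCy.

```python
def tag_with(model_name, text):
    """Return (token text, coarse POS tag) pairs from a spaCy pipeline."""
    import spacy  # imported lazily so diff_tags can be used without spaCy

    nlp = spacy.load(model_name)
    return [(token.text, token.pos_) for token in nlp(text)]


def diff_tags(tags_a, tags_b):
    """List (token, tag_a, tag_b) for tokens where two taggings disagree.

    Assumes both taggings cover the same tokens in the same order.
    """
    return [
        (tok, pos_a, pos_b)
        for (tok, pos_a), (_, pos_b) in zip(tags_a, tags_b)
        if pos_a != pos_b
    ]


# Usage (requires both pipelines to be installed):
# sentence = "Não vou falar com ele nem que ele me peça"
# small = tag_with("pt_core_news_sm", sentence)
# large = tag_with("pt_core_news_lg", sentence)
# print(diff_tags(small, large))
```

Running this over a larger set of sentences, rather than a single pair, would give a better idea of whether the disagreements follow a pattern.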