Norwegian tagging is wrong (both large and small model)? #12103

kungfooman · 2023-01-14T16:35:21Z


https://demos.explosion.ai/displacy?text=Det%20er%20s%C3%A5%20kaldt%20ute.&model=nb_core_news_sm&cpu=1&cph=0

In the demo, it tags "Det er så kaldt ute." ("It is so cold outside.") as:
PRON - Det
AUX - er
ADV - så
ADJ - kaldt
ADP - ute

I think the correct tagging should be "PRON AUX ADV ADJ ADV".

But I am not exactly a linguist... can someone check whether I got it wrong, spaCy got it wrong, both, or neither?

Thanks a lot! I appreciate working with spaCy, but right now I am confused about how "right" it is in various cases.

Answered by adrianeboyd


adrianeboyd · 2023-01-16T09:45:08Z


I don't know enough about Norwegian to know for sure what the best analysis is for this sentence, but if you take a look at the training data (UD_Norwegian-Bokmaal), you can see that the counts for UPOS / DEP labels for the token "ute" look like this:

      4 ute	ADP	advcl
     19 ute	ADP	case
      5 ute	ADP	compound:prt
      3 ute	ADP	conj
      5 ute	ADP	nmod
      6 ute	ADP	obl
      4 ute	SCONJ	mark

Given this, it's not surprising that the spaCy model predicts ADP or obl. The wide range of possible dependency labels suggests that this word may have more than one usage or meaning, and with so much variation in the annotation, the model is going to have difficulty getting this correct.
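Counts like these can be tallied directly from a CoNLL-U file. The following is a minimal sketch, not necessarily how the counts above were produced (they may well have come from shell tools). The function names are hypothetical, FORM matching is assumed to be exact and case-sensitive, and the only real data used are the "ute" counts quoted above.

```python
from collections import Counter


def count_form_labels(conllu_text: str, form: str) -> Counter:
    """Count (UPOS, DEPREL) pairs for tokens whose FORM is exactly `form`."""
    counts: Counter = Counter()
    for line in conllu_text.splitlines():
        # Blank lines separate sentences; lines starting with '#' are comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges ("3-4") and empty nodes ("5.1").
        if len(cols) != 10 or "-" in cols[0] or "." in cols[0]:
            continue
        # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        if cols[1] == form:
            counts[(cols[3], cols[7])] += 1
    return counts


def upos_shares(counts: Counter) -> dict:
    """Collapse (UPOS, DEPREL) counts into the share of each UPOS tag."""
    per_upos: Counter = Counter()
    for (upos, _deprel), n in counts.items():
        per_upos[upos] += n
    total = sum(per_upos.values())
    return {upos: n / total for upos, n in per_upos.items()}


# The counts for "ute" quoted in the answer above.
UTE_COUNTS = Counter({
    ("ADP", "advcl"): 4,
    ("ADP", "case"): 19,
    ("ADP", "compound:prt"): 5,
    ("ADP", "conj"): 3,
    ("ADP", "nmod"): 5,
    ("ADP", "obl"): 6,
    ("SCONJ", "mark"): 4,
})

shares = upos_shares(UTE_COUNTS)
# ADP tokens that are not in a `case` relation (the "non-case instances").
non_case_adp = sum(
    n for (upos, dep), n in UTE_COUNTS.items() if upos == "ADP" and dep != "case"
)
print(f"ADP share: {shares['ADP']:.3f}")  # ADP share: 0.913
print(f"ADP tokens not labeled case: {non_case_adp}")  # ADP tokens not labeled case: 23
```

With these numbers, 42 of the 46 "ute" tokens are ADP, so a model trained on this data has little reason to predict anything else. Only 19 of those 42 are `case`, though. To run the counter on the treebank itself, read the training `.conllu` file from UD_Norwegian-Bokmaal into a string and call `count_form_labels(text, "ute")`.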

The tag ADP for the non-case instances does seem unexpected at first glance. It could be a conversion issue (most UD treebanks are converted from more language-specific dependency formats) or a convention that has been chosen for Norwegian for a difficult case. I'm afraid I'm just not really sure.

Looking at similar cases in UD_English-EWT, you see "outside" frequently as ADP/case ("outside the house") and ADV/advmod ("the event took place outside").

In general, you can see how well the model performs on the UD dev data in the evaluations provided in the model meta and in the expandable "Accuracy Evaluation" tables for each model under https://spacy.io/models.
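The POS (UPOS) figure in those evaluations is, roughly, the fraction of tokens whose predicted tag matches the gold tag. spaCy v3 reports it under the key `pos_acc`. The sketch below illustrates only the arithmetic. It assumes identical tokenization on both sides, which spaCy's own scorer does not assume. As a hypothetical gold standard it uses the tags the question proposes, which are not verified annotation.

```python
def token_accuracy(gold: list, pred: list) -> float:
    """Fraction of tokens whose predicted label equals the gold label.

    `gold` and `pred` are lists of sentences, each a list of tag strings.
    Assumes both sides use exactly the same tokenization.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = total = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("sentences must be tokenized identically")
        for g, p in zip(g_sent, p_sent):
            correct += g == p
            total += 1
    return correct / total if total else 0.0


# Hypothetical gold: the tags proposed in the question (not checked annotation).
gold = [["PRON", "AUX", "ADV", "ADJ", "ADV"]]
# Tags shown by the demo for "Det er så kaldt ute." (final "." left out, as in the question).
pred = [["PRON", "AUX", "ADV", "ADJ", "ADP"]]
print(token_accuracy(gold, pred))  # 0.8
```

A single sentence says little about a model. The dev-set numbers on the models page average over thousands of tokens, so one disputed word like "ute" barely moves them.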
