Understanding feature overwriting and the Morphologizer #11676

megamattc · 2022-10-19T23:36:17Z

Hello,

I have a custom pipeline for an Akkadian corpus that includes an AttributeRuler which, based on pre-given data, assigns morphological feature tags to many tokens (e.g. gender and case for nouns; tense, person, and number for verbs). However, the assigned feature sets are often not fully specified, because a form can be ambiguous out of context (e.g. it could be 1st or 3rd person). I also include a Morphologizer trained on corpus data.

I was wondering: if I set the Morphologizer's overwrite parameter to true in the config file, and the Morphologizer assigns features to a token that already has some features from the AttributeRuler, will the AttributeRuler's features be erased completely, or can the Morphologizer simply add new features and overwrite only those with conflicting values? For instance, if the AttributeRuler assigns 'Gender=Masc|Number=Plur' to a token but the Morphologizer assigns 'Gender=Fem|Case=Nom', would the result be 'Gender=Fem|Case=Nom' or 'Gender=Fem|Number=Plur|Case=Nom'?

More generally, is it possible to structure the pipeline so that the features assigned by the AttributeRuler aren't overwritten, but the trained model can still add features learnable from the token's context?

Answered by polm

polm · 2022-10-21T06:03:39Z

The way the Morphologizer works is that it overwrites either all of the attributes or none of them; overwriting does not happen per field.

In this case I think it would be easier to put the AttributeRuler second and have it overwrite the Morphologizer annotations. Or, if you need a more complex way of referencing both the predictions and the rule-based assignments, you could wrap the AttributeRuler so that it takes its predictions and combines them with the existing annotations manually. See how the AttributeRuler's __call__ is implemented for an example of what that might look like.
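A minimal sketch of the wrapping idea, assuming spaCy v3. The `MergingAttributeRuler` class, the `šarrū` pattern, and the use of the `xx` blank pipeline are hypothetical and for illustration only. The wrapper snapshots each token's existing analysis (for example, what the Morphologizer predicted), runs a plain `AttributeRuler`, and then merges per field, so that rule values win for shared fields while predicted fields that the rules don't mention are kept. A manual `set_morph` call stands in for a trained Morphologizer.

```python
import spacy
from spacy.pipeline import AttributeRuler
from spacy.tokens import Doc


class MergingAttributeRuler:
    """Hypothetical wrapper: rule-based MORPH values win per field, while
    fields already present on the token (e.g. predicted ones) are kept."""

    def __init__(self, ruler: AttributeRuler):
        self.ruler = ruler

    def __call__(self, doc: Doc) -> Doc:
        # Snapshot the analyses assigned by earlier components.
        before = [token.morph.to_dict() for token in doc]
        # Run the wrapped AttributeRuler, which sets MORPH on matched tokens.
        doc = self.ruler(doc)
        for token, previous in zip(doc, before):
            ruled = token.morph.to_dict()
            # Rule values override shared fields; other previous fields survive.
            merged = {**previous, **ruled}
            if merged != ruled:
                token.set_morph(merged)
        return doc


nlp = spacy.blank("xx")
ruler = AttributeRuler(nlp.vocab)
ruler.add(patterns=[[{"ORTH": "šarrū"}]], attrs={"MORPH": "Gender=Masc|Number=Plur"})
merging_ruler = MergingAttributeRuler(ruler)

doc = nlp.make_doc("šarrū illikū")
# Stand-in for a Morphologizer prediction on the first token.
doc[0].set_morph("Gender=Fem|Case=Nom")
doc = merging_ruler(doc)
print(doc[0].morph)
```

With the example features from the question, the first token ends up with Case=Nom, Gender=Masc, and Number=Plur: the rule's Gender=Masc wins, and the predicted Case=Nom is kept.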

megamattc Oct 21, 2022
Author

Ok. Thank you.

megamattc Nov 5, 2022
Author

Hello again,

Regarding this issue, I see that the Morphologizer has both overwrite and extend settings for handling existing feature annotations. The description of how these options work (https://spacy.io/api/morphologizer/#section-init) says that if both overwrite and extend are set to true, the Morphologizer will overwrite the values of pre-existing fields and add new ones:

"overwrite=True, extend=True: overwrite values of existing features, add any new features (A=B|C=D + C=E|X=Y → A=B|C=E|X=Y)"

Also, if overwrite=False, extend=True, the Morphologizer only adds new fields and does not overwrite any old ones:

"overwrite=False, extend=True: keep values of existing features, add any new features (A=B|C=D + C=E|X=Y → A=B|C=D|X=Y)"
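The combinations quoted from the docs can be modelled in plain Python on FEATS strings. This is a hedged sketch of the documented behavior, not spaCy's implementation. The helper names `parse_feats`, `format_feats`, and `combine_morph` are hypothetical. The two combinations without extend follow the all-or-nothing behavior described in the answer above and are my reading of the same docs page. Fields are sorted alphabetically so that the outputs line up with the docs' examples.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Split a FEATS string such as 'A=B|C=D' into {'A': 'B', 'C': 'D'}."""
    if not feats:
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(features: Dict[str, str]) -> str:
    """Join features back into a FEATS string, sorted by field name."""
    return "|".join(f"{field}={value}" for field, value in sorted(features.items()))


def combine_morph(existing: str, predicted: str, overwrite: bool, extend: bool) -> str:
    """Hypothetical model of how predicted features combine with existing ones."""
    old, new = parse_feats(existing), parse_feats(predicted)
    if not old:
        # Nothing assigned yet: the prediction is used as-is.
        return format_feats(new)
    if overwrite and extend:
        # Predicted values win for shared fields; other existing fields are kept.
        return format_feats({**old, **new})
    if extend:
        # Existing values win for shared fields; only new fields are added.
        return format_feats({**new, **old})
    if overwrite:
        # All-or-nothing: the prediction replaces the whole existing analysis.
        return format_feats(new)
    # Neither flag: an existing analysis is left untouched.
    return format_feats(old)


existing = "Gender=Masc|Number=Plur"  # e.g. from the AttributeRuler
predicted = "Gender=Fem|Case=Nom"     # e.g. from the Morphologizer
print(combine_morph(existing, predicted, overwrite=True, extend=False))   # Case=Nom|Gender=Fem
print(combine_morph(existing, predicted, overwrite=True, extend=True))    # Case=Nom|Gender=Fem|Number=Plur
print(combine_morph(existing, predicted, overwrite=False, extend=True))   # Case=Nom|Gender=Masc|Number=Plur
```

Under this reading, the question's first candidate answer corresponds to overwrite without extend. The second candidate corresponds to overwrite with extend. The overwrite=False, extend=True combination keeps every rule-assigned value and only fills in missing fields.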

Is this consistent with what you said above? With overwrite=False, extend=True, would putting the AttributeRuler before the Morphologizer let the Morphologizer add features that can be determined from context and that the AttributeRuler doesn't know about?
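The pipeline order described in this question can be sketched in Python as follows. This assumes a spaCy v3 release whose `morphologizer` factory accepts the `extend` setting documented on the API page linked above. The `šarrū` pattern and the `xx` blank pipeline are hypothetical. The Morphologizer still has to be trained before it can run, so the sketch runs only the AttributeRuler to show what the Morphologizer would then receive.

```python
import spacy

nlp = spacy.blank("xx")

# Rule-based features are assigned first ...
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"ORTH": "šarrū"}]], attrs={"MORPH": "Gender=Masc|Number=Plur"})

# ... then the Morphologizer keeps existing values and only adds new fields.
morphologizer = nlp.add_pipe(
    "morphologizer",
    config={"overwrite": False, "extend": True},
)

# The untrained Morphologizer is disabled here; only the rules run.
with nlp.select_pipes(enable="attribute_ruler"):
    doc = nlp("šarrū illikū")

print(nlp.pipe_names)
print(doc[0].morph)
```

In a training config, the same settings would go in the `[components.morphologizer]` block as `overwrite = false` and `extend = true`, with `attribute_ruler` listed before `morphologizer` in the `[nlp]` pipeline setting.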

adrianeboyd Nov 7, 2022

Ah, the API docs should be correct. We don't use this option in any of the trained pipelines and even I had nearly forgotten that I implemented this.

megamattc Nov 7, 2022
Author

Ok, so to double check: even though you don't use the feature, is what I said above correct, i.e. setting overwrite=False, extend=True on the Morphologizer and putting the AttributeRuler before it?

adrianeboyd Nov 7, 2022

There are tests for the feature; it's just not used in any of the trained pipelines. I think it should be correct, but if it doesn't do what you expect, it would be fine to file an issue.
