Understanding feature overwriting and the Morphologizer #11676
Hello, I have a custom pipeline for an Akkadian corpus that includes an AttributeRuler which, based on pre-given data, assigns morphological feature tags to many tokens (e.g. gender and case for nouns; tense, person, and number for verbs). The assigned feature sets are often not fully specified, however, because a form out of context can be ambiguous (e.g. a form could be 1st or 3rd person). I also include a Morphologizer trained on corpus data. If I set the Morphologizer's overwrite parameter to true in the config file, and the Morphologizer assigns features to a token that already has some features from the AttributeRuler, will the latter's features be completely erased, or can the Morphologizer simply append new features and overwrite only those with conflicting values? For instance, if the token is assigned 'Gender=Masc|Number=Plur' by the AttributeRuler but the Morphologizer assigns 'Gender=Fem|Case=Nom', would the result be 'Gender=Fem|Case=Nom' or 'Gender=Fem|Number=Plur|Case=Nom'? More generally, is it possible to structure the pipeline so that the features assigned by the AttributeRuler aren't overwritten, but the language model can still add features learnable from the token's context?
Replies: 1 comment 5 replies
The way the Morphologizer works is that it will overwrite all or no attributes - overwrite does not happen per-field.
In this case I think it would be easier to put the AttributeRuler second and have it overwrite the Morphologizer annotations. Or if you need some more complex way of referencing both the predictions and the rule-based assignments, you could wrap the AttributeRuler to take its predictions and combine them with the existing annotations manually. See how `__call__` is implemented for an example of what that might look like.
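To make the "combine them manually" idea concrete, here is a minimal sketch of a per-feature merge on plain FEATS strings. It is not spaCy code: the helper names `parse_feats`, `format_feats`, and `merge_feats`, and the `prefer_rules` flag, are hypothetical and only illustrate one possible policy (rule-based values win on conflicts, predicted features fill the gaps).

```python
def parse_feats(feats: str) -> dict:
    """Parse a FEATS string such as 'Gender=Masc|Number=Plur' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats: dict) -> str:
    """Serialize a dict back to a FEATS string, sorted by feature name."""
    items = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in items)


def merge_feats(rule_feats: str, predicted_feats: str, prefer_rules: bool = True) -> str:
    """Merge two analyses feature by feature (hypothetical helper).

    With prefer_rules=True, rule-based values win on conflicts and the
    predictions only contribute features the rules left unspecified.
    With prefer_rules=False, the predictions win on conflicts instead.
    """
    rules = parse_feats(rule_feats)
    predicted = parse_feats(predicted_feats)
    # Later dicts take priority when unpacked into a new dict.
    merged = {**predicted, **rules} if prefer_rules else {**rules, **predicted}
    return format_feats(merged)


# The example from the question:
print(merge_feats("Gender=Masc|Number=Plur", "Gender=Fem|Case=Nom"))
# Case=Nom|Gender=Masc|Number=Plur
print(merge_feats("Gender=Masc|Number=Plur", "Gender=Fem|Case=Nom", prefer_rules=False))
# Case=Nom|Gender=Fem|Number=Plur
```

In a wrapper component, something along these lines could be fed with `str(token.morph)` for each analysis and the merged string applied with `token.set_morph(...)`. How the two analyses are kept apart before merging (for example, by stashing one of them in a custom extension attribute) is left open here and would depend on the pipeline.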