morph reading in token is not merged properly when using merge_entities pipeline #12856
How to reproduce the behaviour
Command to test
returns
Note how for 4月1日, it shows "morph": "Reading=ツイタチ". It removed the reading from 4月.
Your Environment
Replies: 1 comment 1 reply
The retokenizer can merge different morph features like A=1 + B=2 -> A=1|B=2, but it doesn't know how to automatically merge multiple values for the same feature like A=1 + A=2 -> A=???, so it uses the value from one token instead of trying to merge them. I'd have to double-check to be sure, but I think the default is to take the value from the head token in the phrase, and if there's no parse then it's taken from the first token.

As a workaround, you can merge the values using your own custom method before retokenizing. Set the same value on all tokens in the entity/span to be sure that this value gets used for the new token:
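A minimal sketch of that idea, assuming the feature you want to combine is Reading and that concatenating the values in token order is the merge you want. The helper names join_feature and unify_feature are made up for illustration; the only spaCy calls it relies on are Token.morph, MorphAnalysis.to_dict() and Token.set_morph(). The sample reading for 4月 is an illustrative value, not output copied from the report.

```python
def join_feature(feats_list, feature="Reading", joiner=""):
    """Concatenate the values of one feature across FEATS-style strings.

    Each item looks like "Reading=ツイタチ" or "A=1|B=2"; an empty string
    (no morph set) contributes nothing.
    """
    values = []
    for feats in feats_list:
        for pair in feats.split("|"):
            name, sep, value = pair.partition("=")
            if sep and name == feature:
                values.append(value)
    return joiner.join(values)


def unify_feature(span, feature="Reading"):
    """Give every token in the span the same, already-combined value.

    `span` is expected to be a spaCy Span (for example an entity), but any
    iterable of tokens with .morph and .set_morph() works. Once all tokens
    agree, it no longer matters which token the retokenizer picks.
    """
    # Compute the combined value before touching any token.
    joined = join_feature([str(token.morph) for token in span], feature)
    if not joined:
        return
    for token in span:
        feats = token.morph.to_dict()  # keep the token's other features
        feats[feature] = joined
        token.set_morph(feats)


# Illustrative values only:
print(join_feature(["Reading=シガツ", "Reading=ツイタチ"]))  # シガツツイタチ
```

To use it, you could call unify_feature(ent) for each ent in doc.ents from a small custom component placed before merge_entities in the pipeline, so the values are already identical by the time the retokenizer runs.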