
The retokenizer can merge different morph features like A=1 + B=2 -> A=1|B=2, but it doesn't know how to automatically merge multiple values for the same feature like A=1 + A=2 -> A=???, so it uses the value from one token instead of trying to merge them. I'd have to double-check to be sure, but I think the default is to take the value from the head token in the phrase, and if there's no parse then it's taken from the first token.

As a workaround, you can merge the values using your own custom method before retokenizing. Set the same value on all tokens in the entity/span to be sure that this value gets used for the new token:

from spacy.tokens import MorphAnalysis
span = doc[0:2]
reading…
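The snippet above is cut off, so the following is not the code from the original reply. It is a minimal sketch of the same workaround, assuming spaCy v3, where `MorphAnalysis.to_dict()`, `Token.set_morph()` and `doc.retokenize()` are available. The helper names (`merge_morph_dicts`, `unify_span_morph`, `merge_span_with_unified_morph`) are hypothetical. The merge policy is also only one possible choice. It takes the union of conflicting values and joins them with commas, following the UD convention for multi-valued features, so `A=1` + `A=2` becomes `A=1,2`.

```python
def merge_morph_dicts(morph_dicts):
    """Union the values of several {field: value} feature dicts.

    Hypothetical policy: conflicting values for the same field are combined
    into one sorted, comma-separated value (UD multi-value style), e.g.
    {"A": "1"} + {"A": "2"} -> {"A": "1,2"}. Values that are already
    comma-separated are split first so nothing is duplicated.
    """
    collected = {}
    for feats in morph_dicts:
        for field, value in feats.items():
            collected.setdefault(field, set()).update(value.split(","))
    return {field: ",".join(sorted(values)) for field, values in collected.items()}


def unify_span_morph(span):
    """Set the same merged morph features on every token in the span.

    Because all tokens then carry identical features, it no longer matters
    which token the retokenizer takes the value from.
    """
    merged = merge_morph_dicts(token.morph.to_dict() for token in span)
    for token in span:
        token.set_morph(merged)  # accepts a features dict in spaCy v3


def merge_span_with_unified_morph(doc, start, end):
    """Unify morph features over doc[start:end], then merge it into one token."""
    span = doc[start:end]
    unify_span_morph(span)
    with doc.retokenize() as retokenizer:
        retokenizer.merge(span)
    return doc
```

With a parsed `doc`, calling `merge_span_with_unified_morph(doc, 0, 2)` corresponds to the `span = doc[0:2]` case above. To keep one token's value instead of combining them, replace `merge_morph_dicts` with a different policy. `unify_span_morph` stays the same.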

Answer selected by lawctan
Labels: feat / doc (Feature: Doc, Span and Token objects), feat / morphology (Feature: Morphology and MorphAnalysis)
This discussion was converted from issue #12854 on July 25, 2023 06:24.