What does "Expands multi-word tokens (MWT) predicted by the TokenizeProcessor. This is only applicable to some languages." mean? #1498
The source is https://stanfordnlp.github.io/stanza/pipeline.html#processors. Does this mean that some languages have no multi-word tokens at all, so the MWTProcessor simply treats every token directly as a single word?
Yes, that's it exactly. The MWT processor only works for languages whose training data supports it. For example, English has `don't` and similar contractions, but Chinese doesn't have anything like that. We can word it differently if you suggest how, but I will say the linked article does a good job of explaining it (using French as an example, not English). Maybe we could put a modal window there with an example?
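To make the idea concrete, here is a toy sketch of what "expanding a multi-word token" means. It is not Stanza's implementation: Stanza learns these expansions from its training data, while this sketch uses a small hand-written table of French contractions (`FRENCH_CONTRACTIONS`, `expand_mwt`, and `words_only` are hypothetical names for illustration only). For a language with no multi-word tokens, such as Chinese, the table is simply empty and every token passes through as exactly one word.

```python
# Toy illustration of multi-word token (MWT) expansion.
# Hypothetical lookup table, NOT Stanza's model: each surface token maps to
# the syntactic words it stands for (e.g. French "du" = "de" + "le").
FRENCH_CONTRACTIONS = {
    "du": ["de", "le"],
    "au": ["à", "le"],
    "aux": ["à", "les"],
}


def expand_mwt(tokens, table):
    """Pair each token with the list of words it expands to.

    Tokens not found in the table expand to themselves (one token = one word).
    Lookup is case-insensitive; expanded words come out in the table's casing.
    """
    result = []
    for tok in tokens:
        words = table.get(tok.lower(), [tok])
        result.append((tok, words))
    return result


def words_only(tokens, table):
    """Flatten the expansion into the word sequence later processors would see."""
    return [word for _, words in expand_mwt(tokens, table) for word in words]


if __name__ == "__main__":
    # French: "au" and "du" are single tokens but two syntactic words each.
    fr = ["Il", "va", "au", "marché", "du", "village"]
    print(words_only(fr, FRENCH_CONTRACTIONS))

    # Chinese: no MWTs, so with an empty table nothing is expanded.
    zh = ["我", "喜欢", "猫"]
    print(words_only(zh, {}))
```

The design point is that tokenization and word segmentation are separate layers: the tokenizer decides where tokens begin and end in the raw text, and the MWT step decides how many syntactic words each token contains. For languages without contractions of this kind, the second step is a no-op.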