Disambiguation for tokenization #425
Replies: 3 comments
-
Quick update: This might be a nice use case for the new custom processing pipeline components and extension attributes introduced in v2.0!
-
This seems to be a really old thread, but I would like to take it up if there is still a need for it. @ines, is this still relevant today? If there is a plan to proceed, I would really benefit from a few more working examples.
-
Hi @ines @honnibal, is the idea to keep multiple possible tokenizations for the same sentence where applicable, evaluate which one is grammatically most probable in later stages of the pipeline, and only then commit to one of them?
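To make the question concrete, here is a minimal sketch of that "keep several candidates, commit later" idea in plain Python. It is not spaCy's API and not how the library works internally. The tag names, the `BIGRAM_LOGPROB` table and its numbers, and the `score` and `choose` helpers are all hypothetical and invented purely for illustration; a real pipeline would get its scores from a trained tagger or parser.

```python
import math

# Hypothetical tag-bigram log-probabilities, invented for illustration only.
BIGRAM_LOGPROB = {
    ("PROPN", "AUX"): math.log(0.3),
    ("AUX", "VERB"): math.log(0.6),
    ("AUX", "NOUN"): math.log(0.05),
    ("PROPN", "PART"): math.log(0.2),
    ("PART", "VERB"): math.log(0.01),
    ("PART", "NOUN"): math.log(0.7),
}
UNSEEN = math.log(1e-4)  # fallback for tag pairs missing from the table


def score(candidate):
    """Sum the log-probabilities of adjacent tag pairs in one candidate."""
    tags = [tag for _, tag in candidate]
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN) for pair in zip(tags, tags[1:]))


def choose(candidates):
    """Commit to the highest-scoring candidate analysis."""
    return max(candidates, key=score)


# Two competing analyses of "Adam's leaving": possessive marker vs. "is".
candidates = [
    [("Adam", "PROPN"), ("'s", "PART"), ("leaving", "VERB")],
    [("Adam", "PROPN"), ("'s", "AUX"), ("leaving", "VERB")],
]
best = choose(candidates)
```

With these made-up numbers, `choose` picks the reading in which "'s" is tagged `AUX`, because the possessive marker followed by a verb scores much lower. The point is only the shape of the approach: tokenization emits alternatives, and a later stage ranks them.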
-
First of all, thanks for your great work.
I have read many of the issues posted here about tokenization. It is a tough task. For example, "Adam's" may mean "of Adam" or "Adam is".
Would you consider adding a disambiguation step that automatically merges or splits "'s" based on the results of later processing steps?
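As a rough illustration of what such a step could look like, here is a small, library-independent Python sketch. It splits "'s" from its host word and then labels each "'s" by peeking at the following token. The `IS_CUES` word list, the "-ing" rule, and the function names `split_clitic` and `label_clitics` are assumptions made up for this example; they are not part of spaCy, and a real solution would rely on a tagger or parser rather than a hand-written cue list.

```python
import re

# Hypothetical cue words suggesting that a preceding 's means "is".
IS_CUES = {"a", "an", "the", "not", "here", "there"}


def split_clitic(text):
    """Split on whitespace and separate a trailing 's from its host word."""
    tokens = []
    for word in text.split():
        match = re.fullmatch(r"(.+)('s)", word)
        if match:
            tokens.extend(match.groups())
        else:
            tokens.append(word)
    return tokens


def label_clitics(tokens):
    """Pair each token with a reading: 'is', 'possessive', or None."""
    labelled = []
    for i, tok in enumerate(tokens):
        if tok != "'s":
            labelled.append((tok, None))
            continue
        nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        if nxt in IS_CUES or nxt.endswith("ing"):
            labelled.append((tok, "is"))
        else:
            labelled.append((tok, "possessive"))
    return labelled


possessive_example = label_clitics(split_clitic("Adam's book"))
copula_example = label_clitics(split_clitic("Adam's running late"))
```

Here `possessive_example` labels "'s" as `"possessive"` and `copula_example` labels it as `"is"`. The heuristic is deliberately crude: it misreads "Adam's ring" because "ring" ends in "-ing", and it ignores the "Adam has" reading and typographic apostrophes, which is exactly why the decision is better deferred to a later, better-informed stage.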