Disambiguation for tokenization #425

ozwiz · 2016-06-11T23:38:40Z

First of all, thanks for your great work.
I have read many issues posted here about tokenization. It is a tough task. For example "Adam's" may mean "of Adam" or "Adam is".
Would you consider any disambiguation step that will merge or split "'s" automatically based on later functions?

ines (Maintainer) · 2017-11-09T16:53:23Z

Quick update: This might be a nice use case for the new custom processing pipeline components and extension attributes introduced in v2.0!
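
For readers following along, here is a minimal sketch of what such a component could look like. It is not from the maintainers. It assumes the spaCy v3 API (`Language.component`, `Token.set_extension`) and that an upstream tagger assigns Penn Treebank style fine-grained tags, where `POS` marks a possessive ending and `VBZ` a third-person singular verb. The names `clitic_sense` and `clitic_disambiguator` are hypothetical, and the hand-tagged `Doc` objects only stand in for real tagger output.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc, Token

# Hypothetical extension attribute; force=True lets the snippet be re-run safely.
Token.set_extension("clitic_sense", default=None, force=True)


@Language.component("clitic_disambiguator")  # hypothetical component name
def clitic_disambiguator(doc):
    """Label each "'s" token as possessive or verbal from its fine-grained tag."""
    for token in doc:
        if token.lower_ in ("'s", "’s"):
            if token.tag_ == "POS":
                token._.clitic_sense = "possessive"
            elif token.tag_.startswith("VB"):
                token._.clitic_sense = "verb"
    return doc


nlp = spacy.blank("en")

# Assumption: these tags are written by hand to imitate what a tagger would assign.
possessive = clitic_disambiguator(
    Doc(nlp.vocab, words=["Adam", "'s", "book"], tags=["NNP", "POS", "NN"])
)
contraction = clitic_disambiguator(
    Doc(nlp.vocab, words=["Adam", "'s", "here"], tags=["NNP", "VBZ", "RB"])
)
senses = [possessive[1]._.clitic_sense, contraction[1]._.clitic_sense]
```

In a real pipeline the component would be registered with `nlp.add_pipe("clitic_disambiguator")` after the tagger, so that `token.tag_` is already filled in when it runs. Merging or splitting tokens based on the label would be a separate step.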

naveenjafer · 2019-10-10T17:06:48Z

This seems to be a really old thread, but I would like to take it up if there is still a need for this. @ines is this relevant as of today? If there is a plan to proceed with this, I might really benefit from a few more functional examples of the same.

naveenjafer · 2019-10-17T04:17:21Z

Hi @ines @honnibal, is the idea to keep multiple possible tokenizations for the same sentence where applicable, evaluate which one is grammatically most probable in later stages of the pipeline, and only then commit to one of them?
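
To make the question concrete, here is a rough pure-Python sketch of that idea: enumerate the candidate tokenizations, score each one, and commit to the best. Nothing here is spaCy API or a confirmed design. The `CANDIDATES` table, the `'s/POSS` and `'s/BE` labels, and `toy_score` are all hypothetical stand-ins for whatever a later pipeline stage would provide.

```python
from itertools import product

# Toy data (assumption): each ambiguous surface form maps to its candidate analyses.
CANDIDATES = {
    "Adam's": [("Adam", "'s/POSS"), ("Adam", "'s/BE")],
}


def candidate_tokenizations(words):
    """Yield every combination of analyses for the given whitespace-split words."""
    options = [CANDIDATES.get(word, [(word,)]) for word in words]
    for combo in product(*options):
        yield [token for part in combo for token in part]


def toy_score(tokens):
    """Hypothetical scorer: prefer 's/BE before an -ing word, 's/POSS otherwise."""
    score = 0
    for current, following in zip(tokens, tokens[1:]):
        if current == "'s/BE":
            score += 1 if following.endswith("ing") else -1
        elif current == "'s/POSS":
            score += -1 if following.endswith("ing") else 1
    return score


def best_tokenization(words, score=toy_score):
    """Commit to the highest-scoring candidate only after all have been scored."""
    return max(candidate_tokenizations(words), key=score)


verbal = best_tokenization(["Adam's", "running", "late"])
possessive = best_tokenization(["Adam's", "book", "is", "here"])
```

A real scorer would come from a tagger or parser rather than a suffix check, and the number of candidates grows multiplicatively with each ambiguous word, so a practical version would need pruning.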
