Usage of `ignore_token` parameter to `word_segmentation` not documented enough, does not work

I have phrases containing named entities that I want the `word_segmentation` API to leave untouched. I replaced the named entities with placeholders (`SPECIAL_TOKEN_1`, `SPECIAL_TOKEN_2`, etc.) in the phrase itself and then passed `SPECIAL_TOKEN_1` and `SPECIAL_TOKEN_2` as `ignore_token` in the call to `word_segmentation`, but I cannot get this to work.

```
phrase = "Hello SPECIAL_TOKEN_1, I am happyto meet you tomorrowmorning. Thanks, SPECIAL_TOKEN_2"
phrase_suggestions = sym_spell.word_segmentation(phrase)
```

`phrase_suggestions` looks like this:
```
Composition(segmented_string='Hello SPECIAL _TOKEN_ 1, I am happy to meet you tomorrow morning. Thanks, SPECIAL_ TOKEN_2', corrected_string='Hello Special token of I am happy to meet you tomorrow morning Thanks Special Token', distance_sum=14, log_prob_sum=-55.6460931972679)
```

Notice how `SPECIAL_TOKEN_1` and `SPECIAL_TOKEN_2` are split apart in `segmented_string` and rewritten in `corrected_string`.

I then tried the `ignore_token` argument, but it has no effect:

```
phrase = "Hello SPECIAL_TOKEN_1, I am happyto meet you tomorrowmorning. Thanks, SPECIAL_TOKEN_2"
phrase_suggestions = sym_spell.word_segmentation(phrase, ignore_token='SPECIAL_TOKEN_1')
```

I get back the same `phrase_suggestions` as before. I am also not sure how to pass multiple tokens to ignore.

I also tried passing a regex:
```
phrase_suggestions = sym_spell.word_segmentation(phrase, ignore_token=r"SPECIAL_TOKEN_\d")
```

and `phrase_suggestions` still has the placeholders broken up:
```
Composition(segmented_string='Hello SPECIAL _TOKEN_ 1, I am happy to meet you tomorrow morning. Thanks, SPECIAL_ TOKEN_2', corrected_string='Hello Special token of I am happy to meet you tomorrow morning Thanks Special Token', distance_sum=14, log_prob_sum=-55.6460931972679)
```

Could you please help and also add more documentation on using this parameter?

What's the recommended way to deal with named entities?
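In case it helps the discussion, here is a workaround I could imagine in the meantime. It is only a sketch, not something the library documents: split the phrase around the placeholders, segment just the text in between, and stitch everything back together. The helper names `segment_preserving_tokens` and `_segment_chunk`, the default pattern, and `toy_segment` are all made up for illustration; `toy_segment` is a stand-in so the snippet runs without a loaded dictionary.

```python
import re
from typing import Callable


def _segment_chunk(chunk: str, segment_fn: Callable[[str], str]) -> str:
    """Segment one non-placeholder chunk, keeping its outer whitespace."""
    core = chunk.strip()
    if not core:
        # Nothing but whitespace (or empty): leave it exactly as it was.
        return chunk
    lead = chunk[: len(chunk) - len(chunk.lstrip())]
    trail = chunk[len(chunk.rstrip()):]
    return lead + segment_fn(core) + trail


def segment_preserving_tokens(
    phrase: str,
    segment_fn: Callable[[str], str],
    token_pattern: str = r"SPECIAL_TOKEN_\d+",  # hypothetical placeholder format
) -> str:
    """Apply segment_fn only to the text between placeholder matches."""
    pieces = []
    pos = 0
    for match in re.finditer(token_pattern, phrase):
        pieces.append(_segment_chunk(phrase[pos:match.start()], segment_fn))
        pieces.append(match.group(0))  # placeholder copied through unchanged
        pos = match.end()
    pieces.append(_segment_chunk(phrase[pos:], segment_fn))
    return "".join(pieces)


def toy_segment(text: str) -> str:
    """Stand-in segmenter for this demo only; not the library's behaviour."""
    return text.replace("happyto", "happy to").replace(
        "tomorrowmorning", "tomorrow morning"
    )


phrase = "Hello SPECIAL_TOKEN_1, I am happyto meet you tomorrowmorning. Thanks, SPECIAL_TOKEN_2"
result = segment_preserving_tokens(phrase, toy_segment)
print(result)
```

With the real library, I assume `segment_fn` could be something like `lambda s: sym_spell.word_segmentation(s).segmented_string`, since the returned `Composition` has a `segmented_string` field. I have not checked how the segmenter treats chunks that start with punctuation (such as `, I am ...`), so the output may differ from segmenting the whole phrase at once.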


