Usage of ignore_token parameter to word_segmentation not documented enough, does not work #87

@sbhaktha

I have phrases containing named entities that I want the word_segmentation API to leave untouched. I tried replacing the named entities with placeholders (SPECIAL_TOKEN_1, SPECIAL_TOKEN_2, etc.) in the phrase itself and then passing SPECIAL_TOKEN_1 and SPECIAL_TOKEN_2 as ignore_token in the call to word_segmentation, but I cannot get this to work.

test_phrase = "Hello SPECIAL_TOKEN_1, I am happyto meet you tomorrowmorning. Thanks, SPECIAL_TOKEN_2"
phrase_suggestions = sym_spell.word_segmentation(test_phrase)

phrase_suggestions looks like this:

Composition(segmented_string='Hello **SPECIAL _TOKEN_ 1,** I am happy to meet you tomorrow morning. Thanks, **SPECIAL_ TOKEN_2**', corrected_string='Hello Special token of I am happy to meet you tomorrow morning Thanks Special Token', distance_sum=14, log_prob_sum=-55.6460931972679)

Notice how SPECIAL_TOKEN_1 and SPECIAL_TOKEN_2 get broken.

I tried using the ignore_token argument but cannot get it to work:

test_phrase = "Hello SPECIAL_TOKEN_1, I am happyto meet you tomorrowmorning. Thanks, SPECIAL_TOKEN_2"
phrase_suggestions = sym_spell.word_segmentation(test_phrase, ignore_token='SPECIAL_TOKEN_1')

I get back the same phrase_suggestions as before. I am also not sure how to pass multiple tokens to ignore.

I also tried passing a regex pattern:

phrase_suggestions = sym_spell.word_segmentation(test_phrase, ignore_token=r"SPECIAL_TOKEN_\d")

and phrase_suggestions still comes back with the tokens broken:

Composition(segmented_string='Hello **SPECIAL _TOKEN_ 1**, I am happy to meet you tomorrow morning. Thanks, **SPECIAL_ TOKEN_2**', corrected_string='Hello Special token of I am happy to meet you tomorrow morning Thanks Special Token', distance_sum=14, log_prob_sum=-55.6460931972679)

Could you please help and also add more documentation on using this parameter?

What's the recommended way to deal with named entities?
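
In case it helps frame the discussion, here is a minimal sketch of the workaround I am considering in the meantime: split the phrase on the placeholder tokens, segment only the text between them, and stitch the result back together. This is not taken from the library. The helper name segment_around_tokens and its parameters are hypothetical, and segment_fn is assumed to be any callable that takes a string and returns the segmented string (for example, a small wrapper that calls sym_spell.word_segmentation and returns segmented_string).

```python
import re
from typing import Callable, Iterable


def segment_around_tokens(
    phrase: str,
    protected_tokens: Iterable[str],
    segment_fn: Callable[[str], str],
) -> str:
    """Apply segment_fn to the text between protected tokens only.

    Hypothetical helper: protected tokens are copied through verbatim,
    everything else is handed to segment_fn piece by piece.
    """
    # Longest tokens first, so SPECIAL_TOKEN_10 wins over SPECIAL_TOKEN_1.
    ordered = sorted(set(protected_tokens), key=len, reverse=True)
    if not ordered:
        return segment_fn(phrase)

    # re.escape treats each token as literal text; the capturing group makes
    # re.split keep the matched tokens (they land at the odd indices).
    pattern = re.compile("(" + "|".join(re.escape(t) for t in ordered) + ")")
    pieces = pattern.split(phrase)

    result = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1 or not piece.strip():
            # Protected token, or an empty/whitespace-only gap: keep as-is.
            result.append(piece)
            continue
        # Preserve surrounding whitespace in case segment_fn strips it.
        lead = piece[: len(piece) - len(piece.lstrip())]
        trail = piece[len(piece.rstrip()):]
        result.append(lead + segment_fn(piece.strip()) + trail)
    return "".join(result)
```

The obvious drawback is that each piece is segmented without its neighbours, and punctuation next to a token (such as the comma after SPECIAL_TOKEN_1) is passed to segment_fn along with the rest of the text, so this is at best a stopgap until the intended use of ignore_token is clear.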
