Description
I have phrases containing named entities that I want the word_segmentation API to leave untouched. I tried replacing the named entities with placeholders (SPECIAL_TOKEN_1, SPECIAL_TOKEN_2, etc.) in the phrase itself, then passing those placeholders as ignore_token in the call to word_segmentation. I cannot get this to work. Without ignore_token, the call looks like this:
test_phrase = "Hello SPECIAL_TOKEN_1, I am happyto meet you tomorrowmorning. Thanks, SPECIAL_TOKEN_2"
phrase_suggestions = sym_spell.word_segmentation(test_phrase)
phrase_suggestions looks like this:
Composition(segmented_string='Hello **SPECIAL _TOKEN_ 1,** I am happy to meet you tomorrow morning. Thanks, **SPECIAL_ TOKEN_2**', corrected_string='Hello Special token of I am happy to meet you tomorrow morning Thanks Special Token', distance_sum=14, log_prob_sum=-55.6460931972679)
Notice how SPECIAL_TOKEN_1 and SPECIAL_TOKEN_2 get split apart in segmented_string and rewritten in corrected_string.
I then tried the ignore_token argument, but cannot get it to work:
test_phrase = "Hello SPECIAL_TOKEN_1, I am happyto meet you tomorrowmorning. Thanks, SPECIAL_TOKEN_2"
phrase_suggestions = sym_spell.word_segmentation(test_phrase, ignore_token='SPECIAL_TOKEN_1')
I get back the same phrase_suggestions as before. I am also not sure how to pass multiple tokens to ignore.
I also tried a regular expression:
phrase_suggestions = sym_spell.word_segmentation(test_phrase, ignore_token=r"SPECIAL_TOKEN_\d")
and I get the following returned as phrase_suggestions:
Composition(segmented_string='Hello **SPECIAL _TOKEN_ 1**, I am happy to meet you tomorrow morning. Thanks, **SPECIAL_ TOKEN_2**', corrected_string='Hello Special token of I am happy to meet you tomorrow morning Thanks Special Token', distance_sum=14, log_prob_sum=-55.6460931972679)
Could you please help and also add more documentation on using this parameter?
What's the recommended way to deal with named entities?
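In case it helps frame the question, here is a minimal sketch of a workaround that does not rely on ignore_token at all: split the phrase around the placeholders, segment only the text in between, and stitch everything back together. This is not a confirmed library approach. The helper name segment_preserving_tokens, the PLACEHOLDER pattern, and the segment callable are all hypothetical names introduced for illustration.

```python
import re
from typing import Callable

# Hypothetical placeholder pattern; adjust it to match your own tokens.
# The capturing group makes re.split keep the placeholders in its output.
PLACEHOLDER = re.compile(r"(SPECIAL_TOKEN_\d+)")


def segment_preserving_tokens(phrase: str, segment: Callable[[str], str]) -> str:
    """Apply `segment` to the text between placeholders, leaving placeholders as-is."""
    pieces = []
    for part in PLACEHOLDER.split(phrase):
        if PLACEHOLDER.fullmatch(part):
            # A placeholder: pass it through unchanged.
            pieces.append(part)
            continue
        core = part.strip()
        if not core:
            # Empty or whitespace-only gap: nothing to segment.
            pieces.append(part)
            continue
        # Keep the surrounding whitespace so the pieces rejoin cleanly,
        # in case the segmenter trims or ignores it.
        lead = part[: len(part) - len(part.lstrip())]
        trail = part[len(part.rstrip()):]
        pieces.append(lead + segment(core) + trail)
    return "".join(pieces)
```

With symspellpy, the segment argument could presumably be something like lambda s: sym_spell.word_segmentation(s).segmented_string, assuming sym_spell is already loaded with a dictionary. I have not verified how the library treats the punctuation at the edges of each piece, so the result may still need cleanup.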