How to explicitly split hyphen/dash #10211
I tokenize a whole pandas DataFrame column with these lines of code: This won't split any words with hyphens inside them. I know this might be useful for some use cases, but I need them to be separated.
Hi @Emporea, you might want to define a `Tokenizer` with special cases. For hyphens, you can pass the `finditer` method of a compiled regex pattern to the `Tokenizer`'s `infix_finditer` param, as demonstrated in this example.
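A minimal sketch of that approach, assuming spaCy v3 and a blank English pipeline (`spacy.blank("en")`) rather than whatever model the original code loaded. It keeps the default infix rules from `nlp.Defaults.infixes` and appends one extra pattern. The set of characters treated as dashes (hyphen-minus, en dash, em dash) and the helper name `tokenize_texts` are illustrative choices, not taken from the linked example.

```python
import spacy
from spacy.util import compile_infix_regex

# Assumption: a blank English pipeline; swap in spacy.load(...) for a trained one.
nlp = spacy.blank("en")

# Illustrative choice: treat hyphen-minus, en dash (U+2013) and em dash (U+2014)
# as infixes, so each one becomes its own token wherever it appears mid-token.
HYPHEN_INFIX = r"[-\u2013\u2014]"

# Keep the language's default infix rules and append the hyphen rule.
infixes = list(nlp.Defaults.infixes) + [HYPHEN_INFIX]
infix_re = compile_infix_regex(infixes)

# The tokenizer expects a callable like re.Pattern.finditer, not a pattern string.
nlp.tokenizer.infix_finditer = infix_re.finditer


def tokenize_texts(texts):
    """Hypothetical helper: return a list of token strings for each input text."""
    return [[token.text for token in doc] for doc in nlp.pipe(texts)]


if __name__ == "__main__":
    print(tokenize_texts(["covid-19", "state-of-the-art", "1990\u20132000"]))
```

For a DataFrame column, something like `tokenize_texts(df["text"].tolist())` should work (the column name `text` is hypothetical). Because the hyphen rule is appended rather than replacing the defaults, the other infix splits, such as ellipses or commas between letters, keep working.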