How to explicitly split hyphen/dash #10211
I tokenize a whole pandas DataFrame column with these lines of code:
This won't split any words with hyphens inside them. I know this might be useful for some use cases, but I need them to be separated.
Replies: 1 comment
Hi @Emporea, you might want to define a `Tokenizer` with special cases. For hyphens, you can pass a regex pattern to the `Tokenizer`'s `infix_finditer` param as demonstrated in this example.
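
For readers without access to the linked example, here is a minimal sketch of the `infix_finditer` approach. It is not the code from the link: the blank English pipeline, the extra regex, and the helper name `split_hyphens` are assumptions chosen for illustration, and the pattern may need adjusting for your language and data.

```python
import spacy
from spacy.util import compile_infix_regex

# Assumption: a blank English pipeline (tokenizer only, no trained model needed).
nlp = spacy.blank("en")

# Keep the language's default infix patterns and add one that matches a
# hyphen, en dash, or em dash sitting between two word characters.
infixes = list(nlp.Defaults.infixes) + [r"(?<=\w)[-\u2013\u2014](?=\w)"]
infix_re = compile_infix_regex(infixes)

# Replace the tokenizer's infix matcher with the extended one.
nlp.tokenizer.infix_finditer = infix_re.finditer


def split_hyphens(text):
    """Hypothetical helper: tokenize text and return the token strings."""
    return [token.text for token in nlp.tokenizer(text)]


print(split_hyphens("foo-bar baz"))
```

The hyphen is kept as its own token between the two parts. For a DataFrame column, the same helper could be applied with something like `df["text"].apply(split_hyphens)`, where `"text"` is a placeholder column name.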