can't find tokenizer rule that keeps "A.B.C." together but splits "a.b.c." into "a.b.c" and "." #13732
jefhil started this conversation in Language Support
Replies: 1 comment
-
With a blank pipeline (`from spacy import blank`):
"A.B.C." -> [('TOKEN', 'A.B.C.')]
"a.b.c." -> [('TOKEN', 'a.b.c'), ('SUFFIX', '.')]
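That output format matches spaCy's `tokenizer.explain`, so a minimal reproduction sketch might look like this (assuming spaCy v3 and a blank English pipeline):

```python
from spacy import blank

nlp = blank("en")

# tokenizer.explain() reports which tokenizer rule produced each piece
print(nlp.tokenizer.explain("A.B.C."))  # [('TOKEN', 'A.B.C.')]
print(nlp.tokenizer.explain("a.b.c."))  # [('TOKEN', 'a.b.c'), ('SUFFIX', '.')]
```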
-
I want to treat the upper- and lower-case forms of "a.b.c." the same, but I can't figure out where the rule that splits or keeps the trailing period is located.
TIA
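A sketch of where such a rule typically lives and how it can be adjusted, assuming spaCy v3 defaults: the suffix patterns defined in `spacy.lang.punctuation` are exposed as `nlp.Defaults.suffixes`, and `compile_suffix_regex` and `add_special_case` are the documented customization points.

```python
from spacy import blank
from spacy.symbols import ORTH
from spacy.util import compile_suffix_regex

nlp = blank("en")

# 1) List the default suffix patterns that involve a period. One of them
#    splits a trailing "." after a lowercase letter, while uppercase
#    abbreviations like "A.B.C." are left intact.
period_suffixes = [p for p in nlp.Defaults.suffixes if "\\." in p]
for pattern in period_suffixes:
    print(pattern)

# 2a) Per-string fix: register a special case so this exact lowercase form
#     is always kept as a single token.
nlp.tokenizer.add_special_case("a.b.c.", [{ORTH: "a.b.c."}])

# 2b) Rule-level fix: edit the suffix list and recompile the suffix regex.
#     Dropping every period-splitting pattern is too blunt for real text
#     (sentence-final periods would stay attached); in practice, remove or
#     rewrite only the pattern identified in step 1.
suffixes = [p for p in nlp.Defaults.suffixes if p not in period_suffixes]
nlp.tokenizer.suffix_search = compile_suffix_regex(suffixes).search

print(nlp.tokenizer.explain("a.b.c."))  # "a.b.c." should now stay together
```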