Positive Tokenization? #13383
Unanswered
dave-richards
asked this question in
Help: Coding & Implementations
Replies: 1 comment
-
Hi! Just to be sure: are you aware that we are supporting "Ancient greek" with the language tag |
-
I am new to NLU and spaCy, but I have been reading the docs and doing some testing. I would like to implement a custom tokenizer for Biblical Greek. My reading of the tokenizer docs is that the customizations are "negative", i.e. a token is whatever is not a whitespace character, not a prefix, not a suffix, and not an infix. Everything else is a valid token. I would like to work the other way around: define exactly what counts as a token and continues down the pipeline, and skip over everything that does not. Is my understanding correct, and is it possible to invert the logic to work as I would like?
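To make the idea concrete, here is a minimal sketch of what such a "positive" tokenizer could look like. It assumes a token is a maximal run of Greek letters; that definition, the `GREEK_WORD` pattern, and the names `positive_tokens` and `PositiveTokenizer` are all illustrative, not part of spaCy. It relies on `nlp.tokenizer` accepting any callable that takes a string and returns a `Doc`. Because everything that does not match is dropped, the resulting `Doc` no longer reproduces the original text exactly.

```python
import re
import unicodedata

# Assumption: a token is a maximal run of letters from the Greek and Coptic
# (U+0370-U+03FF) or Greek Extended (U+1F00-U+1FFF) blocks. The lookahead
# restricts matches to those blocks; [^\W\d_] keeps only letters, so digits,
# punctuation, and spacing diacritics inside the blocks are excluded.
GREEK_WORD = re.compile(r"(?:(?=[\u0370-\u03FF\u1F00-\u1FFF])[^\W\d_])+")


def positive_tokens(text):
    """Return (token, start, end) for every run of Greek letters.

    Offsets refer to the NFC-normalized text, not necessarily the input.
    """
    text = unicodedata.normalize("NFC", text)
    return [(m.group(), m.start(), m.end()) for m in GREEK_WORD.finditer(text)]


class PositiveTokenizer:
    """Hypothetical drop-in tokenizer: keeps matches, skips everything else."""

    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        # Imported here so positive_tokens() stays usable without spaCy.
        from spacy.tokens import Doc

        spans = positive_tokens(text)
        words = [word for word, _, _ in spans]
        # Mark a trailing space wherever skipped material separated two tokens.
        spaces = [nxt[1] > cur[2] for cur, nxt in zip(spans, spans[1:])]
        if spans:
            spaces.append(False)  # no trailing space after the last token
        return Doc(self.vocab, words=words, spaces=spaces)
```

Under these assumptions it could be plugged in with something like `nlp = spacy.blank("xx")` followed by `nlp.tokenizer = PositiveTokenizer(nlp.vocab)`; the multi-language code `"xx"` is only a placeholder here, and a language-specific pipeline may be a better fit.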