The tokenizer only supports exact string matches as special cases. Instead, you can add a custom component at the beginning of your pipeline that matches based on regexes and uses the retokenizer to adjust the tokenization before the rest of the components are run: https://spacy.io/usage/linguistic-features#retokenization
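A minimal sketch of such a component, assuming spaCy v3. The pattern, the component name `regex_merger`, and the example sentence are made up for illustration. The sketch only merges tokens; regex matches that do not line up with existing token boundaries are skipped, because `doc.char_span` returns `None` for them by default.

```python
import re

import spacy
from spacy.language import Language

# Hypothetical pattern for illustration: two capital letters, a space, four digits.
# The tokenizer splits on the space, so a match covers two tokens.
ID_PATTERN = re.compile(r"[A-Z]{2} \d{4}")


@Language.component("regex_merger")  # hypothetical component name
def regex_merger(doc):
    spans = []
    for match in ID_PATTERN.finditer(doc.text):
        # char_span returns None when the character offsets do not
        # align with token boundaries (default strict alignment).
        span = doc.char_span(match.start(), match.end())
        if span is not None and len(span) > 1:
            spans.append(span)
    # finditer yields non-overlapping matches, so the spans can be merged as-is.
    with doc.retokenize() as retokenizer:
        for span in spans:
            retokenizer.merge(span)
    return doc


# A blank pipeline keeps the example independent of any trained model.
nlp = spacy.blank("en")
nlp.add_pipe("regex_merger", first=True)

doc = nlp("Ticket AB 1234 is closed.")
tokens = [token.text for token in doc]
print(tokens)
```

With `first=True` the component runs directly after the tokenizer, so any later components see `AB 1234` as a single token. If the pattern can match inside a token, the retokenizer's `split` method would be needed instead of `merge`.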

Answer selected by adrianeboyd
Labels: feat / tokenizer (Feature: Tokenizer), feat / doc (Feature: Doc, Span and Token objects)