Please don't post screenshots of code or terminal output, post them as text.

nlp.tokenizer.explain is not intended for normal tokenization. It is a debugging tool that shows which tokenizer rule produced each token. It is not designed to be efficient, and it is not the normal way to use the tokenizer.
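
For contrast, here is a minimal sketch of the two calls side by side. It assumes spaCy v2.2.3 or later, where Tokenizer.explain is available, and uses an illustrative input string. The tokenizer call returns a Doc. explain returns a list of (pattern, token text) tuples that name the rule behind each token, which is useful for debugging but not for regular processing.

```python
import spacy

nlp = spacy.blank("en")
text = "Let's go!"  # illustrative input

# Normal tokenization: calling the tokenizer returns a Doc object
doc = nlp.tokenizer(text)
tokens = [t.text for t in doc]

# Debugging only: explain() returns (pattern_name, token_text) tuples
explanation = nlp.tokenizer.explain(text)
for pattern, token_text in explanation:
    print(f"{token_text}\t{pattern}")
```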

In spaCy, the fastest way to tokenize text is usually to use a blank pipeline (like spacy.blank("en")), which runs only the tokenizer. You can also call the tokenizer directly (nlp.tokenizer(text)). Note that spaCy tokenizers are rule-based and don't use the language model.
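
A minimal sketch of both approaches, assuming spaCy v3; the example texts are illustrative. Streaming many texts through nlp.pipe is an addition not mentioned above, included because it is the usual way to process batches of documents.

```python
import spacy

# A blank pipeline has no trained components; only the tokenizer runs.
nlp = spacy.blank("en")

texts = ["This is a sentence.", "Here's another one!"]  # illustrative inputs

# Option 1: call the blank pipeline (which just runs the tokenizer)
doc = nlp(texts[0])

# Option 2: call the tokenizer directly
doc2 = nlp.tokenizer(texts[0])

# For many texts, stream them through the pipeline in batches
docs = list(nlp.pipe(texts))

print([t.text for t in doc])
```

Both options produce the same tokens here because a blank pipeline has no components beyond the tokenizer.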

Answer selected by adrianeboyd
Labels: feat / tokenizer (Feature: Tokenizer), feat / sentencizer (Feature: Sentencizer, rule-based sentence segmenter)