
Let me back up a step and ask a few questions for clarification:

  • which model are you loading for nlp?
  • are you only interested in tokenizing your text or are you interested in annotation from additional pipeline components (part-of-speech tags, parses, entities, etc.)?
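If tokenization is all you need, the trained components can be skipped entirely. A minimal sketch, assuming plain English text (the example sentence is illustrative): a blank pipeline contains only the tokenizer, so it avoids the cost of tagging, parsing, and entity recognition.

```python
import spacy

# spacy.blank("en") creates a pipeline with only the tokenizer --
# no trained components, so processing is very fast.
nlp = spacy.blank("en")

doc = nlp("This is a sentence.")
print([token.text for token in doc])
```

If you do need annotations from a trained model, you can instead pass `disable=` to `spacy.load` to switch off the components you don't use.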

General comments:

  • we'd recommend splitting your input text into smaller logical chunks (paragraphs, pages, sections) for processing with spaCy
  • 20 minutes is an extremely long time for a single text no matter what (do you have very slow custom pipeline components? are you running a transformer model on CPU? is it possibly thrashing?)
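The chunking suggestion above can be sketched as follows. This is an illustrative splitter on blank lines (your real text may need a smarter segmenter), combined with `nlp.pipe`, which streams texts in batches and is faster than calling `nlp()` on each chunk in a loop:

```python
import spacy

# A blank pipeline is used here so the sketch runs standalone;
# substitute your own trained pipeline in practice.
nlp = spacy.blank("en")

text = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."

# Split on blank lines into paragraph-sized chunks (illustrative).
chunks = [p.strip() for p in text.split("\n\n") if p.strip()]

# nlp.pipe processes the chunks as a stream in batches.
docs = list(nlp.pipe(chunks))
print(len(docs))
```

Processing many small docs this way keeps memory bounded, whereas a single multi-megabyte text forces spaCy to hold every token and annotation for the whole document in memory at once.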

Answer selected by polm