Side-effects of as_tuples in multiprocessing #10354
Hey everybody - first of all, spaCy is awesome! Secondly: I'm seeing some weird interplay between `as_tuples` and multiprocessing (`n_process`). What I did:
Is this expected? Does this mean that when `as_tuples=False`, the multiprocessing doesn't separate the processes properly? Thank you for any elaboration on this!
Originally posted by @svonava in #9597 (comment)
Replies: 1 comment 3 replies
`nlp.pipe()` returns the documents in the same order as the input texts. `as_tuples` is just to pair some external context with each returned doc, so if you only have input texts, you shouldn't need to use it. (The `as_tuples` option isn't really needed as of v3.2 because you can pass docs with custom attributes to the pipeline instead.)

Unless you have really limited RAM or really long texts, 100 is a pretty small batch size for `en_core_web_lg`. It might well be faster to have larger batch sizes and fewer processes.
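To make the points above concrete, here is a minimal sketch of the three ways of calling `nlp.pipe()` discussed in this thread. It is not the original poster's code (which is not shown here). The helper names, the `text_id` extension attribute, and the example values for `n_process` and `batch_size` are all made up for illustration, and `spacy.blank("en")` stands in for `en_core_web_lg` so the sketch runs without a model download. The third helper assumes spaCy v3.2 or later.

```python
import spacy
from spacy.tokens import Doc


def pipe_texts_only(nlp, texts, n_process=1, batch_size=1000):
    """Plain texts in, docs out: results come back in input order."""
    docs = nlp.pipe(texts, n_process=n_process, batch_size=batch_size)
    return [doc.text for doc in docs]


def pipe_with_context(nlp, pairs, n_process=1, batch_size=1000):
    """(text, context) pairs in, (doc, context) pairs out via as_tuples=True."""
    results = []
    for doc, context in nlp.pipe(
        pairs, as_tuples=True, n_process=n_process, batch_size=batch_size
    ):
        results.append((doc.text, context))
    return results


def pipe_docs_with_attrs(nlp, pairs, n_process=1, batch_size=1000):
    """Alternative to as_tuples (assumes spaCy >= 3.2): attach the context to
    each Doc as a custom attribute and pass the Docs to nlp.pipe()."""
    # "text_id" is a hypothetical attribute name chosen for this sketch.
    Doc.set_extension("text_id", default=None, force=True)
    docs_in = []
    for text, text_id in pairs:
        doc = nlp.make_doc(text)
        doc._.text_id = text_id
        docs_in.append(doc)
    docs_out = nlp.pipe(docs_in, n_process=n_process, batch_size=batch_size)
    return [(doc.text, doc._.text_id) for doc in docs_out]


if __name__ == "__main__":
    # Tokenizer-only stand-in; use spacy.load("en_core_web_lg") for the real model.
    nlp = spacy.blank("en")
    texts = [f"This is text number {i}." for i in range(5000)]
    pairs = [(text, i) for i, text in enumerate(texts)]

    # Illustrative settings only: fewer processes with larger batches.
    out = pipe_texts_only(nlp, texts, n_process=2, batch_size=1000)
    print(out == texts)
    print(pipe_with_context(nlp, pairs, n_process=2, batch_size=1000)[:2])
```

The `if __name__ == "__main__":` guard matters when `n_process` is greater than 1 on platforms that start worker processes by spawning, since the module is imported again in each worker. Which combination of `n_process` and `batch_size` is fastest depends on the model, the texts, and the machine, so it is worth timing a few settings rather than treating the values above as recommendations.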