How to use multiprocessing.Pool to run spaCy? #13818
Unanswered · Jiebro0109 asked this question in Help: Coding & Implementations · Replies: 0 comments
I'm trying to process 100 Parquet files (~1GB each, ~10M rows) for Named Entity Recognition (NER) using spaCy’s en_core_web_trf model with multiprocessing.Pool (2 processes) on a single 8GB GPU. Each process loads a file, applies NER to extract ORG entities, and saves results to a Parquet file. However, the program hangs after printing "Pool started" with no further output, and the GPU is not utilized (nvidia-smi shows 0% usage).
I want to confirm if multiprocessing.Pool can be used to process multiple files in parallel with spaCy on a single GPU, and how to resolve the hanging issue.
Here’s a minimal example of my script: