nlp.pipe 3x slower on m1 mac (python installed natively, no rosetta) #9314
I have an Intel MacBook Pro (quad-core i7, 16GB RAM) using Anaconda, versus a Mac mini (M1, 16GB RAM) with a native arm64 Python installed via Miniforge. I verified in Activity Monitor that Python is running as arm64, not under Rosetta. NumPy and pandas calculations are all ~2x faster on the M1 Mac mini than on the Intel MacBook, but spaCy is ~3x slower on the M1. Any suggestions on how I might optimize spaCy for the M1?
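For reference, this is the kind of micro-benchmark I used to compare the machines. It only exercises matrix multiplication through NumPy's BLAS, so it isolates GEMM performance from the rest of the spaCy pipeline (the matrix size and iteration count here are arbitrary choices, not anything spaCy-specific):

```python
import time
import numpy as np

# Square float32 GEMM micro-benchmark. float32 is what thinc/spaCy
# use internally; 1024x1024 is just a convenient size for timing.
n = 1024
a = np.random.rand(n, n).astype("float32")
b = np.random.rand(n, n).astype("float32")

start = time.perf_counter()
for _ in range(10):
    c = a @ b  # dispatches to whatever BLAS NumPy was built against
elapsed = time.perf_counter() - start
print(f"10 matmuls of {n}x{n}: {elapsed:.3f}s")
```

`np.show_config()` will tell you which BLAS your NumPy build is linked against, which helps when comparing the two installs.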
Edited to remove repo install instructions, see the comment below for the official package.
The underlying problem is that thinc primarily uses `blis` (rather than numpy's openblas) for matrix multiplication, and blis isn't optimized for the Apple M1 yet (maybe upstream `flame/blis` is by now, but not in our `explosion/cython-blis` package yet).

We do have a solution, which uses Apple's Accelerate library instead of blis for GEMM. We should get this published and documented/advertised, because it makes a huge difference. In some simple benchmarks it's about 8x faster than the unoptimized blis (and about 1.5x faster than numpy's openblas).

If you upgrade to thinc v8.0.9+ and have this package installed, it should automatically switch to `AppleOps` instead of `NumpyOps` as the default op…
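One way to verify the switch actually happened (a small sketch, assuming thinc v8.x is installed; the `try/except` guard is only there so the snippet also runs where thinc is absent):

```python
# Check which thinc ops backend is currently active.
try:
    from thinc.api import get_current_ops

    ops = get_current_ops()
    # ops.name is "numpy" by default; with the Accelerate-backed
    # package active it should report the Apple ops backend instead.
    backend = ops.name
except ImportError:
    backend = "thinc not installed"

print(f"active thinc ops backend: {backend}")
```

If this still reports the numpy backend after installing the package, double-check that it landed in the same (arm64) environment as thinc and spaCy.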