nlp.pipe 3x slower on m1 mac (python installed natively, no rosetta) #9314

jmichaelschmidt · 2021-09-28T01:59:29Z

jmichaelschmidt
Sep 28, 2021

I have an intel macbook pro quad core i7 with 16mb ram using anaconda versus a mac mini with 16mb ram using arm python installation using miniforge.

I verified python is using arm and not rosetta in the activity monitor. Numpy and pandas calculations are all 2x faster on the m1 mac mini than the intel macbook. But spacy is ~3x slower on the m1.

Any suggestions on what I might do to optimize spacy for the m1?

Answered by adrianeboyd

Sep 28, 2021

The underlying problem is that thinc primarily uses blis (rather than numpy's openblas) for matrix multiplication, which isn't optimized for the apple m1 yet (maybe upstream in flame/blis by now, but not in our python explosion/cython-blis package yet).

We do have a solution, which uses apple's accelerate library instead of blis for GEMM. We should get this published and documented/advertised, because it makes a huge difference. In some simple benchmarks it's about 8x faster vs. the unoptimized blis (and about 1.5x faster than numpy's openblas).

If you upgrade to thinc v8.0.9+ and have this package installed, it should automatically switch to AppleOps instead of NumpyOps as the default op…

View full answer

adrianeboyd · 2021-09-28T06:28:40Z

adrianeboyd
Sep 28, 2021

The underlying problem is that thinc primarily uses blis (rather than numpy's openblas) for matrix multiplication, which isn't optimized for the apple m1 yet (maybe upstream in flame/blis by now, but not in our python explosion/cython-blis package yet).

We do have a solution, which uses apple's accelerate library instead of blis for GEMM. We should get this published and documented/advertised, because it makes a huge difference. In some simple benchmarks it's about 8x faster vs. the unoptimized blis (and about 1.5x faster than numpy's openblas).

If you upgrade to thinc v8.0.9+ and have this package installed, it should automatically switch to AppleOps instead of NumpyOps as the default ops and you shouldn't need to manually set anything.

Edited to remove repo install instructions, see the comment below for the official package.

2 replies

adrianeboyd Sep 28, 2021

Now there is a published package so you can just run:

pip install thinc-apple-ops

Documentation and advertising to come...

jmichaelschmidt Sep 28, 2021
Author

WOW! Beautiful. After installing thinc-apple-ops and restarting the kernel, Tok2vec took 1.14 seconds on the same sample that took 9.01 seconds before.

Thank you!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

nlp.pipe 3x slower on m1 mac (python installed natively, no rosetta) #9314

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 2 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Uh oh!

nlp.pipe 3x slower on m1 mac (python installed natively, no rosetta) #9314

Uh oh!

jmichaelschmidt Sep 28, 2021

Replies: 1 comment · 2 replies

Uh oh!

Uh oh!

adrianeboyd Sep 28, 2021

Uh oh!

adrianeboyd Sep 28, 2021

Uh oh!

jmichaelschmidt Sep 28, 2021 Author

jmichaelschmidt
Sep 28, 2021

Replies: 1 comment 2 replies

adrianeboyd
Sep 28, 2021

jmichaelschmidt Sep 28, 2021
Author