48x Difference between TRF model running in Spacy Vs. HF #10740

ofirnk · 2022-05-01T14:13:01Z

ofirnk
May 1, 2022

Running the benchmark script provided here:
https://spacy.io/usage/facts-figures#benchmarks-speed

Gives the following results with small/big text files:

library	name	gpu	articles	characters	words	seconds	k wps	time stamp
spacy	en_core_web_sm	False	2	1334	220	0.057466435000606	3.82832169766021	2022-05-01T13:44:04
spacy	en_core_web_lg	False	2	1334	220	0.0589063009992969	3.73474477717801	2022-05-01T13:44:12
spacy	en_core_web_trf	False	2	1334	220	1.32473215500067	0.166071306693607	2022-05-01T13:44:23
stanza	en_ewt	False	2	1334	220	0.903665433001152	0.243452932872912	2022-05-01T13:44:31
hf_trf	roberta-base	False	2	1334	220	1.20050543999969	0.183256145844751	2022-05-01T13:44:55
spacy	en_core_web_sm	False	3	158011	27882	8.23362864400042	3.38635627200854	2022-05-01T13:52:07
spacy	en_core_web_lg	False	3	158011	27882	8.79624469800001	3.16976175143689	2022-05-01T13:52:24
spacy	en_core_web_trf	False	3	158011	27882	260.379056734	0.107082345061584	2022-05-01T13:56:54
stanza	en_ewt	False	3	158011	27882	130.610605370999	0.213474242162811	2022-05-01T13:59:13
hf_trf	roberta-base	False	3	158011	27882	5.71602788599921	4.87786283693506	2022-05-01T13:59:43

Note: These are CPU only results

I'll format it properly in next comment/s, but main things I saw is

48x difference between running in spacy vs. hf

I've used colab environment (the infamous one I've set up for #9858 ).
There I see 60x difference between published spacy v3.3 TRF model and LG model.
Where published benchmarks shows 15x difference/ratio.

How to reproduce the behaviour

https://github.com/explosion/projects/blob/v3/benchmarks/speed

Answered by adrianeboyd

May 2, 2022

Hi, can you try running the benchmark again after downloading the project assets (spacy project assets) so that there are more texts to benchmark? This just really isn't enough texts/time to be a meaningful comparison. To be honest, even the provided default of 1000 texts is a bit a low. Something that runs for at least a few minutes in each instance would provide a more useful comparison.

The WPS ratios do not look that different overall though? 20x in your short example vs 15x in our table? A lot of the exact details depend on the CPU and environment in ways that are hard to replicate exactly, in colab especially I bet. (As another data point, locally with 10000 texts I see a 10x differ…

View full answer

adrianeboyd · 2022-05-02T06:52:56Z

adrianeboyd
May 2, 2022

Hi, can you try running the benchmark again after downloading the project assets (spacy project assets) so that there are more texts to benchmark? This just really isn't enough texts/time to be a meaningful comparison. To be honest, even the provided default of 1000 texts is a bit a low. Something that runs for at least a few minutes in each instance would provide a more useful comparison.

The WPS ratios do not look that different overall though? 20x in your short example vs 15x in our table? A lot of the exact details depend on the CPU and environment in ways that are hard to replicate exactly, in colab especially I bet. (As another data point, locally with 10000 texts I see a 10x difference.)

0 replies

ofirnk · 2022-05-02T07:08:05Z

ofirnk
May 2, 2022
Author

Hi, can you try running the benchmark again after downloading the project assets (spacy project assets) so that there are more texts to benchmark? This just really isn't enough texts/time to be a meaningful comparison. To be honest, even the provided default of 1000 texts is a bit a low. Something that runs for at least a few minutes in each instance would provide a more useful comparison.

The WPS ratios do not look that different overall though? 20x in your short example vs 15x in our table? A lot of the exact details depend on the CPU and environment in ways that are hard to replicate exactly, in colab especially I bet. (As another data point, locally with 10000 texts I see a 10x difference.)

Hi, sorry about not updating about the testing environment yesterday -
I benchmarked also on a large text, which I downloaded into /texts/ folder.
The difference you/I see may be related to the fact you're running with many small texts, where I run it on a larger text.

It doesn't look to be quadratic though - only ratio seems really high - 40x and up to 60x pretty constantly, with spikes to 75x.

I'll be sharing colab notebook soon, but results are reproducible also on my local machine.

Note that I won't forget: Had to change benchmark project a bit as trf_model.attrs["tokenizer"].model_max_length is no longer valid code

0 replies

ofirnk · 2022-05-02T07:09:12Z

ofirnk
May 2, 2022
Author

Testing code is:

import en_core_web_lg
import en_core_web_trf

nlp_lg = en_core_web_lg.load()
nlp_lg("warm up")
nlp_trf = en_core_web_trf.load()
nlp_trf("warm up")
nlps_to_run = [
  nlp_lg,
  nlp_trf
]

import urllib
large_txt_url = "https://www.gutenberg.org/cache/epub/15466/pg15466.txt"
large_txt = urllib.request.urlopen(large_txt_url).read().decode('utf-8')
print(f"Large text length is {len(large_txt)}")

import timeit

piece_size = 200
for p in range(1, 20):
  cut_text_to_parse = large_txt[:piece_size*p]
  times = []
  for nlp in nlps_to_run:
    results = timeit.timeit(lambda: nlp(cut_text_to_parse), number=3)
    times.append(results)
    print(f'For text of length {len(cut_text_to_parse)} it took {nlp.meta["name"]}\t{results:.2f} seconds')
  print(f"Ratio between the 2 models running on {len(cut_text_to_parse)} is {int(times[1]/times[0])}")

0 replies

ofirnk · 2022-05-02T07:10:13Z

ofirnk
May 2, 2022
Author

Results from one of the runs:

For text of length 200 it took core_web_trf	0.72 seconds
Ratio between the 2 models running on 200 is 19
For text of length 400 it took core_web_lg	0.06 seconds
For text of length 400 it took core_web_trf	1.14 seconds
Ratio between the 2 models running on 400 is 20
For text of length 600 it took core_web_lg	0.07 seconds
For text of length 600 it took core_web_trf	1.81 seconds
Ratio between the 2 models running on 600 is 25
For text of length 800 it took core_web_lg	0.10 seconds
For text of length 800 it took core_web_trf	3.85 seconds
Ratio between the 2 models running on 800 is 39
For text of length 1000 it took core_web_lg	0.11 seconds
For text of length 1000 it took core_web_trf	3.98 seconds
Ratio between the 2 models running on 1000 is 36
For text of length 1200 it took core_web_lg	0.12 seconds
For text of length 1200 it took core_web_trf	5.86 seconds
Ratio between the 2 models running on 1200 is 47
For text of length 1400 it took core_web_lg	0.14 seconds
For text of length 1400 it took core_web_trf	5.93 seconds
Ratio between the 2 models running on 1400 is 42
For text of length 1600 it took core_web_lg	0.16 seconds
For text of length 1600 it took core_web_trf	7.43 seconds
Ratio between the 2 models running on 1600 is 46
For text of length 1800 it took core_web_lg	0.18 seconds
For text of length 1800 it took core_web_trf	10.19 seconds
Ratio between the 2 models running on 1800 is 55
For text of length 2000 it took core_web_lg	0.20 seconds
For text of length 2000 it took core_web_trf	10.19 seconds
Ratio between the 2 models running on 2000 is 49
For text of length 2200 it took core_web_lg	0.23 seconds
For text of length 2200 it took core_web_trf	12.67 seconds
Ratio between the 2 models running on 2200 is 54
For text of length 2400 it took core_web_lg	0.27 seconds
For text of length 2400 it took core_web_trf	12.69 seconds
Ratio between the 2 models running on 2400 is 47
For text of length 2600 it took core_web_lg	0.29 seconds
For text of length 2600 it took core_web_trf	15.11 seconds
Ratio between the 2 models running on 2600 is 52
For text of length 2800 it took core_web_lg	0.30 seconds
For text of length 2800 it took core_web_trf	15.10 seconds
Ratio between the 2 models running on 2800 is 50
For text of length 3000 it took core_web_lg	0.30 seconds
For text of length 3000 it took core_web_trf	17.58 seconds
Ratio between the 2 models running on 3000 is 57
For text of length 3200 it took core_web_lg	0.34 seconds
For text of length 3200 it took core_web_trf	17.56 seconds
Ratio between the 2 models running on 3200 is 51
For text of length 3400 it took core_web_lg	0.35 seconds
For text of length 3400 it took core_web_trf	20.00 seconds
Ratio between the 2 models running on 3400 is 56
For text of length 3600 it took core_web_lg	0.38 seconds
For text of length 3600 it took core_web_trf	20.06 seconds
Ratio between the 2 models running on 3600 is 52
For text of length 3800 it took core_web_lg	0.40 seconds
For text of length 3800 it took core_web_trf	22.52 seconds
Ratio between the 2 models running on 3800 is 56

0 replies

ofirnk · 2022-05-02T07:15:57Z

ofirnk
May 2, 2022
Author

Colab notebook:
https://colab.research.google.com/drive/1AM0VOCEIdXDLPqO9T4q9ly6-3rbs2_ga?usp=sharing

0 replies

ofirnk · 2022-05-02T07:40:24Z

ofirnk
May 2, 2022
Author

With GPU, ratio seems to be ~2 constantly.
(sorry for previous results, didn't verify GPU is utilized and it wasn't)

0 replies

adrianeboyd · 2022-05-02T09:05:15Z

adrianeboyd
May 2, 2022

Let me move this to the discussion board...

2 replies

ofirnk May 2, 2022
Author

Why is that a discussion?
I don't think there should be such high difference between spacy-transformers and HF directly

adrianeboyd May 3, 2022

The benchmark script was written for shorter texts like the provided assets and for the hf_trf option it just truncates texts that are longer than the model max length. So the speed results in the output are not accurate for long texts for hf_trf.

Uh oh!

48x Difference between TRF model running in Spacy Vs. HF #10740

Uh oh!

Uh oh!

ofirnk May 1, 2022

Note: These are CPU only results

48x difference between running in spacy vs. hf

How to reproduce the behaviour

Replies: 7 comments · 2 replies

Uh oh!

adrianeboyd May 2, 2022

Uh oh!

Uh oh!

ofirnk May 2, 2022 Author

Uh oh!

ofirnk May 2, 2022 Author

Uh oh!

ofirnk May 2, 2022 Author

Uh oh!

ofirnk May 2, 2022 Author

Uh oh!

ofirnk May 2, 2022 Author

Uh oh!

adrianeboyd May 2, 2022

Uh oh!

ofirnk May 2, 2022 Author

Uh oh!

adrianeboyd May 3, 2022

ofirnk
May 1, 2022

Replies: 7 comments 2 replies

adrianeboyd
May 2, 2022

ofirnk
May 2, 2022
Author

ofirnk
May 2, 2022
Author

ofirnk
May 2, 2022
Author

ofirnk
May 2, 2022
Author

ofirnk
May 2, 2022
Author

adrianeboyd
May 2, 2022

ofirnk May 2, 2022
Author