Inference speed between distilled and parent model is almost the same #11257
probavee asked this question in Help: Model Advice
Hello!

I trained distilCamemBERT using spaCy's project system for parsing and tagging, hoping for a lighter and faster model, but there is almost no speed difference compared with `fr_core_news_trf`, which is based on CamemBERT. However, the DistilBERT paper suggests a 2x speedup. I ran some benchmarks on this text using different approaches.
My environment:

- Jupyter notebook served with JupyterLab in a Docker container
- GPU: Tesla P100-PCIE-16GB
- CUDA: 11.6
- spaCy 3.4.1

distilcamembert pipeline: `["transformer", "tagger", "morphologizer", "trainable_lemmatizer", "parser"]`
[benchmark plot omitted] (I don't know why the latency spikes: Google Cloud infrastructure, spaCy, or the notebook.)

Then for bigger documents: [benchmark plot omitted]
So I wanted to know if someone has clues about why it isn't twice as fast, and where those spikes come from.
Also, is the transformer component's complexity expected to be quadratic?
Finally, can the training config influence inference speed?
Thank you!
Replies: 1 comment

For a more accurate comparison, train with the … The speed of the other components in the pipeline is similar no matter which transformer model is used, so you won't see a 2x difference in the whole pipeline. If you want to test just the …
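One way to time just the transformer, rather than the whole pipeline, is to temporarily disable every other component with `nlp.select_pipes`; a minimal sketch, assuming the component names from the question (model and texts are placeholders):

```python
import time

import spacy

spacy.require_gpu()
nlp = spacy.load("fr_core_news_trf")  # or the distilled pipeline
texts = ["Un exemple de phrase à analyser pour mesurer la vitesse."] * 200

# Run only the transformer; tagger, morphologizer, lemmatizer, and parser
# are disabled inside this block and restored on exit.
with nlp.select_pipes(enable=["transformer"]):
    list(nlp.pipe(texts[:10]))  # warm-up
    start = time.perf_counter()
    list(nlp.pipe(texts, batch_size=32))
    print(f"transformer only: {time.perf_counter() - start:.2f}s")
```

With the other components out of the way, any speed difference between the distilled and parent transformer should be much more visible.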