Hi,

It looks like you are using the torch jit model. It is inevitable that it has some warm-up cost on the first inference.
If that first inference takes around a second, the file is probably rather long (e.g. about a minute of audio). You can warm the model up on a shorter file first.

You can also compare the jit model with the onnx model.
And if the session is ended properly (see `reset_states`), you do not need to recreate the model each time you run a new inference.
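The pattern above (warm up once on a short clip, then reuse the same model across files and only reset its state in between) can be sketched roughly like this. This is a minimal illustration with a hypothetical stub standing in for the jit VAD model, not the real Silero API; only the general shape (one-time warm-up cost, `reset_states` between files instead of reloading) is taken from the advice above:

```python
import time

class DummyVad:
    """Hypothetical stand-in for the torch jit VAD model (not the real API).

    The first call simulates the one-time jit warm-up/optimization cost;
    subsequent calls are fast.
    """
    def __init__(self):
        self._warmed_up = False
        self._state = []

    def reset_states(self):
        # Clear internal recurrent state between independent audio files,
        # so the same loaded model can be reused without recreating it.
        self._state = []

    def __call__(self, chunk):
        if not self._warmed_up:
            time.sleep(0.05)  # stand-in for one-time jit compilation/warm-up
            self._warmed_up = True
        self._state.append(len(chunk))
        return 0.0  # placeholder for a speech probability

vad = DummyVad()

# Warm up once on a short dummy clip so real inferences are fast.
t0 = time.perf_counter(); vad([0.0] * 512); warm = time.perf_counter() - t0
t0 = time.perf_counter(); vad([0.0] * 512); fast = time.perf_counter() - t0

# Next file: reset state instead of reloading the model.
vad.reset_states()
```

With the real model the same idea applies: load it once, run a throwaway inference on a short file, and call `reset_states` between files rather than constructing a new model per inference.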

Answer selected by snakers4
Converted from issue

This discussion was converted from issue #756 on February 18, 2026 19:06.