Hi! I am using Silero VAD to segment my audio. Some of the detected speech segments are very short because the spoken utterances themselves are short. However, for my downstream processing, I would like every final speech segment to be at least 3 seconds long.

Initially, I thought that the parameter min_speech_duration_ms controlled the minimum length of each output segment. However, after reviewing the code and documentation, I realized that this parameter simply discards speech segments shorter than the specified duration. But I do not want to remove short speech segments. Instead, I would like to keep all detected speech but ensure that the final segments are no shorter than 3 seconds, possibly by merging adjacent segments when necessary.

My question is: is there any built-in way in Silero VAD to enforce a minimum output segment length (e.g., 3 seconds) without discarding short segments? Or is post-processing (manually merging adjacent segments) the recommended approach in this case?
One approach would be to try increasing this.
If your domain has fairly long utterances separated by long silences, it can achieve your goal.

Probably yes, post-processing is the way to go.
You see, if there is a short silence between two stretches of speech and we merge them, we lose information, hence we do not do it.
If we enforced a minimum speech length and there were no proper speech of that length, we would either be deleting information or introducing bias.
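Since merging is left to post-processing, a minimal sketch of the manual approach might look like the following. It assumes segments in the dict format returned by `get_speech_timestamps(..., return_seconds=True)` (keys `'start'` and `'end'` in seconds); the function name `merge_short_segments` and the merging policy are hypothetical, not part of Silero VAD:

```python
def merge_short_segments(segments, min_len_s=3.0):
    """Post-processing sketch: greedily absorb following segments into any
    segment that is still shorter than min_len_s, so no detected speech is
    discarded. Note this spans the silence gaps between merged segments.

    segments: list of {'start': float, 'end': float} in seconds,
              sorted by start time (as Silero VAD returns them).
    """
    merged = []
    for seg in segments:
        if merged and (merged[-1]['end'] - merged[-1]['start']) < min_len_s:
            # Previous output segment is still too short: extend it to
            # cover this segment instead of starting a new one.
            merged[-1]['end'] = seg['end']
        else:
            merged.append(dict(seg))  # copy so the input is not mutated
    return merged


segments = [
    {'start': 0.0, 'end': 1.0},   # 1.0 s, too short alone
    {'start': 1.5, 'end': 2.0},   # still short after merging (2.0 s total)
    {'start': 4.0, 'end': 8.0},   # absorbed to reach the 3 s minimum
]
print(merge_short_segments(segments))
```

One caveat: the very last output segment can still end up shorter than the minimum if no further speech follows; depending on your downstream needs you could pad it with trailing audio or attach it to the preceding segment instead.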