VAD Issues using medium.en

As described in #3139, grab the audio for Aladdin from [Archive.org](https://archive.org/details/short_story_034_0810_librivox/shortstory034_aladdinandthemagiclamp_llf.mp3) and run it through whisper-cli to get the hang of things (I renamed the horrid filename to aladdin.mp3).

Now run it through VAD and output to JSON, with something like this:

> whisper-cli -m ggml-medium.en.bin -f aladdin.mp3 --vad -vm ggml-silero-v5.1.2.bin -ojf -of aladdin

Examine the JSON and note that the first word is VERY long: about 16 seconds to say 'There'.

```
{
	"text": " There",
	"timestamps": {
		"from": "00:00:01,190",
		"to": "00:00:17,320"
	},
	"offsets": {
		"from": 1190,
		"to": 17320
	},
	"id": 1318,
	"p": 0.156763,
	"t_dtw": -1
},
```
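To spot tokens like this without reading the whole file by eye, a small script can flag any token whose duration exceeds a threshold. This is only a sketch: it assumes the token objects shown above sit in a `tokens` array inside each entry of a top-level `transcription` array, the function name `find_long_tokens` is made up for illustration, and the 2000 ms cut-off is an arbitrary choice.

```python
import json


def find_long_tokens(result, max_ms=2000):
    """Return (text, from_ms, to_ms, duration_ms) for every token longer than max_ms.

    Assumes result["transcription"][i]["tokens"][j]["offsets"] holds
    millisecond "from"/"to" values, as in the token shown above.
    """
    flagged = []
    for segment in result.get("transcription", []):
        for token in segment.get("tokens", []):
            offsets = token.get("offsets", {})
            start, end = offsets.get("from"), offsets.get("to")
            if start is None or end is None:
                continue  # skip tokens without timing information
            duration = end - start
            if duration > max_ms:
                flagged.append((token.get("text", ""), start, end, duration))
    return flagged


def load_result(path):
    """Load a whisper-cli JSON file, tolerating stray invalid bytes."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return json.load(fh)


# In-memory example built from the token in this report.
sample = {
    "transcription": [
        {"tokens": [{"text": " There", "offsets": {"from": 1190, "to": 17320}}]}
    ]
}
for text, start, end, duration in find_long_tokens(sample):
    print(f"{text!r}: {start} ms -> {end} ms ({duration} ms)")
```

For the real file, something like `find_long_tokens(load_result("aladdin.json"))` should list the suspicious tokens, assuming the layout matches.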

In reality, what has happened is that medium.en has cut a load of text at the start (the credits) and folded it into 'There'.

This happens to varying degrees depending on which model is used. The smaller the model, the less de-crediting occurs.

