
VAD Issues using medium.en #3207

@peardox


As described in #3139, grab the audio for Aladdin from Archive.org and run it through whisper-cli to get the hang of things (I renamed the horrid filename to aladdin.mp3).

Now run it again with VAD enabled and full JSON output, using something like this:

whisper-cli -m ggml-medium.en.bin -f aladdin.mp3 --vad -vm ggml-silero-v5.1.2.bin -ojf -of aladdin

Examine the JSON and note that the first word is VERY long: about 16 seconds (00:00:01,190 to 00:00:17,320) to say 'There'.

{
	"text": " There",
	"timestamps": {
		"from": "00:00:01,190",
		"to": "00:00:17,320"
	},
	"offsets": {
		"from": 1190,
		"to": 17320
	},
	"id": 1318,
	"p": 0.156763,
	"t_dtw": -1
},

What has actually happened is that medium.en has cut a load of speech at the start (the credits) and folded its duration into the timestamp for 'There'.

This happens to varying degrees depending on which model is used: the smaller the model, the less de-crediting occurs.
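
To make the problem easier to spot across models, here is a minimal Python sketch that flags tokens whose duration looks implausibly long. It assumes the full JSON output has a top-level "transcription" list whose segments each carry a "tokens" list shaped like the token shown above; that layout, the helper names, and the 2000 ms threshold are my assumptions, so adjust them to match your actual file.

```python
import json


def load_result(path):
    """Load a whisper-cli JSON output file (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_long_tokens(result, max_ms=2000):
    """Return (text, from_ms, to_ms, duration_ms) for tokens longer than max_ms.

    Assumes result["transcription"][i]["tokens"][j]["offsets"] holds
    integer "from"/"to" values in milliseconds, as in the token above.
    """
    suspects = []
    for segment in result.get("transcription", []):
        for token in segment.get("tokens", []):
            offsets = token.get("offsets", {})
            start, end = offsets.get("from"), offsets.get("to")
            if start is None or end is None:
                continue  # skip tokens without usable offsets
            duration = end - start
            if duration > max_ms:
                suspects.append((token.get("text", ""), start, end, duration))
    return suspects


# The token from this report, wrapped in the assumed layout.
sample = {"transcription": [{"tokens": [
    {"text": " There", "offsets": {"from": 1190, "to": 17320},
     "id": 1318, "p": 0.156763, "t_dtw": -1},
]}]}

for text, start, end, duration in find_long_tokens(sample):
    print(f"{text!r}: {start} ms -> {end} ms ({duration} ms)")
```

For the real output, something like find_long_tokens(load_result("aladdin.json")) should work, assuming the command above wrote its output to aladdin.json. Running it against the output of several models would show how much each one swallows at the start.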
