As described in #3139, grab the audio for Aladdin from Archive.org and run it through whisper-cli to get the hang of things (I renamed the horrid filename to aladdin.mp3).
Now run it with VAD enabled and write the output to JSON, with something like this:
whisper-cli -m ggml-medium.en.bin -f aladdin.mp3 --vad -vm ggml-silero-v5.1.2.bin -ojf -of aladdin
Examine the JSON and note that the first word is VERY long: about 16 seconds to say 'There'.
{
  "text": " There",
  "timestamps": {
    "from": "00:00:01,190",
    "to": "00:00:17,320"
  },
  "offsets": {
    "from": 1190,
    "to": 17320
  },
  "id": 1318,
  "p": 0.156763,
  "t_dtw": -1
},
In reality, what has happened is that medium.en has cut a load of text at the start (the credits) and folded the time it took into 'There'.
This happens to varying degrees depending on which model is used. The smaller the model, the less de-crediting occurs.
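To spot these stretched tokens without reading the JSON by eye, something like the following Python sketch may help. It assumes only the token shape shown above (a dict with "text", "id" and "offsets" holding "from"/"to" in milliseconds) and walks the whole document recursively, so it does not depend on how the rest of the file is laid out. The names iter_tokens and long_tokens, the 5000 ms threshold, and the second token in the sample are made up for illustration; they are not part of whisper.cpp.

```python
import json


def iter_tokens(node):
    """Yield every dict that looks like the token entry shown in this issue."""
    if isinstance(node, dict):
        offsets = node.get("offsets")
        if (
            "id" in node
            and isinstance(offsets, dict)
            and "from" in offsets
            and "to" in offsets
        ):
            yield node
        # Keep descending: tokens may be nested at any depth.
        for value in node.values():
            yield from iter_tokens(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_tokens(item)


def long_tokens(doc, max_ms=5000):
    """Return (text, from_ms, to_ms, duration_ms) for tokens longer than max_ms.

    max_ms is an arbitrary threshold chosen for this sketch.
    """
    found = []
    for tok in iter_tokens(doc):
        start = tok["offsets"]["from"]
        end = tok["offsets"]["to"]
        if end - start > max_ms:
            found.append((tok.get("text", ""), start, end, end - start))
    return found


# In-memory sample: the first token is copied from this issue, the second
# is a hypothetical short token added only to show that it is not flagged.
sample = json.loads("""
{
  "tokens": [
    {"text": " There", "offsets": {"from": 1190, "to": 17320},
     "id": 1318, "p": 0.156763, "t_dtw": -1},
    {"text": " was", "offsets": {"from": 17320, "to": 17500},
     "id": 0, "p": 0.9, "t_dtw": -1}
  ]
}
""")

for text, start, end, duration in long_tokens(sample):
    print(f"{text!r}: {start} ms to {end} ms ({duration} ms)")
```

To run it against the real output, load the file with json.load and pass the result to long_tokens; I am assuming here that -of aladdin together with -ojf writes aladdin.json.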