Identifying non-speech? #2530
icsy7867
started this conversation in
Show and tell
Replies: 2 comments 1 reply
Actually, thinking about this some more: maybe it would be better to take the lion audio, split it into well-defined sections, and then combine those sections with clips of people talking, so that I know the transcription of the combined audio programmatically.
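A minimal sketch of that idea, assuming the clips are already loaded as mono NumPy arrays at a common sample rate. The helper names `split_sections` and `build_example`, the fixed-length splitting, the 0.25 s gap, and the use of `(test)` as the marker text are illustrative assumptions, not part of any library:

```python
import numpy as np


def split_sections(samples, sr, section_s):
    """Cut a noise recording into equal sections of section_s seconds.

    Any trailing remainder shorter than one section is dropped.
    """
    n = int(sr * section_s)
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def build_example(segments, marker="(test)", sr=16000, gap_s=0.25):
    """Concatenate clips and build the matching transcript.

    segments: list of (kind, samples, text) tuples, where kind is
    "speech" (text is the known transcript) or "noise" (text is ignored
    and the marker is written instead). A short silence separates clips.
    """
    gap = np.zeros(int(sr * gap_s), dtype=np.float32)
    audio_parts, text_parts = [], []
    for kind, samples, text in segments:
        if audio_parts:
            audio_parts.append(gap)  # silence between consecutive clips
        audio_parts.append(np.asarray(samples, dtype=np.float32))
        text_parts.append(text if kind == "speech" else marker)
    return np.concatenate(audio_parts), " ".join(text_parts)


if __name__ == "__main__":
    sr = 16000
    # Synthetic stand-ins for real recordings (assumption: 16 kHz mono).
    roar = np.random.uniform(-1, 1, sr * 3).astype(np.float32)
    speech = np.zeros(sr, dtype=np.float32)
    roars = split_sections(roar, sr, 1.0)
    audio, text = build_example(
        [("speech", speech, "hello there"),
         ("noise", roars[0], None),
         ("speech", speech, "goodbye")],
        sr=sr,
    )
    print(text, len(audio) / sr)
```

Because each example is assembled from parts, the label is known by construction and nothing has to be transcribed by hand. Real clips would still need resampling and level matching before mixing.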
Try the whisper-at variant. YMMV.
This is something I am trying to do both to learn and to build a dataset.
This is a bit of a ridiculous thing I'm trying to do, but it's been very educational.
Long story short, let's say I have a ton of audio clips of people speaking and of animal noises. In this particular case people are talking, and if a lion roars between utterances, I want to insert (test) at that point.
(I'm experimenting with creating my own special token and training a TTS model.)
I have a feeling that something like this would require fine-tuning, but I just thought I'd ask!