Identifying non-speech? #2530

icsy7867 · 2025-02-15T20:12:08Z

icsy7867
Feb 15, 2025

So this is something I am trying to do to learn and to build a dataset.

This is a bit of a ridiculous thing I'm trying to do, but it's been very educational.

Long story short, let's say I have a ton of audio clips of people speaking and animal noises. In this particular case, people are talking, but if a lion roars between stretches of speech, I want the transcription to include a marker such as (test) at that point.

(I'm experimenting with creating my own special token and training a TTS model.)

I have a feeling that something like this would require fine-tuning, but I just thought I'd ask!

icsy7867 · 2025-02-15T22:38:36Z

icsy7867
Feb 15, 2025
Author

Actually, thinking about this some more...

Maybe it would be better to just take the lion audio, split it into well-defined sections, and then combine it with clips of people talking, so that I know the transcription of the combined audio programmatically.
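A minimal sketch of that idea, assuming the speech clips already have known transcripts and every clip is a mono NumPy array at the same sample rate. The function name `build_example`, the 16 kHz rate, and the 0.25 s silence gap are placeholders chosen for illustration, not anything from Whisper itself:

```python
import numpy as np

SR = 16_000             # assumed sample rate in Hz
EVENT_TOKEN = "(test)"  # placeholder marker from the original post


def build_example(speech_clips, event_clips, gap_s=0.25, sr=SR):
    """Alternate speech and event clips; return (audio, transcript, spans).

    speech_clips: list of (samples, text) pairs with known transcripts.
    event_clips:  list of 1-D sample arrays (e.g. pre-cut lion roars).
    spans:        (start_s, end_s) of each inserted event in the output audio.
    """
    gap = np.zeros(int(gap_s * sr), dtype=np.float32)
    pieces, words, spans = [], [], []
    cursor = 0  # running position in samples

    for i, (samples, text) in enumerate(speech_clips):
        pieces.append(np.asarray(samples, dtype=np.float32))
        words.append(text)
        cursor += len(samples)

        # Follow this speech clip with one event clip, if any remain.
        if i < len(event_clips):
            event = np.asarray(event_clips[i], dtype=np.float32)
            pieces.append(gap)
            cursor += len(gap)
            spans.append((cursor / sr, (cursor + len(event)) / sr))
            pieces += [event, gap]
            cursor += len(event) + len(gap)
            words.append(EVENT_TOKEN)

    return np.concatenate(pieces), " ".join(words), spans
```

Because the clips are simply concatenated, the transcript and the event timestamps are known exactly by construction. Overlaying roars on top of speech instead would need mixing and level matching, which this sketch does not attempt.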

0 replies

whicks1 · 2025-02-19T03:28:37Z

whicks1
Feb 19, 2025

Try the whisper-at variant. YMMV.
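If a tagger of this kind yields time-stamped sound labels, one way to turn them into the marker is a small merge step like the sketch below. It assumes Whisper-style segments (dicts with "start", "end", "text") and a hypothetical event format of `(start_s, end_s, label)` tuples; the function name `insert_event_tokens` and the "roar" label are made up for illustration and are not part of whisper-at:

```python
EVENT_TOKEN = "(test)"  # placeholder marker from the original post


def insert_event_tokens(segments, events, label="roar", token=EVENT_TOKEN):
    """Merge speech segments and tagged event windows into one transcript.

    segments: dicts with "start", "end", "text" (Whisper-style segments).
    events:   (start_s, end_s, label) tuples; hypothetical tagger output.
    """
    items = [(seg["start"], seg["text"].strip()) for seg in segments]

    last_end = None
    for start, end, name in sorted(events):
        if name != label:
            continue
        # Treat touching or overlapping windows as a single event.
        if last_end is not None and start <= last_end:
            last_end = max(last_end, end)
            continue
        items.append((start, token))
        last_end = end

    # Order speech text and markers by start time.
    items.sort(key=lambda item: item[0])
    return " ".join(text for _, text in items)
```

The marker is placed by the start time of each event, so a roar that begins partway through a speech segment lands after that segment's text rather than inside it.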

1 reply

icsy7867 Feb 20, 2025
Author

Thanks!
