-
I like this idea, and it could be extended to add new languages to the
tiptoi universe. One problem might be identifying the different speakers so
that each one gets a different generated voice.
Interestingly, the GME files support English as a language, but there seem to
be no products on the market that actually use it.
On Tue, Oct 8, 2024 at 9:43 AM Germling wrote:
Hello, my idea is to build a Python script to identify all OGG files that
have speech, convert them to text, then translate that text and generate
audio files again. This way, I could easily translate common books like
"Auf dem Bauernhof" into English. I just have to overwrite the OGG files
and build a new GME.
Main challenges:
- Identify only those OGG files with speech
- Batch process these files using various TTS and translation APIs
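The batch step described above could be sketched roughly like this. It is only a skeleton under assumptions: `has_speech`, `transcribe`, `translate` and `synthesize` are hypothetical callables that you would back with whatever VAD, speech-to-text, translation and TTS service you pick; none of them is a real API from this thread.

```python
from pathlib import Path
from typing import Callable, Dict


def translate_media_folder(
    media_dir: Path,
    out_dir: Path,
    has_speech: Callable[[Path], bool],       # hypothetical: VAD check
    transcribe: Callable[[Path], str],        # hypothetical: speech-to-text
    translate: Callable[[str], str],          # hypothetical: e.g. German -> English
    synthesize: Callable[[str, Path], None],  # hypothetical: TTS, writes the audio file
) -> Dict[str, str]:
    """Translate every OGG file that contains speech.

    Returns a mapping of file name to translated text, so the result can be
    reviewed before the new files replace the originals.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, str] = {}
    for ogg in sorted(media_dir.glob("*.ogg")):
        if not has_speech(ogg):
            continue  # music or sound effects: leave the original untouched
        text = transcribe(ogg).strip()
        if not text:
            continue  # nothing recognised, skip rather than synthesize silence
        translated = translate(text)
        # Same file name as the source, so it can later overwrite the original.
        synthesize(translated, out_dir / ogg.name)
        results[ogg.name] = translated
    return results
```

Writing into a separate `out_dir` first keeps the original media intact until the translations have been checked. The `synthesize` callable is also the place where the output would have to be converted back to OGG before building a new GME.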
-
Maybe this could be solved using something like
https://github.com/noisetorch/NoiseTorch
On Tue, Oct 8, 2024 at 11:40 AM Germling wrote:
ogg_batch_translation_silero.zip
<https://github.com/user-attachments/files/17291251/ogg_batch_translation_silero.zip>
I wrote a Python script that can run through the whole media folder, scan
for speech via Silero VAD, convert German speech to text, translate the
German text to English and convert the English text to speech. An example
is attached.
The output file is temp.mp3; as you can hear, it uses a female voice by
default, and there are nicer-sounding voices available.
I think the speaker types could be identified via VAD and mapped in some
way. But the challenge is large OGG files with lots of background noise. I
don't know how the background could be preserved... You would need to isolate
only the speech somehow.
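One partial approach to the isolation problem is to cut a file into speech and non-speech stretches from the VAD timestamps, so that only the speech stretches get replaced. The sketch below assumes timestamps as a list of dicts with `start` and `end` in samples, which as far as I know is the shape Silero VAD's `get_speech_timestamps` returns by default. `split_by_speech` and `speech_ratio` are made-up helper names. Note that this does nothing about background noise that overlaps the speech itself.

```python
from typing import Dict, List, Tuple

# (start_sample, end_sample, is_speech)
Segment = Tuple[int, int, bool]


def split_by_speech(timestamps: List[Dict[str, int]], total_samples: int) -> List[Segment]:
    """Cover the whole file with alternating speech / non-speech segments."""
    segments: List[Segment] = []
    cursor = 0
    for ts in sorted(timestamps, key=lambda t: t["start"]):
        start = max(ts["start"], cursor)      # clip overlaps with the previous segment
        end = min(ts["end"], total_samples)   # clip to the file length
        if end <= start:
            continue
        if start > cursor:
            segments.append((cursor, start, False))  # gap before this speech
        segments.append((start, end, True))
        cursor = end
    if cursor < total_samples:
        segments.append((cursor, total_samples, False))  # trailing non-speech
    return segments


def speech_ratio(segments: List[Segment]) -> float:
    """Fraction of the file (by samples) that was marked as speech."""
    total = sum(end - start for start, end, _ in segments)
    speech = sum(end - start for start, end, is_speech in segments if is_speech)
    return speech / total if total else 0.0
```

A low ratio on a long file would be a hint that it is mostly background with a little speech on top, which is exactly the case that probably needs manual handling.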
-
I did build this, using a bunch of bash scripts, speaches.ai (to identify whether there is any German in a file) and chatterbox-TTS (for voice cloning, which took a couple of hours on a 4090). Got pretty good results. But I noticed the … Pretty certain, because "Mein Wörter-Bilderbuch XXL 49257" seems to have additional audio like "can you find all the elephants", and none of the ~1330 media files referred to it (unless I missed it). There are 7/8 game bins in there, and this additional media seems to be used only via the games.