Transcribing hour-long audio in Japanese without translating it into English? #1484

KellAven · 2023-06-30T12:08:36Z


Hi, I started using Whisper today with the setup below, and it turned a two-hour meeting recording in Japanese into English almost perfectly (I was surprised by the quality of the translation). However, that was a mistake: I need the transcript in Japanese, so I changed the result line from:

result = model.transcribe(audio_path)

to

result = model.transcribe(audio_path, language="Japanese")

However, this returned a transcription that covers only the first hour. No matter how many times I retry with the language set to Japanese, it always drops out somewhere around the one-hour mark. When I tried Japanese to English again, it worked fine and gave me the same near-perfect translation.

Does anyone know what is happening, and how I can modify the code to keep it from translating the audio before transcribing?

import whisper
import os

# load the model and transcribe the audio
model = whisper.load_model("large-v2")

# input audio file name
audio_path = "/content/drive/MyDrive/{meeting_note}.mp3"

result = model.transcribe(audio_path)

# extract the text and language information
text = result["text"]
language = result["language"]

# create the output text file name based on the input mp3 file name
file_name = os.path.splitext(os.path.basename(audio_path))[0] + ".txt"

# write the text and language information to the output text file
with open(file_name, "w", encoding="utf-8") as f:
    f.write(f"Language: {language}\n\nText:\n\n{text}")

# print the text and language information to the console
print("Language: ", language)
print("Text:\n\n", text)
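For reference, one way to make the intent explicit is to pass both the language and the task to `transcribe`. This is only a minimal sketch, assuming the `openai-whisper` package; `transcribe_in_japanese` is a hypothetical helper name, and nothing in this thread confirms that it avoids the one-hour dropout described above.

```python
def transcribe_in_japanese(model, audio_path):
    """Ask Whisper for a Japanese transcript rather than an English translation.

    `model` is assumed to be the object returned by whisper.load_model(...).
    """
    return model.transcribe(
        audio_path,
        language="ja",      # skip auto-detection; "Japanese" is also accepted
        task="transcribe",  # the default; task="translate" would produce English
    )
```

With the setup above, this would be called as `result = transcribe_in_japanese(model, audio_path)`.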

Answered by KellAven

Jun 30, 2023

Note to others who come across this thread: Whisper currently seems to have a limit on how long it can transcribe for languages other than English, as described by other users working on active projects:

#397
#1378

For now, my solution is to break the audio into 15-minute chunks, transcribe them individually, and concatenate the results.


glangford · 2023-06-30T13:31:57Z


There isn't enough information here to say for certain what is happening. Are you running this on a local machine, or could you be running into a quota or usage limitation on a cloud service?

1 reply

KellAven Jun 30, 2023
Author

I'm currently running this on a free Google Colab instance. I've confirmed that it transcribes to the end for English output, but for any other language it only goes up to the one-hour mark. It doesn't terminate early, though: I've checked the execution time, and both the Japanese and English runs take approximately 20 minutes (even though the Japanese run only writes the first hour of the meeting). So it is definitely not hitting a quota, and the VM is not terminating.
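To pin down where the output stops, the timestamps in the result can be inspected rather than judged from the text alone. The sketch below assumes the dictionary returned by `model.transcribe` has a `"segments"` list whose entries carry `"start"` and `"end"` times in seconds; `transcript_coverage` and `format_timestamp` are hypothetical helper names.

```python
def transcript_coverage(result):
    """Return (last_end_seconds, segment_count) for a Whisper result dict.

    Assumes result["segments"] is a list of dicts with "start"/"end" times
    in seconds, as returned by model.transcribe(...).
    """
    segments = result.get("segments", [])
    if not segments:
        return 0.0, 0
    return float(segments[-1]["end"]), len(segments)


def format_timestamp(seconds):
    """Format a duration in seconds as H:MM:SS for easier reading."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

Comparing `format_timestamp(transcript_coverage(result)[0])` against the known length of the recording shows whether the last segment really ends near the one-hour mark or whether later segments exist but are empty.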

KellAven · 2023-06-30T14:36:59Z


Note to others who come across this thread: Whisper currently seems to have a limit on how long it can transcribe for languages other than English, as described by other users working on active projects:

#397
#1378

For now, my solution is to break the audio into 15-minute chunks, transcribe them individually, and concatenate the results.
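The chunking workaround could look roughly like the following. This is a sketch under stated assumptions, not the code actually used in this thread: it assumes `whisper.load_audio` returns a 1-D array of 16 kHz samples and that `model.transcribe` accepts such an array directly. The helper names `split_into_chunks`, `transcribe_in_chunks`, and `transcribe_file` are hypothetical.

```python
SAMPLE_RATE = 16_000  # whisper.load_audio resamples to 16 kHz mono


def split_into_chunks(samples, chunk_seconds=15 * 60, sample_rate=SAMPLE_RATE):
    """Split a 1-D sequence of audio samples into consecutive fixed-length chunks."""
    step = int(chunk_seconds * sample_rate)
    if step <= 0:
        raise ValueError("chunk_seconds must be positive")
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def transcribe_in_chunks(model, samples, language="ja", **chunk_kwargs):
    """Transcribe each chunk separately and join the texts in order."""
    texts = []
    for chunk in split_into_chunks(samples, **chunk_kwargs):
        result = model.transcribe(chunk, language=language)
        texts.append(result["text"].strip())
    return "\n".join(texts)


def transcribe_file(audio_path, model_name="large-v2"):
    """Load a file with Whisper and transcribe it in 15-minute chunks."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    samples = whisper.load_audio(audio_path)  # float32 NumPy array at 16 kHz
    return transcribe_in_chunks(model, samples)
```

Cutting at fixed 15-minute boundaries can split a sentence in the middle, and segment timestamps restart at zero in every chunk, so cutting at pauses and adding an offset per chunk may be worth the extra effort if timing matters.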
