Whisper AI Batch Transcribe #954

FuturizeRush · 2023-02-10T16:29:00Z


I want to transcribe all the MP3 files in a folder and save the transcripts to Google Drive, using Google Colab. However, the whisper command does not seem to have an --input option, and my script has a few small issues. I am not familiar with the code at all, so I would appreciate help from someone more knowledgeable. (I know there has been some discussion about batch transcription, but it looks difficult to set up.) Thank you in advance.

!pip install git+https://github.com/openai/whisper.git
!sudo apt update && sudo apt install -y ffmpeg

import os
from google.colab import drive

# Mount Google Drive so the folder below is visible to the notebook
drive.mount("/content/drive")

mp3_dir = "/content/drive/MyDrive/FileFolderName"
transcripts_dir = os.path.join(mp3_dir, "Transcribed")

# Create the output folder if it does not exist yet
os.makedirs(transcripts_dir, exist_ok=True)

mp3_files = [f for f in os.listdir(mp3_dir) if f.endswith(".mp3")]

for mp3 in mp3_files:
    input_path = os.path.join(mp3_dir, mp3)
    # The audio file is a positional argument; the whisper CLI has no --input flag
    !whisper "{input_path}" --model medium.en --output_dir "{transcripts_dir}" --output_format txt
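
The same loop can also be written in plain Python instead of shelling out to the CLI. The following is a minimal sketch under stated assumptions, not a definitive implementation: `batch_transcribe` and its `transcribe` callback are hypothetical names introduced here, and the callback is assumed to take an audio path and return the transcript text.

```python
from pathlib import Path
from typing import Callable, List


def batch_transcribe(
    audio_dir: str,
    out_dir: str,
    transcribe: Callable[[Path], str],
    pattern: str = "*.mp3",
) -> List[Path]:
    """Transcribe every matching file in audio_dir, writing one .txt per file.

    Returns the transcript paths written during this call.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    written = []
    # glob() is not recursive, so an output folder inside audio_dir is ignored
    for audio in sorted(Path(audio_dir).glob(pattern)):
        target = out / f"{audio.stem}.txt"
        if target.exists():
            # A transcript from an earlier run is already there, so skip it
            continue
        target.write_text(transcribe(audio), encoding="utf-8")
        written.append(target)
    return written
```

With the openai-whisper package, the callback could be `lambda p: model.transcribe(str(p))["text"]` after `model = whisper.load_model("medium.en")`. Skipping files that already have a transcript lets a rerun resume where an interrupted Colab session stopped.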

FuturizeRush · 2023-02-10T16:33:58Z


import os

import whisper
from google.colab import auth
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaFileUpload

# Load the speech recognition model
model = whisper.load_model("medium.en")

# Read all MP3 files in the target directory
audio_dir = "/content/drive/MyDrive/FileFolder"
mp3_files = [f for f in os.listdir(audio_dir) if f.endswith(".mp3")]

# Create a Google Drive API client
auth.authenticate_user()
drive_service = build("drive", "v3")

# Define a function to upload a file to Google Drive
def upload_to_drive(file_path, file_name):
    try:
        file_metadata = {"name": file_name}
        media = MediaFileUpload(file_path, mimetype="text/plain")
        file = drive_service.files().create(body=file_metadata, media_body=media, fields="id").execute()
        print(f"File ID: {file.get('id')}")
    except HttpError as error:
        print(f"Error occurred: {error}")
        file = None
    return file

# Save the speech-to-text result as a local file and upload it to Google Drive
for mp3 in mp3_files:
    result_file = f"{mp3}_transcript.txt"
    if os.path.exists(result_file):
        print(f"{result_file} already exists, skipping conversion")
        continue
    # Pass the path directly: transcribe() handles the whole file,
    # whereas pad_or_trim() would cut the audio to 30 seconds
    result = model.transcribe(os.path.join(audio_dir, mp3))
    with open(result_file, "w") as f:
        f.write(result["text"])
    # "name" is a plain file name, not a path; without "parents"
    # the upload lands in the My Drive root
    upload_to_drive(result_file, result_file)

This is another version. (It does not work.)
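
One detail that trips up the upload step: the Drive API identifies folders by ID, not by a `/content/drive/...` path. Here is a small sketch of building the request body, assuming a hypothetical helper name `drive_file_metadata` and a folder ID you have looked up yourself.

```python
import os
from typing import Optional


def drive_file_metadata(local_path: str, parent_folder_id: Optional[str] = None) -> dict:
    """Build the body dict for a Drive v3 files().create() call.

    parent_folder_id is the ID of a Drive folder, not a filesystem path.
    If it is omitted, Drive places the file in the My Drive root.
    """
    # Drive file names are plain names, so strip any directory part
    metadata = {"name": os.path.basename(local_path)}
    if parent_folder_id:
        metadata["parents"] = [parent_folder_id]
    return metadata
```

The returned dict would be passed as `body=` inside `upload_to_drive`. If Drive is already mounted at /content/drive, writing the transcript straight into the mounted folder avoids the API call altogether.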

3 replies

eduardofc Mar 14, 2023

Hi,

I recommend installing datasets:

from datasets import Dataset, Audio 

files = ...  # a list of MP3 paths
dataset = Dataset.from_dict({"audio": files}).cast_column("audio", Audio(sampling_rate=16000))

and transformers. In particular, I recommend using its pipelines:

import torch
from tqdm import tqdm
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset

pipe = pipeline( 
  task="automatic-speech-recognition", 
  model="openai/whisper-large", 
  # chunk_length_s=30, 
  # device=device, 
  # ignore_warning=True 
) 

transcriptions = []
with torch.no_grad():
    for tt in tqdm(pipe(KeyDataset(dataset, "audio"), batch_size=10, truncation=True)):
        transcriptions.append(tt['text'])
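
The loop above only collects the texts in memory. A minimal sketch for writing them back out, one file per recording, could look like this. `save_transcriptions` is a hypothetical helper, and it assumes the pipeline yields results in the same order as `files`.

```python
from pathlib import Path
from typing import Iterable, List


def save_transcriptions(
    files: Iterable[str], transcriptions: Iterable[str], out_dir: str
) -> List[Path]:
    """Write each transcription to out_dir/<audio file stem>.txt."""
    files = list(files)
    transcriptions = list(transcriptions)
    if len(files) != len(transcriptions):
        # A mismatch would pair texts with the wrong recordings
        raise ValueError("files and transcriptions must have the same length")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for audio_path, text in zip(files, transcriptions):
        target = out / f"{Path(audio_path).stem}.txt"
        target.write_text(text.strip() + "\n", encoding="utf-8")
        saved.append(target)
    return saved
```

For example, `save_transcriptions(files, transcriptions, "/content/drive/MyDrive/FileFolder/Transcripts")` would place the text files on a mounted Drive.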

inconnu11 Jul 27, 2023

Hi, do you know how to pass Whisper arguments such as --language and --initial_prompt into the pipeline?

eduardofc Jul 27, 2023

Try exploring the configuration of the model. In my case, with Spanish as the language and transcribe as the task (in this vocabulary 50359 is the transcribe token; translate is 50358): pipe.model.config.forced_decoder_ids = [(1, 50262), (2, 50359), (3, 50363)]
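
Each pair in that list is (decoder position, token ID): position 1 is the language, 2 the task, 3 the timestamp mode. The sketch below makes the layout explicit. `build_forced_decoder_ids` is a hypothetical helper, and the token IDs are assumed from the multilingual vocabulary of checkpoints such as openai/whisper-large (they differ for large-v3), so check them against your tokenizer.

```python
# Assumed token IDs for the pre-large-v3 multilingual Whisper vocabulary.
# Verify against your own tokenizer before relying on them.
LANGUAGE_TOKENS = {"en": 50259, "zh": 50260, "de": 50261, "es": 50262, "fr": 50265}
TASK_TOKENS = {"translate": 50358, "transcribe": 50359}
NO_TIMESTAMPS = 50363


def build_forced_decoder_ids(language: str, task: str = "transcribe"):
    """Return (position, token_id) pairs in the forced_decoder_ids layout."""
    return [
        (1, LANGUAGE_TOKENS[language]),  # language token
        (2, TASK_TOKENS[task]),          # task token
        (3, NO_TIMESTAMPS),              # disable timestamp prediction
    ]
```

Rather than hardcoding IDs, `WhisperProcessor.get_decoder_prompt_ids(language=..., task=...)` in transformers looks them up from the tokenizer, which is the safer route.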
