PowerShell script that uses OpenAI's Whisper model locally to transcribe, clean, and combine the multi-track audio output of the Craig bot for Discord.
`transcribe.ps1` is a PowerShell script that automates the transcription, cleaning, and combination of multi-track audio files, specifically those generated by the Craig bot for Discord. The script uses OpenAI's Whisper model to perform speech-to-text transcription locally on a Windows machine.
This script is ideal for users who need to process multi-track audio recordings, such as podcasts, interviews, or collaborative discussions, and generate consolidated, easy-to-read transcripts in multiple formats.
- Automated Transcription: Uses OpenAI's Whisper model (`large-v2`) locally for accurate transcription of audio files.
- Multi-Track Support: Processes multiple audio files with a naming convention (`n-playername_m`) to identify speakers.
- Error Handling: Logs errors and tracks transcription progress to allow resumption of interrupted processes.
- Output Formats:
  - Individual TSV files for each transcription.
  - Combined TSV file with all transcripts sorted by timestamp.
  - Plain text transcript (`final_transcript.txt`).
  - Markdown-formatted transcript (`transcript.md`) with timestamps and speaker names (ideal for building a custom GPT).
- Post-Processing:
  - Consolidates consecutive identical lines from the same speaker.
  - Filters out noise words (e.g., "you") when they make up an entire line.
- Statistics: Collects and displays processing statistics, including total files processed, success/failure counts, and average processing time.
- Cleanup Option: Removes temporary files after processing if specified.
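
The speaker detection and post-processing steps listed above can be sketched roughly as follows. This is an illustration, not the code in `transcribe.ps1`: the function names `Get-SpeakerName` and `Merge-TranscriptRows`, the sample rows, and the printed line format are all hypothetical, and the regular expression assumes that `n` and `m` in `n-playername_m` are numeric. The `Start` values are in milliseconds, which is what Whisper's TSV output uses.

```powershell
# Hypothetical helper: pull the speaker out of an n-playername_m file name.
# Assumption: n and m are numeric; anything else falls back to the base name.
function Get-SpeakerName {
    param([Parameter(Mandatory)] [string] $FileName)
    $base = [System.IO.Path]::GetFileNameWithoutExtension($FileName)
    if ($base -match '^\d+-(?<name>.+)_\d+$') { return $Matches['name'] }
    return $base
}

# Hypothetical helper: sort rows by start time, drop lines that consist only
# of a noise word, and skip a line that repeats the previous kept line from
# the same speaker.
function Merge-TranscriptRows {
    param(
        [Parameter(Mandatory)] [object[]] $Rows,
        [string[]] $NoiseWords = @('you')   # assumed noise list
    )
    $previous = $null
    foreach ($row in ($Rows | Sort-Object -Property Start)) {
        $text = ([string]$row.Text).Trim()
        # Strip leading/trailing punctuation before the noise-word check
        $bare = $text -replace '^\W+|\W+$', ''
        if ($NoiseWords -contains $bare) { continue }
        if ($null -ne $previous -and
            $previous.Speaker -ceq $row.Speaker -and
            $previous.Text -ceq $text) { continue }
        $previous = [pscustomobject]@{
            Start   = $row.Start
            Speaker = $row.Speaker
            Text    = $text
        }
        $previous   # emit the kept row to the pipeline
    }
}

# Made-up sample rows standing in for parsed per-track TSV files
$rows = @(
    [pscustomobject]@{ Start = 4000; Speaker = (Get-SpeakerName -FileName '2-bob_0.tsv');   Text = 'you' }
    [pscustomobject]@{ Start = 0;    Speaker = (Get-SpeakerName -FileName '1-alice_0.tsv'); Text = 'Hello everyone.' }
    [pscustomobject]@{ Start = 2500; Speaker = (Get-SpeakerName -FileName '1-alice_0.tsv'); Text = 'Hello everyone.' }
    [pscustomobject]@{ Start = 6000; Speaker = (Get-SpeakerName -FileName '2-bob_0.tsv');   Text = 'Hi, thank you for having me.' }
)

$merged = @(Merge-TranscriptRows -Rows $rows)
$merged | ForEach-Object {
    $stamp = [timespan]::FromMilliseconds([double]$_.Start)
    '[{0:hh\:mm\:ss}] {1}: {2}' -f $stamp, $_.Speaker, $_.Text
}
```

In this sketch the noise check only matches when the whole line is a noise word, so "you" inside a longer sentence is kept.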
To use the script, ensure the following dependencies are installed and available in your system's PATH:
- PowerShell 7.5.0
- Python (with `pip` installed)
- OpenAI Whisper (`pip install -U openai-whisper`)
- FFmpeg
The audio files must be in a format supported by FFmpeg (e.g., MP3, WAV, M4A, FLAC, OGG, AAC, MP4, WMA).
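
Before the first run, it may help to confirm that the dependencies listed above resolve from PATH. The snippet below is a minimal sketch of such a check. `Get-MissingCommand` is a hypothetical helper, and the command names `python`, `pip`, `whisper`, and `ffmpeg` are assumptions about how the tools are exposed on your system.

```powershell
# Hypothetical helper: return the command names that cannot be resolved.
function Get-MissingCommand {
    param([string[]] $Names = @('python', 'pip', 'whisper', 'ffmpeg'))
    $Names | Where-Object {
        -not (Get-Command -Name $_ -ErrorAction SilentlyContinue)
    }
}

$missing = @(Get-MissingCommand)
if ($missing.Count -gt 0) {
    Write-Warning "Not found on PATH: $($missing -join ', ')"
} else {
    Write-Host 'All required commands were found.'
}

# Compare against the PowerShell version this README lists (7.5.0)
$v = $PSVersionTable.PSVersion
if ($v.Major -lt 7 -or ($v.Major -eq 7 -and $v.Minor -lt 5)) {
    Write-Warning "PowerShell $v is older than 7.5."
}
```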
Run the script from a PowerShell terminal with the following parameters:
- `-InputFolder` (Required): Path to the folder containing audio files to transcribe.
- `-OutputFolder` (Optional): Path to the folder where transcription outputs will be saved. Defaults to `<InputFolder>\..\transcriptions`.
- `-Force` (Optional): Forces re-transcription of already processed files.
- `-PostProcessOnly` (Optional): Skips transcription and only performs post-processing on existing transcripts.
- `-Cleanup` (Optional): Removes temporary files after processing.
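
To make the `-OutputFolder` default concrete, here is a sketch of how `<InputFolder>\..\transcriptions` resolves to a sibling of the input folder. `Resolve-OutputFolder` is a hypothetical helper written for this illustration. The script itself may compute the path differently.

```powershell
# Hypothetical helper: apply the documented default when -OutputFolder is omitted.
function Resolve-OutputFolder {
    param(
        [Parameter(Mandatory)] [string] $InputFolder,
        [string] $OutputFolder
    )
    if ([string]::IsNullOrWhiteSpace($OutputFolder)) {
        $OutputFolder = [System.IO.Path]::Combine($InputFolder, '..', 'transcriptions')
    }
    # GetFullPath collapses the '..' segment; the folder does not need to exist.
    # Pass absolute paths: .NET resolves relative ones against the process
    # working directory, which can differ from the PowerShell location.
    [System.IO.Path]::GetFullPath($OutputFolder)
}

# Example with an absolute input folder under the temp directory
$inputFolder = [System.IO.Path]::Combine([System.IO.Path]::GetTempPath(), 'session', 'audio')
Resolve-OutputFolder -InputFolder $inputFolder
```

With an input folder of `...\session\audio`, the default output lands in `...\session\transcriptions`, next to the audio folder and not inside it.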
- Transcribe all audio files in a folder:
.\transcribe.ps1 -InputFolder "C:\path\to\audio\files"