jmutchek/craig-whisper

Craig Whisper Transcription Script

PowerShell script that uses OpenAI's Whisper model locally to transcribe, clean, and combine the multi-track audio output of the Craig bot for Discord.

Overview

transcribe.ps1 is a PowerShell script that automates the transcription, cleaning, and combination of multi-track audio files, specifically those generated by the Craig bot for Discord. It runs OpenAI's Whisper model locally on a Windows machine to perform high-quality speech-to-text transcription.

This script is ideal for users who need to process multi-track audio recordings, such as podcasts, interviews, or collaborative discussions, and generate consolidated, easy-to-read transcripts in multiple formats.
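As a rough illustration of what running Whisper locally on one track can look like, the sketch below builds an argument list for the `whisper` command-line tool that ships with the openai-whisper package. The helper name `Get-WhisperArgumentList` and the file paths are hypothetical, and the exact invocation inside transcribe.ps1 may differ; only the `--model`, `--output_format`, and `--output_dir` options of the Whisper CLI are assumed here.

```powershell
# Hypothetical helper (not taken from transcribe.ps1): builds the arguments
# for one call to the whisper CLI that writes a TSV transcript.
function Get-WhisperArgumentList {
    param(
        [Parameter(Mandatory)][string]$AudioPath,
        [Parameter(Mandatory)][string]$OutputFolder,
        [string]$Model = 'large-v2'   # model named in the feature list
    )

    return @(
        $AudioPath,
        '--model', $Model,
        '--output_format', 'tsv',
        '--output_dir', $OutputFolder
    )
}

# Usage (needs whisper on PATH; the paths are placeholders):
# $whisperArgs = Get-WhisperArgumentList -AudioPath '.\1-alice_0.flac' -OutputFolder '.\transcriptions'
# & whisper @whisperArgs
```

Building the list separately from the call makes it easy to log or inspect the command before a long transcription run starts.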

Features

  • Automated Transcription: Uses OpenAI's Whisper model (large-v2) locally for accurate transcription of audio files.
  • Multi-Track Support: Processes multiple audio files with a naming convention (n-playername_m) to identify speakers.
  • Error Handling: Logs errors and tracks transcription progress to allow resumption of interrupted processes.
  • Output Formats:
    • Individual TSV files for each transcription.
    • Combined TSV file with all transcripts sorted by timestamp.
    • Plain text transcript (final_transcript.txt).
    • Markdown-formatted transcript (transcript.md) with timestamps and speaker names (ideal for building a custom GPT).
  • Post-Processing:
    • Consolidates consecutive identical lines from the same speaker.
    • Filters out noise words (e.g., "you") when they appear on their own.
  • Statistics: Collects and displays processing statistics, including total files processed, success/failure counts, and average processing time.
  • Cleanup Option: Removes temporary files after processing if specified.
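The speaker detection and post-processing steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the function names `Get-SpeakerName` and `Merge-TranscriptLine`, the row properties `Start`, `Speaker`, and `Text`, and the sample file names are all hypothetical, and the rows are assumed to be sorted by timestamp already.

```powershell
# Hypothetical helper: pull the speaker name out of a track file name that
# follows the n-playername_m convention, for example "2-alice_0.flac".
function Get-SpeakerName {
    param([Parameter(Mandatory)][string]$FileName)

    $base = [System.IO.Path]::GetFileNameWithoutExtension($FileName)
    if ($base -match '^\d+-(?<name>.+)_\d+$') {
        return $Matches['name']
    }
    return $base   # fall back to the bare name when the pattern does not match
}

# Hypothetical helper: drop lines that consist only of a noise word and
# collapse consecutive identical lines from the same speaker.
function Merge-TranscriptLine {
    param(
        [Parameter(Mandatory)][object[]]$Rows,
        [string[]]$NoiseWords = @('you')
    )

    $result = [System.Collections.Generic.List[object]]::new()
    foreach ($row in $Rows) {
        $text = $row.Text.Trim()

        # Compare the line without punctuation or spaces against the noise list
        $bare = ($text -replace '[^\p{L}\p{N}]', '').ToLowerInvariant()
        if ($NoiseWords -contains $bare) { continue }

        # Skip a line that repeats the previously kept line of the same speaker
        if ($result.Count -gt 0) {
            $last = $result[$result.Count - 1]
            if ($last.Speaker -eq $row.Speaker -and $last.Text -ceq $text) { continue }
        }

        $result.Add([pscustomobject]@{
            Start   = $row.Start
            Speaker = $row.Speaker
            Text    = $text
        })
    }
    return $result
}

# Usage with made-up rows
$rows = @(
    [pscustomobject]@{ Start = 0;    Speaker = (Get-SpeakerName '1-alice_0.flac'); Text = 'Hello there.' }
    [pscustomobject]@{ Start = 1000; Speaker = (Get-SpeakerName '1-alice_0.flac'); Text = 'Hello there.' }
    [pscustomobject]@{ Start = 2000; Speaker = (Get-SpeakerName '2-bob_0.flac');   Text = 'you' }
)
Merge-TranscriptLine -Rows $rows | Format-Table Start, Speaker, Text
```

Comparing each row only with the last kept row means a repeated phrase is collapsed only when nothing from another speaker was kept in between.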

Requirements

To use the script, ensure the following dependencies are installed and available in your system's PATH:

  • PowerShell 7.5.0
  • Python (with pip installed)
  • OpenAI Whisper (pip install -U openai-whisper)
  • FFmpeg
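Before a long run, it can help to confirm that these tools resolve on the PATH. The following is a minimal sketch; the function name `Test-RequiredCommand` is hypothetical, and it assumes the Whisper package exposes its usual `whisper` console command.

```powershell
# Hypothetical pre-flight check: returns the names of commands that
# cannot be resolved from the current session.
function Test-RequiredCommand {
    param([string[]]$Names = @('python', 'whisper', 'ffmpeg'))

    $missing = foreach ($name in $Names) {
        if (-not (Get-Command -Name $name -ErrorAction SilentlyContinue)) {
            $name
        }
    }
    return @($missing)
}

$missing = @(Test-RequiredCommand)
if ($missing.Count -gt 0) {
    Write-Warning "Missing from PATH: $($missing -join ', ')"
} else {
    Write-Host 'All required commands were found.'
}

# Show the running PowerShell version for comparison with the requirement above
Write-Host "PowerShell version: $($PSVersionTable.PSVersion)"
```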

The audio files must be in a format supported by FFmpeg (e.g., MP3, WAV, M4A, FLAC, OGG, AAC, MP4, WMA).

Usage

Run the script from a PowerShell terminal with the following parameters:

Parameters

  • -InputFolder (Required): Path to the folder containing audio files to transcribe.
  • -OutputFolder (Optional): Path to the folder where transcription outputs will be saved. Defaults to <InputFolder>\..\transcriptions.
  • -Force (Optional): Forces re-transcription of already processed files.
  • -PostProcessOnly (Optional): Skips transcription and only performs post-processing on existing transcripts.
  • -Cleanup (Optional): Removes temporary files after processing.
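To make the documented default for -OutputFolder concrete, the sketch below computes a "transcriptions" folder next to the input folder. The function name `Resolve-DefaultOutputFolder` and the example path are hypothetical; the script itself may resolve the default differently.

```powershell
# Hypothetical helper mirroring the documented default:
# <InputFolder>\..\transcriptions
function Resolve-DefaultOutputFolder {
    param([Parameter(Mandatory)][string]$InputFolder)

    $parent = Split-Path -Path $InputFolder -Parent
    return Join-Path -Path $parent -ChildPath 'transcriptions'
}

# Usage (placeholder path): the result sits beside the input folder
Resolve-DefaultOutputFolder -InputFolder 'C:\recordings\session1'
```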

Examples

  1. Transcribe all audio files in a folder:
    .\transcribe.ps1 -InputFolder "C:\path\to\audio\files"
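
The other parameters listed above combine with -InputFolder in the same way. The invocations below are illustrative: the paths are placeholders, and -Force, -PostProcessOnly, and -Cleanup are assumed to be plain switches that take no value.

```powershell
# Write outputs to a specific folder instead of the default
.\transcribe.ps1 -InputFolder "C:\path\to\audio\files" -OutputFolder "C:\path\to\transcripts"

# Re-transcribe files that were already processed
.\transcribe.ps1 -InputFolder "C:\path\to\audio\files" -Force

# Skip transcription and only post-process existing transcripts
.\transcribe.ps1 -InputFolder "C:\path\to\audio\files" -PostProcessOnly

# Transcribe, then remove temporary files
.\transcribe.ps1 -InputFolder "C:\path\to\audio\files" -Cleanup
```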
