Description
I propose adding an advanced interactive batch script (.bat) for Windows that significantly simplifies the entire transcription process, making it more accessible and robust for non-technical users. This script acts as a user-friendly wrapper for whisper-cli.exe and ffmpeg.exe.
The initial script has been enhanced to address the common issue of repetitive loops or "hallucinations" (e.g., repeating the last phrase) that often occur with large-v3 models, especially on audio with silent segments.
Key Features of the Script:
- Interactive Prompts: The script interactively asks the user for the media file path and the desired language.
- Default Language: It suggests a default language (en) if the user simply presses Enter.
- Automatic FFmpeg Conversion: It detects when the input file is not a .wav file (e.g., .mp4, .mp3, .m4a) and uses FFmpeg to convert it to the required 16-bit, 16 kHz mono WAV format before transcription. The temporary file gets a plain ASCII name (_temp_audio_for_whisper.wav) to avoid character encoding issues.
- Automatic Output & Display: The transcription result is automatically saved to a .txt file with the same name as the input media file. The result is also displayed directly in the console window.
- Automatic Cleanup: The temporary WAV file created during conversion is automatically deleted after the process is complete.
- Built-in Correction for Repetitive Loops (Hallucinations): The script offers the user a choice between two effective methods to combat transcription loops:
  - Simple Fix (Parameter Tuning): A quick method that adjusts decoding parameters (--entropy-thold, --beam-size, --max-context) to prevent the model from getting stuck.
  - Robust Fix (VAD): A more reliable method that uses a Voice Activity Detection model to process only speech segments, ignoring the silence where loops often begin.
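As a rough sketch of what the wrapper ends up running, the conversion step and the two anti-repetition modes correspond to commands like the ones below. The flags and values are taken from the proposed script. The file names (input.mp4, the output .txt names, the relative model paths) are placeholders for illustration, and whisper-cli.exe and ffmpeg.exe are assumed to be on PATH.

```bat
@echo off
setlocal
:: Placeholder names for illustration only; adjust them to your setup.
set "MODEL=models\ggml-large-v3.bin"
set "VAD_MODEL=models\ggml-silero-v5.1.2.bin"
set "WAV=_temp_audio_for_whisper.wav"

:: Conversion: 16 kHz, mono, 16-bit PCM WAV; -y overwrites an existing temp file.
ffmpeg.exe -hide_banner -i "input.mp4" -ar 16000 -ac 1 -c:a pcm_s16le -y "%WAV%"

:: Simple Fix: tune the decoding parameters only.
whisper-cli.exe -m "%MODEL%" -l en -f "%WAV%" --entropy-thold 2.8 --beam-size 5 --max-context 64 > "input_simple.txt"

:: Robust Fix: enable VAD so that only detected speech segments are transcribed.
whisper-cli.exe -m "%MODEL%" -l en -f "%WAV%" --vad --vad-model "%VAD_MODEL%" --entropy-thold 2.8 --beam-size 5 > "input_vad.txt"

:: Cleanup: remove the temporary WAV file.
del "%WAV%"
```

Running both modes on the same file makes it easy to compare the two transcripts and see which one avoids the repeated phrases.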
Prerequisites for the user:
- whisper-cli.exe and ffmpeg.exe must already be present on the machine.
- The user needs to edit the first few lines of the script to set the correct paths to the executables and the desired model.
- (For the VAD method) The user should download the Silero VAD model using the provided download-vad-model.cmd script. The script is pre-configured to look for this model.
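Before the first run, the prerequisites above could be verified with a small check like the following. This is a hypothetical helper, not part of the proposed script; the paths are the same placeholders used in the script's configuration block and need to be edited.

```bat
@echo off
setlocal
:: Placeholder paths, identical to the script's configuration block; edit them.
set "WHISPER_EXE=C:\path\to\whisper.cpp\whisper-cli.exe"
set "FFMPEG_EXE=C:\path\to\ffmpeg\bin\ffmpeg.exe"
set "MODEL_PATH=C:\path\to\whisper.cpp\models\ggml-large-v3.bin"
set "VAD_MODEL_PATH=C:\path\to\whisper.cpp\models\ggml-silero-v5.1.2.bin"

:: Report every configured file that does not exist.
set "MISSING=0"
for %%P in ("%WHISPER_EXE%" "%FFMPEG_EXE%" "%MODEL_PATH%" "%VAD_MODEL_PATH%") do (
    if not exist "%%~P" (
        echo Missing: %%~P
        set "MISSING=1"
    )
)
if "%MISSING%"=="0" echo All prerequisites found.
```

A missing VAD model only matters for the Robust Fix option; it can be fetched with .\models\download-vad-model.cmd silero-v5.1.2, as noted in the script's comments.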
Proposed Script (run-whisper-interactive.bat):
@echo off
setlocal enabledelayedexpansion
:: Set the console to UTF-8 to correctly handle different languages
chcp 65001 > nul
cls
:: --- USER CONFIGURATION ---
:: Please set the full paths to your executables and model files.
:: IMPORTANT: Do NOT use quotes in these paths.
set "WHISPER_EXE=C:\path\to\whisper.cpp\whisper-cli.exe"
set "FFMPEG_EXE=C:\path\to\ffmpeg\bin\ffmpeg.exe"
set "MODEL_PATH=C:\path\to\whisper.cpp\models\ggml-large-v3.bin"
:: Path to the Voice Activity Detection (VAD) model.
:: Download it by running: .\models\download-vad-model.cmd silero-v5.1.2
set "VAD_MODEL_PATH=C:\path\to\whisper.cpp\models\ggml-silero-v5.1.2.bin"
echo =================================================================
echo Whisper.cpp Advanced Interactive Runner
echo (with Auto-Conversion and Anti-Repetition Features)
echo =================================================================
echo.
:: --- Step 1: Prompt for the media file path ---
set "MEDIA_FILE="
set /p MEDIA_FILE="Enter the full path to your audio or video file: "
if not defined MEDIA_FILE (
echo.
echo Error: File path was not provided.
goto end
)
:: Remove quotes if the user pasted them
set "MEDIA_FILE=!MEDIA_FILE:"=!"
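:: Optional addition (a sketch, not part of the original proposal): stop early
:: when the path does not point to an existing file, so that FFmpeg or
:: whisper-cli is not started on a mistyped path.
if not exist "!MEDIA_FILE!" (
echo.
echo Error: File not found: "!MEDIA_FILE!"
goto end
)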
:: --- Step 2: Prompt for the language ---
echo.
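:: Clear any LANGUAGE value inherited from the environment; set /p leaves the
:: variable unchanged when the user just presses Enter.
set "LANGUAGE="
set /p LANGUAGE="Enter language code (or press Enter for 'en'): "
if not defined LANGUAGE set "LANGUAGE=en"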
:: --- Step 3: Choose a correction method for transcription loops ---
cls
echo ==========================================================
echo To prevent repetitive loops, please choose a method:
echo ==========================================================
echo.
echo [1] Simple Fix (Parameter Tuning)
echo - A quick fix that helps in most cases.
echo.
echo [2] Robust Fix (Using VAD)
echo - Recommended for audio with long pauses or silence.
echo - Requires the VAD model to be downloaded.
echo.
set "CHOICE="
set /p CHOICE="Enter 1 or 2 and press Enter: "
:: --- Set arguments based on user's choice ---
set "WHISPER_ARGS="
if "!CHOICE!"=="1" (
set "WHISPER_ARGS=--entropy-thold 2.8 --beam-size 5 --max-context 64"
echo. & echo You chose [1]: Simple Fix.
)
if "!CHOICE!"=="2" (
if not exist "!VAD_MODEL_PATH!" (
echo.
echo ERROR: VAD model not found at "!VAD_MODEL_PATH!"
echo Please download it first before using this option.
goto end
)
set "WHISPER_ARGS=--vad --vad-model "!VAD_MODEL_PATH!" --entropy-thold 2.8 --beam-size 5"
echo. & echo You chose [2]: Robust Fix with VAD.
)
if not defined WHISPER_ARGS (
echo Error: Invalid selection. Aborting.
goto end
)
echo. & echo Press any key to continue...
pause > nul
:: --- Define output paths and prepare for conversion ---
for %%F in ("!MEDIA_FILE!") do (
set "OUTPUT_FILE=%%~dpnF.txt"
set "FILE_EXT=%%~xF"
set "TEMP_WAV_FILE=%%~dpF_temp_audio_for_whisper.wav"
)
set "AUDIO_TO_PROCESS=!MEDIA_FILE!"
set "DELETE_TEMP_FILE=0"
:: --- Step 4: Convert the file to WAV if it's not already ---
if /i not "!FILE_EXT!"==".wav" (
cls
echo Non-WAV file detected. Starting conversion...
echo Source: "!MEDIA_FILE!"
echo.
"!FFMPEG_EXE!" -hide_banner -i "!MEDIA_FILE!" -ar 16000 -ac 1 -c:a pcm_s16le -y "!TEMP_WAV_FILE!"
if !errorlevel! neq 0 (
echo.
echo ERROR: Failed to convert the file using FFmpeg.
goto end
)
set "AUDIO_TO_PROCESS=!TEMP_WAV_FILE!"
set "DELETE_TEMP_FILE=1"
echo.
echo Conversion complete.
)
:: --- Step 5: Run the transcription ---
cls
echo --- Running with the following parameters ---
echo Model: !MODEL_PATH!
echo File: "!AUDIO_TO_PROCESS!"
echo Language: !LANGUAGE!
echo Arguments: !WHISPER_ARGS!
echo Saving to: "!OUTPUT_FILE!"
echo ------------------------------------------
echo.
echo Starting transcription... Please wait.
echo.
"!WHISPER_EXE!" -m "!MODEL_PATH!" -l "!LANGUAGE!" -f "!AUDIO_TO_PROCESS!" !WHISPER_ARGS! > "!OUTPUT_FILE!"
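:: Optional addition (a sketch, not part of the original proposal): report a
:: non-zero exit code from whisper-cli so that an empty or partial .txt file is
:: not mistaken for a successful transcription. Cleanup below still runs.
if !errorlevel! neq 0 (
echo.
echo WARNING: whisper-cli exited with code !errorlevel!. The output file may be incomplete.
)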
:: --- Step 6: Display result and clean up ---
echo.
echo ==========================================================
echo TRANSCRIPTION RESULT:
echo ==========================================================
echo.
type "!OUTPUT_FILE!"
if "!DELETE_TEMP_FILE!"=="1" (
del "!AUDIO_TO_PROCESS!"
echo.
echo ^(Temporary WAV file has been deleted^)
)
echo.
echo.
echo ==========================================================
echo Transcription finished.
echo.
echo The result has been saved to:
echo !OUTPUT_FILE!
echo ==========================================================
:end
echo.
echo Press any key to close this window.
pause > nul