
Suggestion: Add Advanced Interactive Batch Script for Windows (with Auto-Conversion & Anti-Repetition Features) #3451

@Koaxz


I propose adding an advanced interactive batch script (.bat) for Windows that simplifies the whole transcription workflow and makes it accessible to non-technical users. The script is a user-friendly wrapper around whisper-cli.exe and ffmpeg.exe.

The initial version of the script has been extended to address a common problem with the large-v3 model: repetitive loops, or "hallucinations" (e.g., the last phrase repeated over and over), which tend to appear on audio with silent segments.

Key Features of the Script:

  • Interactive Prompts: The script interactively asks the user for the media file path and the desired language.
  • Default Language: If the user just presses Enter, the language defaults to English (en).
  • Automatic FFmpeg Conversion: If the input is not a .wav file (e.g., .mp4, .mp3, .m4a), the script uses FFmpeg to convert it to the 16 kHz, mono, 16-bit PCM WAV format required by whisper-cli. The temporary file gets a plain ASCII name (_temp_audio_for_whisper.wav) to avoid character-encoding problems with non-ASCII filenames. Files that already have a .wav extension are passed to whisper-cli as-is, without conversion.
  • Automatic Output & Display: whisper-cli's console output (the transcript, with timestamps) is saved to a .txt file with the same base name and in the same folder as the input media file. The result is also displayed in the console window once transcription finishes.
  • Automatic Cleanup: The temporary WAV file created during conversion is automatically deleted after the process is complete.
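  • Built-in Correction for Repetitive Loops (Hallucinations): The script lets the user choose between two methods to combat transcription loops:
  1. Simple Fix (Parameter Tuning): A quick method that adjusts decoding parameters so the model is less likely to get stuck. Raising --entropy-thold from its default of 2.40 to 2.8 makes whisper.cpp treat low-entropy (repetitive) segments as failed decodes and retry them at a higher temperature, --max-context 64 limits how many tokens of previously transcribed text are fed back to the decoder as context, and --beam-size 5 decodes with beam search over 5 candidates.
  2. Robust Fix (VAD): A more reliable method that uses a Voice Activity Detection model so that only speech segments are transcribed and the silences where loops often begin are skipped. It is combined with the same --entropy-thold and --beam-size settings.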
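For reference, the two options roughly come down to the following whisper-cli.exe invocations, which can also be run by hand from a Command Prompt. This is a sketch that copies the flags used in the script below; the paths, the language code `en`, and the file names `input.wav`/`input.txt` are placeholders, and `input.wav` is assumed to already be 16 kHz mono 16-bit PCM.

```batch
:: Option 1: Simple Fix (parameter tuning only). Placeholder paths and file names.
"C:\path\to\whisper.cpp\whisper-cli.exe" -m "C:\path\to\whisper.cpp\models\ggml-large-v3.bin" -l en -f "input.wav" ^
    --entropy-thold 2.8 --beam-size 5 --max-context 64 > "input.txt"

:: Option 2: Robust Fix (the VAD model lets whisper-cli skip silent stretches).
"C:\path\to\whisper.cpp\whisper-cli.exe" -m "C:\path\to\whisper.cpp\models\ggml-large-v3.bin" -l en -f "input.wav" ^
    --vad --vad-model "C:\path\to\whisper.cpp\models\ggml-silero-v5.1.2.bin" ^
    --entropy-thold 2.8 --beam-size 5 > "input.txt"
```

Redirecting stdout with `>` captures the transcript the same way the script does; whisper-cli's progress and log messages are written to stderr and should still appear in the console.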

Prerequisites for the user:

  1. whisper-cli.exe (from a whisper.cpp build) and ffmpeg.exe must be installed.
  2. The user needs to edit the USER CONFIGURATION block at the top of the script to set the correct paths to the executables and the desired model.
  3. (For the VAD method) The user must download the Silero VAD model with the download-vad-model.cmd script in whisper.cpp's models folder (e.g. .\models\download-vad-model.cmd silero-v5.1.2). The script looks for the model at the path set in VAD_MODEL_PATH.
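Before the first run, a small check along these lines can confirm that the configured paths are correct. This is a minimal sketch, not part of the proposed script: the paths are the same placeholders as in the script's configuration block, and it only verifies that each file exists and that ffmpeg.exe starts.

```batch
@echo off
setlocal
:: Minimal pre-flight check for the paths used by run-whisper-interactive.bat.
:: The paths below are placeholders; copy in the values from your configuration block.
set "WHISPER_EXE=C:\path\to\whisper.cpp\whisper-cli.exe"
set "FFMPEG_EXE=C:\path\to\ffmpeg\bin\ffmpeg.exe"
set "MODEL_PATH=C:\path\to\whisper.cpp\models\ggml-large-v3.bin"
set "VAD_MODEL_PATH=C:\path\to\whisper.cpp\models\ggml-silero-v5.1.2.bin"

:: Report whether each configured file exists.
for %%P in ("%WHISPER_EXE%" "%FFMPEG_EXE%" "%MODEL_PATH%" "%VAD_MODEL_PATH%") do (
    if exist "%%~P" (echo [OK]      %%~P) else (echo [MISSING] %%~P)
)

:: If only the VAD model is missing, download it from the whisper.cpp folder with:
::   .\models\download-vad-model.cmd silero-v5.1.2

:: Print FFmpeg's version banner to confirm that the executable actually runs.
if exist "%FFMPEG_EXE%" "%FFMPEG_EXE%" -version
pause
```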

Proposed Script (run-whisper-interactive.bat):

@echo off
setlocal enabledelayedexpansion

:: Set the console to UTF-8 to correctly handle different languages
chcp 65001 > nul
cls

:: --- USER CONFIGURATION ---
:: Please set the full paths to your executables and model files.
:: IMPORTANT: Do NOT use quotes in these paths.
set "WHISPER_EXE=C:\path\to\whisper.cpp\whisper-cli.exe"
set "FFMPEG_EXE=C:\path\to\ffmpeg\bin\ffmpeg.exe"
set "MODEL_PATH=C:\path\to\whisper.cpp\models\ggml-large-v3.bin"

:: Path to the Voice Activity Detection (VAD) model.
:: Download it by running: .\models\download-vad-model.cmd silero-v5.1.2
set "VAD_MODEL_PATH=C:\path\to\whisper.cpp\models\ggml-silero-v5.1.2.bin"

echo =================================================================
echo Whisper.cpp Advanced Interactive Runner
echo (with Auto-Conversion and Anti-Repetition Features)
echo =================================================================
echo.

:: --- Step 1: Prompt for the media file path ---
set "MEDIA_FILE="
set /p MEDIA_FILE="Enter the full path to your audio or video file: "
if not defined MEDIA_FILE (
echo.
echo Error: File path was not provided.
goto end
)
:: Remove quotes if the user pasted them
set "MEDIA_FILE=!MEDIA_FILE:"=!"
if not exist "!MEDIA_FILE!" (
echo.
echo Error: File not found: "!MEDIA_FILE!"
goto end
)

:: --- Step 2: Prompt for the language ---
echo.
set "LANGUAGE="
set /p LANGUAGE="Enter language code (or press Enter for 'en'): "
if not defined LANGUAGE set "LANGUAGE=en"

:: --- Step 3: Choose a correction method for transcription loops ---
cls
echo ==========================================================
echo To prevent repetitive loops, please choose a method:
echo ==========================================================
echo.
echo [1] Simple Fix (Parameter Tuning)
echo - A quick fix that helps in most cases.
echo.
echo [2] Robust Fix (Using VAD)
echo - Recommended for audio with long pauses or silence.
echo - Requires the VAD model to be downloaded.
echo.
set "CHOICE="
set /p CHOICE="Enter 1 or 2 and press Enter: "

:: --- Set arguments based on user's choice ---
set "WHISPER_ARGS="
if "!CHOICE!"=="1" (
set "WHISPER_ARGS=--entropy-thold 2.8 --beam-size 5 --max-context 64"
echo. & echo You chose [1]: Simple Fix.
)
if "!CHOICE!"=="2" (
if not exist "!VAD_MODEL_PATH!" (
echo.
echo ERROR: VAD model not found at "!VAD_MODEL_PATH!"
echo Please download it first before using this option.
goto end
)
set "WHISPER_ARGS=--vad --vad-model "!VAD_MODEL_PATH!" --entropy-thold 2.8 --beam-size 5"
echo. & echo You chose [2]: Robust Fix with VAD.
)
if not defined WHISPER_ARGS (
echo Error: Invalid selection. Aborting.
goto end
)
echo. & echo Press any key to continue...
pause > nul

:: --- Define output paths and prepare for conversion ---
for %%F in ("!MEDIA_FILE!") do (
set "OUTPUT_FILE=%%~dpnF.txt"
set "FILE_EXT=%%~xF"
set "TEMP_WAV_FILE=%%~dpF_temp_audio_for_whisper.wav"
)

set "AUDIO_TO_PROCESS=!MEDIA_FILE!"
set "DELETE_TEMP_FILE=0"

:: --- Step 4: Convert the file to WAV if it's not already ---
if /i not "!FILE_EXT!"==".wav" (
cls
echo Non-WAV file detected. Starting conversion...
echo Source: "!MEDIA_FILE!"
echo.

"!FFMPEG_EXE!" -hide_banner -i "!MEDIA_FILE!" -ar 16000 -ac 1 -c:a pcm_s16le -y "!TEMP_WAV_FILE!"
if !errorlevel! neq 0 (
    echo.
    echo ERROR: Failed to convert the file using FFmpeg.
    if exist "!TEMP_WAV_FILE!" del "!TEMP_WAV_FILE!"
    goto end
)

set "AUDIO_TO_PROCESS=!TEMP_WAV_FILE!"
set "DELETE_TEMP_FILE=1"
echo.
echo Conversion complete.

)

:: --- Step 5: Run the transcription ---
cls
echo --- Running with the following parameters ---
echo Model: !MODEL_PATH!
echo File: "!AUDIO_TO_PROCESS!"
echo Language: !LANGUAGE!
echo Arguments: !WHISPER_ARGS!
echo Saving to: "!OUTPUT_FILE!"
echo ------------------------------------------
echo.
echo Starting transcription... Please wait.
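echo.

"!WHISPER_EXE!" -m "!MODEL_PATH!" -l "!LANGUAGE!" -f "!AUDIO_TO_PROCESS!" !WHISPER_ARGS! > "!OUTPUT_FILE!"
set "WHISPER_EXIT=!errorlevel!"
if not "!WHISPER_EXIT!"=="0" (
echo.
echo WARNING: whisper-cli.exe exited with code !WHISPER_EXIT!. The output file may be empty or incomplete.
)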

:: --- Step 6: Display result and clean up ---
echo.
echo ==========================================================
echo TRANSCRIPTION RESULT:
echo ==========================================================
echo.
type "!OUTPUT_FILE!"

if "!DELETE_TEMP_FILE!"=="1" (
del "!AUDIO_TO_PROCESS!"
echo.
echo (Temporary WAV file has been deleted^)
)

echo.
echo.
echo ==========================================================
echo Transcription finished.
echo.
echo The result has been saved to:
echo !OUTPUT_FILE!
echo ==========================================================

:end
echo.
echo Press any key to close this window.
pause > nul
