Differences in Whisper Results When Executing via Code vs. Console Command with the Same Parameters #1892

Igarugueri · 2023-12-11T16:11:02Z

Hello whisper community,

I am seeing unexpected behaviour with Whisper, OpenAI's speech-to-text model: a Python script and the console command produce different results, even though I pass the same parameters to both.

-Parameters Used-

Execution Environment: Windows 11, Python 3.9.9
Whisper Version: 20231117
Parameters:
    In the script: (file_path: str, model_size: str = "small", word_timestamps: bool = True, language: str = "Spanish", translate: bool = True)
    In the console: > whisper file.wav --word_timestamps True --language es --task translate --model small
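
To make the comparison concrete, here is a minimal sketch of how the script side could be lined up with the console command. It rests on assumptions that should be checked against the installed release (for example with `whisper --help`): that the script ultimately calls `model.transcribe()`, and that the console command applies `--beam_size 5` and `--best_of 5` by default while `model.transcribe()` leaves both unset unless they are passed explicitly. The names `CLI_DECODING_DEFAULTS`, `build_transcribe_kwargs` and `transcribe_like_cli` are hypothetical and only serve the illustration.

```python
from typing import Any, Dict

# Assumption: these mirror what the `whisper` console command uses when
# --beam_size and --best_of are not given; confirm with `whisper --help`.
CLI_DECODING_DEFAULTS: Dict[str, Any] = {"beam_size": 5, "best_of": 5}


def build_transcribe_kwargs(
    language: str = "es",
    translate: bool = True,
    word_timestamps: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for model.transcribe()."""
    kwargs: Dict[str, Any] = {
        "language": language,
        # The console flag `--task translate` corresponds to task="translate".
        "task": "translate" if translate else "transcribe",
        "word_timestamps": word_timestamps,
    }
    # Pass the decoding settings explicitly instead of relying on defaults.
    kwargs.update(CLI_DECODING_DEFAULTS)
    return kwargs


def transcribe_like_cli(file_path: str, model_size: str = "small") -> Dict[str, Any]:
    """Run Whisper from Python with console-style decoding settings."""
    import whisper  # imported lazily so the helper above works without it

    model = whisper.load_model(model_size)
    return model.transcribe(file_path, **build_transcribe_kwargs())
```

Calling `transcribe_like_cli("file.wav")` and comparing the returned `"segments"` list with the console output would show whether the decoding settings account for the difference.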

-Issue-

The output differs depending on whether I run Whisper through a Python script or execute the equivalent command directly in the console:

Console Command Execution:
    The transcription captures all words accurately and recognizes pauses correctly, aligning well with the audio.
Python Script Execution:
    The main problem is in the 'segments' part of the output: the script misses several pauses that the console execution detects.
    The transcription quality is also somewhat worse than with the console execution.

This discrepancy is puzzling, especially since the parameters and environment are consistent across both methods of execution.

I would like to understand what causes this discrepancy. Could it be due to differences in the execution environment, or is there something else I might be overlooking?

I appreciate any guidance or suggestions you can provide to help me solve this mystery.

Thank you in advance for your time and help!

Best regards,

Igarugueri.

Note: Additional Audio File Details
Here are the key details of the audio file (file.wav) I am using for the transcription:

File Format: WAV / WAVE (Waveform Audio)
Audio Duration: Approximately 112.56 seconds (about 1 minute and 52 seconds)
Audio Codec: PCM signed 16-bit little-endian (pcm_s16le)
Sample Rate: 44,100 Hz
Audio Channels: 2 (stereo)
Bit Rate: Approximately 1,411,200 bits per second (1.41 Mbps)
File Size: Approximately 19.86 MB

Replies: 0 comments
