Differences in Whisper Results When Executing via Code vs. Console Command with the Same Parameters #1893
Unanswered
Igarugueri asked this question in Q&A
-
See this discussion, and note the inclusion of |
-
NOTE: I originally posted this message in the Show and Tell section; I apologize for that, as this is the appropriate section.
Hello Whisper community,
I am encountering unexpected behaviour with Whisper, OpenAI's speech-to-text transcription model. I get different results when running the model through a Python script than when running the same command directly in the console, even though I use the same parameters in both cases.
-Parameters Used-
Execution Environment: Windows 11, Python 3.9.9
Whisper Version: 20231117
In the script: (file_path: str, model_size: str = "small", word_timestamps: bool = True, language: str = "Spanish", translate: bool = True)
In the console: > whisper file.wav --word_timestamps True --language es --task translate --model small
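One known source of this kind of divergence is that the `whisper` CLI passes decoding defaults that a bare `model.transcribe()` call does not: in particular, the CLI defaults to `--beam_size 5` and `--best_of 5`, while calling `transcribe()` without those arguments falls back to greedy decoding. Below is a sketch of a Python call that mirrors the console flags; it assumes the `openai-whisper` package is installed, and the parameter names should be checked against your installed version:

```python
# Decoding options the CLI applies by default but model.transcribe() does not.
# (Values taken from whisper's CLI argument defaults; verify for your version.)
CLI_DEFAULTS = dict(
    word_timestamps=True,   # same as --word_timestamps True
    language="es",          # same as --language es
    task="translate",       # same as --task translate
    beam_size=5,            # CLI default; transcribe() alone decodes greedily
    best_of=5,              # CLI default for the temperature-fallback passes
)

def transcribe_like_cli(path: str, model_size: str = "small") -> dict:
    """Run Whisper from Python with the same options the console command uses."""
    import whisper  # requires the openai-whisper package

    model = whisper.load_model(model_size)
    return model.transcribe(path, **CLI_DEFAULTS)
```

If the script currently calls `transcribe()` with only `language` and `task`, adding `beam_size=5` alone may already close most of the quality gap, since beam search is noticeably more accurate than greedy decoding on longer audio.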
-Issue-
When using Whisper, I am encountering differences in performance between running it through a Python script and executing the same command directly in the console:
Console Command Execution:
The transcription captures all words accurately and recognizes pauses correctly, aligning well with the audio.
Python Script Execution:
In the Python script, the primary issue lies in the 'segments' section of the output: the script fails to detect pauses accurately, missing several of them compared to the console execution.
Additionally, the overall transcription quality is somewhat worse than the console output.
This discrepancy is puzzling, especially since the parameters and environment are consistent across both methods of execution.
I would like to understand why this discrepancy exists. Could it be due to differences in the execution environment, or is there something else I might be overlooking?
I appreciate any guidance or suggestions you can provide to help me solve this mystery.
Thank you in advance for your time and help!
Best regards,
Igarugueri.
NOTE: Additional Audio File Details
Here are some key details of the audio file (file.wav) I am using for the transcription:
File Format: WAV / WAVE (Waveform Audio)
Audio Duration: Approximately 112.56 seconds (about 1 minute and 52 seconds)
Audio Codec: PCM signed 16-bit little-endian (pcm_s16le)
Sample Rate: 44,100 Hz
Audio Channels: 2 (stereo)
Bit Rate: Approximately 1,411,200 bits per second (1.41 Mbps)
File Size: Approximately 19.86 MB
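The reported bit rate and file size are internally consistent for uncompressed PCM, and a quick arithmetic check confirms it. Note also that, as far as I know, Whisper's `load_audio` resamples all input to 16 kHz mono via ffmpeg regardless of entry point, so the 44.1 kHz stereo source should be treated identically in both the script and the console, and is unlikely to be the cause of the discrepancy:

```python
# Sanity-check the reported audio stats for uncompressed PCM:
#   bytes/second = sample_rate * channels * bytes_per_sample
sample_rate = 44_100        # Hz
channels = 2                # stereo
bytes_per_sample = 2        # pcm_s16le -> 16-bit samples
duration_s = 112.56         # reported duration

bit_rate = sample_rate * channels * bytes_per_sample * 8
size_mb = sample_rate * channels * bytes_per_sample * duration_s / 1_000_000

print(bit_rate)             # 1_411_200 bits/s, matching the reported 1.41 Mbps
print(round(size_mb, 2))    # ~19.86 MB, matching the reported file size
```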