Replies: 1 comment 1 reply
Hi @AdriAJ1, I would also expect the behaviour to be the same in this case. Can you reach out to OpenAI, as this is on their side? Please post back if you find an answer, and let us know if there is anything we need to do here.
Hi,
I have a simple script in .NET that transcribes audio to text using the Whisper model. I'm using the OpenAIAudioToText service.
I'm currently trying to transcribe a 34-second .wav audio file in French (I can share the audio if needed). I've encountered different results depending on whether I specify the temperature parameter in OpenAIAudioToTextExecutionSettings.
According to OpenAI’s API documentation, the temperature parameter (which defaults to 0) is defined as follows:
"The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit."
In the Semantic Kernel package, if you don't specify the temperature, it also defaults to 0. However, I've noticed differences between not specifying the parameter and explicitly setting it to null.
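For reference, the two variants I'm comparing look roughly like this. This is a minimal sketch, not my exact script: the file path and API key are placeholders, and I'm writing the Semantic Kernel calls (AudioContent, IAudioToTextService, GetTextContentAsync) from memory of the 1.40.x package, so the exact signatures may differ slightly:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.AudioToText;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var audioBytes = await File.ReadAllBytesAsync("audio.wav"); // placeholder path
var audioContent = new AudioContent(audioBytes, mimeType: "audio/wav");

IAudioToTextService service =
    new OpenAIAudioToTextService(modelId: "whisper-1", apiKey: "...");

// Variant 1: temperature left at the package default of 0.
var defaultSettings = new OpenAIAudioToTextExecutionSettings("audio.wav");

// Variant 2: temperature explicitly set to null.
var nullSettings = new OpenAIAudioToTextExecutionSettings("audio.wav")
{
    Temperature = null
};

var result = await service.GetTextContentAsync(audioContent, nullSettings);
Console.WriteLine(result.Text);
```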
When examining the HTTP request payloads, I realised that setting the parameter to null omits temperature from the request entirely, whereas leaving it unspecified sends an explicit temperature of 0.
In theory, these two behaviors should be equivalent, since OpenAI documents 0 as the parameter's default.
However, in practice, specifying or omitting this parameter leads to significant differences in the transcription results.
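Concretely, the only difference I'd expect between the two request bodies is whether the temperature field is present at all. A hedged sketch (field names as in OpenAI's /v1/audio/transcriptions reference; this is an illustration, not a capture of my actual traffic):

```
POST /v1/audio/transcriptions  (multipart/form-data)

# temperature unspecified in the settings:
file=<audio.wav>   model=whisper-1   temperature=0

# temperature explicitly null in the settings:
file=<audio.wav>   model=whisper-1   # temperature field omitted
```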
If I set it to null, I get a correct transcription of the audio:
Open AI response object >> {"Duration":"00:00:34.1100006","Language":"french","Text":"J'êtes aux maires. Oui, alors? Oui, je le reviens. Ça m'entend? Oui, je vous entends. Vous avez un problème? J'ai un problème de téléphone. Les appels n'arrivent pas. Votre code client? 2 2 2 0 8 0 8 Je vais continuer à agriculture en fonction des possibilités. Merci beaucoup. Au revoir. Au entspreché . RA O K O T O N Rs .", ...}
If I don’t specify the parameter, I get an incorrect transcription where a single sentence is repeated throughout the entire text. This is not just a minor variation; it's a major discrepancy:
Open AI response object >> {"Duration":"00:00:34.1100006","Language":"french","Text":"Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie.", ...}
This doesn’t always happen. In fact, I’ve been using Whisper for months, and this is the first time I’ve encountered this issue. The audio quality isn’t perfect, but the results seem completely inconsistent to me.
I'm not sure if this is an issue on OpenAI’s end or if there’s an actual explanation for it. It seems like OpenAI does not treat an explicit 0 the same as the default, so the only way to get the default behaviour is to omit the parameter by setting it to null.
I’d appreciate any insights or possible explanations. Or maybe I should reach out to OpenAI directly regarding this?
Code
OS: Windows
IDE: Visual Studio 2022
Language: C#
Source: NuGet package version 1.40.1
Model: whisper-1