Replies: 1 comment 1 reply
Hi @AdriAJ1, I would also expect the behaviour to be the same in this case. Can you reach out to OpenAI, as this is on their side? Please post back if you find an answer, and let us know if there is anything we need to do here.
Hi,
I have a simple script in .NET that transcribes audio to text using the Whisper model. I'm using the OpenAIAudioToText service.
I'm currently trying to transcribe a 34-second .wav audio file in French (I can share the audio if needed). I've encountered different results depending on whether I specify the temperature parameter in OpenAIAudioToTextExecutionSettings.
According to OpenAI’s API documentation, the temperature parameter (which defaults to 0) is defined as follows:
"The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit."
In the Semantic Kernel package, if you don't specify the temperature, it also defaults to 0. However, I've noticed differences between not specifying the parameter and explicitly setting it to null.
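For reference, the two variants I'm comparing look roughly like this. This is a minimal sketch, not my exact script: the file path and API key are placeholders, and I'm writing the Semantic Kernel calls (AudioContent, IAudioToTextService, GetTextContentAsync) from memory of the 1.40.x package, so the exact signatures may differ slightly:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.AudioToText;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var audioBytes = await File.ReadAllBytesAsync("audio.wav"); // placeholder path
var audioContent = new AudioContent(audioBytes, mimeType: "audio/wav");

IAudioToTextService service =
    new OpenAIAudioToTextService(modelId: "whisper-1", apiKey: "...");

// Variant 1: temperature left at the package default of 0.
var defaultSettings = new OpenAIAudioToTextExecutionSettings("audio.wav");

// Variant 2: temperature explicitly set to null.
var nullSettings = new OpenAIAudioToTextExecutionSettings("audio.wav")
{
    Temperature = null
};

var result = await service.GetTextContentAsync(audioContent, nullSettings);
Console.WriteLine(result.Text);
```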
When examining the HTTP request payloads, I realised that setting the parameter to null omits temperature from the request entirely, whereas leaving it unspecified sends an explicit temperature of 0.
In theory, these two behaviors should be equivalent, since OpenAI documents 0 as the parameter's default.
However, in practice, specifying or omitting this parameter leads to significant differences in the transcription results.
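Concretely, the only difference I'd expect between the two request bodies is whether the temperature field is present at all. A hedged sketch (field names as in OpenAI's /v1/audio/transcriptions reference; this is an illustration, not a capture of my actual traffic):

```
POST /v1/audio/transcriptions  (multipart/form-data)

# temperature unspecified in the settings:
file=<audio.wav>   model=whisper-1   temperature=0

# temperature explicitly null in the settings:
file=<audio.wav>   model=whisper-1   # temperature field omitted
```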
If I set it to null, I get a correct transcription of the audio:
Open AI response object >> {"Duration":"00:00:34.1100006","Language":"french","Text":"J'êtes aux maires. Oui, alors? Oui, je le reviens. Ça m'entend? Oui, je vous entends. Vous avez un problème? J'ai un problème de téléphone. Les appels n'arrivent pas. Votre code client? 2 2 2 0 8 0 8 Je vais continuer à agriculture en fonction des possibilités. Merci beaucoup. Au revoir. Au entspreché . RA O K O T O N Rs .", ...}
If I don’t specify the parameter, I get an incorrect transcription where a single sentence is repeated throughout the entire text. This is not just a minor variation; it's a major discrepancy:
Open AI response object >> {"Duration":"00:00:34.1100006","Language":"french","Text":"Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie. Je vous en prie, je vous en prie.", ...}
This doesn’t always happen. In fact, I’ve been using Whisper for months, and this is the first time I’ve encountered this issue. The audio quality isn’t perfect, but the results seem completely inconsistent to me.
I'm not sure if this is an issue on OpenAI’s end or if there’s an actual explanation for it. It seems like OpenAI does not treat an explicit 0 the same as the default, so the only way to get the default behaviour is to omit the parameter by setting it to null.
I’d appreciate any insights or possible explanations. Or maybe I should reach out to OpenAI directly regarding this?
Code
OS: Windows
IDE: Visual Studio 2022
Language: C#
Source: NuGet package version 1.40.1
Model: whisper-1