@HVbajoria HVbajoria commented May 24, 2025

Purpose

  • Added support for noise cancellation via input_audio_noise_reduction in the session configuration payload.
  • Implemented RealtimeAudioInputAudioNoiseReductionSettings with a type field that accepts:
    • "near_field" – optimized for close-talking microphones (e.g., headsets)
    • "far_field" – optimized for distant microphones (e.g., laptop, room mics)
  • Integrated the setting within the SessionUpdateMessage structure to enable real-time noise reduction in transcription scenarios.
  • Complements existing support for custom transcription models, prompt guidance, and language specification.
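For reviewers, here is a minimal sketch of what the settings shape could look like in TypeScript. The names NoiseReductionType, InputAudioNoiseReductionSettings, and isNoiseReductionSettings are illustrative assumptions and may not match the declarations in this branch. Only the two string values come from the description above.

```typescript
// Hypothetical names modeled on the PR description; the SDK's actual
// declarations may differ.
type NoiseReductionType = "near_field" | "far_field";

interface InputAudioNoiseReductionSettings {
  type: NoiseReductionType;
}

const NOISE_REDUCTION_TYPES: readonly string[] = ["near_field", "far_field"];

// Runtime guard for values that arrive untyped (for example, from a JSON config file).
function isNoiseReductionSettings(
  value: unknown
): value is InputAudioNoiseReductionSettings {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = (value as { type?: unknown }).type;
  return (
    typeof candidate === "string" &&
    NOISE_REDUCTION_TYPES.indexOf(candidate) !== -1
  );
}
```

The union type catches invalid literals at compile time, while the guard covers values that only become known at runtime.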

Does this introduce a breaking change?

[ ] Yes
[x] No

Pull Request Type

What kind of change does this Pull Request introduce?

[ ] Bugfix
[x] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[x] Documentation content changes
[ ] Other... Please describe:

How to Test

Get the code

git clone https://github.com/Azure-Samples/aoai-realtime-audio-sdk
cd aoai-realtime-audio-sdk
git checkout Add_Noise_Reduction
npm install

Update test or dev config

In your client-side test (client_test) or runtime session code, send this updated SessionUpdateMessage:

let configMessage: SessionUpdateMessage = {
  type: "session.update",
  session: {
    voice: "verse",
    instructions: instruction,
    input_audio_format: "pcm16",
    input_audio_transcription: {
      model: "whisper-1"
    },
    turn_detection: {
      threshold: 0.9,
      prefix_padding_ms: 500,
      silence_duration_ms: 1400,
      type: "server_vad",
      interrupt_response: true,
    },
    input_audio_noise_reduction: {
      type: "near_field"
    }
  }
};

Test the behavior

  • Compare transcription accuracy in a noisy environment with and without input_audio_noise_reduction set
  • Try both near_field and far_field settings in different mic setups
  • Validate the presence of the input_audio_noise_reduction key in the outbound WebSocket payload
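One way to check the outbound payload is to run each serialized message through a small inspector before it is sent. This is only a sketch: noiseReductionTypeOf is a hypothetical helper, not part of the SDK, and it assumes the message is sent as a JSON string with the shape shown in the config above.

```typescript
// Hypothetical helper: returns the noise reduction type found in a serialized
// session.update message, or undefined if the message is of another type or
// the key is absent.
function noiseReductionTypeOf(serialized: string): string | undefined {
  const message = JSON.parse(serialized) as {
    type?: string;
    session?: { input_audio_noise_reduction?: { type?: string } };
  };
  if (message.type !== "session.update") {
    return undefined;
  }
  return message.session?.input_audio_noise_reduction?.type;
}

// Example: the payload that would go over the WebSocket.
const outbound = JSON.stringify({
  type: "session.update",
  session: {
    input_audio_format: "pcm16",
    input_audio_noise_reduction: { type: "far_field" },
  },
});
console.log(noiseReductionTypeOf(outbound));
```

The same check can be done by eye in the browser's network tab by inspecting the WebSocket frames.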

What to Check

  • input_audio_noise_reduction is included and structured correctly in the SessionUpdateMessage
  • The type field only accepts valid values (near_field, far_field)
  • Backward compatibility: sessions work as expected when noise reduction is omitted
  • Noise reduction configuration is applied before the audio stream begins
  • No impact to other fields like turn_detection or input_audio_transcription
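The backward-compatibility and no-side-effects checks can be illustrated with a small builder that adds the key only when a value is supplied. SessionSketch and withNoiseReduction are hypothetical names used for illustration. The real session type in the SDK has more fields than shown here.

```typescript
// Hypothetical, trimmed-down session shape for illustration only.
interface SessionSketch {
  voice?: string;
  input_audio_format?: string;
  input_audio_transcription?: { model: string };
  input_audio_noise_reduction?: { type: "near_field" | "far_field" };
}

// Returns a copy of the session. The noise reduction key is added only when
// a type is given, so callers that omit it send the same payload as before.
function withNoiseReduction(
  session: SessionSketch,
  type?: "near_field" | "far_field"
): SessionSketch {
  if (type === undefined) {
    return { ...session };
  }
  return { ...session, input_audio_noise_reduction: { type } };
}

const baseSession: SessionSketch = {
  voice: "verse",
  input_audio_format: "pcm16",
  input_audio_transcription: { model: "whisper-1" },
};

const legacy = withNoiseReduction(baseSession);
const updated = withNoiseReduction(baseSession, "near_field");
console.log("input_audio_noise_reduction" in legacy);
console.log(updated.input_audio_noise_reduction?.type);
```

Because the builder copies rather than mutates, fields such as input_audio_transcription pass through unchanged in both cases.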

Note:

Follow the same steps for the Python SDK.

Other Information

@HVbajoria
Author

Hi @glecaros, @jpalvarezl, @trrwilson,

I referred to this document: https://learn.microsoft.com/en-us/azure/ai-services/openai/realtime-audio-reference

See the section on RealtimeAudioInputAudioNoiseReductionSettings.

Could you please take a look?
