Commit 957e75c

Merge pull request #7326 from yulin-li/yulin/voice-live-minors
[voice live] update api reference for latest api version
2 parents e652ec9 + d13981b commit 957e75c

1 file changed, +8 -11 lines changed


articles/ai-services/speech-service/voice-live-api-reference.md

Lines changed: 8 additions & 11 deletions
@@ -1940,8 +1940,9 @@ Configuration for input audio transcription.
 |-------|------|-------------|
 | model | string | The transcription model. Supported: `whisper-1`, `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, `azure-speech` |
 | language | string | Optional language code in BCP-47 (e.g., `en-US`), or ISO-639-1 (e.g., `en`), or multi languages with auto detection, (e.g., `en,zh`). |
-| custom_speech | object | Optional configuration for custom speech models |
-| phrase_list | string[] | Optional list of phrase hints to bias recognition |
+| custom_speech | object | Optional configuration for custom speech models, only valid for `azure-speech` model. |
+| phrase_list | string[] | Optional list of phrase hints to bias recognition, only valid for `azure-speech` model. |
+| prompt | string | Optional prompt text to guide transcription, only valid for `whisper-1`, `gpt-4o-transcribe`, and `gpt-4o-mini-transcribe` models. |
 
 #### RealtimeInputAudioNoiseReductionSettings
 
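As a rough illustration of the model-specific rules this hunk adds to the transcription table, the sketch below builds a configuration object and rejects field combinations the table marks as invalid. The helper name `build_transcription_config` and the client-side validation are assumptions for illustration and are not part of the API; only the field names and model names come from the diff.

```python
from typing import Any, Dict, List, Optional

# Model names as listed in the table above.
AZURE_SPEECH = "azure-speech"
PROMPT_MODELS = {"whisper-1", "gpt-4o-transcribe", "gpt-4o-mini-transcribe"}


def build_transcription_config(
    model: str,
    language: Optional[str] = None,
    phrase_list: Optional[List[str]] = None,
    custom_speech: Optional[Dict[str, Any]] = None,
    prompt: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the transcription object and apply
    the per-model restrictions stated in the table."""
    if model != AZURE_SPEECH and model not in PROMPT_MODELS:
        raise ValueError(f"unsupported transcription model: {model!r}")
    # custom_speech and phrase_list are only valid for azure-speech.
    if model != AZURE_SPEECH and (phrase_list is not None or custom_speech is not None):
        raise ValueError("custom_speech and phrase_list require the azure-speech model")
    # prompt is only valid for the whisper / gpt-4o transcription models.
    if model == AZURE_SPEECH and prompt is not None:
        raise ValueError("prompt is not valid for the azure-speech model")

    config: Dict[str, Any] = {"model": model}
    optional_fields = {
        "language": language,
        "phrase_list": phrase_list,
        "custom_speech": custom_speech,
        "prompt": prompt,
    }
    # Only include optional fields that the caller actually set.
    config.update({k: v for k, v in optional_fields.items() if v is not None})
    return config
```

For example, `build_transcription_config("azure-speech", language="en-US", phrase_list=["Contoso"])` yields a dictionary with `model`, `language`, and `phrase_list`, while passing `prompt` together with `azure-speech` raises `ValueError`.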
@@ -2080,19 +2081,15 @@ Azure semantic VAD (default variant).
 | languages | string[] | Optional. Supported languages |
 | auto_truncate | boolean | Optional. Auto-truncate on interruption (default: false) |
 
-#### RealtimeEOUDetection
+### RealtimeEOUDetection
 
-End-of-utterance semantic detection configuration.
-
-##### RealtimeAzureSemanticDetection
-
-Azure semantic end-of-utterance detection (default).
+Azure End-of-Utterance (EOU) could indicate when the end-user stopped speaking while allowing for natural pauses. End of utterance detection can significantly reduce premature end-of-turn signals without adding user-perceivable latency.
 
 | Field | Type | Description |
 |-------|------|-------------|
-| model | string | Must be `"semantic_detection_v1"` |
-| threshold | number | Optional. Detection threshold |
-| timeout | number | Optional. Detection timeout |
+| model | string | Could be `semantic_detection_v1` supporting English or `semantic_detection_v1_multilingual` supporting English, Spanish, French, Italian, German (DE), Japanese, Portuguese, Chinese, Korean, Hindi |
+| threshold_level | string | Optional. Detection threshold level (`low`, `medium`, `high` and `default`), the default is `default` |
+| timeout_ms | number | Maximum time in milliseconds to wait for more user speech. Defaults to 1000 ms. |
 
 ### Avatar Configuration
 
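The renamed fields in the `RealtimeEOUDetection` table (`threshold_level` replacing `threshold`, `timeout_ms` replacing `timeout`) can be sketched as a small builder. This is a minimal sketch: the function name `build_eou_detection`, the client-side checks, and the requirement that `timeout_ms` be positive are assumptions for illustration. The allowed model names, threshold levels, and the 1000 ms default are taken from the added rows of the diff.

```python
from typing import Any, Dict

# Values listed in the updated table.
EOU_MODELS = {"semantic_detection_v1", "semantic_detection_v1_multilingual"}
THRESHOLD_LEVELS = {"low", "medium", "high", "default"}


def build_eou_detection(
    model: str,
    threshold_level: str = "default",
    timeout_ms: int = 1000,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble an end-of-utterance detection object
    using the field names introduced by this commit."""
    if model not in EOU_MODELS:
        raise ValueError(f"unknown EOU detection model: {model!r}")
    if threshold_level not in THRESHOLD_LEVELS:
        raise ValueError(f"unknown threshold_level: {threshold_level!r}")
    # Assumption: a non-positive wait time is treated as a caller error.
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be a positive number of milliseconds")
    return {
        "model": model,
        "threshold_level": threshold_level,
        "timeout_ms": timeout_ms,
    }
```

Calling `build_eou_detection("semantic_detection_v1_multilingual", threshold_level="high")` returns the three fields with `timeout_ms` left at its documented default of 1000.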