articles/ai-services/speech-service/voice-live-api-reference.md
8 additions & 11 deletions
@@ -1940,8 +1940,9 @@ Configuration for input audio transcription.
|-------|------|-------------|
| model | string | The transcription model. Supported: `whisper-1`, `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, `azure-speech`. |
| language | string | Optional language code in BCP-47 (e.g., `en-US`) or ISO-639-1 (e.g., `en`) format, or a comma-separated list of languages for automatic detection (e.g., `en,zh`). |
-| phrase_list | string[]| Optional list of phrase hints to bias recognition |
+| custom_speech | object | Optional configuration for custom speech models. Valid only for the `azure-speech` model. |
+| phrase_list | string[] | Optional list of phrase hints to bias recognition. Valid only for the `azure-speech` model. |
+| prompt | string | Optional prompt text to guide transcription. Valid only for the `whisper-1`, `gpt-4o-transcribe`, and `gpt-4o-mini-transcribe` models. |
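
The model-specific constraints in the table above can be checked on the client before a configuration is sent. The following is a minimal sketch under stated assumptions: `check_transcription_config` and its messages are hypothetical helpers, not part of the Voice Live API, and the excerpt does not say how the service itself reacts to an invalid combination. The `custom_speech` object is left out of the sample data because its fields are not shown here.

```python
# Client-side consistency check for the transcription fields listed above.
# Hypothetical helper: the function name and messages are illustrative only.

SUPPORTED_MODELS = {
    "whisper-1",
    "gpt-4o-transcribe",
    "gpt-4o-mini-transcribe",
    "azure-speech",
}
# Fields documented as valid only for the azure-speech model.
AZURE_SPEECH_ONLY = {"custom_speech", "phrase_list"}
# Models documented as accepting the prompt field.
PROMPT_MODELS = SUPPORTED_MODELS - {"azure-speech"}


def check_transcription_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the fields are consistent."""
    model = config.get("model")
    if model not in SUPPORTED_MODELS:
        return [f"unsupported model: {model!r}"]

    problems = []
    if model != "azure-speech":
        # Report each azure-speech-only field that was supplied for another model.
        for field in sorted(AZURE_SPEECH_ONLY.intersection(config)):
            problems.append(f"{field} is only valid for azure-speech")
    if "prompt" in config and model not in PROMPT_MODELS:
        problems.append("prompt is not valid for azure-speech")
    return problems


azure_config = {
    "model": "azure-speech",
    "language": "en-US",
    "phrase_list": ["Contoso", "Voice Live"],
}
whisper_config = {
    "model": "whisper-1",
    "language": "en",
    "prompt": "Expect product names.",
}
mixed_config = {"model": "whisper-1", "phrase_list": ["Contoso"]}

for cfg in (azure_config, whisper_config, mixed_config):
    print(cfg["model"], check_transcription_config(cfg))
```

Returning a list of problems instead of raising on the first one lets a caller report every conflicting field at once.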
#### RealtimeInputAudioNoiseReductionSettings
@@ -2080,19 +2081,15 @@ Azure semantic VAD (default variant).
| languages | string[]| Optional. Supported languages |
Azure End-of-Utterance (EOU) detection can indicate when the end user has stopped speaking while still allowing for natural pauses. It can significantly reduce premature end-of-turn signals without adding user-perceivable latency.
| Field | Type | Description |
|-------|------|-------------|
-| model | string |Must be `"semantic_detection_v1"`|
+| model | string | Either `semantic_detection_v1` (English) or `semantic_detection_v1_multilingual` (English, Spanish, French, Italian, German, Japanese, Portuguese, Chinese, Korean, and Hindi). |
+| threshold_level | string | Optional. Detection threshold level: `low`, `medium`, `high`, or `default`. Defaults to `default`. |
+| timeout_ms | number | Maximum time in milliseconds to wait for more user speech. Defaults to 1000 ms. |
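
As an illustration of the fields in this table, the sketch below builds an end-of-utterance settings object with the documented defaults. `make_eou_settings` is a hypothetical helper, not an SDK function, and where this object is nested in the session payload is not shown in this excerpt, so the sketch stops at the object itself.

```python
# Build an end-of-utterance detection object from the documented fields.
# Hypothetical helper: only the field names, allowed values, and defaults
# come from the table above.

EOU_MODELS = {"semantic_detection_v1", "semantic_detection_v1_multilingual"}
THRESHOLD_LEVELS = {"low", "medium", "high", "default"}


def make_eou_settings(
    model: str,
    threshold_level: str = "default",
    timeout_ms: int = 1000,
) -> dict:
    """Return a settings dict, rejecting values the table does not list."""
    if model not in EOU_MODELS:
        raise ValueError(f"unknown end-of-utterance model: {model!r}")
    if threshold_level not in THRESHOLD_LEVELS:
        raise ValueError(f"unknown threshold level: {threshold_level!r}")
    return {
        "model": model,
        "threshold_level": threshold_level,
        "timeout_ms": timeout_ms,
    }


english_only = make_eou_settings("semantic_detection_v1")
multilingual = make_eou_settings(
    "semantic_detection_v1_multilingual",
    threshold_level="high",
    timeout_ms=1500,
)
print(english_only)
print(multilingual)
```

The multilingual model is the one to pick when the session may include any of the non-English languages listed in the `model` row.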