pages/public_cloud/ai_machine_learning/endpoints_guide_08_audio_transcriptions/guide.en-gb.md
Lines changed: 10 additions & 9 deletions
@@ -1,7 +1,7 @@
 ---
 title: AI Endpoints - Speech to Text
 excerpt: Learn how to transcribe audio files with OVHcloud AI Endpoints
-updated: 2025-09-30
+updated: 2025-10-01
 ---
 
 > [!primary]
@@ -250,7 +250,6 @@ The `diarize` parameter enables speaker separation in the generated transcript.
 This is useful for meetings, debates, or interviews where multiple people are speaking.
 
 > [!warning]
->**Warning**:
 > - This parameter is only available with the default `verbose_json` [response format](#response-formats). Using any other will raise an error.
 > - `diarize` is not supported when using the OpenAI client libraries. You must use a direct HTTP request with `requests`, `cURL`, or another HTTP client.
 
@@ -299,7 +298,7 @@ The `prompt` parameter lets you provide extra context to improve transcription.
 - **Translate** generated speech to English.
 
 > [!warning]
->**Warning**: The prompt **must be written in the same language** as the audio. For example, if your audio is in English, your prompt must also be in English.
+> The prompt **must be written in the same language** as the audio. For example, if your audio is in English, your prompt must also be in English.
 
 **Examples**
 
@@ -380,6 +379,7 @@ The `prompt` parameter lets you provide extra context to improve transcription.
 >>"prompt": "以下是普通話的句子。"
 >> }
 >>```
+>
 
 #### Timestamp Granularities
 
@@ -431,7 +431,7 @@ The `timestamp_granularities` parameter controls the level of time markers inclu
@@ -461,7 +461,7 @@ The `timestamp_granularities` parameter controls the level of time markers inclu
 >>```
 >>
 >>> [!warning]
->>>**Warning**: Generating `["word"]` timestamps can incur additional latency.
+>>> Generating `["word"]` timestamps can incur additional latency.
 >>
 
 #### Response Formats
@@ -545,8 +545,7 @@ By **default**, when unset, the audio is **processed as a single block**.
 
 When set to `auto`, the system first normalizes audio loudness and then uses voice activity detection (VAD) to automatically split the audio at natural pauses (silence).
 
-You can also provide a `server_vad` object
-to manually tweak VAD detection parameters. This lets you control the following parameters:
+You can also provide a `server_vad` object to manually tweak VAD detection parameters. This lets you control the following parameters:
 
 - `prefix_padding_ms`: Amount of audio to include before the VAD detected speech (in milliseconds).
 - `silence_duration_ms`: Duration of silence to detect speech stop (in milliseconds). With shorter values the model will respond more quickly, but may jump in on short pauses from the user.
@@ -609,7 +608,9 @@ If your audio file exceeds these limits, you can split it into smaller chunks be
 
 Try to avoid splitting mid-sentence, as this can cause context to be lost and reduce transcription accuracy. Using compressed audio formats can also help reduce file size.
 
-**Example**: Splitting Audio with open-source Python PyDub library:
+**Example**
+
+Splitting Audio with open-source Python PyDub library:
 
 ```python
 from pydub import AudioSegment
@@ -631,7 +632,7 @@ Repeat this process to create multiple chunks, then transcribe each chunk indivi
 
 > [!warning]
 >
->**Warning**: OVHcloud makes no guarantees about the usability or security of third-party software like PyDub.
+> OVHcloud makes no guarantees about the usability or security of third-party software like PyDub.