Commit 64ad868

Merge pull request #1549 from eric-urban/eur/speech-fast-follow
Azure AI Speech fast follow updates
2 parents 12adf3e + 89e61d4 commit 64ad868

6 files changed: +16 -20 lines changed


articles/ai-services/speech-service/includes/language-support/stt.md

Lines changed: 2 additions & 2 deletions
@@ -74,7 +74,7 @@ ms.author: eur
 | `es-PR` | Spanish (Puerto Rico) | No | Audio + human-labeled transcript<br/><br/>Plain text<br/><br/>Structured text<br/><br/>Pronunciation |
 | `es-PY` | Spanish (Paraguay) | No | Audio + human-labeled transcript<br/><br/>Plain text<br/><br/>Structured text<br/><br/>Pronunciation |
 | `es-SV` | Spanish (El Salvador) | No | Audio + human-labeled transcript<br/><br/>Plain text<br/><br/>Structured text<br/><br/>Pronunciation |
-| `es-US` | Spanish (United States) | No | Plain text<br/><br/>Structured text<br/><br/>Pronunciation<br/><br/>Phrase list |
+| `es-US` | Spanish (United States)<sup>1</sup> | No | Plain text<br/><br/>Structured text<br/><br/>Pronunciation<br/><br/>Phrase list |
 | `es-UY` | Spanish (Uruguay) | No | Audio + human-labeled transcript<br/><br/>Plain text<br/><br/>Structured text<br/><br/>Pronunciation |
 | `es-VE` | Spanish (Venezuela) | No | Plain text<br/><br/>Structured text<br/><br/>Pronunciation |
 | `et-EE` | Estonian (Estonia) | No | Plain text<br/><br/>Pronunciation |
@@ -83,7 +83,7 @@ ms.author: eur
 | `fi-FI` | Finnish (Finland) | No | Audio + human-labeled transcript<br/><br/>Plain text<br/><br/>Structured text<br/><br/>Output format<br/><br/>Pronunciation |
 | `fil-PH` | Filipino (Philippines) | No | Plain text<br/><br/>Pronunciation |
 | `fr-BE` | French (Belgium) | No | Plain text |
-| `fr-CA` | French (Canada) | No | Plain text<br/><br/>Structured text<br/><br/>Output format<br/><br/>Pronunciation<br/><br/>Phrase list |
+| `fr-CA` | French (Canada)<sup>1</sup> | No | Plain text<br/><br/>Structured text<br/><br/>Output format<br/><br/>Pronunciation<br/><br/>Phrase list |
 | `fr-CH` | French (Switzerland) | No | Plain text<br/><br/>Pronunciation |
 | `fr-FR` | French (France) | Yes | Audio + human-labeled transcript<br/><br/>Plain text<br/><br/>Structured text<br/><br/>Output format<br/><br/>Pronunciation<br/><br/>Phrase list |
 | `ga-IE` | Irish (Ireland) | No | Plain text<br/><br/>Pronunciation |

articles/ai-services/speech-service/includes/release-notes/release-notes-stt.md

Lines changed: 5 additions & 2 deletions
@@ -2,11 +2,10 @@
 author: eric-urban
 ms.service: azure-ai-speech
 ms.topic: include
-ms.date: 11/12/2024
+ms.date: 11/19/2024
 ms.author: eur
 ---

-
 ### November 2024 release

 #### Speech to text REST API version 2024-11-15
@@ -22,6 +21,10 @@ Fast transcription is now generally available via [speech to text REST API versi

 ### October 2024 release

+#### Real-time speech to text (bilingual)
+
+Significant improvements have been made the recognition quality of short Spanish terms via the `es-US` bilingual models. The model is bilingual and also supports English. The quality of English recognition is also improved.
+
 #### Video translation (Preview)

 The video translation API is now available in public preview. For more information, see the [How to use video translation](../../video-translation-get-started.md?pivots=rest-api).

articles/ai-services/speech-service/meeting-transcription.md

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ Conversation transcription multichannel diarization (preview) is retiring on Mar
 To continue using speech to text with diarization, use the following features instead:

 - [Real-time speech to text with diarization](get-started-stt-diarization.md)
+- [Fast transcription with diarization](fast-transcription-create.md)
 - [Batch transcription with diarization](batch-transcription.md)

 These speech to text features only support diarization for single-channel audio. Multichannel audio that you used with conversation transcription multichannel diarization isn't supported.

articles/ai-services/speech-service/overview.md

Lines changed: 2 additions & 5 deletions
@@ -59,17 +59,14 @@ With [real-time speech to text](get-started-speech-to-text.md), the audio is tra
 - Dictation
 - Voice agents

-## Fast transcription API (Preview)
+## Fast transcription API

 Fast transcription API is used to transcribe audio files with returning results synchronously and much faster than real-time audio. Use fast transcription in the scenarios that you need the transcript of an audio recording as quickly as possible with predictable latency, such as:

 - Quick audio or video transcription, subtitles, and edit.
 - Video translation

-> [!NOTE]
-> Fast transcription API is only available via the speech to text REST API version 2024-05-15-preview.
-
-To get started with fast transcription, see [use the fast transcription API (preview)](fast-transcription-create.md).
+To get started with fast transcription, see [use the fast transcription API](fast-transcription-create.md).

 ### Batch transcription

articles/ai-services/speech-service/speech-to-text.md

Lines changed: 3 additions & 6 deletions
@@ -19,7 +19,7 @@ Azure AI Speech service offers advanced speech to text capabilities. This featur

 The speech to text service offers the following core features:
 - [Real-time](#real-time-speech-to-text) transcription: Instant transcription with intermediate results for live audio inputs.
-- [Fast transcription](#fast-transcription-preview): Fastest synchronous output for situations with predictable latency.
+- [Fast transcription](#fast-transcription): Fastest synchronous output for situations with predictable latency.
 - [Batch transcription](#batch-transcription-api): Efficient processing for large volumes of prerecorded audio.
 - [Custom speech](#custom-speech): Models with enhanced accuracy for specific domains and conditions.

@@ -36,17 +36,14 @@ Real-time speech to text transcribes audio as it's recognized from a microphone
 Real-time speech to text can be accessed via the Speech SDK, Speech CLI, and REST API, allowing integration into various applications and workflows.
 Real-time speech to text is available via the [Speech SDK](speech-sdk.md), the [Speech CLI](spx-overview.md), and REST APIs such as the [Fast transcription API](fast-transcription-create.md).

-## Fast transcription (Preview)
+## Fast transcription

 Fast transcription API is used to transcribe audio files with returning results synchronously and faster than real-time audio. Use fast transcription in the scenarios that you need the transcript of an audio recording as quickly as possible with predictable latency, such as:

 - **Quick audio or video transcription and subtitles**: Quickly get a transcription of an entire video or audio file in one go.
 - **Video translation**: Immediately get new subtitles for a video if you have audio in different languages.

-> [!NOTE]
-> Fast transcription API is only available via the speech to text REST API version 2024-05-15-preview and later.
-
-To get started with fast transcription, see [use the fast transcription API (preview)](fast-transcription-create.md).
+To get started with fast transcription, see [use the fast transcription API](fast-transcription-create.md).

 ## Batch transcription API

articles/ai-services/speech-service/speech-translation.md

Lines changed: 3 additions & 5 deletions
@@ -29,7 +29,7 @@ The core features of speech translation include:

 - [Speech to text translation](#speech-to-text-translation)
 - [Speech to speech translation](#speech-to-speech-translation)
-- [Multi-lingual speech translation](#multi-lingual-speech-translation-preview)
+- [Multi-lingual speech translation](#multi-lingual-speech-translation)
 - [Multiple target languages translation](#multiple-target-languages-translation)

 ## Speech to text translation
@@ -40,7 +40,7 @@ The standard feature offered by the Speech service is the ability to take in an

 As a supplement to the above feature, the Speech service also offers the option to read aloud the translated text using our large database of pretrained voices, allowing for a natural output of the input speech.

-## Multi-lingual speech translation (Preview)
+## Multi-lingual speech translation

 Multi-lingual speech translation implements a new level of speech translation technology that unlocks various capabilities, including having no specified input language, handling language switches within the same session, and supporting live streaming translations into English. These features enable a new level of speech translation powers that can be implemented into your products.

@@ -53,9 +53,7 @@ Some use cases for multi-lingual speech translation include:
 - Travel Interpreter. When traveling abroad, multi-lingual speech translation offers the ability to create a solution that allows customers to translate any input audio to and from the local language. This allows them to communicate with the locals and better understand their surroundings.
 - Business Meeting. In a meeting with people who speak different languages, multi-lingual speech translation allows the members of the meeting to all communicate with each other naturally as if there was no language barrier.

-For multi-lingual speech translation, these are the languages the Speech service can automatically detect and switch between from the input: Arabic (ar), Basque (eu), Bosnian (bs), Bulgarian (bg), Chinese Simplified (zh), Chinese Traditional (zhh), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), Galician (gl), German (de), Greek (el), Hindi (hi), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Macedonian (mk), Norwegian (nb), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Thai (th), Turkish (tr), Ukrainian (uk), Vietnamese (vi), and Welsh (cy).
-
-For a list of the supported output (target) languages, see the *Translate to text language* table in the [language and voice support documentation](language-support.md?tabs=speech-translation).
+For a list of the supported input (source) languages, see the [speech to text languages documentation](language-support.md?tabs=stt). For a list of the supported output (target) languages, see the *Translate to text language* table in the [speech translation languages documentation](language-support.md?tabs=speech-translation).

 For more information on multi-lingual speech translation, see [the speech translation how to guide](./how-to-translate-speech.md#multi-lingual-speech-translation-without-source-language-candidates) and [speech translation samples on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/translation_samples.cs#L472).
