Commit 6ee90fd

Author: Jill Grant
Merge pull request #245976 from eric-urban/eur/real-time-stt-ai-services
real time stt diarization public preview
2 parents ded6d04 + 7fb00e5, commit 6ee90fd

27 files changed: +1030 -275 lines

articles/ai-services/.openpublishing.redirection.ai-services-from-cog.json

Lines changed: 21 additions & 6 deletions

@@ -1532,8 +1532,13 @@
     },
     {
       "source_path_from_root": "/articles/cognitive-services/speech-service/conversation-transcription.md",
-      "redirect_url": "/azure/ai-services/speech-service/conversation-transcription",
-      "redirect_document_id": true
+      "redirect_url": "/azure/ai-services/speech-service/get-started-stt-diarization",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/ai-services/speech-service/conversation-transcription.md",
+      "redirect_url": "/azure/ai-services/speech-service/get-started-stt-diarization",
+      "redirect_document_id": false
     },
     {
       "source_path_from_root": "/articles/cognitive-services/speech-service/custom-commands-encryption-of-data-at-rest.md",
@@ -1652,8 +1657,13 @@
     },
     {
       "source_path_from_root": "/articles/cognitive-services/speech-service/how-to-async-conversation-transcription.md",
-      "redirect_url": "/azure/ai-services/speech-service/how-to-async-conversation-transcription",
-      "redirect_document_id": true
+      "redirect_url": "/azure/ai-services/speech-service/get-started-stt-diarization",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/ai-services/speech-service/how-to-async-conversation-transcription.md",
+      "redirect_url": "/azure/ai-services/speech-service/get-started-stt-diarization",
+      "redirect_document_id": false
     },
     {
       "source_path_from_root": "/articles/cognitive-services/speech-service/how-to-audio-content-creation.md",
@@ -1887,8 +1897,13 @@
     },
     {
       "source_path_from_root": "/articles/cognitive-services/speech-service/how-to-use-conversation-transcription.md",
-      "redirect_url": "/azure/ai-services/speech-service/how-to-use-conversation-transcription",
-      "redirect_document_id": true
+      "redirect_url": "/azure/ai-services/speech-service/get-started-stt-diarization",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/ai-services/speech-service/how-to-use-conversation-transcription.md",
+      "redirect_url": "/azure/ai-services/speech-service/get-started-stt-diarization",
+      "redirect_document_id": false
     },
     {
       "source_path_from_root": "/articles/cognitive-services/speech-service/how-to-use-custom-entity-pattern-matching.md",
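The redirection entries above all follow the same small schema (`source_path_from_root`, `redirect_url`, `redirect_document_id`). As an illustrative, stdlib-only sketch (not part of this commit), a quick check like the following can catch malformed entries before they're merged; the function name and error messages are hypothetical:

```python
import json

# Minimal schema check for .openpublishing redirection entries (illustrative).
# Field names match the entries shown in the diff above.
def validate_redirects(raw: str) -> list:
    errors = []
    for i, entry in enumerate(json.loads(raw)):
        if not entry.get("source_path_from_root", "").startswith("/articles/"):
            errors.append(f"entry {i}: bad source_path_from_root")
        if not entry.get("redirect_url", "").startswith("/azure/"):
            errors.append(f"entry {i}: bad redirect_url")
        if not isinstance(entry.get("redirect_document_id"), bool):
            errors.append(f"entry {i}: redirect_document_id must be a bool")
    return errors

sample = """[
  {
    "source_path_from_root": "/articles/ai-services/speech-service/conversation-transcription.md",
    "redirect_url": "/azure/ai-services/speech-service/get-started-stt-diarization",
    "redirect_document_id": false
  }
]"""
print(validate_redirects(sample))  # an empty list means every entry is well formed
```

Note that `redirect_document_id` must be `false` when several source paths redirect to the same target, as in this commit.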

articles/ai-services/speech-service/custom-speech-overview.md

Lines changed: 5 additions & 5 deletions

@@ -17,7 +17,7 @@ ms.custom: contperf-fy21q2, references_regions
 
 With Custom Speech, you can evaluate and improve the accuracy of speech recognition for your applications and products. A custom speech model can be used for [real-time speech to text](speech-to-text.md), [speech translation](speech-translation.md), and [batch transcription](batch-transcription.md).
 
-Out of the box, speech recognition utilizes a Universal Language Model as a base model that is trained with Microsoft-owned data and reflects commonly used spoken language. The base model is pre-trained with dialects and phonetics representing a variety of common domains. When you make a speech recognition request, the most recent base model for each [supported language](language-support.md?tabs=stt) is used by default. The base model works very well in most speech recognition scenarios.
+Out of the box, speech recognition utilizes a Universal Language Model as a base model that is trained with Microsoft-owned data and reflects commonly used spoken language. The base model is pre-trained with dialects and phonetics representing various common domains. When you make a speech recognition request, the most recent base model for each [supported language](language-support.md?tabs=stt) is used by default. The base model works well in most speech recognition scenarios.
 
 A custom model can be used to augment the base model to improve recognition of domain-specific vocabulary specific to the application by providing text data to train the model. It can also be used to improve recognition based for the specific audio conditions of the application by providing audio data with reference transcriptions.
 
@@ -29,20 +29,20 @@ With Custom Speech, you can upload your own data, test and train a custom model,
 
 Here's more information about the sequence of steps shown in the previous diagram:
 
-1. [Create a project](how-to-custom-speech-create-project.md) and choose a model. Use a <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices" title="Create a Speech resource" target="_blank">Speech resource</a> that you create in the Azure portal. If you will train a custom model with audio data, choose a Speech resource region with dedicated hardware for training audio data. See footnotes in the [regions](regions.md#speech-service) table for more information.
+1. [Create a project](how-to-custom-speech-create-project.md) and choose a model. Use a <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices" title="Create a Speech resource" target="_blank">Speech resource</a> that you create in the Azure portal. If you'll train a custom model with audio data, choose a Speech resource region with dedicated hardware for training audio data. See footnotes in the [regions](regions.md#speech-service) table for more information.
 1. [Upload test data](./how-to-custom-speech-upload-data.md). Upload test data to evaluate the speech to text offering for your applications, tools, and products.
 1. [Test recognition quality](how-to-custom-speech-inspect-data.md). Use the [Speech Studio](https://aka.ms/speechstudio/customspeech) to play back uploaded audio and inspect the speech recognition quality of your test data.
-1. [Test model quantitatively](how-to-custom-speech-evaluate-data.md). Evaluate and improve the accuracy of the speech to text model. The Speech service provides a quantitative word error rate (WER), which you can use to determine if additional training is required.
+1. [Test model quantitatively](how-to-custom-speech-evaluate-data.md). Evaluate and improve the accuracy of the speech to text model. The Speech service provides a quantitative word error rate (WER), which you can use to determine if more training is required.
 1. [Train a model](how-to-custom-speech-train-model.md). Provide written transcripts and related text, along with the corresponding audio data. Testing a model before and after training is optional but recommended.
    > [!NOTE]
    > You pay for Custom Speech model usage and endpoint hosting, but you are not charged for training a model.
-1. [Deploy a model](how-to-custom-speech-deploy-model.md). Once you're satisfied with the test results, deploy the model to a custom endpoint. With the exception of [batch transcription](batch-transcription.md), you must deploy a custom endpoint to use a Custom Speech model.
+1. [Deploy a model](how-to-custom-speech-deploy-model.md). Once you're satisfied with the test results, deploy the model to a custom endpoint. Except for [batch transcription](batch-transcription.md), you must deploy a custom endpoint to use a Custom Speech model.
    > [!TIP]
    > A hosted deployment endpoint isn't required to use Custom Speech with the [Batch transcription API](batch-transcription.md). You can conserve resources if the custom speech model is only used for batch transcription. For more information, see [Speech service pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).
 
 ## Responsible AI
 
-An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it is deployed. Read the transparency notes to learn about responsible AI use and deployment in your systems.
+An AI system includes not only the technology, but also the people who use it, the people who will be affected by it, and the environment in which it's deployed. Read the transparency notes to learn about responsible AI use and deployment in your systems.
 
 * [Transparency note and use cases](/legal/cognitive-services/speech-service/speech-to-text/transparency-note?context=/azure/ai-services/speech-service/context/context)
 * [Characteristics and limitations](/legal/cognitive-services/speech-service/speech-to-text/characteristics-and-limitations?context=/azure/ai-services/speech-service/context/context)

articles/ai-services/speech-service/devices-sdk-release-notes.md

Lines changed: 1 addition & 1 deletion

@@ -63,7 +63,7 @@ The following sections list changes in the most recent releases.
 
 ## Speech Devices SDK 1.5.1:
 
-- Include [Conversation Transcription](./conversation-transcription.md) in the sample app.
+- Include conversation transcription in the sample app.
 - Updated the [Speech SDK](./speech-sdk.md) component to version 1.5.1. For more information, see its [release notes](./releasenotes.md).
 
 ## Speech Devices SDK 1.5.0: 2019-May release
Lines changed: 38 additions & 0 deletions

@@ -0,0 +1,38 @@
+---
+title: "Real-time diarization quickstart - Speech service"
+titleSuffix: Azure AI services
+description: In this quickstart, you convert speech to text continuously from a file. The service transcribes the speech and identifies one or more speakers.
+services: cognitive-services
+author: eric-urban
+manager: nitinme
+ms.service: cognitive-services
+ms.subservice: speech-service
+ms.topic: quickstart
+ms.date: 7/27/2023
+ms.author: eur
+zone_pivot_groups: programming-languages-set-twenty-two
+keywords: speech to text, speech to text software
+---
+
+# Quickstart: Real-time diarization
+
+::: zone pivot="programming-language-csharp"
+[!INCLUDE [C# include](includes/quickstarts/stt-diarization/csharp.md)]
+::: zone-end
+
+::: zone pivot="programming-language-cpp"
+[!INCLUDE [C++ include](includes/quickstarts/stt-diarization/cpp.md)]
+::: zone-end
+
+::: zone pivot="programming-language-java"
+[!INCLUDE [Java include](includes/quickstarts/stt-diarization/java.md)]
+::: zone-end
+
+::: zone pivot="programming-language-python"
+[!INCLUDE [Python include](includes/quickstarts/stt-diarization/python.md)]
+::: zone-end
+
+## Next steps
+
+> [!div class="nextstepaction"]
+> [Learn more about speech recognition](how-to-recognize-speech.md)
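The quickstart added above covers real-time diarization, where each recognized utterance comes back tagged with a speaker ID (such as `Guest-1`). The SDK-specific code lives in the language includes; as an SDK-free sketch of what you can do with diarized output, the helper below (all names hypothetical) folds a stream of (speaker, text) events into per-speaker transcripts:

```python
from collections import defaultdict

# Hypothetical post-processing of diarized results: each event pairs a
# speaker ID with the recognized text for one utterance.
def transcripts_by_speaker(events):
    grouped = defaultdict(list)
    for speaker, text in events:
        grouped[speaker].append(text)
    # Join each speaker's utterances into one transcript string.
    return {speaker: " ".join(parts) for speaker, parts in grouped.items()}

events = [
    ("Guest-1", "Good morning."),
    ("Guest-2", "Hi, shall we start?"),
    ("Guest-1", "Yes, let's begin."),
]
print(transcripts_by_speaker(events))
```

Speaker IDs are assigned by the service per session, so the same person may get a different ID in a different session.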

articles/ai-services/speech-service/how-to-async-conversation-transcription.md

Lines changed: 0 additions & 40 deletions
This file was deleted.
Lines changed: 40 additions & 0 deletions

@@ -0,0 +1,40 @@
+---
+title: Asynchronous meeting transcription - Speech service
+titleSuffix: Azure AI services
+description: Learn how to use asynchronous meeting transcription using the Speech service. Available for Java and C# only.
+services: cognitive-services
+manager: nitinme
+ms.service: cognitive-services
+ms.subservice: speech-service
+ms.topic: how-to
+ms.date: 11/04/2019
+ms.devlang: csharp, java
+ms.custom: cogserv-non-critical-speech, devx-track-csharp, devx-track-extended-java
+zone_pivot_groups: programming-languages-set-twenty-one
+---
+
+# Asynchronous meeting transcription
+
+In this article, asynchronous meeting transcription is demonstrated using the **RemoteMeetingTranscriptionClient** API. If you have configured meeting transcription to do asynchronous transcription and have a `meetingId`, you can obtain the transcription associated with that `meetingId` using the **RemoteMeetingTranscriptionClient** API.
+
+## Asynchronous vs. real-time + asynchronous
+
+With asynchronous transcription, you stream the meeting audio, but don't need a transcription returned in real-time. Instead, after the audio is sent, use the `meetingId` of `Meeting` to query for the status of the asynchronous transcription. When the asynchronous transcription is ready, you'll get a `RemoteMeetingTranscriptionResult`.
+
+With real-time plus asynchronous, you get the transcription in real-time, but also get the transcription by querying with the `meetingId` (similar to asynchronous scenario).
+
+Two steps are required to accomplish asynchronous transcription. The first step is to upload the audio, choosing either asynchronous only or real-time plus asynchronous. The second step is to get the transcription results.
+
+::: zone pivot="programming-language-csharp"
+[!INCLUDE [prerequisites](includes/how-to/remote-meeting/csharp/examples.md)]
+::: zone-end
+
+::: zone pivot="programming-language-java"
+[!INCLUDE [prerequisites](includes/how-to/remote-meeting/java/examples.md)]
+::: zone-end
+
+
+## Next steps
+
+> [!div class="nextstepaction"]
+> [Explore our samples on GitHub](https://aka.ms/csspeech/samples)
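The two-step flow this new article describes (upload the audio, then query by `meetingId` until the result is ready) is a classic submit-and-poll pattern. The real client is **RemoteMeetingTranscriptionClient** in C# or Java; the sketch below is a hedged, language-agnostic illustration with a simulated backend, so every name other than the status strings is hypothetical:

```python
import time

# Generic submit-and-poll loop for the two-step flow described above.
# get_status is a stand-in for the service call that returns the
# transcription status and, once ready, the result.
def poll_for_result(get_status, meeting_id, interval_s=0.01, timeout_s=1.0):
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status, result = get_status(meeting_id)
        if status == "Succeeded":
            return result
        if status == "Failed":
            raise RuntimeError(f"transcription {meeting_id} failed")
        time.sleep(interval_s)  # back off before polling again
    raise TimeoutError(f"transcription {meeting_id} not ready in time")

# Simulated backend: reports "Running" twice, then returns a result.
calls = {"n": 0}
def fake_status(meeting_id):
    calls["n"] += 1
    if calls["n"] < 3:
        return ("Running", None)
    return ("Succeeded", [{"speaker": "Guest-1", "text": "Hello everyone."}])

print(poll_for_result(fake_status, "meeting-123"))
```

In production you'd use a longer polling interval and the client library's own status enum rather than bare strings.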

articles/ai-services/speech-service/how-to-use-conversation-transcription.md

Lines changed: 0 additions & 45 deletions
This file was deleted.
Lines changed: 46 additions & 0 deletions

@@ -0,0 +1,46 @@
+---
+title: Real-time meeting transcription quickstart - Speech service
+titleSuffix: Azure AI services
+description: In this quickstart, learn how to transcribe meetings. You can add, remove, and identify multiple participants by streaming audio to the Speech service.
+services: cognitive-services
+author: eric-urban
+manager: nitinme
+ms.service: cognitive-services
+ms.subservice: speech-service
+ms.topic: quickstart
+ms.date: 05/06/2023
+ms.author: eur
+zone_pivot_groups: acs-js-csharp-python
+ms.devlang: csharp, javascript
+ms.custom: cogserv-non-critical-speech, ignite-fall-2021, references_regions
+---
+
+# Quickstart: Real-time meeting transcription
+
+You can transcribe meetings with the ability to add, remove, and identify multiple participants by streaming audio to the Speech service. You first create voice signatures for each participant using the REST API, and then use the voice signatures with the Speech SDK to transcribe meetings. See the Meeting Transcription [overview](meeting-transcription.md) for more information.
+
+## Limitations
+
+* Only available in the following subscription regions: `centralus`, `eastasia`, `eastus`, `westeurope`
+* Requires a 7-mic circular multi-microphone array. The microphone array should meet [our specification](./speech-sdk-microphone.md).
+
+> [!NOTE]
+> The Speech SDK for C++, Java, Objective-C, and Swift support Meeting Transcription, but we haven't yet included a guide here.
+
+::: zone pivot="programming-language-javascript"
+[!INCLUDE [JavaScript Basics include](includes/how-to/meeting-transcription/real-time-javascript.md)]
+::: zone-end
+
+::: zone pivot="programming-language-csharp"
+[!INCLUDE [C# Basics include](includes/how-to/meeting-transcription/real-time-csharp.md)]
+::: zone-end
+
+::: zone pivot="programming-language-python"
+[!INCLUDE [Python Basics include](includes/how-to/meeting-transcription/real-time-python.md)]
+::: zone-end
+
+## Next steps
+
+> [!div class="nextstepaction"]
+> [Asynchronous Meeting Transcription](how-to-async-meeting-transcription.md)
+
