Commit 794c6e6

Merge pull request #204390 from heikora/API_V3_1_PublicPreview

Adding doc for Rest V3.1 API Public Preview

2 parents 18a0981 + a2149d2

File tree

2 files changed: +80 -0 lines changed

Lines changed: 77 additions & 0 deletions
---
title: Speech-to-text REST API v3.1 Public Preview - Speech service
titleSuffix: Azure Cognitive Services
description: Get reference documentation for Speech-to-text REST API v3.1 (Public Preview).
services: cognitive-services
author: heikora
manager: dongli
ms.service: cognitive-services
ms.subservice: speech-service
ms.topic: reference
ms.date: 07/11/2022
ms.author: heikora
ms.devlang: csharp
ms.custom: devx-track-csharp
---
# Speech-to-text REST API v3.1 (preview)

The Speech-to-text REST API v3.1 is used for [Batch transcription](batch-transcription.md) and [Custom Speech](custom-speech-overview.md). It is currently in Public Preview.
20+
21+
> [!TIP]
22+
> See the [Speech to Text API v3.1 preview1](https://westus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1-preview1/) reference documentation for details. This is an updated version of the [Speech to Text API v3.0](./rest-speech-to-text.md)
23+
Use the REST API v3.1 to:

- Copy models to other subscriptions if you want colleagues to have access to a model that you built, or if you want to deploy a model to more than one region.
- Transcribe data from a container (bulk transcription) and provide multiple URLs for audio files.
- Upload data from Azure storage accounts by using a shared access signature (SAS) URI.
- Get logs for each endpoint if logs have been requested for that endpoint.
- Request the manifest of the models that you create, to set up on-premises containers.
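As a sketch of the bulk-transcription scenario above, the following builds a Create Transcription request body that points at multiple audio URLs (for example, SAS URIs to blobs in a storage account). The region, endpoint path, and all values are illustrative assumptions, not authoritative; see the v3.1 preview reference for the exact request shape.

```python
import json

# Hypothetical region and endpoint path -- verify against the v3.1 reference.
REGION = "westus"
ENDPOINT = f"https://{REGION}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"

def build_transcription_request(content_urls, locale, display_name):
    """Build a minimal Create Transcription body for a list of audio URLs."""
    return {
        "contentUrls": list(content_urls),  # e.g. SAS URIs to audio files
        "locale": locale,
        "displayName": display_name,
    }

body = build_transcription_request(
    ["https://example.blob.core.windows.net/audio/a1.wav?sv=..."],  # placeholder SAS URI
    "en-US",
    "Bulk transcription example",
)
print(json.dumps(body, indent=2))
```

You would POST this body to the transcriptions endpoint with your subscription key in the `Ocp-Apim-Subscription-Key` header; the network call is omitted here.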
## Changes to the v3.0 API

### Batch transcription changes

- In **Create Transcription**, the following three new fields were added to **properties**:
  - **displayFormWordLevelTimestampsEnabled** can be used to enable the reporting of word-level timestamps on the display form of the transcription results.
  - **diarization** can be used to specify hints for the minimum and maximum number of speaker labels to generate when performing optional diarization (speaker separation). With this feature, the service can now generate speaker labels for more than two speakers.
  - **languageIdentification** can be used to specify settings for optional language identification on the input prior to transcription. Up to 10 candidate locales are supported for language identification. For the preview API, transcription can only be performed with base models for the respective locales. The ability to use custom models for transcription will be added for the GA version.
- **Get Transcriptions**, **Get Transcription Files**, and **Get Transcriptions For Project** now include a new optional parameter to simplify finding the right resource:
  - **filter** can be used to provide a filtering expression for selecting a subset of the available resources. You can filter by `displayName`, `description`, `createdDateTime`, `lastActionDateTime`, `status`, and `locale`. Example: `filter=createdDateTime gt 2022-02-01T11:00:00Z`
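The three new fields above might be combined in a `properties` object like the following sketch. The top-level field names come from this article, but the nested key names and values are assumptions for illustration only; confirm them against the v3.1 preview reference.

```python
# Illustrative v3.1 "properties" fragment. Nested keys such as "speakers",
# "minCount", "maxCount", and "candidateLocales" are assumptions.
properties = {
    "displayFormWordLevelTimestampsEnabled": True,
    "diarization": {
        # hints for the minimum and maximum number of speaker labels
        "speakers": {"minCount": 1, "maxCount": 5},
    },
    "languageIdentification": {
        # the article states that up to 10 candidate locales are supported
        "candidateLocales": ["en-US", "de-DE", "es-ES"],
    },
}

# Guard against exceeding the documented candidate-locale limit.
assert len(properties["languageIdentification"]["candidateLocales"]) <= 10
print(properties["diarization"]["speakers"])
```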
### Custom Speech changes

- **Create Dataset** now supports a new data type of **LanguageMarkdown** to support upload of the new structured text data.

  It also now supports uploading data in multiple blocks, for which the following new operations were added:
  - **Upload Data Block** - Upload a block of data for the dataset. The maximum size of the block is 8 MiB.
  - **Get Uploaded Blocks** - Get the list of uploaded blocks for this dataset.
  - **Commit Block List** - Commit a block list to complete the upload of the dataset.
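A client driving the block-upload flow above needs to split its dataset payload into blocks of at most 8 MiB before calling **Upload Data Block** for each and finishing with **Commit Block List**. Only the client-side chunking is sketched here; the HTTP calls themselves are omitted.

```python
BLOCK_SIZE = 8 * 1024 * 1024  # 8 MiB maximum per block, per the operation doc

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield consecutive blocks of at most block_size bytes."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]

payload = b"x" * (20 * 1024 * 1024)  # 20 MiB of sample data
blocks = list(split_into_blocks(payload))
print(len(blocks))  # 3 blocks: 8 MiB + 8 MiB + 4 MiB
```

Each yielded block would be sent with one Upload Data Block call, and the final Commit Block List call tells the service the upload is complete.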
- **Get Base Models** and **Get Base Model** now provide information on the type of adaptation supported by a base model:

```json
"features": {
    "supportsAdaptationsWith": [
        "Acoustic",
        "Language",
        "LanguageMarkdown",
        "Pronunciation"
    ]
}
```

|Adaptation type |Description |
|---------|---------|
|Acoustic |Supports adapting the model with the provided audio, to match the audio conditions or specific speaker characteristics. |
|Language |Supports adapting with plain text. |
|LanguageMarkdown |Supports adapting with structured text. |
|Pronunciation |Supports adapting with a pronunciation file. |
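Given a base model document shaped like the JSON snippet above, a client can check which adaptation types a model accepts before preparing datasets. The response fragment follows this article; the helper function is an illustrative sketch, not part of the API.

```python
# Base model fragment shaped like the "features" snippet in this article.
base_model = {
    "features": {
        "supportsAdaptationsWith": [
            "Acoustic",
            "Language",
            "LanguageMarkdown",
            "Pronunciation",
        ]
    }
}

def supports_adaptation(model: dict, kind: str) -> bool:
    """Return True if the base model advertises support for this adaptation type."""
    return kind in model.get("features", {}).get("supportsAdaptationsWith", [])

print(supports_adaptation(base_model, "LanguageMarkdown"))  # True
```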
- **Create Model** has a new optional parameter under **properties** called **customModelWeightPercent** that lets you specify the weight used when the custom language model (trained from plain or structured text data) is combined with the base language model. Valid values are integers between 1 and 100. The default value is currently 30.
- **Get Base Models**, **Get Datasets**, **Get Datasets For Project**, **Get Dataset Files**, **Get Endpoints**, **Get Endpoints For Project**, **Get Evaluations**, **Get Evaluations For Project**, **Get Evaluation Files**, **Get Models**, **Get Models For Project**, and **Get Projects** now include a new optional parameter to simplify finding the right resource:
  - **filter** can be used to provide a filtering expression for selecting a subset of the available resources. You can filter by `displayName`, `description`, `createdDateTime`, `lastActionDateTime`, `status`, `locale`, and `kind`. Example: `filter=locale eq 'en-US'`
- Added a new **Get Model Files** operation to get the files of the model identified by the given ID, as well as a new **Get Model File** operation to get one specific file (identified with `fileId`) from a model (identified with `id`). This lets you retrieve a **ModelReport** file that provides information on the data processed during training.
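Because a filter expression contains spaces and quotes, it must be URL-encoded when passed as a query parameter. The filter syntax follows the examples above; the base URL and region are placeholders.

```python
from urllib.parse import urlencode

# Placeholder region and path; the "filter" syntax matches this article's examples.
base_url = "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1/models"
query = urlencode({"filter": "locale eq 'en-US'"})
url = f"{base_url}?{query}"
print(url)
```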
## Next steps

- [Customize acoustic models](./how-to-custom-speech-train-model.md)
- [Customize language models](./how-to-custom-speech-train-model.md)
- [Get familiar with batch transcription](batch-transcription.md)

articles/cognitive-services/Speech-Service/toc.yml

Lines changed: 3 additions & 0 deletions

```yaml
- name: Speech-to-text REST API v3.0
  href: rest-speech-to-text.md
  displayName: reference
- name: Speech-to-text REST API v3.1 (preview)
  href: rest-speech-to-text-v3-1.md
  displayName: reference
- name: Speech-to-text REST API for short audio
  href: rest-speech-to-text-short.md
  displayName: reference
```
