Commit 2e2a6dc

Merge pull request #106934 from IEvangelist/audioFormats
#1674741 updating docs to reflect audio format support.
2 parents d75e0a2 + a5499bb commit 2e2a6dc

17 files changed: +404, -295 lines changed

.openpublishing.redirection.json

Lines changed: 10 additions & 0 deletions
@@ -49205,6 +49205,16 @@
       "source_path": "articles/cognitive-services/Speech-Service/sapi-phoneset-usage.md",
       "redirect_url": "/azure/cognitive-services/speech-service/speech-ssml-phonetic-sets",
       "redirect_document_id": false
+    },
+    {
+      "source_path": "articles/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams-android.md",
+      "redirect_url": "/azure/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams?pivots=programming-language-java",
+      "redirect_document_id": false
+    },
+    {
+      "source_path": "articles/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams-ios.md",
+      "redirect_url": "/azure/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams?pivots=programming-language-objectivec",
+      "redirect_document_id": false
     }
   ]
 }
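The two added entries send readers of the deleted Android and iOS pages to the merged article, selecting the Java or Objective-C pivot through the `pivots` query parameter. As a minimal sketch for checking such entries locally, the Python below loads redirection JSON and builds a `source_path` to `redirect_url` map, rejecting duplicate source paths. It assumes the entries live in a top-level `redirections` array (an assumption, since that key is outside this hunk); `redirect_map` and `REDIRECTS` are hypothetical names for illustration.

```python
import json

# Assumption: entries sit in a top-level "redirections" array. Only the two
# entries added in this diff are reproduced here.
REDIRECTS = """
{
  "redirections": [
    {
      "source_path": "articles/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams-android.md",
      "redirect_url": "/azure/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams?pivots=programming-language-java",
      "redirect_document_id": false
    },
    {
      "source_path": "articles/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams-ios.md",
      "redirect_url": "/azure/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams?pivots=programming-language-objectivec",
      "redirect_document_id": false
    }
  ]
}
"""


def redirect_map(text):
    """Hypothetical helper: map each source_path to its redirect_url."""
    mapping = {}
    for entry in json.loads(text)["redirections"]:
        path = entry["source_path"]
        if path in mapping:
            # A path listed twice would make the redirect ambiguous.
            raise ValueError(f"duplicate source_path: {path}")
        mapping[path] = entry["redirect_url"]
    return mapping


if __name__ == "__main__":
    for source, target in redirect_map(REDIRECTS).items():
        print(f"{source} -> {target}")
```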

articles/cognitive-services/Speech-Service/how-to-custom-speech-test-data.md

Lines changed: 25 additions & 21 deletions
@@ -3,13 +3,13 @@ title: "Prepare test data for Custom Speech - Speech service"
 titleSuffix: Azure Cognitive Services
 description: "When testing the accuracy of Microsoft speech recognition or training your custom models, you'll need audio and text data. On this page, we cover the types of data, how to use, and manage them."
 services: cognitive-services
-author: erhopf
+author: IEvangelist
 manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 12/17/2019
-ms.author: erhopf
+ms.date: 03/09/2020
+ms.author: dapine
 ---
 
 # Prepare data for Custom Speech
@@ -50,15 +50,17 @@ Audio data is optimal for testing the accuracy of Microsoft's baseline speech-to
 
 Use this table to ensure that your audio files are formatted correctly for use with Custom Speech:
 
-| Property | Value |
-|----------|-------|
-| File format | RIFF (WAV) |
-| Sample rate | 8,000 Hz or 16,000 Hz |
-| Channels | 1 (mono) |
-| Maximum length per audio | 2 hours |
-| Sample format | PCM, 16-bit |
-| Archive format | .zip |
-| Maximum archive size | 2 GB |
+| Property                 | Value                 |
+|--------------------------|-----------------------|
+| File format              | RIFF (WAV)            |
+| Sample rate              | 8,000 Hz or 16,000 Hz |
+| Channels                 | 1 (mono)              |
+| Maximum length per audio | 2 hours               |
+| Sample format            | PCM, 16-bit           |
+| Archive format           | .zip                  |
+| Maximum archive size     | 2 GB                  |
+
+[!INCLUDE [supported-audio-formats](includes/supported-audio-formats.md)]
 
 > [!TIP]
 > When uploading training and testing data, the .zip file size cannot exceed 2 GB. If you require more data for training, divide it into several .zip files and upload them separately. Later, you can choose to train from *multiple* datasets. However, you can only test from a *single* dataset.
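The audio table in this hunk (RIFF WAV, 8,000 Hz or 16,000 Hz, mono, 16-bit PCM, at most 2 hours per file) can be checked locally before uploading. The sketch below uses Python's standard `wave` module, which only parses PCM WAV, so other encodings raise `wave.Error` rather than being reported. The limits are copied from the tables in this commit, including the 60-second training limit from the note in the next hunk; `check_wav` and `make_silent_wav` are hypothetical helpers, not part of the Speech service or its SDK.

```python
import io
import wave

# Limits copied from the Custom Speech audio tables in this commit.
ALLOWED_RATES = {8000, 16000}   # Hz
MAX_TEST_SECONDS = 2 * 60 * 60  # 2 hours per audio file
MAX_TRAIN_SECONDS = 60          # 60 s per training file


def check_wav(source, max_seconds=MAX_TEST_SECONDS):
    """Return a list of problems; an empty list means the file matches the table.

    `source` may be a path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()  # bytes per sample
        if rate not in ALLOWED_RATES:
            problems.append(f"sample rate is {rate} Hz")
        if channels != 1:
            problems.append(f"{channels} channels instead of 1 (mono)")
        if width != 2:
            problems.append(f"{8 * width}-bit samples instead of 16-bit")
        seconds = wav.getnframes() / rate
        if seconds > max_seconds:
            problems.append(f"{seconds:.1f} s exceeds {max_seconds} s")
    return problems


def make_silent_wav(rate, channels, seconds):
    """Hypothetical helper: build an in-memory 16-bit PCM WAV of silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_wav(make_silent_wav(16000, 1, 1)))  # []
    print(check_wav(make_silent_wav(44100, 2, 1)))
    print(check_wav(make_silent_wav(16000, 1, 61), max_seconds=MAX_TRAIN_SECONDS))
```

Returning a list of problems instead of raising on the first one lets a batch script report every mismatch for a file at once.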
@@ -74,18 +76,20 @@ Use <a href="http://sox.sourceforge.net" target="_blank" rel="noopener">SoX <spa
 
 To measure the accuracy of Microsoft's speech-to-text accuracy when processing your audio files, you must provide human-labeled transcriptions (word-by-word) for comparison. While human-labeled transcription is often time consuming, it's necessary to evaluate accuracy and to train the model for your use cases. Keep in mind, the improvements in recognition will only be as good as the data provided. For that reason, it's important that only high-quality transcripts are uploaded.
 
-| Property | Value |
-|----------|-------|
-| File format | RIFF (WAV) |
-| Sample rate | 8,000 Hz or 16,000 Hz |
-| Channels | 1 (mono) |
+| Property                 | Value                               |
+|--------------------------|-------------------------------------|
+| File format              | RIFF (WAV)                          |
+| Sample rate              | 8,000 Hz or 16,000 Hz               |
+| Channels                 | 1 (mono)                            |
 | Maximum length per audio | 2 hours (testing) / 60 s (training) |
-| Sample format | PCM, 16-bit |
-| Archive format | .zip |
-| Maximum zip size | 2 GB |
+| Sample format            | PCM, 16-bit                         |
+| Archive format           | .zip                                |
+| Maximum zip size         | 2 GB                                |
+
+[!INCLUDE [supported-audio-formats](includes/supported-audio-formats.md)]
 
 > [!NOTE]
-> When uploading training and testing data, the .zip file size cannot exceed 2 GB. Uou can only test from a *single* dataset, be sure to keep it within the appropriate file size.
+> When uploading training and testing data, the .zip file size cannot exceed 2 GB. You can only test from a *single* dataset, be sure to keep it within the appropriate file size. Additionally, each training file cannot exceed 60 seconds otherwise it will error out.
 
 To address issues like word deletion or substitution, a significant amount of data is required to improve recognition. Generally, it's recommended to provide word-by-word transcriptions for roughly 10 to 1,000 hours of audio. The transcriptions for all WAV files should be contained in a single plain-text file. Each line of the transcription file should contain the name of one of the audio files, followed by the corresponding transcription. The file name and transcription should be separated by a tab (\t).
 
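The transcription layout described in this hunk (one plain-text file, one line per audio file, file name and transcription separated by a tab) can be sanity-checked before upload. The Python sketch below parses that layout and compares it with the `.wav` entries in a `.zip` archive. It assumes that blank lines may be skipped and that the names in the transcription file are bare file names matching the archive entries; both are assumptions, not rules stated in the diff. `parse_transcriptions` and `unmatched_files` are hypothetical helpers.

```python
import io
import posixpath
import zipfile


def parse_transcriptions(text):
    """Parse '<audio file name>\t<transcription>' lines into a dict.

    Assumption: blank lines are skipped; every other line must contain a tab.
    """
    transcripts = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        name, sep, transcript = line.partition("\t")
        if not sep or not name.strip() or not transcript.strip():
            raise ValueError(f"line {number}: expected '<file name>\\t<transcription>'")
        name = name.strip()
        if name in transcripts:
            raise ValueError(f"line {number}: duplicate entry for {name}")
        transcripts[name] = transcript.strip()
    return transcripts


def unmatched_files(archive, transcripts):
    """Return (wav files without a transcription, transcribed names without a wav).

    Assumption: transcription names are bare file names, so archive
    folders are ignored when comparing.
    """
    with zipfile.ZipFile(archive) as zf:
        wavs = {posixpath.basename(n) for n in zf.namelist() if n.lower().endswith(".wav")}
    names = set(transcripts)
    return sorted(wavs - names), sorted(names - wavs)


if __name__ == "__main__":
    transcripts = parse_transcriptions("a.wav\thello world\nb.wav\tgood morning\n")
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("a.wav", b"")
        zf.writestr("c.wav", b"")
    archive.seek(0)
    print(unmatched_files(archive, transcripts))  # (['c.wav'], ['b.wav'])
```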
articles/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams-android.md

Lines changed: 0 additions & 159 deletions
This file was deleted.

articles/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams-ios.md

Lines changed: 0 additions & 66 deletions
This file was deleted.