Commit 2b0aec2

Learn Editor: Update speech-container-batch-processing.md
1 parent 9346cc1 commit 2b0aec2

1 file changed

+27
-9
lines changed

articles/cognitive-services/Speech-Service/speech-container-batch-processing.md

Lines changed: 27 additions & 9 deletions
@@ -42,7 +42,7 @@ docker pull docker.io/batchkit/speech-batch-kit:latest
4242

4343
## Endpoint configuration
4444

45-
The batch client takes a yaml configuration file that specifies the on-prem container endpoints. The following example can be written to `/mnt/my_nfs/config.yaml`, which is used in the examples below.
45+
The batch client takes a YAML configuration file that specifies the on-premises container endpoints. The following example can be written to `/mnt/my_nfs/config.yaml`, which is used in the examples below.
4646

4747

4848

@@ -64,13 +64,20 @@ MyContainer3:
6464
rtf: 4
6565
```
6666
67-
This yaml example specifies three speech containers on three hosts. The first host is specified by a IPv4 address, the second is running on the same VM as the batch-client, and the third container is specified by the DNS hostname of another VM. The `concurrency` value specifies the maximum concurrent file transcriptions that can run on the same container. The `rtf` (Real-Time Factor) value is optional, and can be used to tune performance.
67+
This YAML example specifies three speech containers on three hosts. The first host is specified by an IPv4 address, the second is running on the same VM as the batch client, and the third container is specified by the DNS hostname of another VM. The `concurrency` value specifies the maximum number of concurrent file transcriptions that can run on the same container. The `rtf` (Real-Time Factor) value is optional and can be used to tune performance.
68+
6869
The batch client can dynamically detect if an endpoint becomes unavailable (for example, due to a container restart or networking issue), and when it becomes available again. Transcription requests will not be sent to containers that are unavailable, and the client will continue using other available containers. You can add, remove, or edit endpoints at any time without interrupting the progress of your batch.
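
The availability and `concurrency` behavior described above can be illustrated with a minimal Python sketch. This is not the batch kit's actual scheduler. The endpoint table, the concurrency limits, and the `pick_endpoint` helper are all hypothetical and exist only to show the idea of skipping unavailable containers while respecting each container's limit.

```python
from typing import Optional

# Hypothetical endpoint table mirroring the fields described above.
# The names and concurrency limits are illustrative, not copied from config.yaml.
ENDPOINTS = {
    "MyContainer1": {"concurrency": 5},
    "MyContainer2": {"concurrency": 4},
    "MyContainer3": {"concurrency": 2},
}


def pick_endpoint(endpoints: dict, in_flight: dict, available: set) -> Optional[str]:
    """Return the available endpoint with the most free slots, or None."""
    best_name, best_free = None, 0
    for name, settings in endpoints.items():
        if name not in available:
            continue  # never dispatch to an endpoint that is currently down
        free = settings["concurrency"] - in_flight.get(name, 0)
        if free > best_free:
            best_name, best_free = name, free
    return best_name
```

With this sketch, a container that drops out of `available` simply stops receiving work, and it is considered again as soon as it reappears.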
6970

7071

7172

7273

7374

75+
76+
77+
78+
79+
80+
7481
## Run the batch processing container
7582
7683
> [!NOTE]
@@ -83,6 +90,10 @@ Use the Docker `run` command to start the container. This will start an interact
8390

8491

8592

93+
94+
95+
96+
8697
```Docker
8798
docker run --network host --rm -ti -v /mnt/my_nfs:/my_nfs --entrypoint /bin/bash docker.io/batchkit/speech-batch-kit:latest
8899
```
@@ -91,6 +102,8 @@ To run the batch client:
91102

92103

93104

105+
106+
94107
```Docker
95108
run-batch-client -config /my_nfs/config.yaml -input_folder /my_nfs/audio_files -output_folder /my_nfs/transcriptions -log_folder /my_nfs/logs -file_log_level DEBUG -nbest 1 -m ONESHOT -diarization None -language en-US -strict_config
96109
```
@@ -100,11 +113,16 @@ To run the batch client and container in a single command:
100113

101114

102115

116+
117+
118+
119+
103120
```Docker
104121
docker run --network host --rm -ti -v /mnt/my_nfs:/my_nfs docker.io/batchkit/speech-batch-kit:latest -config /my_nfs/config.yaml -input_folder /my_nfs/audio_files -output_folder /my_nfs/transcriptions -log_folder /my_nfs/logs
105122
```
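
If you launch the container from a script, it can help to assemble the command from its parts so that the host directory is stated once. The following Python sketch only builds and prints the command shown above. It does not run Docker, and the `batch_kit_command` helper is a hypothetical name introduced for illustration.

```python
import shlex


def batch_kit_command(host_dir: str,
                      image: str = "docker.io/batchkit/speech-batch-kit:latest") -> list:
    """Build the single-command invocation as an argument list."""
    mount = "/my_nfs"  # path inside the container where host_dir is mounted
    return [
        "docker", "run", "--network", "host", "--rm", "-ti",
        "-v", f"{host_dir}:{mount}",
        image,
        "-config", f"{mount}/config.yaml",
        "-input_folder", f"{mount}/audio_files",
        "-output_folder", f"{mount}/transcriptions",
        "-log_folder", f"{mount}/logs",
    ]


# Print the command as a single shell-safe string for inspection.
print(shlex.join(batch_kit_command("/mnt/my_nfs")))
```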
106123

107124

125+
108126
The client will start running. If an audio file has already been transcribed in a previous run, the client will automatically skip the file. Files are sent with an automatic retry if transient errors occur, and you can choose which errors you want the client to retry on. On a transcription error, the client will continue with the remaining files and can retry the failed file without losing progress.
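
The skip-and-retry behavior can be sketched in Python as follows. This is an illustration under stated assumptions, not the client's implementation: SHA-256 is an assumed checksum algorithm, and `TransientError`, `already_transcribed`, and `transcribe_with_retry` are hypothetical names.

```python
import hashlib
import time
from pathlib import Path


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (for example, a timeout)."""


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def already_transcribed(path: Path, done: dict) -> bool:
    """True if a previous run recorded this file name with the same checksum."""
    return done.get(path.name) == file_checksum(path)


def transcribe_with_retry(transcribe, path: Path, retry_on=(TransientError,),
                          attempts: int = 3, delay_s: float = 0.0):
    """Call transcribe(path), retrying only on the chosen error types."""
    for attempt in range(1, attempts + 1):
        try:
            return transcribe(path)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            time.sleep(delay_s)
```

Passing a different tuple as `retry_on` is one way to express which errors are worth retrying. Any error type outside that tuple propagates immediately.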
109127

110128
## Run modes
@@ -141,18 +159,17 @@ The batch processing kit offers three modes, using the `--run-mode` parameter.
141159

142160
#### [REST](#tab/rest)
143161

144-
`REST` mode is an API server mode that provides a basic set of HTTP endpoints for audio file batch submission, status checking, and long polling. Also enables programmatic consumption using a Python module extension, or importing as a submodule.
162+
`REST` mode is an API server mode that provides a basic set of HTTP endpoints for audio file batch submission, status checking, and long polling. It also enables programmatic consumption using a Python module extension or by importing the kit as a submodule.
145163

146164
:::image type="content" source="media/containers/batch-rest-api-mode.png" alt-text="A diagram showing the batch-kit container processing files in REST mode.":::
147165

148166
1. Define the Speech container endpoints that the batch client will use in the `config.yaml` file.
149-
2. Send an HTTP request request to one of the API server's endpoints.
150-
167+
1. Send an HTTP request to one of the API server's endpoints.
151168
|Endpoint |Description |
152-
|---------|---------|
153-
|`/submit` | Endpoint for creating new batch requests. |
154-
|`/status` | Endpoint for checking the status of a batch request. The connection will stay open until the batch completes. |
155-
|`/watch` | Endpoint for using HTTP long polling until the batch completes. |
169+
|---------|---------|
170+
|`/submit` | Endpoint for creating new batch requests. |
171+
|`/status` | Endpoint for checking the status of a batch request. The connection will stay open until the batch completes. |
172+
|`/watch` | Endpoint for using HTTP long polling until the batch completes. |
156173

157174
3. Audio files are uploaded from the input directory. If the audio file has already been transcribed in a previous run with the same output directory (same file name and checksum), the client will skip the file.
158175
4. If a request is sent to the `/submit` endpoint, the files are dispatched to the container endpoints from step 1.
@@ -173,3 +190,4 @@ The output directory specified by `-output_folder` will contain a *run_summary.j
173190
* [How to install and run containers](speech-container-howto.md)
174191

175192

193+
