demos/audio/README.md: 39 additions & 0 deletions
@@ -102,6 +102,25 @@ print("Generation finished")
Play the speech.wav file to check the generated speech.

## Benchmarking speech generation

An asynchronous benchmarking client can be used to assess the model server performance under various load conditions. Below are execution examples captured on Intel(R) Core(TM) Ultra 7 258V.

```
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Tokens: 1802
Success rate: 100.0%. (100/100)
Throughput - Tokens per second: 15.2
Mean latency: 63653.98 ms
Median latency: 66736.83 ms
Average document length: 18.02 tokens
```
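
The numbers above come from the repository's own asynchronous benchmarking script (shown further down in this diff). Purely as an illustration of the pattern, a minimal asyncio-based load generator against an OpenAI-compatible speech endpoint could look like the sketch below; the base URL, model name, and voice value are assumptions for this example, not values taken from the demo.

```python
import asyncio
import statistics
import time

from openai import AsyncOpenAI  # OpenAI-compatible client; endpoint path is an assumption

# Assumed server address and model name; adjust to match your deployment.
client = AsyncOpenAI(base_url="http://localhost:8000/v3", api_key="unused")

async def speech_request(text: str) -> float:
    """Send one text-to-speech request and return its latency in seconds."""
    start = time.perf_counter()
    await client.audio.speech.create(model="speecht5_tts", voice="default", input=text)
    return time.perf_counter() - start

async def main() -> None:
    texts = ["OpenVINO Model Server generates speech from text."] * 10  # small synthetic load
    latencies = await asyncio.gather(*(speech_request(t) for t in texts))
    print(f"Mean latency: {statistics.mean(latencies) * 1000:.2f} ms")
    print(f"Median latency: {statistics.median(latencies) * 1000:.2f} ms")

asyncio.run(main())
```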
## Transcription
### Model preparation
Many variants of Whisper models can be deployed with a single command by using pre-configured models from the [OpenVINO HuggingFace organization](https://huggingface.co/collections/OpenVINO/speech-to-text); they can serve both the translations and transcriptions endpoints.
@@ -208,6 +227,26 @@ print(transcript.text)
The quick brown fox jumped over the lazy dog.
```
:::

## Benchmarking transcription

An asynchronous benchmarking client can be used to assess the model server performance under various load conditions. Below are execution examples captured on Intel(R) Core(TM) Ultra 7 258V.

```
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Tokens: 10948
Success rate: 100.0%. (1000/1000)
Throughput - Tokens per second: 38.5
Mean latency: 26670.64 ms
Median latency: 20772.09 ms
Average document length: 10.948 tokens
```
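
For reference, the summary fields above (tokens, throughput, mean/median latency, average document length) can be reproduced from per-request measurements with straightforward arithmetic. The sketch below uses hypothetical values purely to illustrate how such aggregates are computed; it is not the demo client's internal implementation.

```python
import statistics

# Hypothetical per-request measurements; the run above used 1000 requests.
latencies_ms = [18250.4, 20772.1, 26670.6, 31410.9, 35802.3]
tokens_per_response = [11, 10, 12, 11, 10]
elapsed_s = 1.5  # wall-clock duration of the whole benchmark (illustrative)

total_tokens = sum(tokens_per_response)
print(f"Tokens: {total_tokens}")
print(f"Throughput - Tokens per second: {total_tokens / elapsed_s:.1f}")
print(f"Mean latency: {statistics.mean(latencies_ms):.2f} ms")
print(f"Median latency: {statistics.median(latencies_ms):.2f} ms")
print(f"Average document length: {total_tokens / len(tokens_per_response):.3f} tokens")
```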
## Translation
To test the translations endpoint, we first need to prepare an audio file containing speech in a language other than English, e.g. Spanish. To generate such a sample, we will use a finetuned version of the microsoft/speecht5_tts model.
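
The specific finetuned checkpoint is not shown in this excerpt. As an illustrative sketch only (not the demo's script), generating a Spanish speech sample with a SpeechT5 pipeline from the transformers library typically follows the pattern below; `<spanish-finetuned-speecht5>` is a placeholder for the finetuned model id, and the speaker-embedding source is an assumption borrowed from the upstream SpeechT5 example.

```python
import soundfile as sf
import torch
from datasets import load_dataset
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

# Placeholder model id; substitute the finetuned checkpoint used in the demo.
model_id = "<spanish-finetuned-speecht5>"
processor = SpeechT5Processor.from_pretrained(model_id)
model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Hola, ¿cómo estás hoy?", return_tensors="pt")

# Speaker embedding taken from the CMU Arctic x-vectors dataset (assumption,
# as in the standard SpeechT5 text-to-speech example).
xvectors = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(xvectors[7306]["xvector"]).unsqueeze(0)

speech = model.generate_speech(inputs["input_ids"], speaker_embedding, vocoder=vocoder)
sf.write("speech_es.wav", speech.numpy(), samplerate=16000)  # SpeechT5 outputs 16 kHz audio
```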
parser=argparse.ArgumentParser(description='Run benchmark for embeddings endpoints', formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('--dataset', required=False, default='Cohere/wikipedia-22-12-simple-embeddings', help='Dataset for load generation from HF or a keyword "synthetic"', dest='dataset')
@@ -47,8 +52,12 @@
parser.add_argument('--model', required=False, default='Alibaba-NLP/gte-large-en-v1.5', help='HF model name', dest='model')
parser.add_argument('--request_rate', required=False, default='inf', help='Average amount of requests per seconds in random distribution', dest='request_rate')
parser.add_argument('--batch_size', required=False, type=int, default=16, help='Number of strings in every requests', dest='batch_size')
-parser.add_argument('--backend', required=False, default='ovms-embeddings', choices=['ovms-embeddings','tei-embed','infinity-embeddings','ovms_rerank'], help='Backend serving API type', dest='backend')
+parser.add_argument('--backend', required=False, default='ovms-embeddings', choices=['ovms-embeddings','tei-embed','infinity-embeddings','ovms_rerank','text2speech','speech2text', 'translations'], help='Backend serving API type', dest='backend')
parser.add_argument('--limit', required=False, type=int, default=1000, help='Number of documents to use in testing', dest='limit')
An asynchronous benchmarking client can be used to assess the model server performance under various load conditions. Below are execution examples captured on a dual Intel(R) Xeon(R) CPU Max 9480 system.