
Commit 9ebd5c1

another dirty
2 parents: b2ec3d4 + 054c7ee

79 files changed: +520 -406 lines changed

articles/ai-services/openai/reference.md

Lines changed: 25 additions & 2 deletions
@@ -486,7 +486,6 @@ The following parameters can be used inside of the `parameters` field inside of
|--|--|--|--|--|
| `type` | string | Required | null | The data source to be used for the Azure OpenAI on your data feature. For Azure AI Search the value is `AzureCognitiveSearch`. For Azure Cosmos DB for MongoDB vCore, the value is `AzureCosmosDB`. |
| `indexName` | string | Required | null | The search index to be used. |
-| `fieldsMapping` | dictionary | Optional for Azure AI Search. Required for Azure Cosmos DB for MongoDB vCore. | null | Index data column mapping. When using Azure Cosmos DB for MongoDB vCore, the value `vectorFields` is required, which indicates the fields that store vectors. |
| `inScope` | boolean | Optional | true | If set, this value limits responses to content specific to the grounding data. |
| `topNDocuments` | number | Optional | 5 | Specifies the number of top-scoring documents from your data index used to generate responses. You might want to increase the value when you have short documents or want to provide more context. This is the *retrieved documents* parameter in Azure OpenAI Studio. |
| `semanticConfiguration` | string | Optional | null | The semantic search configuration. Only required when `queryType` is set to `semantic` or `vectorSemanticHybrid`. |
@@ -498,13 +497,36 @@ The following parameters can be used inside of the `parameters` field inside of
| `strictness` | number | Optional | 3 | Sets the threshold to categorize documents as relevant to your queries. Raising the value means a higher threshold for relevance and filters out more of the less relevant documents from responses. Setting this value too high might cause the model to fail to generate responses due to limited available documents. |


-**The following parameters are used for Azure AI Search only**
+**The following parameters are used for Azure AI Search**

| Parameters | Type | Required? | Default | Description |
|--|--|--|--|--|
| `endpoint` | string | Required | null | Azure AI Search only. The data source endpoint. |
| `key` | string | Required | null | Azure AI Search only. One of the Azure AI Search admin keys for your service. |
| `queryType` | string | Optional | simple | Indicates which query option will be used for Azure AI Search. Available types: `simple`, `semantic`, `vector`, `vectorSimpleHybrid`, `vectorSemanticHybrid`. |
+| `fieldsMapping` | dictionary | Optional for Azure AI Search. | null | Defines which [fields](./concepts/use-your-data.md?tabs=ai-search#index-field-mapping) you want to map when you add your data source. |
+
+The following parameters are used inside of the `fieldsMapping` field.
+
+| Parameters | Type | Required? | Default | Description |
+|--|--|--|--|--|
+| `titleField` | string | Optional | null | The field in your index that contains the original title of each document. |
+| `urlField` | string | Optional | null | The field in your index that contains the original URL of each document. |
+| `filepathField` | string | Optional | null | The field in your index that contains the original file name of each document. |
+| `contentFields` | dictionary | Optional | null | The fields in your index that contain the main text content of each document. |
+| `contentFieldsSeparator` | string | Optional | null | The separator for your content fields. The default is `\n`. |
+
+```json
+"fieldsMapping": {
+    "titleField": "myTitleField",
+    "urlField": "myUrlField",
+    "filepathField": "myFilePathField",
+    "contentFields": [
+        "myContentField"
+    ],
+    "contentFieldsSeparator": "\n"
+}
+```

**The following parameters are used for Azure Cosmos DB for MongoDB vCore**

@@ -516,6 +538,7 @@ The following parameters can be used inside of the `parameters` field inside of
| `containerName` | string | Required | null | Azure Cosmos DB for MongoDB vCore only. The Azure Cosmos DB for MongoDB vCore container name in the database. |
| `type` (found inside of `embeddingDependencyType`) | string | Required | null | Indicates the embedding model dependency. |
| `deploymentName` (found inside of `embeddingDependencyType`) | string | Required | null | The embedding model deployment name. |
+| `fieldsMapping` | dictionary | Required for Azure Cosmos DB for MongoDB vCore. | null | Index data column mapping. When using Azure Cosmos DB for MongoDB vCore, the value `vectorFields` is required, which indicates the fields that store vectors. |

### Start an ingestion job

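For orientation, here's a minimal sketch (not part of this commit) of how a `fieldsMapping` block might sit inside a full `dataSources` entry of a chat-completions-with-extensions request. The endpoint shape, API version, and all placeholder values are illustrative assumptions.

```python
import requests

# Placeholder values; all of these are assumptions for illustration.
resource = "https://<your-resource>.openai.azure.com"
deployment = "<chat-deployment>"
api_version = "2023-08-01-preview"  # adjust to the API version you target
url = f"{resource}/openai/deployments/{deployment}/extensions/chat/completions?api-version={api_version}"

body = {
    "messages": [{"role": "user", "content": "What do my documents say about X?"}],
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",  # value for Azure AI Search, per the table above
            "parameters": {
                "endpoint": "https://<your-search>.search.windows.net",
                "key": "<search-admin-key>",
                "indexName": "<index-name>",
                "queryType": "simple",
                "fieldsMapping": {
                    "titleField": "myTitleField",
                    "contentFields": ["myContentField"],
                    "contentFieldsSeparator": "\n",
                },
            },
        }
    ],
}

response = requests.post(url, json=body, headers={"api-key": "<azure-openai-api-key>"})
print(response.json())
```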
articles/ai-services/speech-service/embedded-speech-performance-evaluations.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@ (new file; all lines added)
---
title: Performance evaluations for Embedded Speech - Speech service
titleSuffix: Azure AI services
description: Learn how to evaluate performance of embedded speech models on your target devices.
author: eric-urban
manager: nitinme
ms.service: azure-ai-speech
ms.topic: how-to
ms.date: 11/28/2023
ms.author: eur
---

# Evaluating performance of Embedded Speech

Embedded speech models run fully on your target devices. Understanding the performance characteristics of these models on your devices’ hardware can be critical to delivering low-latency experiences within your products and applications. This guide provides information to help answer the question, "Is my device suitable to run embedded speech to text and speech translation models?"

## Metrics & terminology

**Real-time factor (RTF)** – The real-time factor (RTF) of a device measures how fast the embedded speech model can process audio input. It's the ratio of the processing time to the audio length. For example, if a device processes a 1-minute audio file in 30 seconds, the RTF is 0.5. This metric evaluates the computational power of the device for running embedded speech models. It can help identify devices that are too slow to support the models. Measurement of this metric should only be done using file-based input rather than real-time microphone input.

To support real-time & interactive speech experiences, the device should have an RTF of `1` or lower. An RTF value higher than `1` means that the device can't keep up with the audio input and will cause poor user experiences.

When measuring the RTF of a device, it's important to measure multiple samples and analyze the distribution across percentiles. This allows you to capture the effect of variations in the device's behavior, like different CPU clock speeds due to thermal throttling. The predefined measurement tests outlined in [Measuring the real-time factor on your device](#measuring-the-real-time-factor-on-your-device) automatically measure the RTF for each speech recognition result, yielding a sufficiently large sample size.

**User-perceived latency (UPL)** – The user-perceived latency (UPL) of speech to text is the time between a word being spoken and the word being shown in the recognition results.

## Factors that affect performance

**Device specifications** – The specifications of your device play a key role in whether embedded speech models can run without performance issues. CPU clock speed, architecture (for example, x64 or ARM), and memory can all affect model inference speed.

**CPU load** – In most cases, your device runs other applications in parallel to the application where embedded speech models are integrated. The amount of CPU load your device experiences when idle and at peak can also affect performance.

For example, if the device is under moderate to high CPU load from all the other applications running on it, it's possible to encounter performance issues when running embedded speech in addition to those applications, even with a powerful processor.

**Memory load** – An embedded speech to text model consumes between 200 and 300 MB of memory at runtime. If your device has less memory available than that for the embedded speech process to use, frequent fallbacks to virtual memory and paging can introduce additional latency. This can affect both the real-time factor and user-perceived latency.

## Built-in performance optimizations

All embedded speech to text models come with a Voice Activity Detector (VAD) component that aims to filter out silence and non-speech content from the audio input. The goal is to reduce the CPU load and the processing time for other speech to text model components.

The VAD component is always on and doesn't need any configuration from you as the developer. It works best when the audio input has non-negligible amounts of silence or non-speech content, which is common in scenarios like captioning, commanding, and dictation.

## Measuring the real-time factor on your device

For all embedded speech supported platforms, a code sample is available on GitHub that includes a performance measurement mode. In this mode, the goal is to measure the real-time factor (RTF) of your device by controlling as many variables as possible:

- **Model** – The English (United States) model is used for measurement. Models for all other supported locales follow similar performance characteristics, so measuring with the English (United States) model is sufficient.

- **Audio input** – A prebuilt audio file designed for RTF measurements is made available as a supplemental download to the sample code.

- **Measurement mechanism** – The start and stop markers of time measurement are preconfigured in the sample to ensure accuracy and ease of comparing results across different devices & test iterations.

This measurement should be done with the sample running directly on your target device(s), with no code changes other than specifying your model paths & encryption key. The device should be in a state that represents a real end-user state when embedded speech would be used (for example, other active applications, CPU & memory load, etc.).

Running the sample outputs performance metrics to the console. The full suite of metrics includes the real-time factor along with other properties like CPU usage and memory consumption. Each metric is defined and explained below.

### Instruction set metrics

| Metric | Description | Notes |
|--------|-------------|-------|
| AVX512Supported | True if the CPU supports the AVX512 instruction set. | This flag is for x64 platforms. ONNX Runtime has optimizations for the various instruction sets, and having this information can help diagnose inconsistencies. |
| AVXSupported | True if the CPU supports the AVX instruction set. | This flag is for x64 platforms. ONNX Runtime has optimizations for the various instruction sets, and having this information can help diagnose inconsistencies. |
| AVX2Supported | True if the CPU supports the AVX2 instruction set. | This flag is for x64 platforms. ONNX Runtime has optimizations for the various instruction sets, and having this information can help diagnose inconsistencies. |
| SSE3Available | True if the CPU supports the SSE3 instruction set. | This flag is for x64 platforms. ONNX Runtime has optimizations for the various instruction sets, and having this information can help diagnose inconsistencies. |
| NEONAvailable | True if the CPU supports the NEON instruction set. | This flag is for ARM processor platforms. ONNX Runtime has optimizations for the various instruction sets, and having this information can help diagnose inconsistencies. |
| NPU | Name of the Neural Processing Unit, or N/A if none is found. | This flag is for hardware acceleration. |

### Memory and CPU metrics

| Metric | Description | Notes |
|--------|-------------|-------|
| PagefileUsage | Amount of page file used by the process. Implemented for Linux and Windows. | Values are relative to the machine configuration. |
| WorkingSetSize | Amount of memory used by the process. | |
| ProcessCPUUsage | Aggregate CPU usage for the process. | Includes all threads in the process, including Speech SDK and UI threads. Aggregated across all cores. |
| ThreadCPUUsage | Aggregate CPU usage for the speech recognition or speech translation thread. | |

### Performance metrics

| Metric | Description | Notes |
|--------|-------------|-------|
| RealTimeFactor | Measures how much faster than real time the embedded speech engine processes audio. Includes audio loading time. | Values greater than `1` indicate that the engine is processing audio slower than real time. Values less than `1` indicate the engine is processing audio faster than real time. This value should only be analyzed in file-based input mode. It shouldn't be analyzed in streaming input mode. |
| StreamingRealTimeFactor | Measures how much faster than real time the engine processes audio. Excludes audio loading time. | Values greater than `1` indicate that the engine is processing audio slower than real time. Values less than `1` indicate the engine is processing audio faster than real time. |
articles/ai-services/speech-service/embedded-speech.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Embedded Speech is designed for on-device [speech to text](speech-to-text.md) an

## Platform requirements

-Embedded speech is included with the Speech SDK (version 1.24.1 and higher) for C#, C++, and Java. Refer to the general [Speech SDK installation requirements](#embedded-speech-sdk-packages) for programming language and target platform specific details.
+Embedded speech is included with the Speech SDK (version 1.24.1 and higher) for C#, C++, and Java. Refer to the general [Speech SDK installation requirements](quickstarts/setup-platform.md#platform-requirements) for programming language and target platform specific details.

**Choose your target environment**

articles/ai-services/speech-service/toc.yml

Lines changed: 5 additions & 1 deletion
@@ -449,7 +449,11 @@ items:
      href: swagger-documentation.md
      displayName: rest, swagger
    - name: Embedded Speech
-      href: embedded-speech.md
+      items:
+        - name: Overview
+          href: embedded-speech.md
+        - name: Performance evaluations
+          href: embedded-speech-performance-evaluations.md
    - name: Power automate batch transcription
      href: power-automate-batch-transcription.md
    - name: Containers

articles/ai-studio/how-to/data-add.md

Lines changed: 6 additions & 6 deletions
@@ -114,7 +114,7 @@ These steps explain how to create a File typed data in the Azure AI Studio:
To create a data asset of the File type, use the following code and update the `<>` placeholders with your information.

```python
-from azure.ai.generative import AIClient
+from azure.ai.resources.client import AIClient
from azure.ai.generative.entities import Data
from azure.ai.generative.constants import AssetTypes
from azure.identity import DefaultAzureCredential

@@ -174,7 +174,7 @@ Use these steps to create a Folder typed data in the Azure AI Studio:
To create a data asset of the Folder type, use the following code and update the `<>` placeholders with your information.

```python
-from azure.ai.generative import AIClient
+from azure.ai.resources.client import AIClient
from azure.ai.generative.entities import Data
from azure.ai.generative.constants import AssetTypes
from azure.identity import DefaultAzureCredential

@@ -240,7 +240,7 @@ To archive *all versions* of the data under a given name, use:
# [Python SDK](#tab/python)

```python
-from azure.ai.generative import AIClient
+from azure.ai.resources.client import AIClient
from azure.ai.generative.entities import Data
from azure.ai.generative.constants import AssetTypes
from azure.identity import DefaultAzureCredential

@@ -265,7 +265,7 @@ To archive a specific data version, use:
# [Python SDK](#tab/python)

```python
-from azure.ai.generative import AIClient
+from azure.ai.resources.client import AIClient
from azure.ai.generative.entities import Data
from azure.ai.generative.constants import AssetTypes
from azure.identity import DefaultAzureCredential

@@ -294,7 +294,7 @@ To restore *all versions* of the data under a given name, use:
# [Python SDK](#tab/python)

```python
-from azure.ai.generative import AIClient
+from azure.ai.resources.client import AIClient
from azure.ai.generative.entities import Data
from azure.ai.generative.constants import AssetTypes
from azure.identity import DefaultAzureCredential

@@ -321,7 +321,7 @@ To restore a specific data version, use:
# [Python SDK](#tab/python)

```python
-from azure.ai.generative import AIClient
+from azure.ai.resources.client import AIClient
from azure.ai.generative.entities import Data
from azure.ai.generative.constants import AssetTypes
from azure.identity import DefaultAzureCredential
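
# --- Illustrative continuation (not part of this commit) ---
# A minimal sketch of the corrected import in use when registering a
# File-typed data asset. The Data constructor arguments and the
# client.data.create_or_update call are assumptions modeled on the
# surrounding snippets and the azure-ai-ml pattern.
client = AIClient.from_config(DefaultAzureCredential())
my_data = Data(
    name="<data-name>",            # placeholder
    path="<path-to-local-file>",   # placeholder
    type=AssetTypes.FILE,
)
client.data.create_or_update(my_data)  # assumed operation group
```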

articles/ai-studio/how-to/flow-deploy.md

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ You can use the Azure AI Generative SDK to deploy a prompt flow as an online end

```python
# Import required dependencies
-from azure.ai.generative import AIClient
+from azure.ai.resources.client import AIClient
from azure.ai.generative.entities.deployment import Deployment
from azure.ai.generative.entities.models import PromptflowModel
from azure.identity import InteractiveBrowserCredential as Credential
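
# --- Illustrative continuation (not part of this commit) ---
# A rough sketch of how these imports might come together to deploy a
# prompt flow; the Deployment and PromptflowModel constructor arguments
# and the client.deployments.create_or_update call are assumptions.
client = AIClient.from_config(Credential())
deployment = Deployment(
    name="<deployment-name>",  # placeholder
    model=PromptflowModel(path="<path-to-promptflow-folder>"),  # assumed argument
)
client.deployments.create_or_update(deployment)  # assumed operation group
```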

articles/ai-studio/how-to/simulator-interaction-data.md

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ First we set up the system large language model, which acts as the "agent" simul

```python
from azure.identity import DefaultAzureCredential
-from azure.ai.generative import AIClient
+from azure.ai.resources.client import AIClient
from azure.ai.generative.entities import AzureOpenAIModelConfiguration

credential = DefaultAzureCredential()
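
# --- Illustrative continuation (not part of this commit) ---
# A hedged sketch of the setup around this import; the
# AzureOpenAIModelConfiguration parameter names below are assumptions.
client = AIClient.from_config(credential)
system_model = AzureOpenAIModelConfiguration(
    api_version="2023-05-15",            # assumed
    deployment_name="<gpt-deployment>",  # assumed
    model_name="gpt-4",                  # assumed
    max_tokens=300,
    temperature=0.0,
)
```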

articles/ai-studio/includes/evaluations/from-data/python.md

Lines changed: 1 addition & 1 deletion
@@ -134,7 +134,7 @@ Before you call the `evaluate()` function, your environment needs to set up your

```python
from azure.identity import DefaultAzureCredential
-from azure.ai.generative import AIClient
+from azure.ai.resources.client import AIClient

client = AIClient.from_config(DefaultAzureCredential())
```
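
The commit only shows the client setup; as a loose sketch, a subsequent `evaluate()` call might look like the following, where the import path and every parameter name are assumptions for illustration.

```python
from azure.identity import DefaultAzureCredential
from azure.ai.resources.client import AIClient

client = AIClient.from_config(DefaultAzureCredential())

# Hypothetical call; the import path and parameters below are assumptions.
from azure.ai.generative.evaluate import evaluate

result = evaluate(
    evaluation_name="<evaluation-name>",
    data="<path-to-dataset.jsonl>",
    task_type="qa",
)
print(result)
```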

articles/api-management/breaking-changes/stv1-platform-retirement-august-2024.md

Lines changed: 4 additions & 3 deletions
@@ -22,7 +22,7 @@ The following table summarizes the compute platforms currently used for instance
| `stv1` | Single-tenant v1 | Azure-allocated compute infrastructure | Developer, Basic, Standard, Premium |
| `mtv1` | Multi-tenant v1 | Shared infrastructure that supports native autoscaling and scaling down to zero in times of no traffic | Consumption |

-For continued support and to take advantage of upcoming features, customers must migrate their Azure API Management instances from the `stv1` compute platform to the `stv2` compute platform. The `stv2` compute platform comes with additional features and improvements such as support for Azure Private Link and other networking features.
+**For continued support and to take advantage of upcoming features, customers must [migrate](../migrate-stv1-to-stv2.md) their Azure API Management instances from the `stv1` compute platform to the `stv2` compute platform.** The `stv2` compute platform comes with additional features and improvements such as support for Azure Private Link and other networking features.

New instances created in service tiers other than the Consumption tier are mostly hosted on the `stv2` platform already. Existing instances on the `stv1` compute platform will continue to work normally until the retirement date, but those instances won’t receive the latest features available to the `stv2` platform. Support for `stv1` instances will be retired by 31 August 2024.

@@ -43,10 +43,11 @@ Support for API Management instances hosted on the `stv1` platform will be retir

**Migrate all your existing instances hosted on the `stv1` compute platform to the `stv2` compute platform by 31 August 2024.**

-If you have existing instances hosted on the `stv1` platform, follow our [migration guide](../migrate-stv1-to-stv2.md) to ensure a successful migration.
+If you have existing instances hosted on the `stv1` platform, follow our **[migration guide](../migrate-stv1-to-stv2.md)** to ensure a successful migration.

[!INCLUDE [api-management-migration-support](../../../includes/api-management-migration-support.md)]

## Related content

-See all [upcoming breaking changes and feature retirements](overview.md).
+* [Migrate from stv1 platform to stv2](../migrate-stv1-to-stv2.md)
+* See all [upcoming breaking changes and feature retirements](overview.md).

articles/azure-relay/includes/relay-create-hybrid-connection-portal.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
author: spelluru
ms.service: service-bus-relay
ms.topic: include
-ms.date: 08/10/2023
+ms.date: 01/04/2024
ms.author: spelluru
---
On the **Relay** page for your namespace, follow these steps to create a hybrid connection.
