Skip to content

Commit e9be506

Browse files
committed
Merge branch 'main' into release-preview-codestral-model
2 parents 3ad13b4 + 3021352 commit e9be506

File tree

7 files changed

+40
-20
lines changed

7 files changed

+40
-20
lines changed

articles/ai-services/openai/api-version-deprecation.md

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ services: cognitive-services
55
manager: nitinme
66
ms.service: azure-ai-openai
77
ms.topic: conceptual
8-
ms.date: 01/08/2024
8+
ms.date: 01/10/2024
99
author: mrbullwinkle
1010
ms.author: mbullwin
1111
recommendations: false
@@ -25,7 +25,10 @@ This article is to help you understand the support lifecycle for the Azure OpenA
2525
Azure OpenAI API latest release:
2626

2727
- Inference: [2024-12-01-preview](reference-preview.md)
28-
- Authoring: [2024-12-01-preview](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/authoring/preview/2024-10-01-preview/azureopenai.json)
28+
- Authoring: [2024-10-01-preview](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/authoring/preview/2024-10-01-preview/azureopenai.json)
29+
30+
> [!IMPORTANT]
31+
> For features that are part of the dataplane authoring API such as batch, fine-tuning, and assistants files, continue to use API version `2024-10-01-preview` to take advantage of the latest preview features.
2932
3033
This version contains support for the latest Azure OpenAI features including:
3134

articles/ai-services/openai/includes/api-surface.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ Each API surface/specification encapsulates a different set of Azure OpenAI capa
2222
| API | Latest preview release | Latest GA release | Specifications | Description |
2323
|:---|:----|:----|:----|:---|
2424
| **Control plane** | [`2024-06-01-preview`](/rest/api/aiservices/accountmanagement/operation-groups?view=rest-aiservices-accountmanagement-2024-06-01-preview&preserve-view=true) | [`2024-10-01`](/rest/api/aiservices/accountmanagement/deployments/create-or-update?view=rest-aiservices-accountmanagement-2024-10-01&tabs=HTTP&preserve-view=true) | [Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/resource-manager/Microsoft.CognitiveServices) | Azure OpenAI shares a common control plane with all other Azure AI Services. The control plane API is used for things like [creating Azure OpenAI resources](/rest/api/aiservices/accountmanagement/accounts/create?view=rest-aiservices-accountmanagement-2023-05-01&tabs=HTTP&preserve-view=true), [model deployment](/rest/api/aiservices/accountmanagement/deployments/create-or-update?view=rest-aiservices-accountmanagement-2023-05-01&tabs=HTTP&preserve-view=true), and other higher level resource management tasks. The control plane also governs what is possible to do with capabilities like Azure Resource Manager, Bicep, Terraform, and Azure CLI.|
25-
| **Data plane - authoring** | `2024-12-01-preview` | `2024-10-21` | [Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/authoring) | The data plane authoring API controls [fine-tuning](/rest/api/azureopenai/fine-tuning?view=rest-azureopenai-2024-08-01-preview&preserve-view=true), [file-upload](/rest/api/azureopenai/files/upload?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [ingestion jobs](/rest/api/azureopenai/ingestion-jobs/create?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [batch](/rest/api/azureopenai/batch?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true) and certain [model level queries](/rest/api/azureopenai/models/get?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true)
25+
| **Data plane - authoring** | `2024-10-01-preview` | `2024-10-21` | [Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/authoring) | The data plane authoring API controls [fine-tuning](/rest/api/azureopenai/fine-tuning?view=rest-azureopenai-2024-08-01-preview&preserve-view=true), [file-upload](/rest/api/azureopenai/files/upload?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [ingestion jobs](/rest/api/azureopenai/ingestion-jobs/create?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [batch](/rest/api/azureopenai/batch?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true) and certain [model level queries](/rest/api/azureopenai/models/get?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true)
2626
| **Data plane - inference** | [`2024-12-01-preview`](/azure/ai-services/openai/reference-preview#data-plane-inference) | [`2024-10-21`](/azure/ai-services/openai/reference#data-plane-inference) | [Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference) | The data plane inference API provides the inference capabilities/endpoints for features like completions, chat completions, embeddings, speech/whisper, on your data, Dall-e, assistants, etc. |
2727

2828
## Authentication

articles/ai-services/openai/includes/use-your-data-rest.md

Lines changed: 9 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ author: aahill
55
ms.author: aahi
66
ms.service: azure-ai-openai
77
ms.topic: include
8-
ms.date: 03/07/2024
8+
ms.date: 01/10/2025
99
---
1010

1111
[!INCLUDE [Set up required variables](./use-your-data-common-variables.md)]
@@ -20,7 +20,7 @@ To trigger a response from the model, you should end with a user message indicat
2020
> There are several parameters you can use to change the model's response, such as `temperature` or `top_p`. See the [reference documentation](../reference.md#completions-extensions) for more information.
2121
2222
```bash
23-
curl -i -X POST $AZURE_OPENAI_ENDPOINT/openai/deployments/$AZURE_OPENAI_DEPLOYMENT_ID/chat/completions?api-version=2024-02-15-preview \
23+
curl -i -X POST $AZURE_OPENAI_ENDPOINT/openai/deployments/$AZURE_OPENAI_DEPLOYMENT_ID/chat/completions?api-version=2024-10-21 \
2424
-H "Content-Type: application/json" \
2525
-H "api-key: $AZURE_OPENAI_API_KEY" \
2626
-d \
@@ -31,8 +31,11 @@ curl -i -X POST $AZURE_OPENAI_ENDPOINT/openai/deployments/$AZURE_OPENAI_DEPLOYME
3131
"type": "azure_search",
3232
"parameters": {
3333
"endpoint": "'$AZURE_AI_SEARCH_ENDPOINT'",
34-
"key": "'$AZURE_AI_SEARCH_API_KEY'",
35-
"index_name": "'$AZURE_AI_SEARCH_INDEX'"
34+
"index_name": "'$AZURE_AI_SEARCH_INDEX'",
35+
"authentication": {
36+
"type": "api_key",
37+
"key": "'$AZURE_AI_SEARCH_API_KEY'"
38+
}
3639
}
3740
}
3841
],
@@ -81,7 +84,8 @@ curl -i -X POST $AZURE_OPENAI_ENDPOINT/openai/deployments/$AZURE_OPENAI_DEPLOYME
8184
"prompt_tokens": 3779,
8285
"completion_tokens": 105,
8386
"total_tokens": 3884
84-
}
87+
},
88+
"system_fingerprint": "fp_65792305e4"
8589
}
8690
```
8791

articles/ai-studio/how-to/configure-managed-network.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -634,7 +634,7 @@ During hub creation, select __Provision managed network proactively at creation_
634634
The following example shows how to provision a managed virtual network during hub creation. The `--provision-network-now` flag is in preview.
635635

636636
```azurecli
637-
az ml workspace create -n myworkspace -g my_resource_group --kind hub --managed-network AllowInternetOutbound --provision-network-now
637+
az ml workspace create -n myworkspace -g my_resource_group --kind hub --managed-network AllowInternetOutbound --provision-network-now true
638638
```
639639

640640
The following example shows how to provision a managed virtual network.
@@ -654,7 +654,7 @@ az ml workspace show -n my_ai_hub_name -g my_resource_group --query managed_netw
654654
The following example shows how to provision a managed virtual network during hub creation. The `--provision-network-now` flag is in preview.
655655

656656
```azurecli
657-
az ml workspace create -n myworkspace -g my_resource_group --managed-network AllowInternetOutbound --provision-network-now
657+
az ml workspace create -n myworkspace -g my_resource_group --managed-network AllowInternetOutbound --provision-network-now true
658658
```
659659

660660
The following example shows how to provision a managed virtual network:
22.5 KB
Loading

articles/search/vector-search-how-to-create-index.md

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -30,13 +30,13 @@ This article explains the workflow and uses REST for illustration. Once you unde
3030
3131
## Prerequisites
3232

33-
+ Azure AI Search, in any region and on any tier. Most existing services support vector search. For services created before January 2019, there's a small subset that can't create a vector index. In this situation, a new service must be created. If you're using integrated vectorization (skillsets that call Azure AI), Azure AI Search must be in the same region as Azure OpenAI or Azure AI services.
33+
+ Azure AI Search, in any region and on any tier. On services created before January 2019, there's a small subset that can't create a vector index. If this applies to you, create a new service to use vectors. For indexing workloads that include integrated vectorization (skillsets that call Azure AI), Azure AI Search must be in the same region as Azure OpenAI or Azure AI services.
3434

35-
+ [Pre-existing vector embeddings](vector-search-how-to-generate-embeddings.md) or use [integrated vectorization](vector-search-integrated-vectorization.md), where embedding models are called from the indexing pipeline.
35+
+ You must have [pre-existing vector embeddings](vector-search-how-to-generate-embeddings.md) to upload to the index, or you can use [integrated vectorization](vector-search-integrated-vectorization.md), where embedding models are called from a skillset in an indexer pipeline.
3636

37-
+ You should know the dimensions limit of the model used to create the embeddings. Valid values are 2 through 3072 dimensions. In Azure OpenAI, for **text-embedding-ada-002**, the length of the numerical vector is 1536. For **text-embedding-3-small** or **text-embedding-3-large**, the vector length is 3072.
37+
+ You should know the dimensions limit of the model used to create the embeddings so that you can assign that limit to the vector field. Integrated vectorization supports a finite number of embedding models. For **text-embedding-ada-002**, dimensions are fixed at 1536. For **text-embedding-3-small** or **text-embedding-3-large**, the vector length ranges from 1 to 1536 and 3072, respectively.
3838

39-
+ You should also know what the supported similarity metrics are. For Azure OpenAI, similarity is [computed using `cosine`](/azure/ai-services/openai/concepts/understand-embeddings#cosine-similarity).
39+
+ You should also know what similarity metric to use. For embedding models on Azure OpenAI, similarity is [computed using `cosine`](/azure/ai-services/openai/concepts/understand-embeddings#cosine-similarity).
4040

4141
+ You should be familiar with [creating an index](search-how-to-create-search-index.md). The schema must include a field for the document key, other fields you want to search or filter, and other configurations for behaviors needed during indexing and queries.
4242

@@ -50,9 +50,9 @@ Make sure your documents:
5050

5151
1. Provide vector data (an array of single-precision floating point numbers) in source fields.
5252

53-
Vector fields contain an array generated by embedding models, one embedding per field, where the field is a top-level field (not part of a nested or complex type). For the simplest integration, we recommend the embedding models in [Azure OpenAI](https://aka.ms/oai/access), such as **text-embedding-ada-002** for text documents or the [Image Retrieval REST API](/rest/api/computervision/2023-02-01-preview/image-retrieval/vectorize-image) for images.
53+
Vector fields contain an array generated by embedding models, one embedding per field, where the field is a top-level field (not part of a nested or complex type). For the simplest integration, we recommend the embedding models in [Azure OpenAI](https://aka.ms/oai/access), such as a **text-embedding-3** model for text documents or the [Image Retrieval REST API](/rest/api/computervision/2023-02-01-preview/image-retrieval/vectorize-image) for images.
5454

55-
If you can take a dependency on indexers and skillsets, consider using [integrated vectorization](vector-search-integrated-vectorization.md) that encodes images and textual content during indexing. Your field definitions are for vector fields, but incoming source data can be text or images, represented as vector arrays created during indexing.
55+
If you can take a dependency on indexers and skillsets, consider using [integrated vectorization](vector-search-integrated-vectorization.md) that encodes images and textual content during indexing. Your field definitions are for vector fields, but incoming source data can be text or images, which are converted to vector arrays during indexing.
5656

5757
1. Provide other fields with human-readable content for the query response, and for hybrid query scenarios that include full text search or semantic ranking in the same request.
5858

@@ -69,7 +69,7 @@ A vector configuration specifies the parameters used during indexing to create "
6969

7070
If you choose HNSW on a field, you can opt in for exhaustive KNN at query time. But the other direction doesn’t work: if you choose exhaustive, you can’t later request HNSW search because the extra data structures that enable approximate search don’t exist.
7171

72-
A vector configuration also specifies quantization methods for reducing vector size:
72+
Optionally, a vector configuration also specifies quantization methods for reducing vector size:
7373

7474
+ Scalar
7575
+ Binary (available in 2024-07-01 only and in newer Azure SDK packages)

articles/search/vector-search-index-size.md

Lines changed: 16 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@ ms.custom:
1010
- build-2024
1111
- ignite-2024
1212
ms.topic: conceptual
13-
ms.date: 09/19/2024
13+
ms.date: 01/09/2025
1414
---
1515

1616
# Vector index size and staying under limits
@@ -27,7 +27,7 @@ For each vector field, Azure AI Search constructs an internal vector index using
2727

2828
+ Vector index size is measured in bytes.
2929

30-
+ Vector quotas are based on memory constraints. All searchable vector indexes must be loaded into memory. At the same time, there must also be sufficient memory for other runtime operations. Vector quotas exist to ensure that the overall system remains stable and balanced for all workloads.
30+
+ Vector quotas are based on memory constraints. For vector indexes created using the Hierarchical Navigable Small World (HNSW) algorithm, searchable vector indexes reside in memory. At the same time, there must also be sufficient memory for other runtime operations. Vector quotas exist to ensure that the overall system remains stable and balanced for all workloads. If you use exhaustive KNN algorithm, indexes are loaded into memory only at query time.
3131

3232
+ Vector indexes are also subject to disk quota, in the sense that all indexes are subject disk quota. There's no separate disk quota for vector indexes.
3333

@@ -67,11 +67,24 @@ A request for vector metrics is a data plane operation. You can use the Azure po
6767

6868
### [**Portal**](#tab/portal-vector-quota)
6969

70-
Usage information can be found on the **Overview** page's **Usage** tab. Portal pages refresh every few minutes so if you recently updated an index, wait a bit before checking results.
70+
#### Vector size per index
71+
72+
To get vector index size per index, select **Search management** > **Indexes** to view a list of indexes and the document count, the size of in-memory vector indexes, and total index size as stored on disk.
73+
74+
Recall that vector quota is based on memory constraints. For vector indexes created using the HNSW algorithm, all searchable vector indexes are permanently loaded into memory. For indexes created using the exhaustive KNN algorithm, vector indexes are loaded in chunks, sequentially, during query time. There's no memory residency requirement for exhaustive KNN indexes. The lifetime of the loaded pages in memory is similar to text search and there are no other metrics applicable to exhaustive KNN indexes other than total storage.
75+
76+
The following screenshot shows two versions of the same vector index. One version is created using HNSW algorithm, where the vector graph is memory resident. Another version is created using exhaustive KNN algorithm. With exhaustive KNN, there's no specialized in-memory vector index, so the portal shows 0 MB for vector index size. Those vectors still exist and are counted in overall storage size, but they don’t occupy the in-memory resource that the vector index size metric is tracking.
77+
78+
:::image type="content" source="media/vector-search-index-size/vector-index-size-by-algorithm.png" lightbox="media/vector-search-index-size/vector-index-size-by-algorithm.png" alt-text="Screenshot of the index portal page showing vector index size based on different algorithms.":::
79+
80+
#### Vector size per service
81+
82+
To get vector index size for the search service as a whole, select the **Overview** page's **Usage** tab. Portal pages refresh every few minutes so if you recently updated an index, wait a bit before checking results.
7183

7284
The following screenshot is for an older Standard 1 (S1) search service, configured for one partition and one replica.
7385

7486
+ Storage quota is a disk constraint, and it's inclusive of all indexes (vector and nonvector) on a search service.
87+
7588
+ Vector index size quota is a memory constraint. It's the amount of memory required to load all internal vector indexes created for each vector field on a search service.
7689

7790
The screenshot indicates that indexes (vector and nonvector) consume almost 460 megabytes of available disk storage. Vector indexes consume almost 93 megabytes of memory at the service level.

0 commit comments

Comments
 (0)