Skip to content

Commit 59fc65c

Browse files
Merge pull request #274321 from HeidiSteen/heidist-bug
[azure search] prep for build doc work
2 parents 2051c33 + 8afa73f commit 59fc65c

File tree

4 files changed

+28
-26
lines changed

4 files changed

+28
-26
lines changed

articles/search/search-get-started-portal-import-vectors.md

Lines changed: 9 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ ms.service: cognitive-search
99
ms.custom:
1010
- ignite-2023
1111
ms.topic: quickstart
12-
ms.date: 01/02/2024
12+
ms.date: 05/05/2024
1313
---
1414

1515
# Quickstart: Integrated vectorization (preview)
@@ -22,8 +22,8 @@ Get started with [integrated vectorization (preview)](vector-search-integrated-v
2222
In this preview version of the wizard:
2323

2424
+ Source data is blob only, using the default parsing mode (one search document per blob).
25-
+ Index schema is nonconfigurable. Source fields include `content` (chunked and vectorized), `metadata_storage_name` for title, and a `metadata_storage_path` for the document key which is populated as `parent_id` in the Index.
26-
+ Vectorization is Azure OpenAI only (text-embedding-ada-002), using the [HNSW](vector-search-ranking.md) algorithm with defaults.
25+
+ Index schema is nonconfigurable. Source fields include `content` (chunked and vectorized), `metadata_storage_name` for title, and a `metadata_storage_path` for the document key, represented as `parent_id` in the Index.
26+
+ Vectorization is Azure OpenAI only (text-embedding-ada-002), using the [Hierarchical Navigable Small Worlds (HNSW)](vector-search-ranking.md) algorithm with defaults.
2727
+ Chunking is nonconfigurable. The effective settings are:
2828

2929
```json
@@ -32,21 +32,22 @@ In this preview version of the wizard:
3232
pageOverlapLength: 500
3333
```
3434

35-
## Prerequisites
35+
For more configuration and data source options, try Python or the REST APIs. See [integrated vectorization sample](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/integrated-vectorization/azure-search-integrated-vectorization-sample.ipynb) for details.
36+
3637

3738
+ An Azure subscription. [Create one for free](https://azure.microsoft.com/free/).
3839

39-
+ Azure AI Search, in any region and on any tier. Most existing services support vector search. For a small subset of services created prior to January 2019, an index containing vector fields fails on creation. In this situation, a new service must be created.
40+
+ Azure AI Search, in any region and on any tier. Most existing services support vector search. For a small subset of services created before January 2019, an index containing vector fields fails on creation. In this situation, a new service must be created.
4041

4142
+ [Azure OpenAI](https://aka.ms/oai/access) endpoint with a deployment of **text-embedding-ada-002** and an API key or [**Cognitive Services OpenAI User**](/azure/ai-services/openai/how-to/role-based-access-control#azure-openai-roles) permissions to upload data. You can only choose one vectorizer in this preview, and the vectorizer must be Azure OpenAI.
4243

43-
+ [Azure Storage account](/azure/storage/common/storage-account-overview), standard performance (general-purpose v2), Hot and Cool access tiers.
44+
+ [Azure Storage account](/azure/storage/common/storage-account-overview), standard performance (general-purpose v2), hot, cool, and cold access tiers.
4445

4546
+ Blobs providing text content, unstructured docs only, and metadata. In this preview, your data source must be Azure blobs.
4647

4748
+ Read permissions in Azure Storage. A storage connection string that includes an access key gives you read access to storage content. If instead you're using Microsoft Entra logins and roles, make sure the [search service's managed identity](search-howto-managed-identities-data-sources.md) has [**Storage Blob Data Reader**](/azure/storage/blobs/assign-azure-role-data-access) permissions.
4849

49-
+ All components (data source and embedding endpoint) must have public access enabled for the portal nodes to be able to access them. Otherwise, the wizard will fail. After the wizard runs, firewalls and private endpoints can be enabled in the different integration components for security. If private endpoints are already present and can't be disabled, the alternative option is to run the respective end-to-end flow from a script or program from a Virtual Machine within the same VNET as the private endpoint. Here is a [Python code sample](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-python/code/integrated-vectorization) for integrated vectorization. In the same [GitHub repo](https://github.com/Azure/azure-search-vector-samples/tree/main) are samples in other programming languages.
50+
+ All components (data source and embedding endpoint) must have public access enabled for the portal nodes to be able to access them. Otherwise, the wizard fails. After the wizard runs, firewalls and private endpoints can be enabled in the different integration components for security. If private endpoints are already present and can't be disabled, the alternative option is to run the respective end-to-end flow from a script or program from a virtual machine within the same virtual network as the private endpoint. Here is a [Python code sample](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-python/code/integrated-vectorization) for integrated vectorization. In the same [GitHub repo](https://github.com/Azure/azure-search-vector-samples/tree/main) are samples in other programming languages.
5051

5152
## Check for space
5253

@@ -202,4 +203,4 @@ Azure AI Search is a billable resource. If it's no longer needed, delete it from
202203

203204
## Next steps
204205

205-
This quickstart introduced you to the **Import and vectorize data** wizard that creates all of the objects necessary for integrated vectorization. If you want to explore each step in detail, try an [integrated vectorization sample](https://github.com/HeidiSteen/azure-search-vector-samples/blob/main/demo-python/code/integrated-vectorization/azure-search-integrated-vectorization-sample.ipynb).
206+
This quickstart introduced you to the **Import and vectorize data** wizard that creates all of the objects necessary for integrated vectorization. If you want to explore each step in detail, try an [integrated vectorization sample](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/integrated-vectorization/azure-search-integrated-vectorization-sample.ipynb).

articles/search/vector-search-how-to-configure-vectorizer.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
---
2-
title: Configure vectorizer
2+
title: Configure a vectorizer
33
titleSuffix: Azure AI Search
44
description: Steps for adding a vectorizer to a search index in Azure AI Search. A vectorizer calls an embedding model that generates embeddings from text.
55

@@ -9,7 +9,7 @@ ms.service: cognitive-search
99
ms.custom:
1010
- ignite-2023
1111
ms.topic: how-to
12-
ms.date: 03/28/2024
12+
ms.date: 05/05/2024
1313
---
1414

1515
# Configure a vectorizer in a search index

articles/search/vector-search-integrated-vectorization.md

Lines changed: 13 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -9,27 +9,28 @@ ms.service: cognitive-search
99
ms.custom:
1010
- ignite-2023
1111
ms.topic: conceptual
12-
ms.date: 03/27/2024
12+
ms.date: 05/05/2024
1313
---
1414

1515
# Integrated data chunking and embedding in Azure AI Search
1616

1717
> [!IMPORTANT]
18-
> This feature is in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [2023-10-01-Preview REST API](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) supports this feature.
18+
> Integrated data chunking and vectorization is in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [2023-10-01-Preview REST API](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) provides this feature.
1919
20-
*Integrated vectorization* adds data chunking and text-to-vector embedding to skills in indexer-based indexing. It also adds text-to-vector conversions to queries.
20+
*Integrated vectorization* adds data chunking and text-to-vector conversions during indexing and at query time.
2121

22-
This capability is preview-only. In the generally available version of [vector search](vector-search-overview.md) and in previous preview versions, data chunking and vectorization rely on external components for chunking and vectors, and your application code must handle and coordinate each step. In this preview, chunking and vectorization are built into indexing through skills and indexers. You can set up a skillset that chunks data using the Text Split skill, and then call an embedding model using either the AzureOpenAIEmbedding skill or a custom skill. Any vectorizers used during indexing can also be called on queries to convert text to vectors.
22+
For data chunking and text-to-vector conversions during indexing, you need:
2323

24-
For indexing, integrated vectorization requires:
24+
+ [An indexer](search-indexer-overview.md) to retrieve data from a supported data source.
25+
+ [A skillset](cognitive-search-working-with-skillsets.md) to call the [Text Split skill](cognitive-search-skill-textsplit.md) to chunk the data.
26+
+ The same skillset, calling a vectorizer. The vectorizer is either the [AzureOpenAIEmbedding skill](cognitive-search-skill-azure-openai-embedding.md) for text-embedding-ada-002 on Azure OpenAI, or a [custom skill](cognitive-search-custom-skill-web-api.md) that points to another embedding model, for example test-embedding-ada-002 on OpenAI.
27+
+ You also need a [vector index](search-what-is-an-index.md) to receive the chunked and vectorized content.
2528

26-
+ [An indexer](search-indexer-overview.md) retrieving data from a supported data source.
27-
+ [A skillset](cognitive-search-working-with-skillsets.md) that calls the [Text Split skill](cognitive-search-skill-textsplit.md) to chunk the data, and either [AzureOpenAIEmbedding skill](cognitive-search-skill-azure-openai-embedding.md) or a [custom skill](cognitive-search-custom-skill-web-api.md) to vectorize the data.
28-
+ [One or more indexes](search-what-is-an-index.md) to receive the chunked and vectorized content.
29-
30-
For queries:
29+
For text-to-vector queries:
3130

3231
+ [A vectorizer](vector-search-how-to-configure-vectorizer.md) defined in the index schema, assigned to a vector field, and used automatically at query time to convert a text query to a vector.
32+
+ A query that specifies one or more vector fields.
33+
+ A text string that's converted to a vector at query time.
3334

3435
Vector conversions are one-way: text-to-vector. There's no vector-to-text conversion for queries or results (for example, you can't convert a vector result to a human-readable string).
3536

@@ -44,15 +45,15 @@ Here's a checklist of the components responsible for integrated vectorization:
4445
+ A supported data source for indexer-based indexing.
4546
+ An index that specifies vector fields, and a vectorizer definition assigned to vector fields.
4647
+ A skillset providing a Text Split skill for data chunking, and a skill for vectorization (either the AzureOpenAiEmbedding skill or a custom skill pointing to an external embedding model).
47-
+ Optionally, index projections (also defined in a skillset) to push chunked data to a secondary index
48+
+ Optionally, index projections (also defined in a skillset) to push chunked data to a secondary index.
4849
+ An embedding model, deployed on Azure OpenAI or available through an HTTP endpoint.
4950
+ An indexer for driving the process end-to-end. An indexer also specifies a schedule, field mappings, and properties for change detection.
5051

5152
This checklist focuses on integrated vectorization, but your solution isn't limited to this list. You can add more skills for AI enrichment, create a knowledge store, add semantic ranking, add relevance tuning, and other query features.
5253

5354
## Availability and pricing
5455

55-
Integrated vectorization availability is based on the embedding model. If you're using Azure OpenAI, check [regional availability]( https://azure.microsoft.com/explore/global-infrastructure/products-by-region/?products=cognitive-services).
56+
Integrated vectorization is available in all regions and tiers. However, if you're using Azure OpenAI and the AzureOpenAIEmbedding skill, check [regional availability]( https://azure.microsoft.com/explore/global-infrastructure/products-by-region/?products=cognitive-services) of that service.
5657

5758
If you're using a custom skill and an Azure hosting mechanism (such as an Azure function app, Azure Web App, and Azure Kubernetes), check the [product by region page](https://azure.microsoft.com/explore/global-infrastructure/products-by-region/) for feature availability.
5859

articles/search/whats-new.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -25,10 +25,10 @@ ms.custom:
2525
| Item                         | Type | Description |
2626
|-----------------------------|------|--------------|
2727
| [Security update addressing information disclosure](https://msrc.microsoft.com/update-guide/vulnerability/CVE-2024-29063) | API | GET responses [no longer return connection strings or keys](search-api-migration.md#breaking-change-for-client-code-that-reads-connection-information). Applies to GET Skillset, GET Index, and GET Indexer. This change helps protect your Azure assets integrated with AI Search from unauthorized access. |
28-
| [**Storage expansion on Basic and Standard tiers**](search-limits-quotas-capacity.md#service-limits) | Feature | Basic now supports up to three partitions and three replicas. Basic and Standard (S1, S2, S3) tiers have significantly more storage per partition, at the same per-partition billing rate. Extra capacity is subject to [regional availability](search-limits-quotas-capacity.md#supported-regions-with-higher-storage-limits) and applies to new search services created after April 3, 2024. Currently, there's no in-place upgrade, so please create a new search service to get the extra storage. |
29-
| [**Increased quota for vectors**](search-limits-quotas-capacity.md#vector-limits-on-services-created-after-april-3-2024-in-supported-regions) | Feature | Vector quotas are also higher on new services created after April 3, 2024 in selected regions. |
30-
| [**Built-in vector quantization, narrow vector data types, and a new `stored` property (preview)**](vector-search-how-to-configure-compression-storage.md) | Feature | This preview adds support for larger vector workloads at a lower cost through three enhancements. First, *scalar quantization* reduces vector index size in memory and on disk. Second, [narrow data types](/rest/api/searchservice/supported-data-types) can be assigned to vector fields that can use them. Third, we added more flexible vector field storage options.|
31-
| [**2024-03-01-preview Search REST API**](/rest/api/searchservice/search-service-api-versions#2024-03-01-preview) | API | New preview version of the Search REST APIs for the new data types, vector compression properties, and storage options. |
28+
| [**More storage on Basic and Standard tiers**](search-limits-quotas-capacity.md#service-limits) | Feature | Basic now supports up to three partitions and three replicas. Basic and Standard (S1, S2, S3) tiers have significantly more storage per partition, at the same per-partition billing rate. Extra capacity is subject to [regional availability](search-limits-quotas-capacity.md#supported-regions-with-higher-storage-limits) and applies to new search services created after April 3, 2024. Currently, there's no in-place upgrade, so please create a new search service to get the extra storage. |
29+
| [**More quota for vectors**](search-limits-quotas-capacity.md#vector-limits-on-services-created-after-april-3-2024-in-supported-regions) | Feature | Vector quotas are also higher on new services created after April 3, 2024 in selected regions. |
30+
| [**Vector quantization, narrow vector data types, and a new `stored` property (preview)**](vector-search-how-to-configure-compression-storage.md) | Feature | Collectively, these three features add vector compression and smarter storage options. First, *scalar quantization* reduces vector index size in memory and on disk. Second, [narrow data types](/rest/api/searchservice/supported-data-types) reduce per-field storage by storing smaller values. Third, you can use `stored` to opt-out of storing the extra copy of a vector that's used only for search results. If you don't need vectors in a query response, you can set `stored` to false to save on space. |
31+
| [**2024-03-01-preview Search REST API**](/rest/api/searchservice/search-service-api-versions#2024-03-01-preview) | API | New preview version of the Search REST APIs for the new data types, vector compression properties, and vector storage options. |
3232
| [**2024-03-01-preview Management REST API**](/rest/api/searchmanagement/operation-groups?view=rest-searchmanagement-2024-03-01-preview&preserve-view=true) | API | New preview version of the Management REST APIs for control plane operations. |
3333
| [**2023-07-01-preview deprecation announcement**](/rest/api/searchservice/search-service-api-versions#2023-07-01-preview) | API | Deprecation announced on April 8, 2024. Retirement on July 8, 2024. This was the first REST API that offered vector search support. Newer API versions have a different vector configuration. We recommend [migrating to a newer version](search-api-migration.md) as soon as possible. |
3434

0 commit comments

Comments
 (0)