articles/search/vector-search-integrated-vectorization.md
10 additions & 8 deletions
@@ -9,27 +9,29 @@ ms.service: cognitive-search
9
9
ms.custom:
10
10
- ignite-2023
11
11
ms.topic: conceptual
12
-
ms.date: 03/27/2024
12
+
ms.date: 05/05/2024
13
13
---
14
14
15
15
# Integrated data chunking and embedding in Azure AI Search
16
16
17
17
> [!IMPORTANT]
18
-
> This feature is in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [2023-10-01-Preview REST API](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true)supports this feature.
18
+
> Integrated data chunking and vectorization is in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [2023-10-01-Preview REST API](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) provides this feature.
19
19
20
20
*Integrated vectorization* adds data chunking and text-to-vector embedding to skills in indexer-based indexing. It also adds text-to-vector conversions to queries.
21
21
22
-
This capability is preview-only. In the generally available version of [vector search](vector-search-overview.md) and in previous preview versions, data chunking and vectorization rely on external components for chunking and vectors, and your application code must handle and coordinate each step. In this preview, chunking and vectorization are built into indexing through skills and indexers. You can set up a skillset that chunks data using the Text Split skill, and then call an embedding model using either the AzureOpenAIEmbedding skill or a custom skill. Any vectorizers used during indexing can also be called on queries to convert text to vectors.
22
+
<!--This capability is preview-only. In the generally available version of [vector search](vector-search-overview.md) and in previous preview versions, data chunking and vectorization rely on external components for chunking and vectors, and your application code must handle and coordinate each step. In this preview, chunking and vectorization are built into indexing through skills and indexers. You can set up a skillset that chunks data using the Text Split skill, and then call an embedding model using either the AzureOpenAIEmbedding skill or a custom skill. Any vectorizers used during indexing can also be called on queries to convert text to vectors.-->
23
23
24
-
For indexing, integrated vectorization requires:
24
+
For text-to-vector conversions during indexing:
25
25
26
-
+ [An indexer](search-indexer-overview.md) retrieving data from a supported data source.
27
-
+ [A skillset](cognitive-search-working-with-skillsets.md) that calls the [Text Split skill](cognitive-search-skill-textsplit.md) to chunk the data, and either [AzureOpenAIEmbedding skill](cognitive-search-skill-azure-openai-embedding.md) or a [custom skill](cognitive-search-custom-skill-web-api.md) to vectorize the data.
28
-
+ [One or more indexes](search-what-is-an-index.md) to receive the chunked and vectorized content.
26
+
+ [An indexer](search-indexer-overview.md) retrieves data from a supported data source.
27
+
+ [A skillset](cognitive-search-working-with-skillsets.md) calls the [Text Split skill](cognitive-search-skill-textsplit.md) to chunk the data.
28
+
+ The same skillset also calls a vectorizer. The vectorizer is either the [AzureOpenAIEmbedding skill](cognitive-search-skill-azure-openai-embedding.md) for text-embedding-ada-002 on Azure OpenAI, or a [custom skill](cognitive-search-custom-skill-web-api.md) that points to another embedding model, for example text-embedding-ada-002 on OpenAI.
29
+
+ You also need a [vector index](search-what-is-an-index.md) to receive the chunked and vectorized content.
29
30
30
-
For queries:
31
+
For text-to-vector queries:
31
32
32
33
+ [A vectorizer](vector-search-how-to-configure-vectorizer.md) defined in the index schema, assigned to a vector field, and used automatically at query time to convert a text query to a vector.
34
+
+ A query that specifies one or more vector fields, and a query string providing text that's converted to a vector at query time.
33
35
34
36
Vector conversions are one-way: text-to-vector. There's no vector-to-text conversion for queries or results (for example, you can't convert a vector result to a human-readable string).
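
The indexing and query requirements listed in this hunk can be illustrated with a minimal sketch. It assumes the request shapes of the 2023-10-01-Preview REST API and only builds the JSON bodies; it doesn't send them to a search service. The skillset name, the Azure OpenAI endpoint, the deployment name, the `pages` and `vector` target names, and the chunk sizes are hypothetical placeholders and aren't taken from the article. Authentication and the index definition are omitted.

```python
import json


def build_skillset(name, aoai_endpoint, deployment_id):
    """Build a skillset body that chunks text, then embeds each chunk.

    All argument values are placeholders supplied by the caller.
    """
    return {
        "name": name,
        "skills": [
            {
                # Text Split skill: breaks /document/content into chunks.
                "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
                "context": "/document",
                "textSplitMode": "pages",
                "maximumPageLength": 2000,  # assumed chunk size
                "pageOverlapLength": 500,   # assumed overlap
                "inputs": [{"name": "text", "source": "/document/content"}],
                "outputs": [{"name": "textItems", "targetName": "pages"}],
            },
            {
                # AzureOpenAIEmbedding skill: runs once per chunk.
                "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
                "context": "/document/pages/*",
                "resourceUri": aoai_endpoint,
                "deploymentId": deployment_id,
                "inputs": [{"name": "text", "source": "/document/pages/*"}],
                "outputs": [{"name": "embedding", "targetName": "vector"}],
            },
        ],
    }


def build_text_vector_query(text, vector_field, k=5):
    """Build a query body whose text is vectorized by the field's vectorizer."""
    return {
        "vectorQueries": [
            {"kind": "text", "text": text, "fields": vector_field, "k": k}
        ]
    }


skillset = build_skillset(
    "demo-skillset", "https://example.openai.azure.com", "embedding-deployment"
)
query = build_text_vector_query("how do I chunk documents?", "vector")
print(json.dumps(skillset, indent=2))
print(json.dumps(query, indent=2))
```

The embedding skill's `context` is set to each chunk produced by the split skill, so one vector is generated per chunk. The query body carries only text, which matches the one-way, text-to-vector conversion described above.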
|[Security update addressing information disclosure](https://msrc.microsoft.com/update-guide/vulnerability/CVE-2024-29063)| API | GET responses [no longer return connection strings or keys](search-api-migration.md#breaking-change-for-client-code-that-reads-connection-information). Applies to GET Skillset, GET Index, and GET Indexer. This change helps protect your Azure assets integrated with AI Search from unauthorized access. |
28
-
|[**Storage expansion on Basic and Standard tiers**](search-limits-quotas-capacity.md#service-limits)| Feature | Basic now supports up to three partitions and three replicas. Basic and Standard (S1, S2, S3) tiers have significantly more storage per partition, at the same per-partition billing rate. Extra capacity is subject to [regional availability](search-limits-quotas-capacity.md#supported-regions-with-higher-storage-limits) and applies to new search services created after April 3, 2024. Currently, there's no in-place upgrade, so please create a new search service to get the extra storage. |
29
-
|[**Increased quota for vectors**](search-limits-quotas-capacity.md#vector-limits-on-services-created-after-april-3-2024-in-supported-regions)| Feature | Vector quotas are also higher on new services created after April 3, 2024 in selected regions. |
30
-
|[**Built-in vector quantization, narrow vector data types, and a new `stored` property (preview)**](vector-search-how-to-configure-compression-storage.md)| Feature |This preview adds support for larger vector workloads at a lower cost through three enhancements. First, *scalar quantization* reduces vector index size in memory and on disk. Second, [narrow data types](/rest/api/searchservice/supported-data-types) can be assigned to vector fields that can use them. Third, we added more flexible vector field storage options.|
31
-
|[**2024-03-01-preview Search REST API**](/rest/api/searchservice/search-service-api-versions#2024-03-01-preview)| API | New preview version of the Search REST APIs for the new data types, vector compression properties, and storage options. |
28
+
|[**More storage on Basic and Standard tiers**](search-limits-quotas-capacity.md#service-limits)| Feature | Basic now supports up to three partitions and three replicas. Basic and Standard (S1, S2, S3) tiers have significantly more storage per partition, at the same per-partition billing rate. Extra capacity is subject to [regional availability](search-limits-quotas-capacity.md#supported-regions-with-higher-storage-limits) and applies to new search services created after April 3, 2024. Currently, there's no in-place upgrade, so please create a new search service to get the extra storage. |
29
+
|[**More quota for vectors**](search-limits-quotas-capacity.md#vector-limits-on-services-created-after-april-3-2024-in-supported-regions)| Feature | Vector quotas are also higher on new services created after April 3, 2024 in selected regions. |
30
+
|[**Vector quantization, narrow vector data types, and a new `stored` property (preview)**](vector-search-how-to-configure-compression-storage.md)| Feature | Collectively, these three features add vector compression and smarter storage options. First, *scalar quantization* reduces vector index size in memory and on disk. Second, [narrow data types](/rest/api/searchservice/supported-data-types) reduce per-field storage by storing smaller values. Third, you can use `stored` to opt out of storing the extra copy of a vector that's used only for search results. If you don't need vectors in a query response, you can set `stored` to false to save on space. |
31
+
|[**2024-03-01-preview Search REST API**](/rest/api/searchservice/search-service-api-versions#2024-03-01-preview)| API | New preview version of the Search REST APIs for the new data types, vector compression properties, and vector storage options. |
32
32
|[**2024-03-01-preview Management REST API**](/rest/api/searchmanagement/operation-groups?view=rest-searchmanagement-2024-03-01-preview&preserve-view=true)| API | New preview version of the Management REST APIs for control plane operations. |
33
33
|[**2023-07-01-preview deprecation announcement**](/rest/api/searchservice/search-service-api-versions#2023-07-01-preview)| API | Deprecation announced on April 8, 2024. Retirement on July 8, 2024. This was the first REST API that offered vector search support. Newer API versions have a different vector configuration. We recommend [migrating to a newer version](search-api-migration.md) as soon as possible. |