
Commit 218b06d

Update ai-search-ingestion.md
1 parent e99cb69 commit 218b06d

1 file changed: 4 additions, 9 deletions

articles/ai-services/openai/includes/ai-search-ingestion.md

Lines changed: 4 additions & 9 deletions
@@ -3,17 +3,12 @@ manager: nitinme
 ms.service: azure-ai-studio
 ms.custom:
 ms.topic: include
-ms.date: 03/25/2024
-ms.author: aahi
-author: aahill
+ms.date: 10/08/2024
+ms.author: fshakerin
+author: fshakerin
 ---

 ### How data is ingested into Azure AI Search

 Data is ingested into Azure AI Search using the following process:
-
-1. Ingestion assets are created in the Azure AI Search resource and the Azure Storage account. Currently these assets are: indexers, indexes, data sources, a [custom skill](/azure/search/cognitive-search-custom-skill-interface) in the search resource, and a container (later called the chunks container) in the Azure Storage account. You can specify the input Azure Storage container using the [Azure OpenAI Studio](https://oai.azure.com/) or the [ingestion API (preview)](/rest/api/azureopenai/ingestion-jobs). By default, text is assumed to use UTF-8 encoding. To specify a different encoding, use the encoding configuration property. See the [.NET documentation](/dotnet/fundamentals/runtime-libraries/system-text-encoding#list-of-encodings) for a list of supported encodings.
-
-2. Data is read from the input container; contents are opened and split into chunks of at most 1,024 tokens each. If vector search is enabled, the service calculates an embedding vector for each chunk. The output of this step (called the "preprocessed" or "chunked" data) is stored in the chunks container created in the previous step.
-
-3. The preprocessed data is loaded from the chunks container and indexed in the Azure AI Search index.
+1. As of September 2024, the ingestion APIs have switched to [integrated vectorization](/azure/search/vector-search-integrated-vectorization). This update does **not** alter the existing API contracts. Integrated vectorization, a newer Azure AI Search capability, uses prebuilt skills for chunking and embedding the input data, so the Azure OpenAI On Your Data ingestion service no longer employs custom skills. With this migration, only the following assets are created: a single index, a single indexer (if an hourly or daily schedule is specified), and a data source. The chunks container is no longer used, because that functionality is now handled natively by Azure AI Search.
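The added step replaces the custom-skill pipeline with the prebuilt skills that integrated vectorization provides. As a rough illustration of what that chunk-and-embed setup looks like in Azure AI Search, here is a minimal sketch that creates a skillset combining the prebuilt split (chunking) skill and the Azure OpenAI embedding skill through the Azure AI Search REST API. This is not code from this commit, and the On Your Data service manages these assets for you; every endpoint, key, and name below is a placeholder assumption.

```python
# Sketch only (assumed setup, not from this commit): create an Azure AI Search
# skillset whose prebuilt skills chunk and embed documents, the way integrated
# vectorization does. All endpoints, keys, and names are placeholders.
import requests

SEARCH_ENDPOINT = "https://<search-service>.search.windows.net"  # placeholder
SEARCH_ADMIN_KEY = "<search-admin-key>"                          # placeholder
AOAI_ENDPOINT = "https://<aoai-resource>.openai.azure.com"       # placeholder

skillset = {
    "name": "on-your-data-skillset",  # placeholder name
    "skills": [
        {
            # Prebuilt chunking skill: splits each document into "pages".
            "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
            "context": "/document",
            "textSplitMode": "pages",
            "maximumPageLength": 2000,
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "textItems", "targetName": "pages"}],
        },
        {
            # Prebuilt embedding skill: embeds each chunk with an Azure OpenAI
            # embedding deployment (deployment/model names are placeholders).
            "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
            "context": "/document/pages/*",
            "resourceUri": AOAI_ENDPOINT,
            "deploymentId": "text-embedding-ada-002",
            "modelName": "text-embedding-ada-002",
            "apiKey": "<aoai-api-key>",  # placeholder
            "inputs": [{"name": "text", "source": "/document/pages/*"}],
            "outputs": [{"name": "embedding", "targetName": "vector"}],
        },
    ],
}

resp = requests.put(
    f"{SEARCH_ENDPOINT}/skillsets/{skillset['name']}",
    params={"api-version": "2024-07-01"},
    headers={"api-key": SEARCH_ADMIN_KEY, "Content-Type": "application/json"},
    json=skillset,
)
resp.raise_for_status()
print("skillset created:", resp.json()["name"])
```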
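Likewise, the "single indexer (if an hourly or daily schedule is specified)" in the new step corresponds to a standard Azure AI Search indexer with a schedule. A hedged sketch, reusing the placeholder names above:

```python
# Sketch only: an indexer with an hourly schedule, reusing the placeholder
# names from the previous snippet. "PT1H" is an ISO 8601 duration (hourly);
# use "P1D" for daily.
import requests

SEARCH_ENDPOINT = "https://<search-service>.search.windows.net"  # placeholder
SEARCH_ADMIN_KEY = "<search-admin-key>"                          # placeholder

indexer = {
    "name": "on-your-data-indexer",              # placeholder name
    "dataSourceName": "on-your-data-datasource",  # placeholder data source
    "targetIndexName": "on-your-data-index",      # placeholder index
    "skillsetName": "on-your-data-skillset",
    "schedule": {"interval": "PT1H"},  # hourly run
}

resp = requests.put(
    f"{SEARCH_ENDPOINT}/indexers/{indexer['name']}",
    params={"api-version": "2024-07-01"},
    headers={"api-key": SEARCH_ADMIN_KEY, "Content-Type": "application/json"},
    json=indexer,
)
resp.raise_for_status()
```

A data source (the placeholder "on-your-data-datasource" above) pointing at the input storage container completes the three assets the updated paragraph lists: one index, one indexer, and one data source.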
