Skip to content

Commit 79f06af

Browse files
Merge pull request #99 from aahill/quick-fix
adding encoding information
2 parents bd5930d + 5d1ac46 commit 79f06af

File tree

1 file changed

+1
-1
lines changed

1 file changed

+1
-1
lines changed

articles/ai-services/openai/includes/ai-search-ingestion.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ author: aahill
1212

1313
Data is ingested into Azure AI search using the following process:
1414

15-
1. Ingestion assets are created in Azure AI Search resource and Azure storage account. Currently these assets are: indexers, indexes, data sources, a [custom skill](/azure/search/cognitive-search-custom-skill-interface) in the search resource, and a container (later called the chunks container) in the Azure storage account. You can specify the input Azure storage container using the [Azure OpenAI studio](https://oai.azure.com/), or the [ingestion API (preview)](/rest/api/azureopenai/ingestion-jobs).
15+
1. Ingestion assets are created in Azure AI Search resource and Azure storage account. Currently these assets are: indexers, indexes, data sources, a [custom skill](/azure/search/cognitive-search-custom-skill-interface) in the search resource, and a container (later called the chunks container) in the Azure storage account. You can specify the input Azure storage container using the [Azure OpenAI studio](https://oai.azure.com/), or the [ingestion API (preview)](/rest/api/azureopenai/ingestion-jobs). By default, text is assumed to use the UTF-8 encoding. To specify a different encoding, use the encoding configuration property. See the [.NET documentation](/dotnet/fundamentals/runtime-libraries/system-text-encoding#list-of-encodings) for a list of supported encodings.
1616

1717
2. Data is read from the input container, contents are opened and chunked into small chunks with a maximum of 1,024 tokens each. If vector search is enabled, the service calculates the vector representing the embeddings on each chunk. The output of this step (called the "preprocessed" or "chunked" data) is stored in the chunks container created in the previous step.
1818

0 commit comments

Comments
 (0)