Commit 6df0200

Merge pull request #7640 from haileytap/miscellaneous
[Azure Search] Miscellaneous updates to agentic retrieval content
2 parents c7cfaf8 + c12b5a5 commit 6df0200

7 files changed: +106 -141 lines changed

articles/search/agentic-knowledge-source-how-to-blob.md

Lines changed: 49 additions & 92 deletions
@@ -7,59 +7,37 @@ author: HeidiSteen
77
ms.author: heidist
88
ms.service: azure-ai-search
99
ms.topic: how-to
10-
ms.date: 10/10/2025
10+
ms.date: 10/13/2025
1111
---
1212

1313
# Create a blob knowledge source
1414

1515
[!INCLUDE [Feature preview](./includes/previews/preview-generic.md)]
1616

17-
A *blob knowledge source* specifies all of the information necessary for indexing and querying multimodal Azure blob content in an Azure AI Search agentic pipeline. It's created independently, and then referenced by a [knowledge agent](agentic-retrieval-how-to-create-knowledge-base.md) and used at query time when an agent or chat bot calls a [retrieve](/rest/api/searchservice/knowledge-retrieval/retrieve?view=rest-searchservice-2025-08-01-preview&preserve-view=true) action.
17+
Use a *blob knowledge source* to index and query Azure blob content in an agentic retrieval pipeline. [Knowledge sources](agentic-knowledge-source-overview.md) are created independently, referenced in a [knowledge agent](agentic-retrieval-how-to-create-knowledge-base.md), and used as grounding data when an agent or chatbot calls a [retrieve](/rest/api/searchservice/knowledge-retrieval/retrieve?view=rest-searchservice-2025-08-01-preview&preserve-view=true) action at query time.
1818

19-
In contrast with a [search index knowledge source](agentic-knowledge-source-how-to-search-index.md) that specifies an existing and qualified index, a blob knowledge source specifies an external data source (a blob container) plus models and properties that are used to create an entire enrichment pipeline:
19+
Unlike a [search index knowledge source](agentic-knowledge-source-how-to-search-index.md), which specifies an existing and qualified index, a blob knowledge source specifies an external data source, models, and properties to automatically generate the following Azure AI Search objects:
2020

21-
+ The generated data source specifies the blob container
22-
+ The generated skillset chunks and vectorizes multimodal content
23-
+ The generated index stores indexed content and meets the criteria for agentic retrieval
24-
+ The generated indexer drives the indexing and enrichment pipeline
25-
26-
The generated index provides the content that's used by a knowledge agent.
21+
+ A data source that represents a blob container.
22+
+ A skillset that chunks and optionally vectorizes multimodal content from the container.
23+
+ An index that stores enriched content and meets the criteria for agentic retrieval.
24+
+ An indexer that uses the previous objects to drive the indexing and enrichment pipeline.
2725

2826
Knowledge sources are new in the 2025-08-01-preview release.
2927

3028
## Prerequisites
3129

32-
+ Azure Storage with a blob container containing [supported content types](search-how-to-index-azure-blob-storage.md#supported-document-formats) for text content. For images, the supported content type depends on your chat completion model and whether it can analyze and describe the image file.
33-
34-
+ Azure AI Search, basic tier or higher, configured for semantic ranker.
30+
+ Azure Storage with a blob container containing [supported content types](search-how-to-index-azure-blob-storage.md#supported-document-formats) for text content. For optional image verbalization, the supported content type depends on whether your chat completion model can analyze and describe the image file.
3531

36-
+ An embedding model and a chat completion model used for verbalizing images. Depending on the models you specify, the generated skillset can include any of the following skills: [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md), [GenAI Prompt skill](cognitive-search-skill-genai-prompt.md), [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), [AML skill](cognitive-search-aml-skill.md). Each of these skills has a finite list of supported models. Check the skill documentation for supported models.
32+
+ Azure AI Search on the Basic tier or higher with [semantic ranker enabled](semantic-how-to-enable-disable.md).
3733

38-
To try the examples in this article, we recommend [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client) for sending preview REST API calls to Azure AI Search. There's no portal support at this time.
34+
To try the examples in this article, we recommend [Visual Studio Code](https://code.visualstudio.com/download) with the [REST Client extension](https://marketplace.visualstudio.com/items?itemName=humao.rest-client) for sending preview REST API calls to Azure AI Search. Currently, there's no portal support.
3935

4036
## Check for existing knowledge sources
4137

42-
A knowledge source is a top-level, reusable object. All knowledge sources must be uniquely named within the knowledge sources collection. It's helpful to know about existing knowledge sources for either reuse or for naming new objects.
43-
44-
The following request lists knowledge sources by name and type.
45-
46-
```http
47-
# List knowledge sources by name and type
48-
GET {{search-url}}/knowledgeSources?api-version=2025-08-01-preview&$select=name,kind
49-
api-key: {{api-key}}
50-
Content-Type: application/json
51-
```
52-
53-
You can also return a single knowledge source by name to review its JSON definition.
54-
55-
```http
56-
### Get a knowledge source definition
57-
GET {{search-url}}/knowledgeSources/{{knowledge-source-name}}?api-version=2025-08-01-preview
58-
api-key: {{api-key}}
59-
Content-Type: application/json
60-
```
38+
[!INCLUDE [Check for existing knowledge sources](includes/how-tos/knowledge-source-check-rest.md)]
6139

62-
A response for blob knowledge source might look like the following example.
40+
The following JSON is an example response for an `azureBlob` knowledge source.
6341

6442
```json
6543
{
@@ -110,104 +88,83 @@ A response for blob knowledge source might look like the following example.
11088
```
11189

11290
> [!NOTE]
113-
> Sensitive information is redacted. The generated resources appear at the end of the response. The `webParameters` property isn't operational in this preview and it's reserved for future use.
91+
> Sensitive information is redacted. The generated resources appear at the end of the response. The `webParameters` property isn't operational in this preview and is reserved for future use.
11492
11593
## Create a knowledge source
11694

117-
To create a [knowledge source](agentic-knowledge-source-overview.md), use the 2025-08-01-preview data plane REST API or an Azure SDK preview package that provides equivalent functionality.
118-
119-
A knowledge source can contain exactly one of the following: `searchIndexParameters` *or* `azureBlobParameters`. The `webParameters` property isn't supported in this release. If you specify `azureBlobParameters`, then `searchIndexParameters` must be null.
120-
121-
For `azureBlobParameters`:
122-
123-
+ Provide a connection to Azure AI Search
124-
+ Provide a full access connection string for Azure Storage and the container name
125-
+ Provide a text embedding model. This model is used to vectorize text content during indexing and queries.
126-
+ Provide a chat completion model used for describing image content.
127-
+ Provide an encryption key to doubly encrypt sensitive information in this knowledge source and in the generated resources.
128-
129-
Models are referenced in the skillset and as vectorizer for encoding text strings at query time.
130-
131-
A blob knowledge source can include an `ingestionSchedule` that adds scheduling information to an indexer. You can also [add a schedule](search-howto-schedule-indexers.md) later if you want to automate data refresh
132-
133-
1. Use the [Create or Update Knowledge Source](/rest/api/searchservice/knowledge-sources/create-or-update?view=rest-searchservice-2025-08-01-preview&preserve-view=true) preview REST API.
95+
To create an `azureBlob` knowledge source:
13496

13597
1. Set environment variables at the top of your file.
13698

13799
```http
138-
@search-url=<YOUR SEARCH SERVICE URL>
139-
@api-key=<YOUR SEARCH ADMIN API KEY>
140-
@connection-string=<YOUR FULL ACCESS CONNECTION STRING TO AZURE STORAGE>
141-
@aoai-endpoint=<YOUR AZURE OPENAI ENDPOINT>
142-
@aoai-key=<YOUR AZURE OPENAI API KEY>
100+
@search-url = <YOUR SEARCH SERVICE URL>
101+
@api-key = <YOUR SEARCH ADMIN API KEY>
102+
@ks-name = <YOUR KNOWLEDGE SOURCE NAME>
103+
@connection-string = <YOUR FULL ACCESS CONNECTION STRING TO AZURE STORAGE>
104+
@container-name = <YOUR BLOB CONTAINER NAME>
143105
```
144106
145-
1. Formulate the request and then **Send**.
107+
1. Use the 2025-08-01-preview of [Knowledge Sources - Create or Update (REST API)](/rest/api/searchservice/knowledge-sources/create-or-update?view=rest-searchservice-2025-08-01-preview&preserve-view=true) or an Azure SDK preview package that provides equivalent functionality to formulate the request.
146108
147109
```http
148110
PUT {{search-url}}/knowledgeSources/earth-at-night-blob-ks?api-version=2025-08-01-preview
149111
api-key: {{api-key}}
150112
Content-Type: application/json
151113
152114
{
153-
"name": "earth-at-night-blob-ks",
115+
"name": "{{ks-name}}",
154116
"kind": "azureBlob",
155-
"description": "This knowledge source pull from a blob storage container containing pages from the Earth at Night PDF.",
117+
"description": "This knowledge source pulls from a blob storage container containing pages from the Earth at Night PDF.",
156118
"encryptionKey": null,
157119
"azureBlobParameters": {
158120
"connectionString": "{{connection-string}}",
159-
"containerName": "nasa-ebook",
121+
"containerName": "{{container-name}}",
160122
"folderPath": null,
161123
"disableImageVerbalization": null,
162124
"identity": null,
163125
"embeddingModel": {
164-
"kind": "azureOpenAI",
165-
"azureOpenAIParameters": {
166-
"resourceUri": "{{aoai-endpoint}}",
167-
"deploymentId": "text-embedding-3-small",
168-
"apiKey": "{{aoai-key}}",
169-
"modelName": "text-embedding-3-small",
170-
"authIdentity": null
171-
},
172-
"customWebApiParameters": null,
173-
"aiServicesVisionParameters": null,
174-
"amlParameters": null
126+
// Redacted for brevity
175127
},
176128
"chatCompletionModel": {
177-
"kind": "azureOpenAI",
178-
"azureOpenAIParameters": {
179-
"resourceUri": "{{aoai-endpoint}}",
180-
"deploymentId": "gpt-5-mini",
181-
"apiKey": "{{aoai-key}}",
182-
"modelName": "gpt-5-mini",
183-
"authIdentity": null
184-
}
129+
// Redacted for brevity
185130
},
186131
"ingestionSchedule": {
187-
"interval": "P1D",
188-
"startTime": "2025-01-07T19:30:00Z"
132+
// Redacted for brevity
189133
}
190134
}
191135
}
192136
```
193137
194-
If you get errors, make sure the embedding model and chat completion models exist at the endpoint you provided.
138+
1. Select **Send Request**.
139+
140+
**Key points:**
141+
142+
+ `name` must be unique within the knowledge sources collection and follow the [naming guidelines](/rest/api/searchservice/naming-rules) for objects in Azure AI Search.
143+
144+
+ `kind` must be `azureBlob` for a blob knowledge source.
145+
146+
+ `encryptionKey` (optional) is an encryption key in Azure Key Vault. Use this property to doubly encrypt sensitive information in both the knowledge source and the generated objects.
147+
148+
+ `embeddingModel` (optional) is a text embedding model that vectorizes text and image content during indexing and at query time. Use a model supported by the [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md), [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), [AML skill](cognitive-search-aml-skill.md), or [Custom Web API skill](cognitive-search-custom-skill-web-api.md). The embedding skill will be included in the generated skillset, and its equivalent vectorizer will be included in the generated index.
195149
196-
## Check output
150+
+ `chatCompletionModel` (optional) is a chat completion model that verbalizes images or extracts content. Use a model supported by the [GenAI Prompt skill](cognitive-search-skill-genai-prompt.md), which will be included in the generated skillset. To skip image verbalization, omit this object and set `"disableImageVerbalization": true`.
197151
198-
When you create a blob knowledge source, the search service also creates the following objects: an indexer, data source, skillset, and index. Exercise caution when editing these objects because you can break the pipeline if you introduce an error or incompatibility.
152+
+ `ingestionSchedule` (optional) adds scheduling information to the generated indexer. You can also [add a schedule](search-howto-schedule-indexers.md) later to automate data refresh.
199153
200-
The response on knowledge source creation lists the created resources. Objects are created according to a fixed template and naming is based on the knowledge source. You can't change the object names.
154+
+ If you get errors, make sure the embedding and chat completion models exist at the endpoints you provided.
201155
202-
We recommend using the Azure portal to validate output creation.
156+
## Review the created objects
203157
204-
1. Check the indexer for success or failure messages. Connection or quota errors appear here. If the indexer failed, try reset and rerun.
158+
When you create a blob knowledge source, your search service also creates an indexer, data source, skillset, and index. Exercise caution when you edit these objects, as introducing an error or incompatibility can break the pipeline.
205159
206-
1. Check the index for searchable content. Use Search Explorer to run your queries.
160+
After you create a knowledge source, the response lists the created objects. These objects are created according to a fixed template, and their names are based on the name of the knowledge source. You can't change the object names.
207161
208-
1. Check the skillset to learn more about how your content is chunked and vectorized.
162+
We recommend using the Azure portal to validate output creation. The workflow is:
209163
210-
1. Modify the data source if you want to change connection details, such as authentication and authorization. The example uses API keys for simplicity but you can use Microsoft Entra ID authentication and role-based access.
164+
1. Check the indexer for success or failure messages. Connection or quota errors appear here.
165+
1. Check the index for searchable content. Use Search Explorer to run queries.
166+
1. Check the skillset to learn how your content is chunked and optionally vectorized.
167+
1. Modify the data source if you want to change connection details, such as authentication and authorization. Our example uses API keys for simplicity, but you can use Microsoft Entra ID authentication and role-based access.
211168
212169
## Assign to a knowledge agent
213170
@@ -221,7 +178,7 @@ After the knowledge agent is configured, use the retrieve action to query the kn
221178
222179
[!INCLUDE [Delete knowledge source](includes/how-tos/knowledge-source-delete-rest.md)]
223180
224-
## Learn more
181+
## Related content
225182
226183
+ [Azure AI Search Blob knowledge source Python sample](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/knowledge/blob-knowledge-source.ipynb)
227184
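The create step in this diff can also be sketched outside the REST Client, purely as an illustration and not as part of the commit. The Python below only assembles the PUT URL and JSON body from the properties shown in the diff (`name`, `kind`, `azureBlobParameters`, `connectionString`, `containerName`, `disableImageVerbalization`); it sends nothing. The function name `build_blob_knowledge_source_request` and the example service URL are hypothetical. It also assumes that the name in the URL path and the `name` in the body are meant to identify the same object, which the diff does not state explicitly (the PUT line keeps `earth-at-night-blob-ks` while the body now uses `{{ks-name}}`), and that a body omitting the optional models is accepted, as the key points suggest.

```python
import json
from urllib.parse import quote

# API version taken from the article; everything else here is illustrative.
API_VERSION = "2025-08-01-preview"


def build_blob_knowledge_source_request(search_url, name, connection_string, container_name):
    """Return (url, body) for a PUT that creates or updates an azureBlob knowledge source.

    Hypothetical helper: it only builds the request and does not call the service.
    """
    # Derive the URL path and the body "name" from one argument so they cannot drift apart.
    url = (
        f"{search_url.rstrip('/')}/knowledgeSources/"
        f"{quote(name, safe='')}?api-version={API_VERSION}"
    )
    body = {
        "name": name,
        "kind": "azureBlob",
        "azureBlobParameters": {
            "connectionString": connection_string,
            "containerName": container_name,
            # No chatCompletionModel is supplied, so image verbalization is
            # turned off, following the key point in the article.
            "disableImageVerbalization": True,
        },
    }
    return url, body


if __name__ == "__main__":
    url, body = build_blob_knowledge_source_request(
        "https://example.search.windows.net",  # placeholder service URL
        "earth-at-night-blob-ks",
        "<YOUR FULL ACCESS CONNECTION STRING TO AZURE STORAGE>",
        "nasa-ebook",
    )
    print("PUT", url)
    print(json.dumps(body, indent=2))
```

Building both values from a single `name` argument is a design choice for the sketch: it avoids the kind of mismatch between the request line and the body that is easy to introduce when only one of them is parameterized.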
