-A *blob knowledge source* specifies all of the information necessary for indexing and querying multimodal Azure blob content in an Azure AI Search agentic pipeline. It's created independently, and then referenced by a [knowledge agent](agentic-retrieval-how-to-create-knowledge-base.md) and used at query time when an agent or chat bot calls a [retrieve](/rest/api/searchservice/knowledge-retrieval/retrieve?view=rest-searchservice-2025-08-01-preview&preserve-view=true) action.
+Use a *blob knowledge source* to index and query Azure blob content in an agentic retrieval pipeline. [Knowledge sources](agentic-knowledge-source-overview.md) are created independently, referenced in a [knowledge agent](agentic-retrieval-how-to-create-knowledge-base.md), and used as grounding data when an agent or chatbot calls a [retrieve](/rest/api/searchservice/knowledge-retrieval/retrieve?view=rest-searchservice-2025-08-01-preview&preserve-view=true) action at query time.
 
-In contrast with a [search index knowledge source](agentic-knowledge-source-how-to-search-index.md) that specifies an existing and qualified index, a blob knowledge source specifies an external data source (a blob container) plus models and properties that are used to create an entire enrichment pipeline:
+Unlike a [search index knowledge source](agentic-knowledge-source-how-to-search-index.md), which specifies an existing and qualified index, a blob knowledge source specifies an external data source, models, and properties to automatically generate the following Azure AI Search objects:
 
-+ The generated data source specifies the blob container
-+ The generated skillset chunks and vectorizes multimodal content
-+ The generated index stores indexed content and meets the criteria for agentic retrieval
-+ The generated indexer drives the indexing and enrichment pipeline
-
-The generated index provides the content that's used by a knowledge agent.
++ A data source that represents a blob container.
++ A skillset that chunks and optionally vectorizes multimodal content from the container.
++ An index that stores enriched content and meets the criteria for agentic retrieval.
++ An indexer that uses the previous objects to drive the indexing and enrichment pipeline.
 
 Knowledge sources are new in the 2025-08-01-preview release.
 
 ## Prerequisites
 
-+ Azure Storage with a blob container containing [supported content types](search-how-to-index-azure-blob-storage.md#supported-document-formats) for text content. For images, the supported content type depends on your chat completion model and whether it can analyze and describe the image file.
-
-+ Azure AI Search, basic tier or higher, configured for semantic ranker.
++ Azure Storage with a blob container containing [supported content types](search-how-to-index-azure-blob-storage.md#supported-document-formats) for text content. For optional image verbalization, the supported content type depends on whether your chat completion model can analyze and describe the image file.
 
-+ An embedding model and a chat completion model used for verbalizing images. Depending on the models you specify, the generated skillset can include any of the following skills: [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md), [GenAI Prompt skill](cognitive-search-skill-genai-prompt.md), [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), [AML skill](cognitive-search-aml-skill.md). Each of these skills has a finite list of supported models. Check the skill documentation for supported models.
++ Azure AI Search on the Basic tier or higher with [semantic ranker enabled](semantic-how-to-enable-disable.md).
 
-To try the examples in this article, we recommend [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client) for sending preview REST API calls to Azure AI Search. There's no portal support at this time.
+To try the examples in this article, we recommend [Visual Studio Code](https://code.visualstudio.com/download) with the [REST Client extension](https://marketplace.visualstudio.com/items?itemName=humao.rest-client) for sending preview REST API calls to Azure AI Search. Currently, there's no portal support.
 
 ## Check for existing knowledge sources
 
-A knowledge source is a top-level, reusable object. All knowledge sources must be uniquely named within the knowledge sources collection. It's helpful to know about existing knowledge sources for either reuse or for naming new objects.
-
-The following request lists knowledge sources by name and type.
-
-```http
-# List knowledge sources by name and type
-GET {{search-url}}/knowledgeSources?api-version=2025-08-01-preview&$select=name,kind
-api-key: {{api-key}}
-Content-Type: application/json
-```
-
-You can also return a single knowledge source by name to review its JSON definition.
-
-```http
-### Get a knowledge source definition
-GET {{search-url}}/knowledgeSources/{{knowledge-source-name}}?api-version=2025-08-01-preview
-api-key: {{api-key}}
-Content-Type: application/json
-```
+[!INCLUDE [Check for existing knowledge sources](includes/how-tos/knowledge-source-check-rest.md)]
 
-A response for blob knowledge source might look like the following example.
+The following JSON is an example response for an `azureBlob` knowledge source.
 
 ```json
 {
@@ -110,104 +88,83 @@ A response for blob knowledge source might look like the following example.
 ```
 
 > [!NOTE]
-> Sensitive information is redacted. The generated resources appear at the end of the response. The `webParameters` property isn't operational in this preview and it's reserved for future use.
+> Sensitive information is redacted. The generated resources appear at the end of the response. The `webParameters` property isn't operational in this preview and is reserved for future use.
 
 ## Create a knowledge source
 
-To create a [knowledge source](agentic-knowledge-source-overview.md), use the 2025-08-01-preview data plane REST API or an Azure SDK preview package that provides equivalent functionality.
-
-A knowledge source can contain exactly one of the following: `searchIndexParameters` *or* `azureBlobParameters`. The `webParameters` property isn't supported in this release. If you specify `azureBlobParameters`, then `searchIndexParameters` must be null.
-
-For `azureBlobParameters`:
-
-+ Provide a connection to Azure AI Search
-+ Provide a full access connection string for Azure Storage and the container name
-+ Provide a text embedding model. This model is used to vectorize text content during indexing and queries.
-+ Provide a chat completion model used for describing image content.
-+ Provide an encryption key to doubly encrypt sensitive information in this knowledge source and in the generated resources.
-
-Models are referenced in the skillset and as vectorizer for encoding text strings at query time.
-
-A blob knowledge source can include an `ingestionSchedule` that adds scheduling information to an indexer. You can also [add a schedule](search-howto-schedule-indexers.md) later if you want to automate data refresh
-
-1. Use the [Create or Update Knowledge Source](/rest/api/searchservice/knowledge-sources/create-or-update?view=rest-searchservice-2025-08-01-preview&preserve-view=true) preview REST API.
+To create an `azureBlob` knowledge source:
 
 1. Set environment variables at the top of your file.
 
 ```http
-@search-url=<YOUR SEARCH SERVICE URL>
-@api-key=<YOUR SEARCH ADMIN API KEY>
-@connection-string=<YOUR FULL ACCESS CONNECTION STRING TO AZURE STORAGE>
-@aoai-endpoint=<YOUR AZURE OPENAI ENDPOINT>
-@aoai-key=<YOUR AZURE OPENAI API KEY>
+@search-url = <YOUR SEARCH SERVICE URL>
+@api-key = <YOUR SEARCH ADMIN API KEY>
+@ks-name = <YOUR KNOWLEDGE SOURCE NAME>
+@connection-string = <YOUR FULL ACCESS CONNECTION STRING TO AZURE STORAGE>
+@container-name = <YOUR BLOB CONTAINER NAME>
 ```
 
-1. Formulate the request and then **Send**.
+1. Use the 2025-08-01-preview of [Knowledge Sources - Create or Update (REST API)](/rest/api/searchservice/knowledge-sources/create-or-update?view=rest-searchservice-2025-08-01-preview&preserve-view=true) or an Azure SDK preview package that provides equivalent functionality to formulate the request.
 
 ```http
 PUT {{search-url}}/knowledgeSources/earth-at-night-blob-ks?api-version=2025-08-01-preview
 api-key: {{api-key}}
 Content-Type: application/json
 
 {
-  "name": "earth-at-night-blob-ks",
+  "name": "{{ks-name}}",
   "kind": "azureBlob",
-  "description": "This knowledge source pull from a blob storage container containing pages from the Earth at Night PDF.",
+  "description": "This knowledge source pulls from a blob storage container containing pages from the Earth at Night PDF.",
   "encryptionKey": null,
   "azureBlobParameters": {
     "connectionString": "{{connection-string}}",
-    "containerName": "nasa-ebook",
+    "containerName": "{{container-name}}",
     "folderPath": null,
     "disableImageVerbalization": null,
     "identity": null,
     "embeddingModel": {
-      "kind": "azureOpenAI",
-      "azureOpenAIParameters": {
-        "resourceUri": "{{aoai-endpoint}}",
-        "deploymentId": "text-embedding-3-small",
-        "apiKey": "{{aoai-key}}",
-        "modelName": "text-embedding-3-small",
-        "authIdentity": null
-      },
-      "customWebApiParameters": null,
-      "aiServicesVisionParameters": null,
-      "amlParameters": null
+      // Redacted for brevity
     },
     "chatCompletionModel": {
-      "kind": "azureOpenAI",
-      "azureOpenAIParameters": {
-        "resourceUri": "{{aoai-endpoint}}",
-        "deploymentId": "gpt-5-mini",
-        "apiKey": "{{aoai-key}}",
-        "modelName": "gpt-5-mini",
-        "authIdentity": null
-      }
+      // Redacted for brevity
     },
     "ingestionSchedule": {
-      "interval": "P1D",
-      "startTime": "2025-01-07T19:30:00Z"
+      // Redacted for brevity
     }
   }
 }
 ```
 
-If you get errors, make sure the embedding model and chat completion models exist at the endpoint you provided.
+1. Select **Send Request**.
+
+**Key points:**
+
++ `name` must be unique within the knowledge sources collection and follow the [naming guidelines](/rest/api/searchservice/naming-rules) for objects in Azure AI Search.
+
++ `kind` must be `azureBlob` for a blob knowledge source.
+
++ `encryptionKey` (optional) is an encryption key in Azure Key Vault. Use this property to doubly encrypt sensitive information in both the knowledge source and the generated objects.
+
++ `embeddingModel` (optional) is a text embedding model that vectorizes text and image content during indexing and at query time. Use a model supported by the [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md), [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), [AML skill](cognitive-search-aml-skill.md), or [Custom Web API skill](cognitive-search-custom-skill-web-api.md). The embedding skill will be included in the generated skillset, and its equivalent vectorizer will be included in the generated index.
 
-## Check output
++ `chatCompletionModel` (optional) is a chat completion model that verbalizes images or extracts content. Use a model supported by the [GenAI Prompt skill](cognitive-search-skill-genai-prompt.md), which will be included in the generated skillset. To skip image verbalization, omit this object and set `"disableImageVerbalization": true`.
 
-When you create a blob knowledge source, the search service also creates the following objects: an indexer, data source, skillset, and index. Exercise caution when editing these objects because you can break the pipeline if you introduce an error or incompatibility.
++ `ingestionSchedule` (optional) adds scheduling information to the generated indexer. You can also [add a schedule](search-howto-schedule-indexers.md) later to automate data refresh.
 
-The response on knowledge source creation lists the created resources. Objects are created according to a fixed template and naming is based on the knowledge source. You can't change the object names.
++ If you get errors, make sure the embedding and chat completion models exist at the endpoints you provided.
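
The key points in the added lines above can be sketched as a small payload builder. This is a minimal Python sketch and not part of the change under review: the helper name `build_blob_knowledge_source`, its arguments, and the placeholder values are assumptions for illustration, while the property names come from the request body shown in the diff. It only assembles the JSON body and doesn't call the service.

```python
import json
from typing import Any, Optional


def build_blob_knowledge_source(
    name: str,
    connection_string: str,
    container_name: str,
    embedding_model: Optional[dict[str, Any]] = None,
    chat_completion_model: Optional[dict[str, Any]] = None,
    ingestion_schedule: Optional[dict[str, Any]] = None,
    description: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an azureBlob knowledge source body (hypothetical helper)."""
    if not name:
        raise ValueError("name is required and must be unique in the collection")

    blob_parameters: dict[str, Any] = {
        "connectionString": connection_string,
        "containerName": container_name,
        "embeddingModel": embedding_model,
        "chatCompletionModel": chat_completion_model,
        "ingestionSchedule": ingestion_schedule,
        # Per the key points: when the chat completion model is omitted,
        # set disableImageVerbalization to true; otherwise leave it unset.
        "disableImageVerbalization": True if chat_completion_model is None else None,
    }
    return {
        "name": name,
        "kind": "azureBlob",  # required value for a blob knowledge source
        "description": description,
        "encryptionKey": None,
        "azureBlobParameters": blob_parameters,
    }


# Example usage with placeholder values (assumptions, not real resources).
body = build_blob_knowledge_source(
    name="demo-blob-ks",
    connection_string="<YOUR FULL ACCESS CONNECTION STRING TO AZURE STORAGE>",
    container_name="<YOUR BLOB CONTAINER NAME>",
    ingestion_schedule={"interval": "P1D", "startTime": "2025-01-07T19:30:00Z"},
)
print(json.dumps(body, indent=2))
```

The resulting dictionary could be serialized and sent as the body of the `PUT` request shown earlier. The schedule values reuse the ones from the removed lines of the diff.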
 
-We recommend using the Azure portal to validate output creation.
+## Review the created objects
 
-1. Check the indexer for success or failure messages. Connection or quota errors appear here. If the indexer failed, try reset and rerun.
+When you create a blob knowledge source, your search service also creates an indexer, data source, skillset, and index. Exercise caution when you edit these objects, as introducing an error or incompatibility can break the pipeline.
 
-1. Check the index for searchable content. Use Search Explorer to run your queries.
+After you create a knowledge source, the response lists the created objects. These objects are created according to a fixed template, and their names are based on the name of the knowledge source. You can't change the object names.
 
-1. Check the skillset to learn more about how your content is chunked and vectorized.
+We recommend using the Azure portal to validate output creation. The workflow is:
 
-1. Modify the data source if you want to change connection details, such as authentication and authorization. The example uses API keys for simplicity but you can use Microsoft Entra ID authentication and role-based access.
+1. Check the indexer for success or failure messages. Connection or quota errors appear here.
+1. Check the index for searchable content. Use Search Explorer to run queries.
+1. Check the skillset to learn how your content is chunked and optionally vectorized.
+1. Modify the data source if you want to change connection details, such as authentication and authorization. Our example uses API keys for simplicity, but you can use Microsoft Entra ID authentication and role-based access.
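
The first step of the workflow above (checking the indexer) can also be approximated outside the portal. The following Python sketch is an illustration rather than part of the change: it assumes the generated indexer's name is read from the knowledge source response, that the Get Indexer Status endpoint accepts the same preview API version, and that the response carries a `lastResult` object with `status` and `errors`. The helper names are hypothetical, and no network call is made here.

```python
from urllib.parse import quote


def indexer_status_url(
    search_url: str,
    indexer_name: str,
    api_version: str = "2025-08-01-preview",  # assumption: same version as the article
) -> str:
    """Build the URL for a Get Indexer Status request (hypothetical helper)."""
    base = search_url.rstrip("/")
    return f"{base}/indexers/{quote(indexer_name, safe='')}/status?api-version={api_version}"


def summarize_status(status: dict) -> str:
    """Reduce an indexer status payload to a one-line summary."""
    last_result = status.get("lastResult") or {}
    result = last_result.get("status", "unknown")
    errors = last_result.get("errors") or []
    return f"{result} ({len(errors)} errors)"


# Example with a placeholder service URL and a made-up indexer name.
print(indexer_status_url("https://example.search.windows.net", "my-generated-indexer"))
print(summarize_status({"lastResult": {"status": "success", "errors": []}}))
```

Sending a `GET` to the built URL with an `api-key` header, as in the other requests in this article, would return the payload that `summarize_status` expects under these assumptions.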
 
 ## Assign to a knowledge agent
 
@@ -221,7 +178,7 @@ After the knowledge agent is configured, use the retrieve action to query the kn