- [Indexing of additional documents](#indexing-of-additional-documents)
- [Removal of documents](#removal-of-documents)
- [Scheduled indexing](#scheduled-indexing)
This project includes an optional feature to perform data ingestion in the cloud using Azure Functions as custom skills for Azure AI Search indexers. This approach offloads the ingestion workload from your local machine to the cloud, allowing for more scalable and efficient processing of large datasets.
### Enabling cloud ingestion
1. If you've previously deployed, delete the existing search index or create a new one, since this feature cannot be used on an existing index. The newly created index schema includes a new `parent_id` field, which is used internally by the indexer to manage the life cycle of chunks. Run this command to set a new index name:
```shell
azd env set AZURE_SEARCH_INDEX cloudindex
```
2. Run this command to enable the feature:
```shell
azd env set USE_CLOUD_INGESTION true
```
3. Open `azure.yaml` and uncomment the document-extractor, figure-processor, and text-processor sections. These are the Azure Functions apps that will be deployed to serve as Azure AI Search skills.
4. Provision the new Azure Functions resources, deploy the function apps, and update the search indexer with:
```shell
azd up
```
5. That command uploads the documents in the `data/` folder to the Blob storage container, creates the indexer and skillset, and runs the indexer to ingest the data. You can monitor the indexer status from the Azure Portal.
6. When you have new documents to ingest, upload them to the Blob storage container and run the indexer from the Azure Portal, or from the command line as sketched below.
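If you prefer the command line to the portal for that last step, the upload can be done with the Azure CLI and the indexer can be triggered through the Azure AI Search REST API. This is a minimal sketch under assumed names: the storage account, container, search service, and indexer names below are illustrative placeholders, not values defined by this project.

```shell
# Illustrative names: substitute your own storage account, container,
# search service, and indexer names from your azd environment.
az storage blob upload-batch \
  --account-name mystorageaccount \
  --destination content \
  --source ./data \
  --auth-mode login

# Trigger an indexer run via the Azure AI Search REST API.
curl -X POST \
  "https://mysearchservice.search.windows.net/indexers/myindexer/run?api-version=2024-07-01" \
  -H "api-key: $AZURE_SEARCH_ADMIN_KEY" \
  -H "Content-Length: 0"
```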
### Indexer architecture
The cloud ingestion pipeline uses four Azure Functions as custom skills within an Azure AI Search indexer. Each function corresponds to a stage in the ingestion process. Here's how it works:
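Each skill is registered in the indexer's skillset as a `#Microsoft.Skills.Custom.WebApiSkill` entry pointing at a function endpoint. The sketch below shows the general shape of such an entry using the Azure AI Search REST API; the skillset name, function URL, and input/output fields are illustrative assumptions rather than the exact values this project deploys.

```shell
# Sketch of a skillset containing one custom WebApiSkill.
# All names (skillset, function URL, fields) are illustrative.
curl -X PUT \
  "https://mysearchservice.search.windows.net/skillsets/cloudindex-skillset?api-version=2024-07-01" \
  -H "api-key: $AZURE_SEARCH_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "cloudindex-skillset",
    "skills": [
      {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "name": "document-extractor",
        "uri": "https://document-extractor.azurewebsites.net/api/extract",
        "context": "/document",
        "inputs": [ { "name": "url", "source": "/document/metadata_storage_path" } ],
        "outputs": [ { "name": "pages", "targetName": "pages" } ]
      }
    ]
  }'
```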
* [Enabling persistent chat history with Azure Cosmos DB](#enabling-persistent-chat-history-with-azure-cosmos-db)
* [Enabling language picker](#enabling-language-picker)
⚠️ This feature does not yet support DOCX, PPTX, or XLSX formats. If you have figures in those formats, they will be ignored.
Convert them first to PDF or image formats to enable media description.
## Enabling cloud data ingestion
By default, this project runs a local script to ingest data. Once you move beyond the sample documents, you may want to enable [cloud ingestion](./data_ingestion.md#cloud-ingestion), which uses Azure AI Search indexers and custom Azure AI Search skills based on the same code used by local ingestion. That approach scales better to larger amounts of data.
Learn more in the [cloud ingestion guide](./data_ingestion.md#cloud-ingestion).
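As a quick recap of that guide, enabling the feature amounts to pointing at a fresh index, turning the flag on, and redeploying after uncommenting the function app sections in `azure.yaml`:

```shell
azd env set AZURE_SEARCH_INDEX cloudindex
azd env set USE_CLOUD_INGESTION true
# Uncomment the document-extractor, figure-processor, and
# text-processor sections in azure.yaml, then run:
azd up
```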
## Enabling client-side chat history
[📺 Watch: (RAG Deep Dive series) Storing chat history](https://www.youtube.com/watch?v=1YiTFnnLVIA)
## Enabling authentication
By default, the deployed Azure web app will have no authentication or access restrictions enabled, meaning anyone with routable network access to the web app can chat with your indexed data. If you'd like to automatically set up authentication and user login as part of the `azd up` process, see [this guide](./login_and_acl.md).