Merged
12 changes: 9 additions & 3 deletions docs/data_ingestion.md
@@ -29,8 +29,8 @@
| ------ | ------------------------------------ | ------------------------ |
| PDF | Yes (DI or local with PyPDF) | Yes |
| HTML | Yes (DI or local with BeautifulSoup) | Yes |
| DOCX, PPTX, XLSX | Yes (DI) | Yes |

Check failure on line 32 in docs/data_ingestion.md

GitHub Actions / Check for Markdown linting errors

Table column style [Table pipe does not align with heading for style "aligned"]

| Images (JPG, PNG, BMP, TIFF, HEIF) | Yes (DI) | Yes |

Check failure on line 33 in docs/data_ingestion.md

GitHub Actions / Check for Markdown linting errors

Table column style [Table pipe does not align with heading for style "aligned"]
| TXT | Yes (Local) | Yes |
| JSON | Yes (Local) | Yes |
| CSV | Yes (Local) | Yes |
@@ -153,15 +153,21 @@

3. Open `azure.yaml` and un-comment the document-extractor, figure-processor, and text-processor sections. Those are the Azure Functions apps that will be deployed and serve as Azure AI Search skills.

-4. Provision the new Azure Functions resources, deploy the function apps, and update the search indexer with:
+4. (Recommended) Increase the capacity for the embedding model to the maximum quota allowed for your region/subscription, so that the Azure Functions can generate embeddings without hitting rate limits:
+
+```shell
+azd env set AZURE_OPENAI_EMB_DEPLOYMENT_CAPACITY 400
+```
+
+5. Provision the new Azure Functions resources, deploy the function apps, and update the search indexer with:

```shell
azd up
```

-5. That will upload the documents in the `data/` folder to the Blob storage container, create the indexer and skillset, and run the indexer to ingest the data. You can monitor the indexer status from the portal.
+6. That will upload the documents in the `data/` folder to the Blob storage container, create the indexer and skillset, and run the indexer to ingest the data. You can monitor the indexer status from the portal.

-6. When you have new documents to ingest, you can upload documents to the Blob storage container and run the indexer from the Azure Portal to ingest new documents.
+7. When you have new documents to ingest, you can upload documents to the Blob storage container and run the indexer from the Azure Portal to ingest new documents.
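
The last step can also be scripted instead of using the Azure Portal. The following is a minimal sketch, not part of this repository: the helper names (`indexer_url`, `upload_documents`, `run_indexer`, `indexer_status`) are hypothetical, the API version `2024-07-01` is an assumed default, and the token-based calls assume your identity has role-based access to the search service and storage account.

```shell
# Hypothetical helpers; service, indexer, account, and container names are
# placeholders that you must replace with the values from your deployment.

# indexer_url SERVICE INDEXER ACTION [API_VERSION]
# Builds the Azure AI Search REST URL for an indexer action ("run" or "status").
indexer_url() {
  printf 'https://%s.search.windows.net/indexers/%s/%s?api-version=%s\n' \
    "$1" "$2" "$3" "${4:-2024-07-01}"
}

# search_token: fetches a Microsoft Entra access token for Azure AI Search.
# Assumes you are logged in with `az login` and the service allows RBAC auth.
search_token() {
  az account get-access-token --resource https://search.azure.com \
    --query accessToken --output tsv
}

# upload_documents ACCOUNT CONTAINER DIRECTORY
# Uploads every file in DIRECTORY to the Blob storage container.
upload_documents() {
  az storage blob upload-batch --account-name "$1" --destination "$2" \
    --source "$3" --auth-mode login
}

# run_indexer SERVICE INDEXER: queues an on-demand run of the indexer.
run_indexer() {
  curl --fail --silent --show-error --request POST \
    --header "Authorization: Bearer $(search_token)" \
    --header "Content-Length: 0" \
    "$(indexer_url "$1" "$2" run)"
}

# indexer_status SERVICE INDEXER: prints the indexer status as JSON.
indexer_status() {
  curl --fail --silent --show-error \
    --header "Authorization: Bearer $(search_token)" \
    "$(indexer_url "$1" "$2" status)"
}
```

With the functions loaded, a typical sequence would be `upload_documents <account> <container> ./data`, then `run_indexer <service> <indexer>`, then `indexer_status <service> <indexer>` to check progress.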

### Indexer architecture

2 changes: 1 addition & 1 deletion infra/main.bicep
@@ -231,7 +231,7 @@ var embedding = {
deploymentName: !empty(embeddingDeploymentName) ? embeddingDeploymentName : 'text-embedding-3-large'
deploymentVersion: !empty(embeddingDeploymentVersion) ? embeddingDeploymentVersion : (embeddingModelName == 'text-embedding-ada-002' ? '2' : '1')
deploymentSkuName: !empty(embeddingDeploymentSkuName) ? embeddingDeploymentSkuName : (embeddingModelName == 'text-embedding-ada-002' ? 'Standard' : 'GlobalStandard')
-deploymentCapacity: embeddingDeploymentCapacity != 0 ? embeddingDeploymentCapacity : 30
+deploymentCapacity: embeddingDeploymentCapacity != 0 ? embeddingDeploymentCapacity : 200
dimensions: embeddingDimensions != 0 ? embeddingDimensions : 3072
}
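
The `deploymentCapacity` expression treats `0` as "not set" and falls back to the default, which this change raises from 30 to 200. A minimal shell sketch of the same rule follows; `resolve_embedding_capacity` is a hypothetical helper written only to illustrate the fallback, and it is not part of the repository.

```shell
# Hypothetical helper mirroring the Bicep expression
#   embeddingDeploymentCapacity != 0 ? embeddingDeploymentCapacity : 200
# An empty or missing argument is treated as 0, i.e. "not set".
resolve_embedding_capacity() {
  requested="${1:-0}"
  if [ "$requested" -ne 0 ]; then
    # An explicit non-zero capacity always wins.
    printf '%s\n' "$requested"
  else
    # Otherwise fall back to the new default.
    printf '%s\n' 200
  fi
}
```

Under the assumption that the `AZURE_OPENAI_EMB_DEPLOYMENT_CAPACITY` environment value is what feeds `embeddingDeploymentCapacity`, calling `resolve_embedding_capacity 400` corresponds to the recommended `azd env set` step in the docs, and calling it with no argument corresponds to leaving the value unset.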
