Commit c1d210f

Make all files in files folder available indexed by AI Search (#208)
* update ai search
* update
* code and doc clean up
* update doc
* switch back eval
* update
* improved sample
1 parent 254bcdc commit c1d210f

57 files changed, +834 -2776 lines changed

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -153,6 +153,7 @@ This template creates everything you need to get started with Microsoft Foundry:
153153
| [Azure Container Apps](https://learn.microsoft.com/azure/container-apps/) | Hosts and scales the web application with serverless containers |
154154
| [Azure Container Registry](https://learn.microsoft.com/azure/container-registry/) | Stores and manages container images for secure deployment |
155155
| [Storage Account](https://learn.microsoft.com/azure/storage/blobs/) | Provides blob storage for application data and file uploads |
156+
| [Blob Storage](https://learn.microsoft.com/azure/storage/blobs/) | Container (`documents`) used for uploaded files that feed AI Search |
156157
| [AI Search Service](https://learn.microsoft.com/azure/search/) | *Optional* - Enables hybrid search capabilities combining semantic and vector search |
157158
| [Application Insights](https://learn.microsoft.com/azure/azure-monitor/app/app-insights-overview) | *Optional* - Provides application performance monitoring, logging, and telemetry for debugging and optimization |
158159
| [Log Analytics Workspace](https://learn.microsoft.com/azure/azure-monitor/logs/log-analytics-workspace-overview) | *Optional* - Collects and analyzes telemetry data for monitoring and troubleshooting |

azure.yaml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
44

55
name: azd-get-started-with-ai-agents
66
metadata:
7-
template: azd-get-started-with-ai-agents@2.0.3
7+
template: azd-get-started-with-ai-agents@2.1.0
88
requiredVersions:
99
azd: ">=1.14.0"
1010

docs/ai_search.md

Lines changed: 54 additions & 68 deletions
@@ -2,76 +2,62 @@
22

33
# Azure AI Search Setup Guide
44
## Overview
5-
The Azure AI Search feature helps improve the responses from your application by combining the power of large language models (LLMs) with extra context retrieved from an external data source. Simply put, when you ask a question, the agent first searches through a set of relevant documents (stored as embeddings) and then uses this context to provide a more accurate and relevant response. If no relevant context is found, the agent returns the LLM response directly and informs customer that there is no relevant information in the documents.
6-
This AI Search feature is optional and is disabled by default. If you prefer to use it, simply set the environment variable `USE_AZURE_AI_SEARCH_SERVICE` to `true`. Doing so will also trigger the deployment of Azure AI Search resources.
5+
The Azure AI Search feature improves responses by letting the agent pull context from indexed documents. It is optional and disabled by default; set `USE_AZURE_AI_SEARCH_SERVICE=true` **before your first `azd up`** so that run will provision the search resources.
76

87
## How does Azure AI Search work?
9-
In our provided example, the application includes a sample dataset containing information about Contoso products. This data was split by 10 sentences, and each chunk of text was transformed into numerical representations called embeddings. These embeddings were created using OpenAI's `text-embedding-3-small` model with `dimensions=100`. The resulting embeddings file (`embeddings.csv`) is located in the `api/data` folder. The agent requires index, capable of both semantic and index search i.e. it can use LLM to search for context in the text fields as well as it can search by embedding vector similarity. The built index also must have the configured vectorizer, which will build the embedding when the agent will apply hybrid search. For search to provide the correct reference, the index must contain the field called "title" and optionally "url" to provide the link, however it is not shown in our sample as we have generated index using files located in `api/files` and there are no links available.
8+
When enabled, the app uploads the `.md` and `.pdf` files in `src/files` to your Azure Storage account and indexes them for hybrid search. Files go into the `documents` container (override with `AZURE_BLOB_CONTAINER_NAME`). The search index is created automatically and includes semantic + vector search with a `title` field for references.
9+
10+
## Environment variables (set before first `azd up`)
11+
- `USE_AZURE_AI_SEARCH_SERVICE`: Set to `true` to enable AI Search; must be set before the first `azd up` to provision search resources.
12+
- `AZURE_AI_SEARCH_INDEX_NAME`: Name of the search index to create/use (default `index_sample`).
13+
- `AZURE_AI_EMBED_DEPLOYMENT_NAME`: Embedding deployment name used for vectorization.
14+
- `AZURE_BLOB_CONTAINER_NAME`: Optional override for the blob container name (default `documents`).
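
As a rough illustration of how the variables above and their documented defaults fit together, here is a minimal Python sketch. The function name `load_search_settings`, the returned keys, and the rule that only the string `true` (in any case) enables the feature are assumptions made for this example; they are not taken from the template's code.

```python
import os
from typing import Mapping, Optional


def load_search_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the AI Search settings, applying the defaults documented above.

    Hypothetical helper for illustration; the template may parse these differently.
    """
    env = os.environ if env is None else env
    return {
        # Assumption: only the literal string "true" (any case) enables AI Search.
        "enabled": env.get("USE_AZURE_AI_SEARCH_SERVICE", "false").strip().lower() == "true",
        # Documented default index name.
        "index_name": env.get("AZURE_AI_SEARCH_INDEX_NAME", "index_sample"),
        # No default is documented for the embedding deployment; None means "not set".
        "embed_deployment": env.get("AZURE_AI_EMBED_DEPLOYMENT_NAME"),
        # Documented default blob container name.
        "container_name": env.get("AZURE_BLOB_CONTAINER_NAME", "documents"),
    }
```

Calling `load_search_settings({})` yields the disabled state with `index_sample` and `documents` as defaults, which mirrors the "optional and disabled by default" behavior described in the overview.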
15+
16+
## Indexing pipeline (what happens on startup)
17+
When `USE_AZURE_AI_SEARCH_SERVICE=true` and no agent exists yet, the app builds the AI Search tool in the steps below. Each step is skipped if its resource already exists, so existing resources are never overwritten:
18+
19+
1) Blob container
20+
- Check for the `documents` container; create it if missing.
21+
22+
2) Upload seed files
23+
- If the container is empty, upload everything under `src/files`.
24+
- If blobs already exist, skip upload to avoid overwriting user content.
25+
26+
3) Search index
27+
- Create the search index with semantic + vector config if it doesn’t exist; otherwise reuse it.
28+
29+
4) Datasource
30+
- Create the blob datasource pointing to `documents`; reuse it if already present.
31+
32+
5) Skillset
33+
- Create the split + embedding skillset; reuse it if already present.
34+
35+
6) Indexers
36+
- Create two indexers if missing: one for `.md` (markdown parsing) and one for `.pdf,.docx,.pptx,.xlsx,.txt` (default parsing).
37+
- Both indexers write chunks and vectors, and set `title` from the blob file name (`metadata_storage_name`).
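
The split between the two indexers in step 6 can be pictured with a small Python sketch. The function name `indexer_for_blob`, the `"markdown"`/`"documents"` labels, and the case-insensitive extension match are assumptions for illustration; in the real service, file filtering happens in the indexer configuration and is not done by application code like this.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions taken from the step 6 description above.
MARKDOWN_EXTENSIONS = {".md"}
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx", ".txt"}


def indexer_for_blob(blob_name: str) -> Optional[str]:
    """Return which indexer would pick up a blob, or None if neither would.

    Hypothetical helper; matching extensions case-insensitively is an assumption.
    """
    extension = PurePosixPath(blob_name).suffix.lower()
    if extension in MARKDOWN_EXTENSIONS:
        return "markdown"
    if extension in DOCUMENT_EXTENSIONS:
        return "documents"
    return None
```

For example, `indexer_for_blob("product_info.md")` maps to the markdown indexer, while a `.png` blob maps to neither and would be left unindexed under these assumptions.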
38+
39+
## Resource naming from the index name
40+
- The index name comes from `AZURE_AI_SEARCH_INDEX_NAME` (default: `index_sample`).
41+
- Names are sanitized (lowercase, underscores → hyphens) to comply with Azure Search resource naming rules (example: `index-sample`).
42+
- Skillset: `<sanitized>-skillset`.
43+
- Datasource: `<sanitized>-datasource`.
44+
- Indexers: `<sanitized>-markdown-indexer` and `<sanitized>-documents-indexer`.
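
The naming rules above can be sketched in a few lines of Python. The function names and dictionary keys are hypothetical, and the sketch applies only the two rules stated here (lowercasing and replacing underscores with hyphens); the template may apply further checks.

```python
def sanitize_index_name(index_name: str) -> str:
    """Lowercase the name and replace underscores with hyphens."""
    return index_name.lower().replace("_", "-")


def derived_resource_names(index_name: str) -> dict:
    """Build the skillset, datasource, and indexer names from the index name."""
    base = sanitize_index_name(index_name)
    return {
        "skillset": f"{base}-skillset",
        "datasource": f"{base}-datasource",
        "markdown_indexer": f"{base}-markdown-indexer",
        "documents_indexer": f"{base}-documents-indexer",
    }
```

With the default `index_sample`, the sanitized base is `index-sample`, so the skillset is named `index-sample-skillset` and the indexers `index-sample-markdown-indexer` and `index-sample-documents-indexer`.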
45+
46+
## Viewing resources in Azure Portal (with screenshots)
47+
- Blob container/files: In Storage **Containers**, open `documents` (or your override). ![Blob container](./images/blob_store_containers.png)
48+
- Datasource: In AI Search **Data sources**, open `<sanitized>-datasource`. ![AI Search datasource](./images/ai_search_datasource.png)
49+
- Skillset: In **Skillsets**, open `<sanitized>-skillset` to see split + embedding skills. ![AI Search skillsets](./images/ai_search_skillsets.png)
50+
- Indexers: In **Indexers**, find `<sanitized>-markdown-indexer` and `<sanitized>-documents-indexer`; check run history here. ![AI Search indexers](./images/ai_search_indexers.png)
51+
- Index: In **Indexes**, open `AZURE_AI_SEARCH_INDEX_NAME` to review fields, vector, and semantic settings. ![AI Search indexes](./images/ai_search_indexes.png)
52+
53+
## Deployment and drift
54+
- Changes you make directly in Azure (index, indexers, skillset, or files in the `documents` container) are kept; `azd up`/`azd deploy` will only recreate these if they are missing.
55+
- If the agent resource used by the web app is deleted, a subsequent `azd up` or `azd deploy` will provision a new agent and recreate any missing AI Search assets (index, indexers, datasource, skillset).
1056

1157

1258
## If you want to use your own dataset
13-
To create a custom embeddings file with your own data, you can use the provided helper class `SearchIndexManager`. Below is a straightforward way to build your own embeddings:
14-
```python
15-
from .api.search_index_manager import SearchIndexManager
16-
17-
search_index_manager = SearchIndexManager(
18-
endpoint=your_search_endpoint,
19-
credential=your_credentials,
20-
index_name=your_index_name,
21-
dimensions=100,
22-
model=your_embedding_model,
23-
deployment_name=your_embedding_model,
24-
embedding_endpoint=your_search_endpoint_url,
25-
embed_api_key=embed_api_key,
26-
embedding_client=embedding_client
27-
)
28-
search_index_manager.build_embeddings_file(
29-
input_directory=input_directory,
30-
output_file=output_directory,
31-
sentences_per_embedding=10
32-
)
33-
```
34-
- Make sure to replace `your_search_endpoint`, `your_credentials`, `your_index_name`, and `embedding_client` with your own Azure service details.
35-
- `your_embedding_model` is the model, used to build embeddings.
36-
- `your_search_endpoint_url` is the url of emedding endpoint, which will be used to create the vectorizer, and `embed_api_key` is the API key to access it.
37-
- Your input data should be placed in the folder specified by `input_directory`.
38-
- `sentences_per_embedding` parameter specifies the number of sentences used to construct the embedding. The larger this number, the broader the context that will be identified during the similarity search.
39-
40-
## Deploying the Application with AI index search enabled
41-
To deploy your application using the AI index search feature, set the following environment variables locally:
42-
In power shell:
43-
```
44-
$env:USE_AZURE_AI_SEARCH_SERVICE="true"
45-
$env:AZURE_AI_SEARCH_INDEX_NAME="index_sample"
46-
$env:AZURE_AI_EMBED_DEPLOYMENT_NAME="text-embedding-3-small"
47-
```
48-
49-
In bash:
50-
```
51-
export USE_AZURE_AI_SEARCH_SERVICE="true"
52-
export AZURE_AI_SEARCH_INDEX_NAME="index_sample"
53-
export AZURE_AI_EMBED_DEPLOYMENT_NAME="text-embedding-3-small"
54-
```
55-
56-
In cmd:
57-
```
58-
set USE_AZURE_AI_SEARCH_SERVICE=true
59-
set AZURE_AI_SEARCH_INDEX_NAME=index_sample
60-
set AZURE_AI_EMBED_DEPLOYMENT_NAME=text-embedding-3-small
61-
```
62-
63-
- `USE_AZURE_AI_SEARCH_SERVICE`: Enables or disables (default) index search.
64-
- `AZURE_AI_SEARCH_INDEX_NAME`: The Azure Search Index the application will use.
65-
- `AZURE_AI_EMBED_DEPLOYMENT_NAME`: The Azure embedding deployment used to create embeddings.
66-
67-
## Creating the Azure Search Index
68-
69-
To utilize index search, you must have an Azure search index. By default, the application uses `index_sample` as the index name. You can create an index either by following these official Azure [instructions](https://learn.microsoft.com/azure/ai-services/agents/how-to/tools/azure-ai-search?tabs=azurecli%2Cpython&pivots=overview-azure-ai-search), or programmatically with the provided helper methods:
70-
```python
71-
# Create Azure Search Index (if it does not yet exist)
72-
await search_index_manager.create_index(raise_on_error=True)
73-
74-
# Upload embeddings to the index
75-
await search_index_manager.upload_documents(embeddings_path)
76-
```
77-
**Important:** If you have already created the index before deploying your application, the system will skip this step and directly use your existing Azure Search Index. The parameter `vector_index_dimensions` is only required if dimension information was not already provided when initially constructing the `SearchIndexManager` object.
59+
- Drop your `.md` and `.pdf` files into `src/files` (or point the upload step to your folder).
60+
- If you already ran `azd up` and the `documents` container exists, delete that container to clear old files before redeploying.
61+
- Run `azd up` or `azd deploy` to recreate the container and upload your files.
62+
- After deployment, run the indexers (from the Azure Portal or CLI) so the new content is processed into the search index.
63+

docs/deploy_customization.md

Lines changed: 7 additions & 0 deletions
@@ -100,3 +100,10 @@ Change the SKU of the embeddings deployment:
100100

101101
```shell
102102
azd env set AZURE_AI_EMBED_DEPLOYMENT_SKU Standard
103+
```
104+
105+
Set the embedding model dimensionality (only when using AI Search):
106+
107+
```shell
108+
azd env set AZURE_AI_EMBED_DIMENSIONS 1536
109+
```

docs/deployment.md

Lines changed: 1 addition & 1 deletion
@@ -203,7 +203,7 @@ For a detailed description of customizable fields and instructions, view the [de
203203

204204
### Configurable Agents Knowledge Retrieval
205205

206-
By default, the template deploys OpenAI's [file search](https://learn.microsoft.com/azure/ai-services/agents/how-to/tools/file-search?tabs=python&pivots=overview) for agent's knowledge retrieval. An agent also can perform search using the search index, deployed in Azure AI Search resource. The semantic index search represents so-called hybrid search i.e. it uses LLM to search for the relevant context in the provided index as well as embedding similarity search. This index is built from the `embeddings.csv` file, containing the embeddings vectors, followed by the contexts.
206+
By default, the template deploys OpenAI's [file search](https://learn.microsoft.com/azure/ai-services/agents/how-to/tools/file-search?tabs=python&pivots=overview) for the agent's knowledge retrieval. An agent can also search an index deployed in an Azure AI Search resource. This semantic index search is a so-called hybrid search: it uses an LLM to find relevant context in the provided index and also performs embedding similarity search. The index is built from the files in `src/files`, which are uploaded to blob storage and chunked/embedded automatically.
207207
To use index search, set the local environment variable `USE_AZURE_AI_SEARCH_SERVICE` to `true` before running `azd up`. The Azure AI Search resource will then be deployed and used. For more information on Azure AI Search, see the [Azure AI Search Setup Guide](ai_search.md).
208208

209209
To specify the model (e.g. gpt-5-mini, gpt-5) that is deployed for the agent when `azd up` is called, set the following environment variables:

docs/images/ai_search_indexers.png (63.1 KB)

docs/images/ai_search_indexes.png (61 KB)

docs/images/architecture.png (12.4 KB)