2 | 2 |
|
3 | 3 | # Azure AI Search Setup Guide |
4 | 4 | ## Overview |
5 | | -The Azure AI Search feature helps improve the responses from your application by combining the power of large language models (LLMs) with extra context retrieved from an external data source. Simply put, when you ask a question, the agent first searches through a set of relevant documents (stored as embeddings) and then uses this context to provide a more accurate and relevant response. If no relevant context is found, the agent returns the LLM response directly and informs customer that there is no relevant information in the documents. |
6 | | -This AI Search feature is optional and is disabled by default. If you prefer to use it, simply set the environment variable `USE_AZURE_AI_SEARCH_SERVICE` to `true`. Doing so will also trigger the deployment of Azure AI Search resources. |
| 5 | +The Azure AI Search feature improves responses by letting the agent pull context from indexed documents. It is optional and disabled by default; set `USE_AZURE_AI_SEARCH_SERVICE=true` **before your first `azd up`** so that run will provision the search resources. |
7 | 6 |
|
8 | 7 | ## How does Azure AI Search work? |
9 | | -In our provided example, the application includes a sample dataset containing information about Contoso products. This data was split by 10 sentences, and each chunk of text was transformed into numerical representations called embeddings. These embeddings were created using OpenAI's `text-embedding-3-small` model with `dimensions=100`. The resulting embeddings file (`embeddings.csv`) is located in the `api/data` folder. The agent requires index, capable of both semantic and index search i.e. it can use LLM to search for context in the text fields as well as it can search by embedding vector similarity. The built index also must have the configured vectorizer, which will build the embedding when the agent will apply hybrid search. For search to provide the correct reference, the index must contain the field called "title" and optionally "url" to provide the link, however it is not shown in our sample as we have generated index using files located in `api/files` and there are no links available. |
| 8 | +When enabled, the app uploads the `.md` and `.pdf` files in `src/files` to your Azure Storage account and indexes them for hybrid search. Files go into the `documents` container (override with `AZURE_BLOB_CONTAINER_NAME`). The search index is created automatically, is configured for both semantic and vector search, and includes a `title` field that the agent uses for references. |
| 9 | + |
| 10 | +## Environment variables (set before first `azd up`) |
| 11 | +- `USE_AZURE_AI_SEARCH_SERVICE`: Set to `true` to enable AI Search; must be set before the first `azd up` to provision search resources. |
| 12 | +- `AZURE_AI_SEARCH_INDEX_NAME`: Name of the search index to create/use (default `index_sample`). |
| 13 | +- `AZURE_AI_EMBED_DEPLOYMENT_NAME`: Embedding deployment name used for vectorization. |
| 14 | +- `AZURE_BLOB_CONTAINER_NAME`: Optional override for the blob container name (default `documents`). |
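A minimal sketch of how these four variables and their documented defaults could be read in Python. The `SearchSettings` class and `load_search_settings` function are hypothetical names used only for illustration, and the case-insensitive handling of `true` is an assumption; the app's actual parsing may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SearchSettings:
    """Hypothetical container for the settings listed above."""
    enabled: bool
    index_name: str
    embed_deployment: Optional[str]
    container_name: str


def load_search_settings(env: Mapping[str, str] = os.environ) -> SearchSettings:
    """Read the variables from `env`, falling back to the documented defaults."""
    flag = env.get("USE_AZURE_AI_SEARCH_SERVICE", "false")
    return SearchSettings(
        # Assumption: any casing of "true" enables the feature.
        enabled=flag.strip().lower() == "true",
        index_name=env.get("AZURE_AI_SEARCH_INDEX_NAME") or "index_sample",
        # No default is documented for the embedding deployment name.
        embed_deployment=env.get("AZURE_AI_EMBED_DEPLOYMENT_NAME") or None,
        container_name=env.get("AZURE_BLOB_CONTAINER_NAME") or "documents",
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to check which defaults apply for a given configuration.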
| 15 | + |
| 16 | +## Indexing pipeline (what happens on startup) |
| 17 | +When `USE_AZURE_AI_SEARCH_SERVICE=true` and no agent exists yet, the app builds the AI Search tool in the steps below. Each step is skipped if its resource already exists, so existing resources are never overwritten: |
| 18 | + |
| 19 | +1) Blob container |
| 20 | + - Check for the `documents` container; create it if missing. |
| 21 | + |
| 22 | +2) Upload seed files |
| 23 | + - If the container is empty, upload everything under `src/files`. |
| 24 | + - If blobs already exist, skip upload to avoid overwriting user content. |
| 25 | + |
| 26 | +3) Search index |
| 27 | + - Create the search index with semantic + vector config if it doesn’t exist; otherwise reuse it. |
| 28 | + |
| 29 | +4) Datasource |
| 30 | + - Create the blob datasource pointing to `documents`; reuse it if already present. |
| 31 | + |
| 32 | +5) Skillset |
| 33 | + - Create the split + embedding skillset; reuse it if already present. |
| 34 | + |
| 35 | +6) Indexers |
| 36 | + - Create two indexers if missing: one for `.md` (markdown parsing) and one for `.pdf,.docx,.pptx,.xlsx,.txt` (default parsing). |
| 37 | + - Both indexers write chunks and vectors to the index, and set `title` from the blob file name (`metadata_storage_name`). |
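The create-if-missing behaviour of the steps above can be illustrated with an in-memory model. This is only a sketch of the idempotency pattern: it makes no Azure calls, and `ensure`, `run_pipeline`, and the resource keys are hypothetical names, not the app's actual code.

```python
from typing import Callable, Dict, List


def ensure(store: Dict[str, object], name: str,
           create: Callable[[], object], log: List[str]) -> None:
    """Create `name` only if it is missing; never touch an existing entry."""
    if name in store:
        log.append(f"skip {name}")
        return
    store[name] = create()
    log.append(f"create {name}")


def run_pipeline(resources: Dict[str, object], blobs: Dict[str, bytes],
                 seed_files: Dict[str, bytes]) -> List[str]:
    """Walk the six steps in order and return a log of what was done."""
    log: List[str] = []
    # Step 1: blob container.
    ensure(resources, "container:documents", dict, log)
    # Step 2: seed only an empty container so user content is not overwritten.
    if blobs:
        log.append("skip upload")
    else:
        blobs.update(seed_files)
        log.append(f"upload {len(seed_files)} files")
    # Steps 3-6: index, datasource, skillset, and the two indexers.
    for name in ("index", "datasource", "skillset",
                 "markdown-indexer", "documents-indexer"):
        ensure(resources, name, dict, log)
    return log
```

Running `run_pipeline` a second time against the same `resources` and `blobs` produces only `skip` entries, which mirrors the "nothing existing is overwritten" guarantee described above.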
| 38 | + |
| 39 | +## Resource naming from the index name |
| 40 | +- The index name comes from `AZURE_AI_SEARCH_INDEX_NAME` (default: `index_sample`). |
| 41 | +- Names are sanitized (lowercase, underscores → hyphens) to comply with Azure Search resource naming rules (example: `index-sample`). |
| 42 | +- Skillset: `<sanitized>-skillset`. |
| 43 | +- Datasource: `<sanitized>-datasource`. |
| 44 | +- Indexers: `<sanitized>-markdown-indexer` and `<sanitized>-documents-indexer`. |
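As a sketch, the naming rules above can be expressed as a small function. `derive_resource_names` is a hypothetical helper written for illustration; it applies only the two sanitization rules stated here (lowercasing and underscores to hyphens), and the app may apply further checks.

```python
from typing import Dict


def derive_resource_names(index_name: str) -> Dict[str, str]:
    """Derive skillset, datasource, and indexer names from the index name."""
    # Sanitize: lowercase, then replace underscores with hyphens.
    sanitized = index_name.lower().replace("_", "-")
    return {
        "sanitized": sanitized,
        "skillset": f"{sanitized}-skillset",
        "datasource": f"{sanitized}-datasource",
        "markdown_indexer": f"{sanitized}-markdown-indexer",
        "documents_indexer": f"{sanitized}-documents-indexer",
    }


names = derive_resource_names("index_sample")
print(names["datasource"])  # index-sample-datasource
```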
| 45 | + |
| 46 | +## Viewing resources in the Azure Portal (with screenshots) |
| 47 | +- Blob container/files: In Storage **Containers**, open `documents` (or your override).  |
| 48 | +- Datasource: In AI Search **Data sources**, open `<sanitized>-datasource`.  |
| 49 | +- Skillset: In **Skillsets**, open `<sanitized>-skillset` to see split + embedding skills.  |
| 50 | +- Indexers: In **Indexers**, find `<sanitized>-markdown-indexer` and `<sanitized>-documents-indexer`; check run history here.  |
| 51 | +- Index: In **Indexes**, open the index named by `AZURE_AI_SEARCH_INDEX_NAME` (default `index_sample`) to review fields, vector, and semantic settings.  |
| 52 | + |
| 53 | +## Deployment and drift |
| 54 | +- Changes you make directly in Azure (index, indexers, skillset, or files in the `documents` container) are kept; `azd up`/`azd deploy` will only recreate these if they are missing. |
| 55 | +- If the agent resource used by the web app is deleted, a subsequent `azd up` or `azd deploy` will provision a new agent and recreate any missing AI Search assets (index, indexers, datasource, skillset). |
10 | 56 |
|
11 | 57 |
|
12 | 58 | ## If you want to use your own dataset |
13 | | -To create a custom embeddings file with your own data, you can use the provided helper class `SearchIndexManager`. Below is a straightforward way to build your own embeddings: |
14 | | -```python |
15 | | -from .api.search_index_manager import SearchIndexManager |
16 | | - |
17 | | -search_index_manager = SearchIndexManager( |
18 | | - endpoint=your_search_endpoint, |
19 | | - credential=your_credentials, |
20 | | - index_name=your_index_name, |
21 | | - dimensions=100, |
22 | | - model=your_embedding_model, |
23 | | - deployment_name=your_embedding_model, |
24 | | - embedding_endpoint=your_search_endpoint_url, |
25 | | - embed_api_key=embed_api_key, |
26 | | - embedding_client=embedding_client |
27 | | -) |
28 | | -search_index_manager.build_embeddings_file( |
29 | | - input_directory=input_directory, |
30 | | - output_file=output_directory, |
31 | | - sentences_per_embedding=10 |
32 | | -) |
33 | | -``` |
34 | | -- Make sure to replace `your_search_endpoint`, `your_credentials`, `your_index_name`, and `embedding_client` with your own Azure service details. |
35 | | -- `your_embedding_model` is the model, used to build embeddings. |
36 | | -- `your_search_endpoint_url` is the url of emedding endpoint, which will be used to create the vectorizer, and `embed_api_key` is the API key to access it. |
37 | | -- Your input data should be placed in the folder specified by `input_directory`. |
38 | | -- `sentences_per_embedding` parameter specifies the number of sentences used to construct the embedding. The larger this number, the broader the context that will be identified during the similarity search. |
39 | | - |
40 | | -## Deploying the Application with AI index search enabled |
41 | | -To deploy your application using the AI index search feature, set the following environment variables locally: |
42 | | -In power shell: |
43 | | -``` |
44 | | -$env:USE_AZURE_AI_SEARCH_SERVICE="true" |
45 | | -$env:AZURE_AI_SEARCH_INDEX_NAME="index_sample" |
46 | | -$env:AZURE_AI_EMBED_DEPLOYMENT_NAME="text-embedding-3-small" |
47 | | -``` |
48 | | - |
49 | | -In bash: |
50 | | -``` |
51 | | -export USE_AZURE_AI_SEARCH_SERVICE="true" |
52 | | -export AZURE_AI_SEARCH_INDEX_NAME="index_sample" |
53 | | -export AZURE_AI_EMBED_DEPLOYMENT_NAME="text-embedding-3-small" |
54 | | -``` |
55 | | - |
56 | | -In cmd: |
57 | | -``` |
58 | | -set USE_AZURE_AI_SEARCH_SERVICE=true |
59 | | -set AZURE_AI_SEARCH_INDEX_NAME=index_sample |
60 | | -set AZURE_AI_EMBED_DEPLOYMENT_NAME=text-embedding-3-small |
61 | | -``` |
62 | | - |
63 | | -- `USE_AZURE_AI_SEARCH_SERVICE`: Enables or disables (default) index search. |
64 | | -- `AZURE_AI_SEARCH_INDEX_NAME`: The Azure Search Index the application will use. |
65 | | -- `AZURE_AI_EMBED_DEPLOYMENT_NAME`: The Azure embedding deployment used to create embeddings. |
66 | | - |
67 | | -## Creating the Azure Search Index |
68 | | - |
69 | | -To utilize index search, you must have an Azure search index. By default, the application uses `index_sample` as the index name. You can create an index either by following these official Azure [instructions](https://learn.microsoft.com/azure/ai-services/agents/how-to/tools/azure-ai-search?tabs=azurecli%2Cpython&pivots=overview-azure-ai-search), or programmatically with the provided helper methods: |
70 | | -```python |
71 | | -# Create Azure Search Index (if it does not yet exist) |
72 | | -await search_index_manager.create_index(raise_on_error=True) |
73 | | - |
74 | | -# Upload embeddings to the index |
75 | | -await search_index_manager.upload_documents(embeddings_path) |
76 | | -``` |
77 | | -**Important:** If you have already created the index before deploying your application, the system will skip this step and directly use your existing Azure Search Index. The parameter `vector_index_dimensions` is only required if dimension information was not already provided when initially constructing the `SearchIndexManager` object. |
| 59 | +- Drop your `.md` and `.pdf` files into `src/files` (or point the upload step to your folder). |
| 60 | +- If you already ran `azd up` and the `documents` container exists, delete that container to clear old files before redeploying. |
| 61 | +- Run `azd up` or `azd deploy` to recreate the container and upload your files. |
| 62 | +- After deployment, run the indexers (from the Azure Portal or CLI) so the new content is processed into the search index. |
| 63 | + |