articles/search/tutorial-rag-build-solution-index-schema.md (+25 −8)
@@ -8,7 +8,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: tutorial
-ms.date: 09/12/2024
+ms.date: 10/01/2024
 
 ---
@@ -111,8 +111,25 @@ A minimal index for LLM is designed to store chunks of content. It typically inc
 The schema also includes a `locations` field for storing generated content that's created by the [indexing pipeline](tutorial-rag-build-solution-pipeline.md).
 
 ```python
+from azure.identity import DefaultAzureCredential
+from azure.identity import get_bearer_token_provider
+from azure.search.documents.indexes import SearchIndexClient
+from azure.search.documents.indexes.models import (
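As an aside to the diff above, the chunk-oriented schema the hunk refers to can be pictured as the documents it stores. This is a plain-Python sketch under assumptions: the field names (`parent_id`, `chunk_id`, `chunk`, `text_vector`, `locations`) follow the tutorial's minimal RAG schema but aren't quoted from this diff.

```python
# Hypothetical sketch of one document in a chunk-oriented search index.
# Field names are assumptions modeled on the tutorial's schema, not a
# definitive contract.
def make_chunk_doc(parent_id, ordinal, chunk, vector, locations):
    """Build one search document: a chunk of a parent file plus generated metadata."""
    return {
        "parent_id": parent_id,                # source document this chunk came from
        "chunk_id": f"{parent_id}_{ordinal}",  # unique key per chunk
        "chunk": chunk,                        # nonvectorized chunk text
        "text_vector": vector,                 # embedding of the chunk text
        "locations": locations,                # entities added by the indexing pipeline
    }

doc = make_chunk_doc(
    "earth_book_page_1", 0,
    "NASA headquarters is in Washington, D.C.",
    [0.1, 0.2],
    ["Washington, D.C."],
)
print(doc["chunk_id"])  # earth_book_page_1_0
```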
articles/search/tutorial-rag-build-solution-models.md (+2 −2)
@@ -9,7 +9,7 @@ ms.author: heidist
 ms.service: cognitive-search
 ms.topic: tutorial
 ms.custom: references_regions
-ms.date: 09/30/2024
+ms.date: 10/01/2024
 
 ---
@@ -32,7 +32,7 @@ If you don't have an Azure subscription, create a [free account](https://azure.m
 
 - The Azure portal, used to deploy models and configure role assignments in the Azure cloud.
 
-- An **Owner** role on your Azure subscription, necessary for creating role assignments. You use at least three Azure resources in this tutorial. The connections are authenticated using Microsoft Entra ID, which requires the ability to create roles. Role assignments for connecting to models are documented in this article.
+- An **Owner** or **User Access Administrator** role on your Azure subscription, necessary for creating role assignments. You use at least three Azure resources in this tutorial. The connections are authenticated using Microsoft Entra ID, which requires the ability to create roles. Role assignments for connecting to models are documented in this article. If you can't create roles, you can use [API keys](search-security-api-keys.md) instead.
 
 - A model provider, such as [Azure OpenAI](/azure/ai-services/openai/how-to/create-resource), Azure AI Vision via an [Azure AI multi-service account](/azure/ai-services/multi-service-resource), or [Azure AI Studio](https://ai.azure.com/).
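The fallback the added prerequisite describes (prefer Microsoft Entra ID role assignments; fall back to API keys when you can't create roles) amounts to a small decision. A hedged, stdlib-only sketch with hypothetical names, not an SDK API:

```python
def choose_auth(can_create_role_assignments, api_key=None):
    """Pick a connection strategy per the prerequisite above: prefer
    Entra ID role-based access; fall back to an API key if roles
    can't be created. Names here are illustrative only."""
    if can_create_role_assignments:
        return "entra-id-rbac"
    if api_key:
        return "api-key"
    raise ValueError("Need either role-assignment rights or an API key")

print(choose_auth(False, api_key="fake-key"))  # api-key
```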
articles/search/tutorial-rag-build-solution-pipeline.md (+54 −31)
@@ -8,7 +8,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: tutorial
-ms.date: 09/23/2024
+ms.date: 10/01/2024
 
 ---
@@ -19,7 +19,7 @@ Learn how to build an automated indexing pipeline for a RAG solution on Azure AI
 In this tutorial, you:
 
 > [!div class="checklist"]
-> - Provide the index schema from the previous tutorial
+> - Provide the index schema from the previous tutorial
 > - Create a data source connection
 > - Create an indexer
 > - Create a skillset that chunks, vectorizes, and recognizes entities
@@ -53,8 +53,25 @@ Open or create a Jupyter notebook (`.ipynb`) in Visual Studio Code to contain th
 Let's start with the index schema from the [previous tutorial](tutorial-rag-build-solution-index-schema.md). It's organized around vectorized and nonvectorized chunks. It includes a `locations` field that stores AI-generated content created by the skillset.
 
 ```python
+from azure.identity import DefaultAzureCredential
+from azure.identity import get_bearer_token_provider
+from azure.search.documents.indexes import SearchIndexClient
+from azure.search.documents.indexes.models import (
 index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search)
 result = index_client.create_or_update_index(index)
 print(f"{result.name} created")
@@ -101,11 +118,11 @@ In this step, set up the sample data and a connection to Azure Blob Storage. The
 
 The original ebook is large, over 100 pages and 35 MB in size. We broke it up into smaller PDFs, one per page of text, to stay under the [API payload limit](search-limits-quotas-capacity.md#api-request-limits) of 16 MB per API call and also the [AI enrichment data limits](search-limits-quotas-capacity.md#data-limits-ai-enrichment). For simplicity, we omit image vectorization for this exercise.
 
-1. Sign in to the Azure portal and find your Azure Storage account.
+1. Sign in to the [Azure portal](https://portal.azure.com) and find your Azure Storage account.
 
 1. Create a container and upload the PDFs from [earth_book_2019_text_pages](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/nasa-e-book/earth_book_2019_text_pages).
 
-1. Make sure Azure AI Search has **Storage Blob Data Reader** permissions on the resource.
+1. Make sure Azure AI Search has [**Storage Blob Data Reader** permissions](/azure/role-based-access-control/role-assignments-portal) on the resource.
 
 1. Next, in Visual Studio Code, define an indexer data source that provides connection information during indexing.
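The hunk above explains why the 35 MB ebook was split into per-page PDFs: each blob must stay under the 16 MB per-call payload limit. A stdlib-only sketch of that size check, under assumptions (the helper name is hypothetical; only the 16 MB figure comes from the text):

```python
API_PAYLOAD_LIMIT_MB = 16  # per-call indexing limit cited in the tutorial text

def oversized(files_mb, limit_mb=API_PAYLOAD_LIMIT_MB):
    """Return names of files too large to index in a single API call,
    given a mapping of file name -> size in MB."""
    return [name for name, size in files_mb.items() if size > limit_mb]

# The original 35 MB ebook would fail the check; a per-page split passes.
print(oversized({"earth_book_2019.pdf": 35.0, "page_001.pdf": 0.4}))  # ['earth_book_2019.pdf']
```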
@@ -117,8 +134,8 @@ The original ebook is large, over 100 pages and 35 MB in size. We broke it up in
@@ -130,11 +147,15 @@ The original ebook is large, over 100 pages and 35 MB in size. We broke it up in
     print(f"Data source '{data_source.name}' created or updated")
 ```
 
+If you set up a managed identity for Azure AI Search for the connection, the connection string includes a `ResourceId=` suffix. It should look similar to the following example: `"ResourceId=/subscriptions/FAKE-SUBSCRIPTION-ID/resourceGroups/FAKE-RESOURCE-GROUP/providers/Microsoft.Storage/storageAccounts/FAKE-ACCOUNT;"`
+
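The `ResourceId=` connection string added above decomposes into alternating ARM path segments. A stdlib sketch of that decomposition, with a hypothetical helper (not an SDK function) and made-up segment values:

```python
def parse_resource_id(conn):
    """Split a 'ResourceId=/subscriptions/.../storageAccounts/...;' string
    into its ARM path segments. Illustrative only."""
    path = conn.removeprefix("ResourceId=").rstrip(";").strip("/")
    parts = path.split("/")
    # ARM resource paths alternate segment-name/segment-value.
    pairs = dict(zip(parts[::2], parts[1::2]))
    return {
        "subscription": pairs.get("subscriptions", ""),
        "resource_group": pairs.get("resourceGroups", ""),
        "storage_account": pairs.get("storageAccounts", ""),
    }

conn = "ResourceId=/subscriptions/0000/resourceGroups/rg-demo/providers/Microsoft.Storage/storageAccounts/mystorage;"
print(parse_resource_id(conn)["storage_account"])  # mystorage
```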
 ## Create a skillset
 
 Skills are the basis for integrated data chunking and vectorization. At a minimum, you want a Text Split skill to chunk your content, and an embedding skill that creates vector representations of your chunked content.
 
-In this skillset, an extra skill is used to create structured data in the index. The Entity Recognition skill is used to identify locations, which can range from proper names to generic references, such as "ocean" or "mountain". Having structured data gives you more options for creating interesting queries and boosting relevance.
+In this skillset, an extra skill is used to create structured data in the index. The [Entity Recognition skill](cognitive-search-skill-entity-recognition-v3.md) is used to identify locations, which can range from proper names to generic references, such as "ocean" or "mountain". Having structured data gives you more options for creating interesting queries and boosting relevance.
+
+The AZURE_AI_MULTISERVICE_KEY is needed even if you're using role-based access control. Azure AI Search uses the key for billing purposes and it's required unless your workloads stay under the free limit.
 
 ```python
 from azure.search.documents.indexes.models import (
@@ -143,7 +164,7 @@ from azure.search.documents.indexes.models import (
     OutputFieldMappingEntry,
     AzureOpenAIEmbeddingSkill,
     EntityRecognitionSkill,
-    SearchIndexerIndexProjections,
+    SearchIndexerIndexProjection,
     SearchIndexerIndexProjectionSelector,
     SearchIndexerIndexProjectionsParameters,
     IndexProjectionMode,
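To make the Text Split skill's role concrete, here's a hedged, stdlib-only sketch of fixed-size chunking with overlap. The real skill has its own parameters and tokenization, so treat this as an illustration of the technique, not the service's algorithm:

```python
def split_text(text, max_len=2000, overlap=500):
    """Chunk text into windows of max_len characters, each overlapping
    the previous window by `overlap` characters, roughly what a text
    split skill does before embedding."""
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    step = max_len - overlap
    return [text[i:i + max_len] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = split_text("abcdefghij", max_len=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Overlap preserves context across chunk boundaries, which tends to help retrieval when an answer straddles two chunks.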
@@ -171,8 +192,8 @@ split_skill = SplitSkill(
 embedding_skill = AzureOpenAIEmbeddingSkill(
     description="Skill to generate embeddings via Azure OpenAI",
-print(f'{indexer_name} is created and running. Give the indexer a few minutes before running a query.')
+print(f'{indexer_name} is created and running. Give the indexer a few minutes before running a query.')
 ```
 ## Run a query to check results
 
 Send a query to confirm your index is operational. This request converts the text string "`where are the nasa headquarters located?`" into a vector for a vector search. Results consist of the fields in the select statement, some of which are printed as output.
 
+There's no chat or generative AI at this point. The results are verbatim content from your search index.
+
 ```python
 from azure.search.documents import SearchClient
 from azure.search.documents.models import VectorizableTextQuery
 
-# Hybrid Search
-query = "where are the nasa headquarters located?"
+# Vector Search using text-to-vector conversion of the query string
 This query returns a single match (`top=1`) consisting of the one chunk determined by the search engine to be the most relevant. Results from the query should look similar to the following example:
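As a final aside in plain Python (not the SDK call shown in the diff), vector search of this kind ranks chunks by similarity between the query vector and each chunk vector. A minimal cosine-similarity sketch with made-up two-dimensional vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy index: chunk text -> pretend embedding. Real vectors come from an
# embedding model; these values are fabricated for illustration.
index = {
    "NASA headquarters is in Washington, D.C.": [0.9, 0.1],
    "Oceans cover most of Earth's surface.":    [0.1, 0.9],
}
query_vector = [0.8, 0.2]  # pretend vectorization of the query text

top = max(index, key=lambda chunk: cosine(query_vector, index[chunk]))
print(top)  # NASA headquarters is in Washington, D.C.
```

The `top=1` query in the tutorial does the analogous thing server-side: it returns the single chunk whose vector is closest to the vectorized query.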