articles/search/tutorial-rag-build-solution-pipeline.md
Lines changed: 58 additions & 3 deletions
@@ -46,7 +46,7 @@ If you don't have an Azure subscription, create a [free account](https://azure.m
46
46
47
47
## Provide the index schema
48
48
49
-
Here's the index schema from the [previous tutorial](search\tutorial-rag-build-solution-index-schema.md). It's organized around vectorized and nonvectorized chunks. It includes a `locations` field that stores AI-generated content created by the skillset.
49
+
Here's the index schema from the [previous tutorial](tutorial-rag-build-solution-index-schema.md). It's organized around vectorized and nonvectorized chunks. It includes a `locations` field that stores AI-generated content created by the skillset.
An indexer is the component that sets all of the processes in motion. You can create an indexer in a disabled state, but by default it runs immediately. In this tutorial, you create and run the indexer to retrieve the data from Blob storage, execute the skills (including chunking and vectorization), and load the index.
231
+
232
+
The indexer takes several minutes to run. When it's done, you can move on to the final step: querying your index.
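Because the indexer runs asynchronously, it can help to poll its status before you query. The following is a minimal sketch and not part of the tutorial's code: `wait_for_indexer` and its parameters are hypothetical names, and the status strings (`inProgress`, `success`) are assumed to match what `SearchIndexerClient.get_indexer_status` reports for the most recent run.

```python
import time
from typing import Callable, Optional


def wait_for_indexer(
    get_status: Callable[[], Optional[str]],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the indexer's last run leaves the in-progress state.

    get_status is a hypothetical callable that returns the status of the
    most recent run as a string, or None if no run has been recorded yet.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status is not None and status != "inProgress":
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Indexer still running after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Assumed wiring to the Azure SDK (an illustration, not the tutorial's code):
#
#   def get_status():
#       result = indexer_client.get_indexer_status(indexer_name).last_result
#       return result.status if result else None
#
#   print(wait_for_indexer(get_status))
```

Passing the status lookup in as a callable keeps the polling logic independent of any particular client, so you can try it with a stub before pointing it at a real service.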
233
+
230
234
```python
231
235
from azure.search.documents.indexes.models import (
232
236
SearchIndexer,
@@ -259,6 +263,8 @@ print(f' {indexer_name} is created and running. Give the indexer a few minutes b
259
263
260
264
## Run hybrid search to check results
261
265
266
+
Send a query to confirm that your index is operational. A hybrid query is useful here because it exercises both text search and vector search in a single request.
267
+
262
268
```python
263
269
from azure.search.documents import SearchClient
264
270
from azure.search.documents.models import VectorizableTextQuery
This query returns a single match (`top=1`) consisting of the one chunk determined by the search engine to be the most relevant. Results from the query should look similar to the following example:
292
+
293
+
```
294
+
Score: 0.03306011110544205
295
+
Content: national Aeronautics and Space Administration
296
+
297
+
earth Science
298
+
299
+
NASA Headquarters
300
+
301
+
300 E Street SW
302
+
303
+
Washington, DC 20546
304
+
305
+
www.nasa.gov
306
+
307
+
np-2018-05-2546-hQ
308
+
```
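The score above is consistent with reciprocal rank fusion (RRF), the method Azure AI Search uses to merge the text and vector rankings of a hybrid query. The sketch below is illustrative only: `rrf_score` is a hypothetical helper, and the constant `k=60` and the zero-based ranks are assumptions inferred from the score shown, not a statement of the service's internals.

```python
def rrf_score(ranks, k=60):
    """Sum reciprocal ranks across result lists.

    ranks holds the document's zero-based position in each ranked list
    (assumption: first place is rank 0, so it contributes 1/k).
    """
    return sum(1.0 / (k + rank) for rank in ranks)


# A chunk ranked first in one list and second in the other.
print(round(rrf_score([0, 1]), 5))  # 0.03306
```

Under these assumptions, a chunk that ranks first in one list and second in the other scores about 0.03306, which matches the sample output to the displayed precision. First place in both lists would give about 0.03333.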
309
+
310
+
Try a few more queries to get a sense of what the search engine returns directly, so that you can compare it with an LLM-enabled response. Rerun the previous script with this query: "how much of the earth is covered in water?"
311
+
312
+
Results from this second query should look similar to the following results, which are lightly edited for concision.
313
+
314
+
This example makes it easier to see that chunks are returned verbatim, and how keyword and similarity search identify top matches. The chunk does contain information about water and coverage over the earth, but it doesn't answer the query. Semantic ranking could surface a better answer, but as a next step, let's see how to connect Azure AI Search to an LLM for conversational search.
315
+
316
+
```
317
+
Score: 0.03333333507180214
318
+
Content:
319
+
320
+
Land of Lakes
321
+
Canada
322
+
323
+
During the last Ice Age, nearly all of Canada was covered by a massive ice sheet. Thousands of years later, the landscape still shows
324
+
325
+
the scars of that icy earth-mover. Surfaces that were scoured by retreating ice and flooded by Arctic seas are now dotted with
326
+
327
+
millions of lakes, ponds, and streams. In this false-color view from the Terra satellite, water is various shades of blue, green, tan, and
328
+
329
+
black, depending on the amount of suspended sediment and phytoplankton; vegetation is red.
330
+
331
+
The region of Nunavut Territory is sometimes referred to as the “Barren Grounds,” as it is nearly treeless and largely unsuitable for
332
+
333
+
agriculture. The ground is snow-covered for much of the year, and the soil typically remains frozen (permafrost) even during the
334
+
335
+
summer thaw. Nonetheless, this July 2001 image shows plenty of surface vegetation in midsummer, including lichens, mosses,
336
+
337
+
shrubs, and grasses. The abundant fresh water also means the area is teeming with flies and mosquitoes.
In this example, the answer is based on a single input (`top=1`) consisting of the one chunk determined by the search engine to be the most relevant. Results from the query should look similar to the following example.
81
+
82
+
```
83
+
About 72% of the Earth's surface is covered in water, according to page-79.pdf. The provided sources do not give further information on this topic.
84
+
```
85
+
86
+
Run the same query again after setting `top=3`. When you increase the inputs, the model returns different results each time, even if the query doesn't change. Here's one example of what the model returns after increasing the inputs to 3.
87
+
88
+
```
89
+
About 71% of the earth is covered by water, while the remaining 29% is land. Canada has numerous water bodies like lakes, ponds, and streams, giving it a unique landscape. The Nunavut territory is unsuitable for agriculture due to being snow-covered most of the year and frozen during the summer thaw. Don Juan Pond in the McMurdo Dry Valleys of Antarctica is the saltiest body of water on earth with a salinity level over 40%, much higher than the Dead Sea and Great Salt Lake. It rarely snows in the valley and Don Juan's calcium chloride–rich waters rarely freeze. NASA studies our planet's physical processes, including the water cycle, carbon cycle, ocean circulation, heat movement, and light interaction. NASA has a unique vantage point of observing the earth and making sense of it from space.
90
+
```
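Changing `top` changes how many chunks the model sees as grounding data. As a rough sketch of that step, the hypothetical `format_sources` helper below joins the top results into one sources string for a prompt. The field names `title` and `chunk`, the separator, and the sample documents are assumptions for illustration, not the tutorial's actual code or data.

```python
def format_sources(results, top=1):
    """Join the first `top` search results into one grounding string."""
    selected = list(results)[:top]
    return "\n=================\n".join(
        f"TITLE: {doc['title']}, CONTENT: {doc['chunk']}" for doc in selected
    )


# Hypothetical results, ordered by relevance as a search client would return them.
sample_results = [
    {"title": "page-79.pdf", "chunk": "First chunk of text."},
    {"title": "page-12.pdf", "chunk": "Second chunk of text."},
    {"title": "page-40.pdf", "chunk": "Third chunk of text."},
]

print(format_sources(sample_results, top=1))
print(format_sources(sample_results, top=3).count("TITLE:"))
```

With `top=3`, the prompt carries three chunks instead of one, which is why the second answer above draws on several loosely related passages.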
91
+
92
+
93
+
<!-- In this tutorial, learn how to send queries and prompts to a chat model for generative search.
18
94
19
95
Objective:
20
96
@@ -34,7 +110,7 @@ Tasks:
34
110
- H2 Set up clients and configure access (to the chat model)
35
111
- H2 Query using text, with a filter
36
112
- H2 Query using vectors and text-to-vector conversion at query time (not sure what the code looks like for this)
37
-
- H2 Query parent-child two indexes (unclear how to do this, Carey said query on child, do a lookup query on parent)
113
+
- H2 Query parent-child two indexes (unclear how to do this, Carey said query on child, do a lookup query on parent)-->