articles/search/search-get-started-portal-import-vectors.md
Lines changed: 55 additions & 25 deletions
@@ -9,7 +9,7 @@ ms.custom:
9
9
- build-2024
10
10
- ignite-2024
11
11
ms.topic: quickstart
12
-
ms.date: 11/20/2024
12
+
ms.date: 11/22/2024
13
13
---
14
14
15
15
# Quickstart: Vectorize text and images by using the Azure portal
@@ -22,7 +22,7 @@ This quickstart helps you get started with [integrated vectorization](vector-sea
22
22
23
23
+ [An Azure AI Search service](search-create-service-portal.md) in the same region as Azure AI. We recommend the Basic tier or higher.
24
24
25
-
+ [A supported data source](#supported-data-sources).
25
+
+ [A supported data source](#supported-data-sources) with the [Health Plan PDF](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/health-plan) sample documents.
@@ -333,19 +333,21 @@ Chunking is built in and nonconfigurable. The effective settings are:
333
333
334
334
1. Select the checkbox that acknowledges the billing effects of using these resources.
335
335
336
+
:::image type="content" source="media/search-get-started-portal-import-vectors/vectorize-text.png" alt-text="Screenshot of the vectorize text page in the wizard.":::
337
+
336
338
1. Select **Next**.
337
339
338
340
## Vectorize and enrich your images
339
341
340
-
The health plan PDFs don't include images, so you can skip this step.
342
+
The health plan PDFs include a corporate logo, but otherwise there are no images. You can skip this step if you're using the sample documents.
341
343
342
-
However, if you work with content that includes images, you can apply AI in two ways:
344
+
However, if you work with content that includes useful images, you can apply AI in two ways:
343
345
344
346
+ Use a supported image embedding model from the catalog, or choose the Azure AI Vision multimodal embeddings API to vectorize images.
345
347
346
348
+ Use optical character recognition (OCR) to recognize text in images. This option invokes the [OCR skill](cognitive-search-skill-ocr.md) to read text from images.
347
349
348
-
Azure AI Search and your Azure AI resource must be in the same region.
350
+
Azure AI Search and your Azure AI resource must be in the same region or configured for [keyless billing connections](cognitive-search-attach-cognitive-services.md).
349
351
350
352
1. On the **Vectorize your images** page, specify the kind of connection the wizard should make. For image vectorization, the wizard can connect to embedding models in Azure AI Studio or Azure AI Vision.
351
353
@@ -357,6 +359,8 @@ Azure AI Search and your Azure AI resource must be in the same region.
357
359
358
360
1. Select the checkbox that acknowledges the billing effects of using these resources.
359
361
362
+
:::image type="content" source="media/search-get-started-portal-import-vectors/vectorize-images.png" alt-text="Screenshot of the vectorize images page in the wizard.":::
363
+
360
364
1. Select **Next**.
361
365
362
366
## Add semantic ranking
@@ -371,12 +375,12 @@ Key points about this step:
371
375
+ You can add fields, but you can't delete or modify generated fields.
On the **Advanced settings** page, you can optionally add new fields. By default, the wizard generates the following fields with these attributes:
378
+
On the **Advanced settings** page, you can optionally add new fields, assuming the data source provides metadata or fields that aren't picked up on the first pass. By default, the wizard generates the following fields with these attributes:
375
379
376
380
| Field | Applies to | Description |
377
381
|-------|------------|-------------|
378
382
| chunk_id | Text and image vectors | Generated string field. Searchable, retrievable, sortable. This is the document key for the index. |
379
-
|parent_id| Text vectors | Generated string field. Retrievable, filterable. Identifies the parent document from which the chunk originates. |
383
+
|text_parent_id| Text vectors | Generated string field. Retrievable, filterable. Identifies the parent document from which the chunk originates. |
380
384
| chunk | Text and image vectors | String field. Human readable version of the data chunk. Searchable and retrievable, but not filterable, facetable, or sortable. |
381
385
| title | Text and image vectors | String field. Human readable document title or page title or page number. Searchable and retrievable, but not filterable, facetable, or sortable. |
382
386
| text_vector | Text vectors | Collection(Edm.Single). Vector representation of the chunk. Searchable and retrievable, but not filterable, facetable, or sortable. |
@@ -419,51 +423,77 @@ Search Explorer accepts text strings as input and then vectorizes the text for v
419
423
420
424
1. In the Azure portal, go to **Search Management** > **Indexes**, and then select the index that you created.
421
425
422
-
1. Optionally, select **Query options** and hide vector values in search results. This step makes your search results easier to read.
426
+
1. Select **Query options** and hide vector values in search results. This step makes your search results easier to read.
423
427
424
428
:::image type="content" source="media/search-get-started-portal-import-vectors/query-options.png" alt-text="Screenshot of the button for query options.":::
425
429
426
430
1. On the **View** menu, select **JSON view** so that you can enter text for your vector query in the `text` vector query parameter.
427
431
428
432
:::image type="content" source="media/search-get-started-portal-import-vectors/select-json-view.png" alt-text="Screenshot of the menu command for opening the JSON view.":::
429
433
430
-
The wizard offers a default query that issues a vector query on the `vector` field and returns the five nearest neighbors. If you opted to hide vector values, your default query includes a `select` statement that excludes the `vector` field from search results.
434
+
The default query is an empty search (`"*"`) that includes parameters for returning the number of matches. It's a hybrid query that runs text and vector queries in parallel and applies semantic ranking. It uses the `select` statement to specify which fields to return in the results.
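
As a rough illustration of the query shape described here, the following Python sketch assembles a request body with an empty search, a match count, a text-to-vector query, semantic ranking, and a `select` list. It's not the wizard's actual default query. The helper name `build_default_query`, the semantic configuration name, and the chosen field lists are assumptions made for this example, so compare them against the JSON that Search Explorer shows for your index.

```python
import json


def build_default_query(select_fields, vector_fields, semantic_config):
    """Assemble a hybrid query body (hypothetical helper, not part of any SDK)."""
    return {
        "search": "*",   # empty text search
        "count": True,   # ask for the number of matches
        "vectorQueries": [
            {
                "kind": "text",  # the service vectorizes this text at query time
                "text": "*",
                "fields": ",".join(vector_fields),
            }
        ],
        "queryType": "semantic",                   # turn on semantic ranking
        "semanticConfiguration": semantic_config,  # name is an assumption
        "select": ",".join(select_fields),         # fields returned in results
    }


query = build_default_query(
    select_fields=["chunk_id", "text_parent_id", "chunk", "title"],
    vector_fields=["text_vector"],
    semantic_config="my-semantic-config",  # hypothetical configuration name
)

# Paste the printed JSON into the JSON view of Search Explorer to try it.
print(json.dumps(query, indent=2))
```

Leaving vector fields out of `select` has the same effect as hiding vector values under **Query options**: the results stay readable.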
:::image type="content" source="media/search-get-started-portal-import-vectors/search-results.png" alt-text="Screenshot of search results.":::
451
481
452
-
Five matches should appear. Each document is a chunk of the original PDF. The `title` field shows which PDF the chunk comes from.
482
+
Each document is a chunk of the original PDF. The `title` field shows which PDF the chunk comes from. Each `chunk` is quite long. You can copy and paste one into a text editor to read the entire value.
453
483
454
-
1. To see all of the chunks from a specific document, add a filter for the `title` field for a specific PDF:
484
+
1. To see all of the chunks from a specific document, add a filter for the `title_parent_id` field for a specific PDF. You can check the **Fields** tab of your index to confirm this field is filterable.
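
A filter like the one in this step can be sketched in Python as follows. The helper `odata_eq_filter` is hypothetical, and the value `Benefit_Options.pdf` is an assumed example. The value that your index stores in the field might be a file name or a generated ID, so copy it from an actual search result before you filter on it.

```python
import json


def odata_eq_filter(field, value):
    """Build an OData equality filter, doubling single quotes in the value."""
    escaped = value.replace("'", "''")
    return f"{field} eq '{escaped}'"


filtered_query = {
    "search": "*",
    "count": True,
    # Field name taken from the step above. Confirm that it's filterable
    # on the Fields tab. The value is an assumed example.
    "filter": odata_eq_filter("title_parent_id", "Benefit_Options.pdf"),
    "select": "chunk_id,title,chunk",
}

print(json.dumps(filtered_query, indent=2))
```

Doubling single quotes is the standard OData way to escape them inside a string literal, which matters for document names that contain an apostrophe.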