Commit 14de7d8

Refresh quickstart for BYOE
1 parent 4135a6f commit 14de7d8

4 files changed

+55
-25
lines changed


articles/search/search-get-started-portal-import-vectors.md

Lines changed: 55 additions & 25 deletions
@@ -9,7 +9,7 @@ ms.custom:
 - build-2024
 - ignite-2024
 ms.topic: quickstart
-ms.date: 11/20/2024
+ms.date: 11/22/2024
 ---

 # Quickstart: Vectorize text and images by using the Azure portal
@@ -22,7 +22,7 @@ This quickstart helps you get started with [integrated vectorization](vector-sea

 + [An Azure AI Search service](search-create-service-portal.md) in the same region as Azure AI. We recommend the Basic tier or higher.

-+ [A supported data source](#supported-data-sources).
++ [A supported data source](#supported-data-sources) with the [Health Plan PDF](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/health-plan) sample documents.

 + [A supported embedding model](#supported-embedding-models).

@@ -333,19 +333,21 @@ Chunking is built in and nonconfigurable. The effective settings are:

 1. Select the checkbox that acknowledges the billing effects of using these resources.

+:::image type="content" source="media/search-get-started-portal-import-vectors/vectorize-text.png" alt-text="Screenshot of the vectorize text page in the wizard.":::
+
 1. Select **Next**.

 ## Vectorize and enrich your images

-The health plan PDFs don't include images, so you can skip this step.
+The health plan PDFs include a corporate logo, but otherwise there are no images. You can skip this step if you're using the sample documents.

-However, if you work with content that includes images, you can apply AI in two ways:
+However, if you work with content that includes useful images, you can apply AI in two ways:

 + Use a supported image embedding model from the catalog, or choose the Azure AI Vision multimodal embeddings API to vectorize images.

 + Use optical character recognition (OCR) to recognize text in images. This option invokes the [OCR skill](cognitive-search-skill-ocr.md) to read text from images.

-Azure AI Search and your Azure AI resource must be in the same region.
+Azure AI Search and your Azure AI resource must be in the same region or configured for [keyless billing connections](cognitive-search-attach-cognitive-services.md).

 1. On the **Vectorize your images** page, specify the kind of connection the wizard should make. For image vectorization, the wizard can connect to embedding models in Azure AI Studio or Azure AI Vision.

@@ -357,6 +359,8 @@ Azure AI Search and your Azure AI resource must be in the same region.

 1. Select the checkbox that acknowledges the billing effects of using these resources.

+:::image type="content" source="media/search-get-started-portal-import-vectors/vectorize-images.png" alt-text="Screenshot of the vectorize images page in the wizard.":::
+
 1. Select **Next**.

 ## Add semantic ranking
@@ -371,12 +375,12 @@ Key points about this step:
 + You can add fields, but you can't delete or modify generated fields.
 + Document parsing mode creates chunks (one search document per chunk).

-On the **Advanced settings** page, you can optionally add new fields. By default, the wizard generates the following fields with these attributes:
+On the **Advanced settings** page, you can optionally add new fields, assuming the data source provides metadata or fields that aren't picked up on the first pass. By default, the wizard generates the following fields with these attributes:

 | Field | Applies to | Description |
 |-------|------------|-------------|
 | chunk_id | Text and image vectors | Generated string field. Searchable, retrievable, sortable. This is the document key for the index. |
-| parent_id | Text vectors | Generated string field. Retrievable, filterable. Identifies the parent document from which the chunk originates. |
+| text_parent_id | Text vectors | Generated string field. Retrievable, filterable. Identifies the parent document from which the chunk originates. |
 | chunk | Text and image vectors | String field. Human readable version of the data chunk. Searchable and retrievable, but not filterable, facetable, or sortable. |
 | title | Text and image vectors | String field. Human readable document title or page title or page number. Searchable and retrievable, but not filterable, facetable, or sortable. |
 | text_vector | Text vectors | Collection(Edm.Single). Vector representation of the chunk. Searchable and retrievable, but not filterable, facetable, or sortable. |
@@ -419,51 +423,77 @@ Search Explorer accepts text strings as input and then vectorizes the text for v

 1. In the Azure portal, go to **Search Management** > **Indexes**, and then select the index that you created.

-1. Optionally, select **Query options** and hide vector values in search results. This step makes your search results easier to read.
+1. Select **Query options** and hide vector values in search results. This step makes your search results easier to read.

 :::image type="content" source="media/search-get-started-portal-import-vectors/query-options.png" alt-text="Screenshot of the button for query options.":::

 1. On the **View** menu, select **JSON view** so that you can enter text for your vector query in the `text` vector query parameter.

 :::image type="content" source="media/search-get-started-portal-import-vectors/select-json-view.png" alt-text="Screenshot of the menu command for opening the JSON view.":::

-The wizard offers a default query that issues a vector query on the `vector` field and returns the five nearest neighbors. If you opted to hide vector values, your default query includes a `select` statement that excludes the `vector` field from search results.
+The default query is an empty search (`"*"`) but includes parameters for returning the number of matches. It's a hybrid query that runs text and vector queries in parallel. It includes semantic ranking. It specifies which fields to return in the results through the `select` statement.

 ```json
-{
-  "select": "chunk_id,parent_id,chunk,title",
+{
+  "search": "*",
+  "count": true,
   "vectorQueries": [
-    {
-      "kind": "text",
-      "text": "*",
-      "k": 5,
-      "fields": "vector"
-    }
-  ]
-}
+    {
+      "kind": "text",
+      "text": "*",
+      "fields": "text_vector,image_vector"
+    }
+  ],
+  "queryType": "semantic",
+  "semanticConfiguration": "my-demo-semantic-configuration",
+  "captions": "extractive",
+  "answers": "extractive|count-3",
+  "queryLanguage": "en-us",
+  "select": "chunk_id,text_parent_id,chunk,title,image_parent_id"
+}
 ```

-1. For the `text` value, replace the asterisk (`*`) with a question related to health plans, such as `Which plan has the lowest deductible?`.
+1. Replace both asterisk (`*`) placeholders with a question related to health plans, such as `Which plan has the lowest deductible?`.
+
+```json
+{
+  "search": "Which plan has the lowest deductible?",
+  "count": true,
+  "vectorQueries": [
+    {
+      "kind": "text",
+      "text": "Which plan has the lowest deductible?",
+      "fields": "text_vector,image_vector"
+    }
+  ],
+  "queryType": "semantic",
+  "semanticConfiguration": "my-demo-semantic-configuration",
+  "captions": "extractive",
+  "answers": "extractive|count-3",
+  "queryLanguage": "en-us",
+  "select": "chunk_id,text_parent_id,chunk,title"
+}
+```

 1. Select **Search** to run the query.

 :::image type="content" source="media/search-get-started-portal-import-vectors/search-results.png" alt-text="Screenshot of search results.":::

-Five matches should appear. Each document is a chunk of the original PDF. The `title` field shows which PDF the chunk comes from.
+Each document is a chunk of the original PDF. The `title` field shows which PDF the chunk comes from. Each `chunk` is quite long. You can copy and paste one into a text editor to read the entire value.

-1. To see all of the chunks from a specific document, add a filter for the `title` field for a specific PDF:
+1. To see all of the chunks from a specific document, add a filter for the `text_parent_id` field for a specific PDF. You can check the **Fields** tab of your index to confirm this field is filterable.

 ```json
 {
-  "select": "chunk_id,parent_id,chunk,title",
-  "filter": "title eq 'Benefit_Options.pdf'",
+  "select": "chunk_id,text_parent_id,chunk,title",
+  "filter": "text_parent_id eq 'aHR0cHM6Ly9oZWlkaXN0c3RvcmFnZWRlbW9lYXN0dXMuYmxvYi5jb3JlLndpbmRvd3MubmV0L2hlYWx0aC1wbGFuLXBkZnMvTm9ydGh3aW5kX1N0YW5kYXJkX0JlbmVmaXRzX0RldGFpbHMucGRm0'",
   "count": true,
   "vectorQueries": [
     {
       "kind": "text",
       "text": "*",
       "k": 5,
-      "fields": "vector"
+      "fields": "text_vector"
     }
   ]
 }
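
Note on the updated hybrid query: the request body that this commit adds to the quickstart can also be assembled in code. The following is a minimal Python sketch under stated assumptions, not part of the commit. The helper name `build_hybrid_query` and its parameters are hypothetical; the field names and the semantic configuration name are copied from the JSON in the diff and will differ in your own index. The sketch only builds and prints the JSON body; it doesn't call the search service.

```python
import json


def build_hybrid_query(
    question,
    vector_fields=("text_vector", "image_vector"),
    select=("chunk_id", "text_parent_id", "chunk", "title"),
    semantic_configuration="my-demo-semantic-configuration",
):
    """Return a request body shaped like the hybrid query in the diff.

    The same question is used for the keyword search and the text-to-vector
    query, mirroring the "replace both asterisks" step.
    """
    return {
        "search": question,
        "count": True,
        "vectorQueries": [
            {
                "kind": "text",
                "text": question,
                # Comma-separated list of vector fields to search.
                "fields": ",".join(vector_fields),
            }
        ],
        "queryType": "semantic",
        "semanticConfiguration": semantic_configuration,
        "captions": "extractive",
        "answers": "extractive|count-3",
        "queryLanguage": "en-us",
        "select": ",".join(select),
    }


if __name__ == "__main__":
    body = build_hybrid_query("Which plan has the lowest deductible?")
    print(json.dumps(body, indent=2))
```

Keeping the question in one variable avoids the easy mistake of updating `search` but leaving the vector query's `text` as `*`.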

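
Note on the `text_parent_id` filter value: the long string in the filter example looks like a URL-safe Base64 token with a trailing digit that counts the stripped `=` padding characters. That reading is an assumption; the diff doesn't say how the value is encoded. Under that assumption, a sketch like the following converts between a readable string and a token, and builds the matching filter expression. All function names are hypothetical.

```python
import base64


def decode_parent_id(token: str) -> str:
    """Decode a URL-safe Base64 token whose last character is a padding count.

    Assumption: the final digit records how many '=' characters were stripped.
    """
    pad_count = int(token[-1])
    body = token[:-1]
    return base64.urlsafe_b64decode(body + "=" * pad_count).decode("utf-8")


def encode_parent_id(text: str) -> str:
    """Inverse of decode_parent_id under the same assumption."""
    encoded = base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")
    stripped = encoded.rstrip("=")
    return stripped + str(len(encoded) - len(stripped))


def parent_filter(token: str) -> str:
    """Build an OData filter string; single quotes are doubled to escape them."""
    return "text_parent_id eq '" + token.replace("'", "''") + "'"


if __name__ == "__main__":
    # Hypothetical document URL, used only to show the round trip.
    token = encode_parent_id("https://example.com/docs/sample.pdf")
    print(token)
    print(decode_parent_id(token))
    print(parent_filter(token))
```

In practice, copying the `text_parent_id` value from an earlier search result is the simplest way to get a token for the filter; decoding it is mainly useful for checking which source document it points to.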