Commit e02bc15

Merge pull request #262172 from HeidiSteen/heidist-gh2
[azure search] Addressed import vector wizard GH issues
2 parents 88e01ac + d7e5567 commit e02bc15

5 files changed: +57 -14 lines

articles/search/search-get-started-portal-import-vectors.md

Lines changed: 57 additions & 14 deletions
@@ -9,22 +9,22 @@ ms.service: cognitive-search
 ms.custom:
 - ignite-2023
 ms.topic: quickstart
-ms.date: 11/29/2023
+ms.date: 01/02/2024
 ---
 
 # Quickstart: Integrated vectorization (preview)
 
-> [!IMPORTANT]
+> [!IMPORTANT]
 > **Import and vectorize data** wizard is in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). It targets the [2023-10-01-Preview REST API](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true).
 
-Get started with [integrated vectorization](vector-search-integrated-vectorization.md) using the **Import and vectorize data** wizard in the Azure portal.
+Get started with [integrated vectorization (preview)](vector-search-integrated-vectorization.md) using the **Import and vectorize data** wizard in the Azure portal. This wizard calls an Azure OpenAI text embedding model to vectorize content during indexing and for queries.
 
 In this preview version of the wizard:
 
 + Source data is blob only, using the default parsing mode (one search document per blob).
-+ Index schema is non-configurable. Source fields include `content` (chunked and vectorized), `metadata_storage_name` for title, and a `metadata_storage_path` for the document key.
-+ Vectorization is Azure OpenAI only, using the [HNSW](vector-search-ranking.md) algorithm with defaults.
-+ Chunking is non-configurable. The effective settings are:
++ Index schema is nonconfigurable. Source fields include `content` (chunked and vectorized), `metadata_storage_name` for title, and a `metadata_storage_path` for the document key.
++ Vectorization is Azure OpenAI only (text-embedding-ada-002), using the [HNSW](vector-search-ranking.md) algorithm with defaults.
++ Chunking is nonconfigurable. The effective settings are:
 
 ```json
 textSplitMode: "pages",
@@ -36,7 +36,7 @@ In this preview version of the wizard:
 
 + An Azure subscription. [Create one for free](https://azure.microsoft.com/free/).
 
-+ Azure AI Search, in any region and on any tier. Most existing services support vector search. For a small subset of services created prior to January 2019, an index containing vector fields will fail on creation. In this situation, a new service must be created.
++ Azure AI Search, in any region and on any tier. Most existing services support vector search. For a small subset of services created prior to January 2019, an index containing vector fields fails on creation. In this situation, a new service must be created.
 
 + [Azure OpenAI](https://aka.ms/oai/access) endpoint with a deployment of **text-embedding-ada-002** and an API key or [**Cognitive Services OpenAI User**](/azure/ai-services/openai/how-to/role-based-access-control#azure-openai-roles) permissions to upload data. You can only choose one vectorizer in this preview, and the vectorizer must be Azure OpenAI.
 
@@ -50,6 +50,12 @@ In this preview version of the wizard:
 
 Many customers start with the free service. The free tier is limited to three indexes, three data sources, three skillsets, and three indexers. Make sure you have room for extra items before you begin. This quickstart creates one of each object.
 
+## Check for semantic ranking
+
+This wizard supports semantic ranking, but only on Basic tier and above, and only if semantic ranking is already [enabled on your search service](semantic-how-to-enable-disable.md). If you're using a billable tier, check to see if semantic ranking is enabled.
+
+:::image type="content" source="media/search-get-started-portal-import-vectors/semantic-ranker-enabled.png" alt-text="Screenshot of the semantic ranker configuration page.":::
+
 ## Prepare sample data
 
 This section points you to data that works for this quickstart.
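The free-tier limits and the semantic ranking requirement in this hunk amount to two preflight checks: room for one more index, data source, skillset, and indexer on the free tier, and semantic ranking enabled on a Basic-or-higher service. A minimal Python sketch of that logic, assuming you read the tier name, the semantic setting, and the current object counts from the portal yourself; the `preflight` function, the tier strings, and the count keys are hypothetical and not part of any Azure SDK or REST API.

```python
# Hypothetical preflight check for this quickstart. The function name, tier
# strings, and count keys are illustrative assumptions, not an Azure API.
FREE_TIER_LIMIT = 3  # free tier allows three of each object type
QUICKSTART_OBJECTS = ("index", "data source", "skillset", "indexer")


def preflight(tier: str, semantic_enabled: bool, counts: dict) -> list:
    """Return warnings; an empty list means the quickstart can run as written."""
    warnings = []
    if tier == "free":
        # The quickstart creates one object of each type.
        for kind in QUICKSTART_OBJECTS:
            if counts.get(kind, 0) + 1 > FREE_TIER_LIMIT:
                warnings.append(
                    f"no room for another {kind} (free tier limit is {FREE_TIER_LIMIT})"
                )
        # Semantic ranking in the wizard requires Basic tier or above.
        warnings.append("semantic ranking is unavailable on the free tier")
    elif not semantic_enabled:
        warnings.append("semantic ranking is not enabled on this service")
    return warnings


if __name__ == "__main__":
    print(preflight("free", False, {"index": 3, "indexer": 1}))
    print(preflight("basic", True, {}))
```

Semantic ranking is an optional feature of the wizard, so the semantic warnings only mean that option won't be offered; the capacity warnings are the ones that would block object creation.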
@@ -91,7 +97,7 @@ To get started, browse to your Azure AI Search service in the Azure portal and o
 
 The next step is to connect to a data source to use for the search index.
 
-1. In the **Import data** wizard on the **Connect to your data** tab, expand the **Data Source** dropdown list and select **Azure Blob Storage**.
+1. In the **Import and vectorize data** wizard on the **Connect to your data** tab, expand the **Data Source** dropdown list and select **Azure Blob Storage**.
 
 1. Specify the Azure subscription, storage account, and container that provides the data.
 
@@ -139,17 +145,54 @@ Search explorer accepts text strings as input and then vectorizes the text for v
 
 1. Select your index.
 
-1. Make sure the API version is **2023-10-01-preview**.
+1. Optionally, select **Query options** and hide vector values in search results. This step makes your search results easier to read.
 
-1. Select **JSON view** so that you can enter text for your vector query in the **text** vector query parameter.
+:::image type="content" source="media/search-get-started-portal-import-vectors/query-options.png" alt-text="Screenshot of the query options button.":::
 
-1. Select **Search**.
+1. Select **JSON view** so that you can enter text for your vector query in the **text** vector query parameter.
 
-:::image type="content" source="media/search-get-started-portal-import-vectors/search-results.png" alt-text="Screenshot of search results.":::
+:::image type="content" source="media/search-get-started-portal-import-vectors/select-json-view.png" alt-text="Screenshot of JSON selector.":::
+
+This wizard offers a default query that issues a vector query on the "vector" field, returning the 5 nearest neighbors. If you opted to hide vector values, your default query includes a "select" statement that excludes the vector field from search results.
 
-You should see 84 documents, where each document is a chunk of the original PDF. The title field shows which PDF the chunk comes from.
+```json
+{
+"select": "chunk_id,parent_id,chunk,title",
+"vectorQueries": [
+{
+"kind": "text",
+"text": "*",
+"k": 5,
+"fields": "vector"
+}
+]
+}
+```
+
+1. Replace the text `"*"` with a question related to health plans, such as *"which plan has the lowest deductible"*.
+
+1. Select **Search** to run the query.
+
+:::image type="content" source="media/search-get-started-portal-import-vectors/search-results.png" alt-text="Screenshot of search results.":::
 
-The index definition isn't configurable so you can't filter by "title". To work around this limitation, you could define an index manually, making "title" filterable to get all of the chunks for a single document.
+You should see 5 matches, where each document is a chunk of the original PDF. The title field shows which PDF the chunk comes from.
+
+1. To see all of the chunks from a specific document, add a filter for the title field for a specific PDF:
+
+```json
+{
+"select": "chunk_id,parent_id,chunk,title",
+"filter": "title eq 'Benefit_Options.pdf'",
+"count": true,
+"vectorQueries": [
+{
+"kind": "text",
+"text": "*",
+"k": 5,
+"fields": "vector"
+}
+]
+}
 
 ## Clean up
 
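The JSON view in Search explorer sends these bodies to the Search Documents REST operation. A minimal Python sketch that builds the same request bodies shown in the last hunk (the default vector query, the optional `select` that hides vectors, and the optional `title` filter with `count`) and posts them, assuming key-based authentication. The helpers `build_vector_query`, `odata_string`, and `search` are hypothetical, and the service name, index name, and key are placeholders; the endpoint path and `api-key` header follow the Azure AI Search REST API at the 2023-10-01-Preview version the wizard targets.

```python
import json
import urllib.request
from typing import Optional

# Preview REST API version that the wizard targets.
API_VERSION = "2023-10-01-Preview"


def odata_string(value: str) -> str:
    """Quote a value as an OData string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_vector_query(text: str, k: int = 5, title: Optional[str] = None,
                       hide_vectors: bool = True) -> dict:
    """Build a body for a text-to-vector query on the "vector" field (hypothetical helper)."""
    body = {
        "vectorQueries": [
            # "kind": "text" lets the service vectorize the text with the index's vectorizer.
            {"kind": "text", "text": text, "k": k, "fields": "vector"}
        ]
    }
    if hide_vectors:
        # Return only non-vector fields, as the "hide vector values" option does.
        body["select"] = "chunk_id,parent_id,chunk,title"
    if title is not None:
        # Restrict matches to chunks of one source PDF and request a total count.
        body["filter"] = f"title eq {odata_string(title)}"
        body["count"] = True
    return body


def search(service: str, index: str, api_key: str, body: dict) -> dict:
    """POST the body to the Search Documents endpoint; arguments are placeholders."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/docs/search?api-version={API_VERSION}")
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the filtered body instead of sending it, so no service is contacted.
    print(json.dumps(build_vector_query("*", title="Benefit_Options.pdf"), indent=2))
```

For example, `build_vector_query("which plan has the lowest deductible")` yields the default body with the question in place of `"*"`. Doubling single quotes is how OData string literals escape them, so a file name that contains an apostrophe still produces a valid filter. Because `k` sets how many nearest neighbors a vector query returns, listing every chunk of a longer PDF would likely need a larger `k`.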