articles/search/search-get-started-portal-import-vectors.md
Lines changed: 57 additions & 14 deletions
Original file line number
Diff line number
Diff line change
@@ -9,22 +9,22 @@ ms.service: cognitive-search
9
9
ms.custom:
10
10
- ignite-2023
11
11
ms.topic: quickstart
12
-
ms.date: 11/29/2023
12
+
ms.date: 01/02/2024
13
13
---
14
14
15
15
# Quickstart: Integrated vectorization (preview)
16
16
17
-
> [!IMPORTANT]
17
+
> [!IMPORTANT]
18
18
> **Import and vectorize data** wizard is in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). It targets the [2023-10-01-Preview REST API](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true).
19
19
20
-
Get started with [integrated vectorization](vector-search-integrated-vectorization.md) using the **Import and vectorize data** wizard in the Azure portal.
20
+
Get started with [integrated vectorization (preview)](vector-search-integrated-vectorization.md) using the **Import and vectorize data** wizard in the Azure portal. This wizard calls an Azure OpenAI text embedding model to vectorize content during indexing and for queries.
21
21
22
22
In this preview version of the wizard:
23
23
24
24
+ Source data is blob only, using the default parsing mode (one search document per blob).
25
-
+ Index schema is non-configurable. Source fields include `content` (chunked and vectorized), `metadata_storage_name` for title, and a `metadata_storage_path` for the document key.
26
-
+ Vectorization is Azure OpenAI only, using the [HNSW](vector-search-ranking.md) algorithm with defaults.
27
-
+ Chunking is non-configurable. The effective settings are:
25
+
+ Index schema is nonconfigurable. Source fields include `content` (chunked and vectorized), `metadata_storage_name` for title, and a `metadata_storage_path` for the document key.
26
+
+ Vectorization is Azure OpenAI only (text-embedding-ada-002), using the [HNSW](vector-search-ranking.md) algorithm with defaults.
27
+
+ Chunking is nonconfigurable. The effective settings are:
28
28
29
29
```json
30
30
textSplitMode: "pages",
@@ -36,7 +36,7 @@ In this preview version of the wizard:
36
36
37
37
+ An Azure subscription. [Create one for free](https://azure.microsoft.com/free/).
38
38
39
-
+ Azure AI Search, in any region and on any tier. Most existing services support vector search. For a small subset of services created prior to January 2019, an index containing vector fields will fail on creation. In this situation, a new service must be created.
39
+
+ Azure AI Search, in any region and on any tier. Most existing services support vector search. For a small subset of services created prior to January 2019, an index containing vector fields fails on creation. In this situation, a new service must be created.
40
40
41
41
+ [Azure OpenAI](https://aka.ms/oai/access) endpoint with a deployment of **text-embedding-ada-002** and an API key or [**Cognitive Services OpenAI User**](/azure/ai-services/openai/how-to/role-based-access-control#azure-openai-roles) permissions to upload data. You can only choose one vectorizer in this preview, and the vectorizer must be Azure OpenAI.
42
42
@@ -50,6 +50,12 @@ In this preview version of the wizard:
50
50
51
51
Many customers start with the free service. The free tier is limited to three indexes, three data sources, three skillsets, and three indexers. Make sure you have room for extra items before you begin. This quickstart creates one of each object.
52
52
53
+
## Check for semantic ranking
54
+
55
+
This wizard supports semantic ranking, but only on Basic tier and above, and only if semantic ranking is already [enabled on your search service](semantic-how-to-enable-disable.md). If you're using a billable tier, check to see if semantic ranking is enabled.
56
+
57
+
:::image type="content" source="media/search-get-started-portal-import-vectors/semantic-ranker-enabled.png" alt-text="Screenshot of the semantic ranker configuration page.":::
58
+
53
59
## Prepare sample data
54
60
55
61
This section points you to data that works for this quickstart.
@@ -91,7 +97,7 @@ To get started, browse to your Azure AI Search service in the Azure portal and o
91
97
92
98
The next step is to connect to a data source to use for the search index.
93
99
94
-
1. In the **Import data** wizard on the **Connect to your data** tab, expand the **Data Source** dropdown list and select **Azure Blob Storage**.
100
+
1. In the **Import and vectorize data** wizard on the **Connect to your data** tab, expand the **Data Source** dropdown list and select **Azure Blob Storage**.
95
101
96
102
1. Specify the Azure subscription, storage account, and container that provides the data.
97
103
@@ -139,17 +145,54 @@ Search explorer accepts text strings as input and then vectorizes the text for v
139
145
140
146
1. Select your index.
141
147
142
-
1. Make sure the API version is **2023-10-01-preview**.
148
+
1. Optionally, select **Query options** and hide vector values in search results. This step makes your search results easier to read.
143
149
144
-
1. Select **JSON view** so that you can enter text for your vector query in the **text** vector query parameter.
150
+
:::image type="content" source="media/search-get-started-portal-import-vectors/query-options.png" alt-text="Screenshot of the query options button.":::
145
151
146
-
1. Select **Search**.
152
+
1. Select **JSON view** so that you can enter text for your vector query in the **text** vector query parameter.
147
153
148
-
:::image type="content" source="media/search-get-started-portal-import-vectors/search-results.png" alt-text="Screenshot of search results.":::
154
+
:::image type="content" source="media/search-get-started-portal-import-vectors/select-json-view.png" alt-text="Screenshot of JSON selector.":::
155
+
156
+
This wizard offers a default query that issues a vector query on the "vector" field, returning the 5 nearest neighbors. If you opted to hide vector values, your default query includes a "select" statement that excludes the vector field from search results.
149
157
150
-
You should see 84 documents, where each document is a chunk of the original PDF. The title field shows which PDF the chunk comes from.
158
+
```json
159
+
{
160
+
"select": "chunk_id,parent_id,chunk,title",
161
+
"vectorQueries": [
162
+
{
163
+
"kind": "text",
164
+
"text": "*",
165
+
"k": 5,
166
+
"fields": "vector"
167
+
}
168
+
]
169
+
}
170
+
```
171
+
172
+
1. Replace the text `"*"` with a question related to health plans, such as *"which plan has the lowest deductible"*.
173
+
174
+
1. Select **Search** to run the query.
175
+
176
+
:::image type="content" source="media/search-get-started-portal-import-vectors/search-results.png" alt-text="Screenshot of search results.":::
151
177
152
-
The index definition isn't configurable so you can't filter by "title". To work around this limitation, you could define an index manually, making "title" filterable to get all of the chunks for a single document.
178
+
You should see five matches, where each match is a chunk of the original PDF. The title field shows which PDF the chunk comes from.
179
+
180
+
1. To see all of the chunks from a specific document, add a filter for the title field for a specific PDF:
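A filtered query of this kind can be sketched in Python as follows. This is a minimal sketch, not output from the wizard: it assumes the `title` field is filterable in your index, and both the helper `build_vector_query` and the file name `example-plan.pdf` are hypothetical names introduced for illustration. The request body mirrors the default query from the JSON view, with an OData `filter` expression added.

```python
import json


def build_vector_query(text, k=5, title=None):
    """Build a request body shaped like the default query in JSON view.

    `title` is optional; when given, an OData filter on the title field
    is added. Assumption: the index marks "title" as filterable.
    """
    body = {
        "select": "chunk_id,parent_id,chunk,title",
        "vectorQueries": [
            {"kind": "text", "text": text, "k": k, "fields": "vector"}
        ],
    }
    if title is not None:
        # OData string literals escape a single quote by doubling it.
        escaped = title.replace("'", "''")
        body["filter"] = f"title eq '{escaped}'"
    return body


# Hypothetical file name; replace it with a title value from your own results.
query = build_vector_query(
    "which plan has the lowest deductible",
    title="example-plan.pdf",
)
print(json.dumps(query, indent=2))
```

You could paste the printed JSON into the **JSON view** of Search explorer. If the service rejects the filter, the likely cause is that `title` isn't filterable in the index the wizard generated.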