articles/search/cognitive-search-skill-image-analysis.md
Lines changed: 3 additions & 1 deletion
@@ -23,11 +23,13 @@ This skill uses the machine learning models provided by [Azure AI Vision](/azure
 + The file size of the image must be less than 4 megabytes (MB)
 + The dimensions of the image must be greater than 50 x 50 pixels
 
+Supported data sources for OCR and image analysis are blobs in Azure Blob Storage and Azure Data Lake Storage (ADLS) Gen2, and image content in OneLake. Images can be standalone files or embedded images in a PDF or other files.
+
 This skill is implemented using the [AI Image Analysis API](/azure/ai-services/computer-vision/overview-image-analysis) version 3.2. If your solution requires calling a newer version of that service API (such as version 4.0), consider implementing through [Web API custom skill](cognitive-search-custom-skill-web-api.md).
 
 > [!NOTE]
 > This skill is bound to Azure AI services and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. Execution of built-in skills is charged at the existing [Azure AI services pay-as-you go price](https://azure.microsoft.com/pricing/details/cognitive-services/).
->
+>
 > In addition, image extraction is [billable by Azure AI Search](https://azure.microsoft.com/pricing/details/search/).
articles/search/cognitive-search-skill-ocr.md
Lines changed: 3 additions & 1 deletion
@@ -22,14 +22,16 @@ An OCR skill uses the machine learning models provided by [Azure AI Vision](/azu
 
 + For Greek and Serbian Cyrillic, the legacy [OCR in version 3.2](https://github.com/Azure/azure-rest-api-specs/tree/master/specification/cognitiveservices/data-plane/ComputerVision/stable/v3.2) API is used.
 
-The **OCR** skill extracts text from image files. Supported file formats include:
+The **OCR** skill extracts text from image files and embedded images. Supported file formats include:
 
 + .JPEG
 + .JPG
 + .PNG
 + .BMP
 + .TIFF
 
+Supported data sources for OCR and image analysis are blobs in Azure Blob Storage and Azure Data Lake Storage (ADLS) Gen2, and image content in OneLake. Images can be standalone files or embedded images in a PDF or other files.
+
 > [!NOTE]
 > This skill is bound to Azure AI services and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. Execution of built-in skills is charged at the existing [Azure AI services pay-as-you go price](https://azure.microsoft.com/pricing/details/cognitive-services/).
articles/search/search-get-started-portal-import-vectors.md
Lines changed: 22 additions & 27 deletions
@@ -8,7 +8,7 @@ ms.service: azure-ai-search
 ms.custom:
 - build-2024
 ms.topic: quickstart
-ms.date: 11/19/2024
+ms.date: 11/20/2024
 ---
 
 # Quickstart: Vectorize text and images by using the Azure portal
@@ -27,15 +27,15 @@ This quickstart helps you get started with [integrated vectorization](vector-sea
 
 ### Supported data sources
 
-+ [Azure Data Lake Storage (ADLS) Gen2](/azure/storage/blobs/create-data-lake-storage-account) (a storage account with a hierarchical namespace).
+The **Import and vectorize data** wizard [supports a wide range of Azure data sources](search-import-data-portal.md#supported-data-sources-and-scenarios), but this quickstart provides steps for just those data sources that work with whole files:
 
-+ [Azure Storage](/azure/storage/common/storage-account-create) for blobs, files, and tables. Azure Storage must be a standard performance (general-purpose v2) account. Access tiers can be hot, cool, and cold.
++ [Azure Blob Storage](search-howto-indexing-azure-blob-storage.md) for blobs and tables. Azure Storage must be a standard performance (general-purpose v2) account. Access tiers can be hot, cool, and cold.
 
-+ [Azure Cosmos DB](/azure/cosmos-db/nosql/quickstart-portal) for NoSQL, Mongo DB, and Apache Gremlin.
++ [Azure Data Lake Storage (ADLS) Gen2](/azure/storage/blobs/create-data-lake-storage-account) (an Azure Storage account with a hierarchical namespace enabled). You can confirm that you have Data Lake Storage by checking the **Properties** tab on the **Overview** page.
 
-+ [Azure SQL Database](/azure/azure-sql/database/single-database-create-quickstart), [Azure SQL Managed Instance](/azure/azure-sql/managed-instance/instance-create-quickstart), and Azure SQL Server virtual machines.
+:::image type="content" source="media/search-get-started-portal-import-vectors/data-lake-storage.png" alt-text="Screenshot of the storage account properties page showing Data Lake Storage.":::
@@ -45,9 +45,9 @@ Use an embedding model on an Azure AI platform in the [same region as Azure AI S
 |---|---|
 |[Azure OpenAI Service](https://aka.ms/oai/access)| text-embedding-ada-002, text-embedding-3-large, or text-embedding-3-small. |
 |[Azure AI Studio model catalog](/azure/ai-studio/what-is-ai-studio)| Azure, Cohere, and Facebook embedding models. |
-|[Azure AI services multi-service account](/azure/ai-services/multi-service-resource)|[Azure AI Vision multimodal](/azure/ai-services/computer-vision/how-to/image-retrieval) for image and text vectorization. Azure AI Vision multimodal is available in selected regions. [Check the documentation](/azure/ai-services/computer-vision/how-to/image-retrieval?tabs=csharp) for an updated list. **To use this resource, the account must be in an available region and in the same region as Azure AI Search**. |
+|[Azure AI services multi-service account](/azure/ai-services/multi-service-resource)|[Azure AI Vision multimodal](/azure/ai-services/computer-vision/how-to/image-retrieval) for image and text vectorization. Azure AI Vision multimodal is available in selected regions. [Check the documentation](/azure/ai-services/computer-vision/how-to/image-retrieval?tabs=csharp) for an updated list. Depending on how you [attach the multi-service resource](cognitive-search-attach-cognitive-services.md), the account might need to be in the same region as Azure AI Search. |
 
-If using the Azure OpenAI Service, it must have an associated [custom subdomain](/azure/ai-services/cognitive-services-custom-subdomains). If the service was created through the Azure portal, this subdomain is automatically generated as part of your service setup. Ensure that your service includes a custom subdomain before using it with the Azure AI Search integration.
+If you use the Azure OpenAI Service, the endpoint must have an associated [custom subdomain](/azure/ai-services/cognitive-services-custom-subdomains). A custom subdomain is an endpoint that includes a unique name (for example, `https://hereismyuniquename.cognitiveservices.azure.com`). If the service was created through the Azure portal, this subdomain is automatically generated as part of your service setup. Ensure that your service includes a custom subdomain before using it with the Azure AI Search integration.
 
 Azure OpenAI Service resources (with access to embedding models) that were created in AI Studio aren't supported. Only the Azure OpenAI Service resources created in the Azure portal are compatible with the **Azure OpenAI Embedding** skill integration.
 
@@ -57,9 +57,9 @@ For the purposes of this quickstart, all of the preceding resources must have pu
 
 If private endpoints are already present and you can't disable them, the alternative option is to run the respective end-to-end flow from a script or program on a virtual machine. The virtual machine must be on the same virtual network as the private endpoint. [Here's a Python code sample](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-python/code/integrated-vectorization) for integrated vectorization. The same [GitHub repo](https://github.com/Azure/azure-search-vector-samples/tree/main) has samples in other programming languages.
 
-### Role requirements
+### Permissions
 
-We recommend role assignments for search service connections to other resources.
+You can use key authentication and full access connection strings, or Microsoft Entra ID with role assignments. We recommend role assignments for search service connections to other resources.
 
 1. On Azure AI Search, [enable roles](search-security-enable-roles.md).
 
@@ -81,13 +81,9 @@ For more secure connections:
 
 If you're starting with the free service, you're limited to three indexes, data sources, skillsets, and indexers. Basic limits you to 15. Make sure you have room for extra items before you begin. This quickstart creates one of each object.
 
-### Check for semantic ranker
-
-The wizard supports semantic ranking, but only on the Basic tier and higher, and only if semantic ranker is already [enabled on your search service](semantic-how-to-enable-disable.md). If you're using a billable tier, check whether semantic ranker is enabled.
-
 ## Prepare sample data
 
-This section points you to data that works for this quickstart.
+This section points you to the content that works for this quickstart.
@@ -111,10 +107,6 @@ This section points you to data that works for this quickstart.
 
 1. Sign in to the [Azure portal](https://portal.azure.com/) with your Azure account, and go to your Azure Storage account.
 
-1. You can confirm that you have Data Lake Storage by checking the **Properties** tab on the **Overview** page.
-
-:::image type="content" source="media/search-get-started-portal-import-vectors/data-lake-storage.png" alt-text="Screenshot of the storage account properties page showing Data Lake Storage.":::
-
 1. On the left pane, under **Data Storage**, select **Containers**.
 
 1. Create a new container and then upload the [health-plan PDF documents](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/health-plan) used for this quickstart.
@@ -231,7 +223,7 @@ The next step is to connect to a data source to use for the search index.
Support for OneLake indexing is in preview. For more information about supported shortcuts and limitations, see ([OneLake indexing](search-how-to-index-onelake-files.md)).
287
279
288
-
1. On the **Set up your data connection** page, select **OneLake**.
280
+
1. On **Connect to your data**, select **OneLake**.
289
281
290
282
1. Specify the type of connection:
291
283
@@ -304,7 +296,7 @@ Support for OneLake indexing is in preview. For more information about supported
 
 In this step, specify the embedding model for vectorizing chunked data.
 
-Chunking is built-in and nonconfigurable. The effective settings are:
+Chunking is builtin and nonconfigurable. The effective settings are:
 
 ```json
 "textSplitMode": "pages",
@@ -336,13 +328,15 @@ Chunking is built-in and nonconfigurable. The effective settings are:
 
 + The identity should have a **Cognitive Services OpenAI User** role on the Azure AI multi-services account.
 
-1. Select the checkbox that acknowledges the billing impact of using these resources.
+1. Select the checkbox that acknowledges the billing effects of using these resources.
 
 1. Select **Next**.
 
 ## Vectorize and enrich your images
 
-If your content includes images, you can apply AI in two ways:
+The health plan PDFs don't include images, so you can skip this step.
+
+However, if you work with content that includes images, you can apply AI in two ways:
 
 + Use a supported image embedding model from the catalog, or choose the Azure AI Vision multimodal embeddings API to vectorize images.
 
@@ -358,7 +352,7 @@ Azure AI Search and your Azure AI resource must be in the same region.
 
 1. Optionally, you can crack binary images (for example, scanned document files) and [use OCR](cognitive-search-skill-ocr.md) to recognize text.
 
-1. Select the checkbox that acknowledges the billing impact of using these resources.
+1. Select the checkbox that acknowledges the billing effects of using these resources.
 
 1. Select **Next**.
 
@@ -470,6 +464,7 @@ Search Explorer accepts text strings as input and then vectorizes the text for v
articles/search/search-get-started-portal.md
Lines changed: 3 additions & 3 deletions
@@ -77,7 +77,7 @@ The **Import data** wizard supports the creation of a skillset and [AI-enrichmen
 
 ### Configure the index
 
-The wizard infers a schema for the built-in hotels-sample index. Follow these steps to configure the index:
+The wizard infers a schema for the built-in hotels-sample index. To configure the index, follow these steps:
 
 1. Accept the system-generated values for the **Index name** (_hotels-sample-index_) and **Key** field (_HotelId_).
 
@@ -174,7 +174,7 @@ The following examples assume the JSON view and the 2024-05-01-preview REST API
 
 ### Filter examples
 
-Parking, tags, renovation date, rating and location are filterable.
+Parking, tags, renovation date, rating, and location are filterable.
 
 ```json
 {
@@ -223,7 +223,7 @@ The default syntax is [simple syntax](query-simple-syntax.md), but if you want f
 }
 ```
 
-By default, misspelled query terms like `seatle` for `Seattle` fail to return matches in a typical search. The `queryType=full` parameter invokes the full Lucene query parser, which supports the tilde `~` operand. When these parameters are present, the query performs a fuzzy search for the specified keyword. The query seeks matching results along with results that are similar to but not an exact match to the keyword.
+By default, misspelled query terms like `seatle` for `Seattle` fail to return matches in a typical search. The `queryType=full` parameter invokes the full Lucene query parser, which supports the tilde `~` operand. When these parameters are present, the query performs a fuzzy search for the specified keyword. The query matches on documents that are similar to but not an exact match to the keyword.
 
 Take a minute to try a few of these example queries for your index. To learn more about queries, see [Querying in Azure AI Search](search-query-overview.md).
articles/search/search-get-started-skillset.md
Lines changed: 3 additions & 1 deletion
@@ -10,7 +10,7 @@ ms.service: azure-ai-search
 ms.custom:
 - ignite-2023
 ms.topic: quickstart
-ms.date: 10/15/2024
+ms.date: 11/20/2024
 ---
 
 # Quickstart: Create a skillset in the Azure portal
@@ -84,6 +84,8 @@ If you get *Error detecting index schema from data source*, the indexer that pow
 
 Next, configure AI enrichment to invoke OCR, image analysis, and natural language processing.
 
+OCR and image analysis are available for blobs in Azure Blob Storage and Azure Data Lake Storage (ADLS) Gen2, and for image content in OneLake. Images can be standalone files or embedded images in a PDF or other files.
+
 1. For this quickstart, we're using the **Free** Azure AI services resource. The sample data consists of 14 files, so the free allotment of 20 transactions on Azure AI services is sufficient for this quickstart.
 
 :::image type="content" source="media/cognitive-search-quickstart-blob/cog-search-attach.png" alt-text="Screenshot of the Attach Azure AI services tab." border="true":::