
Commit 65aa519

OCR and image analysis data sources, standalone files and PDFs
1 parent 5930a6b commit 65aa519

7 files changed, with 21 additions and 6 deletions

articles/search/.openpublishing.redirection.search.json

Lines changed: 5 additions & 0 deletions
@@ -1,5 +1,10 @@
 {
 "redirections": [
+{
+"source_path_from_root": "/articles/search/cognitive-search-quickstart-blob.md",
+"redirect_url": "/azure/search/search-get-started-skillset",
+"redirect_document_id": true
+},
 {
 "source_path_from_root": "/articles/search/search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md",
 "redirect_url": "/azure/search/search-how-to-index-sql-database",
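
The redirection entry added in this hunk can be sketched as a small helper that prepends an entry to the `redirections` array. This is a minimal sketch, not the repository's tooling: the function name `add_redirection` is hypothetical, and only the three keys visible in the hunk are used. The existing SQL entry is shown with just the two keys that appear in the truncated hunk.

```python
import json


def add_redirection(config: dict, source_path: str, redirect_url: str) -> dict:
    """Return a copy of the redirection config with a new entry placed first.

    Hypothetical helper: it mirrors the keys shown in the hunk above and
    refuses to add a second entry for the same source path.
    """
    existing = config.get("redirections", [])
    if any(r.get("source_path_from_root") == source_path for r in existing):
        raise ValueError(f"redirection already exists for {source_path}")
    entry = {
        "source_path_from_root": source_path,
        "redirect_url": redirect_url,
        "redirect_document_id": True,
    }
    return {**config, "redirections": [entry, *existing]}


# Existing entry, limited to the keys visible in the hunk.
config = {
    "redirections": [
        {
            "source_path_from_root": "/articles/search/search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md",
            "redirect_url": "/azure/search/search-how-to-index-sql-database",
        }
    ]
}

updated = add_redirection(
    config,
    "/articles/search/cognitive-search-quickstart-blob.md",
    "/azure/search/search-get-started-skillset",
)
print(json.dumps(updated, indent=2))
```

The new entry goes first to match the position it takes in the hunk. The original `config` is left unmodified.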

articles/search/cognitive-search-skill-image-analysis.md

Lines changed: 3 additions & 1 deletion
@@ -23,11 +23,13 @@ This skill uses the machine learning models provided by [Azure AI Vision](/azure
 + The file size of the image must be less than 4 megabytes (MB)
 + The dimensions of the image must be greater than 50 x 50 pixels

+Supported data sources for OCR and image analysis are blobs in Azure Blob Storage and Azure Data Lake Storage (ADLS) Gen2, and image content in OneLake. Images can be standalone files or embedded images in a PDF or other files.
+
 This skill is implemented using the [AI Image Analysis API](/azure/ai-services/computer-vision/overview-image-analysis) version 3.2. If your solution requires calling a newer version of that service API (such as version 4.0), consider implementing through [Web API custom skill](cognitive-search-custom-skill-web-api.md).

 > [!NOTE]
 > This skill is bound to Azure AI services and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. Execution of built-in skills is charged at the existing [Azure AI services pay-as-you go price](https://azure.microsoft.com/pricing/details/cognitive-services/).
->
+>
 > In addition, image extraction is [billable by Azure AI Search](https://azure.microsoft.com/pricing/details/search/).
 >

articles/search/cognitive-search-skill-ocr.md

Lines changed: 3 additions & 1 deletion
@@ -22,14 +22,16 @@ An OCR skill uses the machine learning models provided by [Azure AI Vision](/azu

 + For Greek and Serbian Cyrillic, the legacy [OCR in version 3.2](https://github.com/Azure/azure-rest-api-specs/tree/master/specification/cognitiveservices/data-plane/ComputerVision/stable/v3.2) API is used.

-The **OCR** skill extracts text from image files. Supported file formats include:
+The **OCR** skill extracts text from image files and embedded images. Supported file formats include:

 + .JPEG
 + .JPG
 + .PNG
 + .BMP
 + .TIFF

+Supported data sources for OCR and image analysis are blobs in Azure Blob Storage and Azure Data Lake Storage (ADLS) Gen2, and image content in OneLake. Images can be standalone files or embedded images in a PDF or other files.
+
 > [!NOTE]
 > This skill is bound to Azure AI services and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. Execution of built-in skills is charged at the existing [Azure AI services pay-as-you go price](https://azure.microsoft.com/pricing/details/cognitive-services/).
 >
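
The image constraints quoted in these two skill articles (the format list here, plus the file-size and dimension limits in the image analysis hunk) could be checked for standalone files before indexing. The following is a rough sketch under several assumptions: the function name `image_problems` is hypothetical, 1 MB is taken to be 1024 × 1024 bytes, and applying the image analysis limits to OCR input is an assumption rather than something the diff states. It doesn't cover embedded images in PDFs.

```python
from pathlib import PurePosixPath

# Extensions listed in the OCR article, lowercased for comparison.
SUPPORTED_EXTENSIONS = {".jpeg", ".jpg", ".png", ".bmp", ".tiff"}
# Assumption: 1 MB = 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 4 * 1024 * 1024
MIN_DIMENSION_PIXELS = 50


def image_problems(name: str, size_bytes: int, width: int, height: int) -> list[str]:
    """Return a list of reasons a standalone image might be rejected."""
    problems = []
    extension = PurePosixPath(name).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file format: {extension or '(none)'}")
    # The article says "less than 4 MB", so exactly 4 MB is treated as too large.
    if size_bytes >= MAX_FILE_SIZE_BYTES:
        problems.append("file size is not less than 4 MB")
    # The article says "greater than 50 x 50", so 50 itself is treated as too small.
    if width <= MIN_DIMENSION_PIXELS or height <= MIN_DIMENSION_PIXELS:
        problems.append("dimensions are not greater than 50 x 50 pixels")
    return problems


print(image_problems("scan.PNG", 1_000_000, 800, 600))
print(image_problems("thumbnail.gif", 5 * 1024 * 1024, 40, 600))
```

Returning a list of problems rather than a boolean makes it easier to report every failed constraint for a file at once.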

articles/search/search-get-started-portal-import-vectors.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ The **Import and vectorize data** wizard supports the following data sources:
 + [Azure SQL Database](/azure/azure-sql/database/single-database-create-quickstart), [Azure SQL Managed Instance](/azure/azure-sql/managed-instance/instance-create-quickstart), and Azure SQL Server virtual machines.

 > [!NOTE]
-> This quicktart provides steps for just those data sources that work with whole files: Azure Blob storage, ADLS Gen2, OneLake. To use the wizard with other data sources, see [Azure Table indexer](search-howto-indexing-azure-tables.md), [Cosmos DB for NoSQL indexer](search-howto-index-cosmosdb.md), and [Azuer SQL indexer](search-how-to-index-sql-database.md).
+> This quicktart provides steps for just those data sources that work with whole files: Azure Blob sSazure-tables.md), [Cosmos DB for NoSQL indexer](search-howto-index-cosmosdb.md), and [Azuer SQL indexer](search-how-to-index-sql-database.md).

 ### Supported embedding models

articles/search/cognitive-search-quickstart-blob.md renamed to articles/search/search-get-started-skillset.md

Lines changed: 3 additions & 1 deletion
@@ -10,7 +10,7 @@ ms.service: azure-ai-search
 ms.custom:
 - ignite-2023
 ms.topic: quickstart
-ms.date: 10/15/2024
+ms.date: 11/20/2024
 ---

 # Quickstart: Create a skillset in the Azure portal
@@ -84,6 +84,8 @@ If you get *Error detecting index schema from data source*, the indexer that pow

 Next, configure AI enrichment to invoke OCR, image analysis, and natural language processing.

+OCR and image analysis are available for blobs in Azure Blob Storage and Azure Data Lake Storage (ADLS) Gen2, and for image content in OneLake. Images can be standalone files or embedded images in a PDF or other files.
+
 1. For this quickstart, we're using the **Free** Azure AI services resource. The sample data consists of 14 files, so the free allotment of 20 transactions on Azure AI services is sufficient for this quickstart.

 :::image type="content" source="media/cognitive-search-quickstart-blob/cog-search-attach.png" alt-text="Screenshot of the Attach Azure AI services tab." border="true":::
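
Outside the portal, image extraction for OCR and image analysis is typically switched on in the indexer definition. Below is a minimal sketch of such a definition built as a Python dictionary. It assumes the `parsingMode`, `dataToExtract`, and `imageAction` settings of the Azure AI Search indexer REST definition; the helper name `build_image_indexer` and all resource names are hypothetical. Check the values against the current REST reference before relying on them.

```python
import json


def build_image_indexer(name: str, data_source_name: str,
                        index_name: str, skillset_name: str) -> dict:
    """Build an indexer definition that asks for image extraction.

    Hypothetical helper. The configuration keys are assumed from the
    Azure AI Search indexer definition and aren't part of this commit.
    """
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": index_name,
        "skillsetName": skillset_name,
        "parameters": {
            "configuration": {
                # Default parsing mode, one search document per file.
                "parsingMode": "default",
                "dataToExtract": "contentAndMetadata",
                # Extract standalone and embedded images for the skills.
                "imageAction": "generateNormalizedImages",
            }
        },
    }


indexer = build_image_indexer(
    "demo-indexer", "demo-blob-datasource", "demo-index", "demo-skillset"
)
print(json.dumps(indexer, indent=2))
```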

articles/search/search-import-data-portal.md

Lines changed: 5 additions & 1 deletion
@@ -59,7 +59,9 @@ Microsoft hosts sample data so that you can omit a data source configuration ste

 ### Skills

-The wizards generate skillset and output field mappings based on options you select. You can modify a skillset's JSON definition to add more skills later. Text Split and Text Merge are added for data chunking if you choose an embedding model, and for other skills if the granularity is set to pages or sentences. Shaper is added if you configure a knowledge store.
+The wizards generate skillset and output field mappings based on options you select. You can modify a skillset's JSON definition to add more skills later.
+
+OCR and image analysis options are available for blobs and whole-file indexing in OneLake, assuming the default parsing mode. Shaper is added if you configure a knowledge store. Text Split and Text Merge are added for data chunking if you choose an embedding model, and for other skills if the granularity is set to pages or sentences.

 | Skills | Import data wizard | Import and vectorize data wizard |
 |------|--------------------|----------------------------------|
@@ -68,9 +70,11 @@ The wizards generate skillset and output field mappings based on options you sel
 | [Azure Machine Learning (AI Studio model catalog)](cognitive-search-aml-skill.md) |||
 | [Document layout](cognitive-search-skill-document-intelligence-layout.md) |||
 | [Entity recognition](cognitive-search-skill-entity-recognition-v3.md) |||
+| [Image analysis (applies to blobs, default parsing, whole file indexing](cognitive-search-skill-image-analysis.md) |||
 | [Keyword extraction](cognitive-search-skill-keyphrases.md) |||
 | [Language detection](cognitive-search-skill-language-detection.md) |||
 | [Text translation](cognitive-search-skill-text-translation.md) |||
+| [OCR (applies to blobs, default parsing, whole file indexing)](cognitive-search-skill-ocr.md) |||
 | [PII detection](cognitive-search-skill-pii-detection.md) |||
 | [Sentiment analysis](cognitive-search-skill-sentiment.md) |||
 | [Shaper (applies to knowledge store)](cognitive-search-skill-shaper.md) |||
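
The remark that a skillset's JSON definition can be modified to add more skills later can be illustrated with a short sketch that appends an OCR skill to an existing definition. It assumes the `#Microsoft.Skills.Vision.OcrSkill` type and the `/document/normalized_images/*` context from the OCR skill reference. The helper `add_skill`, the skillset name, and the placeholder skill already in the skillset are hypothetical.

```python
import copy
import json

# OCR skill definition. Field names are assumed from the OCR skill
# reference; verify them against the current article.
OCR_SKILL = {
    "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
    "context": "/document/normalized_images/*",
    "defaultLanguageCode": "en",
    "detectOrientation": True,
    "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
    "outputs": [{"name": "text", "targetName": "text"}],
}


def add_skill(skillset: dict, skill: dict) -> dict:
    """Return a copy of the skillset with the skill appended.

    Hypothetical helper. A skill whose @odata.type is already present is
    skipped, which is a simplification: real skillsets can contain several
    skills of the same type with different contexts.
    """
    updated = copy.deepcopy(skillset)
    skills = updated.setdefault("skills", [])
    if all(s.get("@odata.type") != skill["@odata.type"] for s in skills):
        skills.append(copy.deepcopy(skill))
    return updated


# Placeholder skillset standing in for one generated by a wizard.
skillset = {
    "name": "demo-skillset",
    "skills": [{"@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill"}],
}

updated = add_skill(skillset, OCR_SKILL)
print(json.dumps(updated, indent=2))
```

Working on a deep copy keeps the wizard-generated definition available for comparison before the modified version is sent back to the service.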

articles/search/toc.yml

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ items:
 - name: Create a demo app
 href: search-create-app-portal.md
 - name: Create a skillset
-href: cognitive-search-quickstart-blob.md
+href: search-get-started-skillset.md
 - name: Create a knowledge store
 href: knowledge-store-create-portal.md
 - name: Query with Search Explorer
