
Commit db86ea4

updated image how-to with OCR exception info
1 parent d4bfea4 commit db86ea4

1 file changed: +23 -8 lines changed

articles/search/cognitive-search-concept-image-scenarios.md

Lines changed: 23 additions & 8 deletions
@@ -35,28 +35,28 @@ Optionally, you can define projections to accept image-analyzed output into a [k
 
 ## Set up source files
 
-Image processing is indexer-driven, which means that the raw inputs must be a supported file type (as determined by the skills you choose) from a [supported data source](search-indexer-overview.md#supported-data-sources).
+Image processing is indexer-driven, which means that the raw inputs must be in a [supported data source](search-indexer-overview.md#supported-data-sources).
 
 + Image analysis supports JPEG, PNG, GIF, and BMP
 + OCR supports JPEG, PNG, BMP, and TIF
 
 Images are either standalone binary files or embedded in documents (PDF, RTF, and Microsoft application files). A maximum of 1000 images will be extracted from a given document. If there are more than 1000 images in a document, the first 1000 will be extracted and a warning will be generated.
 
-Azure Blob Storage is the most frequently used storage for image processing in Cognitive Search. There are three main tasks related to retrieving images from the source:
+Azure Blob Storage is the most frequently used storage for image processing in Cognitive Search. There are three main tasks related to retrieving images from a blob container:
 
-+ Access rights on the container. If you're using a full access connection string that includes a key, the key gives you access to the content. Alternatively, you can [authenticate using Azure Active Directory (Azure AD)](search-howto-managed-identities-data-sources.md) or [connect as a trusted service](search-indexer-howto-access-trusted-service-exception.md).
++ Enable access to content in the container. If you're using a full access connection string that includes a key, the key gives you permission to the content. Alternatively, you can [authenticate using Azure Active Directory (Azure AD)](search-howto-managed-identities-data-sources.md) or [connect as a trusted service](search-indexer-howto-access-trusted-service-exception.md).
 
 + [Create a data source](search-howto-indexing-azure-blob-storage.md) of type "azureblob" that connects to the blob container storing your files.
 
-+ Optionally, [set file type criteria](search-blob-storage-integration.md#PartsOfBlobToIndex) if the workload targets a specific file type. Blob indexer configuration includes file inclusion and exclusion settings. You can filter out files you don't want.
++ Review [service tier limits](search-limits-quotas-capacity.md) to make sure that your source data is under maximum size and quantity limits for indexers and enrichment.
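
Illustration only, not part of this commit: the "Create a data source" task in the list above could be sketched as a data source definition along the following lines. The data source name, connection string, and container name are placeholders assumed for the example; only the `"azureblob"` type comes from the article text.

```json
{
  "name": "my-blob-datasource",
  "type": "azureblob",
  "credentials": {
    "connectionString": "<your-storage-connection-string>"
  },
  "container": {
    "name": "my-image-container"
  }
}
```

If you authenticate with Azure AD or connect as a trusted service instead, the connection string would take a different form than the key-based placeholder assumed here.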
 
 <a name="get-normalized-images"></a>
 
 ## Configure indexers for image processing
 
-Image extraction is the first step of indexer processing. Extracted images are queued for image processing. Extracted text is queued for text processing, if applicable.
+Extracting images from the source content files is the first step of indexer processing. Extracted images are queued for image processing. Extracted text is queued for text processing, if applicable.
 
-Image processing requires image normalization to make images more uniform for downstream processing. This step occurs automatically and is internal to indexer processing. As a developer, you enable image normalization by setting the `"imageAction"` parameter in indexer configuration.
+Image processing requires image normalization to make images more uniform for downstream processing. This second step occurs automatically and is internal to indexer processing. As a developer, you enable image normalization by setting the `"imageAction"` parameter in indexer configuration.
 
 Image normalization includes the following operations:
 
@@ -90,7 +90,7 @@ Metadata adjustments are captured in a complex type created for each image. You
 1. Set `"imageAction"` to enable the *normalized_images* node in an enrichment tree (required):
 
 + `"generateNormalizedImages"` to generate an array of normalized images as part of document cracking.
-
+
 + `"generateNormalizedImagePerPage"` (applies to PDF only) to generate an array of normalized images where each page in the PDF is rendered to one output image. For non-PDF files, the behavior of this parameter is same as if you had set "generateNormalizedImages".
 
 1. Optionally, adjust the width or height of the generated normalized images:
@@ -101,6 +101,19 @@ Metadata adjustments are captured in a complex type created for each image. You
 
 The default of 2000 pixels for the normalized images maximum width and height is based on the maximum sizes supported by the [OCR skill](cognitive-search-skill-ocr.md) and the [image analysis skill](cognitive-search-skill-image-analysis.md). The [OCR skill](cognitive-search-skill-ocr.md) supports a maximum width and height of 4200 for non-English languages, and 10000 for English. If you increase the maximum limits, processing could fail on larger images depending on your skillset definition and the language of the documents.
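
Illustration only, not part of this commit: a minimal sketch of an indexer `"parameters"` fragment that enables image normalization and states the 2000-pixel defaults explicitly. The `"imageAction"` value is one of the options listed in the article; `normalizedImageMaxWidth` and `normalizedImageMaxHeight` are the width and height settings as named in the indexer reference, which this diff does not show, so treat them as an assumption to verify against that reference.

```json
{
  "parameters": {
    "configuration": {
      "imageAction": "generateNormalizedImages",
      "normalizedImageMaxWidth": 2000,
      "normalizedImageMaxHeight": 2000
    }
  }
}
```

Raising either limit trades image fidelity against the risk described above, namely that larger images could fail in OCR depending on the document language.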
 
++ Optionally, [set file type criteria](search-blob-storage-integration.md#PartsOfBlobToIndex) if the workload targets a specific file type. Blob indexer configuration includes file inclusion and exclusion settings. You can filter out files you don't want.
+
+  ```json
+  {
+    "parameters" : {
+      "configuration" : {
+        "indexedFileNameExtensions" : ".pdf, .docx",
+        "excludedFileNameExtensions" : ".png, .jpeg"
+      }
+    }
+  }
+  ```
+
 ### About normalized images
 
 When "imageAction" is set to a value other than "none", the new *normalized_images* field will contain an array of images. Each image is a complex type that has the following members:
@@ -143,6 +156,8 @@ This section supplements the [skill reference](cognitive-search-predefined-skill
 
 1. If necessary, [include multi-service key](cognitive-search-attach-cognitive-services.md) in the Cognitive Services property of the skillset. Cognitive Search makes calls to a billable Azure Cognitive Services resource for OCR and image analysis for transactions that exceed the free limit (20 per indexer per day). Cognitive Services must be in the same region as your search service.
 
+1. If original images are embedded in PDF or application files like PPTX or DOCX, you'll need to add a Text Merge skill if you want image output and text output together. Working with embedded images is discussed further on in this article.
+
 Once the basic framework of your skillset is created and Cognitive Services is configured, you can focus on each individual image skill, defining inputs and source context, and mapping outputs to fields in either an index or knowledge store.
 
 > [!NOTE]
@@ -391,7 +406,7 @@ Image analysis output is illustrated in the JSON below (search result). The skil
 
 When the images you want to process are embedded in other files, such as PDF or DOCX, the enrichment pipeline will extract just the images and then pass them to OCR or image analysis for processing. Separation of image from text content occurs during the document cracking phase, and once the images are separated, they remain separate unless you explicitly merge the processed output back into the source text.
 
-[**Text Merge**](cognitive-search-skill-textmerger.md) is used to put image processing output back into the document. Although Text Merge is not a hard requirement, it's frequently invoked so that image output (OCR text, OCR layoutText, image tags, image captions) can be reintroduced into the document at the same location where the image was found. Essentially, the goal is to replace an embedded binary image with an in-place text equivalent.
+[**Text Merge**](cognitive-search-skill-textmerger.md) is used to put image processing output back into the document. Although Text Merge is not a hard requirement, it's frequently invoked so that image output (OCR text, OCR layoutText, image tags, image captions) can be reintroduced into the document. Depending on the skill, the image output replaces an embedded binary image with an in-place text equivalent. Image Analysis output can be merged at image location. OCR output always appears at the end of each page.
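
Illustration only, not part of this commit: a sketch of what a Text Merge skill definition might look like when folding OCR text back into document content. It assumes an OCR skill has already written its output to `/document/normalized_images/*/text`; the output target name `merged_text` and the single-space pre and post tags are choices made for this example, not values taken from the article.

```json
{
  "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
  "context": "/document",
  "insertPreTag": " ",
  "insertPostTag": " ",
  "inputs": [
    { "name": "text", "source": "/document/content" },
    { "name": "itemsToInsert", "source": "/document/normalized_images/*/text" },
    { "name": "offsets", "source": "/document/normalized_images/*/contentOffset" }
  ],
  "outputs": [
    { "name": "mergedText", "targetName": "merged_text" }
  ]
}
```

Downstream text skills would then read from `/document/merged_text` rather than `/document/content`, so that image-derived text is included in their input.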
 
 The following workflow outlines the process of image extraction, analysis, merging, and how to extend the pipeline to push image-processed output into other text-based skills such as Entity Recognition or Text Translation.
 