Commit d0e20e2

Fix issue
1 parent 4d405ba commit d0e20e2

3 files changed: 14 additions & 14 deletions

articles/search/cognitive-search-concept-image-scenarios.md

Lines changed: 9 additions & 9 deletions
@@ -78,17 +78,17 @@ Metadata adjustments are captured in a complex type created for each image. You
 }
 ```

-1. Set `dataToExtract` to *contentAndMetadata* (required).
+1. Set `dataToExtract` to `contentAndMetadata` (required).

 1. Verify that the `parsingMode` is set to *default* (required).

 This parameter determines the granularity of search documents created in the index. The default mode sets up a one-to-one correspondence so that one blob results in one search document. If documents are large, or if skills require smaller chunks of text, you can add the Text Split skill that subdivides a document into paging for processing purposes. But for search scenarios, one blob per document is required if enrichment includes image processing.

 1. Set `imageAction` to enable the `normalized_images` node in an enrichment tree (required):

-+ *generateNormalizedImages* to generate an array of normalized images as part of document cracking.
++ `generateNormalizedImages` to generate an array of normalized images as part of document cracking.

-+ *generateNormalizedImagePerPage* (applies to PDF only) to generate an array of normalized images where each page in the PDF is rendered to one output image. For non-PDF files, the behavior of this parameter is similar as if you had set *generateNormalizedImages*. However, setting *generateNormalizedImagePerPage* can make indexing operation less performant by design (especially for large documents) since several images would have to be generated.
++ `generateNormalizedImagePerPage` (applies to PDF only) to generate an array of normalized images where each page in the PDF is rendered to one output image. For non-PDF files, the behavior of this parameter is similar as if you had set `generateNormalizedImages`. However, setting `generateNormalizedImagePerPage` can make indexing operation less performant by design (especially for large documents) since several images would have to be generated.

 1. Optionally, adjust the width or height of the generated normalized images:

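For orientation, every parameter in the hunk above lives under the indexer's `parameters.configuration` object. The following sketch is not part of this commit: the indexer, data source, index, and skillset names are hypothetical, the width and height limits are illustrative, and only `dataToExtract`, `parsingMode`, and `imageAction` come from the text above.

```json
{
  "name": "my-blob-indexer",
  "dataSourceName": "my-blob-datasource",
  "targetIndexName": "my-index",
  "skillsetName": "my-skillset",
  "parameters": {
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "parsingMode": "default",
      "imageAction": "generateNormalizedImages",
      "normalizedImageMaxWidth": 2000,
      "normalizedImageMaxHeight": 2000
    }
  }
}
```

Swapping `imageAction` to `generateNormalizedImagePerPage` would produce one image per PDF page instead, at the indexing cost the hunk describes.
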
@@ -123,7 +123,7 @@ When `imageAction` is set to a value other than *none*, the new `normalized_imag
 | originalWidth | The original width of the image before normalization. |
 | originalHeight | The original height of the image before normalization. |
 | rotationFromOriginal | Counter-clockwise rotation in degrees that occurred to create the normalized image. A value between 0 degrees and 360 degrees. This step reads the metadata from the image that is generated by a camera or scanner. Usually a multiple of 90 degrees. |
-| contentOffset | The character offset within the content field where the image was extracted from. This field is only applicable for files with embedded images. The *contentOffset* for images extracted from PDF documents is always at the end of the text on the page it was extracted from in the document. This means images appear after all text on that page, regardless of the original location of the image in the page. |
+| contentOffset | The character offset within the content field where the image was extracted from. This field is only applicable for files with embedded images. The `contentOffset` for images extracted from PDF documents is always at the end of the text on the page it was extracted from in the document. This means images appear after all text on that page, regardless of the original location of the image in the page. |
 | pageNumber | If the image was extracted or rendered from a PDF, this field contains the page number in the PDF it was extracted or rendered from, starting from 1. If the image isn't from a PDF, this field is 0. |

 Sample value of `normalized_images`:
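
The sample value referenced on that last context line is not captured in this diff. Purely as a sketch of the shape implied by the table above, one entry in the `normalized_images` array might look like the following; the base64 `data`, `width`, and `height` properties are an assumption (their table rows fall outside this hunk), and every value shown is made up.

```json
{
  "data": "iVBORw0KGgoAAAANSUhEUg...",
  "width": 2000,
  "height": 1125,
  "originalWidth": 3024,
  "originalHeight": 1701,
  "rotationFromOriginal": 0,
  "contentOffset": 2087,
  "pageNumber": 1
}
```
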
@@ -226,7 +226,7 @@ In a skillset, Image Analysis and OCR skill output is always text. Output text i

 1. [Create or update a search index](/rest/api/searchservice/indexes/create-or-update) to add fields to accept the skill outputs.

-In the following fields collection example, *content* is blob content. *Metadata_storage_name* contains the name of the file (make `retrievable` is set to *true*). *Metadata_storage_path* is the unique path of the blob and is the default document key. *Merged_content* is output from Text Merge (useful when images are embedded).
+In the following fields collection example, *content* is blob content. *Metadata_storage_name* contains the name of the file (set `retrievable` to *true*). *Metadata_storage_path* is the unique path of the blob and is the default document key. *Merged_content* is output from Text Merge (useful when images are embedded).

 *Text* and *layoutText* are OCR skill outputs and must be a string collection in order to the capture all of the OCR-generated output for the entire document.

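The fields collection example mentioned in the hunk is not shown in the diff. Here is a minimal sketch of what such a collection could contain, using only the field names called out above; the attribute choices (`key`, `searchable`, `retrievable`) are illustrative rather than taken from the commit.

```json
"fields": [
  { "name": "metadata_storage_path", "type": "Edm.String", "key": true, "retrievable": true },
  { "name": "metadata_storage_name", "type": "Edm.String", "retrievable": true },
  { "name": "content", "type": "Edm.String", "searchable": true },
  { "name": "merged_content", "type": "Edm.String", "searchable": true },
  { "name": "text", "type": "Collection(Edm.String)", "searchable": true },
  { "name": "layoutText", "type": "Collection(Edm.String)", "searchable": true }
]
```

Declaring *text* and *layoutText* as `Collection(Edm.String)` matches the requirement above that they capture all of the OCR-generated output for the entire document.
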
@@ -411,17 +411,17 @@ The following workflow outlines the process of image extraction, analysis, mergi

 1. Images in the queue are [normalized](#get-normalized-images) and passed into enriched documents as a [document/normalized_images](#get-normalized-images) node.

-1. Image enrichments execute, using *"/document/normalized_images"* as input.
+1. Image enrichments execute, using `"/document/normalized_images"` as input.

 1. Image outputs are passed into the enriched document tree, with each output as a separate node. Outputs vary by skill (text and layoutText for OCR; tags and captions for Image Analysis).

 1. Optional but recommended if you want search documents to include both text and image-origin text together, [Text Merge](cognitive-search-skill-textmerger.md) runs, combining the text representation of those images with the raw text extracted from the file. Text chunks are consolidated into a single large string, where the text is inserted first in the string and then the OCR text output or image tags and captions.

-The output of Text Merge is now the definitive text to analyze for any downstream skills that perform text processing. For example, if your skillset includes both OCR and Entity Recognition, the input to Entity Recognition should be *"document/merged_text"* (the targetName of the Text Merge skill output).
+The output of Text Merge is now the definitive text to analyze for any downstream skills that perform text processing. For example, if your skillset includes both OCR and Entity Recognition, the input to Entity Recognition should be `"document/merged_text"` (the targetName of the Text Merge skill output).

 1. After all skills have executed, the enriched document is complete. In the last step, indexers refer to [output field mappings](#output-field-mappings) to send enriched content to individual fields in the search index.

-The following example skillset creates a *merged_text* field containing the original text of your document with embedded OCRed text in place of embedded images. It also includes an Entity Recognition skill that uses *merged_text* as input.
+The following example skillset creates a `merged_text` field containing the original text of your document with embedded OCRed text in place of embedded images. It also includes an Entity Recognition skill that uses `merged_text` as input.

 ### Request body syntax

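The full request body follows this heading in the source article; only its closing lines appear in the next hunk. To make the wiring concrete, here is a minimal sketch of just the Text Merge and Entity Recognition portions, assuming an OCR skill has already written its output to `/document/normalized_images/*/text`. The descriptions, the `Organization` category, and the `organizations` output are illustrative and not taken from this commit.

```json
"skills": [
  {
    "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
    "description": "Merge native text with OCR text from images",
    "context": "/document",
    "insertPreTag": " ",
    "insertPostTag": " ",
    "inputs": [
      { "name": "text", "source": "/document/content" },
      { "name": "itemsToInsert", "source": "/document/normalized_images/*/text" },
      { "name": "offsets", "source": "/document/normalized_images/*/contentOffset" }
    ],
    "outputs": [
      { "name": "mergedText", "targetName": "merged_text" }
    ]
  },
  {
    "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
    "description": "Extract entities from the merged text",
    "context": "/document",
    "categories": [ "Organization" ],
    "inputs": [
      { "name": "text", "source": "/document/merged_text" }
    ],
    "outputs": [
      { "name": "organizations", "targetName": "organizations" }
    ]
  }
]
```
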
@@ -492,7 +492,7 @@ The following example skillset creates a *merged_text* field containing the orig
 }
 ```

-Now that you have a *merged_text* field, you can map it as a searchable field in your indexer definition. All of the content of your files, including the text of the images, will be searchable.
+Now that you have a `merged_text` field, you can map it as a searchable field in your indexer definition. All of the content of your files, including the text of the images, will be searchable.

 ## Scenario: Visualize bounding boxes

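Picking up the *merged_text* sentence in the hunk above (the trailing `## Scenario` context line already belongs to the next topic): the mapping itself is not shown in the diff. A minimal sketch of the indexer fragment it refers to, assuming a searchable `merged_text` field of type `Edm.String` already exists in the index.

```json
"outputFieldMappings": [
  {
    "sourceFieldName": "/document/merged_text",
    "targetFieldName": "merged_text"
  }
]
```
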
articles/search/cognitive-search-output-field-mapping.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ ms.date: 10/15/2024

 # Map enriched output to fields in a search index in Azure AI Search

-:::image type="content" source="media/cognitive-search-output-field-mapping/indexer-stages-output-field-mapping.pn" alt-text="Diagram of the Indexer Stages with Output Field Mappings highlighted.":::
+:::image type="content" source="media/cognitive-search-output-field-mapping/indexer-stages-output-field-mapping.png" alt-text="Diagram of the Indexer Stages with Output Field Mappings highlighted.":::

 This article explains how to set up *output field mappings*, defining a data path between in-memory data generated during [skillset processing](cognitive-search-concept-intro.md), and target fields in a search index. During indexer execution, skills-generated information exists in memory only. To persist this information in a search index, you need to tell the indexer where to send the data.

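As a companion to that introductory sentence: output field mappings sit in the indexer definition as an `outputFieldMappings` array, separate from the `fieldMappings` array used for source fields. A minimal sketch follows; the indexer name is hypothetical, and the `organizations` mapping reuses the Entity Recognition output sketched earlier rather than anything in this commit.

```json
{
  "name": "my-blob-indexer",
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_path",
      "targetFieldName": "metadata_storage_path",
      "mappingFunction": { "name": "base64Encode" }
    }
  ],
  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/organizations",
      "targetFieldName": "organizations"
    }
  ]
}
```
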
articles/search/cognitive-search-quickstart-blob.md

Lines changed: 4 additions & 4 deletions
@@ -15,7 +15,7 @@ ms.date: 10/15/2024

 # Quickstart: Create a skillset in the Azure portal

-In this quickstart, you learn how a skillset in Azure AI Search adds Optical Character Recognition (OCR), image analysis, language detection, text translation, and entity recognition to generate text-searchable content in a search index.
+In this quickstart, you learn how a skillset in Azure AI Search adds optical character recognition (OCR), image analysis, language detection, text translation, and entity recognition to generate text-searchable content in a search index.

 You can run the **Import data** wizard in the Azure portal to apply skills that create and transform textual content during indexing. Input is your raw data, usually blobs in Azure Storage. Output is a searchable index containing AI-generated image text, captions, and entities. Generated content is queryable in the portal using [**Search explorer**](search-explorer.md).

@@ -27,7 +27,7 @@ To prepare, you create a few resources and upload sample files before running th

 + Create an [Azure AI Search service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices). You can use a free service for this quickstart.

-+ Azure Storage account with Blob Storage.
++ An Azure Storage account with Azure Blob Storage.

 > [!NOTE]
 > This quickstart uses [Azure AI services](https://azure.microsoft.com/services/cognitive-services/) for the AI transformations. Because the workload is so small, Azure AI services is tapped behind the scenes for free processing for up to 20 transactions. You can complete this exercise without having to create an Azure AI multi-service resource.
@@ -190,7 +190,7 @@ If you used a free service, remember that you're limited to three indexes, index

 ## Next step

-You can create skillsets using the portal, .NET SDK, or REST API. To further your knowledge, try the REST API using a REST client and more sample data.
+You can create skillsets using the portal, .NET SDK, or REST API. To further your knowledge, try the REST API by using a REST client and more sample data.

 > [!div class="nextstepaction"]
-> [Tutorial: Extract text and structure from JSON blobs using REST APIs ](cognitive-search-tutorial-blob.md)
+> [Tutorial: Use skillsets to generate searchable content in Azure AI Search](cognitive-search-tutorial-blob.md)
