Updated OCR module (and other minor computer vision fixes)

GraemeMalcolm · GraemeMalcolm · commit 6e1cbf2536df · 2025-05-02T13:29:57.000-07:00
diff --git a/learn-pr/paths/create-computer-vision-solutions-azure-ai/index.yml b/learn-pr/paths/create-computer-vision-solutions-azure-ai/index.yml
@@ -29,10 +29,10 @@ subjects:
 - artificial-intelligence
 modules:
 - learn.wwl.analyze-images
+- learn.wwl.read-text-images-documents-with-computer-vision-service
+- learn.wwl.detect-analyze-recognize-faces
 - learn.wwl.classify-images
 - learn.wwl.detect-objects-images
-- learn.wwl.detect-analyze-recognize-faces
-- learn.wwl.read-text-images-documents-with-computer-vision-service
 - learn.wwl.analyze-video
 - learn.wwl.develop-generative-ai-vision-apps
 - learn.wwl.generate-images-azure-openai
diff --git a/learn-pr/wwl-data-ai/analyze-images/includes/3-analyze-image.md b/learn-pr/wwl-data-ai/analyze-images/includes/3-analyze-image.md
@@ -24,9 +24,8 @@ client = ImageAnalysisClient(
 
 result = client.analyze(
     image_data=<IMAGE_DATA_BYTES>, # Binary data from your image file
-    visual_features=[VisualFeatures.CAPTION, VisualFeatures.READ],
+    visual_features=[VisualFeatures.CAPTION, VisualFeatures.TAGS],
     gender_neutral_caption=True,
-    language="en",
 )
 ```
 
@@ -58,7 +57,7 @@ ImageAnalysisClient client = new ImageAnalysisClient(
 
 ImageAnalysisResult result = client.Analyze(
     <IMAGE_DATA_BYTES>, // Binary data from your image file
-    VisualFeatures.Caption | VisualFeatures.Read,
+    VisualFeatures.Caption | VisualFeatures.Tags,
     new ImageAnalysisOptions { GenderNeutralCaption = true });
 ```
 
@@ -76,7 +75,7 @@ Available visual features are contained in the `VisualFeatures` enumeration:
 - VisualFeatures.People: Returns the bounding box for detected people
 - VisualFeatures.SmartCrops: Returns the bounding box of the specified aspect ratio for the area of interest
 - VisualFeatures.Read: Extracts readable text
-- 
+
 ::: zone-end
 
 Specifying the visual features you want analyzed in the image determines what information the response will contain. Most responses will contain a bounding box (if a location in the image is reasonable) or a confidence score (for features such as tags or captions).
diff --git a/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/4-use-read-api.yml b/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/4-use-read-api.yml
@@ -1,18 +1,19 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.read-text-images-documents-with-computer-vision-service.using-read-api
-title: Use the Read API
-metadata:
-  title: Use the Read API
-  description: Use the Read API
-  author: wwlpublish
-  ms.author: berryivor
-  ms.date: 02/05/2024
-  ms.topic: unit
-  ms.collection:
-    - wwl-ai-copilot
-azureSandbox: false
-labModal: false
-durationInMinutes: 3
-content: |
-  [!include[](includes/4-use-read-api.md)]
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.read-text-images-documents-with-computer-vision-service.using-read-api
+title: Use the Read API
+metadata:
+  title: Use the Read API
+  description: Use the Read API
+  author: wwlpublish
+  ms.author: berryivor
+  ms.date: 02/05/2024
+  ms.topic: unit
+  ms.collection:
+    - wwl-ai-copilot
+  zone_pivot_groups: dev-lang-csharp-python
+azureSandbox: false
+labModal: false
+durationInMinutes: 6
+content: |
+  [!include[](includes/4-use-read-api.md)]
+
diff --git a/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/5-exercise.yml b/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/5-exercise.yml
@@ -1,19 +1,16 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.read-text-images-documents-with-computer-vision-service.exercise
-title: Exercise - Read text in images
-metadata:
-  title: Exercise - Read text in images
-  description: Exercise - Read text in images
-  author: wwlpublish
-  ms.author: berryivor
-  ms.date: 02/05/2024
-  ms.topic: unit
-  ms.collection:
-    - wwl-ai-copilot
-azureSandbox: false
-labId: 152408
-labModal: true
-durationInMinutes: 20
-content: |
-  [!include[](includes/5-exercise.md)]
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.read-text-images-documents-with-computer-vision-service.exercise
+title: Exercise - Read text in images
+metadata:
+  title: Exercise - Read text in images
+  description: Exercise - Read text in images
+  author: wwlpublish
+  ms.author: berryivor
+  ms.date: 02/05/2024
+  ms.topic: unit
+  ms.collection:
+    - wwl-ai-copilot
+durationInMinutes: 30
+content: |
+  [!include[](includes/5-exercise.md)]
+
diff --git a/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/6-knowledge-check.yml b/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/6-knowledge-check.yml
@@ -1,53 +1,49 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.read-text-images-documents-with-computer-vision-service.knowledge-check
-title: Module assessment
-metadata:
-  title: Module assessment
-  description: Knowledge Check
-  author: wwlpublish
-  ms.author: berryivor
-  ms.date: 02/05/2024
-  ms.topic: unit
-  ms.collection:
-    - wwl-ai-copilot
-azureSandbox: false
-labModal: false
-durationInMinutes: 4
-content: |
-  [!include[](includes/6-knowledge-check.md)]
-quiz:
-  questions:
-  - content: "Which API would be best for this scenario? You need to read a large number of files with high accuracy. The text is short sections of handwritten text, some in English and some of it is in multiple languages."
-    choices:
-    - content: "A custom Language API"
-      isCorrect: false
-      explanation: "Incorrect: Azure AI Language custom models aren't able to perform OCR."
-    - content: "Document Intelligence API"
-      isCorrect: false
-      explanation: "Incorrect: Document Intelligence is the best choice for large amounts of structured text and multiple languages, however isn't the best choice for shorter, unstructured handwritten text."
-    - content: "Image Analysis API"
-      isCorrect: true
-      explanation: "Correct: The Image Analysis service OCR feature is best suited for short sections of handwritten text."
-  - content: "What levels of division are the OCR results returned?"
-    choices:
-    - content: "Only total content and pages of text."
-      isCorrect: false
-      explanation: "Incorrect: Results contain blocks, words and lines, as well as bounding boxes for each word and line."
-    - content: "Blocks, words and lines of text."
-      isCorrect: true
-      explanation: "Correct: Results contain blocks, words and lines, as well as bounding boxes for each word and line."
-    - content: "Total content, image tags, pages, words and lines of text."
-      isCorrect: false
-      explanation: "Incorrect: Results contain blocks, words and lines, as well as bounding boxes for each word and line."
-  - content: "You've scanned a letter into PDF format and need to extract the text it contains. What should you do?"
-    choices:
-    - content: "Use the Azure AI Custom Vision service"
-      isCorrect: false
-      explanation: "Incorrect: The Azure AI Custom Vision service is used to build and deploy image identification applications by applying labels to classes or objects."
-    - content: "Use the Image Analysis API of the Azure AI Vision service."
-      isCorrect: false
-      explanation: "Incorrect: The Image Analysis API isn't well suited to process PDF formatted files."
-    - content: "Use the Document Intelligence API."
-      isCorrect: true
-      explanation: "Correct: The Document Intelligence API can be used to process PDF formatted files."
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.read-text-images-documents-with-computer-vision-service.knowledge-check
+title: Module assessment
+metadata:
+  title: Module assessment
+  description: Knowledge Check
+  author: wwlpublish
+  ms.author: berryivor
+  ms.date: 02/05/2024
+  ms.topic: unit
+  ms.collection:
+    - wwl-ai-copilot
+durationInMinutes: 3
+quiz:
+  questions:
+  - content: "Which service should you use to locate and read text in signs within a photograph of a street?"
+    choices:
+    - content: "Azure AI Language"
+      isCorrect: false
+      explanation: "Incorrect: Azure AI Language isn't able to perform OCR."
+    - content: "Azure AI Document Intelligence"
+      isCorrect: false
+      explanation: "Incorrect: Azure AI Document Intelligence is designed to extract text from documents and forms."
+    - content: "Azure AI Vision"
+      isCorrect: true
+      explanation: "Correct: The Image Analysis feature of Azure AI Vision includes OCR capabilities that can extract text from images."
+  - content: "Which visual feature enumeration should you use to return OCR results from an image analysis call?"
+    choices:
+    - content: "VisualFeatures.Caption"
+      isCorrect: false
+      explanation: "Incorrect: The VisualFeatures.Caption enumeration returns a suggested caption for the image."
+    - content: "VisualFeatures.Read"
+      isCorrect: true
+      explanation: "Correct: The VisualFeatures.Read enumeration returns text and its location in the image."
+    - content: "VisualFeatures.Tags"
+      isCorrect: false
+      explanation: "Incorrect: The VisualFeatures.Tags enumeration returns suggested tags to help categorize the image."
+  - content: "Text location information in an image is returned at which levels by the Azure AI Vision image analysis API?"
+    choices:
+    - content: "The location of individual *words* only."
+      isCorrect: false
+      explanation: "Incorrect: The location and text of individual words are returned, but that's not the only level."
+    - content: "A single *block* containing all of the text in the image."
+      isCorrect: false
+      explanation: "Incorrect: A single block is returned, but it includes smaller location areas for the text detected in the image."
+    - content: "A *block* containing the location of *lines* of text as well as individual *words*."
+      isCorrect: true
+      explanation: "Correct: The image analysis OCR results include a block in which each line of text is located, and within each line the location of each word is returned."
+
diff --git a/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/includes/2-options-read-text.md b/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/includes/2-options-read-text.md
@@ -1,14 +1,19 @@
-Azure AI provides two different features that read text from documents and images, one in the Azure AI Vision Service, the other in Azure AI Document Intelligence. There is overlap in what each service provides, however each is optimized for results depending on what the input is.
+There are multiple Azure AI services that read text from documents and images, each optimized for results depending on the input and the specific requirements of your application.
 
-- **Image Analysis** Optical character recognition (OCR):
-    - Use this feature for general, unstructured documents with smaller amount of text, or images that contain text.
-    - Results are returned immediately (synchronous) from a single API call.
-    - Has functionality for analyzing images past extracting text, including object detection, describing or categorizing an image, generating smart-cropped thumbnails and more.
-    - Examples include: street signs, handwritten notes, and store signs.
-- **Document Intelligence**:
-    - Use this service to read small to large volumes of text from images and PDF documents.
-    - This service uses context and structure of the document to improve accuracy.
-    - The initial function call returns an asynchronous operation ID, which must be used in a subsequent call to retrieve the results.
-    - Examples include: receipts, articles, and invoices.
+- **Azure AI Vision** includes an *image analysis* capability that supports *optical character recognition* (OCR). Consider using Azure AI Vision in the following scenarios:
+    - **Text location and extraction from scanned documents**: Azure AI Vision is a great solution for general, unstructured documents that have been scanned as images. For example, reading text in labels, menus, or business cards.
+    - **Finding and reading text in photographs**: Examples include photos that include street signs and store names.
+    - **Digital asset management (DAM)**: Azure AI Vision includes functionality for analyzing images beyond extracting text, including object detection, describing or categorizing an image, generating smart-cropped thumbnails, and more. These capabilities make it a useful service when you need to catalog, index, or analyze large volumes of digital image-based content.
+- **Azure AI Document Intelligence** is a service that is specifically designed to extract information from complex digital documents. Azure AI Document Intelligence is designed for extracting text, key-value pairs, tables, and structures from documents automatically and accurately. Key considerations for choosing Azure AI Document Intelligence include:
+    - **Form Processing**: Azure AI Document Intelligence is specifically designed to extract data from forms, invoices, receipts, and other structured documents.
+    - **Prebuilt Models**: Azure AI Document Intelligence provides prebuilt models for common document types to reduce complexity and integrate into workflows or applications.
+    - **Custom Models**: Creating custom models tailored to your specific documents makes Azure AI Document Intelligence a flexible solution that can be used in many business scenarios.
+- **Azure AI Content Understanding** is a service that you can use to analyze and extract information from multiple kinds of content, including documents, images, audio streams, and video. It is suitable for:
+    - **Multimodal content extraction**: Extracting content and structured fields from documents, forms, audio, video, and images.
+    - **Custom content analysis scenarios**: Support for customizable analyzers enables you to extract specific content or fields tailored to business needs.
 
-You can access both technologies via the REST API or a client library. In this module, we'll focus on the OCR feature in **Image Analysis**. If you'd like to learn more about **Document Intelligence**, [reading this module](/training/modules/use-prebuilt-form-recognizer-models/?azure-portal=true) will provide a good introduction.
+> [!NOTE]
+> In the rest of this module, we'll focus on the OCR image analysis feature in **Azure AI Vision**. To learn more about Azure AI Document Intelligence and Azure AI Content Understanding, consider completing the following training modules:
+>
+> - [Plan an Azure AI Document Intelligence solution](/training/modules/plan-form-recognizer-solution/)
+> - [Analyze content with Azure AI Content Understanding](/training/modules/analyze-content-ai/)
diff --git a/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/includes/4-use-read-api.md b/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/includes/4-use-read-api.md
@@ -1,30 +1,62 @@
-To use the Read OCR feature, call the **ImageAnalysis** function (REST API or equivalent SDK method), passing the image URL or binary data, and optionally specifying a gender neutral caption or the language the text is written in (with a default value of **en** for English).
+To use Azure AI Vision for image analysis, including optical character recognition, you must provision an Azure AI Vision resource in an Azure subscription. The resource can be:
 
-To make an OCR request to **ImageAnalysis**, specify the visual feature as `READ`.
+- An **Azure AI Services** multi-service resource (either deployed as part of an Azure AI Foundry hub and project, or as a standalone resource).
+- A **Computer Vision** resource.
 
-**C#**
+To use your deployed resource in an application, you must connect to its *endpoint* using either key-based authentication or Microsoft Entra ID authentication. You can find the endpoint for your resource in the Azure portal, or if you're working in an Azure AI Foundry project, in the Azure AI Foundry portal. The endpoint is in the form of a URL, and typically looks something like this:
 
-```csharp
-ImageAnalysisResult result = client.Analyze(
-    <image-to-analyze>,
-    VisualFeatures.Read);
+```
+https://<resource_name>.cognitiveservices.azure.com/
 ```
 
-**Python**
+After establishing a connection, you can use the OCR feature by calling the **ImageAnalysis** function (via the REST API or with an equivalent SDK method), passing the image URL or binary data, and optionally specifying the language the text is written in (with a default value of **en** for English).
+
+```rest
+https://<endpoint>/computervision/imageanalysis:analyze?features=read&...
+```
+
+::: zone pivot="python"
+
+To use the Azure AI Vision Python SDK to extract text from an image, install the **azure-ai-vision-imageanalysis** package. Then, in your code, use either key-based authentication or Microsoft Entra ID authentication to connect an **ImageAnalysisClient** object to an Azure AI Vision resource. To find and read text in an image, call the **analyze** (or **analyze_from_url**) method, specifying the **VisualFeatures.READ** enumeration.
 
 ```python
+from azure.ai.vision.imageanalysis import ImageAnalysisClient
+from azure.ai.vision.imageanalysis.models import VisualFeatures
+from azure.core.credentials import AzureKeyCredential
+
+client = ImageAnalysisClient(
+    endpoint="<YOUR_RESOURCE_ENDPOINT>",
+    credential=AzureKeyCredential("<YOUR_AUTHORIZATION_KEY>")
+)
+
 result = client.analyze(
-    image_url=<image_to_analyze>,
-    visual_features=[VisualFeatures.READ]
+    image_data=<IMAGE_DATA_BYTES>, # Binary data from your image file
+    visual_features=[VisualFeatures.READ],
+    language="en",
 )
 ```
 
-If using the REST API, specify the feature as `read`.
+::: zone-end
 
-```rest
-https://<endpoint>/computervision/imageanalysis:analyze?features=read&...
+::: zone pivot="csharp"
+
+To use the Azure AI Vision .NET SDK to extract text from an image, install the **Azure.AI.Vision.ImageAnalysis** package. Then, in your code, use either key-based authentication or Microsoft Entra ID authentication to connect an **ImageAnalysisClient** object to an Azure AI Vision resource. To find and read text in an image, call the **Analyze** method, specifying the **VisualFeatures.Read** enumeration.
+
+```csharp
+using Azure;
+using Azure.AI.Vision.ImageAnalysis;
+
+ImageAnalysisClient client = new ImageAnalysisClient(
+    new Uri("<YOUR_RESOURCE_ENDPOINT>"),
+    new AzureKeyCredential("<YOUR_AUTHORIZATION_KEY>"));
+
+ImageAnalysisResult result = client.Analyze(
+    <IMAGE_DATA_BYTES>, // Binary data from your image file
+    VisualFeatures.Read,
+    new ImageAnalysisOptions { Language = "en" });
 ```
 
+::: zone-end
+
 The results of the Read OCR function are returned synchronously, either as JSON or the language specific object of a similar structure. These results are broken down in *blocks* (with the current service only using one block), then *lines*, and then *words*. Additionally, the text values are included at both the *line* and *word* levels, making it easier to read entire lines of text if you don't need to extract text at the individual *word* level.
 
 ```JSON
diff --git a/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/includes/5-exercise.md b/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/includes/5-exercise.md
@@ -1,8 +1,13 @@
-[!INCLUDE [Lab note](../../../includes/wwl/lab-note.md)]
+Now it's your turn to try using the OCR capabilities of Azure AI Vision.
 
-If you're completing this exercise on your own computer, follow these [exercise instructions](https://microsoftlearning.github.io/mslearn-ai-vision/Instructions/Exercises/05-ocr.html?azure-portal=true).
+In this exercise, you use the Azure AI Vision SDK to develop a client application that extracts text from images.
 
-When you finish the exercise, end the lab to close the VM. Don't forget to come back and complete the knowledge check to earn points for completing this module!
+> [!NOTE]
+> To complete this lab, you need an **[Azure subscription](https://azure.microsoft.com/free?azure-portal=true)** in which you have administrative access.
+
+Launch the exercise and follow the instructions.
+
+[![Button to launch exercise.](../media/launch-exercise.png)](https://go.microsoft.com/fwlink/?linkid=2320100&azure-portal=true)
 
 > [!TIP]
-> After completing the exercise, if you've finished exploring Azure AI Services, delete the Azure resources that you created during the exercise.
+> After completing the exercise, if you've finished exploring Azure AI services, delete the Azure resources that you created during the exercise.
diff --git a/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/includes/6-knowledge-check.md b/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/includes/6-knowledge-check.md
diff --git a/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/includes/7-summary.md b/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/includes/7-summary.md
@@ -1,7 +1,3 @@
-In this module, you learned how to:
+In this module, you learned how to provision an Azure AI Vision resource and use it from a client application to extract text from images.
 
-- Read text from images with **ImageAnalysis** READ feature
-- Use the Azure AI Vision service with SDKs and the REST API
-- Develop an application that can read printed and handwritten text
-
-For more information, see the [OCR documentation](/azure/ai-services/computer-vision/concept-ocr).
+To learn more about using the Azure AI Vision service for OCR, see [OCR - Optical Character Recognition](/azure/ai-services/computer-vision/overview-ocr) in the Azure AI Vision documentation.
diff --git a/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/index.yml b/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/index.yml
diff --git a/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/media/launch-exercise.png b/learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/media/launch-exercise.png
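
The updated 4-use-read-api.md describes OCR results as broken down into *blocks*, then *lines*, then *words*, with text values at both the line and word levels. The following is a minimal sketch of walking that hierarchy in Python, assuming the result has already been parsed into a dictionary. The helper name `iter_lines_and_words`, the sample data, and the exact key names (`blocks`, `lines`, `words`, `text`) are assumptions made for illustration; check them against the actual JSON returned by your resource.

```python
from typing import Iterator, List, Tuple


def iter_lines_and_words(read_result: dict) -> Iterator[Tuple[str, List[str]]]:
    """Yield (line_text, [word_text, ...]) for every line in every block.

    Assumes a dictionary shaped as blocks -> lines -> words, where each
    line and each word carries a "text" value (key names are assumptions).
    """
    for block in read_result.get("blocks", []):
        for line in block.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            yield line["text"], words


# Hypothetical sample data, not real service output.
sample = {
    "blocks": [
        {
            "lines": [
                {
                    "text": "Hello world",
                    "words": [{"text": "Hello"}, {"text": "world"}],
                },
                {"text": "OCR", "words": [{"text": "OCR"}]},
            ]
        }
    ]
}

for line_text, word_texts in iter_lines_and_words(sample):
    print(line_text, word_texts)
```

Reading the line-level text is usually enough when you only need the extracted sentences; the word-level entries matter when you need per-word locations.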