Commit 6e1cbf2

Updated OCR module (and other minor computer vision fixes)
1 parent ad99070 commit 6e1cbf2

12 files changed

+206
-182
lines changed

learn-pr/paths/create-computer-vision-solutions-azure-ai/index.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -29,10 +29,10 @@ subjects:
2929
- artificial-intelligence
3030
modules:
3131
- learn.wwl.analyze-images
32+
- learn.wwl.read-text-images-documents-with-computer-vision-service
33+
- learn.wwl.detect-analyze-recognize-faces
3234
- learn.wwl.classify-images
3335
- learn.wwl.detect-objects-images
34-
- learn.wwl.detect-analyze-recognize-faces
35-
- learn.wwl.read-text-images-documents-with-computer-vision-service
3636
- learn.wwl.analyze-video
3737
- learn.wwl.develop-generative-ai-vision-apps
3838
- learn.wwl.generate-images-azure-openai

learn-pr/wwl-data-ai/analyze-images/includes/3-analyze-image.md

Lines changed: 3 additions & 4 deletions
@@ -24,9 +24,8 @@ client = ImageAnalysisClient(
2424

2525
result = client.analyze(
2626
image_data=<IMAGE_DATA_BYTES>, # Binary data from your image file
27-
visual_features=[VisualFeatures.CAPTION, VisualFeatures.READ],
27+
visual_features=[VisualFeatures.CAPTION, VisualFeatures.TAGS],
2828
gender_neutral_caption=True,
29-
language="en",
3029
)
3130
```
3231

@@ -58,7 +57,7 @@ ImageAnalysisClient client = new ImageAnalysisClient(
5857

5958
ImageAnalysisResult result = client.Analyze(
6059
<IMAGE_DATA_BYTES>, // Binary data from your image file
61-
VisualFeatures.Caption | VisualFeatures.Read,
60+
VisualFeatures.Caption | VisualFeatures.Tags,
6261
new ImageAnalysisOptions { GenderNeutralCaption = true });
6362
```
6463

@@ -76,7 +75,7 @@ Available visual features are contained in the `VisualFeatures` enumeration:
7675
- VisualFeatures.People: Returns the bounding box for detected people
7776
- VisualFeatures.SmartCrops: Returns the bounding box of the specified aspect ratio for the area of interest
7877
- VisualFeatures.Read: Extracts readable text
79-
-
78+
8079
::: zone-end
8180

8281
Specifying the visual features you want analyzed in the image determines what information the response will contain. Most responses will contain a bounding box (if a location in the image is reasonable) or a confidence score (for features such as tags or captions).
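As a minimal sketch of how a confidence score might be used, the following Python filters tags by a threshold. The list-of-dictionaries shape, the sample values, and the `confident_tags` helper are assumptions made for illustration; they aren't SDK objects or real service output.

```python
# Hypothetical sample data: each tag carries a name and a confidence score.
# This shape is assumed for illustration and is not an SDK type.
sample_tags = [
    {"name": "outdoor", "confidence": 0.98},
    {"name": "street", "confidence": 0.91},
    {"name": "bicycle", "confidence": 0.42},
]


def confident_tags(tags, threshold=0.8):
    """Return the names of tags whose confidence meets the threshold."""
    return [tag["name"] for tag in tags if tag["confidence"] >= threshold]


print(confident_tags(sample_tags))
```

The right threshold depends on the application; a cataloging tool might accept lower-confidence tags than a workflow that acts on them automatically.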
Lines changed: 19 additions & 18 deletions
@@ -1,18 +1,19 @@
1-
### YamlMime:ModuleUnit
2-
uid: learn.wwl.read-text-images-documents-with-computer-vision-service.using-read-api
3-
title: Use the Read API
4-
metadata:
5-
title: Use the Read API
6-
description: Use the Read API
7-
author: wwlpublish
8-
ms.author: berryivor
9-
ms.date: 02/05/2024
10-
ms.topic: unit
11-
ms.collection:
12-
- wwl-ai-copilot
13-
azureSandbox: false
14-
labModal: false
15-
durationInMinutes: 3
16-
content: |
17-
[!include[](includes/4-use-read-api.md)]
18-
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.read-text-images-documents-with-computer-vision-service.using-read-api
3+
title: Use the Read API
4+
metadata:
5+
title: Use the Read API
6+
description: Use the Read API
7+
author: wwlpublish
8+
ms.author: berryivor
9+
ms.date: 02/05/2024
10+
ms.topic: unit
11+
ms.collection:
12+
- wwl-ai-copilot
13+
zone_pivot_groups: dev-lang-csharp-python
14+
azureSandbox: false
15+
labModal: false
16+
durationInMinutes: 6
17+
content: |
18+
[!include[](includes/4-use-read-api.md)]
19+
Lines changed: 16 additions & 19 deletions
@@ -1,19 +1,16 @@
1-
### YamlMime:ModuleUnit
2-
uid: learn.wwl.read-text-images-documents-with-computer-vision-service.exercise
3-
title: Exercise - Read text in images
4-
metadata:
5-
title: Exercise - Read text in images
6-
description: Exercise - Read text in images
7-
author: wwlpublish
8-
ms.author: berryivor
9-
ms.date: 02/05/2024
10-
ms.topic: unit
11-
ms.collection:
12-
- wwl-ai-copilot
13-
azureSandbox: false
14-
labId: 152408
15-
labModal: true
16-
durationInMinutes: 20
17-
content: |
18-
[!include[](includes/5-exercise.md)]
19-
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.read-text-images-documents-with-computer-vision-service.exercise
3+
title: Exercise - Read text in images
4+
metadata:
5+
title: Exercise - Read text in images
6+
description: Exercise - Read text in images
7+
author: wwlpublish
8+
ms.author: berryivor
9+
ms.date: 02/05/2024
10+
ms.topic: unit
11+
ms.collection:
12+
- wwl-ai-copilot
13+
durationInMinutes: 30
14+
content: |
15+
[!include[](includes/5-exercise.md)]
16+
Lines changed: 49 additions & 53 deletions
@@ -1,53 +1,49 @@
1-
### YamlMime:ModuleUnit
2-
uid: learn.wwl.read-text-images-documents-with-computer-vision-service.knowledge-check
3-
title: Module assessment
4-
metadata:
5-
title: Module assessment
6-
description: Knowledge Check
7-
author: wwlpublish
8-
ms.author: berryivor
9-
ms.date: 02/05/2024
10-
ms.topic: unit
11-
ms.collection:
12-
- wwl-ai-copilot
13-
azureSandbox: false
14-
labModal: false
15-
durationInMinutes: 4
16-
content: |
17-
[!include[](includes/6-knowledge-check.md)]
18-
quiz:
19-
questions:
20-
- content: "Which API would be best for this scenario? You need to read a large number of files with high accuracy. The text is short sections of handwritten text, some in English and some of it is in multiple languages."
21-
choices:
22-
- content: "A custom Language API"
23-
isCorrect: false
24-
explanation: "Incorrect: Azure AI Language custom models aren't able to perform OCR."
25-
- content: "Document Intelligence API"
26-
isCorrect: false
27-
explanation: "Incorrect: Document Intelligence is the best choice for large amounts of structured text and multiple languages, however isn't the best choice for shorter, unstructured handwritten text."
28-
- content: "Image Analysis API"
29-
isCorrect: true
30-
explanation: "Correct: The Image Analysis service OCR feature is best suited for short sections of handwritten text."
31-
- content: "What levels of division are the OCR results returned?"
32-
choices:
33-
- content: "Only total content and pages of text."
34-
isCorrect: false
35-
explanation: "Incorrect: Results contain blocks, words and lines, as well as bounding boxes for each word and line."
36-
- content: "Blocks, words and lines of text."
37-
isCorrect: true
38-
explanation: "Correct: Results contain blocks, words and lines, as well as bounding boxes for each word and line."
39-
- content: "Total content, image tags, pages, words and lines of text."
40-
isCorrect: false
41-
explanation: "Incorrect: Results contain blocks, words and lines, as well as bounding boxes for each word and line."
42-
- content: "You've scanned a letter into PDF format and need to extract the text it contains. What should you do?"
43-
choices:
44-
- content: "Use the Azure AI Custom Vision service"
45-
isCorrect: false
46-
explanation: "Incorrect: The Azure AI Custom Vision service is used to build and deploy image identification applications by applying labels to classes or objects."
47-
- content: "Use the Image Analysis API of the Azure AI Vision service."
48-
isCorrect: false
49-
explanation: "Incorrect: The Image Analysis API isn't well suited to process PDF formatted files."
50-
- content: "Use the Document Intelligence API."
51-
isCorrect: true
52-
explanation: "Correct: The Document Intelligence API can be used to process PDF formatted files."
53-
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.read-text-images-documents-with-computer-vision-service.knowledge-check
3+
title: Module assessment
4+
metadata:
5+
title: Module assessment
6+
description: Knowledge Check
7+
author: wwlpublish
8+
ms.author: berryivor
9+
ms.date: 02/05/2024
10+
ms.topic: unit
11+
ms.collection:
12+
- wwl-ai-copilot
13+
durationInMinutes: 3
14+
quiz:
15+
questions:
16+
- content: "Which service should you use to locate and read text in signs within a photograph of a street?"
17+
choices:
18+
- content: "Azure AI Language"
19+
isCorrect: false
20+
explanation: "Incorrect: Azure AI Language isn't able to perform OCR."
21+
- content: "Azure AI Document Intelligence"
22+
isCorrect: false
23+
explanation: "Incorrect: Azure AI Document Intelligence is designed to extract text from documents and forms."
24+
- content: "Azure AI Vision"
25+
isCorrect: true
26+
explanation: "Correct: The Image Analysis feature in Azure AI Vision includes OCR capabilities that can extract text from images."
27+
- content: "Which visual feature enumeration should you use to return OCR results from an image analysis call?"
28+
choices:
29+
- content: "VisualFeatures.Caption"
30+
isCorrect: false
31+
explanation: "Incorrect: The VisualFeatures.Caption enumeration returns a suggested caption for the image."
32+
- content: "VisualFeatures.Read"
33+
isCorrect: true
34+
explanation: "Correct: The VisualFeatures.Read enumeration returns text and its location in the image."
35+
- content: "VisualFeatures.Tags"
36+
isCorrect: false
37+
explanation: "Incorrect: The VisualFeatures.Tags enumeration returns suggested tags to help categorize the image."
38+
- content: "Text location information in an image is returned at which levels by the Azure AI Vision image analysis API?"
39+
choices:
40+
- content: "The location of individual *words* only."
41+
isCorrect: false
42+
explanation: "Incorrect: The location and text of individual words are returned, but that's not the only level."
43+
- content: "A single *block* containing all of the text in the image."
44+
isCorrect: false
45+
explanation: "Incorrect: A single block is returned, but it includes smaller location areas for the text detected in the image."
46+
- content: "A *block* containing the location of *lines* of text as well as individual *words*."
47+
isCorrect: true
48+
explanation: "Correct: The image analysis OCR results include a block in which each line of text is located, and within each line the location of each word is returned."
49+
Lines changed: 17 additions & 12 deletions
@@ -1,14 +1,19 @@
1-
Azure AI provides two different features that read text from documents and images, one in the Azure AI Vision Service, the other in Azure AI Document Intelligence. There is overlap in what each service provides, however each is optimized for results depending on what the input is.
1+
There are multiple Azure AI services that read text from documents and images, each optimized for results depending on the input and the specific requirements of your application.
22

3-
- **Image Analysis** Optical character recognition (OCR):
4-
- Use this feature for general, unstructured documents with smaller amount of text, or images that contain text.
5-
- Results are returned immediately (synchronous) from a single API call.
6-
- Has functionality for analyzing images past extracting text, including object detection, describing or categorizing an image, generating smart-cropped thumbnails and more.
7-
- Examples include: street signs, handwritten notes, and store signs.
8-
- **Document Intelligence**:
9-
- Use this service to read small to large volumes of text from images and PDF documents.
10-
- This service uses context and structure of the document to improve accuracy.
11-
- The initial function call returns an asynchronous operation ID, which must be used in a subsequent call to retrieve the results.
12-
- Examples include: receipts, articles, and invoices.
3+
- **Azure AI Vision** includes an *image analysis* capability that supports *optical character recognition* (OCR). Consider using Azure AI Vision in the following scenarios:
4+
- **Text location and extraction from scanned documents**: Azure AI Vision is a great solution for general, unstructured documents that have been scanned as images. For example, reading text in labels, menus, or business cards.
5+
- **Finding and reading text in photographs**: Examples include photos that include street signs and store names.
6+
- **Digital asset management (DAM)**: Azure AI Vision includes functionality for analyzing images beyond extracting text, including object detection, describing or categorizing an image, generating smart-cropped thumbnails, and more. These capabilities make it a useful service when you need to catalog, index, or analyze large volumes of digital image-based content.
7+
- **Azure AI Document Intelligence** is a service that is specifically designed to extract information from complex digital documents. Azure AI Document Intelligence is designed for extracting text, key-value pairs, tables, and structures from documents automatically and accurately. Key considerations for choosing Azure AI Document Intelligence include:
8+
- **Form Processing**: Azure AI Document Intelligence is specifically designed to extract data from forms, invoices, receipts, and other structured documents.
9+
- **Prebuilt Models**: Azure AI Document Intelligence provides prebuilt models for common document types to reduce complexity and integrate into workflows or applications.
10+
- **Custom Models**: You can create custom models tailored to your specific documents, which makes Azure AI Document Intelligence a flexible solution for many business scenarios.
11+
- **Azure AI Content Understanding** is a service that you can use to analyze and extract information from multiple kinds of content, including documents, images, audio streams, and video. It is suitable for:
12+
- **Multimodal content extraction**: Extracting content and structured fields from documents, forms, audio, video, and images.
13+
- **Custom content analysis scenarios**: Support for customizable analyzers enables you to extract specific content or fields tailored to business needs.
1314

14-
You can access both technologies via the REST API or a client library. In this module, we'll focus on the OCR feature in **Image Analysis**. If you'd like to learn more about **Document Intelligence**, [reading this module](/training/modules/use-prebuilt-form-recognizer-models/?azure-portal=true) will provide a good introduction.
15+
> [!NOTE]
16+
> In the rest of this module, we'll focus on the OCR image analysis feature in **Azure AI Vision**. To learn more about Azure AI Document Intelligence and Azure AI Content Understanding, consider completing the following training modules:
17+
>
18+
> - [Plan an Azure AI Document Intelligence solution](/training/modules/plan-form-recognizer-solution/)
19+
> - [Analyze content with Azure AI Content Understanding](/training/modules/analyze-content-ai/)

learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/includes/4-use-read-api.md

Lines changed: 45 additions & 13 deletions
@@ -1,30 +1,62 @@
1-
To use the Read OCR feature, call the **ImageAnalysis** function (REST API or equivalent SDK method), passing the image URL or binary data, and optionally specifying a gender neutral caption or the language the text is written in (with a default value of **en** for English).
1+
To use Azure AI Vision for image analysis, including optical character recognition, you must provision an Azure AI Vision resource in an Azure subscription. The resource can be:
22

3-
To make an OCR request to **ImageAnalysis**, specify the visual feature as `READ`.
3+
- An **Azure AI Services** multi-service resource (either deployed as part of an Azure AI Foundry hub and project, or as a standalone resource).
4+
- A **Computer Vision** resource.
45

5-
**C#**
6+
To use your deployed resource in an application, you must connect to its *endpoint* using either key-based authentication or Microsoft Entra ID authentication. You can find the endpoint for your resource in the Azure portal, or if you're working in an Azure AI Foundry project, in the Azure AI Foundry portal. The endpoint is in the form of a URL, and typically looks something like this:
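As a rough sketch of how the endpoint fits into a REST call, the following Python builds the request URL and key-based headers without sending anything. It assumes the `computervision/imageanalysis:analyze` path used for REST requests in this unit; the `api-version` value, the helper names, and the `my-resource` endpoint are assumptions for illustration, so check the REST reference for the values that apply to your resource.

```python
from urllib.parse import urlencode


def build_analyze_url(endpoint, features, api_version="2024-02-01", language=None):
    """Build an image analysis request URL (the api-version default is an assumption)."""
    base = endpoint.rstrip("/") + "/computervision/imageanalysis:analyze"
    params = {"features": ",".join(features), "api-version": api_version}
    if language:
        params["language"] = language
    return base + "?" + urlencode(params)


def build_key_headers(key):
    """Headers for key-based authentication when posting binary image data."""
    return {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/octet-stream",
    }


# "my-resource" is a placeholder resource name, not a real endpoint.
url = build_analyze_url(
    "https://my-resource.cognitiveservices.azure.com/", ["read"], language="en"
)
print(url)
```

Keeping URL construction separate from the HTTP call makes it easy to swap key-based headers for a Microsoft Entra ID bearer token later.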
67

7-
```csharp
8-
ImageAnalysisResult result = client.Analyze(
9-
<image-to-analyze>,
10-
VisualFeatures.Read);
8+
```
9+
https://<resource_name>.cognitiveservices.azure.com/
1110
```
1211

13-
**Python**
12+
After establishing a connection, you can use the OCR feature by calling the **ImageAnalysis** function (via the REST API or with an equivalent SDK method), passing the image URL or binary data, and optionally specifying the language the text is written in (with a default value of **en** for English).
13+
14+
```rest
15+
https://<endpoint>/computervision/imageanalysis:analyze?features=read&...
16+
```
17+
18+
::: zone pivot="python"
19+
20+
To use the Azure AI Vision Python SDK to extract text from an image, install the **azure-ai-vision-imageanalysis** package. Then, in your code, use either key-based authentication or Microsoft Entra ID authentication to connect an **ImageAnalysisClient** object to an Azure AI Vision resource. To find and read text in an image, call the **analyze** (or **analyze_from_url**) method, specifying the **VisualFeatures.READ** enumeration.
1421

1522
```python
23+
from azure.ai.vision.imageanalysis import ImageAnalysisClient
24+
from azure.ai.vision.imageanalysis.models import VisualFeatures
25+
from azure.core.credentials import AzureKeyCredential
26+
27+
client = ImageAnalysisClient(
28+
endpoint="<YOUR_RESOURCE_ENDPOINT>",
29+
credential=AzureKeyCredential("<YOUR_AUTHORIZATION_KEY>")
30+
)
31+
1632
result = client.analyze(
17-
image_url=<image_to_analyze>,
18-
visual_features=[VisualFeatures.READ]
33+
image_data=<IMAGE_DATA_BYTES>, # Binary data from your image file
34+
visual_features=[VisualFeatures.READ],
35+
language="en",
1936
)
2037
```
2138

22-
If using the REST API, specify the feature as `read`.
39+
::: zone-end
2340

24-
```rest
25-
https://<endpoint>/computervision/imageanalysis:analyze?features=read&...
41+
::: zone pivot="csharp"
42+
43+
To use the Azure AI Vision .NET SDK to extract text from an image, install the **Azure.AI.Vision.ImageAnalysis** package. Then, in your code, use either key-based authentication or Microsoft Entra ID authentication to connect an **ImageAnalysisClient** object to an Azure AI Vision resource. To find and read text in an image, call the **Analyze** method, specifying the **VisualFeatures.Read** enumeration.
44+
45+
```csharp
46+
using Azure.AI.Vision.ImageAnalysis;
47+
48+
ImageAnalysisClient client = new ImageAnalysisClient(
49+
new Uri("<YOUR_RESOURCE_ENDPOINT>"),
50+
new AzureKeyCredential("<YOUR_AUTHORIZATION_KEY>"));
51+
52+
ImageAnalysisResult result = client.Analyze(
53+
<IMAGE_DATA_BYTES>, // Binary data from your image file
54+
VisualFeatures.Read,
55+
new ImageAnalysisOptions { Language = "en" });
2656
```
2757

58+
::: zone-end
59+
2860
The results of the Read OCR function are returned synchronously, either as JSON or as a language-specific object with a similar structure. These results are broken down into *blocks* (with the current service only using one block), then *lines*, and then *words*. Additionally, the text values are included at both the *line* and *word* levels, making it easier to read entire lines of text if you don't need to extract text at the individual *word* level.
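To illustrate the block, line, and word hierarchy, here's a minimal Python sketch that walks a parsed JSON response and collects text at the line and word levels. The property names (`readResult`, `blocks`, `lines`, `words`, `text`) and the sample values are assumptions based on the structure described above, and `extract_text` is a hypothetical helper rather than part of the SDK.

```python
# Hypothetical parsed response; the property names are assumed for illustration.
sample_result = {
    "readResult": {
        "blocks": [
            {
                "lines": [
                    {
                        "text": "Hello world",
                        "words": [
                            {"text": "Hello", "confidence": 0.99},
                            {"text": "world", "confidence": 0.97},
                        ],
                    }
                ]
            }
        ]
    }
}


def extract_text(result):
    """Collect line-level and word-level text from a read result dictionary."""
    lines, words = [], []
    for block in result.get("readResult", {}).get("blocks", []):
        for line in block.get("lines", []):
            lines.append(line["text"])
            words.extend(word["text"] for word in line.get("words", []))
    return lines, words


line_texts, word_texts = extract_text(sample_result)
print(line_texts)
print(word_texts)
```

If you only need whole lines of text, you can skip the inner word loop and read each line's text value directly.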
2961

3062
```JSON
Lines changed: 9 additions & 4 deletions
@@ -1,8 +1,13 @@
1-
[!INCLUDE [Lab note](../../../includes/wwl/lab-note.md)]
1+
Now it's your turn to try using the OCR capabilities of Azure AI Vision.
22

3-
If you're completing this exercise on your own computer, follow these [exercise instructions](https://microsoftlearning.github.io/mslearn-ai-vision/Instructions/Exercises/05-ocr.html?azure-portal=true).
3+
In this exercise, you use the Azure AI Vision SDK to develop a client application that extracts text from images.
44

5-
When you finish the exercise, end the lab to close the VM. Don't forget to come back and complete the knowledge check to earn points for completing this module!
5+
> [!NOTE]
6+
> To complete this lab, you need an **[Azure subscription](https://azure.microsoft.com/free?azure-portal=true)** in which you have administrative access.
7+
8+
Launch the exercise and follow the instructions.
9+
10+
[![Button to launch exercise.](../media/launch-exercise.png)](https://go.microsoft.com/fwlink/?linkid=2320100&azure-portal=true)
611

712
> [!TIP]
8-
> After completing the exercise, if you've finished exploring Azure AI Services, delete the Azure resources that you created during the exercise.
13+
> After completing the exercise, if you've finished exploring Azure AI services, delete the Azure resources that you created during the exercise.

learn-pr/wwl-data-ai/read-text-images-documents-with-computer-vision-service/includes/6-knowledge-check.md

Lines changed: 0 additions & 5 deletions
This file was deleted.
Lines changed: 2 additions & 6 deletions
@@ -1,7 +1,3 @@
1-
In this module, you learned how to:
1+
In this module, you learned how to provision an Azure AI Vision resource and use it from a client application to extract text from images.
22

3-
- Read text from images with **ImageAnalysis** READ feature
4-
- Use the Azure AI Vision service with SDKs and the REST API
5-
- Develop an application that can read printed and handwritten text
6-
7-
For more information, see the [OCR documentation](/azure/ai-services/computer-vision/concept-ocr).
3+
To learn more about using the Azure AI Vision service for OCR, see the [OCR - Optical Character Recognition](/azure/ai-services/computer-vision/overview-ocr) article in the Azure AI Vision documentation.
