Commit 4b9e968

Merge pull request #50490 from sherzyang/main
Add new module with acrolinx fixes.
2 parents 9305918 + 7533130 commit 4b9e968

23 files changed: 306 additions & 0 deletions
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.introduction-information-extraction.introduction
+title: Introduction
+metadata:
+  title: Introduction
+  description: "Introduction"
+  ms.date: 5/9/2025
+  author: wwlpublish
+  ms.author: sheryang
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 1
+content: |
+  [!include[](includes/1-introduction.md)]
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.introduction-information-extraction.overview
+title: Overview
+metadata:
+  title: Overview
+  description: "Overview"
+  ms.date: 5/9/2025
+  author: wwlpublish
+  ms.author: sheryang
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 2
+content: |
+  [!include[](includes/2-overview.md)]
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.introduction-information-extraction.vision-extraction
+title: Understand the extraction of data from images
+metadata:
+  title: Understand the extraction of data from images
+  description: "Understand how machine learning enables the extraction of data from images."
+  ms.date: 5/9/2025
+  author: wwlpublish
+  ms.author: sheryang
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 3
+content: |
+  [!include[](includes/3-vision-extraction.md)]
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.introduction-information-extraction.form-extraction
+title: Understand the extraction of data from forms
+metadata:
+  title: Understand the extraction of data from forms
+  description: "Understand how machine learning enables data extraction from forms."
+  ms.date: 5/9/2025
+  author: wwlpublish
+  ms.author: sheryang
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 3
+content: |
+  [!include[](includes/4-form-extraction.md)]
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.introduction-information-extraction.multimodal-extraction
+title: Understand multimodal data extraction
+metadata:
+  title: Understand multimodal data extraction
+  description: "Understand different techniques that enable multimodal data extraction."
+  ms.date: 5/9/2025
+  author: wwlpublish
+  ms.author: sheryang
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 3
+content: |
+  [!include[](includes/5-multimodal-extraction.md)]
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.introduction-information-extraction.knowledge-mining
+title: Understand data extraction for knowledge mining
+metadata:
+  title: Understand data extraction for knowledge mining
+  description: "Understand data extraction for knowledge mining."
+  ms.date: 5/9/2025
+  author: wwlpublish
+  ms.author: sheryang
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 3
+content: |
+  [!include[](includes/6-knowledge-mining.md)]
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.introduction-information-extraction.knowledge-check
+title: Module assessment
+metadata:
+  title: Module assessment
+  description: "Knowledge check"
+  ms.date: 5/9/2025
+  author: wwlpublish
+  ms.author: sheryang
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 3
+quiz:
+  title: "Check your knowledge"
+  questions:
+  - content: "What is the primary role of machine learning in information extraction?"
+    choices:
+    - content: "To store extracted data in a database."
+      isCorrect: false
+      explanation: ""
+    - content: "To convert structured data into unstructured formats."
+      isCorrect: false
+      explanation: ""
+    - content: "To transform content into numerical data and predict fields and values."
+      isCorrect: true
+      explanation: ""
+  - content: "Which of the following best describes a “field” in the context of data extraction?"
+    choices:
+    - content: "A visual marker used to highlight important text"
+      isCorrect: false
+      explanation: ""
+    - content: "A key that identifies the type of data being extracted"
+      isCorrect: true
+      explanation: ""
+    - content: "A storage location for raw content"
+      isCorrect: false
+      explanation: ""
+  - content: "How does generative AI enhance the data extraction process?"
+    choices:
+    - content: "By allowing users to define custom fields and generate values from unstructured content"
+      isCorrect: true
+      explanation: ""
+    - content: "By converting JSON data into images"
+      isCorrect: false
+      explanation: ""
+    - content: "By generating new documents from scratch"
+      isCorrect: false
+      explanation: ""
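Review note: each question in the knowledge-check quiz above marks exactly one choice with `isCorrect: true`. A minimal sketch of a check for that property follows. It assumes the quiz has already been parsed into a Python dictionary with the same keys as the YAML, and it treats "exactly one correct choice per question" as an assumption drawn from this file, not a documented rule of the platform. The function name `questions_with_bad_answer_count` is hypothetical.

```python
def questions_with_bad_answer_count(quiz: dict) -> list[str]:
    """Return the text of every question that does not have exactly one correct choice."""
    bad = []
    for question in quiz["questions"]:
        # Count the choices flagged as correct for this question.
        correct = sum(1 for choice in question["choices"] if choice["isCorrect"])
        if correct != 1:
            bad.append(question["content"])
    return bad


# Sample data mirroring the first question of the quiz in this commit.
quiz = {
    "title": "Check your knowledge",
    "questions": [
        {
            "content": "What is the primary role of machine learning in information extraction?",
            "choices": [
                {"content": "To store extracted data in a database.", "isCorrect": False},
                {"content": "To convert structured data into unstructured formats.", "isCorrect": False},
                {
                    "content": "To transform content into numerical data and predict fields and values.",
                    "isCorrect": True,
                },
            ],
        }
    ],
}

problems = questions_with_bad_answer_count(quiz)
```

An empty result means every question passed the check; any returned strings identify the questions to revisit.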
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.introduction-information-extraction.summary
+title: Summary
+metadata:
+  title: Summary
+  description: "Summary"
+  ms.date: 5/9/2025
+  author: wwlpublish
+  ms.author: sheryang
+  ms.topic: unit
+  ms.custom:
+  - N/A
+durationInMinutes: 1
+content: |
+  [!include[](includes/8-summary.md)]
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+Today's organizations deal with all kinds of content, such as documents, video, audio, images, and text. A common task in these organizations is identifying key information in that content and storing it in databases.
+
+Consider some of these use cases:
+- A manufacturer has images of each of its products. The images need to be analyzed for defects and anomalies.
+- A business works with a high volume of invoices, contracts, and reports with charts. Key data and summaries from the documents need to be extracted and logged.
+- Many hours of customer calls are recorded for quality purposes. The audio needs to be transcribed, summarized, and analyzed for sentiment.
+- A streaming catalog contains a large volume of video. Important moments in each video need to be tagged with metadata based on their content.
+
+Manually processing such content can be slow and error-prone. **AI-powered information extraction** encompasses capabilities that extract meaning from content. In this module, you explore core concepts related to information extraction.
Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
**AI-powered information extraction** and analysis enables organizations to gain actionable insights from data that might otherwise be locked up in documents, images, audio files, or other assets. Insights can come from structured and unstructured content. **Structured content** is information stored in a consistent format. Some examples include invoices, tax forms, and tables. **Unstructured content** is information that isn't in a predefined format. Some examples include emails, audio recordings, images, and videos.
2+
3+
## Information extraction processes
4+
5+
In general, information extraction processes follow these steps:
6+
7+
|**Step** | **Description** |
8+
|-|-|
9+
| **Source Identification** | Determine where the information resides and if it needs to be digitized.|
10+
| **Extraction** | Leverages many techniques based on machine learning to understand and extract data from digitized content. |
11+
| **Transformation & Structuring** | Extracted data is transformed into structured formats like JSON or tables.|
12+
| **Storage & Integration**| The processed data is then stored in databases, data lakes, or analytics platforms for further use.|
13+
14+
Both the type of content and type of insights needed from that content inform which techniques are necessary for information extraction. In this module we will take a look at the extraction of information with AI:
15+
16+
- From images
17+
- From forms
18+
- From multiple modalities
19+
- For knowledge mining
20+
21+
In many ways, the techniques used for images, forms, multiple modalities, and knowledge mining build upon each other.
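Review note: the four steps in the overview table (source identification, extraction, transformation and structuring, storage and integration) can be illustrated with a small sketch. The code below is not taken from the module. It assumes the source is already digitized text, uses regular expressions as a stand-in for the machine learning models the module describes, and uses an in-memory SQLite database as the storage target. All function names, field names, and the sample invoice text are hypothetical.

```python
import json
import re
import sqlite3


def extract_fields(text: str) -> dict:
    """Extraction: find field values in digitized text (regex stand-in for an ML model)."""
    patterns = {
        "invoice_number": r"Invoice\s*#\s*(\w+)",
        "total": r"Total:\s*\$?([\d.]+)",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    return fields


def to_json(fields: dict) -> str:
    """Transformation and structuring: serialize the extracted fields as JSON."""
    return json.dumps(fields, sort_keys=True)


def store(conn: sqlite3.Connection, source: str, payload: str) -> None:
    """Storage and integration: persist the structured result for later use."""
    conn.execute("CREATE TABLE IF NOT EXISTS extracted (source TEXT, payload TEXT)")
    conn.execute("INSERT INTO extracted VALUES (?, ?)", (source, payload))
    conn.commit()


def run_pipeline(conn: sqlite3.Connection, source: str, text: str) -> str:
    """Run extraction, structuring, and storage for one identified source."""
    payload = to_json(extract_fields(text))
    store(conn, source, payload)
    return payload


# Source identification: here the source is a hypothetical, already digitized invoice.
conn = sqlite3.connect(":memory:")
payload = run_pipeline(conn, "invoice-001.txt", "Invoice # A123\nTotal: $42.50")
```

Keeping each step in its own function mirrors the table: the extraction function could be swapped for a call to a trained model without touching the structuring or storage code.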
