To implement data extraction in Content Understanding, follow these steps:
1. **Create an Analyzer:** Define an analyzer using REST APIs or our Python code samples. Optionally, include a field schema to specify the metadata to be extracted.
2. **Perform Content Extraction:** Use the analyzer to process files and extract structured content.
3. **(Optional) Enhance with Field Extraction:** Add AI-generated fields to enrich the extracted content with additional metadata.
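As a rough sketch, the three steps map onto two REST calls: a `PUT` that registers the analyzer and a `POST` that runs analysis against it. The endpoint path and `api-version` in this helper are assumptions for illustration, not the documented values:

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical values; substitute your resource endpoint and the current API version.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
API_VERSION = "2024-12-01-preview"  # assumed

def analyzer_url(analyzer_id: str, action: Optional[str] = None) -> str:
    """Build the URL used to create (PUT) or invoke (POST ...:analyze) an analyzer."""
    suffix = f":{action}" if action else ""
    return (f"{ENDPOINT}/contentunderstanding/analyzers/"
            f"{quote(analyzer_id)}{suffix}?api-version={API_VERSION}")
```

Step 1 sends the schema to `analyzer_url("my-analyzer")`; steps 2 and 3 submit files to `analyzer_url("my-analyzer", "analyze")`.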
## Creating an Analyzer
Analyzers are reusable components in Content Understanding that streamline the data extraction process. Once an analyzer is created, it can be used repeatedly to process files and extract content or fields based on predefined schemas. An analyzer acts as a blueprint for how data should be processed, ensuring consistency and efficiency across multiple files and content types.
The following code samples demonstrate how to create analyzers for each modality, specifying the structured data to be extracted, such as key fields, summaries, or classifications. These analyzers will serve as the foundation for extracting and enriching content in your RAG solution.
**Starting off with the schema details for each modality:**
# [Document](#tab/document)
To create a custom analyzer, you need to define a field schema that describes the structured data you want to extract. In the following example, we define a schema for extracting basic information from an invoice document.
A minimal version of the request body might look like the following (the `VendorName` field is illustrative):

```json
{
  "description": "Sample invoice analyzer",
  "fieldSchema": {
    "fields": {
      "VendorName": {
        "type": "string",
        "method": "extract",
        "description": "The name of the vendor issuing the invoice"
      }
    }
  }
}
```
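One way to register this schema with the service is a `PUT` request. The sketch below only builds the request object; the path and API version are assumptions rather than the documented values:

```python
import json
import urllib.request

def build_create_request(endpoint: str, analyzer_id: str, key: str,
                         schema: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request that registers an analyzer."""
    url = (f"{endpoint}/contentunderstanding/analyzers/{analyzer_id}"
           f"?api-version=2024-12-01-preview")  # assumed API version
    return urllib.request.Request(
        url,
        data=json.dumps(schema).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,  # key-based auth shown for brevity
            "Content-Type": "application/json",
        },
        method="PUT",
    )

# urllib.request.urlopen(req) would actually send the request.
```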
# [Image](#tab/image)
To create a custom analyzer, you need to define a field schema that describes the structured data you want to extract. In the following example, we define a schema for identifying chart types in an image.
A minimal version of the request body might look like the following (the `ChartType` field is illustrative):

```json
{
  "description": "Sample chart analyzer",
  "fieldSchema": {
    "fields": {
      "ChartType": {
        "type": "string",
        "method": "generate",
        "description": "The type of chart shown in the image, for example bar, line, or pie"
      }
    }
  }
}
```
# [Audio](#tab/audio)
To create a custom analyzer, you need to define a field schema that describes the structured data you want to extract. In the following example, we define a schema for extracting basic information from call transcripts.
A minimal version of the request body might look like the following (the `Summary` field is illustrative):

```json
{
  "description": "Sample call transcript analyzer",
  "fieldSchema": {
    "fields": {
      "Summary": {
        "type": "string",
        "method": "generate",
        "description": "A short summary of the call"
      }
    }
  }
}
```
# [Video](#tab/video)
To create a custom analyzer, you need to define a field schema that describes the structured data you want to extract. In the following example, we define a schema for extracting basic information from marketing videos.
A minimal version of the request body might look like the following (the `Summary` field is illustrative):

```json
{
  "description": "Sample marketing video analyzer",
  "fieldSchema": {
    "fields": {
      "Summary": {
        "type": "string",
        "method": "generate",
        "description": "A short summary of the video's key messages"
      }
    }
  }
}
```
---
#### Load all environment variables and necessary libraries from LangChain
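A minimal sketch of the environment setup follows; the variable names are assumptions, and the tutorial's sample may read additional settings (for example, for Azure AI Search or Azure OpenAI):

```python
import os

def load_settings() -> dict:
    """Read the Content Understanding endpoint and key from environment variables."""
    settings = {
        "endpoint": os.getenv("AZURE_AI_ENDPOINT", ""),  # assumed variable name
        "key": os.getenv("AZURE_AI_KEY", ""),            # assumed variable name
    }
    missing = [name for name, value in settings.items() if not value]
    if missing:
        # Fail early with a clear message rather than a late HTTP 401
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return settings
```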
#### Create analyzers
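The per-modality schemas above can be collected into a single list for the creation loop to iterate over; the analyzer ids and the exact entry shape here are assumptions for illustration:

```python
# Illustrative ids; pair each analyzer id with its schema definition.
analyzer_configs = [
    {"id": "invoice-analyzer", "schema": {"description": "Sample invoice analyzer"}},
    {"id": "chart-analyzer", "schema": {"description": "Sample chart analyzer"}},
    {"id": "call-transcript-analyzer", "schema": {"description": "Sample call transcript analyzer"}},
    {"id": "marketing-video-analyzer", "schema": {"description": "Sample marketing video analyzer"}},
]
```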
```python
from pathlib import Path

# ... client setup and the analyzer_configs list are defined earlier ...
for analyzer in analyzer_configs:
    # Create each analyzer from its schema definition
    ...
```
---
## Perform Content and Field Analysis
**Content extraction** is the first step in the RAG implementation process. It transforms raw multimodal data, such as documents, images, audio, and video, into structured, searchable formats, ensuring the content is organized and ready for indexing and retrieval. Content extraction provides the baseline for retrieval but may not fully address domain-specific needs or provide deeper contextual insights.
Learn more about content extraction capabilities for each modality.
With the analyzers created for each modality, we can now process files to extract structured content and fields.
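Analysis of this kind is typically asynchronous: the service accepts the file and the client polls an operation until it completes. A generic polling helper can capture that pattern; the status values below are assumptions for illustration:

```python
import time
from typing import Callable

def poll_until_done(get_status: Callable[[], str],
                    interval: float = 1.0,
                    timeout: float = 60.0) -> str:
    """Call get_status() until it reports a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status.lower() in ("succeeded", "failed"):  # assumed terminal states
            return status
        time.sleep(interval)
    raise TimeoutError("analysis did not finish in time")
```

In practice, `get_status` would issue a `GET` against the operation URL returned by the analyze call.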