Commit af38001: clean up formatting
1 parent 570febe commit af38001

File tree: 1 file changed (+7 −11 lines)


articles/ai-services/content-understanding/tutorial/RAG-tutorial.md

Lines changed: 7 additions & 11 deletions
@@ -58,20 +58,19 @@ To implement data extraction in Content Understanding, follow these steps:
 
 1. **Create an Analyzer:** Define an analyzer using REST APIs or our Python code samples. Optionally, include a field schema to specify the metadata to be extracted.
 2. **Perform Content Extraction:** Use the analyzer to process files and extract structured content.
-3. **Enhance with Field Extraction:** Add AI-generated fields to enrich the extracted content with additional metadata.
+3. **(Optional) Enhance with Field Extraction:** Add AI-generated fields to enrich the extracted content with additional metadata.
 
 ## Creating an Analyzer
 Analyzers are reusable components in Content Understanding that streamline the data extraction process. Once an analyzer is created, it can be used repeatedly to process files and extract content or fields based on predefined schemas. An analyzer acts as a blueprint for how data should be processed, ensuring consistency and efficiency across multiple files and content types.
 
 The following code samples demonstrate how to create analyzers for each modality, specifying the structured data to be extracted, such as key fields, summaries, or classifications. These analyzers will serve as the foundation for extracting and enriching content in your RAG solution.
 
-Starting off with the schema details for each modality:
+**Starting off with the schema details for each modality:**
 
 # [Document](#tab/document)
 
 To create a custom analyzer, you need to define a field schema that describes the structured data you want to extract. In the following example, we define a schema for extracting basic information from an invoice document.
 
-First, create a JSON file named `request_body.json` with the following content:
 ```json
 {
 "description": "Sample invoice analyzer",
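The hunk above truncates the invoice analyzer's `fieldSchema`, so the full request body is not visible in this diff. A minimal sketch of what such a body might contain, assuming hypothetical field names (`VendorName`, `TotalAmount`) that are not taken from the tutorial:

```python
import json

def build_invoice_analyzer_body() -> dict:
    # Illustrative request body for creating a document analyzer. The field
    # names and descriptions below are assumptions, not the tutorial's
    # elided schema; only the "description" value appears in the diff.
    return {
        "description": "Sample invoice analyzer",
        "fieldSchema": {
            "fields": {
                "VendorName": {
                    "type": "string",
                    "description": "Name of the vendor issuing the invoice",
                },
                "TotalAmount": {
                    "type": "number",
                    "description": "Total amount due on the invoice",
                },
            }
        },
    }

if __name__ == "__main__":
    # Serialize to a JSON file, as the removed "request_body.json" step did.
    with open("request_body.json", "w") as f:
        json.dump(build_invoice_analyzer_body(), f, indent=2)
```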
@@ -112,9 +111,9 @@ First, create a JSON file named `request_body.json` with the following content:
 
 # [Image](#tab/image)
 
-To create a custom analyzer, you need to define a field schema that describes the structured data you want to extract. In the following example, we define a schema for identifying detects in images of metal plates.
+To create a custom analyzer, you need to define a field schema that describes the structured data you want to extract. In the following example, we define a schema for identifying chart types in an image.
+
 
-First, create a JSON file named `request_body.json` with the following content:
 ```json
 {
 "description": "Sample chart analyzer",
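The image hunk rewords the example to "identifying chart types", which suggests a classification-style field, but the schema itself is truncated. A sketch under that assumption; the `method` and `enum` keys and the chart-type values are illustrative, not from the tutorial:

```python
def build_chart_analyzer_body() -> dict:
    # Hypothetical classification field for chart types. "method": "classify"
    # and the enum values are assumptions; only "description" is in the diff.
    return {
        "description": "Sample chart analyzer",
        "fieldSchema": {
            "fields": {
                "ChartType": {
                    "type": "string",
                    "method": "classify",
                    "enum": ["bar", "line", "pie", "scatter"],
                    "description": "Type of chart shown in the image",
                }
            }
        },
    }
```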
@@ -138,7 +137,6 @@ First, create a JSON file named `request_body.json` with the following content:
 
 To create a custom analyzer, you need to define a field schema that describes the structured data you want to extract. In the following example, we define a schema for extracting basic information from call transcripts.
 
-First, create a JSON file named `request_body.json` with the following content:
 ```json
 {
 "description": "Sample call transcript analyzer",
@@ -178,7 +176,6 @@ First, create a JSON file named `request_body.json` with the following content:
 
 To create a custom analyzer, you need to define a field schema that describes the structured data you want to extract. In the following example, we define a schema for extracting basic information from marketing videos.
 
-First, create a JSON file named `request_body.json` with the following content:
 ```json
 {
 "description": "Sample marketing video analyzer",
@@ -201,7 +198,7 @@ First, create a JSON file named `request_body.json` with the following content:
 
 ---
 
-Load all environment variables and libraries from Langchain
+#### Load all environment variables and necessary libraries from Langchain
 
 ``` python
 
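The body of the environment-loading block is elided in this hunk. A minimal sketch of the step it labels, assuming hypothetical variable names (`AZURE_AI_ENDPOINT`, `AZURE_AI_API_KEY`) that are not shown in the diff:

```python
import os

def load_settings(env=None) -> dict:
    # Hedged sketch: the tutorial's exact environment-variable names are not
    # visible here; these two names are assumptions for illustration.
    env = os.environ if env is None else env
    endpoint = env.get("AZURE_AI_ENDPOINT", "")
    api_key = env.get("AZURE_AI_API_KEY", "")
    if not endpoint or not api_key:
        raise RuntimeError("Set AZURE_AI_ENDPOINT and AZURE_AI_API_KEY first")
    # Normalize the endpoint so later URL building can append paths safely.
    return {"endpoint": endpoint.rstrip("/"), "api_key": api_key}
```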
@@ -250,7 +247,7 @@ sys.path.append(str(parent_dir))
 ```
 ---
 
-Create analyzers using the schema definition from above
+#### Create analyzers
 
 ``` python
 from pathlib import Path
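The analyzer-creation loop itself is elided; only `for analyzer in analyzer_configs:` is visible in a later hunk header. A sketch of that loop's shape, in which the config entries, REST path, and api-version are assumptions rather than values confirmed by this diff:

```python
# Hypothetical configs mirroring the analyzer_configs loop variable seen in
# a later hunk header; ids and template paths are illustrative.
analyzer_configs = [
    {"id": "invoice-analyzer", "template_path": "request_body.json"},
    {"id": "chart-analyzer", "template_path": "chart_request_body.json"},
]

def build_create_url(endpoint: str, analyzer_id: str,
                     api_version: str = "2024-12-01-preview") -> str:
    # Assumed shape: PUT {endpoint}/contentunderstanding/analyzers/{id}
    # with ?api-version=...; not confirmed by the truncated diff.
    return (f"{endpoint}/contentunderstanding/analyzers/"
            f"{analyzer_id}?api-version={api_version}")

for analyzer in analyzer_configs:
    # In the real tutorial this is where the PUT request carrying the JSON
    # template body would be sent; here we only print the target URL.
    print(build_create_url("https://example.cognitiveservices.azure.com",
                           analyzer["id"]))
```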
@@ -312,7 +309,6 @@ for analyzer in analyzer_configs:
 ```
 ---
 
-
 ## Perform Content and Field Analysis
 **Content extraction** is the first step in the RAG implementation process. It transforms raw multimodal data—such as documents, images, audio, and video—into structured, searchable formats. This foundational step ensures that the content is organized and ready for indexing and retrieval. Content extraction provides the baseline for indexing and retrieval but may not fully address domain-specific needs or provide deeper contextual insights.
 [Learn more]() about content extraction capabilities for each modality.
@@ -324,7 +320,7 @@ With the analyzers created for each modality, we can now process files to extrac
 
 ---
 
-## Analyze files
+#### Analyze files
 
 ``` python
 
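The analysis code under "Analyze files" is elided as well. A hedged sketch of the typical shape of that step: submit a file to an analyzer, then poll the long-running operation until it reaches a terminal state. The URL shape and status strings are assumptions, not shown in this diff:

```python
import time

def build_analyze_url(endpoint: str, analyzer_id: str,
                      api_version: str = "2024-12-01-preview") -> str:
    # Assumed shape: POST {endpoint}/contentunderstanding/analyzers/
    # {id}:analyze?api-version=...; illustrative, not from the diff.
    return (f"{endpoint}/contentunderstanding/analyzers/"
            f"{analyzer_id}:analyze?api-version={api_version}")

def poll_until_done(get_status, interval_s: float = 0.0,
                    max_tries: int = 10) -> str:
    """Call get_status() until it returns a terminal state or we give up.

    The terminal-state names are assumptions for illustration.
    """
    for _ in range(max_tries):
        status = get_status()
        if status in ("Succeeded", "Failed"):
            return status
        time.sleep(interval_s)
    return "TimedOut"
```

In the real tutorial the status callable would issue a GET against the operation URL returned by the analyze request; here it is injected so the polling logic stands alone.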