|
9 | 9 | "This notebook demonstrates how to use Azure AI Content Understanding service to:\n",
|
10 | 10 | "1. Create a classifier to categorize documents\n",
|
11 | 11 | "2. Create a custom analyzer to extract specific fields\n",
|
12 |
| - "3. Combine classifier and analyzer for intelligent document processing\n", |
| 12 | + "3. Combine a classifier with analyzers to classify, optionally split, and analyze documents in a flexible processing pipeline\n", |
| 13 | + "\n", |
| 14 | + "If you’d like to learn more before getting started, see the official documentation:\n", |
| 15 | + "[Understanding Classifiers in Azure AI Services](https://learn.microsoft.com/en-us/azure/ai-services/content-understanding/concepts/classifier)\n", |
13 | 16 | "\n",
|
14 | 17 | "## Prerequisites\n",
|
15 |
| - "- Azure subscription with access to Azure AI services\n", |
16 |
| - "- Python 3.8 or higher\n", |
17 |
| - "- A PDF document for testing (sample included)\n" |
| 18 | + "1. Ensure your Azure AI service resource is configured by following these [steps](../README.md#configure-azure-ai-service-resource).\n", |
| 19 | + "2. Install the required packages to run the sample.\n" |
18 | 20 | ]
|
19 | 21 | },
|
20 | 22 | {
|
|
129 | 131 | "\n",
|
130 | 132 | "The classifier schema defines:\n",
|
131 | 133 | "- **Categories**: Document types to classify (e.g., Legal, Medical)\n",
|
132 |
| - "- **Split Mode**: How to split multi-page documents\n", |
133 |
| - " - `\"auto\"`: Automatically split based on content\n", |
134 |
| - " - `\"none\"`: Don't split\n", |
135 |
| - " - `\"perPage\"`: Split every page" |
| 134 | + " - **description (Optional)**: Additional context or hints for categorizing or splitting documents. This is helpful when the category name alone isn’t descriptive enough; omit it when the name is already clear and self-explanatory.\n", |
| 135 | + "- **splitMode Options**: Defines how multi-page documents should be split before classification or analysis.\n", |
| 136 | + " - `\"auto\"`: Automatically split based on content. \n", |
| 137 | + " For example, if two categories are defined as “invoice” and “application form”:\n", |
| 138 | + " - A PDF with only one invoice will be classified as a single document.\n", |
| 139 | + " - A PDF containing two invoices and one application form will be automatically split into three classified sections.\n", |
| 140 | + " - `\"none\"`: No splitting. \n", |
| 141 | + " The entire multi-page document is treated as a single unit for classification and analysis.\n", |
| 142 | + " - `\"perPage\"`: Split by page. \n", |
| 143 | + " Each page is treated as a separate document. This is useful when you’ve built custom analyzers designed to operate on a per-page basis." |
136 | 144 | ]
|
137 | 145 | },
|
138 | 146 | {
|
|
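The schema description in the hunk above can be illustrated with a small sketch. It assumes the schema is a plain dictionary with a `categories` mapping and a `splitMode` key; the category names, the description text, and the `validate_classifier_schema` helper are hypothetical and are not taken from the notebook.

```python
# Split modes named in the notebook text.
VALID_SPLIT_MODES = {"auto", "none", "perPage"}

# Assumed schema shape: category name -> optional settings, plus a split mode.
classifier_schema = {
    "categories": {
        # Hypothetical description, added because "Legal" alone is broad.
        "Legal": {"description": "Contracts, agreements, and court filings."},
        # The name is self-explanatory here, so the description is omitted.
        "Medical": {},
    },
    "splitMode": "auto",
}


def validate_classifier_schema(schema):
    """Return a list of problems found in the schema (empty if none)."""
    problems = []
    categories = schema.get("categories")
    if not isinstance(categories, dict) or not categories:
        problems.append("categories must be a non-empty mapping")
    else:
        for name, spec in categories.items():
            description = spec.get("description")
            if description is not None and not isinstance(description, str):
                problems.append(f"description for {name!r} must be a string")
    mode = schema.get("splitMode")
    if mode not in VALID_SPLIT_MODES:
        problems.append(
            f"splitMode must be one of {sorted(VALID_SPLIT_MODES)}, got {mode!r}"
        )
    return problems


print(validate_classifier_schema(classifier_schema))
```

Checking the split mode locally catches casing mistakes such as `"perpage"` before a request is sent to the service.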
171 | 179 | "source": [
|
172 | 180 | "## 5. Initialize Content Understanding Client\n",
|
173 | 181 | "\n",
|
174 |
| - "Create the client that will communicate with Azure AI services." |
| 182 | + "Create the client that will communicate with Azure AI services.\n", |
| 183 | + "\n", |
| 184 | + "⚠️ Important:\n", |
| 185 | + "You must update the code below to match your Azure authentication method.\n", |
| 186 | + "Look for the `# IMPORTANT` comments and modify those sections accordingly.\n", |
| 187 | + "If you skip this step, the sample may not run correctly." |
175 | 188 | ]
|
176 | 189 | },
|
177 | 190 | {
|
|
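The authentication note above does not show the client code, so the following is only a sketch of the choice it asks the reader to make. It assumes the service accepts either a resource key sent in the `Ocp-Apim-Subscription-Key` header or a Microsoft Entra ID token sent as a bearer token; the `build_auth_headers` helper is hypothetical and is not part of the sample.

```python
def build_auth_headers(api_key=None, bearer_token=None):
    """Build request headers for one of two assumed authentication methods."""
    if api_key:
        # Key-based authentication with the resource key.
        return {"Ocp-Apim-Subscription-Key": api_key}
    if bearer_token:
        # Token-based authentication (for example, a Microsoft Entra ID token).
        return {"Authorization": f"Bearer {bearer_token}"}
    raise ValueError("Provide either api_key or bearer_token")
```

Raising an error when neither credential is supplied makes a skipped `# IMPORTANT` section fail immediately instead of surfacing later as an authorization error from the service.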