|
9 | 9 | "> #################################################################################\n",
|
10 | 10 | ">\n",
|
11 | 11 | "> **Note:** Pro mode is currently available only for `document` data. \n",
|
12 |
| - "> Supported file types: pdf, tiff, jpg, jpeg, png, bmp, heif\n", |
| 12 | + "> [Supported file types](https://learn.microsoft.com/en-us/azure/ai-services/content-understanding/service-limits#document-and-text): pdf, tiff, jpg, jpeg, png, bmp, heif\n", |
13 | 13 | ">\n",
|
14 | 14 | "> #################################################################################\n",
|
15 | 15 | "\n",
|
|
63 | 63 | "source": [
|
64 | 64 | "# Define paths for analyzer template, input documents, and reference documents\n",
|
65 | 65 | "analyzer_template = \"../analyzer_templates/invoice_contract_verification_pro_mode.json\"\n",
|
66 |
| - "input_docs = \"../data/invoice_contract_verification/input_docs\"\n", |
67 |
| - "reference_docs = \"../data/invoice_contract_verification/reference_docs/\"" |
| 66 | + "input_docs = \"../data/field_extraction_pro_mode/invoice_contract_verification/input_docs\"\n", |
| 67 | + "reference_docs = \"../data/field_extraction_pro_mode/invoice_contract_verification/reference_docs\"" |
| 68 | + ] |
| 69 | + }, |
| 70 | + { |
| 71 | + "cell_type": "markdown", |
| 72 | + "metadata": {}, |
| 73 | + "source": [ |
| 74 | + "> Let's take a look at the analyzer template" |
| 75 | + ] |
| 76 | + }, |
| 77 | + { |
| 78 | + "cell_type": "code", |
| 79 | + "execution_count": null, |
| 80 | + "metadata": {}, |
| 81 | + "outputs": [], |
| 82 | + "source": [ |
| 83 | + "import json\n", |
| 84 | + "with open(analyzer_template, \"r\") as file:\n", |
| 85 | + " print(json.dumps(json.load(file), indent=2))" |
| 86 | + ] |
| 87 | + }, |
| 88 | + { |
| 89 | + "cell_type": "markdown", |
| 90 | + "metadata": {}, |
| 91 | + "source": [ |
| 92 | + "> In the analyzer template, `\"mode\"` must be set to `\"pro\"`. The defined field, `PaymentTermsInconsistencies`, is a `\"generate\"` field that is asked to reason about inconsistencies; it can draw on the reference documents uploaded from [reference docs](../data/field_extraction_pro_mode/invoice_contract_verification/reference_docs)." |
68 | 93 | ]
|
69 | 94 | },
|
70 | 95 | {
|
|
89 | 114 | "outputs": [],
|
90 | 115 | "source": [
|
91 | 116 | "import logging\n",
|
92 |
| - "import json\n", |
93 | 117 | "import os\n",
|
94 | 118 | "import sys\n",
|
95 | 119 | "from pathlib import Path\n",
|
|
120 | 144 | " token_provider=token_provider,\n",
|
121 | 145 | " # IMPORTANT: Uncomment this if using subscription key\n",
|
122 | 146 | " # subscription_key=AZURE_AI_API_KEY,\n",
|
123 |
| - " # x_ms_useragent=\"azure-ai-content-understanding-python/pro_mode\", # This header is used for sample usage telemetry, please comment out this line if you want to opt out.\n", |
| 147 | + " x_ms_useragent=\"azure-ai-content-understanding-python/pro_mode\", # This header is used for sample usage telemetry, please comment out this line if you want to opt out.\n", |
124 | 148 | ")"
|
125 | 149 | ]
|
126 | 150 | },
|
|
201 | 225 | "metadata": {},
|
202 | 226 | "outputs": [],
|
203 | 227 | "source": [
|
| 228 | + "from IPython.display import FileLink, display\n", |
| 229 | + "\n", |
204 | 230 | "response = client.begin_analyze(CUSTOM_ANALYZER_ID, file_location=input_docs)\n",
|
205 | 231 | "result_json = client.poll_result(response, timeout_seconds=360)\n",
|
206 | 232 | "\n",
|
207 |
| - "logging.info(json.dumps(result_json, indent=2))" |
| 233 | + "# Create the output directory if it doesn't exist\n", |
| 234 | + "output_dir = \"output\"\n", |
| 235 | + "os.makedirs(output_dir, exist_ok=True)\n", |
| 236 | + "\n", |
| 237 | + "output_path = os.path.join(output_dir, f\"{CUSTOM_ANALYZER_ID}_result.json\")\n", |
| 238 | + "with open(output_path, \"w\", encoding=\"utf-8\") as file:\n", |
| 239 | + " json.dump(result_json, file, indent=2)\n", |
| 240 | + "\n", |
| 241 | + "logging.info(\"Full analyzer result saved to:\")\n", |
| 242 | + "display(FileLink(output_path))" |
| 243 | + ] |
| 244 | + }, |
| 245 | + { |
| 246 | + "cell_type": "markdown", |
| 247 | + "metadata": {}, |
| 248 | + "source": [ |
| 249 | + "> Let's check the fields extracted with Pro mode." |
| 250 | + ] |
| 251 | + }, |
| 252 | + { |
| 253 | + "cell_type": "code", |
| 254 | + "execution_count": null, |
| 255 | + "metadata": {}, |
| 256 | + "outputs": [], |
| 257 | + "source": [ |
| 258 | + "fields = result_json[\"result\"][\"contents\"][0][\"fields\"]\n", |
| 259 | + "print(json.dumps(fields, indent=2))" |
| 260 | + ] |
| 261 | + }, |
| 262 | + { |
| 263 | + "cell_type": "markdown", |
| 264 | + "metadata": {}, |
| 265 | + "source": [ |
| 266 | + "> The `PaymentTermsInconsistencies` field shows, for example, that the purchase contract has detailed payment terms that were agreed to before the service, but the payment terms implied by the invoice conflict with them. Pro mode identified the corresponding contract for this invoice among the reference documents and then analyzed the contract together with the invoice to discover this inconsistency." |
208 | 267 | ]
|
209 | 268 | },
|
210 | 269 | {
|
|
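The new field-inspection cell indexes `result_json["result"]["contents"][0]["fields"]` directly, which raises a `KeyError` or `IndexError` if the analysis returns no contents. A minimal defensive sketch, assuming the result has the shape used in that cell; `extract_fields` and the `sample` payload are hypothetical and not part of the sample client:

```python
import json
from typing import Any


def extract_fields(result_json: dict[str, Any]) -> dict[str, Any]:
    """Return the fields of the first content item, or an empty dict.

    Assumes the shape used in the notebook cell:
    result_json["result"]["contents"][0]["fields"].
    """
    result = result_json.get("result") or {}
    contents = result.get("contents") or []
    if not contents:
        # Nothing was analyzed, so there are no fields to show.
        return {}
    return contents[0].get("fields") or {}


# Hypothetical stand-in for a real analyzer response.
sample = {
    "result": {
        "contents": [
            {"fields": {"PaymentTermsInconsistencies": {"type": "array"}}}
        ]
    }
}
print(json.dumps(extract_fields(sample), indent=2))
```

In the notebook this would replace the direct lookup with `fields = extract_fields(result_json)`, so an empty result prints `{}` instead of failing the cell.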