Commit 7895a5d

revise for pro mode notebook
1 parent 32f4a3e commit 7895a5d

6 files changed: +65 -6 lines changed

notebooks/field_extraction_pro_mode.ipynb

Lines changed: 65 additions & 6 deletions
@@ -9,7 +9,7 @@
 "> #################################################################################\n",
 ">\n",
 "> **Note:** Pro mode is currently available only for `document` data. \n",
-"> Supported file types: pdf, tiff, jpg, jpeg, png, bmp, heif\n",
+"> [Supported file types](https://learn.microsoft.com/en-us/azure/ai-services/content-understanding/service-limits#document-and-text): pdf, tiff, jpg, jpeg, png, bmp, heif\n",
 ">\n",
 "> #################################################################################\n",
 "\n",
@@ -63,8 +63,33 @@
 "source": [
 "# Define paths for analyzer template, input documents, and reference documents\n",
 "analyzer_template = \"../analyzer_templates/invoice_contract_verification_pro_mode.json\"\n",
-"input_docs = \"../data/invoice_contract_verification/input_docs\"\n",
-"reference_docs = \"../data/invoice_contract_verification/reference_docs/\""
+"input_docs = \"../data/field_extraction_pro_mode/invoice_contract_verification/input_docs\"\n",
+"reference_docs = \"../data/field_extraction_pro_mode/invoice_contract_verification/reference_docs\""
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"> Let's take a look at the analyzer template"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"import json\n",
+"with open(analyzer_template, \"r\") as file:\n",
+"    print(json.dumps(json.load(file), indent=2))"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"> In the analyzer, `\"mode\"` must be set to `\"pro\"`. The defined field, `PaymentTermsInconsistencies`, is a `\"generate\"` field that is asked to reason about inconsistencies. It can draw on the reference documents, which will be uploaded from [reference docs](../data/field_extraction_pro_mode/invoice_contract_verification/reference_docs)"
 ]
 },
 {
@@ -89,7 +114,6 @@
 "outputs": [],
 "source": [
 "import logging\n",
-"import json\n",
 "import os\n",
 "import sys\n",
 "from pathlib import Path\n",
@@ -120,7 +144,7 @@
 "    token_provider=token_provider,\n",
 "    # IMPORTANT: Uncomment this if using subscription key\n",
 "    # subscription_key=AZURE_AI_API_KEY,\n",
-"    # x_ms_useragent=\"azure-ai-content-understanding-python/pro_mode\", # This header is used for sample usage telemetry, please comment out this line if you want to opt out.\n",
+"    x_ms_useragent=\"azure-ai-content-understanding-python/pro_mode\", # This header is used for sample usage telemetry, please comment out this line if you want to opt out.\n",
 ")"
 ]
 },
@@ -201,10 +225,45 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"from IPython.display import FileLink, display\n",
+"\n",
 "response = client.begin_analyze(CUSTOM_ANALYZER_ID, file_location=input_docs)\n",
 "result_json = client.poll_result(response, timeout_seconds=360)\n",
 "\n",
-"logging.info(json.dumps(result_json, indent=2))"
+"# Create the output directory if it doesn't exist\n",
+"output_dir = \"output\"\n",
+"os.makedirs(output_dir, exist_ok=True)\n",
+"\n",
+"output_path = os.path.join(output_dir, f\"{CUSTOM_ANALYZER_ID}_result.json\")\n",
+"with open(output_path, \"w\", encoding=\"utf-8\") as file:\n",
+"    json.dump(result_json, file, indent=2)\n",
+"\n",
+"logging.info(\"Full analyzer result saved to:\")\n",
+"display(FileLink(output_path))"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"> Let's check the extracted fields with Pro mode "
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"fields = result_json[\"result\"][\"contents\"][0][\"fields\"]\n",
+"print(json.dumps(fields, indent=2))"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"> As seen in the field `PaymentTermsInconsistencies`, for example, the purchase contract has detailed payment terms that were agreed to prior to the service. However, the implied payment terms on the invoice conflict with this. Pro mode was able to identify the corresponding contract for this invoice from the reference documents and then analyze the contract together with the invoice to discover this inconsistency."
 ]
 },
 {
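
Note on the new field-extraction cell: it indexes `result_json["result"]["contents"][0]["fields"]` directly, so it would raise `KeyError` or `IndexError` if the analysis returned no contents. A minimal defensive sketch follows. It assumes only the response shape implied by that access path. The helper name `extract_fields` and the sample payload are hypothetical and are not part of the notebook or the client library.

```python
import json
from typing import Any


def extract_fields(result_json: dict[str, Any]) -> dict[str, Any]:
    """Return the fields of the first content item, or {} if none are present.

    Hypothetical helper: the response shape is assumed from the notebook's
    access path result_json["result"]["contents"][0]["fields"].
    """
    contents = result_json.get("result", {}).get("contents") or []
    if not contents:
        return {}
    return contents[0].get("fields") or {}


# Hypothetical payload used only to exercise the helper; not real analyzer output.
sample = {
    "result": {
        "contents": [
            {"fields": {"PaymentTermsInconsistencies": "placeholder"}}
        ]
    }
}

print(json.dumps(extract_fields(sample), indent=2))
print(extract_fields({"result": {"contents": []}}))
```

Returning an empty dict keeps the later `json.dumps` call working on an empty result. Raising an explicit error would be an equally reasonable choice if a missing result should stop the notebook.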
