
Commit 48a9a0e

Added the review suggestions
1 parent b7b4d92 commit 48a9a0e

File tree

1 file changed (+40, -64 lines)


samples/04_gis_analysts_data_scientists/leveraging_multimodal_inputs_for_information_extraction_task_using_a_third-party_llm .ipynb renamed to samples/04_gis_analysts_data_scientists/use_multimodal_inputs_for_information_extraction_task_using_a_third-party_llm .ipynb

Lines changed: 40 additions & 64 deletions
@@ -4,35 +4,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-  "# Leveraging Multimodal Inputs for Information Extraction Task Using a Third-Party Large Language Model"
-  ]
- },
- {
-  "cell_type": "markdown",
-  "metadata": {},
-  "source": [
-  "<!-- <h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
-  "<div class=\"toc\">\n",
-  "<ul class=\"toc-item\">\n",
-  "<li><span><a href=\"#Introduction\" data-toc-modified-id=\"Introduction-1\">Introduction</a></span></li>\n",
-  "<li><span><a href=\"#Prerequisites\" data-toc-modified-id=\"Prerequisites-2\">Prerequisites</a></span></li>\n",
-  "<li><span><a href=\"#Imports\" data-toc-modified-id=\"Imports-3\">Imports</a></span></li>\n",
-  "<li><span><a href=\"#Data-preparation\" data-toc-modified-id=\"Data-preparation-4\">Data preparation</a></span></li>\n",
-  "<li><span><a href=\"#EntityRecognizer-model\" data-toc-modified-id=\"EntityRecognizer-model-5\">EntityRecognizer model</a></span></li>\n",
-  "<ul class=\"toc-item\">\n",
-  "<li><span><a href=\"#Finding-optimum-learning-rate\" data-toc-modified-id=\"Finding-optimum-learning-rate-5.1\">Finding optimum learning rate</a></span> \n",
-  "<li><span><a href=\"#Model-training\" data-toc-modified-id=\"Model-training-5.2\">Model training</a></span>\n",
-  "<li><span><a href=\"#Evaluate-model-performance\" data-toc-modified-id=\"Evaluate-model-performance-5.3\">Evaluate model performance</a></span>\n",
-  "<li><span><a href=\"#Validate-results\" data-toc-modified-id=\"Validate-results-5.4\">Validate results</a></span></li>\n",
-  "<li><span><a href=\"#Save-and-load-trained-models\" data-toc-modified-id=\"Save-and-load-trained-models-5.5\">Save and load trained models</a></span></li>\n",
-  "</ul>\n",
-  "<li><span><a href=\"#Model-inference\" data-toc-modified-id=\"Model-inference-6\">Model inference</a></span></li>\n",
-  "<li><span><a href=\"#Publishing-the-results-as-feature-layer\" data-toc-modified-id=\"Publishing-the-results-as-feature-layer-7\">Publishing the results as feature layer</a></span></li>\n",
-  "<li><span><a href=\"#Visualize-crime-incident-on-map\" data-toc-modified-id=\"Visualize-crime-incident-on-map- 8\">Visualize crime incident on map</a></span></li>\n",
-  "<li><span><a href=\"#Create-a-hot-spot-map-of-crime-densities\" data-toc-modified-id=\"Create-a-hot-spot-map-of-crime-densities-9\">Create a hot spot map of crime densities</a></span></li>\n",
-  "<li><span><a href=\"#Conclusion\" data-toc-modified-id=\"Conclusion-10\">Conclusion</a></span></li>\n",
-  "<li><span><a href=\"#References\" data-toc-modified-id=\"References-11\">References</a></span></li>\n",
-  "</ul></div> -->"
+  "# Use multimodal inputs for information extraction with a third-party large language model"
   ]
  },
  {
@@ -79,11 +51,9 @@
   "\n",
   "As businesses increasingly digitize operations, vast amounts of transactional data—such as sales receipts—are often stored as scanned images or photos. Extracting meaningful, structured information from these image-based documents can support a variety of analytical and operational workflows, such as sales tracking, sales management, and customer insights.\n",
   "\n",
-  "Recent advancements in large language models (LLMs) have opened new possibilities for interpreting and extracting information from such inputs with greater accuracy and flexibility. The **GeoAI toolbox** in ArcGIS Pro supports integration of third-party language models, allowing users to process and analyze text using external AI services. Custom third-party models can be wrapped in ESRI Deep Learning Package (.dlpk) files and used within GeoAI tools and the arcgis.learn API. In this sample, we demonstrate how one such model**GPT-4o** from OpenAIused with the **Process Text Using AI Model** tool to extract relevant entities from receipt images. Third-party model support in **ArcGIS Pro** and the `arcgis.learn` API allows users to bring in AI modelswhether hosted by providers like OpenAI, Azure, etc., built from open-source code, or fine-tuned for a specific taskto enhance natural language processing directly within ArcGIS workflows.\n",
+  "Recent advancements in large language models (LLMs) have opened new possibilities for interpreting and extracting information from such inputs with greater accuracy and flexibility. The **GeoAI toolbox** in ArcGIS Pro supports integration of third-party language models, allowing users to process and analyze text using external AI services. Custom third-party models can be wrapped in ESRI Deep Learning Package (.dlpk) files and used within GeoAI tools and the arcgis.learn API. In this sample, we demonstrate how one such model, **GPT-4o** from OpenAI, can be used with the **Process Text Using AI Model** tool to extract relevant entities from receipt images. Third-party model support in **ArcGIS Pro** and the `arcgis.learn` API allows users to bring in AI models (whether hosted by providers like OpenAI, Azure, etc., built from open-source code, or fine-tuned for a specific task) to enhance natural language processing directly within ArcGIS workflows.\n",
   "\n",
-  "\n",
-  "\n",
-  "For this use case, we use a set of sales receipt images and perform entity extraction to identify key pieces of information. The extracted entities can be used for downstream tasks such as sales analytics, customer profiling, inventory management, or tax auditing. The model will extracts the following entities:\n",
+  "For this use case, we use a set of sales receipt images and perform entity extraction to identify key pieces of information. The extracted entities can be used for downstream tasks like sales analytics, customer profiling, inventory management, or tax auditing. The model extracts the following entities:\n",
   "\n",
   "- Sale Date \n",
   "- Customer Name \n",
@@ -103,7 +73,7 @@
   "source": [
   "# Dataset \n",
   "\n",
-  "For this use case, we use a sample set of images of printed sales receipts from various retail outlets such as Walmart, Costco Wholesale, etc.\n"
+  "For this use case, we use a sample set of images of printed sales receipts from various retail outlets like Walmart, Costco Wholesale, etc.\n"
   ]
  },
  {
@@ -140,8 +110,8 @@
   "metadata": {},
   "source": [
   "# Prerequisites\n",
-  "- Refer to the section **\"Install deep learning dependencies of arcgis.learn module\"** [on this page](https://developers.arcgis.com/python/guide/install-and-set-up/#Install-deep-learning-dependencies) for detailed documentation on installation of the dependencies.\n",
-  "- To learn more about how to create and integrate third-party model, refer [Use third party language models with ArcGIS](https://developers.arcgis.com/python/latest/guide/use-third-party-language-models-with-arcgis/)"
+  "- Refer to the section **\"Install deep learning dependencies\"** [on this page](https://developers.arcgis.com/python/guide/install-and-set-up/#Install-deep-learning-dependencies) for detailed documentation on installation of the dependencies.\n",
+  "- To learn more about how to create and integrate third-party models, refer to [Use third party language models with ArcGIS](https://developers.arcgis.com/python/latest/guide/use-third-party-language-models-with-arcgis/)"
   ]
  },
  {
@@ -150,7 +120,7 @@
   "source": [
   "# Create the Third-Party Deep Learning Package (.dlpk)\n",
   "\n",
-  "The first step in using a third-party language model is to prepare a Deep Learning Package file (`.dlpk`). This package includes your custom NLP Python function to interact with external models, along with an Esri Model Definition (`.emd`) file. In this use case, we use a third-party hosted model—**GPT-4o**—to extract key entities from scanned sales receipt images. Please note that if you use a web-hosted LLM, the data processed will be sent to the LLM provider. Use these models only if you trust their source.\n",
+  "The first step in using a third-party language model is to prepare a Deep Learning Package file (`.dlpk`). This package includes your custom NLP Python function to interact with external models, along with an Esri Model Definition (`.emd`) file. In this use case, we use a third-party hosted model, **GPT-4o**, to extract key entities from scanned sales receipt images. Please note that if you use a web-hosted LLM, the data processed will be sent to the LLM provider. Use these models only if you trust their source.\n",
   "\n",
   "### Components of the Third-Party Deep Learning Package (.dlpk)\n",
   "\n",
@@ -265,21 +235,14 @@
   },
   {
   "cell_type": "code",
-  "execution_count": null,
+  "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
-  "import ast\n",
-  "from math import e\n",
   "import arcpy\n",
   "import json\n",
   "import base64\n",
-  "import pandas as pd\n",
-  "import requests\n",
-  "from pydantic import BaseModel\n",
-  "from typing import Optional, List, Dict, Tuple\n",
   "from arcgis.features import FeatureSet\n",
-  "from concurrent.futures import ThreadPoolExecutor\n",
   "import keyring\n",
   "from arcgis.learn import AIServiceConnection\n"
   ]
@@ -290,7 +253,7 @@
   "source": [
   "### Define the ```__init__``` function\n",
   "\n",
-  "The `__init__` method initializes instance variables such as `name`, `description`, and other attributes essential for the NLP function."
+  "The `__init__` method initializes instance variables like `name`, `description`, and other attributes essential for the NLP function."
   ]
  },
  {
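To make the cell above concrete, here is a minimal sketch of such an `__init__`; the class name and the attribute values are illustrative assumptions, not the notebook's verbatim code.

```python
class LLMEntityExtractor:
    def __init__(self):
        # Identity of the custom NLP function as surfaced in the GeoAI tool
        self.name = "LLM Entity Extractor"
        self.description = (
            "Extracts key entities from sales receipts using a third-party LLM"
        )
```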
@@ -309,13 +272,13 @@
   "metadata": {},
   "source": [
   "### Define ```initialize``` function\n",
-  "The initialize method is called at the start of the custom Python NLP function, within this function we will set up the necessary variables. It accepts two parameters via `kwargs`:\n",
+  "The initialize method is called at the start of the custom Python NLP function. Within this function, we will set up the necessary variables. It accepts two parameters via `kwargs`:\n",
   "\n",
   "#### Parameters in `kwargs`\n",
   "- **`model`**: The path to the ESRI Model Definition (.emd) file.\n",
-  "- **`device`**: The name of the device (either GPU or CPU), which is particularly important for on-premises models.\n",
+  "- **`device`**: The name of the device (either GPU or CPU). This is particularly important for on-premises models.\n",
   "\n",
-  "`initialize` reads the ESRI Model Definition (.emd) file and configures the essential variables needed for inference.\n"
+  "The `initialize` method reads the ESRI Model Definition (.emd) file and configures the essential variables needed for inference. This method is called only once, making it the ideal place to load any resources or dependencies required throughout the inference process.\n"
   ]
  },
  {
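As a rough sketch of an `initialize` method (on the same class as `__init__` above) that follows this description; reading the whole .emd into a dictionary is an assumption about how the settings are kept, not the notebook's exact implementation.

```python
import json

def initialize(self, **kwargs):
    # Path to the Esri Model Definition (.emd) file and the target device
    model_path = kwargs.get("model")
    self.device = kwargs.get("device", "cpu")

    # The .emd is a JSON file; keep its contents for use during inference
    with open(model_path, "r") as f:
        self.emd = json.load(f)
```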
@@ -338,7 +301,7 @@
   "source": [
   "### Define the ```getParameterInfo``` function\n",
   "\n",
-  "This function is designed to collect parameters from the user through the GeoAI tools. For our use case, it gathers the text prompt used to instruct the language model on which entities to extract from the input receipt text, as well as the connection file required to authenticate with the third-party model provider. Refer to the section [getParameterInfo](https://developers.arcgis.com/python/latest/guide/use-third-party-language-models-with-arcgis/) to get a detail explanation of the ```getParameterInfo``` function."
+  "The ```getParameterInfo``` function defines the parameters that will be exposed in the GeoAI tool, including their data types and allowable values. For our use case, it gathers the text prompt used to instruct the language model on which entities to extract from the input receipt text, as well as the connection file required to authenticate with the third-party model provider. Refer to the [getParameterInfo](https://developers.arcgis.com/python/latest/guide/use-third-party-language-models-with-arcgis/) section to get a detailed explanation of the ```getParameterInfo``` function."
   ]
  },
  {
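As an illustration of the idea, a hedged sketch of a `getParameterInfo` returning the two parameters this sample uses (`prompt` and `ai_connection_file`); the dictionary keys follow the common Esri custom-function pattern and are assumptions to be checked against the linked guide.

```python
def getParameterInfo(self, **kwargs):
    # Parameters exposed in the Process Text Using AI Model tool
    return [
        {
            "name": "prompt",
            "dataType": "string",
            "required": True,
            "displayName": "Prompt",
            "description": "Instruction telling the LLM which entities to extract",
            "value": "Extract the sale date, customer name, products, and payment method.",
        },
        {
            "name": "ai_connection_file",
            "dataType": "string",
            "required": True,
            "displayName": "AI connection file",
            "description": "Path to the AI service connection file used to authenticate",
            "value": "",
        },
    ]
```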
@@ -398,10 +361,10 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-  "The `getConfiguration` method sets up and manages the parameters required by the NLP function. It receives keyword arguments (`kwargs`) that contain the values provided by the user—either through the GeoAI tool interface or programmatically.\n",
+  "The `getConfiguration` method sets up and manages the parameters required by the NLP function. It receives keyword arguments (`kwargs`) that contain the values provided by the user—either through the GeoAI tool interface or programmatically in the ```getParameterInfo``` function.\n",
   "\n",
   "This method is responsible for:\n",
-  "- Extracting user-specified values (e.g., prompt, AI connection file path).\n",
+  "- Extracting input from the tool (e.g., prompt, AI connection file path).\n",
   "- Storing these values in class-level variables or a configuration dictionary.\n",
   "- Controlling how the model processes the input and generates the output based on the updated parameters.\n",
   "\n",
@@ -412,9 +375,9 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-  "For our use case, we are using a large language model (LLM) provided by an external AI service**GPT-4o from OpenAI**. To enable communication between our custom Python NLP function and the OpenAI endpoint, we need a connection file. You can refer to the *[Create AI Service Connection File](#create-ai-service-connection-file)* section for instructions on how to create one.\n",
+  "For our use case, we are using a large language model (LLM) provided by an external AI service, **GPT-4o from OpenAI**. To enable communication between our custom Python NLP function and the OpenAI endpoint, we need a connection file. You can refer to the *[Create AI Service Connection File](#create-ai-service-connection-file)* section for instructions on how to create one.\n",
   "\n",
-  "This connection file securely stores the required credentials, which will be retrieved and used to authenticate and initialize the third-party model. These saved credentials will be accessed from the connection file and set within the current class context in the following function.\n"
+  "This connection file securely stores the required credentials that will be retrieved and used to authenticate and initialize the third-party model. These saved credentials will be accessed from the connection file and set within the current class context in the following function.\n"
   ]
  },
  {
@@ -427,11 +390,11 @@
   " # Get ai_connection_file parameter value \n",
   " connection_file_path = kwargs.get(\"ai_connection_file\", None)\n",
   " # Read the connection file using AIServiceConnection class inside arcgis.learn\n",
-  " conn = AIServiceConnection(connection_file_path)\n",
-  " # Set the values\n",
-  " conn_dict = conn.get_dict()\n",
-  " # self.API_KEY = conn_dict['serviceProviderProperties']['token']\n",
-  " # self.model = conn_dict['serviceProviderProperties']['model']\n",
+  " from arcgis.learn import AIServiceConnection\n",
+  " con = AIServiceConnection(connection_file_path)\n",
+  " conn_param_v = con.get_dict()\n",
+  " self.API_KEY = conn_param_v[\"authenticationSecrets\"][\"token\"]\n",
+  " self.model = conn_param_v[\"serviceProviderProperties\"][\"model\"]\n",
   "\n",
   " # Get prompt parameter value \n",
   " self.prompt_txt = kwargs.get(\"prompt\", \"\")\n",
@@ -446,6 +409,13 @@
   "### Define the ```predict``` function"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+  "The ```predict``` method performs inference, that is, it generates predictions with the NLP model. It is passed a FeatureSet containing the input features (or rows in the case of a table) and `kwargs` containing the name of the field that holds the input strings. The method returns the results in the form of a FeatureSet object."
+  ]
+ },
  {
   "cell_type": "code",
   "execution_count": null,
@@ -578,14 +548,14 @@
   "\n",
   "\n",
   "- **Zip the Folder**: \n",
-  "Compress the folder into a ZIP archive. \n",
+  "Compress the files into a ZIP archive. \n",
   "Rename the `.zip` file to match the `.emd` file name, but with the `.dlpk` extension.\n",
   "\n",
   "**Example final file name:**\n",
   "\n",
   "```LLMEntityExtractor.dlpk```\n",
   "\n",
-  "This `.dlpk` file is now ready for use with the ```Process Text Using AI Model``` inside ArcGIS Pro."
+  "This `.dlpk` file is now ready for use with the ```Process Text Using AI Model``` tool inside ArcGIS Pro."
   ]
  },
  {
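If you prefer to script the packaging step described above, a small sketch using the standard library; the .py file name is an assumption, while the .emd and .dlpk names follow the example.

```python
import zipfile

# Zip the Esri Model Definition and the custom NLP function at the top level
# of the archive, writing it directly with the .dlpk extension
with zipfile.ZipFile("LLMEntityExtractor.dlpk", "w", zipfile.ZIP_DEFLATED) as dlpk:
    for fname in ["LLMEntityExtractor.emd", "LLMEntityExtractor.py"]:
        dlpk.write(fname)
```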
@@ -666,11 +636,17 @@
   "source": [
   "# Conclusion\n",
   "\n",
-  "This guide demonstrates how powerful AI models like GPT-4o can be seamlessly integrated into ArcGIS Pro through the Process text Using AI Model tool to extract structured information from unstructured receipt images. By using the \"Process Text Using AI Model\" tool and a custom LLM wrapped in a Deep Learning Package (.dlpk), users can automate the extraction of key entities such as sale date, customer name, product details, and payment method.\n",
+  "This guide demonstrates how powerful AI models like GPT-4o can be seamlessly integrated into ArcGIS Pro through the ```Process Text Using AI Model``` tool to extract structured information from unstructured receipt images. By using the ```Process Text Using AI Model``` tool and a custom LLM wrapped in a Deep Learning Package (.dlpk), users can automate the extraction of key entities like sale date, customer name, product details, and payment method.\n",
   "\n",
-  "This approach reduces the need for manual data entry or annotation while delivering high-quality, analysis-ready outputs that can support a range of GIS and business workflows—including sales analytics, inventory tracking, and customer insights. With built-in support for third-party AI services, ArcGIS Pro enables users to bring the latest in NLP and machine learning directly into their geospatial data pipelines.\n",
-  "\n"
+  "This approach reduces the need for manual data entry or annotation, while delivering high-quality, analysis-ready outputs that can support a range of GIS and business workflows like sales analytics, inventory tracking, and customer insights. With built-in support for third-party AI services, ArcGIS Pro enables users to bring the latest in NLP and machine learning directly into their geospatial data pipelines.\n"
   ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": null,
+  "metadata": {},
+  "outputs": [],
+  "source": []
  }
  ],
  "metadata": {
