
Commit 862f21e

suggestions added by reviewer- rahul
1 parent 60dda12 commit 862f21e


1 file changed (+23, -88 lines)

samples/04_gis_analysts_data_scientists/information_extraction_from_cheshire_fire_incident_reports_using_mistral_language_model.ipynb

Lines changed: 23 additions & 88 deletions
@@ -51,7 +51,7 @@
 "source": [
 "As text data continues to grow rapidly, extracting meaningful insights from large amounts of information is more important than ever. Large language models (LLMs) have emerged as powerful tools for processing unstructured data, significantly enhancing the accuracy and efficiency of information extraction. One of the key tasks that can be performed using large language models is entity extraction, which involves identifying and classifying entities—such as names, organizations, locations, dates, and other specific details—within a text.\n",
 "\n",
-"In this sample, we will explore how information extraction works using the Mistral language model in the `EntityRecognizer` class of the arcgis.learn API with the Cheshire fire incident reports dataset. The Cheshire fire dataset typically includes incident reports detailing fire incidents in Cheshire, covering information like locations, times, types of incidents, and response actions. This data can be valuable for analysis in understanding patterns, improving response strategies, and enhancing safety measures.\n",
+"In this sample, we will explore how information extraction works using the Mistral language model in the `EntityRecognizer` class of the arcgis.learn API with the Cheshire fire incident reports dataset. The Cheshire fire dataset includes incident reports detailing fire incidents in Cheshire, covering information like locations, times, types of incidents, and response actions. This data can be valuable for analysis in understanding patterns, improving response strategies, and enhancing safety measures.\n",
 "\n",
 "Key entities to extract from fire incident reports include:\n",
 "- **Address**\n",
@@ -180,27 +180,6 @@
 "filepath = training_data.download(file_name=training_data.name)"
 ]
 },
-{
-"cell_type": "code",
-"execution_count": 5,
-"id": "34b2cf65-b18d-40c8-b0bf-83b65acb5b5a",
-"metadata": {},
-"outputs": [
-{
-"data": {
-"text/plain": [
-"'C:\\\\Users\\\\sur11226\\\\AppData\\\\Local\\\\Temp\\\\information_extraction_from_cheshire_fire_incident_reports_using_mistral_language_model.zip'"
-]
-},
-"execution_count": 5,
-"metadata": {},
-"output_type": "execute_result"
-}
-],
-"source": [
-"filepath"
-]
-},
 {
 "cell_type": "code",
 "execution_count": 6,
@@ -225,21 +204,10 @@
 },
 {
 "cell_type": "code",
-"execution_count": 8,
+"execution_count": 1,
 "id": "059f765d-07c6-4354-91dd-fe4d26f188ca",
 "metadata": {},
-"outputs": [
-{
-"data": {
-"text/plain": [
-"'C:\\\\Users\\\\sur11226\\\\AppData\\\\Local\\\\Temp\\\\information_extraction_from_cheshire_fire_incident_reports_using_mistral_language_model'"
-]
-},
-"execution_count": 8,
-"metadata": {},
-"output_type": "execute_result"
-}
-],
+"outputs": [],
 "source": [
 "os.path.splitext(filepath)[0]"
 ]
@@ -433,37 +401,11 @@
 "source": [
 "## EntityRecognizer model\n",
 "\n",
-"`EntityRecognizer` model in `arcgis.learn` can be used with spaCy's [EntityRecognizer](https://spacy.io/api/entityrecognizer), [Hugging Face Transformers](https://huggingface.co/transformers/v3.0.2/index.html) or with larze language model backbones. For this sample use case we will use the Mistral model backbone to extract entities from te text.\n",
+"`EntityRecognizer` model in `arcgis.learn` can be used with [Hugging Face Transformers](https://huggingface.co/transformers/v3.0.2/index.html) or with large language model backbones. For this sample use case we will use the Mistral model backbone to extract entities from the text.\n",
 "\n",
 "Run the command below to see what backbones are supported for the **entity recognition** task."
 ]
 },
-{
-"cell_type": "code",
-"execution_count": 19,
-"id": "b8d41792-0630-422d-acbb-6a315caf0107",
-"metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"('mistral',)\n"
-]
-}
-],
-"source": [
-"print(EntityRecognizer.available_backbone_models(\"llm\"))"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "a186b92b-5e53-4248-954a-7c08cfa66794",
-"metadata": {},
-"outputs": [],
-"source": []
-},
 {
 "cell_type": "code",
 "execution_count": 11,
@@ -524,6 +466,21 @@
 " )"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "848bb3ac-ee27-4b61-a98a-e3f24be767ac",
+"metadata": {},
+"source": [
+"The Mistral model will automatically infer the classes from the dataset. The list of inferred class names is as follows:\n",
+"\n",
+"- Address \n",
+"- Date_and_Time \n",
+"- Incident_Type \n",
+"- Number_of_Engines \n",
+"- Title \n",
+"- Time_spent_at_incident "
+]
+},
 {
 "cell_type": "markdown",
 "id": "1bea5234-52f3-43c3-9ff3-4575efd1f44b",
@@ -537,7 +494,7 @@
 "id": "a519c182-9fe5-4309-9879-3b54f233f641",
 "metadata": {},
 "source": [
-"The Mistral model utilizes in-context learning to generate predictions. Unlike traditional models that depend on lengthy training cycles, it can adapt using just a few examples and a prompt. By incorporating this information into the input, the Mistral model gains a better understanding and can make more accurate predictions without needing retraining."
+"The Mistral model utilizes in-context learning to generate predictions. Unlike traditional models that depend on lengthy training cycles, it can understand the task using just a few examples and a prompt. By incorporating this information into the input, the Mistral model gains a better understanding and can make more accurate predictions without needing retraining."
 ]
 },
 {
@@ -983,28 +940,6 @@
 "Now we can use the trained model to extract entities from new text documents using `extract_entities()` method. This method expects the folder path of where new text document are located, or a list of text documents."
 ]
 },
-{
-"cell_type": "code",
-"execution_count": 19,
-"id": "5cc7d161-794c-441a-a8a6-fd2eca430c2e",
-"metadata": {},
-"outputs": [
-{
-"data": {
-"text/plain": [
-"('C:\\\\Users\\\\sur11226\\\\AppData\\\\Local\\\\Temp\\\\information_extraction_from_cheshire_fire_incident_reports_using_mistral_language_model',\n",
-" '.zip')"
-]
-},
-"execution_count": 19,
-"metadata": {},
-"output_type": "execute_result"
-}
-],
-"source": [
-"os.path.splitext(filepath)"
-]
-},
 {
 "cell_type": "code",
 "execution_count": 20,
@@ -1251,9 +1186,9 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python [conda env:conda-dl_10_Sept_1] *",
+"display_name": "Python [conda env:conda-arcgis_13_DEC_24] *",
 "language": "python",
-"name": "conda-env-conda-dl_10_Sept_1-py"
+"name": "conda-env-conda-arcgis_13_DEC_24-py"
 },
 "language_info": {
 "codemirror_mode": {
@@ -1265,7 +1200,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.9"
+"version": "3.11.10"
 }
 },
 "nbformat": 4,
