Commit 4dc1e80

Edit field extraction sample for clarity. Remove enable_face_identification flag. (#5)
1 parent 70cc67c commit 4dc1e80

3 files changed: 49 additions and 50 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ You can run this repo virtually by using GitHub Codespaces, which will open a we
 
 Navigate to the `notebooks` directory and select the sample notebook you are interested in. Since Codespaces is pre-configured with the necessary environment, you can directly execute each step in the notebook.
 
-### Notes
+## Notes
 
 * **Trademarks** - This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third-party’s policies.
 

notebooks/field_extraction.ipynb

Lines changed: 46 additions & 42 deletions
@@ -4,23 +4,23 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Extract custom fields in your file"
+    "# Extract Custom Fields from Your File"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This notebook demonstrates how to use analyzers to extract custom fields from your input file."
+    "This notebook demonstrates how to use analyzers to extract custom fields from your input files."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Prerequisites\n",
-    "1. Follow steps in [README](../README.md#Configure-Azure-AI-Service-resource) to create `.env` file to configure your Azure AI Service.\n",
-    "1. Install packages needed to run the sample"
+    "1. Follow the steps in the [README](../README.md#Configure-Azure-AI-Service-resource) to create a `.env` file and configure your Azure AI Service.\n",
+    "2. Install the required packages to run the sample."
    ]
   },
   {
@@ -36,59 +36,65 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Analyzer template examples"
+    "## Analyzer Templates"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Below is a collection of analyzer template examples designed to extract fields from various input file types.\n",
+    "Below is a collection of analyzer templates designed to extract fields from various input file types.\n",
     "\n",
-    "These templates are highly customizable, allowing you to modify them to suit your specific needs. For additional verified templates from Microsoft, please visit [HERE](../analyzer_templates/README.md)."
+    "These templates are highly customizable, allowing you to modify them to suit your specific needs. For additional verified templates from Microsoft, please visit [here](../analyzer_templates/README.md)."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "extraction_samples = {\n",
-    " \"sample_invoice\": ('../analyzer_templates/invoice.json', '../data/invoice.pdf'),\n",
-    " \"sample_chart\": ('../analyzer_templates/image_chart.json', '../data/pieChart.jpg'),\n",
-    " \"sample_call_transcript\": ('../analyzer_templates/call_transcript.json', '../data/callCenterRecording.mp3'),\n",
-    " \"sample_marketing_video\": ('../analyzer_templates/marketing_video.json', '../data/video.mp4')\n",
+    "extraction_templates = {\n",
+    " \"invoice\": ('../analyzer_templates/invoice.json', '../data/invoice.pdf' ),\n",
+    " \"chart\": ('../analyzer_templates/image_chart.json', '../data/pieChart.jpg' ),\n",
+    " \"call_transcript\": ('../analyzer_templates/call_transcript.json', '../data/callCenterRecording.mp3'),\n",
+    " \"marketing_video\": ('../analyzer_templates/marketing_video.json', '../data/video.mp4' )\n",
     "}"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Set the target to the sample analyzer that you want to try."
+    "Specify the analyzer template you want to use and provide a name for the analyzer to be created based on the template."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "target_sample = \"sample_invoice\""
+    "import uuid\n",
+    "\n",
+    "ANALYZER_TEMPLATE = \"invoice\"\n",
+    "ANALYZER_ID = \"field-extraction-sample-\" + str(uuid.uuid4())\n",
+    "\n",
+    "(analyzer_template_path, analyzer_sample_file_path) = extraction_templates[ANALYZER_TEMPLATE]"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Create Azure content understanding client\n",
-    ">The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is utility Class which contain the functions to interact with the Content Understanding server. Before Content Understanding SDK release, we can regard it as a lightweight SDK. Fill the constant **AZURE_AI_ENDPOINT**, **AZURE_AI_API_VERSION**, **AZURE_AI_API_KEY** with the information from your Azure AI Service."
+    "## Create Azure AI Content Understanding Client\n",
+    "\n",
+    "> The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class containing functions to interact with the Content Understanding API. Before the official release of the Content Understanding SDK, it can be regarded as a lightweight SDK.\n"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -98,19 +104,22 @@
     "import sys\n",
     "from dotenv import find_dotenv, load_dotenv\n",
     "\n",
-    "# import utility package from python samples root directory\n",
+    "load_dotenv(find_dotenv())\n",
+    "logging.basicConfig(level=logging.INFO)\n",
+    "\n",
+    "AZURE_AI_ENDPOINT = os.getenv(\"AZURE_AI_ENDPOINT\")\n",
+    "AZURE_AI_API_KEY = os.getenv(\"AZURE_AI_API_KEY\")\n",
+    "AZURE_AI_API_VERSION = os.getenv(\"AZURE_AI_API_VERSION\", \"2024-12-01-preview\")\n",
+    "\n",
+    "# Import utility package from python samples root directory\n",
     "py_samples_root_dir = os.path.abspath(os.path.join(os.getcwd(), \"..\"))\n",
     "sys.path.append(py_samples_root_dir)\n",
     "from python.content_understanding_client import AzureContentUnderstandingClient\n",
     "\n",
-    "load_dotenv(find_dotenv())\n",
-    "logging.basicConfig(level=logging.INFO)\n",
-    "\n",
     "client = AzureContentUnderstandingClient(\n",
-    " endpoint=os.getenv(\"AZURE_AI_ENDPOINT\"),\n",
-    " api_version=os.getenv(\"AZURE_AI_API_VERSION\", \"2024-12-01-preview\"),\n",
-    " subscription_key=os.getenv(\"AZURE_AI_API_KEY\"),\n",
-    " api_token=os.getenv(\"AZURE_AI_API_TOKEN\"),\n",
+    " endpoint=AZURE_AI_ENDPOINT,\n",
+    " subscription_key=AZURE_AI_API_KEY,\n",
+    " api_version=AZURE_AI_API_VERSION,\n",
     " x_ms_useragent=\"azure-ai-content-understanding-python/field_extraction\",\n",
     ")"
    ]
@@ -119,13 +128,12 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Create analyzer with defined schema\n",
-    "Before creating the custom fields analyzer, you should fill the constant ANALYZER_ID with a business-related name. Here we randomly generate a name for demo purpose."
+    "## Create Analyzer from the Template"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -188,20 +196,17 @@
     }
    ],
    "source": [
-    "import uuid\n",
-    "ANALYZER_ID = \"extraction-sample-\" + str(uuid.uuid4())\n",
-    "\n",
-    "response = client.begin_create_analyzer(ANALYZER_ID, analyzer_schema_path=extraction_samples[target_sample][0])\n",
+    "response = client.begin_create_analyzer(ANALYZER_ID, analyzer_template_path)\n",
     "result = client.poll_result(response)\n",
     "\n",
-    "logging.info(json.dumps(result, indent=2))"
+    "print(json.dumps(result, indent=2))"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Use created analyzer to extract document content\n"
+    "## Extract Fields Using the Analyzer"
    ]
   },
   {
@@ -213,7 +218,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 17,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -362,19 +367,18 @@
     }
    ],
    "source": [
-    "response = client.begin_analyze(ANALYZER_ID, file_location=extraction_samples[target_sample][1])\n",
+    "response = client.begin_analyze(ANALYZER_ID, file_location=analyzer_sample_file_path)\n",
     "result = client.poll_result(response)\n",
     "\n",
-    "logging.info(json.dumps(result, indent=2))"
+    "json.dumps(result, indent=2)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Delete exist analyzer in Content Understanding Service\n",
-    "This snippet is not required, but it's only used to prevent the testing analyzer from residing in your service. The custom fields analyzer could be stored in your service for reusing by subsequent business in real usage scenarios.\n",
-    "\n"
+    "## Clean Up\n",
+    "Optionally, delete the sample analyzer from your resource. In typical usage scenarios, you would analyze multiple files using the same analyzer."
    ]
   },
   {
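
Read together, the notebook hunks replace the `target_sample` lookup with a template key plus tuple unpacking. A minimal sketch of the resulting selection step, using only names and paths that appear in the diff (the final `print` is added here for illustration and is not part of the notebook):

```python
import uuid

# Template name -> (analyzer template path, sample input file path), as in the diff.
extraction_templates = {
    "invoice": ("../analyzer_templates/invoice.json", "../data/invoice.pdf"),
    "chart": ("../analyzer_templates/image_chart.json", "../data/pieChart.jpg"),
    "call_transcript": ("../analyzer_templates/call_transcript.json", "../data/callCenterRecording.mp3"),
    "marketing_video": ("../analyzer_templates/marketing_video.json", "../data/video.mp4"),
}

ANALYZER_TEMPLATE = "invoice"
# A random suffix keeps repeated runs from colliding on the same analyzer ID.
ANALYZER_ID = "field-extraction-sample-" + str(uuid.uuid4())

# Unpack the pair once instead of indexing [0] and [1] at each call site.
analyzer_template_path, analyzer_sample_file_path = extraction_templates[ANALYZER_TEMPLATE]

# Illustration only: show what the later cells would receive.
print(ANALYZER_ID, analyzer_template_path, analyzer_sample_file_path)
```

One thing a reviewer may want to check: the analyze cell now ends with a bare `json.dumps(result, indent=2)`. As the last expression in a notebook cell, that displays the returned string's repr (with escaped newlines) rather than formatted JSON, so `print(...)`, as used in the create-analyzer cell, may have been the intent.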

python/content_understanding_client.py

Lines changed: 2 additions & 7 deletions
@@ -12,7 +12,6 @@ def __init__(self,
                  api_version: str,
                  subscription_key: str = None,
                  api_token: str = None,
-                 enable_face_identification: bool = False,
                  x_ms_useragent: str = "cu-sample-code"):
         if not subscription_key and not api_token:
             raise ValueError(
@@ -25,8 +24,7 @@ def __init__(self,
         self._endpoint = endpoint.rstrip("/")
         self._api_version = api_version
         self._logger = logging.getLogger(__name__)
-        self._headers = self._get_headers(subscription_key, api_token,
-                                          enable_face_identification, x_ms_useragent)
+        self._headers = self._get_headers(subscription_key, api_token, x_ms_useragent)
 
     def _get_analyzer_url(self, endpoint, api_version, analyzer_id):
         return f"{endpoint}/contentunderstanding/analyzers/{analyzer_id}?api-version={api_version}" # noqa
@@ -45,8 +43,7 @@ def _get_training_data_config(self, storage_container_sas_url,
             "prefix": storage_container_path_prefix,
         }
 
-    def _get_headers(self, subscription_key, api_token,
-                     enable_face_identification, x_ms_useragent):
+    def _get_headers(self, subscription_key, api_token, x_ms_useragent):
         """ Returns the headers for the HTTP requests.
         Args:
             subscription_key (str): The subscription key for the service.
@@ -61,8 +58,6 @@ def _get_headers(self, subscription_key, api_token,
                 "Authorization": f"Bearer {api_token}"
             }
         headers["x-ms-useragent"] = x_ms_useragent
-        if enable_face_identification:
-            headers["cogsvc-videoanalysis-face-identification-enable"] = "true"
         return headers
 
     def get_all_analyzers(self):
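
With the flag removed, header construction depends only on the credentials and the user-agent string. A rough sketch of the simplified logic as a standalone function follows. Several details are assumptions rather than the file's actual code: the function name `get_headers`, the `Ocp-Apim-Subscription-Key` header name for the key branch (the diff shows only the bearer-token branch), and the error message text (elided in the diff). In the real class the credential check lives in `__init__`, not in `_get_headers`.

```python
def get_headers(subscription_key=None, api_token=None, x_ms_useragent="cu-sample-code"):
    """Build request headers; hypothetical stand-in for the class's _get_headers."""
    if not subscription_key and not api_token:
        # Message text is illustrative; the original message is not shown in the diff.
        raise ValueError("Either subscription_key or api_token must be provided.")
    if subscription_key:
        # Assumption: header name for key-based auth is not visible in the diff.
        headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    else:
        headers = {"Authorization": f"Bearer {api_token}"}
    headers["x-ms-useragent"] = x_ms_useragent
    # No face-identification header is added any more.
    return headers


print(get_headers(api_token="example-token"))
```

Callers that previously passed `enable_face_identification=...` to the constructor would now get a `TypeError` for the unexpected keyword argument, so any such call sites need updating alongside this commit.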
