|
4 | 4 | "cell_type": "markdown",
|
5 | 5 | "metadata": {},
|
6 | 6 | "source": [
|
7 |   | - "# Extract Custom Fields from Your Pretranscribed File"
  | 7 | + "# Extract Custom Fields from Your Pre-transcribed File"
8 | 8 | ]
|
9 | 9 | },
|
10 | 10 | {
|
11 | 11 | "cell_type": "markdown",
|
12 | 12 | "metadata": {},
|
13 | 13 | "source": [
|
14 |    | - "This notebook demonstrates how to use analyzers to extract custom fields from your transcription input files."
   | 14 | + "This notebook demonstrates how to use analyzers to extract custom fields from your pre-transcribed input files."
15 | 15 | ]
|
16 | 16 | },
|
17 | 17 | {
|
18 | 18 | "cell_type": "markdown",
|
19 | 19 | "metadata": {},
|
20 | 20 | "source": [
|
21 | 21 | "## Prerequisites\n",
|
22 |    | - "1. Ensure Azure AI service is configured following [steps](../README.md#configure-azure-ai-service-resource)\n",
   | 22 | + "1. Ensure your Azure AI service is configured by following the [configuration steps](../README.md#configure-azure-ai-service-resource).\n",
23 | 23 | "2. Install the required packages to run the sample."
|
24 | 24 | ]
|
25 | 25 | },
|
|
45 | 45 | "source": [
|
46 | 46 | "Below is a collection of analyzer templates designed to extract fields from various input file types.\n",
|
47 | 47 | "\n",
|
48 |    | - "These templates are highly customizable, allowing you to modify them to suit your specific needs. For additional verified templates from Microsoft, please visit [here](../analyzer_templates/README.md)."
   | 48 | + "These templates are highly customizable, allowing you to adapt them to your specific requirements. For additional verified templates provided by Microsoft, see the [analyzer templates folder](../analyzer_templates/)."
49 | 49 | ]
|
50 | 50 | },
|
51 | 51 | {
|
|
65 | 65 | "cell_type": "markdown",
|
66 | 66 | "metadata": {},
|
67 | 67 | "source": [
|
68 |    | - "Specify the analyzer template you want to use and provide a name for the analyzer to be created based on the template."
   | 68 | + "Specify the analyzer template to use and assign a unique name for the analyzer that will be created from the template."
69 | 69 | ]
|
70 | 70 | },
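The cell above asks for a unique analyzer name. A minimal sketch of one way to produce it, assuming a readable prefix followed by a random UUID is an acceptable ID for the service; the helper `make_analyzer_id` and the default prefix are hypothetical, not part of the sample:

```python
import uuid


def make_analyzer_id(prefix: str = "transcript-field-extraction") -> str:
    """Return a hypothetical unique analyzer ID of the form '<prefix>-<random UUID4>'."""
    # str(uuid.uuid4()) is 36 characters: lowercase hex digits and four hyphens.
    return f"{prefix}-{uuid.uuid4()}"
```

You could then set `CUSTOM_ANALYZER_ID = make_analyzer_id()`, so rerunning the notebook does not collide with an analyzer left over from a previous run.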
|
71 | 71 | {
|
|
88 | 88 | "source": [
|
89 | 89 | "## Create Azure AI Content Understanding Client\n",
|
90 | 90 | "\n",
|
91 |    | - "> The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class containing functions to interact with the Content Understanding API. Before the official release of the Content Understanding SDK, it can be regarded as a lightweight SDK. Fill the constant **AZURE_AI_ENDPOINT**, **AZURE_AI_API_VERSION**, **AZURE_AI_API_KEY** with the information from your Azure AI Service.\n",
   | 91 | + "> The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class providing functions to interact with the Content Understanding API. Before the official release of the Content Understanding SDK, this class can be considered a lightweight SDK.\n",
   | 92 | + "\n",
   | 93 | + "> Fill in the constants **AZURE_AI_ENDPOINT**, **AZURE_AI_API_VERSION**, and **AZURE_AI_API_KEY** with your Azure AI Service endpoint, API version, and key.\n",
92 | 94 | "\n",
|
93 | 95 | "> ⚠️ Important:\n",
|
94 |    | - "You must update the code below to match your Azure authentication method.\n",
   | 96 | + "Make sure to update the code below to match your chosen Azure authentication method.\n",
95 | 97 | "Look for the `# IMPORTANT` comments and modify those sections accordingly.\n",
|
96 |    | - "If you skip this step, the sample may not run correctly.\n",
   | 98 | + "Skipping this step may prevent the sample from running correctly.\n",
97 | 99 | "\n",
|
98 |     | - "> ⚠️ Note: Using a subscription key works, but using a token provider with Azure Active Directory (AAD) is much safer and is highly recommended for production environments."
   | 100 | + "> ⚠️ Note: While subscription key authentication works, it is strongly recommended to use a token provider with Azure Active Directory (AAD) for improved security in production environments."
99 | 101 | ]
|
100 | 102 | },
|
101 | 103 | {
|
|
115 | 117 | "load_dotenv(find_dotenv())\n",
|
116 | 118 | "logging.basicConfig(level=logging.INFO)\n",
|
117 | 119 | "\n",
|
118 |     | - "# For authentication, you can use either token-based auth or subscription key, and only one of them is required\n",
    | 120 | + "# For authentication, you may use either token-based auth or a subscription key; only one is required.\n",
119 | 121 | "AZURE_AI_ENDPOINT = os.getenv(\"AZURE_AI_ENDPOINT\")\n",
|
120 |     | - "# IMPORTANT: Replace with your actual subscription key or set up in \".env\" file if not using token auth\n",
    | 122 | + "# IMPORTANT: Replace with your actual subscription key or configure it in the \".env\" file if not using token authentication.\n",
121 | 123 | "AZURE_AI_API_KEY = os.getenv(\"AZURE_AI_API_KEY\")\n",
|
122 | 124 | "AZURE_AI_API_VERSION = os.getenv(\"AZURE_AI_API_VERSION\", \"2025-05-01-preview\")\n",
|
123 | 125 | "\n",
|
124 |     | - "# Add the parent directory to the path to use shared modules\n",
    | 126 | + "# Add the parent directory to the system path to access shared modules\n",
125 | 127 | "parent_dir = Path(Path.cwd()).parent\n",
|
126 | 128 | "sys.path.append(str(parent_dir))\n",
|
127 | 129 | "from python.content_understanding_client import AzureContentUnderstandingClient\n",
|
|
134 | 136 | " api_version=AZURE_AI_API_VERSION,\n",
|
135 | 137 | " # IMPORTANT: Comment out token_provider if using subscription key\n",
|
136 | 138 | " token_provider=token_provider,\n",
|
137 |     | - " # IMPORTANT: Uncomment this if using subscription key\n",
    | 139 | + " # IMPORTANT: Uncomment the following line if using subscription key\n",
138 | 140 | " # subscription_key=AZURE_AI_API_KEY,\n",
|
139 |     | - " # x_ms_useragent=\"azure-ai-content-understanding-python/field_extraction\", # This header is used for sample usage telemetry, please comment out this line if you want to opt out.\n",
    | 141 | + " # x_ms_useragent=\"azure-ai-content-understanding-python/field_extraction\", # This header is used for sample usage telemetry. Please comment out if you want to opt out.\n",
140 | 142 | ")"
|
141 | 143 | ]
|
142 | 144 | },
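The client cell toggles between `token_provider` and `subscription_key` by commenting lines in and out, and the configuration comment notes that only one is required. A minimal sketch that makes the "exactly one" rule explicit instead; the helper `build_auth_kwargs` is hypothetical, and it assumes the token provider is a zero-argument callable returning a token string:

```python
from typing import Callable, Optional


def build_auth_kwargs(
    subscription_key: Optional[str] = None,
    token_provider: Optional[Callable[[], str]] = None,
) -> dict:
    """Return constructor keyword arguments carrying exactly one auth method (hypothetical helper)."""
    if token_provider is not None and subscription_key:
        raise ValueError("Pass either token_provider or subscription_key, not both.")
    if token_provider is not None:
        # Preferred: token-based (AAD) authentication.
        return {"token_provider": token_provider}
    if subscription_key:
        return {"subscription_key": subscription_key}
    raise ValueError("No credentials: set AZURE_AI_API_KEY or configure a token provider.")
```

With this, the two commented toggles could be replaced by passing `**build_auth_kwargs(...)` to `AzureContentUnderstandingClient`, and a misconfigured environment fails with a clear message before any request is sent.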
|
|
170 | 172 | "cell_type": "markdown",
|
171 | 173 | "metadata": {},
|
172 | 174 | "source": [
|
173 |     | - "After the analyzer is successfully created, we can use it to analyze our input files."
    | 175 | + "Once the analyzer is successfully created, you can use it to analyze your input files."
174 | 176 | ]
|
175 | 177 | },
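Analysis in this sample is a two-step call: `client.begin_analyze(...)` submits the job and `client.poll_result(response)` waits for its outcome. The sketch below is a generic illustration of that polling pattern, not the helper's actual code; the status names (`Succeeded`, `Failed`, `Canceled`) and the `get_status` callable are assumptions:

```python
import time
from typing import Any, Callable, Dict

# Assumed status vocabulary, compared case-insensitively.
SUCCESS_STATUSES = {"succeeded"}
FAILURE_STATUSES = {"failed", "canceled"}


def poll_until_done(
    get_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_status() until it reports a terminal status or about timeout_s elapses.

    Elapsed time is approximated by summing the sleep intervals.
    """
    waited = 0.0
    while True:
        payload = get_status()
        status = str(payload.get("status", "")).lower()
        if status in SUCCESS_STATUSES:
            return payload
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"Operation ended with status {status!r}: {payload}")
        if waited >= timeout_s:
            raise TimeoutError(f"Operation still {status!r} after about {timeout_s} s")
        sleep(interval_s)
        waited += interval_s
```

Injecting `sleep` keeps the loop testable without real delays, and the explicit timeout prevents a stuck job from blocking the notebook indefinitely.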
|
176 | 178 | {
|
|
181 | 183 | "source": [
|
182 | 184 | "from python.extension.transcripts_processor import TranscriptsProcessor\n",
|
183 | 185 | "\n",
|
184 |     | - "test_file_path=analyzer_sample_file_path\n",
    | 186 | + "test_file_path = analyzer_sample_file_path\n",
185 | 187 | "\n",
|
186 | 188 | "transcripts_processor = TranscriptsProcessor()\n",
|
187 | 189 | "webvtt_output, webvtt_output_file_path = transcripts_processor.convert_file(test_file_path)\n",
|
188 | 190 | "\n",
|
189 | 191 | "if \"WEBVTT\" not in webvtt_output:\n",
|
190 | 192 | " print(\"Error: The output is not in WebVTT format.\")\n",
|
191 |     | - "else: \n",
    | 193 | + "else:\n",
192 | 194 | " response = client.begin_analyze(CUSTOM_ANALYZER_ID, file_location=webvtt_output_file_path)\n",
|
193 | 195 | " print(\"Response:\", response)\n",
|
194 | 196 | " result_json = client.poll_result(response)\n",
|
|
201 | 203 | "metadata": {},
|
202 | 204 | "source": [
|
203 | 205 | "## Clean Up\n",
|
204 |     | - "Optionally, delete the sample analyzer from your resource. In typical usage scenarios, you would analyze multiple files using the same analyzer."
    | 206 | + "Optionally, delete the sample analyzer from your Azure resource. In typical usage scenarios, you would analyze multiple files using the same analyzer."
205 | 207 | ]
|
206 | 208 | },
|
207 | 209 | {
|
|