Review main-notebooks/conversational_field_extraction.ipynb
#54
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,22 +4,22 @@ | |
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Extract Custom Fields from Your Pretranscribed File" | ||
"# Extract Custom Fields from Your Pre-transcribed File" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"This notebook demonstrates how to use analyzers to extract custom fields from your transcription input files." | ||
"This notebook demonstrates how to use analyzers to extract custom fields from your pre-transcribed input files." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Prerequisites\n", | ||
"1. Ensure Azure AI service is configured following [steps](../README.md#configure-azure-ai-service-resource)\n", | ||
"1. Ensure your Azure AI service is configured by following the [configuration steps](../README.md#configure-azure-ai-service-resource).\n", | ||
"2. Install the required packages to run the sample." | ||
] | ||
}, | ||
|
@@ -45,7 +45,7 @@ | |
"source": [ | ||
"Below is a collection of analyzer templates designed to extract fields from various input file types.\n", | ||
"\n", | ||
"These templates are highly customizable, allowing you to modify them to suit your specific needs. For additional verified templates from Microsoft, please visit [here](../analyzer_templates/README.md)." | ||
"These templates are highly customizable, allowing you to adapt them to your specific requirements. For additional verified templates provided by Microsoft, please visit [here](../analyzer_templates/README.md)." | ||
] | ||
}, | ||
{ | ||
|
@@ -65,7 +65,7 @@ | |
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Specify the analyzer template you want to use and provide a name for the analyzer to be created based on the template." | ||
"Specify the analyzer template to use and assign a unique name for the analyzer that will be created from the template." | ||
] | ||
}, | ||
{ | ||
|
@@ -88,14 +88,16 @@ | |
"source": [ | ||
"## Create Azure AI Content Understanding Client\n", | ||
"\n", | ||
"> The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class containing functions to interact with the Content Understanding API. Before the official release of the Content Understanding SDK, it can be regarded as a lightweight SDK. Fill the constant **AZURE_AI_ENDPOINT**, **AZURE_AI_API_VERSION**, **AZURE_AI_API_KEY** with the information from your Azure AI Service.\n", | ||
"> The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class providing functions to interact with the Content Understanding API. Before the official release of the Content Understanding SDK, this class can be considered a lightweight SDK.\n", | ||
"\n", | ||
"> Fill in the constants **AZURE_AI_ENDPOINT**, **AZURE_AI_API_VERSION**, and **AZURE_AI_API_KEY** with your Azure AI Service credentials.\n", | ||
"\n", | ||
"> ⚠️ Important:\n", | ||
"You must update the code below to match your Azure authentication method.\n", | ||
"Make sure to update the code below to match your chosen Azure authentication method.\n", | ||
"Look for the `# IMPORTANT` comments and modify those sections accordingly.\n", | ||
"If you skip this step, the sample may not run correctly.\n", | ||
"Skipping this step may prevent the sample from running correctly.\n", | ||
"\n", | ||
"> ⚠️ Note: Using a subscription key works, but using a token provider with Azure Active Directory (AAD) is much safer and is highly recommended for production environments." | ||
"> ⚠️ Note: While subscription key authentication works, it is strongly recommended to use a token provider with Azure Active Directory (AAD) for improved security in production environments." | ||
] | ||
}, | ||
{ | ||
|
@@ -115,13 +117,13 @@ | |
"load_dotenv(find_dotenv())\n", | ||
"logging.basicConfig(level=logging.INFO)\n", | ||
"\n", | ||
"# For authentication, you can use either token-based auth or subscription key, and only one of them is required\n", | ||
"# For authentication, you may use either token-based auth or a subscription key; only one is required.\n", | ||
"AZURE_AI_ENDPOINT = os.getenv(\"AZURE_AI_ENDPOINT\")\n", | ||
"# IMPORTANT: Replace with your actual subscription key or set up in \".env\" file if not using token auth\n", | ||
"# IMPORTANT: Replace with your actual subscription key or configure it in the \".env\" file if not using token authentication.\n", | ||
"AZURE_AI_API_KEY = os.getenv(\"AZURE_AI_API_KEY\")\n", | ||
"AZURE_AI_API_VERSION = os.getenv(\"AZURE_AI_API_VERSION\", \"2025-05-01-preview\")\n", | ||
"\n", | ||
"# Add the parent directory to the path to use shared modules\n", | ||
"# Add the parent directory to the system path to access shared modules\n", | ||
"parent_dir = Path(Path.cwd()).parent\n", | ||
"sys.path.append(str(parent_dir))\n", | ||
"from python.content_understanding_client import AzureContentUnderstandingClient\n", | ||
|
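Review note on the `sys.path` lines in the hunk above: `Path.cwd()` already returns a `Path`, so the outer `Path(...)` wrapper is redundant, and re-running the cell appends the same directory again each time. A minimal sketch of an idempotent variant follows. The helper name `add_to_sys_path` is hypothetical and is not part of the notebook or the repository.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory: Path) -> bool:
    """Append directory to sys.path unless it is already present.

    Returns True when the entry was added and False when it already existed.
    (Hypothetical helper, shown for illustration only.)
    """
    entry = str(directory)
    if entry in sys.path:
        return False
    sys.path.append(entry)
    return True


# Path.cwd() is already a Path object, so no extra Path(...) call is needed.
parent_dir = Path.cwd().parent
add_to_sys_path(parent_dir)
```

Because the helper checks membership first, running the notebook cell several times leaves a single entry for the parent directory.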
@@ -134,9 +136,9 @@ | |
" api_version=AZURE_AI_API_VERSION,\n", | ||
" # IMPORTANT: Comment out token_provider if using subscription key\n", | ||
" token_provider=token_provider,\n", | ||
" # IMPORTANT: Uncomment this if using subscription key\n", | ||
" # IMPORTANT: Uncomment the following line if using subscription key\n", | ||
" # subscription_key=AZURE_AI_API_KEY,\n", | ||
" # x_ms_useragent=\"azure-ai-content-understanding-python/field_extraction\", # This header is used for sample usage telemetry, please comment out this line if you want to opt out.\n", | ||
" # x_ms_useragent=\"azure-ai-content-understanding-python/field_extraction\", # This header is used for sample usage telemetry. Comment out if you want to opt out.\n", | ||
")" | ||
] | ||
}, | ||
|
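Review note on the authentication comments in the hunk above: the sample asks readers to comment and uncomment lines to switch between `token_provider` and `subscription_key`. One possible alternative, sketched here under the assumption that the client constructor accepts exactly those two keyword arguments as shown in the diff, is to select the credential in code. The helper `build_auth_kwargs` is hypothetical and does not exist in the repository.

```python
from typing import Callable, Dict, Optional


def build_auth_kwargs(
    subscription_key: Optional[str] = None,
    token_provider: Optional[Callable[[], str]] = None,
) -> Dict[str, object]:
    """Return exactly one credential keyword argument.

    The token provider is preferred when both are supplied, which matches
    the notebook's recommendation for production use.
    (Hypothetical helper, shown for illustration only.)
    """
    if token_provider is not None:
        return {"token_provider": token_provider}
    if subscription_key:
        return {"subscription_key": subscription_key}
    raise ValueError("Provide either a token provider or a subscription key.")
```

The returned dictionary could then be unpacked with `**` into the client constructor call next to `api_version`, so that switching authentication methods only requires setting or unsetting `AZURE_AI_API_KEY`.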
@@ -170,7 +172,7 @@ | |
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"After the analyzer is successfully created, we can use it to analyze our input files." | ||
"Once the analyzer is successfully created, you can use it to analyze your input files." | ||
] | ||
}, | ||
{ | ||
|
@@ -181,14 +183,14 @@ | |
"source": [ | ||
"from python.extension.transcripts_processor import TranscriptsProcessor\n", | ||
"\n", | ||
"test_file_path=analyzer_sample_file_path\n", | ||
"test_file_path = analyzer_sample_file_path\n", | ||
"\n", | ||
"transcripts_processor = TranscriptsProcessor()\n", | ||
"webvtt_output, webvtt_output_file_path = transcripts_processor.convert_file(test_file_path)\n", | ||
"\n", | ||
"if \"WEBVTT\" not in webvtt_output:\n", | ||
" print(\"Error: The output is not in WebVTT format.\")\n", | ||
"else: \n", | ||
"else:\n", | ||
" response = client.begin_analyze(CUSTOM_ANALYZER_ID, file_location=webvtt_output_file_path)\n", | ||
" print(\"Response:\", response)\n", | ||
" result_json = client.poll_result(response)\n", | ||
|
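Review note on the format check in the hunk above: `"WEBVTT" not in webvtt_output` accepts any text that mentions the word anywhere, whereas a WebVTT file starts with the `WEBVTT` signature (optionally after a byte order mark), followed by a space, a tab, a line break, or the end of the file. A stricter check could look like the sketch below. It assumes `convert_file` returns the converted text as a `str`, which the diff does not confirm, and the function name `looks_like_webvtt` is hypothetical.

```python
def looks_like_webvtt(text: str) -> bool:
    """Return True when text begins with a WebVTT signature line.

    (Hypothetical helper, shown for illustration only.)
    """
    # Drop an optional byte order mark before checking the signature.
    if text.startswith("\ufeff"):
        text = text[1:]
    if not text.startswith("WEBVTT"):
        return False
    # The signature must end the input or be followed by whitespace.
    rest = text[len("WEBVTT"):]
    return rest == "" or rest[0] in " \t\r\n"
```

In the notebook cell, the condition would then read `if not looks_like_webvtt(webvtt_output):`, leaving the rest of the branch unchanged.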
@@ -201,7 +203,7 @@ | |
"metadata": {}, | ||
"source": [ | ||
"## Clean Up\n", | ||
"Optionally, delete the sample analyzer from your resource. In typical usage scenarios, you would analyze multiple files using the same analyzer." | ||
"Optionally, delete the sample analyzer from your Azure resource. In typical usage scenarios, you would analyze multiple files using the same analyzer." | ||
] | ||
}, | ||
{ | ||
|
@@ -235,4 +237,4 @@ | |
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} | ||
} | ||