Skip to content

Review main-notebooks/conversational_field_extraction.ipynb #54

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 21 additions & 19 deletions notebooks/conversational_field_extraction.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -4,22 +4,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Extract Custom Fields from Your Pretranscribed File"
"# Extract Custom Fields from Your Pre-transcribed File"
]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • categories: [Consistency, Clarity]
    • change: Changed "Pretranscribed" to "Pre-transcribed" by adding a hyphen
    • rationale: The hyphen clarifies the compound adjective, ensuring consistent and clear terminology throughout the documentation
    • impact: Improves readability and maintains consistent formatting of compound terms in the documentation

},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook demonstrates how to use analyzers to extract custom fields from your transcription input files."
"This notebook demonstrates how to use analyzers to extract custom fields from your pre-transcribed input files."
]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • categories: [Clarity]
    • change: Replaced "your transcription input files" with "your pre-transcribed input files."
    • rationale: The revised phrase more accurately describes the type of input files expected, emphasizing that the transcription has already been completed.
    • impact: This change improves clarity by better setting user expectations regarding the nature of the input data, reducing potential confusion.

},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"1. Ensure Azure AI service is configured following [steps](../README.md#configure-azure-ai-service-resource)\n",
"1. Ensure your Azure AI service is configured by following the [configuration steps](../README.md#configure-azure-ai-service-resource).\n",
"2. Install the required packages to run the sample."
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • categories: [Clarity, Grammar]
    • change: Rephrased the instruction from "Ensure Azure AI service is configured following [steps]" to "Ensure your Azure AI service is configured by following the [configuration steps]"
    • rationale: This change clarifies the sentence structure, adds possessive pronoun "your" for personalization, and makes the action explicit and easier to understand. It also improves grammatical flow by changing "following [steps]" to "by following the [configuration steps]."
    • impact: The updated instruction is clearer and more grammatically correct, enhancing reader comprehension and usability of the documentation.

]
},
Expand All @@ -45,7 +45,7 @@
"source": [
"Below is a collection of analyzer templates designed to extract fields from various input file types.\n",
"\n",
"These templates are highly customizable, allowing you to modify them to suit your specific needs. For additional verified templates from Microsoft, please visit [here](../analyzer_templates/README.md)."
"These templates are highly customizable, allowing you to adapt them to your specific requirements. For additional verified templates provided by Microsoft, please visit [here](../analyzer_templates/README.md)."
]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • categories: [Clarity]

    • change: Replaced "modify them to suit your specific needs" with "adapt them to your specific requirements."
    • rationale: The wording "adapt" and "specific requirements" is a clearer and more formal expression that precisely conveys customization in a professional context.
    • impact: Enhances the readability and professionalism of the documentation, making the customization capabilities easier to understand.
  • categories: [Clarity]

    • change: Changed "from Microsoft" to "provided by Microsoft."
    • rationale: Adding "provided by" makes the attribution to Microsoft more explicit and formal.
    • impact: Improves clarity regarding the source of the additional templates, which helps users trust and identify the origin of those resources.

},
{
Expand All @@ -65,7 +65,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Specify the analyzer template you want to use and provide a name for the analyzer to be created based on the template."
"Specify the analyzer template to use and assign a unique name for the analyzer that will be created from the template."
]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • categories: [Clarity, Grammar]
    • change: Reworded the sentence from "Specify the analyzer template you want to use and provide a name for the analyzer to be created based on the template." to "Specify the analyzer template to use and assign a unique name for the analyzer that will be created from the template."
    • rationale: Simplified the phrasing to make the instruction more direct and clear, and emphasized the need for the name to be unique.
    • impact: Improves readability and ensures users understand that the assigned name must be unique, reducing potential confusion.

},
{
Expand All @@ -88,14 +88,16 @@
"source": [
"## Create Azure AI Content Understanding Client\n",
"\n",
"> The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class containing functions to interact with the Content Understanding API. Before the official release of the Content Understanding SDK, it can be regarded as a lightweight SDK. Fill the constant **AZURE_AI_ENDPOINT**, **AZURE_AI_API_VERSION**, **AZURE_AI_API_KEY** with the information from your Azure AI Service.\n",
"> The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class providing functions to interact with the Content Understanding API. Before the official release of the Content Understanding SDK, this class can be considered a lightweight SDK.\n",
"\n",
"> Fill in the constants **AZURE_AI_ENDPOINT**, **AZURE_AI_API_VERSION**, and **AZURE_AI_API_KEY** with your Azure AI Service credentials.\n",
"\n",
"> ⚠️ Important:\n",
"You must update the code below to match your Azure authentication method.\n",
"Make sure to update the code below to match your chosen Azure authentication method.\n",
"Look for the `# IMPORTANT` comments and modify those sections accordingly.\n",
"If you skip this step, the sample may not run correctly.\n",
"Skipping this step may prevent the sample from running correctly.\n",
"\n",
"> ⚠️ Note: Using a subscription key works, but using a token provider with Azure Active Directory (AAD) is much safer and is highly recommended for production environments."
"> ⚠️ Note: While subscription key authentication works, it is strongly recommended to use a token provider with Azure Active Directory (AAD) for improved security in production environments."
]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • categories: [Clarity, Grammar]

    • change: Revised the description of the AzureContentUnderstandingClient to say it "provides functions" instead of "containing functions," and rephrased the sentence about the SDK to be more concise.
    • rationale: To improve readability and eliminate awkward phrasing.
    • impact: Makes the introduction clearer and easier to understand.
  • categories: [Formatting, Clarity]

    • change: Split a long sentence about filling in constants into a separate line with clearer phrasing and added missing commas for consistency.
    • rationale: Improves readability by breaking information into digestible parts and enhancing clarity.
    • impact: Users can more easily identify and understand the required constants and how to fill them.
  • categories: [Clarity, Grammar]

    • change: Changed "You must update the code below" to "Make sure to update the code below," and rephrased subsequent sentences for a softer, more instructive tone.
    • rationale: This phrasing is less commanding and more user-friendly, improving engagement.
    • impact: Encourages users to carefully follow authentication instructions without sounding overly strict.
  • categories: [Clarity, Grammar]

    • change: Reworded the warning about skipping steps from "If you skip this step, the sample may not run correctly" to "Skipping this step may prevent the sample from running correctly."
    • rationale: The revised sentence is more direct and avoids unnecessary conditional phrasing.
    • impact: Enhances message clarity, making the consequence of skipping the step more apparent.
  • categories: [Clarity, Consistency]

    • change: Rephrased the note about authentication methods to emphasize that subscription key works but using AAD token provider is "strongly recommended" and "improves security" in production.
    • rationale: Provides a stronger security recommendation and clarifies the advantage of using AAD token provider.
    • impact: Helps users better understand best security practices and encourages adoption of safer authentication in production.

},
{
Expand All @@ -115,13 +117,13 @@
"load_dotenv(find_dotenv())\n",
"logging.basicConfig(level=logging.INFO)\n",
"\n",
"# For authentication, you can use either token-based auth or subscription key, and only one of them is required\n",
"# For authentication, you may use either token-based auth or a subscription key; only one is required.\n",
"AZURE_AI_ENDPOINT = os.getenv(\"AZURE_AI_ENDPOINT\")\n",
"# IMPORTANT: Replace with your actual subscription key or set up in \".env\" file if not using token auth\n",
"# IMPORTANT: Replace with your actual subscription key or configure it in the \".env\" file if not using token authentication.\n",
"AZURE_AI_API_KEY = os.getenv(\"AZURE_AI_API_KEY\")\n",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • categories: [Grammar, Clarity, Consistency]

    • change: Changed "can use either token-based auth or subscription key, and only one of them is required" to "may use either token-based auth or a subscription key; only one is required."
    • rationale: Improved sentence structure by replacing "can" with "may" for formality, adding an article before "subscription key," and using a semicolon to separate clauses for better readability.
    • impact: Enhances the professionalism and clarity of the authentication instructions, making them easier to understand.
  • categories: [Grammar, Clarity, Consistency]

    • change: Revised "# IMPORTANT: Replace with your actual subscription key or set up in ".env" file if not using token auth" to "# IMPORTANT: Replace with your actual subscription key or configure it in the ".env" file if not using token authentication."
    • rationale: Replaced informal phrase "set up" with "configure," added "it" for grammatical completeness, replaced "auth" with "authentication" for consistency and formality, and included the definite article "the" before ".env file."
    • impact: Provides clearer and more formal guidance, improving the professionalism and consistency of the documentation.

"AZURE_AI_API_VERSION = os.getenv(\"AZURE_AI_API_VERSION\", \"2025-05-01-preview\")\n",
"\n",
"# Add the parent directory to the path to use shared modules\n",
"# Add the parent directory to the system path to access shared modules\n",
"parent_dir = Path(Path.cwd()).parent\n",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • categories: [Clarity]
    • change: Modified the comment from "Add the parent directory to the path to use shared modules" to "Add the parent directory to the system path to access shared modules"
    • rationale: The updated comment explicitly specifies "system path," making it clearer what path is being modified. Additionally, "access" is more precise than "use" in this context.
    • impact: Enhances the clarity of the comment, helping readers better understand the purpose of the code.

"sys.path.append(str(parent_dir))\n",
"from python.content_understanding_client import AzureContentUnderstandingClient\n",
Expand All @@ -134,9 +136,9 @@
" api_version=AZURE_AI_API_VERSION,\n",
" # IMPORTANT: Comment out token_provider if using subscription key\n",
" token_provider=token_provider,\n",
" # IMPORTANT: Uncomment this if using subscription key\n",
" # IMPORTANT: Uncomment the following line if using subscription key\n",
" # subscription_key=AZURE_AI_API_KEY,\n",
" # x_ms_useragent=\"azure-ai-content-understanding-python/field_extraction\", # This header is used for sample usage telemetry, please comment out this line if you want to opt out.\n",
" # x_ms_useragent=\"azure-ai-content-understanding-python/field_extraction\", # This header is used for sample usage telemetry. Comment out if you want to opt out.\n",
")"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • categories: [Clarity, Grammar]

    • change: Changed "Uncomment this if using subscription key" to "Uncomment the following line if using subscription key"
    • rationale: Adding "the following line" clarifies exactly what should be uncommented, improving the instruction's specificity and grammatical correctness.
    • impact: Helps users more easily understand which line needs action, reducing potential confusion during setup.
  • categories: [Clarity, Grammar, Formatting]

    • change: Modified inline comment from "please comment out this line if you want to opt out." to "Comment out if you want to opt out." and added two spaces before the inline comment hash to separate it from the code.
    • rationale: The revised comment is more concise and uses sentence case, enhancing readability. The added spacing improves formatting and visual separation between code and comment.
    • impact: Makes the comment clearer and easier to read, facilitating better understanding of telemetry opt-out instructions.

]
},
Expand Down Expand Up @@ -170,7 +172,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"After the analyzer is successfully created, we can use it to analyze our input files."
"Once the analyzer is successfully created, you can use it to analyze your input files."
]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • categories: [Clarity, Consistency]
    • change: Changed "After the analyzer is successfully created, we can use it to analyze our input files." to "Once the analyzer is successfully created, you can use it to analyze your input files."
    • rationale: The change shifts from a passive collective voice ("we" and "our") to a more direct and consistent second-person instruction ("you" and "your"), making the sentence clearer and more engaging for the reader. "Once" is also a clearer temporal transition than "After" in this context.
    • impact: Enhances reader engagement and makes the instructions more direct and easier to follow, improving overall documentation clarity.

},
{
Expand All @@ -181,14 +183,14 @@
"source": [
"from python.extension.transcripts_processor import TranscriptsProcessor\n",
"\n",
"test_file_path=analyzer_sample_file_path\n",
"test_file_path = analyzer_sample_file_path\n",
"\n",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • categories: [Formatting, Consistency]
    • change: Added spaces around the assignment operator in the statement test_file_path = analyzer_sample_file_path.
    • rationale: Ensures consistent spacing around operators as per common Python style guidelines (PEP 8).
    • impact: Improves code readability and maintains uniform formatting throughout the codebase.

"transcripts_processor = TranscriptsProcessor()\n",
"webvtt_output, webvtt_output_file_path = transcripts_processor.convert_file(test_file_path)\n",
"\n",
"if \"WEBVTT\" not in webvtt_output:\n",
" print(\"Error: The output is not in WebVTT format.\")\n",
"else: \n",
"else:\n",
" response = client.begin_analyze(CUSTOM_ANALYZER_ID, file_location=webvtt_output_file_path)\n",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • categories: [Formatting]
    • change: Removed trailing spaces after the colon in the else: statement
    • rationale: Trailing spaces are unnecessary and can clutter the code, making it less clean
    • impact: Improves code cleanliness and adheres to standard formatting conventions, enhancing readability

" print(\"Response:\", response)\n",
" result_json = client.poll_result(response)\n",
Expand All @@ -201,7 +203,7 @@
"metadata": {},
"source": [
"## Clean Up\n",
"Optionally, delete the sample analyzer from your resource. In typical usage scenarios, you would analyze multiple files using the same analyzer."
"Optionally, delete the sample analyzer from your Azure resource. In typical usage scenarios, you would analyze multiple files using the same analyzer."
]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • categories: [Clarity, Consistency]
    • change: Added the word "Azure" before "resource" in the sentence.
    • rationale: Specifying "Azure resource" clarifies the context and ensures consistency by explicitly identifying the platform related to the resource.
    • impact: Improves user understanding by clearly indicating the environment, reducing potential ambiguity.

},
{
Expand Down Expand Up @@ -235,4 +237,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • categories: [Formatting]
    • change: Added a trailing newline after a closing brace }
    • rationale: Ensures the file ends with a newline character, adhering to common formatting standards
    • impact: Improves compatibility with tools that expect files to end with a newline and enhances consistency across the codebase