Review main-notebooks/field_extraction_pro_mode.ipynb #74

Open
wants to merge 1 commit into
base: main

chienyuanchang
Collaborator

Automated review and documentation improvements for notebooks/field_extraction_pro_mode.ipynb on branch main

LLM usage details:

  • Total tokens: 11406
  • Prompt tokens: 5883
  • Completion tokens: 5523
  • Used deployment: cu-samples-gpt-4.1-mini
  • API version: 2024-12-01-preview

Collaborator Author

@chienyuanchang chienyuanchang left a comment

Automated LLM code review (section-based).

LLM usage details:

  • Total tokens used: 16457.
  • Used deployment: cu-samples-gpt-4.1-mini
  • API version: 2024-12-01-preview

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"# Conduct complex analysis with Pro mode\n",
+"# Conduct Complex Analysis with Pro Mode\n",
"\n",

  • categories: [Consistency]
    • change: Put the notebook's title heading in title case ("Conduct Complex Analysis with Pro Mode" instead of "Conduct complex analysis with Pro mode")
    • rationale: To keep heading capitalization consistent, following common title case conventions.
    • impact: Enhances readability and gives a more professional and uniform appearance to the documentation.

@@ -13,7 +13,7 @@
">\n",
"> #################################################################################\n",
"\n",
-"This notebook demonstrates how to use **Pro mode** in Azure AI Content Understanding to enhance your analyzer with multiple inputs and optional reference data. Pro mode is designed for advanced use cases, particularly those requiring multi-step reasoning, and complex decision-making (for instance, identifying inconsistencies, drawing inferences, and making sophisticated decisions). Pro mode allows input from multiple content files and includes the option to provide reference data at analyzer creation time.\n",
+"This notebook demonstrates how to use **Pro mode** in Azure AI Content Understanding to enhance your analyzer with multiple inputs and optional reference data. Pro mode is designed for advanced use cases, particularly those requiring multi-step reasoning and complex decision-making (for example, identifying inconsistencies, drawing inferences, and making sophisticated decisions). Pro mode allows input from multiple content files and includes the option to provide reference data at analyzer creation time.\n",
"\n",

  • categories: [Grammar, Clarity]
    • change: Replaced the phrase "for instance," with "for example," and removed the comma after "reasoning"
    • rationale: "For example" is more standard in formal documentation, and removing the comma before "and complex decision-making" improves sentence flow and clarity.
    • impact: Enhances readability and professionalism of the documentation, making the explanation clearer and easier to follow.

" - Alternatively, set both `REFERENCE_DOC_STORAGE_ACCOUNT_NAME` and `REFERENCE_DOC_CONTAINER_NAME` so that the SAS URL can be generated automatically during a later step.\n",
" - Also, set `REFERENCE_DOC_PATH` to specify the folder path within the container where reference documents will be uploaded.\n",
" > ⚠️ Note: Reference documents are optional in Pro mode. You can run Pro mode using only input documents. For example, the service can reason across two or more input files without any reference data.\n",
"3. Install the required packages to run the sample."
]

  • categories: [Grammar, Clarity, Consistency]

    • change: Changed "Ensure Azure AI service is configured following [steps]" to "Ensure the Azure AI service is configured by following the [setup steps]"
    • rationale: Added "the" for grammatical correctness and replaced "steps" with "setup steps" for clearer reference.
    • impact: Improves sentence clarity and grammatical correctness, making instructions easier to understand.
  • categories: [Clarity, Consistency]

    • change: Updated the numbering of the steps from repeating "1." to sequential numbering "1.", "2.", "3."
    • rationale: Numbering steps sequentially aligns with standard list conventions and improves navigability.
    • impact: Enhances readability by clearly defining step order.
  • categories: [Grammar, Clarity]

    • change: Changed "please follow [Set env for reference doc]" to "follow [Set env for reference doc]"
    • rationale: Removed polite but unnecessary "please" to maintain direct instructional tone.
    • impact: Makes instructions more concise and professional.
  • categories: [Grammar, Clarity]

    • change: Rephrased bullet points for environment variable setup for improved readability and grammar (e.g., "You can either set..." to "You can set...", "Or set both..." to "Alternatively, set both...")
    • rationale: Streamlined language and replaced conjunctions for clearer, more formal instructions.
    • impact: Clarifies available options for setting environment variables, reducing potential user confusion.
  • categories: [Grammar, Clarity]

    • change: Added commas and small phrasing changes, such as "Also set" to "Also, set", and replaced "using just input documents" with "using only input documents"
    • rationale: Minor punctuation and word choice improvements to enhance readability and precision.
    • impact: Provides smoother reading experience and clearer instructions.
  • categories: [Clarity]

    • change: Changed "during one of the later steps" to "during a later step"
    • rationale: Simplifies phrasing for easier comprehension.
    • impact: Makes the timing of actions easier to understand for users.
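
The two configuration options described in the revised bullets can be sketched as a small resolver. This is a minimal illustration, not the notebook's code: the variable name `REFERENCE_DOC_SAS_URL` and the helper `resolve_reference_doc_config` are assumptions made for the example; only `REFERENCE_DOC_STORAGE_ACCOUNT_NAME`, `REFERENCE_DOC_CONTAINER_NAME`, and `REFERENCE_DOC_PATH` appear in the diff.

```python
import os
from typing import Mapping, Optional


def resolve_reference_doc_config(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return reference-document settings, or None when none are configured.

    Reference documents are optional in Pro mode, so "nothing set" is valid.
    """
    sas_url = env.get("REFERENCE_DOC_SAS_URL")  # assumed name, not from the diff
    account = env.get("REFERENCE_DOC_STORAGE_ACCOUNT_NAME")
    container = env.get("REFERENCE_DOC_CONTAINER_NAME")
    path = env.get("REFERENCE_DOC_PATH")

    # Option 1: a ready-made SAS URL was supplied.
    if sas_url:
        return {"mode": "sas_url", "sas_url": sas_url, "path": path}

    # Option 2: account and container are both set, so a SAS URL
    # can be generated in a later step.
    if account and container:
        return {
            "mode": "generate_sas",
            "account": account,
            "container": container,
            "path": path,
        }

    # Only one of the pair is set: report the misconfiguration early.
    if account or container:
        raise ValueError(
            "Set both REFERENCE_DOC_STORAGE_ACCOUNT_NAME and "
            "REFERENCE_DOC_CONTAINER_NAME, or neither."
        )

    # Nothing configured: run with input documents only.
    return None
```

Returning `None` for the unconfigured case keeps the "reference documents are optional" rule visible to callers, while a half-configured pair fails fast instead of surfacing later as an upload error.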

"\n",
-"> For example, if you're looking to analyze invoices to ensure they're consistent with a contractual agreement, you can supply the invoice and other relevant documents (for example, a purchase order) as inputs, and supply the contract files as reference data. The service applies reasoning to validate the input documents according to your schema, which might be to identify discrepancies to flag for further review."
+"> For example, if you're analyzing invoices to ensure their consistency with a contractual agreement, you can supply the invoice and other relevant documents (e.g., a purchase order) as inputs, and provide the contract files as reference data. The service applies reasoning to validate the input documents against your schema, which might include identifying discrepancies to flag for further review."
]

  • categories: [Grammar, Clarity, Consistency]

    • change: Capitalized the main heading from "Analyzer template and local files setup" to "Analyzer Template and Local Files Setup".
    • rationale: Proper capitalization of section headers adheres to standard title case conventions for better readability and professionalism.
    • impact: Enhances the visual consistency and professionalism of the documentation.
  • categories: [Grammar, Clarity]

    • change: Revised the description of "analyzer_template" to include a comma after "sample" and changed phrasing for readability ("In this sample we define" to "In this sample, we define").
    • rationale: Adding a comma improves grammatical correctness and clarity.
    • impact: Makes the sentence easier to read and understand.
  • categories: [Clarity, Consistency]

    • change: Changed "We can have multiple input document files..." to "You can have multiple input document files..." and replaced "designate" with "specify".
    • rationale: Switching from "we" to "you" directly addresses the reader, making the documentation more user-focused. "Specify" is simpler and clearer than "designate".
    • impact: Improves reader engagement and comprehension.
  • categories: [Grammar, Clarity, Consistency]

    • change: Edited the "reference_docs(Optional)" description for grammatical correctness, consistency in phrasing, and clarity; e.g., adding spaces in "(Optional)", replacing passive forms with clearer active constructions, and capitalizing terms like "Azure Blob Storage".
    • rationale: Proper spacing, capitalization, and sentence restructuring aid clarity and maintain consistent style throughout the document.
    • impact: Provides a more polished, professional, and easily understandable explanation.
  • categories: [Clarity, Grammar]

    • change: Improved the example sentence by changing "if you're looking to analyze" to "if you're analyzing", replacing the parenthetical "for example" with the abbreviation "e.g.", and adjusting phrase structure for smoother flow.
    • rationale: The changes make the example more direct and concise while maintaining clarity. The abbreviation "e.g." is standard for introducing examples inside parentheses.
    • impact: Enhances readability and professionalism of the example, aiding user comprehension.

@@ -67,15 +67,15 @@
"analyzer_template = \"../analyzer_templates/invoice_contract_verification_pro_mode.json\"\n",
"input_docs = \"../data/field_extraction_pro_mode/invoice_contract_verification/input_docs\"\n",
"\n",
-"# NOTE: Reference documents are optional in Pro mode. Can comment out below line if not using reference documents.\n",
+"# NOTE: Reference documents are optional in Pro mode. Comment out the line below if not using reference documents.\n",
"reference_docs = \"../data/field_extraction_pro_mode/invoice_contract_verification/reference_docs\""

  • categories: [Grammar, Clarity]
    • change: Changed "Can comment out below line if not using reference documents." to "Comment out the line below if not using reference documents."
    • rationale: The original sentence was less formal and somewhat ambiguous. The revised version is a direct instruction, improving grammatical correctness and clarity.
    • impact: This change provides clearer guidance to the reader, reducing potential confusion about whether and how to comment out the line.
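
As a hedged alternative to commenting out the `reference_docs` line, the optional setting could be made explicit with `None`. The helper below is a sketch under that assumption; `list_documents` is a hypothetical name and is not part of the notebook.

```python
from pathlib import Path
from typing import List, Optional


def list_documents(folder: Optional[str]) -> List[Path]:
    """Return the files directly under `folder`, sorted by path.

    `None` means the folder is not in use (for example, no reference
    documents in Pro mode), so an empty list is returned.
    """
    if folder is None:
        return []
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Document folder not found: {root}")
    return sorted(p for p in root.iterdir() if p.is_file())
```

With this shape, setting `reference_docs = None` leaves later cells free to call `list_documents(reference_docs)` without first checking whether the variable was defined.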

@@ -455,16 +455,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"### Let's take a deeper look at `LineItemCorroboration` field in the result"
+"### Examine `LineItemCorroboration` Field in Detail"
]

  • categories: [Clarity, Formatting]
    • change: Modified the heading from "Let's take a deeper look at LineItemCorroboration field in the result" to "Examine LineItemCorroboration Field in Detail"
    • rationale: The updated heading is more concise and formal, removing conversational language to better suit technical documentation standards. Capitalization was adjusted for consistency with typical heading style.
    • impact: Improves readability and professionalism of the documentation, making it clearer and more direct for readers.

-"> Multiple input documents are combined to produce one unified output. There is always one analysis result, and this is not a batch model where N input documents would yield N outputs."
+"> The `ReportingOfficer` field is only present in the car accident report, while fields such as `VIN` come exclusively from the repair estimate document. This demonstrates how information is extracted from both documents to generate a single unified result.\n",
+"> \n",
+"> Multiple input documents are combined to produce one consolidated output. This is a single-analysis result, unlike a batch model where N input documents yield N outputs."
]

  • categories: [Clarity, Grammar, Consistency]

    • change: Rephrased the sentence describing the presence of fields in respective documents, changing "We can see that the field ReportingOfficer is only available in the car accident report, while fields like VIN come solely from the repair estimate document" to "The ReportingOfficer field is only present in the car accident report, while fields such as VIN come exclusively from the repair estimate document."
    • rationale: The rephrasing removes informal language ("We can see") and improves precision by replacing "available" with "present" and "like" with "such as." It also changes "solely" to "exclusively" for stronger emphasis.
    • impact: This change improves the professionalism and clarity of the documentation, making it more authoritative and easier to understand.
  • categories: [Formatting, Clarity]

    • change: Adjusted paragraph breaks and added ">" markers to maintain blockquote formatting, changing a plain newline to "> \n".
    • rationale: This preserves the correct markdown blockquote style and maintains visual separation between thoughts.
    • impact: Improves readability and ensures consistent presentation of quoted text.
  • categories: [Clarity, Grammar]

    • change: Changed "combined to produce one unified output. There is always one analysis result, and this is not a batch model where N input documents would yield N outputs." to "combined to produce one consolidated output. This is a single-analysis result, unlike a batch model where N input documents yield N outputs."
    • rationale: Streamlines phrasing for conciseness and clarity; replacing "unified" with "consolidated" and merging sentences improves flow and directness.
    • impact: Enhances the reader’s comprehension by delivering information in a clearer and more straightforward manner.
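
The "N inputs, one output" behavior described in the revised note can be pictured with a toy merge. This is only a conceptual sketch and says nothing about how the service works internally: `consolidate`, the document names, and the placeholder values are all hypothetical; only the field names `ReportingOfficer` and `VIN` come from the notebook text.

```python
from typing import Dict


def consolidate(per_document_fields: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Merge fields found across several documents into ONE result.

    Each field keeps the first value seen plus the document it came from.
    A batch model would instead return one result per input document.
    """
    result: Dict[str, Dict[str, str]] = {}
    for doc_name, fields in per_document_fields.items():
        for field, value in fields.items():
            result.setdefault(field, {"value": value, "source": doc_name})
    return result


# Placeholder inputs: two documents, each contributing a different field.
inputs = {
    "car_accident_report": {"ReportingOfficer": "<officer name>"},
    "repair_estimate": {"VIN": "<vehicle identification number>"},
}
single_result = consolidate(inputs)
```

Here `single_result` is one dictionary holding both fields, each tagged with its source document, rather than a list with one entry per input.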

@@ -481,14 +481,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"> In the `LineItemCorroboration` field, we see that each line item, generated from *repair estimate document*, is extracted with its corresponding information, claim status, and evidence. Items that are not covered by the policy, such as the Starbucks drink and hotel stay, are marked as suspicious, while damage repairs that are supported by the supplied documents in the claim and are permitted by the policy are confirmed."
+"> Within the `LineItemCorroboration` field, each line item from the *repair estimate document* is extracted along with its claim status and evidence. Items not covered by the policy, such as a Starbucks drink and hotel stay, are marked as suspicious. Valid damage repairs supported by the claim documents and permitted under the policy are confirmed."
]

  • categories: [Clarity, Grammar]

    • change: Revised sentence structure for smoother readability and conciseness; changed "we see that each line item, generated from repair estimate document, is extracted with its corresponding information, claim status, and evidence" to "each line item from the repair estimate document is extracted along with its claim status and evidence."
    • rationale: To eliminate awkward phrasing and the conversational "we see that", making the description more direct and easier to follow.
    • impact: Enhances the clarity of the explanation and improves the flow of information.
  • categories: [Clarity, Grammar]

    • change: Changed "Items that are not covered by the policy, such as the Starbucks drink and hotel stay, are marked as suspicious" to "Items not covered by the policy, such as a Starbucks drink and hotel stay, are marked as suspicious."
    • rationale: Dropped "that are" and replaced the definite article "the" with "a" to reflect that these examples are generic, not specific, and to improve sentence flow.
    • impact: Increases readability and precision of the example provided.
  • categories: [Clarity]

    • change: Replaced "damage repairs that are supported by the supplied documents in the claim and are permitted by the policy are confirmed" with "Valid damage repairs supported by the claim documents and permitted under the policy are confirmed."
    • rationale: Streamlined wording by replacing "supplied documents in the claim" with "claim documents" and used "valid damage repairs" for clearer meaning.
    • impact: Makes the statement more concise and easier to understand.
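
The suspicious/confirmed split described in the revised note can be illustrated by grouping line items by claim status. This is a sketch over a simplified, hypothetical data shape: the key names (`name`, `claim_status`, `evidence`), the placeholder evidence strings, and the helper `group_by_status` are assumptions, not the actual structure of the analyzer's `LineItemCorroboration` output.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical, simplified stand-in for the items inside `LineItemCorroboration`.
# Item names follow the examples in the notebook text; everything else is a placeholder.
SAMPLE_LINE_ITEMS = [
    {"name": "Starbucks drink", "claim_status": "suspicious", "evidence": "<not covered by policy>"},
    {"name": "Hotel stay", "claim_status": "suspicious", "evidence": "<not covered by policy>"},
    {"name": "Damage repair", "claim_status": "confirmed", "evidence": "<supported by claim documents>"},
]


def group_by_status(line_items: List[dict]) -> Dict[str, List[str]]:
    """Group line-item names by claim status, preserving input order."""
    grouped = defaultdict(list)
    for item in line_items:
        grouped[item["claim_status"]].append(item["name"])
    return dict(grouped)


if __name__ == "__main__":
    for status, names in group_by_status(SAMPLE_LINE_ITEMS).items():
        print(f"{status}: {', '.join(names)}")
```

A grouping like this makes it easy to route only the suspicious items to manual review while leaving confirmed repairs untouched.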

]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
-"### [Optional] Delete the analyzer for second sample after use"
+"### [Optional] Delete the Analyzer for the Second Sample After Use"
]

  • categories: [Grammar, Consistency, Formatting]
    • change: Put the heading in title case ("Analyzer", "Second Sample", "After Use") and added the missing article "the" before "Second Sample".
    • rationale: Keeps heading capitalization consistent with the other title-case headings in the notebook and fixes the grammar.
    • impact: Improves the clarity and professional appearance of the documentation by maintaining consistent formatting and grammar.

@@ -522,4 +522,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
\ No newline at end of file

  • categories: [Formatting]
    • change: Added a newline character after the closing brace (}) in the code section.
    • rationale: Ensures proper file formatting by ending the file with a newline, which is a common convention in many programming languages and tools.
    • impact: Improves compatibility with editors and tools that expect files to end with a newline, helping to avoid potential issues with file concatenation or version control diffs.
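
Whether a file ends with a newline can be confirmed directly rather than inferred from the diff view. The check below is a minimal sketch; `ends_with_newline` is a hypothetical helper, and the path passed to it would be whichever notebook file is under review.

```python
from pathlib import Path
from typing import Union


def ends_with_newline(path: Union[str, Path]) -> bool:
    """Return True when the file's last byte is a line feed.

    An empty file returns False because it has no final byte at all.
    """
    data = Path(path).read_bytes()
    return data.endswith(b"\n")
```

Reading bytes instead of text avoids any newline translation by the platform, so the result reflects what is actually stored on disk.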
