Bugfix/aoai row mismatch #42466
base: main
Conversation
Add pyrit and not remove the other one
…row misalignment; add missing-rows unit test
Pull Request Overview
This pull request fixes a critical bug in the Azure AI evaluation system where evaluation results were not properly aligned with input data, leading to incorrect metrics being reported. The issue occurred because AOAI (Azure OpenAI) evaluation runs could return results in a different order than the original dataset or drop some rows entirely.
Key changes:
- Added row alignment validation by tracking expected row count
- Enhanced result processing to handle missing rows with proper padding (a sketch of the alignment logic follows this list)
- Updated test cases to validate ordering preservation and missing row handling
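The alignment logic can be illustrated with a minimal sketch. This is not the PR's exact code: it assumes pandas and that each AOAI result carries a `datasource_item_id` matching its position (`0..expected-1`) in the input dataset, which is also what the snippet quoted further down in this thread assumes. The `align_results` helper name is hypothetical, for illustration only.

```python
import pandas as pd


def align_results(results: list[dict], expected: int) -> pd.DataFrame:
    """Align AOAI results with the original dataset rows, padding any gaps (illustrative sketch)."""
    output_df = pd.DataFrame(results)
    # Restore the original row order, regardless of the order AOAI returned.
    output_df = output_df.set_index("datasource_item_id").sort_index()
    # Pad rows that AOAI dropped so the results line up 1:1 with the input;
    # the padded rows contain only NaN values.
    output_df = output_df.reindex(range(expected))
    return output_df.reset_index(drop=True)
```

With this shape, row i of the returned frame always corresponds to row i of the input dataset, even when the AOAI run reorders or drops rows.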
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
test_aoai_evaluation_pagination.py | Updated test cases to include the expected_rows parameter and fixed test data to properly validate row ordering |
test_aoai_alignment_missing_rows.py | New test file to validate proper handling of unordered AOAI results and missing row detection (see the sketch after this table) |
_evaluate_aoai.py | Core fix implementing row alignment validation, missing-row padding, and proper result ordering |
CHANGELOG.md | Documentation of the bug fix |
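For context, the missing-rows scenario can be exercised with a test along these lines. This is a hypothetical sketch against the `align_results` helper above, not the actual contents of `test_aoai_alignment_missing_rows.py`:

```python
import pandas as pd


def test_unordered_results_with_missing_row_are_aligned():
    # Results arrive out of order and row 1 is missing entirely.
    results = [
        {"datasource_item_id": 2, "score": 0.9},
        {"datasource_item_id": 0, "score": 0.1},
    ]
    df = align_results(results, expected=3)

    assert len(df) == 3                 # padded back to the input length
    assert df.loc[0, "score"] == 0.1    # original order restored
    assert df.loc[2, "score"] == 0.9
    assert pd.isna(df.loc[1, "score"])  # dropped row is padded with NaN
```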
…te/_evaluate_aoai.py Co-authored-by: Copilot <[email protected]>
…te/_evaluate_aoai.py Co-authored-by: Copilot <[email protected]>
if expected is not None:
    pre_len = len(output_df)
    # Assumes original datasource_item_id space is 0..expected-1
    output_df = output_df.reindex(range(expected))
How do we know which rows we did not get a result for?
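One way to surface this (a sketch of one possible approach, not necessarily what the PR does, assuming a module-level `logger`): after the `reindex`, the padded rows contain only NaN values, so their index positions identify which `datasource_item_id`s never came back.

```python
# Rows introduced by reindex carry no result values at all, so an all-NaN
# row marks a datasource_item_id for which the AOAI run returned nothing.
missing_ids = output_df.index[output_df.isna().all(axis=1)].tolist()
if missing_ids:
    logger.warning(
        "AOAI run returned no results for %d of %d rows: %s",
        len(missing_ids), expected, missing_ids,
    )
```

A legitimate result whose every field happens to be NaN would also be flagged this way, so tracking the returned ids directly would be the more robust option.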
Description
Please add an informative description that covers the changes made by the pull request and link all relevant issues.
If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines