[EvaluationResult Convert]Counts only for primary metrics when multiple metrics and exclude errored counts for passed/failed #43878
Pull Request Overview
This PR introduces filtering logic to skip non-primary metrics when calculating AOAI evaluation summaries. The primary metric is defined as the first metric in the list for evaluators that produce multiple metrics.
- Added a new `_is_primary_metric` function to determine if a metric is a primary metric
- Modified `_calculate_aoai_evaluation_summary` to skip counting non-primary metrics
- Reordered the `rouge_score` metrics list to make `rouge_f1_score` the primary metric instead of `rouge_precision`
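
The filtering described above can be illustrated with a minimal sketch. It is not the code from this PR: the mapping contents, the result-record shape (`evaluator`, `metric`, `status`, `passed`), and the helper names `is_primary_metric` and `summarize` are assumptions made for illustration. It only shows the idea that the first metric in an evaluator's list is treated as primary, and that errored results are kept out of the passed/failed counts.

```python
from typing import Dict, List

# Hypothetical stand-in for the evaluator-to-metrics mapping; the first entry
# of each list is treated as the primary metric. Contents are illustrative.
EVALUATOR_METRICS: Dict[str, List[str]] = {
    "rouge_score": ["rouge_f1_score", "rouge_precision", "rouge_recall"],
    "coherence": ["coherence"],
}


def is_primary_metric(metric: str, evaluator: str) -> bool:
    """Return True if `metric` is the first metric listed for `evaluator`."""
    metrics = EVALUATOR_METRICS.get(evaluator)
    if not metrics:
        # Assumption: evaluators missing from the mapping are not filtered.
        return True
    return metric == metrics[0]


def summarize(results: List[dict]) -> Dict[str, Dict[str, int]]:
    """Count passed/failed/errored per evaluator, using primary metrics only."""
    counts: Dict[str, Dict[str, int]] = {}
    for r in results:
        if not is_primary_metric(r["metric"], r["evaluator"]):
            continue  # skip non-primary metrics so a row is counted once
        c = counts.setdefault(
            r["evaluator"], {"passed": 0, "failed": 0, "errored": 0}
        )
        if r.get("status") == "error":
            c["errored"] += 1  # errored results are not passed or failed
        elif r.get("passed"):
            c["passed"] += 1
        else:
            c["failed"] += 1
    return counts


sample = [
    {"evaluator": "rouge_score", "metric": "rouge_f1_score", "passed": True},
    {"evaluator": "rouge_score", "metric": "rouge_precision", "passed": False},
    {"evaluator": "rouge_score", "metric": "rouge_f1_score", "status": "error"},
    {"evaluator": "coherence", "metric": "coherence", "passed": False},
]
summary = summarize(sample)
```

In this sketch the `rouge_precision` row is ignored entirely, so one evaluated row with several ROUGE metrics contributes a single count rather than one per metric.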
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py | Added _is_primary_metric function and integrated primary metric filtering into _calculate_aoai_evaluation_summary |
| sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_constants.py | Updated documentation for _EvaluatorMetricMapping and reordered rouge_score metrics |
Description
Please add an informative description that covers the changes made by the pull request and link all relevant issues.
If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines