Skip to content

Commit ee73a6e

Browse files
authored
[evaluation] chore: Enable tests on 3.13, disable tests on 3.14 (#43362)
* chore: Specify correct lower bound for pands on 3.14 * chore: Enable 3.13 * chore: Disable 3.14 * chore: Add 3.13 tests * chore: Move red-team dep install to dev-requirements * docs: Fix misc rst formatting issues * docs,fix: Remove admonitions with no matching blocks * fix: Add environment marker for redteam extra pyrit only supports python3.10 and up * chore: Bump min boudns for pyrit * fix: Fix pandas min bound for 3.13
1 parent 5d2658a commit ee73a6e

File tree

18 files changed

+64
-147
lines changed

18 files changed

+64
-147
lines changed

eng/tools/azure-sdk-tools/ci_tools/functions.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -55,7 +55,7 @@
5555
"sdk/textanalytics/azure-ai-textanalytics",
5656
]
5757

58-
TEST_COMPATIBILITY_MAP = {"azure-ai-ml": ">=3.7", "azure-ai-evaluation": ">=3.9, !=3.13.*"}
58+
TEST_COMPATIBILITY_MAP = {"azure-ai-ml": ">=3.7"}
5959
TEST_PYTHON_DISTRO_INCOMPATIBILITY_MAP = {
6060
"azure-storage-blob": "pypy",
6161
"azure-storage-queue": "pypy",

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_aoai/aoai_grader.py

Lines changed: 6 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -18,8 +18,9 @@
1818

1919
@experimental
2020
class AzureOpenAIGrader:
21-
"""
22-
Base class for Azure OpenAI grader wrappers, recommended only for use by experienced OpenAI API users.
21+
"""Base class for Azure OpenAI grader wrappers.
22+
23+
Recommended only for use by experienced OpenAI API users.
2324
Combines a model configuration and any grader configuration
2425
into a singular object that can be used in evaluations.
2526
@@ -28,20 +29,16 @@ class AzureOpenAIGrader:
2829
evaluation results.
2930
3031
:param model_config: The model configuration to use for the grader.
31-
:type model_config: Union[
32-
~azure.ai.evaluation.AzureOpenAIModelConfiguration,
33-
~azure.ai.evaluation.OpenAIModelConfiguration
34-
]
32+
:type model_config: Union[~azure.ai.evaluation.AzureOpenAIModelConfiguration,
33+
~azure.ai.evaluation.OpenAIModelConfiguration]
3534
:param grader_config: The grader configuration to use for the grader. This is expected
3635
to be formatted as a dictionary that matches the specifications of the sub-types of
37-
the TestingCriterion alias specified in (OpenAI's SDK)[https://github.com/openai/openai-python/blob/ed53107e10e6c86754866b48f8bd862659134ca8/src/openai/types/eval_create_params.py#L151].
36+
the TestingCriterion alias specified in `OpenAI's SDK <https://github.com/openai/openai-python/blob/ed53107e10e6c86754866b48f8bd862659134ca8/src/openai/types/eval_create_params.py#L151>`_.
3837
:type grader_config: Dict[str, Any]
3938
:param credential: The credential to use to authenticate to the model. Only applicable to AzureOpenAI models.
4039
:type credential: ~azure.core.credentials.TokenCredential
4140
:param kwargs: Additional keyword arguments to pass to the grader.
4241
:type kwargs: Any
43-
44-
4542
"""
4643

4744
id = "azureai://built-in/evaluators/azure-openai/custom_grader"

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_aoai/label_grader.py

Lines changed: 4 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -14,21 +14,18 @@
1414

1515
@experimental
1616
class AzureOpenAILabelGrader(AzureOpenAIGrader):
17-
"""
18-
Wrapper class for OpenAI's label model graders.
17+
"""Wrapper class for OpenAI's label model graders.
1918
2019
Supplying a LabelGrader to the `evaluate` method will cause an asynchronous request to evaluate
2120
the grader via the OpenAI API. The results of the evaluation will then be merged into the standard
2221
evaluation results.
2322
2423
:param model_config: The model configuration to use for the grader.
25-
:type model_config: Union[
26-
~azure.ai.evaluation.AzureOpenAIModelConfiguration,
27-
~azure.ai.evaluation.OpenAIModelConfiguration
28-
]
24+
:type model_config: Union[~azure.ai.evaluation.AzureOpenAIModelConfiguration,
25+
~azure.ai.evaluation.OpenAIModelConfiguration]
2926
:param input: The list of label-based testing criterion for this grader. Individual
3027
values of this list are expected to be dictionaries that match the format of any of the valid
31-
(TestingCriterionLabelModelInput)[https://github.com/openai/openai-python/blob/ed53107e10e6c86754866b48f8bd862659134ca8/src/openai/types/eval_create_params.py#L125C1-L125C32]
28+
`TestingCriterionLabelModelInput <https://github.com/openai/openai-python/blob/ed53107e10e6c86754866b48f8bd862659134ca8/src/openai/types/eval_create_params.py#L125C1-L125C32>`_
3229
subtypes.
3330
:type input: List[Dict[str, str]]
3431
:param labels: A list of strings representing the classification labels of this grader.
@@ -43,8 +40,6 @@ class AzureOpenAILabelGrader(AzureOpenAIGrader):
4340
:type credential: ~azure.core.credentials.TokenCredential
4441
:param kwargs: Additional keyword arguments to pass to the grader.
4542
:type kwargs: Any
46-
47-
4843
"""
4944

5045
id = "azureai://built-in/evaluators/azure-openai/label_grader"

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_aoai/python_grader.py

Lines changed: 4 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,7 @@
1414

1515
@experimental
1616
class AzureOpenAIPythonGrader(AzureOpenAIGrader):
17-
"""
18-
Wrapper class for OpenAI's Python code graders.
17+
"""Wrapper class for OpenAI's Python code graders.
1918
2019
Enables custom Python-based evaluation logic with flexible scoring and
2120
pass/fail thresholds. The grader executes user-provided Python code
@@ -27,16 +26,13 @@ class AzureOpenAIPythonGrader(AzureOpenAIGrader):
2726
evaluation results.
2827
2928
:param model_config: The model configuration to use for the grader.
30-
:type model_config: Union[
31-
~azure.ai.evaluation.AzureOpenAIModelConfiguration,
32-
~azure.ai.evaluation.OpenAIModelConfiguration
33-
]
29+
:type model_config: Union[~azure.ai.evaluation.AzureOpenAIModelConfiguration,
30+
~azure.ai.evaluation.OpenAIModelConfiguration]
3431
:param name: The name of the grader.
3532
:type name: str
3633
:param image_tag: The image tag for the Python execution environment.
3734
:type image_tag: str
38-
:param pass_threshold: Score threshold for pass/fail classification.
39-
Scores >= threshold are considered passing.
35+
:param pass_threshold: Score threshold for pass/fail classification. Scores >= threshold are considered passing.
4036
:type pass_threshold: float
4137
:param source: Python source code containing the grade function.
4238
Must define: def grade(sample: dict, item: dict) -> float

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_aoai/score_model_grader.py

Lines changed: 3 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,7 @@
1414

1515
@experimental
1616
class AzureOpenAIScoreModelGrader(AzureOpenAIGrader):
17-
"""
18-
Wrapper class for OpenAI's score model graders.
17+
"""Wrapper class for OpenAI's score model graders.
1918
2019
Enables continuous scoring evaluation with custom prompts and flexible
2120
conversation-style inputs. Supports configurable score ranges and
@@ -27,10 +26,8 @@ class AzureOpenAIScoreModelGrader(AzureOpenAIGrader):
2726
evaluation results.
2827
2928
:param model_config: The model configuration to use for the grader.
30-
:type model_config: Union[
31-
~azure.ai.evaluation.AzureOpenAIModelConfiguration,
32-
~azure.ai.evaluation.OpenAIModelConfiguration
33-
]
29+
:type model_config: Union[~azure.ai.evaluation.AzureOpenAIModelConfiguration,
30+
~azure.ai.evaluation.OpenAIModelConfiguration]
3431
:param input: The input messages for the grader. List of conversation
3532
messages with role and content.
3633
:type input: List[Dict[str, str]]

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_aoai/string_check_grader.py

Lines changed: 2 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -15,18 +15,14 @@
1515

1616
@experimental
1717
class AzureOpenAIStringCheckGrader(AzureOpenAIGrader):
18-
"""
19-
Wrapper class for OpenAI's string check graders.
18+
"""Wrapper class for OpenAI's string check graders.
2019
2120
Supplying a StringCheckGrader to the `evaluate` method will cause an asynchronous request to evaluate
2221
the grader via the OpenAI API. The results of the evaluation will then be merged into the standard
2322
evaluation results.
2423
2524
:param model_config: The model configuration to use for the grader.
26-
:type model_config: Union[
27-
~azure.ai.evaluation.AzureOpenAIModelConfiguration,
28-
~azure.ai.evaluation.OpenAIModelConfiguration
29-
]
25+
:type model_config: Union[~azure.ai.evaluation.AzureOpenAIModelConfiguration,~azure.ai.evaluation.OpenAIModelConfiguration]
3026
:param input: The input text. This may include template strings.
3127
:type input: str
3228
:param name: The name of the grader.
@@ -39,8 +35,6 @@ class AzureOpenAIStringCheckGrader(AzureOpenAIGrader):
3935
:type credential: ~azure.core.credentials.TokenCredential
4036
:param kwargs: Additional keyword arguments to pass to the grader.
4137
:type kwargs: Any
42-
43-
4438
"""
4539

4640
id = "azureai://built-in/evaluators/azure-openai/string_check_grader"

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_aoai/text_similarity_grader.py

Lines changed: 5 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -15,32 +15,19 @@
1515

1616
@experimental
1717
class AzureOpenAITextSimilarityGrader(AzureOpenAIGrader):
18-
"""
19-
Wrapper class for OpenAI's string check graders.
18+
"""Wrapper class for OpenAI's string check graders.
2019
2120
Supplying a StringCheckGrader to the `evaluate` method will cause an asynchronous request to evaluate
2221
the grader via the OpenAI API. The results of the evaluation will then be merged into the standard
2322
evaluation results.
2423
2524
:param model_config: The model configuration to use for the grader.
2625
:type model_config: Union[
27-
~azure.ai.evaluation.AzureOpenAIModelConfiguration,
28-
~azure.ai.evaluation.OpenAIModelConfiguration
29-
]
26+
~azure.ai.evaluation.AzureOpenAIModelConfiguration,
27+
~azure.ai.evaluation.OpenAIModelConfiguration]
3028
:param evaluation_metric: The evaluation metric to use.
31-
:type evaluation_metric: Literal[
32-
"fuzzy_match",
33-
"bleu",
34-
"gleu",
35-
"meteor",
36-
"rouge_1",
37-
"rouge_2",
38-
"rouge_3",
39-
"rouge_4",
40-
"rouge_5",
41-
"rouge_l",
42-
"cosine",
43-
]
29+
:type evaluation_metric: Literal["fuzzy_match", "bleu", "gleu", "meteor", "rouge_1", "rouge_2", "rouge_3",
30+
"rouge_4", "rouge_5", "rouge_l", "cosine"]
4431
:param input: The text being graded.
4532
:type input: str
4633
:param pass_threshold: A float score where a value greater than or equal indicates a passing grade.
@@ -53,8 +40,6 @@ class AzureOpenAITextSimilarityGrader(AzureOpenAIGrader):
5340
:type credential: ~azure.core.credentials.TokenCredential
5441
:param kwargs: Additional keyword arguments to pass to the grader.
5542
:type kwargs: Any
56-
57-
5843
"""
5944

6045
id = "azureai://built-in/evaluators/azure-openai/text_similarity_grader"

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_bleu/_bleu.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -46,6 +46,7 @@ class BleuScoreEvaluator(EvaluatorBase):
4646
https://{resource_name}.services.ai.azure.com/api/projects/{project_name}
4747
4848
.. admonition:: Example with Threshold:
49+
4950
.. literalinclude:: ../samples/evaluation_samples_threshold.py
5051
:start-after: [START threshold_bleu_score_evaluator]
5152
:end-before: [END threshold_bleu_score_evaluator]

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_code_vulnerability/_code_vulnerability.py

Lines changed: 0 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -56,23 +56,6 @@ class CodeVulnerabilityEvaluator(RaiServiceEvaluatorBase[Union[str, bool]]):
5656
:param kwargs: Additional arguments to pass to the evaluator.
5757
:type kwargs: Any
5858
59-
.. admonition:: Example:
60-
61-
.. literalinclude:: ../samples/evaluation_samples_evaluate.py
62-
:start-after: [START code_vulnerability_evaluator]
63-
:end-before: [END code_vulnerability_evaluator]
64-
:language: python
65-
:dedent: 8
66-
:caption: Initialize and call CodeVulnerabilityEvaluator with a query and response using azure.ai.evaluation.AzureAIProject.
67-
68-
.. literalinclude:: ../samples/evaluation_samples_evaluate_fdp.py
69-
:start-after: [START code_vulnerability_evaluator]
70-
:end-before: [END code_vulnerability_evaluator]
71-
:language: python
72-
:dedent: 8
73-
:caption: Initialize and call CodeVulnerabilityEvaluator using Azure AI Project URL in following format
74-
https://{resource_name}.services.ai.azure.com/api/projects/{project_name}
75-
7659
.. note::
7760
7861
If this evaluator is supplied to the `evaluate` function, the metric

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_groundedness/_groundedness.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -33,8 +33,7 @@ def value(self) -> str:
3333

3434

3535
class GroundednessEvaluator(PromptyEvaluatorBase[Union[str, float]]):
36-
"""
37-
Evaluates groundedness score for a given query (optional), response, and context or a multi-turn conversation,
36+
"""Evaluates groundedness score for a given query (optional), response, and context or a multi-turn conversation,
3837
including reasoning.
3938
4039
The groundedness measure assesses the correspondence between claims in an AI-generated answer and the source
@@ -66,6 +65,7 @@ class GroundednessEvaluator(PromptyEvaluatorBase[Union[str, float]]):
6665
:caption: Initialize and call a GroundednessEvaluator.
6766
6867
.. admonition:: Example with Threshold:
68+
6969
.. literalinclude:: ../samples/evaluation_samples_threshold.py
7070
:start-after: [START threshold_groundedness_evaluator]
7171
:end-before: [END threshold_groundedness_evaluator]

0 commit comments

Comments
 (0)