
Commit ba4cb37

Evaluation sample cleanup and update Readme (#44032)
1 parent 1aabd76 commit ba4cb37

33 files changed: +379 -307 lines

sdk/ai/azure-ai-projects/README.md

Lines changed: 70 additions & 3 deletions
@@ -21,11 +21,11 @@ resources in your Microsoft Foundry Project. Use it to:
   * Model Context Protocol (MCP)
   * SharePoint
   * Web Search
-* **Get an OpenAI client** using `.get_openai_client()` method to run "Responses" and "Conversations" operations with your Agent.
+* **Get an OpenAI client** using `.get_openai_client()` method to run "Responses", "Conversations", and "Evals" operations with your Agent.
 * **Manage memory stores** for Agent conversations, using the `.memory_store` operations.
-* **Run Evaluations** to assess the performance of your generative AI application, using the `.evaluation_rules`,
+* **Explore additional evaluation tools** to assess the performance of your generative AI application, using the `.evaluation_rules`,
   `.evaluation_taxonomies`, `.evaluators`, `.insights`, and `.schedules` operations.
-* **Run Red Team operations** to identify risks associated with your generative AI application, using the ".red_teams" operations.
+* **Run Red Team scans** to identify risks associated with your generative AI application, using the ".red_teams" operations.
 * **Enumerate AI Models** deployed to your Foundry Project using the `.deployments` operations.
 * **Enumerate connected Azure resources** in your Foundry project using the `.connections` operations.
 * **Upload documents and create Datasets** to reference them using the `.datasets` operations.
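Editorial aside (not part of this commit): the bullets above also mention the `.deployments` and `.connections` operations, which this diff leaves unchanged. A minimal sketch of how those enumerations are typically used, assuming an `AZURE_AI_PROJECT_ENDPOINT` environment variable (the name is an assumption) and that both `.list()` calls return items with a `.name` attribute:

```python
import os

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

# Sketch only: the environment variable name AZURE_AI_PROJECT_ENDPOINT is an assumption,
# not taken from this commit.
with (
    DefaultAzureCredential() as credential,
    AIProjectClient(endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"], credential=credential) as project_client,
):
    # Enumerate AI models deployed to the Foundry project (.deployments operations)
    for deployment in project_client.deployments.list():
        print(f"Deployment: {deployment.name}")

    # Enumerate Azure resources connected to the project (.connections operations)
    for connection in project_client.connections.list():
        print(f"Connection: {connection.name}")
```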
@@ -561,6 +561,73 @@ Evaluation in Azure AI Project client library provides quantitative, AI-assisted
 
 The code below shows some evaluation operations. Full list of sample can be found under "evaluation" folder in the [package samples][samples]
 
+<!-- SNIPPET:sample_agent_evaluation.agent_evaluation_basic -->
+
+```python
+with (
+    DefaultAzureCredential() as credential,
+    AIProjectClient(endpoint=endpoint, credential=credential) as project_client,
+    project_client.get_openai_client() as openai_client,
+):
+    agent = project_client.agents.create_version(
+        agent_name=os.environ["AZURE_AI_AGENT_NAME"],
+        definition=PromptAgentDefinition(
+            model=os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"],
+            instructions="You are a helpful assistant that answers general questions",
+        ),
+    )
+    print(f"Agent created (id: {agent.id}, name: {agent.name}, version: {agent.version})")
+
+    data_source_config = DataSourceConfigCustom(
+        type="custom",
+        item_schema={"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
+        include_sample_schema=True,
+    )
+    testing_criteria = [
+        {
+            "type": "azure_ai_evaluator",
+            "name": "violence_detection",
+            "evaluator_name": "builtin.violence",
+            "data_mapping": {"query": "{{item.query}}", "response": "{{item.response}}"},
+        }
+    ]
+    eval_object = openai_client.evals.create(
+        name="Agent Evaluation",
+        data_source_config=data_source_config,
+        testing_criteria=testing_criteria,  # type: ignore
+    )
+    print(f"Evaluation created (id: {eval_object.id}, name: {eval_object.name})")
+
+    data_source = {
+        "type": "azure_ai_target_completions",
+        "source": {
+            "type": "file_content",
+            "content": [
+                {"item": {"query": "What is the capital of France?"}},
+                {"item": {"query": "How do I reverse a string in Python?"}},
+            ],
+        },
+        "input_messages": {
+            "type": "template",
+            "template": [
+                {"type": "message", "role": "user", "content": {"type": "input_text", "text": "{{item.query}}"}}
+            ],
+        },
+        "target": {
+            "type": "azure_ai_agent",
+            "name": agent.name,
+            "version": agent.version,  # Version is optional. Defaults to latest version if not specified
+        },
+    }
+
+    agent_eval_run: Union[RunCreateResponse, RunRetrieveResponse] = openai_client.evals.runs.create(
+        eval_id=eval_object.id, name=f"Evaluation Run for Agent {agent.name}", data_source=data_source  # type: ignore
+    )
+    print(f"Evaluation run created (id: {agent_eval_run.id})")
+```
+
+<!-- END SNIPPET -->
+
 ### Deployments operations
 
 The code below shows some Deployments operations, which allow you to enumerate the AI models deployed to your AI Foundry Projects. These models can be seen in the "Models + endpoints" tab in your AI Foundry Project. Full samples can be found under the "deployment" folder in the [package samples][samples].
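Editorial aside (not part of this commit): the new README snippet ends right after the evaluation run is created. A hedged follow-up sketch for waiting on the run and reading per-row results, assuming the OpenAI client's `evals.runs.retrieve` and `evals.runs.output_items.list` methods and the usual `queued`/`in_progress` status values:

```python
import time

# Poll the run created by the snippet above until it leaves the queued/in_progress states.
# The status strings and result shapes follow the OpenAI evals API and are assumptions here.
run = openai_client.evals.runs.retrieve(agent_eval_run.id, eval_id=eval_object.id)
while run.status in ("queued", "in_progress"):
    time.sleep(5)
    run = openai_client.evals.runs.retrieve(agent_eval_run.id, eval_id=eval_object.id)

print(f"Evaluation run finished with status: {run.status}")

# Each output item corresponds to one row of the inline dataset passed in data_source.
for item in openai_client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id):
    print(item.status, item.results)
```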

sdk/ai/azure-ai-projects/samples/evaluations/agentic_evaluators/sample_coherence.py

Lines changed: 6 additions & 6 deletions
@@ -7,7 +7,7 @@
 """
 DESCRIPTION:
     Given an AIProjectClient, this sample demonstrates how to use the synchronous
-    `openai.evals.*` methods to create, get and list eval group and and eval runs
+    `openai.evals.*` methods to create, get and list evaluation and and eval runs
     using inline dataset content.
 
 USAGE:
@@ -75,15 +75,15 @@ def main() -> None:
         }
     ]
 
-    print("Creating Eval Group")
+    print("Creating Evaluation")
     eval_object = client.evals.create(
         name="Test Coherence Evaluator with inline data",
-        data_source_config=data_source_config,
-        testing_criteria=testing_criteria,  # type: ignore
+        data_source_config=data_source_config,
+        testing_criteria=testing_criteria,  # type: ignore
     )
-    print(f"Eval Group created")
+    print(f"Evaluation created")
 
-    print("Get Eval Group by Id")
+    print("Get Evaluation by Id")
     eval_object_response = client.evals.retrieve(eval_object.id)
     print("Eval Run Response:")
     pprint(eval_object_response)
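For context (also not part of the diff): the hunk above shows only the tail of sample_coherence.py, after `testing_criteria` has been built. A hypothetical sketch of what that setup likely resembles, modeled on the `builtin.violence` example in the README snippet; the evaluator name `builtin.coherence` and the query/response mapping are assumptions, not taken from this commit:

```python
# Hypothetical setup for the coherence sample; names are assumptions (see note above).
data_source_config = {
    "type": "custom",
    "item_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}, "response": {"type": "string"}},
        "required": ["query", "response"],
    },
    "include_sample_schema": True,
}

testing_criteria = [
    {
        "type": "azure_ai_evaluator",
        "name": "coherence_check",
        "evaluator_name": "builtin.coherence",
        "data_mapping": {"query": "{{item.query}}", "response": "{{item.response}}"},
    }
]
```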

sdk/ai/azure-ai-projects/samples/evaluations/agentic_evaluators/sample_fluency.py

Lines changed: 4 additions & 4 deletions
@@ -7,7 +7,7 @@
 """
 DESCRIPTION:
     Given an AIProjectClient, this sample demonstrates how to use the synchronous
-    `openai.evals.*` methods to create, get and list eval group and and eval runs
+    `openai.evals.*` methods to create, get and list evaluation and and eval runs
     using inline dataset content.
 
 USAGE:
@@ -74,15 +74,15 @@ def main() -> None:
         }
     ]
 
-    print("Creating Eval Group")
+    print("Creating Evaluation")
     eval_object = client.evals.create(
         name="Test Fluency Evaluator with inline data",
         data_source_config=data_source_config,
         testing_criteria=testing_criteria,  # type: ignore
     )
-    print(f"Eval Group created")
+    print(f"Evaluation created")
 
-    print("Get Eval Group by Id")
+    print("Get Evaluation by Id")
     eval_object_response = client.evals.retrieve(eval_object.id)
     print("Eval Run Response:")
     pprint(eval_object_response)

sdk/ai/azure-ai-projects/samples/evaluations/agentic_evaluators/sample_generic_agentic_evaluator/agent_utils.py

Lines changed: 3 additions & 3 deletions
@@ -50,15 +50,15 @@ def run_evaluator(
         }
     ]
 
-    print("Creating Eval Group")
+    print("Creating Evaluation")
     eval_object = client.evals.create(
         name=f"Test {evaluator_name} Evaluator with inline data",
         data_source_config=data_source_config,
         testing_criteria=testing_criteria,  # type: ignore
     )
-    print(f"Eval Group created")
+    print(f"Evaluation created")
 
-    print("Get Eval Group by Id")
+    print("Get Evaluation by Id")
     eval_object_response = client.evals.retrieve(eval_object.id)
     print("Eval Run Response:")
     pprint(eval_object_response)

sdk/ai/azure-ai-projects/samples/evaluations/agentic_evaluators/sample_generic_agentic_evaluator/sample_generic_agentic_evaluator.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 """
 DESCRIPTION:
     Given an AIProjectClient, this sample demonstrates how to use the synchronous
-    `openai.evals.*` methods to create, get and list eval group and and eval runs
+    `openai.evals.*` methods to create, get and list evaluation and and eval runs
     for Any agentic evaluator using inline dataset content.
 
 USAGE:

sdk/ai/azure-ai-projects/samples/evaluations/agentic_evaluators/sample_groundedness.py

Lines changed: 4 additions & 4 deletions
@@ -7,7 +7,7 @@
 """
 DESCRIPTION:
     Given an AIProjectClient, this sample demonstrates how to use the synchronous
-    `openai.evals.*` methods to create, get and list eval group and and eval runs
+    `openai.evals.*` methods to create, get and list evaluation and and eval runs
     for Groundedness evaluator using inline dataset content.
 
 USAGE:
@@ -90,15 +90,15 @@ def main() -> None:
         }
     ]
 
-    print("Creating Eval Group")
+    print("Creating Evaluation")
     eval_object = client.evals.create(
         name="Test Groundedness Evaluator with inline data",
         data_source_config=data_source_config,
         testing_criteria=testing_criteria,  # type: ignore
     )
-    print(f"Eval Group created")
+    print(f"Evaluation created")
 
-    print("Get Eval Group by Id")
+    print("Get Evaluation by Id")
     eval_object_response = client.evals.retrieve(eval_object.id)
     print("Eval Run Response:")
     pprint(eval_object_response)

sdk/ai/azure-ai-projects/samples/evaluations/agentic_evaluators/sample_intent_resolution.py

Lines changed: 4 additions & 4 deletions
@@ -7,7 +7,7 @@
 """
 DESCRIPTION:
     Given an AIProjectClient, this sample demonstrates how to use the synchronous
-    `openai.evals.*` methods to create, get and list eval group and and eval runs
+    `openai.evals.*` methods to create, get and list evaluation and and eval runs
     for Intent Resolution evaluator using inline dataset content.
 
 USAGE:
@@ -86,15 +86,15 @@ def main() -> None:
         }
     ]
 
-    print("Creating Eval Group")
+    print("Creating Evaluation")
     eval_object = client.evals.create(
         name="Test Intent Resolution Evaluator with inline data",
         data_source_config=data_source_config,
         testing_criteria=testing_criteria,  # type: ignore
     )
-    print(f"Eval Group created")
+    print(f"Evaluation created")
 
-    print("Get Eval Group by Id")
+    print("Get Evaluation by Id")
     eval_object_response = client.evals.retrieve(eval_object.id)
     print("Eval Run Response:")
     pprint(eval_object_response)

sdk/ai/azure-ai-projects/samples/evaluations/agentic_evaluators/sample_relevance.py

Lines changed: 4 additions & 4 deletions
@@ -7,7 +7,7 @@
 """
 DESCRIPTION:
     Given an AIProjectClient, this sample demonstrates how to use the synchronous
-    `openai.evals.*` methods to create, get and list eval group and and eval runs
+    `openai.evals.*` methods to create, get and list evaluation and and eval runs
     for Relevance evaluator using inline dataset content.
 
 USAGE:
@@ -79,15 +79,15 @@ def main() -> None:
         }
     ]
 
-    print("Creating Eval Group")
+    print("Creating Evaluation")
     eval_object = client.evals.create(
         name="Test Relevance Evaluator with inline data",
         data_source_config=data_source_config,
         testing_criteria=testing_criteria,  # type: ignore
     )
-    print(f"Eval Group created")
+    print(f"Evaluation created")
 
-    print("Get Eval Group by Id")
+    print("Get Evaluation by Id")
     eval_object_response = client.evals.retrieve(eval_object.id)
     print("Eval Run Response:")
     pprint(eval_object_response)

sdk/ai/azure-ai-projects/samples/evaluations/agentic_evaluators/sample_response_completeness.py

Lines changed: 4 additions & 4 deletions
@@ -7,7 +7,7 @@
 """
 DESCRIPTION:
     Given an AIProjectClient, this sample demonstrates how to use the synchronous
-    `openai.evals.*` methods to create, get and list eval group and and eval runs
+    `openai.evals.*` methods to create, get and list evaluation and and eval runs
     for Response Completeness evaluator using inline dataset content.
 
 USAGE:
@@ -77,15 +77,15 @@ def main() -> None:
         }
     ]
 
-    print("Creating Eval Group")
+    print("Creating Evaluation")
    eval_object = client.evals.create(
         name="Test Response Completeness Evaluator with inline data",
         data_source_config=data_source_config,
         testing_criteria=testing_criteria,  # type: ignore
     )
-    print(f"Eval Group created")
+    print(f"Evaluation created")
 
-    print("Get Eval Group by Id")
+    print("Get Evaluation by Id")
     eval_object_response = client.evals.retrieve(eval_object.id)
     print("Eval Run Response:")
     pprint(eval_object_response)

sdk/ai/azure-ai-projects/samples/evaluations/agentic_evaluators/sample_task_adherence.py

Lines changed: 4 additions & 4 deletions
@@ -7,7 +7,7 @@
 """
 DESCRIPTION:
     Given an AIProjectClient, this sample demonstrates how to use the synchronous
-    `openai.evals.*` methods to create, get and list eval group and and eval runs
+    `openai.evals.*` methods to create, get and list evaluation and and eval runs
     for Task Adherence evaluator using inline dataset content.
 
 USAGE:
@@ -87,15 +87,15 @@ def main() -> None:
         }
     ]
 
-    print("Creating Eval Group")
+    print("Creating Evaluation")
     eval_object = client.evals.create(
         name="Test Task Adherence Evaluator with inline data",
         data_source_config=data_source_config,
         testing_criteria=testing_criteria,  # type: ignore
     )
-    print(f"Eval Group created")
+    print(f"Evaluation created")
 
-    print("Get Eval Group by Id")
+    print("Get Evaluation by Id")
     eval_object_response = client.evals.retrieve(eval_object.id)
     print("Eval Run Response:")
     pprint(eval_object_response)
