 Foundry project or Hubs based project. To learn more, see [Create a project](create-projects.md).

 Two GitHub Actions are available for evaluating AI applications: **ai-agent-evals** and **genai-evals**.

@@ -45,8 +45,16 @@ The input of ai-agent-evals includes:

 **Required:**

-- `azure-aiproject-connection-string`: The connection string for the Azure AI project. This is used to connect to Azure OpenAI to simulate conversations with each agent, and to connect to the Azure AI evaluation SDK to perform the evaluation.
-- `deployment-name`: the deployed model name.
+# [Foundry project](#tab/foundry-project)
+
+- `azure-ai-project-endpoint`: The endpoint of the Azure AI project. This is used to connect to your AI project to simulate conversations with each agent, and to connect to the Azure AI evaluation SDK to perform the evaluation.
+
+# [Hub based project](#tab/hub-project)
+
+- `azure-aiproject-connection-string`: The connection string of the Azure AI project. This is used to connect to your AI project to simulate conversations with each agent, and to connect to the Azure AI evaluation SDK to perform the evaluation.
+
+---
+- `deployment-name`: the deployed model name for evaluation judgement.
 - `data-path`: Path to the input data file containing the conversation starters. Each conversation starter is sent to each agent for a pairwise comparison of evaluation results.
 - `evaluators`: built-in evaluator names.
 - `data`: a set of conversation starters/queries.
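
The required inputs listed in this hunk point `data-path` at a file that holds the built-in evaluator names and the conversation starters. A minimal Python sketch of generating such a file follows, assuming the file is JSON with `evaluators` and `data` keys as listed. The `name` field, the `query` key inside each entry, and the helper names `build_eval_input` and `write_eval_input` are hypothetical and do not come from the diff.

```python
import json
from pathlib import Path


def build_eval_input(queries, evaluators, name="sample-eval-data"):
    """Assemble an input document from evaluator names and conversation starters.

    Assumptions (not confirmed by the diff): the top-level "name" field and
    the per-entry "query" key. Only "evaluators" and "data" are named there.
    """
    if not queries:
        raise ValueError("at least one conversation starter is required")
    return {
        "name": name,  # hypothetical label for the data set
        "evaluators": list(evaluators),  # built-in evaluator names
        "data": [{"query": q} for q in queries],  # one entry per starter
    }


def write_eval_input(path, payload):
    """Write the payload as JSON, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path
```

A setup step could call `write_eval_input("sample_data/dataset.json", build_eval_input(starters, evaluator_names))` and pass the resulting path as `data-path`. The evaluator names must be ones the action actually supports, and the file layout should be checked against the action's own documentation before relying on this sketch.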
@@ -55,6 +63,7 @@ The input of ai-agent-evals includes:
 - When only one `agent-id` is specified, the evaluation results include the absolute values for each metric along with the corresponding confidence intervals.
 - When multiple `agent-ids` are specified, the results include absolute values for each agent and a statistical comparison against the designated baseline agent ID.

+
 **Optional:**

 - `api-version`: the API version of deployed model.
@@ -92,6 +101,48 @@ To use the GitHub Action, add the GitHub Action to your CI/CD workflows and spec

 This example illustrates how Azure Agent AI Evaluation can be run when comparing different agents with agent IDs.