Foundry project or hub-based project. To learn more, see [Create a project](create-projects.md).
Two GitHub Actions are available for evaluating AI applications: **ai-agent-evals** and **genai-evals**.
### AI agent evaluations input
The inputs of ai-agent-evals include the following. A sample workflow step that passes the required inputs appears after the list.

**Required:**

# [Foundry project](#tab/foundry-project)
- `azure-ai-project-endpoint`: The endpoint of the Azure AI project. This is used to connect to Azure OpenAI to simulate conversations with each agent, and to connect to the Azure AI evaluation SDK to perform the evaluation.

# [Hub based project](#tab/hub-project)
- `azure-aiproject-connection-string`: The connection string of the Azure AI project. This is used to connect to Azure OpenAI to simulate conversations with each agent, and to connect to the Azure AI evaluation SDK to perform the evaluation.

---
- `deployment-name`: The name of the deployed model used for evaluation judgment.
- `data-path`: Path to the input data file containing the conversation starters. Each conversation starter is sent to each agent for a pairwise comparison of evaluation results.
- `evaluators`: The built-in evaluator names to use.
- `data`: A set of conversation starters/queries.
- `agent-ids`: The IDs of the agents to evaluate.
  - When only one `agent-id` is specified, the evaluation results include the absolute values for each metric along with the corresponding confidence intervals.
  - When multiple `agent-ids` are specified, the results include absolute values for each agent and a statistical comparison against the designated baseline agent ID.

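For illustration only, here's roughly how these required inputs might be passed to the action in a workflow step for a hub-based project. The action reference (`microsoft/ai-agent-evals@v1-beta`) and all of the values (secret name, model deployment, data file path, evaluator name, agent ID) are placeholder assumptions, not prescribed names:

```yaml
# Sketch only: passing the required inputs for a hub-based project.
# Every value below is a placeholder; substitute your own secrets,
# model deployment, data file, evaluators, and agent IDs.
- name: Evaluate AI agents
  uses: microsoft/ai-agent-evals@v1-beta   # assumed action reference
  with:
    azure-aiproject-connection-string: ${{ secrets.AZURE_AI_PROJECT_CONNECTION_STRING }}
    deployment-name: gpt-4o                # model used for evaluation judgment
    data-path: evaluations/starters.json   # conversation starters file
    evaluators: IntentResolution           # illustrative built-in evaluator name
    agent-ids: asst_abc123                 # agent(s) to evaluate
```

The Foundry project variant is the same sketch except that it passes `azure-ai-project-endpoint` instead of the connection string; a fuller workflow example appears later in this article.
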
**Optional:**
- `api-version`: The API version of the deployed model.

To use the GitHub Action, add it to your CI/CD workflows and specify the trigger criteria (for example, on commit) and the file paths that trigger your automated workflows.

# [Foundry project](#tab/foundry-project)
Specify the `v2-beta` version of the action.

# [Hub based project](#tab/hub-project)
Specify the `v1-beta` version of the action.

---
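As a rough sketch of how these pieces fit together for a Foundry project, the workflow below assumes the action is referenced as `microsoft/ai-agent-evals@v2-beta`; the trigger paths, Azure sign-in step, secret names, and input values are illustrative assumptions to adapt to your repository:

```yaml
# Sketch only: run the evaluation when agent code or evaluation data changes.
# Action reference, paths, secrets, and input values are illustrative.
name: AI agent evaluation

on:
  push:
    branches: [main]
    paths:
      - "agents/**"
      - "evaluations/**"

permissions:
  id-token: write   # needed if you sign in to Azure with OIDC
  contents: read

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Sign in so the action can reach the Azure AI project.
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      # Foundry project: use the v2-beta tag and the project endpoint input.
      - uses: microsoft/ai-agent-evals@v2-beta
        with:
          azure-ai-project-endpoint: ${{ secrets.AZURE_AI_PROJECT_ENDPOINT }}
          deployment-name: gpt-4o                 # model used for evaluation judgment
          data-path: evaluations/starters.json    # conversation starters file
          evaluators: IntentResolution            # illustrative evaluator name
          agent-ids: asst_abc123                  # agent(s) to evaluate
```

Scoping the `paths` filter to the files that affect your agents keeps the evaluation from running on unrelated commits, which also supports the cost guidance in the tip that follows.
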
> [!TIP]
> To minimize costs, you should avoid running evaluation on every commit.