**articles/ai-foundry/concepts/architecture.md** (2 additions & 2 deletions)
```diff
@@ -28,7 +28,7 @@ At the top level, Azure AI Foundry provides access to the following resources:
 
 When you use Azure AI Foundry portal, you can directly work with Azure OpenAI without an Azure Studio project. Or you can use Azure OpenAI through a project.
 
-For more information, visit [Azure OpenAI in Azure AI Foundry portal](../azure-openai-in-ai-foundry.md).
+For more information, visit [Azure OpenAI in Azure AI Foundry portal](../azure-openai-in-azure-ai-foundry.md).
 
 - **Management center**: The management center streamlines governance and management of Azure AI Foundry resources such as hubs, projects, connected resources, and deployments.
 
```
```diff
@@ -108,7 +108,7 @@ Azure AI services including Azure OpenAI provide control plane endpoints for ope
 
 To reduce the complexity of Azure RBAC management, Azure AI Foundry provides a *control plane proxy* that allows you to perform operations on connected Azure AI services and Azure OpenAI resources. Performing operations on these resources through the control plane proxy only requires Azure RBAC permissions on the hub. The Azure AI Foundry service then performs the call to the Azure AI services or Azure OpenAI control plane endpoint on your behalf.
 
-For more information, see [Role-based access control in Azure AI Foundry portal](rbac-ai-foundry.md).
+For more information, see [Role-based access control in Azure AI Foundry portal](rbac-azure-ai-foundry.md).
```
**articles/ai-foundry/concepts/connections.md** (2 additions & 2 deletions)
```diff
@@ -54,7 +54,7 @@ When you create a connection with an existing Azure storage account, you can cho
 - **Identity-based**: Use your Microsoft Entra ID or managed identity to authenticate data access.
 
 > [!TIP]
-> When you use an identity-based connection, Azure role-based access control (Azure RBAC) determines who can access the connection. You must assign the correct Azure RBAC roles to your developers before they can use the connection. For more information, see [Scenario: Connections using Microsoft Entra ID](rbac-ai-foundry.md#scenario-connections-using-microsoft-entra-id-authentication).
+> When you use an identity-based connection, Azure role-based access control (Azure RBAC) determines who can access the connection. You must assign the correct Azure RBAC roles to your developers before they can use the connection. For more information, see [Scenario: Connections using Microsoft Entra ID](rbac-azure-ai-foundry.md#scenario-connections-using-microsoft-entra-id-authentication).
 
 
 The following table shows the supported Azure cloud-based storage services and authentication methods:
```
```diff
@@ -84,7 +84,7 @@ A Uniform Resource Identifier (URI) represents a storage location on your local
 
 Connections allow you to securely store credentials, authenticate access, and consume data and information. Secrets associated with connections are securely persisted in the corresponding Azure Key Vault, adhering to robust security and compliance standards. As an administrator, you can audit both shared and project-scoped connections on a hub level.
 
-Azure connections serve as key vault proxies, and interactions with connections are direct interactions with an Azure key vault. Azure AI Foundry connections store API keys securely, as secrets, in a key vault. The key vault [Azure role-based access control (Azure RBAC)](./rbac-ai-foundry.md) controls access to these connection resources. A connection references the credentials from the key vault storage location for further use. You won't need to directly deal with the credentials after they're stored in the hub's key vault. You have the option to store the credentials in the YAML file. A CLI command or SDK can override them. We recommend that you avoid credential storage in a YAML file, because a security breach could lead to a credential leak.
+Azure connections serve as key vault proxies, and interactions with connections are direct interactions with an Azure key vault. Azure AI Foundry connections store API keys securely, as secrets, in a key vault. The key vault [Azure role-based access control (Azure RBAC)](./rbac-azure-ai-foundry.md) controls access to these connection resources. A connection references the credentials from the key vault storage location for further use. You won't need to directly deal with the credentials after they're stored in the hub's key vault. You have the option to store the credentials in the YAML file. A CLI command or SDK can override them. We recommend that you avoid credential storage in a YAML file, because a security breach could lead to a credential leak.
```
**articles/ai-foundry/concepts/evaluation-metrics-built-in.md** (108 additions & 0 deletions)
The single hunk (`@@ -293,6 +293,114 @@`) appends the following four sections to the generation quality metrics discussion, immediately after the generation quality metric workflow diagram:
### AI-assisted: Intent Resolution

| Score characteristics | Score details |
| ----- | --- |
| Score range | 1 to 5, where 1 is the lowest quality and 5 is the highest quality. |
| What is this metric? | Intent Resolution measures how well an agent identifies a user's request, including how well it scopes the user's intent, asks clarifying questions, and reminds end users of its scope of capabilities. |
| How does it work? | The metric is calculated by instructing a language model to follow the definition and a set of grading rubrics, evaluate the user inputs, and output a score on a 5-point scale (higher means better quality). See the following definition and grading rubrics. |
| When to use it? | The recommended scenario is evaluating an agent's ability to identify user intents from agent interactions. |
| What does it need as input? | Query, Response, Tool Definitions (optional) |

The large language model judge uses the following definition and grading rubrics to score this metric:

**Definition:**

Intent Resolution assesses the quality of the response in relation to a query from a user, specifically focusing on the agent's ability to understand and resolve the user intent expressed in the query. There's also a field for tool definitions that describes the functions, if any, that are accessible to the agent and that the agent might invoke in the response if necessary.

**Ratings:**

| Intent Resolution | Definition |
| --- | --- |
| Intent Resolution 1: Response completely unrelated to user intent. | The agent's response doesn't address the query at all. |
| Intent Resolution 2: Response minimally relates to user intent. | The response shows a token attempt to address the query by mentioning a relevant keyword or concept, but it provides almost no useful or actionable information. |
| Intent Resolution 3: Response partially addresses the user intent but lacks complete details. | The response provides a basic idea related to the query by mentioning a few relevant elements, but it omits several key details and specifics needed to fully resolve the user's query. |
| Intent Resolution 4: Response addresses the user intent with moderate accuracy but has minor inaccuracies or omissions. | The response offers a moderately detailed answer that includes several specific elements relevant to the query, yet it still lacks some finer details or complete information. |
| Intent Resolution 5: Response directly addresses the user intent and fully resolves it. | The response provides a complete, detailed, and accurate answer that fully resolves the user's query with all necessary information and precision. |
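All four of these AI-assisted metrics follow the same judge pattern: render the definition and rubric into a prompt, send it to a grader model together with the inputs, and parse an integer score back out. The following sketch illustrates that loop for Intent Resolution; the condensed rubric wording and the `call_model` callable are illustrative assumptions, not the exact prompt or API that Azure AI Foundry uses.

```python
import re
from typing import Callable

# Condensed stand-in for the full Intent Resolution rubric above.
RUBRIC = (
    "Rate how well the agent's response resolves the user's intent, from 1 "
    "(completely unrelated to user intent) to 5 (directly addresses and fully "
    "resolves it). Reply with a single integer."
)

def score_intent_resolution(query: str, response: str,
                            call_model: Callable[[str], str]) -> int:
    """Ask a judge model for a 1-5 Intent Resolution score and parse it."""
    prompt = f"{RUBRIC}\n\nUser query:\n{query}\n\nAgent response:\n{response}\n\nScore:"
    raw = call_model(prompt)
    match = re.search(r"[1-5]", raw)  # tolerate extra text around the digit
    if match is None:
        raise ValueError(f"judge returned no 1-5 score: {raw!r}")
    return int(match.group())

# Stubbed judge for demonstration; in practice call_model would invoke a deployed model.
print(score_intent_resolution("How do I reset my password?",
                              "Select 'Forgot password' on the sign-in page.",
                              lambda prompt: "5"))
```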
### AI-assisted: Tool Call Accuracy

| Score characteristics | Score details |
| ----- | --- |
| Score range | 1 to 5, where 1 is the lowest quality and 5 is the highest quality. |
| What is this metric? | Tool Call Accuracy measures an agent's ability to select appropriate tools and to extract and process the correct parameters from previous steps of the agentic workflow. It detects whether each tool call is accurate (a binary judgment) and reports the average score, which can be interpreted as a passing rate across all tool calls made. |
| How does it work? | The metric is calculated by instructing a language model to follow the definition and a set of grading rubrics, evaluate the user inputs, and output a score on a 5-point scale (higher means better quality). See the following definition and grading rubrics. |
| When to use it? | The recommended scenario is evaluating an agent's ability to select the right tools and parameters from agentic interactions. |
| What does it need as input? | Query, Response (or Tool Calls), Tool Definitions |

The large language model judge uses the following definition and grading rubrics to score this metric:

**Definition:**

Tool Call Accuracy returns the correctness of a single tool call, or the passing rate of correct tool calls when multiple calls are made. A correct tool call considers relevance and potential usefulness, including the syntactic and semantic correctness of a proposed tool call from an intelligent system. The judgment for each tool call is based on the following criteria, the user query, and the tool definitions available to the agent.

**Ratings:**

Criteria for an inaccurate tool call:

- The tool call isn't relevant and won't help resolve the user's need.
- The tool call includes parameter values that aren't present in, or inferred from, the previous interaction.
- The tool call uses parameters that aren't present in the tool definitions.

Criteria for an accurate tool call:

- The tool call is directly relevant and very likely to help resolve the user's need.
- The tool call includes parameter values that are present in, or inferred from, the previous interaction.
- The tool call uses only parameters that are present in the tool definitions.
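The aggregation described above (a binary judgment per call, with the mean reported as a passing rate) is easy to make concrete. In this sketch only the syntactic half of the check is coded directly: does the tool exist, and is every argument a defined parameter? Relevance and semantic correctness would still come from the judge model, and the dictionary shapes are illustrative assumptions.

```python
def call_is_valid(call: dict, tool_definitions: dict) -> bool:
    """Binary judgment: the tool exists and every argument is a defined parameter.
    (Relevance and semantic checks, per the rubric, would come from the judge model.)"""
    definition = tool_definitions.get(call["name"])
    return definition is not None and set(call["arguments"]) <= definition["parameters"]

def tool_call_accuracy(calls: list[dict], tool_definitions: dict) -> float:
    """Average of the binary judgments, that is, the passing rate across tool calls."""
    if not calls:
        raise ValueError("no tool calls to evaluate")
    return sum(call_is_valid(c, tool_definitions) for c in calls) / len(calls)

tools = {"get_weather": {"parameters": {"city", "unit"}}}
calls = [
    {"name": "get_weather", "arguments": {"city": "Paris"}},      # valid
    {"name": "get_weather", "arguments": {"zip_code": "75001"}},  # undefined parameter
]
print(tool_call_accuracy(calls, tools))  # 0.5
```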
### AI-assisted: Task Adherence

| Score characteristics | Score details |
| ----- | --- |
| Score range | 1 to 5, where 1 is the lowest quality and 5 is the highest quality. |
| What is this metric? | Task Adherence measures how well an agent's response adheres to its assigned task, according to the task instructions (extracted from the system message and the user query) and the available tools. |
| How does it work? | The metric is calculated by instructing a language model to follow the definition and a set of grading rubrics, evaluate the user inputs, and output a score on a 5-point scale (higher means better quality). See the following definition and grading rubrics. |
| When to use it? | The recommended scenario is evaluating an agent's ability to adhere to assigned tasks. |
| What does it need as input? | Query, Response, Tool Definitions (optional) |

The large language model judge uses the following definition and grading rubrics to score this metric:

**Definition:**

Task Adherence assesses the quality of the response in relation to a query from a user, specifically focusing on the agent's ability to understand and resolve the user intent expressed in the query. There's also a field for tool definitions that describes the functions, if any, that are accessible to the agent and that the agent might invoke in the response if necessary.

**Ratings:**

| Task Adherence | Definition |
| --- | --- |
| Task Adherence 1: Fully inadherent | The response completely ignores instructions or deviates significantly. |
| Task Adherence 2: Barely adherent | The response partially aligns with instructions but has critical gaps. |
| Task Adherence 3: Moderately adherent | The response meets the core requirements but lacks precision or clarity. |
| Task Adherence 4: Mostly adherent | The response is clear, accurate, and aligns with instructions with minor issues. |
| Task Adherence 5: Fully adherent | The response is flawless, accurate, and follows instructions to the letter. |
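Unlike the two previous metrics, Task Adherence grades the response against instructions that live partly in the system message, so the judge input has to be assembled from several fields. The following is a minimal sketch of that assembly step, with hypothetical field names and prompt layout; the exact format Azure AI Foundry sends to the judge isn't documented here.

```python
import json

def build_task_adherence_prompt(system_message: str, user_query: str, response: str,
                                tool_definitions: list[dict] | None = None) -> str:
    """Assemble the judge input: task instructions (system message plus user query),
    the agent's response, and the tools the agent could have used."""
    sections = [
        f"Task instructions:\n{system_message}",
        f"User query:\n{user_query}",
        f"Agent response:\n{response}",
    ]
    if tool_definitions:
        sections.append("Available tools:\n" + json.dumps(tool_definitions, indent=2))
    sections.append("Rate adherence from 1 (fully inadherent) to 5 (fully adherent). "
                    "Reply with a single integer.")
    return "\n\n".join(sections)

print(build_task_adherence_prompt(
    "You are a travel agent. Only quote prices in EUR.",
    "How much is a weekend in Lisbon?",
    "A weekend in Lisbon starts around 300 EUR.",
))
```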
### AI-assisted: Response Completeness

| Score characteristics | Score details |
| ----- | --- |
| Score range | 1 to 5, where 1 is the lowest quality and 5 is the highest quality. |
| What is this metric? | Response Completeness measures how comprehensive an agent's response is when compared with the ground truth provided. |
| How does it work? | The metric is calculated by instructing a language model to follow the definition and a set of grading rubrics, evaluate the user inputs, and output a score on a 5-point scale (higher means better quality). See the following definition and grading rubrics. |
| When to use it? | The recommended scenario is evaluating whether an agent's final response is comprehensive with respect to the ground truth provided. |
| What does it need as input? | Response, Ground Truth |

The large language model judge uses the following definition and grading rubrics to score this metric:

**Definition:**

Response Completeness refers to how accurately and thoroughly a response represents the information provided in the ground truth. It considers both the inclusion of all relevant statements and the correctness of those statements. Each statement in the ground truth should be evaluated individually to determine whether it's accurately reflected in the response.

**Ratings:**

| Response Completeness | Definition |
| --- | --- |
| Response Completeness 1: Fully incomplete | The response is considered fully incomplete if it doesn't contain any of the necessary and relevant information with respect to the ground truth. In other words, it completely misses all the information, especially claims and statements, established in the ground truth. |
| Response Completeness 2: Barely complete | The response is considered barely complete if it contains only a small percentage of the necessary and relevant information with respect to the ground truth. In other words, it misses almost all the information, especially claims and statements, established in the ground truth. |
| Response Completeness 3: Moderately complete | The response is considered moderately complete if it contains half of the necessary and relevant information with respect to the ground truth. In other words, it misses half of the information, especially claims and statements, established in the ground truth. |
| Response Completeness 4: Mostly complete | The response is considered mostly complete if it contains most of the necessary and relevant information with respect to the ground truth. In other words, it misses some minor information, especially claims and statements, established in the ground truth. |
| Response Completeness 5: Fully complete | The response is considered fully complete if it perfectly contains all the necessary and relevant information with respect to the ground truth. In other words, it doesn't miss any information from statements and claims in the ground truth. |
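Because the definition evaluates each ground-truth statement individually and then judges overall coverage, the score is essentially a recall computation mapped onto the 1-to-5 scale. The sketch below makes that mapping explicit; the substring check is a deliberately crude stand-in for the judge model's semantic comparison of each statement.

```python
def response_completeness(response: str, ground_truth_statements: list[str]) -> int:
    """Fraction of ground-truth statements reflected in the response, mapped to 1-5."""
    if not ground_truth_statements:
        raise ValueError("ground truth must contain at least one statement")
    # Crude stand-in for the judge's semantic matching of each statement.
    covered = sum(s.lower() in response.lower() for s in ground_truth_statements)
    fraction = covered / len(ground_truth_statements)
    return 1 + round(fraction * 4)  # 0.0 -> 1 (fully incomplete), 1.0 -> 5 (fully complete)

print(response_completeness(
    "Your order shipped on May 2 and should arrive on May 6.",
    ["shipped on may 2", "arrive on may 6", "a tracking number was emailed"]))  # 4
```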
**articles/ai-foundry/concepts/management-center.md** (1 addition & 1 deletion)
```diff
@@ -40,7 +40,7 @@ Assign roles, manage users, and ensure that all settings comply with organizatio
 
 :::image type="content" source="../media/management-center/user-management.png" alt-text="Screenshot of the user management section of the management center." lightbox="../media/management-center/user-management.png":::
 
-For more information, see [Role-based access control](rbac-ai-foundry.md#assigning-roles-in-azure-ai-foundry-portal).
+For more information, see [Role-based access control](rbac-azure-ai-foundry.md#assigning-roles-in-azure-ai-foundry-portal).
```