Commit 9da0de8

updated quality index definition
1 parent ba19f31 commit 9da0de8

1 file changed, +2 -9 lines changed

articles/ai-foundry/concepts/model-benchmarks.md

Lines changed: 2 additions & 9 deletions
@@ -37,6 +37,7 @@ Model benchmarks assess LLMs and SLMs across the following categories: quality,
 
 Azure AI assesses the quality of LLMs and SLMs across various metrics that are grouped into two main categories: accuracy, and prompt-assisted metrics:
 
+
 For accuracy metric:
 
 | Metric | Description |
@@ -53,15 +54,7 @@ For prompt-assisted metrics:
 | Groundedness | Groundedness measures how well the language model's generated answers align with information from the input source. |
 | Relevance | Relevance measures the extent to which the language model's generated responses are pertinent and directly related to the given questions. |
 
-Azure AI also displays the quality index as follows:
-
-| Index | Description |
-|-------|-------------|
-| Quality index | Quality index is calculated by scaling down GPTSimilarity between zero and one, followed by averaging with accuracy metrics. Higher values of quality index are better. |
-
-The quality index represents the average score of the applicable primary metric (accuracy, rescaled GPTSimilarity) over 15 standard datasets and is provided on a scale of zero to one.
-
-Quality index constitutes two categories of metrics:
+Quality includes two categories of metrics:
 
 - Accuracy (for example, exact match or `pass@k`). Ranges from zero to one.
 - Prompt-based metrics (for example, GPTSimilarity, groundedness, coherence, fluency, and relevance). Ranges from one to five.
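
For context, the wording removed in this commit described the quality index as an average of accuracy scores and GPTSimilarity scores rescaled to between zero and one. The following is only a rough sketch of that calculation. It assumes a linear rescaling of the one-to-five GPTSimilarity range (the removed text does not give the exact formula), and the function names and the `(kind, score)` input shape are hypothetical, not part of any Azure AI API.

```python
from statistics import mean


def rescale_gpt_similarity(score: float) -> float:
    """Map a GPTSimilarity score from [1, 5] onto [0, 1].

    Assumption: a simple linear (min-max) rescaling; the source text only
    says the score is "scaled down" to between zero and one.
    """
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"GPTSimilarity must be in [1, 5], got {score}")
    return (score - 1.0) / 4.0


def quality_index(primary_scores: list[tuple[str, float]]) -> float:
    """Average the primary metric of each dataset on a common 0-1 scale.

    `primary_scores` holds one (kind, score) pair per dataset, where kind is
    either "accuracy" (already 0-1) or "gpt_similarity" (1-5).
    """
    if not primary_scores:
        raise ValueError("at least one dataset score is required")

    normalized = []
    for kind, score in primary_scores:
        if kind == "accuracy":
            if not 0.0 <= score <= 1.0:
                raise ValueError(f"accuracy must be in [0, 1], got {score}")
            normalized.append(score)
        elif kind == "gpt_similarity":
            normalized.append(rescale_gpt_similarity(score))
        else:
            raise ValueError(f"unknown metric kind: {kind!r}")
    return mean(normalized)


# Illustrative values only; these are not real benchmark results.
example = [("accuracy", 0.8), ("gpt_similarity", 4.0), ("accuracy", 0.6)]
print(quality_index(example))
```

Putting both metric kinds on the same zero-to-one scale before averaging keeps the one-to-five prompt-based scores from dominating the result.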
