
Commit e86728c

typo fix evaluation-metrics-built-in.md
1 parent dfec3b3 commit e86728c

File tree: 1 file changed (+1 −1 lines changed)


articles/ai-studio/concepts/evaluation-metrics-built-in.md

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ We support the following AI-Assisted metrics for the above task types:
 
 ## Risk and safety metrics
 
-The risk and safety metrics draw on insights gained from our previous Large Language Model projects such as GitHub Copilot and Bing. This ensures a comprehensive approach to evaluating generated responses for risk and safety severity scores. These metrics are generated through our safety evaluation service, which employs a set of LLMs. Each model is tasked with assessing specific risks that could be present in the response (for example, sexual content, violent content, etc.). These models are provided with risk definitions and severity scales, and they annotate generated conversations accordingly. Currently, we calculate a “defect rate” for the risk and safety metrics below. For each of these metrics, the service measures whether these types of content were detected and at what severity level. Each of the four types has three severity levels (Very low, Low, Medium, High). Users specify a threshold of tolerance, and the defect rates are produced by our service correspond to the number of instances that were generated at and above each threshold level.
+The risk and safety metrics draw on insights gained from our previous Large Language Model projects such as GitHub Copilot and Bing. This ensures a comprehensive approach to evaluating generated responses for risk and safety severity scores. These metrics are generated through our safety evaluation service, which employs a set of LLMs. Each model is tasked with assessing specific risks that could be present in the response (for example, sexual content, violent content, etc.). These models are provided with risk definitions and severity scales, and they annotate generated conversations accordingly. Currently, we calculate a “defect rate” for the risk and safety metrics below. For each of these metrics, the service measures whether these types of content were detected and at what severity level. Each of the four types has four severity levels (Very low, Low, Medium, High). Users specify a threshold of tolerance, and the defect rates are produced by our service correspond to the number of instances that were generated at and above each threshold level.
 
 Types of content:

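The paragraph changed in this commit describes how defect rates are derived: each response gets a severity annotation for a risk category, the user picks a tolerance threshold, and the defect rate counts instances at or above that threshold. As an illustration only (not the actual safety evaluation service), a minimal sketch of that idea might look like the following; the `defect_rate` function, its `threshold` parameter, and the severity ordering are assumptions made for the example.

```python
# Illustrative sketch only: not the safety evaluation service described in the doc,
# just the defect-rate idea from the changed paragraph.

# The four severity levels named in the doc, in increasing order of severity.
SEVERITY_ORDER = ["Very low", "Low", "Medium", "High"]
SEVERITY_RANK = {level: rank for rank, level in enumerate(SEVERITY_ORDER)}

def defect_rate(annotations: list[str], threshold: str) -> float:
    """Fraction of annotated instances at or above the tolerance threshold.

    `annotations` holds one severity label per generated response for a single
    risk category (for example, violent content); `threshold` is the user's
    tolerance level. Both names are hypothetical, chosen for this example.
    """
    if not annotations:
        return 0.0
    cutoff = SEVERITY_RANK[threshold]
    defects = sum(1 for label in annotations if SEVERITY_RANK[label] >= cutoff)
    return defects / len(annotations)

# Example: three of five responses were annotated Medium or higher,
# so with a "Medium" tolerance threshold the defect rate is 0.6.
labels = ["Very low", "Low", "Medium", "High", "Medium"]
print(defect_rate(labels, threshold="Medium"))  # 0.6
```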