The risk and safety metrics draw on insights gained from our previous Large Language Model projects such as GitHub Copilot and Bing, ensuring a comprehensive approach to evaluating generated responses for risk and safety severity scores. These metrics are generated through our safety evaluation service, which employs a set of LLMs. Each model is tasked with assessing a specific risk that could be present in the response (for example, sexual content or violent content). These models are provided with risk definitions and severity scales, and they annotate generated conversations accordingly. Currently, we calculate a "defect rate" for the risk and safety metrics below. For each of these metrics, the service measures whether these types of content were detected and at what severity level. Each of the four types has four severity levels (Very low, Low, Medium, High). Users specify a threshold of tolerance, and the defect rates produced by our service correspond to the number of instances generated at or above each threshold level.
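To make the threshold logic concrete, here is a minimal sketch (not the service's actual implementation) of how a defect rate could be derived from per-response severity annotations and a user-specified tolerance threshold; the `SEVERITY_ORDER` mapping and `defect_rate` helper are illustrative names, not part of the service API.

```python
# Hypothetical sketch: deriving a defect rate from severity annotations.
# Severity labels follow the scale described above; everything else is illustrative.
SEVERITY_ORDER = {"Very low": 0, "Low": 1, "Medium": 2, "High": 3}

def defect_rate(severities: list[str], threshold: str) -> float:
    """Fraction of responses annotated at or above the threshold severity."""
    if not severities:
        return 0.0
    cutoff = SEVERITY_ORDER[threshold]
    defects = sum(1 for s in severities if SEVERITY_ORDER[s] >= cutoff)
    return defects / len(severities)

# Example: with a "Medium" tolerance, two of these four responses count as defects.
print(defect_rate(["Very low", "Medium", "High", "Low"], threshold="Medium"))  # 0.5
```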