
Commit 1e56474

Update model-benchmarks.md
1 parent b042005

File tree: 1 file changed (+4 −4 lines)

articles/ai-foundry/concepts/model-benchmarks.md

Lines changed: 4 additions & 4 deletions
@@ -69,10 +69,10 @@ To guide the selection of safety benchmarks for evaluation, we apply a structure
 | Dataset Name | Leaderboard Scenario | Metric | Interpretation |
 |--------------------|----------------------|----------------------|----------------------|
 | HarmBench (standard) | Standard harmful behaviors | Attack Success Rate | Lower values means better robustness against attacks designed to illicit standard harmful content |
-| HarmBench (contextual) | Contextually harmful behaviors | Attack Success Rate | Lower | Lower values means better robustness against attacks designed to illicit contextually harmful content |
-| HarmBench (copyright violations) | Copyright violations | Attack Success Rate | Lower | Lower values means better robustness against attacks designed to illicit copyright violations|
-| WMDP | Knowledge in sensitive domains | Accuracy | Higher | Higher values denotes more knowledge in sensitive domains (cybersecurity, biosecurity, and chemical security) |
-| Toxigen | Ability to detect toxic content | Accuracy | Higher | Higher values means better ability to detect toxic content |
+| HarmBench (contextual) | Contextually harmful behaviors | Attack Success Rate | Lower values means better robustness against attacks designed to illicit contextually harmful content |
+| HarmBench (copyright violations) | Copyright violations | Attack Success Rate | Lower values means better robustness against attacks designed to illicit copyright violations|
+| WMDP | Knowledge in sensitive domains | Accuracy | Higher values denotes more knowledge in sensitive domains (cybersecurity, biosecurity, and chemical security) |
+| Toxigen | Ability to detect toxic content | Accuracy | Higher values means better ability to detect toxic content |
 
 ### Model harmful behaviors
 The [HarmBench](https://github.com/centerforaisafety/HarmBench) benchmark measures model harmful behaviors and includes prompts to illicit harmful behavior from model. As it relates to safety, the benchmark covers 7 semantic categories of behavior:
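
For readers interpreting the metric columns in the table above: Attack Success Rate is generally computed as the fraction of adversarial prompts that successfully elicit the targeted behavior, which is why lower values indicate better robustness, while the Accuracy rows (WMDP, Toxigen) read in the usual higher-is-better direction. The Python sketch below illustrates that calculation under that assumption; the function name and sample data are hypothetical and not part of HarmBench or the leaderboard pipeline.

```python
# Hypothetical sketch of an Attack Success Rate (ASR) calculation; not the
# HarmBench implementation. ASR = successful attacks / total attack attempts,
# so lower values mean the model resisted more of the adversarial prompts.

def attack_success_rate(attack_succeeded: list[bool]) -> float:
    """Return the fraction of attack attempts judged successful."""
    if not attack_succeeded:
        raise ValueError("need at least one attack attempt")
    return sum(attack_succeeded) / len(attack_succeeded)

# Example: 3 of 20 adversarial prompts elicited the targeted behavior.
judgements = [True, True, True] + [False] * 17
print(f"ASR = {attack_success_rate(judgements):.2f}")  # ASR = 0.15
```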
