3 files changed: +8 -5 lines changed

@@ -82,10 +82,12 @@ This benchmark evaluates model performance on a balanced set of social media pos
 
 | Model | ROC AUC | Prec@R=0.80 | Prec@R=0.90 | Prec@R=0.95 | Recall@FPR=0.01 |
 |--------------|---------|-------------|-------------|-------------|-----------------|
-| gpt-4.1 | 0.989 | 0.976 | 0.962 | 0.962 | 0.717 |
-| gpt-4.1-mini (default) | 0.984 | 0.977 | 0.977 | 0.943 | 0.653 |
-| gpt-4.1-nano | 0.952 | 0.972 | 0.823 | 0.823 | 0.429 |
-| gpt-4o-mini | 0.965 | 0.977 | 0.955 | 0.945 | 0.842 |
+| gpt-5 | 0.9532 | 0.9195 | 0.9096 | 0.9068 | 0.0339 |
+| gpt-5-mini | 0.9629 | 0.9321 | 0.9168 | 0.9149 | 0.0998 |
+| gpt-5-nano | 0.9600 | 0.9297 | 0.9216 | 0.9175 | 0.1078 |
+| gpt-4.1 | 0.9603 | 0.9312 | 0.9249 | 0.9192 | 0.0439 |
+| gpt-4.1-mini (default) | 0.9520 | 0.9180 | 0.9130 | 0.9049 | 0.0459 |
+| gpt-4.1-nano | 0.9502 | 0.9262 | 0.9094 | 0.9043 | 0.0379 |
 
 **Notes:**
 
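
Reviewer note: the column names in the table above (Prec@R=0.80, Recall@FPR=0.01) are not defined in this hunk. The sketch below shows one common way such threshold-based metrics are computed from labels and scores. It is an illustration under stated assumptions, not the project's evaluation code: the function names `precision_at_recall` and `recall_at_fpr`, the helper `_sweep`, the sample data, and the tie-handling and "best over qualifying thresholds" conventions are all hypothetical choices made here.

```python
def _sweep(y_true, scores):
    """Yield cumulative (tp, fp) counts after each distinct score threshold,
    moving from the highest score to the lowest."""
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume tied scores together so a threshold never splits equal scores.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        yield tp, fp


def precision_at_recall(y_true, scores, target_recall):
    """Assumed definition: best precision among thresholds with recall >= target."""
    pos = sum(y_true)
    best = 0.0
    for tp, fp in _sweep(y_true, scores):
        if tp / pos >= target_recall:
            best = max(best, tp / (tp + fp))
    return best


def recall_at_fpr(y_true, scores, max_fpr):
    """Assumed definition: best recall among thresholds with FPR <= max_fpr."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    best = 0.0
    for tp, fp in _sweep(y_true, scores):
        if fp / neg <= max_fpr:
            best = max(best, tp / pos)
    return best


# Hypothetical toy data: 5 positives and 5 negatives (a balanced set).
y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

p80 = precision_at_recall(y_true, scores, 0.80)
r01 = recall_at_fpr(y_true, scores, 0.01)
```

Under these definitions, Recall@FPR=0.01 only credits positives ranked above almost every negative, so it reacts to a handful of high-scoring negatives far more sharply than ROC AUC does.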

   - "Streaming vs Blocking": streaming_output.md
   - Tripwires: tripwires.md
   - Checks:
-    - Prompt Injection Detection: ref/checks/prompt_injection_detection.md
     - Contains PII: ref/checks/pii.md
     - Custom Prompt Check: ref/checks/custom_prompt_check.md
     - Hallucination Detection: ref/checks/hallucination_detection.md
     - Jailbreak Detection: ref/checks/jailbreak.md
     - Moderation: ref/checks/moderation.md
+    - NSFW: ref/checks/nsfw.md
     - Off Topic Prompts: ref/checks/off_topic_prompts.md
+    - Prompt Injection Detection: ref/checks/prompt_injection_detection.md
     - URL Filter: ref/checks/urls.md
   - Evaluation Tool: evals.md
   - API Reference: