@eldarkurtic

I think the guard `if n > 0 else 0.0` might be wrong here, because precision and F1 are well-defined even for tests that don't have any "negative" samples: with no negatives there can be no false positives, so precision is 1.0 whenever anything is flagged.

Running the current tutorial code from https://github.com/AmenRa/GuardBench/blob/main/docs/llama_guard.md results in the following table:

```
2026-01-08 at 21:25:07 | GuardBench | INFO     | Results:
| Dataset                  | F1    | Recall   |
|--------------------------|-------|----------|
| AdvBench Behaviors       | 0.000 | 0.837    |
| AdvBench Strings         | 0.000 | 0.808    |
| BeaverTails 330k         | 0.685 | 0.545    |
| Bot-Adversarial Dialogue | 0.634 | 0.729    |
2026-01-08 at 21:25:07 | GuardBench | SUCCESS  | Done
```

Notice the 0.000 F1 for AdvBench Behaviors/Strings. This happens because these datasets contain only positive samples.
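
To illustrate, here is a minimal sketch (hypothetical names and counts, not GuardBench's actual implementation) of why gating F1 on the presence of negative samples zeroes out all-positive datasets:

```python
# Minimal sketch (illustrative names and counts, not GuardBench's code)
# of why guarding F1 behind "are there negative samples?" breaks
# all-positive datasets such as AdvBench Behaviors/Strings.

def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return (
        2 * precision * recall / (precision + recall)
        if (precision + recall) > 0
        else 0.0
    )

# Hypothetical counts chosen to reproduce the reported recall of 0.837:
# every sample is positive, so there are no negatives and fp == 0.
tp, fp, fn = 837, 0, 163
n_negatives = 0

# Buggy guard: F1 is forced to 0.0 whenever the dataset has no negatives.
buggy = f1_score(tp, fp, fn) if n_negatives > 0 else 0.0
print(buggy)  # 0.0

# Without the guard, F1 is well-defined: fp == 0 makes precision 1.0.
print(round(f1_score(tp, fp, fn), 3))  # 0.911
```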

After the fix in this PR, the table looks as expected and matches the one showcased in your tutorial at https://github.com/AmenRa/GuardBench/blob/main/docs/llama_guard.md:

```
2026-01-08 at 22:25:07 | GuardBench | INFO     | Results:
| Dataset                  | F1    | Recall   |
|--------------------------|-------|----------|
| AdvBench Behaviors       | 0.911 | 0.837    |
| AdvBench Strings         | 0.894 | 0.808    |
| BeaverTails 330k         | 0.685 | 0.545    |
| Bot-Adversarial Dialogue | 0.634 | 0.729    |
2026-01-08 at 21:25:07 | GuardBench | SUCCESS  | Done
```
