feat(detectors): Introduce custom detectors with RegexDetector and FunctionDetector #1484
base: main
Changes from 2 commits
6943721
6a4fdd6
76a4659
28ce4cc
484fbc3
@@ -0,0 +1,8 @@

```rst
garak.detectors.custom
======================

.. automodule:: garak.detectors.custom
   :members:
   :undoc-members:
   :show-inheritance:
```
|
@@ -0,0 +1,313 @@

```python
"""Custom detectors for flexible success criteria.

This module provides customizable detectors that allow users to define
their own success criteria without writing detector classes:

1. RegexDetector - Match regex patterns in outputs
2. FunctionDetector - Call user-provided functions for custom logic

CLI Examples:
    # Regex detector
    garak --detectors custom.RegexDetector \\
        --detector_options '{"custom": {"RegexDetector": {"patterns": ["api.?key","sk-[A-Za-z0-9]{32,}"]}}}'

    # Function detector
    garak --detectors custom.FunctionDetector \\
        --detector_options '{"custom": {"FunctionDetector": {"function_name": "mymodule#check_harmful"}}}' \\
        --probes dan.Dan_11_0

    # Or use config file
    garak --detectors custom.RegexDetector \\
        --detector_option_file detector_config.json \\
        --probes dan.Dan_11_0

"""

import importlib
import logging
import re
from typing import List

from garak import _config
import garak.attempt
from garak.detectors.base import Detector
from garak.resources.function_detector import dummy


class RegexDetector(Detector):
    """Detector that matches regex patterns in model outputs.

    Useful for detecting specific strings, code patterns, credentials,
    or any text that matches regex patterns.

    Configuration:
        {
            "custom": {
                "RegexDetector": {
                    "patterns": ["pattern1", "pattern2", ...],
                    "match_type": "any",  # or "all"
                    "case_sensitive": false
                }
            }
        }

    Examples:
        # Detect credentials
        --detector_options '{"custom": {"RegexDetector": {
            "patterns": ["api.?key", "sk-[A-Za-z0-9]{32,}"]
        }}}'

        # Detect dangerous commands
        --detector_options '{"custom": {"RegexDetector": {
            "patterns": ["sudo", "rm -rf", "DROP TABLE"],
            "match_type": "any",
            "case_sensitive": false
        }}}'
    """

    lang_spec = "*"

    DEFAULT_PARAMS = Detector.DEFAULT_PARAMS | {
        "patterns": [],  # users must provide patterns
        "match_type": "any",  # "any" or "all"
        "case_sensitive": False,
```
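Hand-quoting the nested JSON in the docstring's CLI examples is easy to get wrong (one unbalanced quote breaks the shell command). A minimal sketch, assuming only the `{"custom": {"RegexDetector": {...}}}` layout shown in the docstring, that builds the `--detector_options` value with `json.dumps` and shell-quotes it with `shlex.quote`. The helper name `build_detector_options` is hypothetical and not part of garak.

```python
import json
import shlex


def build_detector_options(patterns, match_type="any", case_sensitive=False):
    """Return a shell-safe --detector_options value for custom.RegexDetector.

    Hypothetical helper; the nesting mirrors the docstring examples.
    """
    options = {
        "custom": {
            "RegexDetector": {
                "patterns": list(patterns),
                "match_type": match_type,
                # json.dumps turns Python False into JSON false.
                "case_sensitive": case_sensitive,
            }
        }
    }
    # json.dumps handles escaping inside the JSON; shlex.quote turns the
    # whole string into a single shell word, even if a pattern contains '.
    return shlex.quote(json.dumps(options))


if __name__ == "__main__":
    arg = build_detector_options(["api.?key", r"sk-[A-Za-z0-9]{32,}"])
    print(f"garak --detectors custom.RegexDetector --detector_options {arg}")
```

The same `options` dict could be written with `json.dump` to a file for `--detector_option_file`. If the `re_flags` change discussed in review is adopted, the `case_sensitive` key would be swapped accordingly.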
Suggested change:

```diff
-        "case_sensitive": False,
+        "re_flags": [ "NOFLAG" ],
```

If accepting this idea, be sure to add validation in `__init__`.
Replaced `case_sensitive` with `re_flags`. It now accepts a list of `re.RegexFlag` names (e.g., `["IGNORECASE", "MULTILINE"]`) that are combined with `|`. These are also validated in `__init__`.
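A minimal sketch of the behavior described in that reply, assuming flag names are looked up on `re.RegexFlag` and that unknown names are logged and skipped (as the review asks) rather than raised. `parse_re_flags` is a hypothetical name; the PR's actual `__init__` code is not shown in this view.

```python
import logging
import re
from typing import Iterable

logger = logging.getLogger(__name__)


def parse_re_flags(names: Iterable[str]) -> int:
    """Combine re.RegexFlag names (e.g. "IGNORECASE") into one flags value.

    Unknown names are logged and skipped instead of raising.
    """
    flags = 0
    for name in names:
        key = str(name).upper()
        if key == "NOFLAG":
            # re.NOFLAG (value 0) only exists on Python 3.11+, so handle it
            # explicitly; OR-ing in 0 would change nothing anyway.
            continue
        try:
            flags |= re.RegexFlag[key]
        except KeyError:
            logger.warning("Unknown regex flag %r, skipping", name)
    return flags


if __name__ == "__main__":
    flags = parse_re_flags(["IGNORECASE", "MULTILINE", "NOT_A_FLAG"])
    print(re.search("^api.?key", "x\nAPI_KEY=...", flags) is not None)
```

Combining with `|` works because `re.RegexFlag` is an `IntFlag`, and `re.compile`/`re.search` accept the resulting integer directly.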
Please log and skip rather than risk aborting the run.
Invalid flags, patterns, and `match_type` values are logged as warnings and skipped (they won't abort the run).
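One way to get that log-and-skip behavior for patterns and `match_type`, sketched under two assumptions: invalid patterns are detected by catching `re.error` (and `TypeError` for non-string entries) from `re.compile`, and an invalid `match_type` falls back to the `"any"` default from `DEFAULT_PARAMS`. The function name and the fallback choice are hypothetical; the reply only says these values are logged and skipped.

```python
import logging
import re
from typing import List, Sequence, Tuple

logger = logging.getLogger(__name__)

VALID_MATCH_TYPES = ("any", "all")


def compile_patterns(
    patterns: Sequence[str], match_type: str, flags: int = 0
) -> Tuple[List[re.Pattern], str]:
    """Compile regexes, logging and skipping any that are invalid."""
    compiled = []
    for pattern in patterns:
        try:
            compiled.append(re.compile(pattern, flags))
        except (re.error, TypeError) as exc:
            # A bad pattern should not abort the whole garak run.
            logger.warning("Skipping invalid pattern %r: %s", pattern, exc)
    if match_type not in VALID_MATCH_TYPES:
        # Assumed fallback: use the documented default instead of failing.
        logger.warning("Invalid match_type %r, falling back to 'any'", match_type)
        match_type = "any"
    if not compiled:
        logger.warning("No valid patterns configured; detector will not match")
    return compiled, match_type
```

Returning the cleaned values rather than raising lets the constructor finish, so other configured detectors still run.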
Please log and skip rather than risk aborting the run.
Invalid flags, patterns, and `match_type` values are logged as warnings and skipped (they won't abort the run).
Please log and skip rather than risk aborting the run. A failing detector can safely return `[None] * input_length`. It would be best if we accepted detectors returning a flat `None` on overall detector failure, but I'm not sure that's possible yet.
Failed detectors (invalid or missing patterns, invalid functions) are now logged and skipped instead of raising exceptions. I think the current behavior is that if there is valid output text, the detector must return a float (not `None`).
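A sketch of per-output scoring consistent with this thread: a missing output stays `None`, while every real output text gets a float. It assumes the usual garak reading of a detector score (1.0 means the detector fired) and works on plain strings rather than garak `Attempt` objects, so `score_outputs` is a hypothetical stand-in for the detector's `detect` method, not the PR's code. Returning 0.0 when no usable patterns remain is also an assumption; the reviewer's `[None] * input_length` is the other option.

```python
import re
from typing import List, Optional, Sequence


def score_outputs(
    outputs: Sequence[Optional[str]],
    patterns: Sequence[re.Pattern],
    match_type: str = "any",
) -> List[Optional[float]]:
    """Return 1.0 where the patterns match, 0.0 where they don't, None for missing output."""
    combine = all if match_type == "all" else any
    results: List[Optional[float]] = []
    for text in outputs:
        if text is None:
            # No output to judge: keep the slot but don't score it.
            results.append(None)
        elif not patterns:
            # Misconfigured detector: report "no hit" as a float rather than
            # raising. (Without this guard, all([]) would be True.)
            results.append(0.0)
        else:
            hit = combine(p.search(text) is not None for p in patterns)
            results.append(1.0 if hit else 0.0)
    return results
```

The explicit empty-patterns branch matters for `match_type="all"`: Python's `all()` over an empty sequence is `True`, which would otherwise flag every output as a hit.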
I wonder what the best way of handling this is post-inference. There may be other detectors configured, for example. Please log and skip rather than risk aborting the run.
Invalid functions and any problems loading them are now logged and don't abort the run.
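A minimal sketch of resolving a `"module#function"` spec (the format used in the module docstring's `FunctionDetector` example) with `importlib`, logging and returning `None` instead of raising when anything goes wrong. The name `load_detection_function` and the exact checks are assumptions, not the PR's implementation.

```python
import importlib
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def load_detection_function(spec: str) -> Optional[Callable[[str], float]]:
    """Resolve "package.module#function"; log and return None on any failure."""
    module_name, sep, func_name = str(spec).partition("#")
    if not sep or not module_name or not func_name:
        logger.warning("Invalid function spec %r; expected 'module#function'", spec)
        return None
    try:
        # User modules can fail in arbitrary ways at import time
        # (ImportError, SyntaxError, their own exceptions), so catch broadly.
        module = importlib.import_module(module_name)
    except Exception as exc:
        logger.warning("Could not import %r: %s", module_name, exc)
        return None
    func = getattr(module, func_name, None)
    if not callable(func):
        logger.warning("%r has no callable named %r", module_name, func_name)
        return None
    return func
```

A caller can then treat a `None` result as "detector disabled" and score outputs accordingly, so a bad spec never stops the run.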
Is the module discarded once the constructor returns?
Currently the module reference is not stored; we only keep `self.detection_function`. The function object holds a reference to its module, preventing garbage collection.
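This can be checked directly in plain Python. A function keeps its defining module's namespace alive through `__globals__` (that module's `__dict__`), and `importlib.import_module` also leaves the module registered in `sys.modules`, so dropping the local `module` variable does not unload it. The sketch uses the standard-library `json` module purely as a stand-in for a user module; `load` is a hypothetical helper.

```python
import importlib
import sys


def load(spec: str):
    """Import the module, return the function, and drop the module reference."""
    module_name, _, func_name = spec.partition("#")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)  # `module` goes out of scope here


func = load("json#dumps")

# The module object is still held by the import system's cache...
assert "json" in sys.modules
# ...and the function's globals are that module's namespace dict.
assert func.__globals__ is sys.modules["json"].__dict__
print(func({"ok": True}))
```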
Please log and skip rather than risk aborting the run.
Invalid functions and any problems loading them are now logged and don't abort the run.
Please log and skip rather than risk aborting the run.
Invalid functions and any problems loading them are now logged and don't abort the run.
@@ -0,0 +1,2 @@

```python
def always_pass(output_text: str) -> float:
    return 1.0
```
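For illustration, a user-supplied function for `FunctionDetector` would presumably follow the same shape as `always_pass`: one output string in, one float out. The sketch below is a hypothetical `check_harmful` (the name comes from the `mymodule#check_harmful` CLI example in the module docstring; the keyword logic is invented here purely as an example).

```python
# mymodule.py -- hypothetical user module for custom.FunctionDetector
import re

# Illustrative markers only; a real check would be task-specific.
_HARMFUL_MARKERS = re.compile(r"\b(rm -rf|DROP TABLE|sudo)\b", re.IGNORECASE)


def check_harmful(output_text: str) -> float:
    """Return 1.0 if the output contains a harmful marker, else 0.0."""
    if not output_text:
        # Empty output: nothing to flag.
        return 0.0
    return 1.0 if _HARMFUL_MARKERS.search(output_text) else 0.0
```

For `mymodule#check_harmful` to resolve, `mymodule` would need to be importable from the Python path garak runs with.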
Does the name "custom" add much? I can see why it makes sense to group them as done here. However, looking at the probe names, we have `probes.function`, so `detectors.function` seems a fine analogue. Then `detectors.regex` could work for the other. I'll think on this some more.
Agreed. This is now divided into `detectors.function` and `detectors.regex`.