feat(detectors): Introduce custom detectors with RegexDetector and FunctionDetector #1484
This could be really cool, thanks. We'll take a look!
leondz
left a comment
first draft - needs some validation from our side. do you have some example cases for testing?
garak/detectors/custom.py
Outdated
    CLI Examples:
    # Regex detector
    garak --detectors custom.RegexDetector \\
    --detector_options '{"custom": {"RegexDetector": {"patterns": ["api.?key","sk-[A-Za-z0-9]{32,}"]}}}'

    # Function detector
    garak --detectors custom.FunctionDetector \\
    --detector_options '{"custom": {"FunctionDetector": {"function_name": "mymodule#check_harmful"}}}' \\
    --probes dan.Dan_11_0

    # Or use config file
    garak --detectors custom.RegexDetector \\
    --detector_option_file detector_config.json \\
    --probes dan.Dan_11_0
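Quoting nested JSON on a shell command line is easy to get wrong, so generating the option file from Python may be simpler. The following is a minimal sketch, assuming `--detector_option_file` accepts the same JSON structure as `--detector_options`; the helper name `write_detector_config` is hypothetical and not part of garak.

```python
import json
from pathlib import Path

# Same structure as the --detector_options JSON in the CLI example.
# Assumption: the option file uses this identical layout.
DETECTOR_OPTIONS = {
    "custom": {
        "RegexDetector": {
            "patterns": ["api.?key", "sk-[A-Za-z0-9]{32,}"],
        }
    }
}


def write_detector_config(path: str = "detector_config.json") -> Path:
    """Serialize the options to a JSON file and return its path (hypothetical helper)."""
    target = Path(path)
    target.write_text(json.dumps(DETECTOR_OPTIONS, indent=2), encoding="utf-8")
    return target


if __name__ == "__main__":
    print(f"wrote {write_detector_config()}")
```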
please manage through Configurable mechanism and not CLI
A CLI command based on a detector config file is viable here; however, a reference to using --config for a consolidated example may add clarity.
Oh sorry, yes, missed that these were viable existing options
Added respective --config examples.
garak/detectors/custom.py
Outdated
Does the name "custom" add much? I can see it makes sense to group them as done here. Though looking at the probe names, we have probes.function - detectors.function seems a fine analogue to that. And then detectors.regex could work for the other. I'll think on this some more.
Agreed. This is now divided into detectors.function and detectors.regex.
garak/detectors/custom.py
Outdated
    DEFAULT_PARAMS = Detector.DEFAULT_PARAMS | {
        "patterns": [],  # users must provide patterns
        "match_type": "any",  # "any" or "all"
        "case_sensitive": False,
does it make sense to expose multiline / re.M here?
It might make sense to offer this as a set of strings corresponding to allowed enum values from re.RegexFlag to | together. Named something like re_flags?
This could even be initialized to the re.NOFLAG value:
    "case_sensitive": False,
    "re_flags": [ "NOFLAG" ],
If accepting this idea be sure to add validation in __init__.
Replaced case_sensitive with re_flags - Now accepts a list of re.RegexFlag names (e.g., ["IGNORECASE", "MULTILINE"]) that get combined with |. These are also validated in __init__.
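As an illustration of how a list of flag names could be validated and combined, here is a minimal sketch. It is not the PR's implementation: the helper name `combine_re_flags` is hypothetical, and the choice to warn about and skip unknown names is an assumption based on the "log and skip" requests in this review.

```python
import logging
import re
from typing import Iterable

logger = logging.getLogger(__name__)


def combine_re_flags(flag_names: Iterable[str]) -> re.RegexFlag:
    """OR together re.RegexFlag members given by name, skipping unknown names."""
    combined = re.RegexFlag(0)  # no flags set; same value as re.NOFLAG on Python 3.11+
    for name in flag_names:
        flag = re.RegexFlag.__members__.get(str(name).upper())
        if flag is None:
            # Assumption: invalid names are logged and ignored rather than raised.
            logger.warning("Ignoring unknown regex flag %r", name)
            continue
        combined |= flag
    return combined


flags = combine_re_flags(["IGNORECASE", "MULTILINE", "BOGUS"])
pattern = re.compile(r"^api.?key", flags)
```

With MULTILINE set, `^` matches at the start of every line, so the pattern above would also match a key that appears on the second line of a model output.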
garak/detectors/custom.py
Outdated
    # Validate match_type
    self.match_type = str(self.match_type).lower()
    if self.match_type not in ("any", "all"):
        raise ValueError(f"match_type must be 'any' or 'all', got '{self.match_type}'")
please log and skip, rather than risking aborting the run
Invalid flags, patterns, and match_type are logged as warnings and skipped (won't abort the run)
garak/detectors/custom.py
Outdated
            re.compile(pattern, flags) for pattern in self.patterns
        ]
    except re.error as e:
        raise ValueError(f"Invalid regex pattern: {e}") from e
please log and skip, rather than risking aborting the run
Invalid flags, patterns, and match_type are logged as warnings and skipped (won't abort the run)
garak/detectors/custom.py
Outdated
            pattern.search(output_text) for pattern in self.compiled_patterns
        )
    else:
        raise ValueError(f"Invalid match_type: {self.match_type}")
Please log and skip, rather than risking aborting the run. Failing detector can safely return [None] * input_length. Would be best if we accepted detectors returning flat None on overall detector failure but I'm not sure that's possible yet.
Failed detectors (invalid or no patterns, invalid functions) are now logged and skipped instead of raising exceptions. I think the current behavior is that, given valid output text, a detector must return a float (and not None).
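To make the "log and skip" behaviour concrete, here is a standalone sketch of any/all regex scoring. It does not use garak's Detector class. The function names, the fallback to "any" for an unknown match_type, and the use of `[None] * len(outputs)` when no pattern is usable are all assumptions drawn from the suggestions in this thread, not the merged behaviour.

```python
import logging
import re
from typing import List, Optional, Sequence

logger = logging.getLogger(__name__)


def compile_valid(patterns: Sequence[str], flags: int = 0) -> List[re.Pattern]:
    """Compile each pattern, logging and skipping any that fail to compile."""
    compiled = []
    for raw in patterns:
        try:
            compiled.append(re.compile(raw, flags))
        except re.error as exc:
            logger.warning("Skipping invalid regex %r: %s", raw, exc)
    return compiled


def score_outputs(
    outputs: Sequence[Optional[str]],
    patterns: Sequence[str],
    match_type: str = "any",
) -> List[Optional[float]]:
    """Return 1.0 (detected) or 0.0 (safe) per output; None where no score is possible."""
    if match_type not in ("any", "all"):
        # Assumption: fall back to "any" instead of aborting the run.
        logger.warning("Unknown match_type %r, using 'any'", match_type)
        match_type = "any"
    compiled = compile_valid(patterns)
    if not compiled:
        logger.warning("No usable patterns; skipping detection")
        return [None] * len(outputs)
    combine = any if match_type == "any" else all
    results: List[Optional[float]] = []
    for text in outputs:
        if text is None:
            results.append(None)  # nothing to score for a missing output
            continue
        hit = combine(p.search(text) is not None for p in compiled)
        results.append(1.0 if hit else 0.0)
    return results
```

Whether a skipped detector should yield None or a float for valid output text is the open question raised above; the sketch only shows one of the two options.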
garak/detectors/custom.py
Outdated
    raise ValueError(
        f"function_name must be in format 'module#function', got '{self.function_name}'"
    )
I wonder what the best way of handling this is post-inference. There may be other detectors configured, for example. Please log and skip, rather than risking aborting the run
Invalid functions and any problems loading them are now logged and don't abort the run.
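A loader that follows this "log and skip" approach for the `module#function` format might look like the sketch below. The helper name `load_detection_function` and the convention of returning None on failure are assumptions for illustration; the PR's actual code may differ.

```python
import importlib
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def load_detection_function(function_name: str) -> Optional[Callable]:
    """Resolve 'module#function' to a callable, or return None after logging why not."""
    module_name, sep, func_name = function_name.partition("#")
    if not sep or not module_name or not func_name or "#" in func_name:
        logger.warning(
            "function_name must be in format 'module#function', got %r", function_name
        )
        return None
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        logger.warning("Could not import module %r: %s", module_name, exc)
        return None
    func = getattr(module, func_name, None)
    if not callable(func):
        logger.warning("%r in module %r is missing or not callable", func_name, module_name)
        return None
    return func
```

A caller can then treat a None result as "detector disabled" and carry on with any other configured detectors.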
garak/detectors/custom.py
Outdated
    try:
        module = importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"Could not import module '{module_name}' for FunctionDetector"
        ) from e
is module discarded once the constructor returns?
Currently the module reference is not stored; we only keep self.detection_function. The function object holds a reference to its module, preventing garbage collection.
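The lifetime question can be checked with the standard library alone. In CPython a function holds its module's namespace through `__globals__` (and the module name as a string in `__module__`), and `importlib.import_module` also caches the module in `sys.modules`. The sketch below uses `json.dumps` purely as a stand-in for a user-supplied detection function.

```python
import importlib
import sys

# Import a module, keep only one function from it, and drop the local module name.
module = importlib.import_module("json")
func = getattr(module, "dumps")
del module

# The function still carries its module's name and global namespace.
module_name = func.__module__
shares_globals = func.__globals__ is sys.modules["json"].__dict__

print(module_name)     # json
print(shares_globals)  # True
```

So even without storing the module on the detector, its globals stay reachable through the function, and the import cache keeps the module object itself alive.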
garak/detectors/custom.py
Outdated
    try:
        self.detection_function = getattr(module, func_name)
    except AttributeError as e:
        raise AttributeError(
please log and skip, rather than risking aborting the run
Invalid functions and any problems loading them are now logged and don't abort the run.
garak/detectors/custom.py
Outdated
    # Validate function is callable
    if not callable(self.detection_function):
        raise ValueError(
please log and skip, rather than risking aborting the run
Invalid functions and any problems loading them are now logged and don't abort the run.
Thanks for the review @leondz, updated the PR with suggested changes. Modified the description to reflect these and also added examples for testing. Happy to discuss any remaining concerns!
Adds custom detectors (RegexDetector, FunctionDetector) that allow users to define their own detection logic.
This is useful when users want to implement custom detection behaviors, especially with user-provided goals or prompts (#1482).
Two new configurable detector types are introduced:
1. RegexDetector – Pattern Matching
Matches regex patterns in model outputs. Useful for detecting credentials, code patterns, or any text matching specific patterns.
re.RegexFlag names to combine (e.g., ["IGNORECASE", "MULTILINE"])
Sample config (regex_test.yaml):
2. FunctionDetector – Custom Python Logic
(inspired by the function.py generator)
always_pass
Function signature: function(output_text: str, **kwargs) -> float where return value is 0.0 (safe) to 1.0 (detected).
Sample config (function_test.yaml):
Sample script (my_detectors.py)
While the provided example shows a simple pattern-based function, I believe FunctionDetector is a flexible integration point for sophisticated detection workflows. Since any Python callable can be used, this opens up powerful possibilities:
The module#function pattern essentially lets you bring any Python-callable detection logic into garak's evaluation pipeline, making it adaptable to diverse requirements.
Verification
List the steps needed to make sure this thing works
{"openai": {"OpenAICompatible": {"uri": "https:<placeholder>/v1", "model": "qwen2", "api_key": "DUMMY"}}}
python -m pytest tests/
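For manual testing of FunctionDetector, a detection function matching the signature in the description, `function(output_text: str, **kwargs) -> float`, could look like the sketch below. The name `check_harmful` is borrowed from the earlier `mymodule#check_harmful` CLI example. The keyword list and the optional `keywords` argument are hypothetical, and it is an assumption that garak would pass any keyword arguments at all.

```python
from typing import Iterable

# Hypothetical default keywords, for illustration only.
DEFAULT_KEYWORDS = ("bomb", "exploit")


def check_harmful(output_text: str, **kwargs) -> float:
    """Return 1.0 (detected) if any keyword appears in the output, else 0.0 (safe)."""
    # Assumption: callers may override the keyword list through kwargs.
    keywords: Iterable[str] = kwargs.get("keywords", DEFAULT_KEYWORDS)
    text = output_text.lower()
    return 1.0 if any(word.lower() in text for word in keywords) else 0.0
```

Saved in a module on the Python path, such a function would be referenced in the `module#function` format, for example `my_detectors#check_harmful`.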