- 
                Notifications
    You must be signed in to change notification settings 
- Fork 18
Feat: Add custom detectors to the built in detector #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Add custom detectors to the built in detector #51
Conversation
| Reviewer's GuideThis PR extends the existing built-in detector service by integrating a custom detectors plugin: it introduces a CustomDetectorRegistry that discovers and wraps user-supplied functions, registers it in the FastAPI lifecycle, adjusts request handling to only invoke explicitly requested detectors, and adds logging, error handling, supporting modules, and tests. Sequence diagram for content detection with custom detectorssequenceDiagram
    participant Client
    participant FastAPI
    participant CustomDetectorRegistry
    participant custom_func_wrapper
    Client->>FastAPI: POST /api/v1/text/contents (with detector_params)
    FastAPI->>CustomDetectorRegistry: handle_request(content, detector_params)
    CustomDetectorRegistry->>custom_func_wrapper: custom_func_wrapper(func, func_name, content)
    custom_func_wrapper-->>CustomDetectorRegistry: ContentAnalysisResponse or None
    CustomDetectorRegistry-->>FastAPI: List[ContentAnalysisResponse]
    FastAPI-->>Client: ContentsAnalysisResponse
Class diagram for CustomDetectorRegistry and custom detector integrationclassDiagram
    class BaseDetectorRegistry
    class CustomDetectorRegistry {
        +registry: dict
        +__init__()
        +handle_request(content: str, detector_params: dict): List[ContentAnalysisResponse]
    }
    class ContentAnalysisResponse
    class custom_func_wrapper {
        +custom_func_wrapper(func: Callable, func_name: str, s: str): Optional[ContentAnalysisResponse]
    }
    BaseDetectorRegistry <|-- CustomDetectorRegistry
    CustomDetectorRegistry o-- custom_func_wrapper
    CustomDetectorRegistry o-- ContentAnalysisResponse
    custom_func_wrapper o-- ContentAnalysisResponse
    class "custom_detectors.custom_detectors" {
        +over_100_characters(text: str): bool
        +contains_word(text: str): bool
    }
    CustomDetectorRegistry o-- "custom_detectors.custom_detectors"
File-Level Changes
 Tips and commandsInteracting with Sourcery
 Customizing Your ExperienceAccess your dashboard to: 
 Getting Help
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `detectors/built_in/app.py:33` </location>
<code_context>
-
 @app.post("/api/v1/text/contents", response_model=ContentsAnalysisResponse)
 def detect_content(request: ContentAnalysisHttpRequest):
+    logging.info(f"Request for {request.detector_params}")
+
     detections = []
</code_context>
<issue_to_address>
**suggestion:** Consider using the module-level logger instead of the root logger for consistency.
Use 'logger.info' instead of 'logging.info' to ensure consistent logging and easier configuration management.
```suggestion
    logger.info(f"Request for {request.detector_params}")
```
</issue_to_address>
### Comment 2
<location> `detectors/built_in/custom_detectors_wrapper.py:54` </location>
<code_context>
+
+    def handle_request(self, content: str, detector_params: dict) -> List[ContentAnalysisResponse]:
+        detections = []
+        if "custom" in detector_params and isinstance(detector_params["custom"], (list, str)):
+            custom_functions = detector_params["custom"]
+            custom_functions = [custom_functions] if isinstance(custom_functions, str) else custom_functions
</code_context>
<issue_to_address>
**suggestion:** Type checking for 'custom' parameter may miss edge cases with unexpected types.
Unexpected types for 'custom' are ignored without notice. Logging or handling these cases would improve debuggability.
</issue_to_address>
### Comment 3
<location> `tests/detectors/builtIn/test_custom.py:14-22` </location>
<code_context>
+
+
+
+    def test_custom_detectors(self, client):
+        payload = {
+            "contents": ["What is an apple?"],
+            "detector_params": {"custom": ["contains_word"]}
+        }
+        resp = client.post("/api/v1/text/contents", json=payload)
+        assert resp.status_code == 200
+        texts = [d["text"] for d in resp.json()[0]]
+        assert "What is an apple?" in texts
+
</code_context>
<issue_to_address>
**suggestion (testing):** Missing negative and error case tests for custom detectors.
Add tests for scenarios where the custom detector does not match any content and for invalid detector names to ensure proper error handling and coverage of negative cases.
```suggestion
    def test_custom_detectors(self, client):
        payload = {
            "contents": ["What is an apple?"],
            "detector_params": {"custom": ["contains_word"]}
        }
        resp = client.post("/api/v1/text/contents", json=payload)
        assert resp.status_code == 200
        texts = [d["text"] for d in resp.json()[0]]
        assert "What is an apple?" in texts
    def test_custom_detector_no_match(self, client):
        payload = {
            "contents": ["Bananas are yellow."],
            "detector_params": {"custom": ["contains_word"]}
        }
        resp = client.post("/api/v1/text/contents", json=payload)
        assert resp.status_code == 200
        texts = [d["text"] for d in resp.json()[0]]
        assert "Bananas are yellow." not in texts
    def test_custom_detector_invalid_name(self, client):
        payload = {
            "contents": ["What is an apple?"],
            "detector_params": {"custom": ["non_existent_detector"]}
        }
        resp = client.post("/api/v1/text/contents", json=payload)
        assert resp.status_code == 400 or resp.status_code == 422
        # Optionally check for error message in response
        # assert "invalid detector" in resp.text.lower()
```
</issue_to_address>
### Comment 4
<location> `tests/detectors/builtIn/test_custom.py:4` </location>
<code_context>
+import pytest
+from fastapi.testclient import TestClient
+
+class TestRegexDetectors:
+    @pytest.fixture
+    def client(self):
</code_context>
<issue_to_address>
**nitpick:** Test class name is misleading.
Since this class only tests custom detectors, renaming it to 'TestCustomDetectors' would improve clarity.
</issue_to_address>
### Comment 5
<location> `detectors/built_in/custom_detectors_wrapper.py:58-67` </location>
<code_context>
    def handle_request(self, content: str, detector_params: dict) -> List[ContentAnalysisResponse]:
        detections = []
        if "custom" in detector_params and isinstance(detector_params["custom"], (list, str)):
            custom_functions = detector_params["custom"]
            custom_functions = [custom_functions] if isinstance(custom_functions, str) else custom_functions
            for custom_function in custom_functions:
                if self.registry.get(custom_function):
                    try:
                        result = custom_func_wrapper(self.registry[custom_function], custom_function, content)
                        if result is not None:
                            detections.append(result)
                    except Exception as e:
                        logger.error(e)
                        raise HTTPException(status_code=400, detail="Detection error, check detector logs")
                else:
                    raise HTTPException(status_code=400, detail=f"Unrecognized custom function: {custom_function}")
        return detections
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
- Explicitly raise from a previous error ([`raise-from-previous-error`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/raise-from-previous-error/))
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|  | ||
| def handle_request(self, content: str, detector_params: dict) -> List[ContentAnalysisResponse]: | ||
| detections = [] | ||
| if "custom" in detector_params and isinstance(detector_params["custom"], (list, str)): | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Type checking for 'custom' parameter may miss edge cases with unexpected types.
Unexpected types for 'custom' are ignored without notice. Logging or handling these cases would improve debuggability.
| def test_custom_detectors(self, client): | ||
| payload = { | ||
| "contents": ["What is an apple?"], | ||
| "detector_params": {"custom": ["contains_word"]} | ||
| } | ||
| resp = client.post("/api/v1/text/contents", json=payload) | ||
| assert resp.status_code == 200 | ||
| texts = [d["text"] for d in resp.json()[0]] | ||
| assert "What is an apple?" in texts | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Missing negative and error case tests for custom detectors.
Add tests for scenarios where the custom detector does not match any content and for invalid detector names to ensure proper error handling and coverage of negative cases.
| def test_custom_detectors(self, client): | |
| payload = { | |
| "contents": ["What is an apple?"], | |
| "detector_params": {"custom": ["contains_word"]} | |
| } | |
| resp = client.post("/api/v1/text/contents", json=payload) | |
| assert resp.status_code == 200 | |
| texts = [d["text"] for d in resp.json()[0]] | |
| assert "What is an apple?" in texts | |
| def test_custom_detectors(self, client): | |
| payload = { | |
| "contents": ["What is an apple?"], | |
| "detector_params": {"custom": ["contains_word"]} | |
| } | |
| resp = client.post("/api/v1/text/contents", json=payload) | |
| assert resp.status_code == 200 | |
| texts = [d["text"] for d in resp.json()[0]] | |
| assert "What is an apple?" in texts | |
| def test_custom_detector_no_match(self, client): | |
| payload = { | |
| "contents": ["Bananas are yellow."], | |
| "detector_params": {"custom": ["contains_word"]} | |
| } | |
| resp = client.post("/api/v1/text/contents", json=payload) | |
| assert resp.status_code == 200 | |
| texts = [d["text"] for d in resp.json()[0]] | |
| assert "Bananas are yellow." not in texts | |
| def test_custom_detector_invalid_name(self, client): | |
| payload = { | |
| "contents": ["What is an apple?"], | |
| "detector_params": {"custom": ["non_existent_detector"]} | |
| } | |
| resp = client.post("/api/v1/text/contents", json=payload) | |
| assert resp.status_code == 400 or resp.status_code == 422 | |
| # Optionally check for error message in response | |
| # assert "invalid detector" in resp.text.lower() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @RobGeada , overall LGTM with one comment and 1 nit (+1 for sourcery's logger instead of logging suggestion)
650065c    to
    a86f66a      
    Compare
  
    | PR image build completed successfully! 📦 Huggingface PR image:  | 
Summary by Sourcery
Add support for custom detectors by introducing a new CustomDetectorRegistry to load and invoke user-defined functions, update the API to register and process only the detectors requested with improved error handling and logging, and include example detector functions along with an integration test.
New Features:
Enhancements:
Tests:
Chores: