feat: better handle the multilabel classifiers by tweaking the response format #44
Conversation
…o handle multiclass classifiers
Reviewer's Guide

This PR refactors the HuggingFace detector to support threshold-based multi-label classification with environment- and parameter-based label exclusion, aligns schema imports for consistency, and exposes configurable detector parameters throughout.

Sequence diagram for multi-label sequence classification with threshold and label exclusion

sequenceDiagram
participant User as actor User
participant App as HuggingFace Detector
User->>App: Submit ContentAnalysisHttpRequest (with text, detector_params)
App->>App: process_sequence_classification(text, detector_params)
App->>App: For each label, check if prob >= threshold and not in safe_labels
App->>App: For each valid label, create ContentAnalysisResponse
App-->>User: Return ContentsAnalysisResponse (multiple responses if multi-label)
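The per-label filtering step in the diagram above can be sketched in plain Python. This is a minimal illustration, not the detector's actual code; `probs`, `id2label`, `threshold`, and `safe_labels` stand in for the model's softmax output, the model config's label map, and the request parameters:

```python
def filter_labels(probs, id2label, threshold=0.5, safe_labels=None):
    """Return (label, score) pairs that clear the threshold and are not excluded."""
    safe_labels = set(safe_labels or [])
    results = []
    for idx, score in enumerate(probs):
        label = id2label[idx]
        # A label produces a ContentAnalysisResponse only if it clears the
        # threshold AND is not in the merged safe-label set.
        if score >= threshold and label not in safe_labels:
            results.append((label, score))
    return results

# Example: two of three labels clear the 0.5 threshold; one is excluded as safe.
probs = [0.9, 0.7, 0.1]
id2label = {0: "toxicity", 1: "neutral", 2: "insult"}
print(filter_labels(probs, id2label, threshold=0.5, safe_labels=["neutral"]))
# → [('toxicity', 0.9)]
```

Because each surviving label yields its own response, a single input can produce multiple entries in the returned `ContentsAnalysisResponse`, which is what makes the output multi-label.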
Class diagram for updated Detector class with multi-label support

classDiagram
class Detector {
+risk_names: list
+model: Any
+cuda_device: Any
+model_name: str
+safe_labels: set
+__init__()
+process_causal_lm(text)
+process_sequence_classification(text, detector_params=None, threshold=None)
+run(input: ContentAnalysisHttpRequest) : ContentsAnalysisResponse
+close()
}
class ContentAnalysisHttpRequest
class ContentAnalysisResponse
class ContentsAnalysisResponse
Detector --> ContentAnalysisHttpRequest
Detector --> ContentAnalysisResponse
Detector --> ContentsAnalysisResponse
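The class diagram maps to a Python skeleton roughly like the following. This is a sketch based only on the diagram; the constructor defaults and method bodies are placeholders, not the real implementation:

```python
class Detector:
    """Skeleton mirroring the class diagram; attribute types follow the diagram."""

    def __init__(self):
        self.risk_names = []       # list of risk names the detector reports on
        self.model = None          # loaded HF model (Any in the diagram)
        self.cuda_device = None    # device handle, if CUDA is available
        self.model_name = ""
        self.safe_labels = set()   # env-configured labels excluded from detection

    def process_causal_lm(self, text):
        raise NotImplementedError

    def process_sequence_classification(self, text, detector_params=None, threshold=None):
        raise NotImplementedError

    def run(self, request):
        # Dispatches to the appropriate process_* method and wraps results
        # in a ContentsAnalysisResponse (per the diagram's associations).
        raise NotImplementedError

    def close(self):
        pass
```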
Hey @m-misiura - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `detectors/huggingface/detector.py:223` </location>
<code_context>
-        Returns:
-            List[ContentAnalysisResponse]: List of content analysis results.
-        """
+    def process_sequence_classification(self, text, detector_params=None, threshold=None):
+        detector_params = detector_params or {}
+        if threshold is None:
+            threshold = detector_params.get("threshold", 0.5)
+        # Merge safe_labels from env and request
+        request_safe_labels = set(detector_params.get("safe_labels", []))
+        all_safe_labels = set(self.safe_labels) | request_safe_labels
         content_analyses = []
         tokenized = self.tokenizer(
</code_context>
<issue_to_address>
Combining safe_labels from environment and request may lead to type mismatches.
Because safe_labels may contain both ints and strings, and id2label keys are usually ints, mismatched types could cause incorrect filtering. Normalize types in all_safe_labels and comparison values to prevent subtle bugs.
</issue_to_address>
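One way to address this comment (a sketch, assuming safe labels can arrive as either ints or strings) is to coerce everything to `str` before comparing, and to check both the label name and its integer id:

```python
def normalize_safe_labels(env_labels, request_labels):
    """Merge env- and request-provided safe labels, coercing every entry to str."""
    return {str(label) for label in set(env_labels) | set(request_labels)}

def is_safe(label, label_id, all_safe_labels):
    # Compare both the string label name and the stringified integer id, so a
    # request passing safe_labels=[1] still matches id2label key 1.
    return str(label) in all_safe_labels or str(label_id) in all_safe_labels

safe = normalize_safe_labels(["neutral"], [1])
print(is_safe("neutral", 0, safe))   # True: matched by name
print(is_safe("insult", 1, safe))    # True: matched by id "1"
print(is_safe("toxicity", 2, safe))  # False
```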
def process_sequence_classification(self, text, detector_params=None, threshold=None):
    detector_params = detector_params or {}
    if threshold is None:
        threshold = detector_params.get("threshold", 0.5)
    # Merge safe_labels from env and request
    request_safe_labels = set(detector_params.get("safe_labels", []))
    all_safe_labels = set(self.safe_labels) | request_safe_labels
issue (bug_risk): Combining safe_labels from environment and request may lead to type mismatches.
Because safe_labels may contain both ints and strings, and id2label keys are usually ints, mismatched types could cause incorrect filtering. Normalize types in all_safe_labels and comparison values to prevent subtle bugs.
This PR improves the HuggingFace detector for multi-label sequence classification tasks:

- For `AutoModelForSequenceClassification` models, the detector now returns a `ContentAnalysisResponse` for each label whose score exceeds a configurable threshold, supporting multi-label outputs.
- Labels can be excluded from detection via the `non_trigger_labels` parameter.
- Schema imports now use `scheme.py` from the `common` subdirectory to ensure schema consistency with other detectors.

Summary by Sourcery
Improve the HuggingFace detector to support multi-label sequence classification by generating an individual response per label above a configurable threshold, and by excluding safe labels from detection through environment or request parameters.
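The threshold-resolution order introduced in this PR (explicit argument, then `detector_params`, then a 0.5 default) can be exercised in isolation. This is a minimal sketch of that logic, not the detector itself:

```python
def resolve_threshold(detector_params=None, threshold=None, default=0.5):
    """Explicit argument wins; otherwise fall back to detector_params, then the default."""
    detector_params = detector_params or {}
    if threshold is None:
        threshold = detector_params.get("threshold", default)
    return threshold

print(resolve_threshold())                                   # 0.5 (default)
print(resolve_threshold({"threshold": 0.8}))                 # 0.8 (from request params)
print(resolve_threshold({"threshold": 0.8}, threshold=0.3))  # 0.3 (explicit argument wins)
```

Keeping the default at 0.5 preserves the previous single-label behavior for callers that send no `detector_params` at all.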