feat(RHOAIENG-28840): Support '/api/v1/text/generation' detections #23
Conversation
Reviewer's Guide

This PR refactors the LLMJudgeDetector to centralize parameter validation and scoring, adds full support for generation analysis (new methods, schemas, and endpoint), updates API handlers to use app-level state management, extends test coverage for the new features and renamed methods, and introduces resource cleanup in the Huggingface detector.

Sequence diagram for /api/v1/text/generation endpoint request handling

```mermaid
sequenceDiagram
    actor User
    participant FastAPI_App as FastAPI App
    participant LLMJudgeDetector
    User->>FastAPI_App: POST /api/v1/text/generation
    FastAPI_App->>LLMJudgeDetector: analyze_generation(request)
    LLMJudgeDetector->>LLMJudgeDetector: evaluate_single_generation(prompt, generated_text, params)
    LLMJudgeDetector-->>FastAPI_App: GenerationAnalysisResponse
    FastAPI_App-->>User: GenerationAnalysisResponse
```
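As a rough illustration of the flow above, the endpoint wiring could look like the sketch below; the handler name is made up, `app.get_detector()` assumes the app-level state accessors this PR introduces, and `analyze_generation` is assumed to be a coroutine:

```python
from fastapi import FastAPI

app = FastAPI()


@app.post("/api/v1/text/generation", response_model=GenerationAnalysisResponse)
async def text_generation(
    request: GenerationAnalysisHttpRequest,
) -> GenerationAnalysisResponse:
    # Fetch the detector from app-level state (set during startup)
    # and delegate the whole request to it.
    detector: LLMJudgeDetector = app.get_detector()
    return await detector.analyze_generation(request)
```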
Class diagram for new and updated LLMJudgeDetector types and methods

```mermaid
classDiagram
    class LLMJudgeDetector {
        - judge: Judge
        - available_metrics: set
        + __init__()
        + _initialize_judge()
        + _validate_params(params: Dict[str, Any]) Dict[str, Any]
        + _get_score(result: EvaluationResult) float
        + evaluate_single_content(content: str, params: Dict[str, Any]) ContentAnalysisResponse
        + analyze_content(request: ContentAnalysisHttpRequest) ContentsAnalysisResponse
        + evaluate_single_generation(prompt: str, generated_text: str, params: Dict[str, Any]) GenerationAnalysisResponse
        + analyze_generation(request: GenerationAnalysisHttpRequest) GenerationAnalysisResponse
        + close()
    }
    class GenerationAnalysisHttpRequest {
        + prompt: str
        + generated_text: str
        + detector_params: Optional[Dict[str, Any]]
    }
    class GenerationAnalysisResponse {
        + detection: str
        + detection_type: str
        + score: float
        + evidences: Optional[List[EvidenceObj]]
        + metadata: dict
    }
    LLMJudgeDetector --> GenerationAnalysisHttpRequest
    LLMJudgeDetector --> GenerationAnalysisResponse
```
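The two schemas in the diagram map naturally onto Pydantic models; a minimal sketch follows, in which the `EvidenceObj` fields and all defaults are assumptions rather than the PR's actual definitions:

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel


class EvidenceObj(BaseModel):
    # Placeholder fields; the real EvidenceObj lives in the detector's schema module.
    name: Optional[str] = None
    value: Optional[str] = None


class GenerationAnalysisHttpRequest(BaseModel):
    prompt: str
    generated_text: str
    detector_params: Optional[Dict[str, Any]] = None


class GenerationAnalysisResponse(BaseModel):
    detection: str
    detection_type: str
    score: float
    evidences: Optional[List[EvidenceObj]] = None
    metadata: dict = {}
```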
Hey @saichandrapandraju - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `detectors/llm_judge/detector.py:69` </location>
<code_context>
+    """
+    Get the score from the evaluation result.
+    """
+    if isinstance(result.decision, (int, float)) or result.score is not None:
+        return float(result.score if result.score is not None else result.decision)
+    return 0.0  # FIXME: default to 0 because of non-optional field in schema
</code_context>
<issue_to_address>
Returning 0.0 as a default score may mask underlying issues.
Defaulting to 0.0 may conceal malformed results. Consider raising an exception or logging a warning to better detect such cases.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
    def _get_score(self, result: EvaluationResult) -> float:
        """
        Get the score from the evaluation result.
        """
        if isinstance(result.decision, (int, float)) or result.score is not None:
            return float(result.score if result.score is not None else result.decision)
        return 0.0  # FIXME: default to 0 because of non-optional field in schema
=======
    def _get_score(self, result: EvaluationResult) -> float:
        """
        Get the score from the evaluation result.
        """
        import logging
        logger = logging.getLogger(__name__)
        if isinstance(result.decision, (int, float)) or result.score is not None:
            return float(result.score if result.score is not None else result.decision)
        logger.warning(
            "EvaluationResult missing valid score and decision: %r", result
        )
        raise ValueError("EvaluationResult does not contain a valid score or decision.")
>>>>>>> REPLACE
</suggested_fix>
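One refinement to the suggested replacement: creating the logger inside `_get_score` re-runs `logging.getLogger` on every call; a module-level `logger = logging.getLogger(__name__)` at the top of the file would be the more conventional placement.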
### Comment 2
<location> `detectors/llm_judge/app.py:26` </location>
<code_context>
+    app.set_detector(LLMJudgeDetector())
+    yield
+    # Clean up resources
+    detector: LLMJudgeDetector = app.get_detector()
+    if detector and hasattr(detector, 'close'):
+        await detector.close()
</code_context>
<issue_to_address>
No check for detector existence before use.
Add a check to handle the case where 'get_detector()' returns None to prevent AttributeError.
</issue_to_address>
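A minimal sketch of the guard being requested, assuming the `set_detector`/`get_detector` accessors this PR adds to the app object:

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI


@asynccontextmanager
async def lifespan(app: FastAPI):
    app.set_detector(LLMJudgeDetector())  # app-level state accessor from this PR
    yield
    # Guard against a detector that was never (or unsuccessfully) initialized,
    # so shutdown cannot raise AttributeError on None.
    detector = app.get_detector()
    if detector is not None and hasattr(detector, "close"):
        await detector.close()
```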
detectors/huggingface/app.py (outdated)

```python
detector: Detector = app.get_detector()
if not detector:
    raise RuntimeError("Detector is not initialized")
```
issue (code-quality): We've found these issues:

- Use named expression to simplify assignment and conditional (`use-named-expression`)
- Lift code into else after jump in control flow (`reintroduce-else`)
- Swap if/else branches (`swap-if-else-branches`)
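For the `use-named-expression` point, the snippet above could be collapsed with a walrus operator, e.g.:

```python
# Bind and test the detector in one expression (Python 3.8+).
if not (detector := app.get_detector()):
    raise RuntimeError("Detector is not initialized")
# From here on, `detector` is known to be initialized.
```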
```python
    Returns:
        ContentAnalysisResponse with evaluation results

    Make sure the params have valid metric/criteria and scale.
    """
    if "metric" not in params:
```
issue (code-quality): We've found these issues:

- Swap if/else branches (`swap-if-else-branches`)
- Merge else clause's nested if statement into elif (`merge-else-if-into-elif`)
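To make the `merge-else-if-into-elif` suggestion concrete, here is a self-contained before/after sketch of validation code in this shape; the metric set and error messages are placeholders, not the detector's actual values:

```python
from typing import Any, Dict

AVAILABLE_METRICS = {"safety", "faithfulness"}  # illustrative placeholder set


def validate_params_nested(params: Dict[str, Any]) -> Dict[str, Any]:
    # Shape flagged by the review: an `if` nested inside `else`.
    if "metric" not in params:
        raise ValueError("detector_params must include a 'metric'")
    else:
        if params["metric"] not in AVAILABLE_METRICS:
            raise ValueError(f"unknown metric: {params['metric']!r}")
    return params


def validate_params_merged(params: Dict[str, Any]) -> Dict[str, Any]:
    # Same logic after merging the nested `if` into an `elif`.
    if "metric" not in params:
        raise ValueError("detector_params must include a 'metric'")
    elif params["metric"] not in AVAILABLE_METRICS:
        raise ValueError(f"unknown metric: {params['metric']!r}")
    return params
```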
Force-pushed from dab267d to ab5a358
This PR extends judge detections to support the `/api/v1/text/generation` Detector API.

Usage examples:

Using builtin metrics:

Request:

Response (with Qwen2.5-7B-instruct):

With a custom evaluation setup:

Request:

Response (with Qwen2.5-7B-instruct):
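Based on the GenerationAnalysisHttpRequest schema above, a request against this endpoint can be sketched as follows; the host, port, prompt, and metric name are illustrative assumptions:

```python
import requests

# POST a prompt/generation pair to the detector (assumed to run on localhost:8000).
resp = requests.post(
    "http://localhost:8000/api/v1/text/generation",
    json={
        "prompt": "What is the capital of France?",
        "generated_text": "The capital of France is Paris.",
        "detector_params": {"metric": "answer_relevancy"},  # hypothetical metric name
    },
)

# The response body follows GenerationAnalysisResponse:
# detection, detection_type, score, evidences, metadata.
print(resp.json())
```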
Closes: #21
Summary by Sourcery

Extend the LLMJudgeDetector to support analysis of single LLM generations via a new `/api/v1/text/generation` endpoint, refactor content analysis methods and app lifecycle management, and add comprehensive tests for generation detection.

New Features:

- Add an `/api/v1/text/generation` HTTP handler and corresponding request/response schemas for generation analysis.
- Add `evaluate_single_generation` and `analyze_generation` methods to LLMJudgeDetector to evaluate generated text against prompts.

Enhancements:

- Rename `run` to `analyze_content` and extract common parameter validation and score extraction into private helper methods.
- Introduce app-level detector state via `set_detector`/`get_detector`/`cleanup_detector` and implement proper `close` methods for the LLMJudge and HuggingFace detectors.

Tests:

- Extend tests to cover the renamed method (`analyze_content`) and verify detector closing behavior.