feat(RHOAIENG-28840): Support '/api/v1/text/generation' detections #23

Open · wants to merge 6 commits into base: main
Conversation

saichandrapandraju (Collaborator)
@saichandrapandraju saichandrapandraju commented Jul 7, 2025

This PR extends the LLM judge detections to support the /api/v1/text/generation Detector API.

Usage Examples:

Using built-in metrics:

Request:

curl -s -X POST "http://<host>:<port>/api/v1/text/generation" \
   -H 'accept: application/json' \
   -H 'detector-id: llm_judge' \
   -H 'Content-Type: application/json' \
   -d '{
    "prompt": "What is Machine Learning?",
    "generated_text": "Deep Learning models learn by adjusting weights through backpropagation.",
    "detector_params": {"metric": "relevance"}
}'

Response (with Qwen2.5-7B-instruct):

{
  "detection":"NEARLY_IRRELEVANT",
  "detection_type":"llm_judge",
  "score":0.1,
  "evidences":[],
  "metadata": {
      "reasoning": "The provided content discusses deep learning models and backpropagation, which are specific aspects of machine learning. However, it does not directly answer the question 'What is Machine Learning?'"
    }
}

With custom evaluation setup:

Request:

curl -s -X POST "http://<host>:<port>/api/v1/text/generation" \
   -H 'accept: application/json' \
   -H 'detector-id: llm_judge' \
   -H 'Content-Type: application/json' \
   -d '{
    "prompt": "What is Machine Learning?",
    "generated_text": "Deep Learning models learn by adjusting weights through backpropagation.",
    "detector_params": {
       "criteria": "technical accuracy for {level} students",
       "template_vars": {"level": "undergraduate"},
       "rubric": "Assign lower scores for inaccurate content and higher scores for accurate ones. Also assign appropriate decision labels like '\''ACCURATE'\'', '\''INACCURATE'\'' and '\''SOMEWHAT_ACCURATE'\''."
    }
}'

Response (with Qwen2.5-7B-instruct):

{
  "detection": "INACCURATE",
  "detection_type": "llm_judge",
  "score": 0.2,
  "evidences": [],
  "metadata": {
        "reasoning": "The content provided does not accurately answer the question 'What is Machine Learning?'. It instead gives a detail about a specific subfield or technique within Machine Learning (Deep Learning) without defining Machine Learning itself."
    }
}

Closes: #21

Summary by Sourcery

Extend the LLMJudgeDetector to support analysis of single LLM generations via a new /api/v1/text/generation endpoint, refactor content analysis methods and App lifecycle management, and add comprehensive tests for generation detection.

New Features:

  • Introduce a /api/v1/text/generation HTTP handler and corresponding request/response schemas for generation analysis.
  • Add evaluate_single_generation and analyze_generation methods to LLMJudgeDetector to evaluate generated text against prompts.
  • Implement default parameter validation and unified scoring logic for both content and generation analysis.
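
The shared validation and scoring helpers described above could look roughly like this (a minimal sketch, not the PR's actual code; the metric set, default metric, and helper names are assumptions based on the summary):

```python
from typing import Any, Dict

# Assumed placeholder; the real set comes from the Judge library's available metrics.
AVAILABLE_METRICS = {"relevance", "safety"}

def validate_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Fill defaults and reject unknown metrics (sketch of _validate_params)."""
    params = dict(params or {})
    if "metric" in params and params["metric"] not in AVAILABLE_METRICS:
        raise ValueError(f"Unknown metric: {params['metric']}")
    if "metric" not in params and "criteria" not in params:
        params["metric"] = "safety"  # assumed default when nothing is specified
    return params

def get_score(decision: Any, score: Any) -> float:
    """Prefer an explicit score; fall back to a numeric decision (sketch of _get_score)."""
    if score is not None:
        return float(score)
    if isinstance(decision, (int, float)):
        return float(decision)
    return 0.0  # the response schema requires a score, so default rather than fail
```

The same pair can then back both evaluate_single_content and evaluate_single_generation, which is the unification the bullet above describes.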

Enhancements:

  • Rename content analysis method run to analyze_content and extract common parameter validation and score extraction into private helper methods.
  • Unify detector instantiation and cleanup using FastAPI set_detector/get_detector/cleanup_detector and implement proper close methods for LLMJudge and HuggingFace detectors.

Tests:

  • Add a new test suite covering generation analysis scenarios, including basic metrics, full parameter sets, defaults, invalid metrics, numeric/None scores, and concurrency tests.
  • Update existing content analysis tests to reflect method renames (analyze_content) and verify detector closing behavior.

sourcery-ai bot commented Jul 7, 2025

Reviewer's Guide

This PR refactors the LLMJudgeDetector to centralize parameter validation and scoring, adds full support for generation analysis (new methods, schemas, and endpoint), updates API handlers to use app-level state management, extends test coverage for the new features and renamed methods, and introduces resource cleanup in the Huggingface detector.

Sequence diagram for /api/v1/text/generation endpoint request handling

sequenceDiagram
    actor User
    participant FastAPI_App as FastAPI App
    participant LLMJudgeDetector
    User->>FastAPI_App: POST /api/v1/text/generation
    FastAPI_App->>LLMJudgeDetector: analyze_generation(request)
    LLMJudgeDetector->>LLMJudgeDetector: evaluate_single_generation(prompt, generated_text, params)
    LLMJudgeDetector-->>FastAPI_App: GenerationAnalysisResponse
    FastAPI_App-->>User: GenerationAnalysisResponse
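
The flow in the diagram can be sketched as plain Python (stubs only; the real handler is a FastAPI route in detectors/llm_judge/app.py, and the request/response types here are simplified dicts):

```python
import asyncio

class LLMJudgeDetectorStub:
    """Stand-in for LLMJudgeDetector; the real class delegates to the Judge library."""
    async def analyze_generation(self, request: dict) -> dict:
        # The real method calls evaluate_single_generation(prompt, generated_text, params).
        return {"detection": "RELEVANT", "detection_type": "llm_judge", "score": 0.9}

# Stand-in for the app-level detector state introduced in this PR.
app_state = {"detector": LLMJudgeDetectorStub()}

async def generation_analysis_handler(request: dict) -> dict:
    detector = app_state.get("detector")
    if detector is None:
        raise RuntimeError("Detector is not initialized")
    return await detector.analyze_generation(request)

result = asyncio.run(generation_analysis_handler({
    "prompt": "What is Machine Learning?",
    "generated_text": "Deep Learning models learn by adjusting weights through backpropagation.",
    "detector_params": {"metric": "relevance"},
}))
```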

Class diagram for new and updated LLMJudgeDetector types and methods

classDiagram
    class LLMJudgeDetector {
        - judge: Judge
        - available_metrics: set
        + __init__()
        + _initialize_judge()
        + _validate_params(params: Dict[str, Any]) Dict[str, Any]
        + _get_score(result: EvaluationResult) float
        + evaluate_single_content(content: str, params: Dict[str, Any]) ContentAnalysisResponse
        + analyze_content(request: ContentAnalysisHttpRequest) ContentsAnalysisResponse
        + evaluate_single_generation(prompt: str, generated_text: str, params: Dict[str, Any]) GenerationAnalysisResponse
        + analyze_generation(request: GenerationAnalysisHttpRequest) GenerationAnalysisResponse
        + close()
    }
    class GenerationAnalysisHttpRequest {
        + prompt: str
        + generated_text: str
        + detector_params: Optional[Dict[str, Any]]
    }
    class GenerationAnalysisResponse {
        + detection: str
        + detection_type: str
        + score: float
        + evidences: Optional[List[EvidenceObj]]
        + metadata: dict
    }
    LLMJudgeDetector --> GenerationAnalysisHttpRequest
    LLMJudgeDetector --> GenerationAnalysisResponse
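
Rendered as plain Python, the two new schema types would look roughly like this (the project's scheme.py presumably uses Pydantic models; this dataclass sketch only mirrors the fields in the diagram, and EvidenceObj is replaced by Any):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class GenerationAnalysisHttpRequest:
    prompt: str
    generated_text: str
    detector_params: Optional[Dict[str, Any]] = None

@dataclass
class GenerationAnalysisResponse:
    detection: str
    detection_type: str
    score: float
    evidences: Optional[List[Any]] = None  # List[EvidenceObj] in the real schema
    metadata: dict = field(default_factory=dict)

resp = GenerationAnalysisResponse(
    detection="INACCURATE", detection_type="llm_judge", score=0.2
)
```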

File-Level Changes

Change Details Files
Extend LLMJudgeDetector with unified parameter handling and generation analysis support
  • Extract _validate_params and _get_score to consolidate defaulting and numeric decision handling
  • Refactor evaluate_single_content to use the new helpers
  • Add evaluate_single_generation and analyze_generation methods
  • Annotate judge attribute and adjust imports
detectors/llm_judge/detector.py
Expose new generation analysis endpoint and refactor content handler to use app state
  • Rename content handler to detector_content_analysis_handler and adjust call to analyze_content
  • Add /api/v1/text/generation POST handler invoking analyze_generation
  • Replace global detector_objects dict with app.set_detector/get_detector and cleanup methods
detectors/llm_judge/app.py
detectors/common/app.py
Define generation analysis request and response models
  • Add GenerationAnalysisHttpRequest with prompt, generated_text, detector_params
  • Add GenerationAnalysisResponse with detection, score, evidences, metadata
detectors/llm_judge/scheme.py
Update LLMJudgeDetector tests for generation features and renamed methods
  • Rename test classes/methods to reflect analyze_content
  • Replace calls to run() with analyze_content() in content tests
  • Add comprehensive tests for evaluate_single_generation and analyze_generation with various params
  • Extend performance tests to cover generation analysis concurrency
tests/detectors/llm_judge/test_llm_judge_detector.py
tests/detectors/llm_judge/test_performance.py
Add resource cleanup in Huggingface detector and adopt stateful lifecycle in its app
  • Implement close() method clearing model, tokenizer, and CUDA cache
  • Switch to app.set_detector/get_detector and call close during lifespan shutdown
detectors/huggingface/detector.py
detectors/huggingface/app.py
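
The Huggingface cleanup described above could be sketched like this (assumed names; the real implementation lives in detectors/huggingface/detector.py, and the torch import is guarded so the sketch also runs on CPU-only machines):

```python
import asyncio

try:
    import torch  # optional; only needed to release cached GPU memory
except ImportError:
    torch = None

class HuggingFaceDetectorSketch:
    """Sketch of the close() method added for the Huggingface detector."""
    def __init__(self):
        self.model = object()      # stand-in for the loaded HF model
        self.tokenizer = object()  # stand-in for the tokenizer

    async def close(self):
        # Drop references so the heavyweight objects can be garbage-collected...
        self.model = None
        self.tokenizer = None
        # ...and free cached GPU memory if CUDA is in use.
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()

detector = HuggingFaceDetectorSketch()
asyncio.run(detector.close())
```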

Possibly linked issues

  • #0: PR adds the /api/v1/text/generation endpoint and llm_judge generation analysis, fulfilling the issue.
  • #0: The PR adds the /api/v1/text/generation endpoint to enable LLM-as-a-judge detections from vLLM models.


@sourcery-ai sourcery-ai bot left a comment
Hey @saichandrapandraju - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `detectors/llm_judge/detector.py:69` </location>
<code_context>
+        """
+        Get the score from the evaluation result.
+        """
+        if isinstance(result.decision, (int, float)) or result.score is not None:
+            return float(result.score if result.score is not None else result.decision)
+        return 0.0 # FIXME: default to 0 because of non-optional field in schema
</code_context>

<issue_to_address>
Returning 0.0 as a default score may mask underlying issues.

Defaulting to 0.0 may conceal malformed results. Consider raising an exception or logging a warning to better detect such cases.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    def _get_score(self, result: EvaluationResult) -> float:
        """
        Get the score from the evaluation result.
        """
        if isinstance(result.decision, (int, float)) or result.score is not None:
            return float(result.score if result.score is not None else result.decision)
        return 0.0 # FIXME: default to 0 because of non-optional field in schema
=======
    def _get_score(self, result: EvaluationResult) -> float:
        """
        Get the score from the evaluation result.
        """
        import logging
        logger = logging.getLogger(__name__)

        if isinstance(result.decision, (int, float)) or result.score is not None:
            return float(result.score if result.score is not None else result.decision)
        logger.warning(
            "EvaluationResult missing valid score and decision: %r", result
        )
        raise ValueError("EvaluationResult does not contain a valid score or decision.")
>>>>>>> REPLACE

</suggested_fix>

### Comment 2
<location> `detectors/llm_judge/app.py:26` </location>
<code_context>
+    app.set_detector(LLMJudgeDetector())
+    yield
+    # Clean up resources
+    detector: LLMJudgeDetector = app.get_detector()
+    if detector and hasattr(detector, 'close'):
+        await detector.close()
</code_context>

<issue_to_address>
No check for detector existence before use.

Add a check to handle the case where 'get_detector()' returns None to prevent AttributeError.
</issue_to_address>


Comment on lines 48 to 50:

    detector: Detector = app.get_detector()
    if not detector:
        raise RuntimeError("Detector is not initialized")

issue (code-quality): We've found these issues:


        Returns:
            ContentAnalysisResponse with evaluation results
        Make sure the params have valid metric/criteria and scale.
        """
        if "metric" not in params:

issue (code-quality): We've found these issues:

@saichandrapandraju saichandrapandraju self-assigned this Jul 9, 2025
@ruivieira ruivieira changed the title RHOAIENG-28840: Support '/api/v1/text/generation' detections feat(RHOAIENG-28840): Support '/api/v1/text/generation' detections Jul 9, 2025
@ruivieira ruivieira added the enhancement New feature or request label Jul 9, 2025
@ruivieira ruivieira moved this to In Review in TrustyAI planning Jul 9, 2025
Labels
enhancement New feature or request
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

Enable llm_judge detection for '/api/v1/text/generation' endpoint
2 participants