feat(RHOAIENG-28840): Support '/api/v1/text/generation' detections #23

Open · wants to merge 6 commits into base: main
Conversation

saichandrapandraju (Collaborator)
@saichandrapandraju saichandrapandraju commented Jul 7, 2025

This PR extends the LLM judge detections to support the /api/v1/text/generation Detector API.

Usage Examples:

Using built-in metrics:

Request:

curl -s -X POST "http://<host>:<port>/api/v1/text/generation" \
   -H 'accept: application/json' \
   -H 'detector-id: llm_judge' \
   -H 'Content-Type: application/json' \
   -d '{
    "prompt": "What is Machine Learning?",
    "generated_text": "Deep Learning models learn by adjusting weights through backpropagation.",
    "detector_params": {"metric": "relevance"}
}'

Response (with Qwen2.5-7B-instruct):

{
  "detection":"NEARLY_IRRELEVANT",
  "detection_type":"llm_judge",
  "score":0.1,
  "evidences":[],
  "metadata": {
      "reasoning": "The provided content discusses deep learning models and backpropagation, which are specific aspects of machine learning. However, it does not directly answer the question 'What is Machine Learning?'"
    }
}

With custom evaluation setup:

Request:

curl -s -X POST "http://<host>:<port>/api/v1/text/generation" \
   -H 'accept: application/json' \
   -H 'detector-id: llm_judge' \
   -H 'Content-Type: application/json' \
   -d '{
    "prompt": "What is Machine Learning?",
    "generated_text": "Deep Learning models learn by adjusting weights through backpropagation.",
    "detector_params": {
       "criteria": "technical accuracy for {level} students",
       "template_vars": {"level": "undergraduate"},
       "rubric": "Assign lower scores for inaccurate content and higher scores for accurate ones. Also assign appropriate decision labels like '\''ACCURATE'\'', '\''INACCURATE'\'' and '\''SOMEWHAT_ACCURATE'\''."
    }
}'

Response (with Qwen2.5-7B-instruct):

{
  "detection": "INACCURATE",
  "detection_type": "llm_judge",
  "score": 0.2,
  "evidences": [],
  "metadata": {
        "reasoning": "The content provided does not accurately answer the question 'What is Machine Learning?'. It instead gives a detail about a specific subfield or technique within Machine Learning (Deep Learning) without defining Machine Learning itself."
    }
}

Closes: #21

Summary by Sourcery

Extend the LLMJudgeDetector to support analysis of single LLM generations via a new /api/v1/text/generation endpoint, refactor content analysis methods and App lifecycle management, and add comprehensive tests for generation detection.

New Features:

  • Introduce a /api/v1/text/generation HTTP handler and corresponding request/response schemas for generation analysis.
  • Add evaluate_single_generation and analyze_generation methods to LLMJudgeDetector to evaluate generated text against prompts.
  • Implement default parameter validation and unified scoring logic for both content and generation analysis.
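
The shared validation and scoring helpers described above could look roughly like this (a minimal sketch, not the PR's actual code; the metric set, default metric, and helper names are assumptions based on the summary):

```python
from typing import Any, Dict

# Assumed placeholder; the real set comes from the Judge library's available metrics.
AVAILABLE_METRICS = {"relevance", "safety"}

def validate_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Fill defaults and reject unknown metrics (sketch of _validate_params)."""
    params = dict(params or {})
    if "metric" in params and params["metric"] not in AVAILABLE_METRICS:
        raise ValueError(f"Unknown metric: {params['metric']}")
    if "metric" not in params and "criteria" not in params:
        params["metric"] = "safety"  # assumed default when nothing is specified
    return params

def get_score(decision: Any, score: Any) -> float:
    """Prefer an explicit score; fall back to a numeric decision (sketch of _get_score)."""
    if score is not None:
        return float(score)
    if isinstance(decision, (int, float)):
        return float(decision)
    return 0.0  # the response schema requires a score, so default rather than fail
```

The same pair can then back both evaluate_single_content and evaluate_single_generation, which is the unification the bullet above describes.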

Enhancements:

  • Rename content analysis method run to analyze_content and extract common parameter validation and score extraction into private helper methods.
  • Unify detector instantiation and cleanup using FastAPI set_detector/get_detector/cleanup_detector and implement proper close methods for LLMJudge and HuggingFace detectors.

Tests:

  • Add a new test suite covering generation analysis scenarios, including basic metrics, full parameter sets, defaults, invalid metrics, numeric/None scores, and concurrency tests.
  • Update existing content analysis tests to reflect method renames (analyze_content) and verify detector closing behavior.

sourcery-ai bot commented Jul 7, 2025

Reviewer's Guide

This PR refactors the LLMJudgeDetector to centralize parameter validation and scoring, adds full support for generation analysis (new methods, schemas, and endpoint), updates API handlers to use app-level state management, extends test coverage for the new features and renamed methods, and introduces resource cleanup in the Huggingface detector.

Sequence diagram for /api/v1/text/generation endpoint request handling

sequenceDiagram
    actor User
    participant FastAPI_App as FastAPI App
    participant LLMJudgeDetector
    User->>FastAPI_App: POST /api/v1/text/generation
    FastAPI_App->>LLMJudgeDetector: analyze_generation(request)
    LLMJudgeDetector->>LLMJudgeDetector: evaluate_single_generation(prompt, generated_text, params)
    LLMJudgeDetector-->>FastAPI_App: GenerationAnalysisResponse
    FastAPI_App-->>User: GenerationAnalysisResponse
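
The flow in the diagram can be sketched as plain Python (stubs only; the real handler is a FastAPI route in detectors/llm_judge/app.py, and the request/response types here are simplified dicts):

```python
import asyncio

class LLMJudgeDetectorStub:
    """Stand-in for LLMJudgeDetector; the real class delegates to the Judge library."""
    async def analyze_generation(self, request: dict) -> dict:
        # The real method calls evaluate_single_generation(prompt, generated_text, params).
        return {"detection": "RELEVANT", "detection_type": "llm_judge", "score": 0.9}

# Stand-in for the app-level detector state introduced in this PR.
app_state = {"detector": LLMJudgeDetectorStub()}

async def generation_analysis_handler(request: dict) -> dict:
    detector = app_state.get("detector")
    if detector is None:
        raise RuntimeError("Detector is not initialized")
    return await detector.analyze_generation(request)

result = asyncio.run(generation_analysis_handler({
    "prompt": "What is Machine Learning?",
    "generated_text": "Deep Learning models learn by adjusting weights through backpropagation.",
    "detector_params": {"metric": "relevance"},
}))
```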

Class diagram for new and updated LLMJudgeDetector types and methods

classDiagram
    class LLMJudgeDetector {
        - judge: Judge
        - available_metrics: set
        + __init__()
        + _initialize_judge()
        + _validate_params(params: Dict[str, Any]) Dict[str, Any]
        + _get_score(result: EvaluationResult) float
        + evaluate_single_content(content: str, params: Dict[str, Any]) ContentAnalysisResponse
        + analyze_content(request: ContentAnalysisHttpRequest) ContentsAnalysisResponse
        + evaluate_single_generation(prompt: str, generated_text: str, params: Dict[str, Any]) GenerationAnalysisResponse
        + analyze_generation(request: GenerationAnalysisHttpRequest) GenerationAnalysisResponse
        + close()
    }
    class GenerationAnalysisHttpRequest {
        + prompt: str
        + generated_text: str
        + detector_params: Optional[Dict[str, Any]]
    }
    class GenerationAnalysisResponse {
        + detection: str
        + detection_type: str
        + score: float
        + evidences: Optional[List[EvidenceObj]]
        + metadata: dict
    }
    LLMJudgeDetector --> GenerationAnalysisHttpRequest
    LLMJudgeDetector --> GenerationAnalysisResponse
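
Rendered as plain Python, the two new schema types would look roughly like this (the project's scheme.py presumably uses Pydantic models; this dataclass sketch only mirrors the fields in the diagram, and EvidenceObj is replaced by Any):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class GenerationAnalysisHttpRequest:
    prompt: str
    generated_text: str
    detector_params: Optional[Dict[str, Any]] = None

@dataclass
class GenerationAnalysisResponse:
    detection: str
    detection_type: str
    score: float
    evidences: Optional[List[Any]] = None  # List[EvidenceObj] in the real schema
    metadata: dict = field(default_factory=dict)

resp = GenerationAnalysisResponse(
    detection="INACCURATE", detection_type="llm_judge", score=0.2
)
```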

File-Level Changes

Change Details Files
Extend LLMJudgeDetector with unified parameter handling and generation analysis support
  • Extract _validate_params and _get_score to consolidate defaulting and numeric decision handling
  • Refactor evaluate_single_content to use the new helpers
  • Add evaluate_single_generation and analyze_generation methods
  • Annotate judge attribute and adjust imports
detectors/llm_judge/detector.py
Expose new generation analysis endpoint and refactor content handler to use app state
  • Rename content handler to detector_content_analysis_handler and adjust call to analyze_content
  • Add /api/v1/text/generation POST handler invoking analyze_generation
  • Replace global detector_objects dict with app.set_detector/get_detector and cleanup methods
detectors/llm_judge/app.py
detectors/common/app.py
Define generation analysis request and response models
  • Add GenerationAnalysisHttpRequest with prompt, generated_text, detector_params
  • Add GenerationAnalysisResponse with detection, score, evidences, metadata
detectors/llm_judge/scheme.py
Update LLMJudgeDetector tests for generation features and renamed methods
  • Rename test classes/methods to reflect analyze_content
  • Replace calls to run() with analyze_content() in content tests
  • Add comprehensive tests for evaluate_single_generation and analyze_generation with various params
  • Extend performance tests to cover generation analysis concurrency
tests/detectors/llm_judge/test_llm_judge_detector.py
tests/detectors/llm_judge/test_performance.py
Add resource cleanup in Huggingface detector and adopt stateful lifecycle in its app
  • Implement close() method clearing model, tokenizer, and CUDA cache
  • Switch to app.set_detector/get_detector and call close during lifespan shutdown
detectors/huggingface/detector.py
detectors/huggingface/app.py
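
The Huggingface cleanup described above could be sketched like this (assumed names; the real implementation lives in detectors/huggingface/detector.py, and the torch import is guarded so the sketch also runs on CPU-only machines):

```python
import asyncio

try:
    import torch  # optional; only needed to release cached GPU memory
except ImportError:
    torch = None

class HuggingFaceDetectorSketch:
    """Sketch of the close() method added for the Huggingface detector."""
    def __init__(self):
        self.model = object()      # stand-in for the loaded HF model
        self.tokenizer = object()  # stand-in for the tokenizer

    async def close(self):
        # Drop references so the heavyweight objects can be garbage-collected...
        self.model = None
        self.tokenizer = None
        # ...and free cached GPU memory if CUDA is in use.
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()

detector = HuggingFaceDetectorSketch()
asyncio.run(detector.close())
```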

Possibly linked issues

  • #0: PR adds the /api/v1/text/generation endpoint and llm_judge generation analysis, fulfilling the issue.
  • #0: The PR adds the /api/v1/text/generation endpoint to enable LLM-as-a-judge detections from vLLM models.


@sourcery-ai sourcery-ai bot left a comment
Hey @saichandrapandraju - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `detectors/llm_judge/detector.py:69` </location>
<code_context>
+        """
+        Get the score from the evaluation result.
+        """
+        if isinstance(result.decision, (int, float)) or result.score is not None:
+            return float(result.score if result.score is not None else result.decision)
+        return 0.0 # FIXME: default to 0 because of non-optional field in schema
</code_context>

<issue_to_address>
Returning 0.0 as a default score may mask underlying issues.

Defaulting to 0.0 may conceal malformed results. Consider raising an exception or logging a warning to better detect such cases.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    def _get_score(self, result: EvaluationResult) -> float:
        """
        Get the score from the evaluation result.
        """
        if isinstance(result.decision, (int, float)) or result.score is not None:
            return float(result.score if result.score is not None else result.decision)
        return 0.0 # FIXME: default to 0 because of non-optional field in schema
=======
    def _get_score(self, result: EvaluationResult) -> float:
        """
        Get the score from the evaluation result.
        """
        import logging
        logger = logging.getLogger(__name__)

        if isinstance(result.decision, (int, float)) or result.score is not None:
            return float(result.score if result.score is not None else result.decision)
        logger.warning(
            "EvaluationResult missing valid score and decision: %r", result
        )
        raise ValueError("EvaluationResult does not contain a valid score or decision.")
>>>>>>> REPLACE

</suggested_fix>

### Comment 2
<location> `detectors/llm_judge/app.py:26` </location>
<code_context>
+    app.set_detector(LLMJudgeDetector())
+    yield
+    # Clean up resources
+    detector: LLMJudgeDetector = app.get_detector()
+    if detector and hasattr(detector, 'close'):
+        await detector.close()
</code_context>

<issue_to_address>
No check for detector existence before use.

Add a check to handle the case where 'get_detector()' returns None to prevent AttributeError.
</issue_to_address>


Comment on lines 48 to 50:

    detector: Detector = app.get_detector()
    if not detector:
        raise RuntimeError("Detector is not initialized")

issue (code-quality): We've found these issues:


        Returns:
            ContentAnalysisResponse with evaluation results
        Make sure the params have valid metric/criteria and scale.
        """
        if "metric" not in params:

issue (code-quality): We've found these issues:

@saichandrapandraju saichandrapandraju self-assigned this Jul 9, 2025
@ruivieira ruivieira changed the title RHOAIENG-28840: Support '/api/v1/text/generation' detections feat(RHOAIENG-28840): Support '/api/v1/text/generation' detections Jul 9, 2025
@ruivieira ruivieira added the enhancement New feature or request label Jul 9, 2025
@ruivieira ruivieira moved this to In Review in TrustyAI planning Jul 9, 2025
Labels
enhancement New feature or request
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

Enable llm_judge detection for '/api/v1/text/generation' endpoint
2 participants