@jp-agenta
Member

fix-/-llm-as-a-judge-bugs-and-regressions

Copilot AI review requested due to automatic review settings November 12, 2025 09:02
@vercel

vercel bot commented Nov 12, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| agenta-documentation | Ready | Preview | Comment | Nov 12, 2025 9:03am |

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


GitHub CI does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Already signed the CLA but the status is still pending? Let us recheck it.

@bekossy bekossy changed the title fix-/-llm-as-a-judge-bugs-and-regressions [LLM as a judge]: Bugs and Regressions Nov 12, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes bugs and regressions in the LLM-as-a-judge functionality. It reverts the version from 0.62.1 to 0.62.0, renames the score field to correctness in the evaluator JSON schemas, migrates DebugSection from Monaco Editor to the SharedEditor component, and restores error raising in the Python SDK.

  • Renamed score to correctness in JSON schema generation and parsing for evaluators
  • Migrated DebugSection from Monaco Editor to SharedEditor with proper props and configuration
  • Restored error raising behavior in field_match_test_v0 instead of returning success: false
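
To make the last bullet concrete, here is a minimal sketch of the two behaviours being contrasted: returning success: false versus raising. It is not the code of field_match_test_v0 in sdk/agenta/sdk/workflows/handlers.py, which this summary does not show. The function names, the EvaluatorInputError type, and the choice of which failures count as errors are all assumptions made for illustration.

```python
import json
from typing import Any


class EvaluatorInputError(ValueError):
    """Hypothetical error type; the SDK's real exception classes are not shown here."""


def field_match_returning(output: str, field: str, expected: Any) -> dict[str, Any]:
    """Regressed style (assumed): hide malformed input behind success: False."""
    try:
        value = json.loads(output)[field]
    except (json.JSONDecodeError, KeyError, TypeError):
        # A broken output is indistinguishable from a genuine mismatch.
        return {"success": False}
    return {"success": value == expected}


def field_match_raising(output: str, field: str, expected: Any) -> dict[str, Any]:
    """Restored style (assumed): malformed input surfaces as an exception."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError as exc:
        raise EvaluatorInputError("output is not valid JSON") from exc
    if not isinstance(parsed, dict) or field not in parsed:
        raise EvaluatorInputError(f"field {field!r} is missing from the output")
    # Only a well-formed output with a differing value yields success: False.
    return {"success": parsed[field] == expected}
```

With the raising variant, a caller can tell "the evaluator ran and the field did not match" apart from "the evaluator could not run on this input", which the returning variant collapses into a single result.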

Reviewed Changes

Copilot reviewed 13 out of 15 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| web/package.json | Reverted version from 0.62.1 to 0.62.0 |
| web/oss/package.json | Reverted version from 0.62.1 to 0.62.0 |
| web/ee/package.json | Reverted version from 0.62.1 to 0.62.0 |
| sdk/pyproject.toml | Reverted version from 0.62.1 to 0.62.0 |
| api/pyproject.toml | Reverted version from 0.62.1 to 0.62.0 |
| web/oss/src/styles/code-editor-styles.css | Added CSS support for hiding line numbers with no-line-numbers class |
| web/oss/src/components/pages/evaluations/autoEvaluation/EvaluatorsModal/ConfigureEvaluator/JSONSchema/JSONSchemaGenerator.ts | Renamed score to correctness in JSON schema properties and parsing logic |
| web/oss/src/components/pages/evaluations/autoEvaluation/EvaluatorsModal/ConfigureEvaluator/DebugSection.tsx | Migrated from Monaco Editor to SharedEditor component with proper configuration |
| web/oss/src/components/Playground/Components/SharedEditor/types.d.ts | Added antdInputProps type definition for textarea support |
| web/oss/src/components/Playground/Components/SharedEditor/index.tsx | Implemented antdInputProps support with textarea/input conditional rendering |
| web/oss/src/components/Editor/types.d.ts | Added showLineNumbers prop to EditorProps interface |
| web/oss/src/components/Editor/Editor.tsx | Implemented showLineNumbers prop with CSS class support |
| sdk/agenta/sdk/workflows/handlers.py | Restored error raising instead of returning success: false |
| api/oss/src/services/converters.py | Added db_manager import |
| api/oss/src/resources/evaluators/evaluators.py | Updated schema to use correctness instead of score |
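
The score to correctness rename touches both the schema definition and the code that reads the judge's reply, so the two have to agree on the field name. The sketch below shows that coupling under stated assumptions: the schema shape, the 0 to 1 range, and the helper names build_judge_schema and read_judge_output are hypothetical and are not taken from evaluators.py or JSONSchemaGenerator.ts.

```python
import json
from typing import Any

# Assumed field name after the rename described in this PR.
FIELD_NAME = "correctness"


def build_judge_schema(field_name: str = FIELD_NAME) -> dict[str, Any]:
    """Build a hypothetical JSON schema for the judge's structured output."""
    return {
        "type": "object",
        "properties": {
            # The numeric type and range are assumptions for illustration.
            field_name: {"type": "number", "minimum": 0, "maximum": 1},
        },
        "required": [field_name],
        "additionalProperties": False,
    }


def read_judge_output(raw: str, field_name: str = FIELD_NAME) -> float:
    """Parse the judge's JSON reply and return the value of the renamed field."""
    payload = json.loads(raw)
    if not isinstance(payload, dict) or field_name not in payload:
        # A reply that still uses the old "score" key ends up here.
        raise KeyError(f"judge output has no {field_name!r} field")
    return float(payload[field_name])
```

Sharing one constant between the schema builder and the parser is a design choice of this sketch: it keeps a future rename from updating one side and not the other.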
Comments suppressed due to low confidence (1)

api/oss/src/services/converters.py:5

  • The import of db_manager is added but not visible in the diff context where it's used. Ensure there's test coverage for the code path that uses this import to prevent future regressions.
from oss.src.services import db_manager

@bekossy bekossy closed this Nov 13, 2025

4 participants