[LLM as a judge]: Bugs and Regressions #2919

jp-agenta · 2025-11-12T09:02:57Z

fix-/-llm-as-a-judge-bugs-and-regressions

vercel · 2025-11-12T09:03:01Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
agenta-documentation	Ready	Preview	Comment	Nov 12, 2025 9:03am

CLAassistant · 2025-11-12T09:03:10Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

GitHub CI does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copilot

Pull Request Overview

This PR fixes bugs and regressions in the LLM-as-a-judge functionality by reverting the version from 0.62.1 to 0.62.0, renaming the score field to correctness in JSON schemas, migrating from Monaco Editor to SharedEditor component, and restoring proper error handling in the Python SDK.

Renamed score to correctness in JSON schema generation and parsing for evaluators
Migrated DebugSection from Monaco Editor to SharedEditor with proper props and configuration
Restored error raising behavior in field_match_test_v0 instead of returning success: false
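To make the last item concrete, here is a minimal Python sketch of the difference between returning `success: false` and raising. It is not the code in `sdk/agenta/sdk/workflows/handlers.py`: the function names, the `FieldMatchError` type, and the choice of a missing field as the failure condition are all assumptions made for illustration.

```python
class FieldMatchError(ValueError):
    """Hypothetical error type for this sketch; not the SDK's actual exception."""


def field_match_returning(outputs: dict, expected: dict, field: str) -> dict:
    # Style being reverted (assumed shape): a missing field is folded into
    # an ordinary result, so callers cannot tell it apart from a mismatch.
    if field not in outputs:
        return {"success": False}
    return {"success": outputs[field] == expected.get(field)}


def field_match_raising(outputs: dict, expected: dict, field: str) -> dict:
    # Restored style (assumed shape): a missing field is reported as an
    # error, and only a real comparison produces a success flag.
    if field not in outputs:
        raise FieldMatchError(f"field {field!r} not found in outputs")
    return {"success": outputs[field] == expected.get(field)}
```

With the raising variant, a caller can distinguish "the evaluator ran and the values differ" from "the evaluator could not run", which a bare `success: false` hides.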

Reviewed Changes

Copilot reviewed 13 out of 15 changed files in this pull request and generated no comments.

File	Description
web/package.json	Reverted version from 0.62.1 to 0.62.0
web/oss/package.json	Reverted version from 0.62.1 to 0.62.0
web/ee/package.json	Reverted version from 0.62.1 to 0.62.0
sdk/pyproject.toml	Reverted version from 0.62.1 to 0.62.0
api/pyproject.toml	Reverted version from 0.62.1 to 0.62.0
web/oss/src/styles/code-editor-styles.css	Added CSS support for hiding line numbers with no-line-numbers class
web/oss/src/components/pages/evaluations/autoEvaluation/EvaluatorsModal/ConfigureEvaluator/JSONSchema/JSONSchemaGenerator.ts	Renamed score to correctness in JSON schema properties and parsing logic
web/oss/src/components/pages/evaluations/autoEvaluation/EvaluatorsModal/ConfigureEvaluator/DebugSection.tsx	Migrated from Monaco Editor to SharedEditor component with proper configuration
web/oss/src/components/Playground/Components/SharedEditor/types.d.ts	Added antdInputProps type definition for textarea support
web/oss/src/components/Playground/Components/SharedEditor/index.tsx	Implemented antdInputProps support with textarea/input conditional rendering
web/oss/src/components/Editor/types.d.ts	Added showLineNumbers prop to EditorProps interface
web/oss/src/components/Editor/Editor.tsx	Implemented showLineNumbers prop with CSS class support
sdk/agenta/sdk/workflows/handlers.py	Restored error raising instead of returning success: false
api/oss/src/services/converters.py	Added db_manager import
api/oss/src/resources/evaluators/evaluators.py	Updated schema to use correctness instead of score
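The `score` to `correctness` rename touches both schema generation and parsing, so the two sides have to agree on the key. The sketch below shows that pairing in Python under stated assumptions: `build_judge_schema` and `parse_judge_output` are hypothetical names, and the numeric 0 to 1 range is illustrative rather than taken from the PR.

```python
import json


def build_judge_schema(minimum: float = 0.0, maximum: float = 1.0) -> dict:
    # Hypothetical schema builder: the judge's verdict lives under
    # "correctness" (previously "score" in the regressed version).
    return {
        "type": "object",
        "properties": {
            "correctness": {
                "type": "number",
                "minimum": minimum,
                "maximum": maximum,
            },
        },
        "required": ["correctness"],
        "additionalProperties": False,
    }


def parse_judge_output(raw: str) -> float:
    # Hypothetical parser: reads the same key the schema requires and
    # fails loudly if the model answered with the old field name.
    data = json.loads(raw)
    if "correctness" not in data:
        raise KeyError("judge output has no 'correctness' field")
    return float(data["correctness"])
```

If only one side is renamed, a schema-conformant judge response would fail to parse, which is the kind of mismatch this rename has to avoid.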

Comments suppressed due to low confidence (1)

api/oss/src/services/converters.py:5

The import of db_manager is added but not visible in the diff context where it's used. Ensure there's test coverage for the code path that uses this import to prevent future regressions.

from oss.src.services import db_manager

fix-/-llm-as-a-judge-bugs-and-regressions

015145f

Copilot AI review requested due to automatic review settings November 12, 2025 09:02

bekossy changed the title ~~fix-/-llm-as-a-judge-bugs-and-regressions~~ [LLM as a judge]: Bugs and Regressions Nov 12, 2025

Copilot started reviewing on behalf of jp-agenta November 12, 2025 09:03

vercel bot deployed to Preview November 12, 2025 09:03

Copilot finished reviewing on behalf of jp-agenta November 12, 2025 09:04

Copilot AI reviewed Nov 12, 2025

bekossy closed this Nov 13, 2025

4 participants
