Add custom endpoint URL support for Judge LLM in evaluations #3282

@gorkem-bwl

Description

The Judge LLM in LLM Evaluations currently only supports cloud API providers (OpenAI, Anthropic, Google, xAI, Mistral, OpenRouter, etc.). Customers with internally hosted models (Ollama, vLLM, TGI, or any OpenAI-compatible inference server) cannot use those models as the Judge LLM.

As of now, the model being evaluated supports an endpointUrl field for custom/local endpoints, but the Judge LLM config only accepts provider, model, and apiKey, with no endpoint URL option. Users should be able to provide a custom endpoint URL for the Judge LLM so that self-hosted models can serve as the judge in evaluations.
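To make the gap concrete, here is a sketch of the two config shapes as Python dicts. The field names (provider, model, apiKey, endpointUrl) come from the issue text; the example values (llama3, the localhost URL) are hypothetical, and the real config object lives in the TypeScript frontend.

```python
# Current judge config: cloud provider only.
judge_llm = {
    "provider": "openai",
    "model": "gpt-4o",
    "apiKey": "sk-...",
}

# Proposed judge config: optional endpointUrl for any
# OpenAI-compatible self-hosted server (Ollama, vLLM, TGI).
judge_llm_custom = {
    "provider": "openai",           # OpenAI-compatible wire format
    "model": "llama3",              # whatever model the server exposes
    "apiKey": "not-needed-locally",
    "endpointUrl": "http://localhost:11434/v1",  # e.g. an Ollama server
}
```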

Files that need changes:

FE:

  • Clients/src/presentation/pages/EvalsDashboard/NewExperimentModal.tsx — Add optional endpointUrl field to the judgeLlm config object and render
    an input for it
  • Related TypeScript interfaces for the judge config

BE:

  • EvalServer/src/utils/run_evaluation.py (~line 145-172) — Extract endpointUrl from judge config and set OPENAI_API_BASE environment variable when present
  • EvaluationModule/src/deepeval_engine/deepeval_evaluator.py — Update get_judge_llm() and CustomDeepEvalLLM to accept and pass base_url
  • EvaluationModule/src/deepeval_engine/model_runner.py — Update _setup_openai() to use base_url parameter when OPENAI_API_BASE is set
  • EvaluationModule/scorers/judge_runner.py — Pass base_url when creating OpenAI client for judge scoring
  • EvaluationModule/scorers/provider_registry.py — Allow custom models when a custom endpoint is provided (relax the hardcoded model allowlist)
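A minimal sketch of the backend change described above, roughly what run_evaluation.py could do when it sees an endpointUrl in the judge config. The function name, config keys, and the exact settings dict are assumptions for illustration; only the OPENAI_API_BASE environment variable and the base_url idea come from the issue.

```python
import os


def apply_judge_endpoint(judge_config: dict) -> dict:
    """Build judge LLM settings, honoring an optional custom endpoint.

    Hypothetical helper: field names mirror the issue's description
    (provider, model, apiKey, endpointUrl); the real config handling
    in run_evaluation.py may be shaped differently.
    """
    settings = {
        "provider": judge_config.get("provider", "openai"),
        "model": judge_config.get("model"),
        "api_key": judge_config.get("apiKey"),
    }
    endpoint = judge_config.get("endpointUrl")
    if endpoint:
        # OpenAI-compatible servers accept the standard client pointed
        # at a custom base URL; downstream code (get_judge_llm,
        # CustomDeepEvalLLM, _setup_openai) would read base_url or
        # OPENAI_API_BASE and pass it through to the OpenAI client.
        settings["base_url"] = endpoint
        os.environ["OPENAI_API_BASE"] = endpoint
    return settings
```

With this in place, provider_registry.py could relax its model allowlist whenever settings contains a base_url, since the model names exposed by a self-hosted server cannot be known in advance.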
