Langfuse LLM-as-Judge: Can evaluator models use web search/tools during scoring? #11784
**Describe your question**

Hi Langfuse team — when running LLM-as-a-judge evaluations, can the judge model actually browse/search the web (or call tools/functions) during scoring? We're using LiteLLM with AWS Bedrock (Claude 4.5 Sonnet), and the judge appears to rely on stale priors instead of verifying facts, even when the prompt says "MUST use web search." Is tool/browsing access supported for judges, and if so, how do we enable/configure it? Or is passing retrieved context the recommended approach?

**Langfuse Cloud or Self-Hosted?**
Langfuse Cloud

**If Self-Hosted**
No response

**If Langfuse Cloud**
https://aix-lens-stage.cnbc.com/project/cmkg8fepf003plj07g28lk8zq

**SDK and integration versions**
No response

**Pre-Submission Checklist**
Based on the available documentation, Langfuse's LLM-as-a-Judge evaluators do not currently support web search or tool calling during the evaluation process (1). The evaluator works by sending a prompt to the configured LLM with variables populated from your trace data (input, output, ground_truth, etc.) (1), and the model is expected to return a structured output containing a score and reasoning (1).

Recommended approach: instead of relying on the judge model to search the web, pass any necessary context or retrieved information as part of the evaluation variables (1). When setting up your evaluator, you can map additional data from your traces using JSONPath expressions to include the context the judge needs for accurate scoring (1). For example, if you need the judge to verify facts, you would:

1. Retrieve the current, verified information in your own application (e.g., via a search or RAG step you control).
2. Log that information onto the trace, for example in trace metadata or as an observation output.
3. Map it into an evaluator variable with a JSONPath expression so the judge prompt can reference it directly.
This ensures the judge has access to current, verified information without needing real-time web-search capabilities.

Note: while gpt-5 with web search is mentioned in GitHub Ideas discussions (2), this appears to be a feature request rather than a currently available capability.

📚 Sources: Have another question? Just tag @inkeep.
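The logging step above can be sketched as follows. This is an illustrative sketch, not official Langfuse code: it assumes a Langfuse v2 Python SDK client (created with `Langfuse()`), whose low-level `trace()` method accepts `input`, `output`, and `metadata`. The `retrieve` and `generate` callables are placeholders for your own search and LiteLLM/Bedrock calls.

```python
# Sketch: attach retrieved, verified facts to the trace at generation time
# so the judge evaluator can map them in as a variable via a JSONPath
# expression such as $.metadata.retrieved_context.
# `client` is assumed to be a Langfuse v2 Python SDK client (Langfuse()).

def answer_and_log(client, question: str, retrieve, generate) -> str:
    facts = retrieve(question)          # your app-side web/RAG lookup
    answer = generate(question, facts)  # your LiteLLM/Bedrock completion
    client.trace(                       # v2 SDK low-level trace creation
        name="qa",
        input=question,
        output=answer,
        metadata={"retrieved_context": facts},
    )
    return answer
```

With this in place, the evaluator's variable mapping can point at `metadata.retrieved_context` and the judge never needs live browsing.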
Hi @aniket-versant
Langfuse's LLM-as-a-Judge evaluators currently do not support tool calling or web search during the evaluation process.
You can follow the Inkeep bot's approach (passing retrieved context as evaluation variables), or set up an external evaluation pipeline that fetches traces, runs your own tool-enabled judge, and writes scores back via the API.
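An external pipeline along those lines might look like the sketch below. This is a hedged illustration, not an official recipe: it assumes the Langfuse v2 Python SDK, where `fetch_traces()` and `score()` exist (the v3 SDK uses `create_score()` instead), and `run_judge` is a placeholder for your own tool-enabled LiteLLM/Bedrock call returning a dict with `score` and `reason` keys.

```python
# Hedged sketch of an external evaluation pipeline (assumes Langfuse v2 SDK).
# `run_judge` is a placeholder for a judge call with web-search tools enabled.

def build_judge_prompt(question, answer, retrieved_context):
    # Inline the verified context so the judge never needs live web search.
    return (
        "Grade the answer for factual accuracy using ONLY the context below.\n"
        f"Question: {question}\n"
        f"Answer under review: {answer}\n"
        f"Verified context:\n{retrieved_context}\n"
        "Return a score between 0 and 1 plus a one-sentence reason."
    )

def score_recent_traces(langfuse, run_judge, limit=50):
    for trace in langfuse.fetch_traces(limit=limit).data:
        meta = trace.metadata or {}
        prompt = build_judge_prompt(
            trace.input, trace.output, meta.get("retrieved_context", "")
        )
        verdict = run_judge(prompt)   # {"score": ..., "reason": ...}
        langfuse.score(               # write the result back to the trace
            trace_id=trace.id,
            name="factuality",
            value=verdict["score"],
            comment=verdict["reason"],
        )
```

Running this on a schedule (cron, queue worker) gives the judge full tool access while keeping scores visible in Langfuse alongside the traces.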
There is a related feature request for this; feel free to upvote it so that you're notified once we make this functionality available: https://github.com/orgs/langfuse/discussions/9346