Skipping an item in LLM-as-a-judge eval #11994
Describe your question
I am using an LLM-as-a-judge evaluator for a particular metric when evaluating a dataset. However, not all items may be valid to be scored by that evaluator. With SDK evaluators, I know it is possible to "skip" an item via the evaluator's return value. How do I do this for an LLM-as-a-judge evaluator?

Langfuse Cloud or Self-Hosted?
Self-Hosted

If Self-Hosted
v3.148.0

If Langfuse Cloud
No response

SDK and integration versions
No response
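For reference, a minimal sketch of the SDK-side pattern the question alludes to: a custom evaluator function that declines to score certain items. The evaluator name, the item fields, and the `None`-as-skip convention are illustrative assumptions, not the exact Langfuse SDK API:

```python
# Hypothetical custom evaluator: scores output conciseness against a
# reference answer, but declines to score items that have no reference.
def conciseness_evaluator(item: dict):
    if not item.get("expected_output"):
        return None  # sentinel meaning "no score for this item"
    # Ratio of reference length to output length, capped at 1.0.
    return min(1.0, len(item["expected_output"]) / max(len(item["output"]), 1))

items = [
    {"output": "a" * 10, "expected_output": "a" * 5},
    {"output": "b" * 4},  # no reference answer -> should be skipped
]

# The harness keeps only items for which the evaluator produced a score.
scores = [s for s in (conciseness_evaluator(it) for it in items) if s is not None]
print(scores)  # one score; the second item was skipped
```

The question is about achieving this same item-level opt-out with the managed LLM-as-a-Judge feature, where no such return value exists.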
Based on the available documentation, there isn't a direct way to skip individual items in LLM-as-a-Judge evaluators the way you can with an SDK evaluator's return value. LLM-as-a-Judge evaluators work differently from SDK evaluators: when you set one up, you configure it to run on specific data by choosing filters (trace name, tags, userId, metadata, etc.) and variable mappings. The evaluator then processes all matched items according to these filters.

Workarounds you could consider: tighten the evaluator's filters so that only items valid for scoring are matched, or separate the data so the evaluator only sees scoreable items.

The LLM-as-a-Judge feature is designed to run evaluations at scale on filtered data sets, but it doesn't currently support item-level conditional skipping within the evaluator itself.

Have another question? Just tag @inkeep.
Hi @sidhi-ts, this functionality is currently not supported. What we recommend in this case is to split up your datasets according to what should be evaluated. You can then configure your evaluators to only run on the datasets that are relevant.
If this would not work for you, feel free to share more details about your use case so we can consider it as a feature request :)
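The recommended workaround above amounts to partitioning your items before creating datasets, so the evaluator can be scoped to only the dataset that should be judged. A sketch of that partition step — the `needs_judge` flag and the two-dataset split are hypothetical examples, not a Langfuse API:

```python
# Items tagged (by whatever domain logic applies) with whether the
# LLM-as-a-Judge evaluator should score them.
items = [
    {"input": "q1", "needs_judge": True},
    {"input": "q2", "needs_judge": False},
    {"input": "q3", "needs_judge": True},
]

# Partition into two groups before uploading.
judged = [it for it in items if it["needs_judge"]]
unjudged = [it for it in items if not it["needs_judge"]]

# Upload `judged` to the dataset your LLM-as-a-Judge evaluator is
# configured to run on, and `unjudged` to a separate dataset with no
# evaluator attached.
print(len(judged), len(unjudged))  # 2 1
```

This keeps the evaluator configuration simple: instead of per-item skip logic, the dataset boundary itself encodes which items are eligible for scoring.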