Help with the LLM As A Judge Evaluation #11937

rohitkumar-vc · 2026-02-09T12:14:14Z


Describe your question

I have created a FastAPI application that contains nested LLM calls, and I want to run an evaluator on those nested LLM calls. For a better understanding of the call structure, please refer to the image:

Here, the nested function calls the joke function, which then calls the LLM. Later, the nested function calls the poetry function, which in turn calls the LLM again. I want to run the evaluation on the LLM calls named test (the name I set via run_name in the LangChain config).
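For readers without the image, the call structure described above can be sketched roughly as follows. This is a stand-in under stated assumptions, not the original application: FastAPI and LangChain are left out, fake_llm replaces the real model call, and the Observation and Trace classes are hypothetical helpers that only record what a tracing tool would capture.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Observation:
    type: str               # "SPAN" or "GENERATION" (hypothetical labels)
    name: str
    parent: Optional[str]   # name of the enclosing function, if any
    output: str = ""


@dataclass
class Trace:
    observations: List[Observation] = field(default_factory=list)


def fake_llm(prompt: str) -> str:
    # Stand-in for the real LLM call; returns a canned string.
    return f"response to: {prompt}"


def call_joke_llm(trace: Trace) -> str:
    output = fake_llm("Tell me a joke")
    # Each LLM call is recorded as a generation named "test".
    trace.observations.append(Observation("GENERATION", "test", "joke", output))
    return output


def call_poetry_llm(trace: Trace) -> str:
    output = fake_llm("Write a short poem")
    trace.observations.append(Observation("GENERATION", "test", "poetry", output))
    return output


def joke(trace: Trace) -> str:
    trace.observations.append(Observation("SPAN", "joke", "nested"))
    return call_joke_llm(trace)


def poetry(trace: Trace) -> str:
    trace.observations.append(Observation("SPAN", "poetry", "nested"))
    return call_poetry_llm(trace)


def nested(trace: Trace) -> dict:
    # Top-level function: one trace, two nested LLM calls.
    trace.observations.append(Observation("SPAN", "nested", None))
    return {"joke": joke(trace), "poetry": poetry(trace)}


trace = Trace()
nested(trace)
generations = [
    o for o in trace.observations if o.type == "GENERATION" and o.name == "test"
]
print(len(generations))  # prints 2
```

A single trace therefore contains two generations with the same name, which is the case the evaluator needs to handle.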

Each trace can contain multiple test observations (LLM calls). For each trace, I want to be able to run the evaluator on every LLM call within it.

So far, I have tried selecting Generation as the object type and test as the name. This works, but it only considers the last LLM call of the trace for the evaluation.

Langfuse Cloud or Self-Hosted?

Langfuse Cloud

If Self-Hosted

No response

If Langfuse Cloud

No response

SDK and integration versions

No response

Pre-Submission Checklist

I have checked for existing issues/discussions and consulted Langfuse AI.

jannikmaierhoefer · 2026-02-09T12:14:54Z

inkeep-ai-assistant[bot]
bot Feb 9, 2026

To evaluate each nested LLM call separately in your trace, you should use observation-level evaluators instead of trace-level evaluators(1).

Based on your setup where you have multiple LLM generations (named "test") within a single trace, here's what you need to do:

Solution: Use Live Observations Evaluation

When setting up your LLM-as-a-Judge evaluator, select "Live Observations" as your evaluation target instead of "Live Traces"(1). This allows you to:

- Target specific observations: Filter by observation type (Generation) and observation name ("test")(1)
- Evaluate each LLM call separately: Each generation observation will be evaluated independently, not just the last one(1)
- Apply flexible filtering: Combine trace-level and observation-level filters to narrow down exactly which observations to evaluate(1)

Configuration Steps

1. Navigate to the Evaluators page and create a new evaluator(1)
2. When choosing which data to evaluate, select "Live Observations"(1)
3. Set filters:
   - Object Type: Generation(1)
   - Object Name: "test" (or whatever name you've given your LLM generations)(1)
4. Map your variables (input/output) from the observation properties(1)
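The effect of the filters above can be illustrated with a small, self-contained sketch. It is a conceptual model only, not Langfuse code: the Observation class, matches, toy_judge, and evaluate_observations are hypothetical names, and toy_judge is a placeholder for the real LLM-as-a-Judge call.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Observation:
    trace_id: str
    type: str
    name: str
    input: str
    output: str


def matches(obs: Observation, obs_type: str = "GENERATION", name: str = "test") -> bool:
    # Mirrors the two filters from the steps above: object type and object name.
    return obs.type == obs_type and obs.name == name


def toy_judge(input_text: str, output_text: str) -> float:
    # Placeholder for an LLM judge: scores 1.0 when the output is non-empty.
    return 1.0 if output_text.strip() else 0.0


def evaluate_observations(observations: List[Observation]) -> List[Tuple[str, str, float]]:
    # One score per matching observation, not one score per trace.
    return [
        (obs.trace_id, obs.input, toy_judge(obs.input, obs.output))
        for obs in observations
        if matches(obs)
    ]


observations = [
    Observation("trace-1", "SPAN", "nested", "", ""),
    Observation("trace-1", "GENERATION", "test", "Tell me a joke", "Why did the ..."),
    Observation("trace-1", "GENERATION", "test", "Write a short poem", "Roses are ..."),
]

scores = evaluate_observations(observations)
for trace_id, input_text, score in scores:
    print(trace_id, input_text, score)
```

Both generations named "test" in trace-1 receive their own score, while the span is skipped by the type filter.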

Important Note for SDK Users

If you're using the OpenTelemetry-based SDKs (Python v3+ or JS/TS v4+), observation-level evaluators are the recommended approach(1). If you need to filter observations by trace-level attributes like userId, sessionId, or tags, you must use propagate_attributes() in your instrumentation code to make those attributes available on observations(1).
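The idea behind attribute propagation can be illustrated with a standard-library stand-in. This is not the Langfuse implementation or its API: propagate_attributes_sketch and record_observation are hypothetical names, and the sketch only shows why attributes set once on the surrounding context become visible on every observation created inside it. Check the SDK documentation for the actual propagate_attributes() signature.

```python
import contextvars
from contextlib import contextmanager

# Holds the attributes that should be copied onto new observations.
_propagated = contextvars.ContextVar("propagated_attrs", default={})


@contextmanager
def propagate_attributes_sketch(**attrs):
    # Merge the new attributes into the current context for the duration
    # of the with-block, then restore the previous state.
    token = _propagated.set({**_propagated.get(), **attrs})
    try:
        yield
    finally:
        _propagated.reset(token)


def record_observation(name: str, obs_type: str) -> dict:
    # Every observation created here inherits the propagated attributes.
    return {"name": name, "type": obs_type, **_propagated.get()}


with propagate_attributes_sketch(user_id="user-123", session_id="session-abc"):
    inside = record_observation("test", "GENERATION")

outside = record_observation("test", "GENERATION")

print(inside)
print(outside)
```

The observation created inside the block carries user_id and session_id, so an observation-level filter on those attributes can match it; the one created outside does not.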

This approach will evaluate both of your "test" generations (the one in call_joke_llm and the one in call_poetry_llm) separately, rather than just the last one.

📚 Sources:

(1) LLM-as-a-Judge


1 reply

jannikmaierhoefer Feb 10, 2026
Maintainer

Hi @rohitkumar-vc, inkeep is correct. Currently the evaluator can only target the first generation with the same name in a trace. However, we are releasing observation-level evals in the next couple of days that allow you to evaluate both LLM observations. Please keep an eye on our changelog for this.
