Modify trace after getting response from OpenAI #2385
Replies: 3 comments 1 reply
-
I'm facing the same issue. Any luck with this?
-
Thanks for raising this. We will add functions to pull generations/traces that do not yet include a specific score, to facilitate this.
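Until such helpers exist, a client-side filter can approximate this. Below is a minimal sketch, assuming the traces have already been fetched and each one carries a list of score names; the dict shape and the `score_names` field are hypothetical stand-ins, not the actual Langfuse SDK response format:

```python
def traces_missing_score(traces, score_name):
    """Return only the traces that do not yet have a score with the given name.

    `traces` is assumed to be a list of dicts with a "score_names" list;
    this mirrors, but is not identical to, what the Langfuse API returns.
    """
    return [t for t in traces if score_name not in t.get("score_names", [])]


traces = [
    {"id": "t1", "score_names": ["hallucination"]},
    {"id": "t2", "score_names": []},
    {"id": "t3", "score_names": ["conciseness"]},
]
unscored = traces_missing_score(traces, "hallucination")
# unscored now contains only t2 and t3
```

The same filter could then feed directly into an evaluation loop so that already-scored traces are skipped.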
-
Update: The bulk of the functions below were copied from the docs, but I just realized I don't need to worry about fetching the latest generation/GPT response. Instead, I can pass the input/output directly after retrieving a response from OpenAI:

```python
eval_result = get_evaluator_for_key(criterion).evaluate_strings(
    prediction=output,
    input=input,
)
```

@marcklingen I have started doing this, basically:

```python
completion = openai.chat.completions.create(...)
current_trace_id = langfuse_context.get_current_trace_id()
current_trace = langfuse.Langfuse().fetch_trace(current_trace_id)
last_observation = current_trace.data.observations[-1]
execute_eval_and_score([last_observation])
```

The general idea is that you put these lines right after a completion is made. For others' reference, `execute_eval_and_score` is roughly:

```python
import os

from dotenv import load_dotenv
from langchain.evaluation import load_evaluator
from langchain.evaluation.criteria import LabeledCriteriaEvalChain
from langchain_openai import AzureOpenAI
from langfuse.decorators import observe, langfuse_context

load_dotenv()


def get_evaluator_for_key(key: str):
    llm = AzureOpenAI(
        temperature=0,
        model="gpt-35-turbo-instruct",
        azure_endpoint=os.getenv("OPENAI_API_ENDPOINT"),
        azure_deployment="gpt-35-turbo-instruct",
        api_key=os.getenv("OPENAI_API_KEY"),
    )
    return load_evaluator("criteria", criteria=key, llm=llm)


def get_hallucination_eval():
    criteria = {
        "hallucination": (
            "Does this submission contain information"
            " not present in the input or reference?"
        ),
    }
    llm = AzureOpenAI(temperature=0, model=os.environ.get("EVAL_MODEL"))
    return LabeledCriteriaEvalChain.from_llm(
        llm=llm,
        criteria=criteria,
    )


@observe(name="Evaluation and Scoring")
def execute_eval_and_score(generations):
    # LangChain eval types to run; flip an entry to False to skip it
    EVAL_TYPES = {
        "hallucination": True,
        "conciseness": True,
        "relevance": True,
        "coherence": True,
        "harmfulness": True,
        "maliciousness": True,
        "helpfulness": True,
        "controversiality": True,
        "misogyny": True,
        "criminality": True,
        "insensitivity": True,
    }
    for generation in generations:
        # hallucination needs a labeled/reference evaluator, so it is excluded here
        criteria = [key for key, value in EVAL_TYPES.items() if value and key != "hallucination"]
        for criterion in criteria:
            eval_result = get_evaluator_for_key(criterion).evaluate_strings(
                prediction=generation.output,
                input=generation.input,
            )
            print(eval_result)
            langfuse_context.score_current_observation(
                name=criterion,
                value=eval_result["score"],
                comment=eval_result["reasoning"],
            )
```
-
I'm using the Python SDK and struggling to understand how I can score things without duplication. What happens is: I use langfuse.get_generations() and score, but after that initial scoring, running langfuse.get_generations() again fetches all the same generations plus the scoring generations!
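One client-side workaround for the duplication is to remember which generation IDs have already been scored and filter them out of the next fetch. A minimal sketch follows; the generation dicts and the `already_scored` set are hypothetical illustrations (in practice the set would need to be persisted between runs), not a Langfuse SDK feature:

```python
def filter_unscored(generations, already_scored):
    """Keep only generations whose IDs have not been scored yet,
    and record the ones we are about to score."""
    fresh = [g for g in generations if g["id"] not in already_scored]
    already_scored.update(g["id"] for g in fresh)
    return fresh


already_scored = set()

# First fetch: everything is new, so both generations pass through.
batch1 = [{"id": "g1"}, {"id": "g2"}]
filter_unscored(batch1, already_scored)

# A second fetch returns the same generations plus a new one;
# only the new one should survive the filter.
batch2 = [{"id": "g1"}, {"id": "g2"}, {"id": "g3"}]
```

Filtering by ID also avoids re-scoring when the scoring run itself produces new observations that show up in later fetches.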
I want to either:
Edit: Another use case I am looking for is being able to update the trace after the initial request to OpenAI; I want to update its metadata based on the output.