Modify trace after getting response from OpenAI #2385
Replies: 3 comments 1 reply
-
I'm facing the same issue. Any luck with this?
-
Thanks for raising this. We will add functions to pull generations/traces that do not yet include a specific score, to facilitate this.
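Until such helpers exist, a client-side filter can approximate this. Below is a minimal sketch, assuming the traces have already been fetched and each one carries a list of score names; the dict shape and the `score_names` field are hypothetical stand-ins, not the actual Langfuse SDK response format:

```python
def traces_missing_score(traces, score_name):
    """Return only the traces that do not yet have a score with the given name.

    `traces` is assumed to be a list of dicts with a "score_names" list;
    this mirrors, but is not identical to, what the Langfuse API returns.
    """
    return [t for t in traces if score_name not in t.get("score_names", [])]


traces = [
    {"id": "t1", "score_names": ["hallucination"]},
    {"id": "t2", "score_names": []},
    {"id": "t3", "score_names": ["conciseness"]},
]
unscored = traces_missing_score(traces, "hallucination")
# unscored now contains only t2 and t3
```

The same filter could then feed directly into an evaluation loop so that already-scored traces are skipped.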
-
Update: The bulk of the functions below were copied from the docs, but I just realized I don't need to worry about fetching the latest generation/GPT response. Instead, I can pass the input/output directly after retrieving a response from OpenAI:

```python
eval_result = get_evaluator_for_key(criterion).evaluate_strings(
    prediction=output,
    input=input,
)
```

@marcklingen I have started doing this, basically:

```python
completion = openai.chat.completions.create(...)
current_trace_id = langfuse_context.get_current_trace_id()
current_trace = langfuse.Langfuse().fetch_trace(current_trace_id)
last_observation = current_trace.data.observations[-1]
execute_eval_and_score([last_observation])
```

The general idea is that you put these lines right after a completion is made. For others' reference, `execute_eval_and_score` is roughly:

```python
import os

from dotenv import load_dotenv
from langchain.evaluation import load_evaluator
from langchain.evaluation.criteria import LabeledCriteriaEvalChain
from langchain_openai import AzureOpenAI
from langfuse.decorators import observe, langfuse_context

load_dotenv()


def get_evaluator_for_key(key: str):
    llm = AzureOpenAI(
        temperature=0,
        model="gpt-35-turbo-instruct",
        azure_endpoint=os.getenv("OPENAI_API_ENDPOINT"),
        azure_deployment="gpt-35-turbo-instruct",
        api_key=os.getenv("OPENAI_API_KEY"),
    )
    return load_evaluator("criteria", criteria=key, llm=llm)


def get_hallucination_eval():
    criteria = {
        "hallucination": (
            "Does this submission contain information"
            " not present in the input or reference?"
        ),
    }
    llm = AzureOpenAI(temperature=0, model=os.environ.get("EVAL_MODEL"))
    return LabeledCriteriaEvalChain.from_llm(
        llm=llm,
        criteria=criteria,
    )


@observe(name="Evaluation and Scoring")
def execute_eval_and_score(generations):
    # LangChain eval types to run; flip an entry to False to skip it
    EVAL_TYPES = {
        "hallucination": True,
        "conciseness": True,
        "relevance": True,
        "coherence": True,
        "harmfulness": True,
        "maliciousness": True,
        "helpfulness": True,
        "controversiality": True,
        "misogyny": True,
        "criminality": True,
        "insensitivity": True,
    }
    for generation in generations:
        # hallucination needs a labeled/reference evaluator, so it is excluded here
        criteria = [key for key, value in EVAL_TYPES.items() if value and key != "hallucination"]
        for criterion in criteria:
            eval_result = get_evaluator_for_key(criterion).evaluate_strings(
                prediction=generation.output,
                input=generation.input,
            )
            print(eval_result)
            langfuse_context.score_current_observation(
                name=criterion,
                value=eval_result["score"],
                comment=eval_result["reasoning"],
            )
```
-
I'm using the Python SDK and struggling to understand how I can score things without duplication. What happens is: I use langfuse.get_generations() and score, but after that initial scoring, running langfuse.get_generations() again fetches all the same generations plus the scoring generations!
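One client-side workaround for the duplication is to remember which generation IDs have already been scored and filter them out of the next fetch. A minimal sketch follows; the generation dicts and the `already_scored` set are hypothetical illustrations (in practice the set would need to be persisted between runs), not a Langfuse SDK feature:

```python
def filter_unscored(generations, already_scored):
    """Keep only generations whose IDs have not been scored yet,
    and record the ones we are about to score."""
    fresh = [g for g in generations if g["id"] not in already_scored]
    already_scored.update(g["id"] for g in fresh)
    return fresh


already_scored = set()

# First fetch: everything is new, so both generations pass through.
batch1 = [{"id": "g1"}, {"id": "g2"}]
filter_unscored(batch1, already_scored)

# A second fetch returns the same generations plus a new one;
# only the new one should survive the filter.
batch2 = [{"id": "g1"}, {"id": "g2"}, {"id": "g3"}]
```

Filtering by ID also avoids re-scoring when the scoring run itself produces new observations that show up in later fetches.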
I want to either:
Edit: Another use case I am looking for is being able to update the trace after the initial request to OpenAI; I want to update its metadata based on the output.