Trace and span execution duration not accurate in Langfuse #11441
Replies: 1 comment 3 replies
Your issue stems from bypassing the Langfuse SDK's built-in timing mechanisms by queuing and batching observations asynchronously through SQS(1).

**The Core Problem**

The Langfuse SDK uses synchronous timestamps to track accurate latency(1). When you abstract away the `@observe` decorator, the SDK no longer records timestamps at the moment your functions actually run.

**Why Timing Is Inaccurate**

The execution duration shown in the Langfuse UI is calculated based on when observations are created and ended in the Langfuse system, not when your actual functions ran(1). Your batching workflow introduces delays between:

- when the function actually executes,
- when the message waits in the SQS queue, and
- when the batch processor finally creates the observation in Langfuse.
This means the timestamps Langfuse receives don't reflect your actual function execution times.

**Potential Solutions**

**1. Capture Timestamps at Function Execution Time**

Since you're building a custom SDK abstraction, you need to capture the actual start and end timestamps during function execution (before queuing to SQS) and include them in your queued messages. When you later create observations in Langfuse, you'll need to use these captured timestamps. However, the provided documentation doesn't show explicit methods to override start/end times when creating observations manually; the SDK's context managers and decorators handle timing automatically(2).

**2. Consider Using Manual Observations with Explicit Timing**

The SDK supports manual observation creation, but the provided documentation does not show whether explicit start/end times can be passed when doing so.

**3. Architectural Recommendation**

For accurate timing with your batching approach, you would need to:

- capture start/end timestamps at function execution time,
- carry those timestamps through your SQS messages, and
- apply them when creating the observations, if the SDK allows overriding timestamps.
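The capture step above can be sketched with a minimal decorator. This is an illustration, not the Langfuse SDK's API: the in-memory `QUEUE` list stands in for an SQS `send_message` call, and the decorator name `observed` is hypothetical.

```python
import functools
import json
import time

# Stand-in for the SQS producer; in a real deployment this would be a
# boto3 SQS client's send_message call against your queue URL.
QUEUE = []

def observed(func):
    """Record wall-clock start/end at the moment the function runs,
    then enqueue the message with those timestamps embedded."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            end = time.time()
            QUEUE.append(json.dumps({
                "name": func.__name__,
                "start_time": start,  # captured before any queuing delay
                "end_time": end,
            }))
    return wrapper

@observed
def generate_answer(prompt):
    time.sleep(0.05)  # simulated model call
    return prompt.upper()

generate_answer("hello")
msg = json.loads(QUEUE[0])
```

However long the message later sits in SQS, `msg["end_time"] - msg["start_time"]` preserves the real function duration.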
Unfortunately, the provided documentation doesn't explicitly cover how to set custom start/end times on observations, which is what you'd need for your use case.

**Regarding Downtime Scenarios**

You're correct that if Langfuse or its components are down and traces are sent later, timing could be affected(1). However, the SDK's design, with fully async requests and background processing, is meant to minimize latency impact(1).

**Recommendation**

I cannot find explicit documentation on overriding observation start/end times in the provided sources. Your best path forward would be to check the full SDK API reference for parameters that set observation timestamps explicitly.
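If the SDK version you use does accept explicit timestamps on manual observation creation (an assumption to verify against your SDK's API reference, not something the cited docs confirm), the consumer side reduces to converting the queued epoch floats back into datetimes. The helper name `to_span_kwargs` and the `start_time`/`end_time` parameter names are illustrative:

```python
from datetime import datetime, timezone

def to_span_kwargs(message):
    """Turn a queued message's captured epoch timestamps into the
    keyword arguments the batch consumer would pass when creating the
    observation. The start_time/end_time parameter names are an
    assumption; check your SDK version's span-creation signature."""
    return {
        "name": message["name"],
        "start_time": datetime.fromtimestamp(message["start_time"], tz=timezone.utc),
        "end_time": datetime.fromtimestamp(message["end_time"], tz=timezone.utc),
    }

kwargs = to_span_kwargs({
    "name": "generate_answer",
    "start_time": 1700000000.0,
    "end_time": 1700000001.5,
})
# The recorded duration is the captured 1.5 s, regardless of how long
# the message waited in SQS before the batch was processed.
```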
The SDK is designed to handle high-volume tracing efficiently without manual batching(1), so you might reconsider whether your SQS-based batching approach is necessary.
Describe your question
Hi,
We have a self-hosted Langfuse infrastructure (v3.124.1) on AWS EKS. In our GenAI applications, we abstract the Langfuse observe decorator behind a custom Python SDK that captures instrumentation messages (traces and spans) and sends them to an SQS queue; a Python process then consumes them in batches (say, 50 messages/spans) and forwards the traces to Langfuse, so the load on the Langfuse infrastructure stays manageable.
Because we are not sending traces directly from the GenAI application to Langfuse using the @observe decorator, the execution times (start and end clocks recorded when a span is written) shown in the Langfuse UI traces and spans are inaccurate: they reflect the duration of the whole workflow rather than just the function execution time (assuming a function corresponds to a span).
We are also unable to update the execution time at the trace or span level in the Langfuse UI with the accurate function run time that we capture in the metadata. Is there any way we can do this?
Also, if traces/spans are not sent to Langfuse immediately via the @observe decorator, or if Langfuse or its DB components are down, we may end up in the same situation where the execution times shown in nested traces/spans in the Langfuse UI are incorrect. Is there any way to overcome this scenario, or a workaround to override the execution time based on the metadata info?
Langfuse Cloud or Self-Hosted?
Self-Hosted
If Self-Hosted
3.124.1
If Langfuse Cloud
No response
SDK and integration versions
No response
Pre-Submission Checklist