Replies: 2 comments
Intermittent timeouts during ...

Infrastructure Optimization

Backend Resource Scaling: The minimum spec of 2 vCPU/4 GiB for web containers is often insufficient under load (1). Scale up your web container resources and monitor performance during peak API request periods (1).

S3/Blob Storage Performance: If S3 uploads are slow or throttled, consider increasing ...

Redis and ClickHouse: High Redis CPU or ClickHouse connection limits can block API requests (1). Use at least 4 CPUs for Redis and enable cluster mode if needed (1). For ClickHouse, increase resources or reduce batch size/intervals if you experience performance issues under load (1).

Load Balancer Configuration: Ensure your keep-alive timeout is at least 5 seconds higher than the load balancer idle timeout (1). This prevents premature connection closures that can cause timeouts.

API Usage Best Practices

Retry Strategy: The Python SDK provides strongly-typed access to the public REST API via the ...

Batch Operations: Instead of making individual ...

Alternative Approach - Metrics API: For aggregating cost data across multiple traces, the Metrics API may be more efficient than individual trace lookups (2). It provides aggregated analytics and metrics with customizable dimensions, metrics, filters, and time granularity (2).

Debugging Steps

Enable debug logging with ...

Check the health of your Langfuse instance using the built-in endpoints (3): ...

If you continue experiencing timeouts, review your ClickHouse and Redis connectivity from the web container, as backend processing bottlenecks often involve these components (1)(3).
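
The retry advice above is cut off in this thread, so here is a minimal sketch of one common approach: exponential backoff with full jitter around an arbitrary callable. The helper name `call_with_retry` and all of its parameters are hypothetical and not part of the Langfuse SDK. The thread does not say which exception types the SDK raises on a timeout, so `retry_on` defaults to `Exception` and should be narrowed once you know them.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with jittered backoff.

    Hypothetical helper for illustration; not a Langfuse API.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            # Backoff ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time in [0, ceiling] so that
            # parallel callers do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

A call from the question could then be wrapped as `call_with_retry(lambda: langfuse.api.trace.get(trace_id))`, assuming `langfuse` is an already-configured client and `trace_id` is the ID being looked up. The `sleep` parameter exists only so the wait can be replaced in tests.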
Hi @atul-alf, thanks. I would recommend testing our new observations endpoint (v2) for this use case, as we improved performance and reliability here: https://langfuse.com/docs/api-and-data-platform/features/observations-api
Describe your question
We are currently implementing logic to calculate the total execution cost of our agents, including all associated sub-tasks. Our current approach iterates through the sub-tasks and uses the langfuse.api.trace.get method to retrieve and aggregate total_costs.
We have observed intermittent timeouts during these API requests, which are affecting the reliability of our cost computations.
Could you please provide recommendations on how to stabilize these calls? We would appreciate any guidance on retry strategies or alternative methods for efficient data extraction.
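
To make the aggregation described above tolerant of individual failures, one option is to sum whatever can be fetched and record the IDs that failed, so they can be retried later instead of aborting the whole computation. The sketch below is only an illustration under assumptions: `aggregate_costs` and `fetch_cost` are hypothetical names, and `fetch_cost` stands in for whatever wraps `langfuse.api.trace.get` and reads the cost field from its result.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def aggregate_costs(
    trace_ids: Iterable[str],
    fetch_cost: Callable[[str], Optional[float]],
) -> Tuple[float, List[str]]:
    """Sum per-trace costs and collect the IDs whose lookup failed.

    fetch_cost is a caller-supplied (hypothetical) function that returns
    the cost for one trace ID, or None when no cost is recorded.
    """
    total = 0.0
    failed: List[str] = []
    for trace_id in trace_ids:
        try:
            cost = fetch_cost(trace_id)
        except Exception:
            # Keep going: one timeout should not discard the other results.
            failed.append(trace_id)
            continue
        # Treat a missing cost as zero rather than failing the sum.
        total += cost or 0.0
    return total, failed
```

The returned list of failed IDs makes a partial result explicit: the caller can re-run only those lookups or flag the total as incomplete.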
Langfuse Cloud or Self-Hosted?
Langfuse Cloud
If Self-Hosted
v3.150.0
If Langfuse Cloud
v3.150.0
SDK and integration versions
langfuse 3.12.1
Pre-Submission Checklist