
Conversation


@hassiebp hassiebp commented Nov 17, 2025

Important

Switches from async to sync API for dataset run item creation in client.py to address performance issues at high concurrency.

  • Behavior:
    • Switch from the async to the sync API for `dataset_run_items.create` in `_process_experiment_item()` in client.py (see the sketch below).
    • Addresses performance degradation observed with `httpx.AsyncClient` at high concurrency (100+ requests).
    • Ensures consistent request timing by avoiding the async bottleneck.
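
For orientation, here is a minimal, illustrative sketch of the two call patterns the summary contrasts. It is not the actual Langfuse code (the real diff appears in the review comment further down), and `create_run_item` is a hypothetical stand-in for the sync `self.api.dataset_run_items.create` call:

```python
import asyncio
import time


def create_run_item(payload: dict) -> dict:
    """Hypothetical stand-in for the sync SDK call (self.api.dataset_run_items.create)."""
    time.sleep(0.3)  # simulate the ~300 ms HTTP round trip to the Langfuse server
    return {"datasetItemId": payload["datasetItemId"]}


async def process_item_before(payload: dict) -> dict:
    # Previous pattern: offload the sync call to a worker thread so the event
    # loop stays free while the request is in flight.
    return await asyncio.to_thread(create_run_item, payload)


async def process_item_after(payload: dict) -> dict:
    # Pattern after this PR: call the sync API directly inside the coroutine,
    # trading event-loop friendliness for predictable per-request timing.
    return create_run_item(payload)
```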

This description was created by Ellipsis for 3d570eb. You can customize this summary. It will automatically update as commits are pushed.

Disclaimer: Experimental PR review

Greptile Summary

  • Replaced the async-wrapped sync API call (`asyncio.to_thread`) with a direct blocking sync call in `_process_experiment_item`
  • This change aims to address `httpx.AsyncClient` performance issues but introduces event-loop blocking during high-concurrency experiment runs

Confidence Score: 1/5

  • This PR introduces a critical event-loop blocking issue that will severely degrade performance
  • The change replaces the async-friendly `asyncio.to_thread()` with a direct blocking sync call inside an async function that runs 100+ items concurrently via `asyncio.gather()`, which blocks the event loop and defeats async concurrency (see the timing sketch below)
  • langfuse/_client/client.py requires immediate attention: the blocking sync call will prevent concurrent execution
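
The timing concern is easy to reproduce with a self-contained sketch that uses no Langfuse code at all: gathering coroutines that each make a blocking call executes them one after another, while offloading the same call with `asyncio.to_thread` lets them overlap (the exact speedup depends on the default thread-pool size).

```python
import asyncio
import time


def blocking_call() -> None:
    time.sleep(0.3)  # stand-in for a ~300 ms synchronous HTTP request


async def item_blocking() -> None:
    # Runs on the event loop thread and blocks every other coroutine meanwhile.
    blocking_call()


async def item_offloaded() -> None:
    # Runs in a worker thread; the event loop keeps scheduling other coroutines.
    await asyncio.to_thread(blocking_call)


async def main() -> None:
    n = 20

    start = time.perf_counter()
    await asyncio.gather(*(item_blocking() for _ in range(n)))
    print(f"direct sync call:  {time.perf_counter() - start:.1f}s")  # ~n * 0.3 s, fully serialized

    start = time.perf_counter()
    await asyncio.gather(*(item_offloaded() for _ in range(n)))
    print(f"asyncio.to_thread: {time.perf_counter() - start:.1f}s")  # roughly 0.3-1 s, limited by the thread pool


asyncio.run(main())
```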

Important Files Changed

| Filename | Overview |
| --- | --- |
| langfuse/_client/client.py | Changed from an async-wrapped sync API call to a blocking sync call inside an async function, which will block the event loop during high-concurrency experiment runs |

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Langfuse
    participant ProcessItem as "_process_experiment_item"
    participant API as "dataset_run_items.create"
    participant Server as "Langfuse Server"

    User->>Langfuse: "run_experiment(data, task)"
    Langfuse->>Langfuse: "asyncio.gather() 100+ items"
    loop For each item concurrently
        Langfuse->>ProcessItem: "await _process_experiment_item(item)"
        ProcessItem->>ProcessItem: "Check if item has dataset_id"
        ProcessItem->>API: "self.api.dataset_run_items.create(request)"
        Note over API: "Blocking sync call (blocks event loop)"
        API->>Server: "HTTP POST /dataset_run_items"
        Server-->>API: "Response (~300ms)"
        API-->>ProcessItem: "dataset_run_item"
        ProcessItem->>ProcessItem: "Execute task(item)"
        ProcessItem-->>Langfuse: "ExperimentItemResult"
    end
    Langfuse-->>User: "ExperimentResult"
```

@hassiebp hassiebp enabled auto-merge (squash) November 17, 2025 14:33

@greptile-apps greptile-apps bot left a comment


1 file reviewed, 1 comment


Comment on lines +2818 to 2827

```python
                        dataset_run_item = self.api.dataset_run_items.create(
                            CreateDatasetRunItemRequest(
                                runName=experiment_run_name,
                                runDescription=experiment_description,
                                metadata=experiment_metadata,
                                datasetItemId=item.id,  # type: ignore
                                traceId=trace_id,
                                observationId=span.id,
                            ),
                        )
```

**logic:** calling the sync API `self.api.dataset_run_items.create()` directly inside an async function blocks the event loop. When running 100+ concurrent items via `asyncio.gather()` (line 2690), this blocking call defeats the purpose of async concurrency. Each blocked call prevents other coroutines from executing, effectively serializing execution and degrading performance worse than the original `asyncio.to_thread()` approach.

Use `asyncio.to_thread()` to offload the blocking call:

Suggested change

```diff
-                        dataset_run_item = self.api.dataset_run_items.create(
-                            CreateDatasetRunItemRequest(
-                                runName=experiment_run_name,
-                                runDescription=experiment_description,
-                                metadata=experiment_metadata,
-                                datasetItemId=item.id,  # type: ignore
-                                traceId=trace_id,
-                                observationId=span.id,
-                            ),
-                        )
+                        dataset_run_item = await asyncio.to_thread(
+                            self.api.dataset_run_items.create,
+                            CreateDatasetRunItemRequest(
+                                runName=experiment_run_name,
+                                runDescription=experiment_description,
+                                metadata=experiment_metadata,
+                                datasetItemId=item.id,  # type: ignore
+                                traceId=trace_id,
+                                observationId=span.id,
+                            )
+                        )
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: langfuse/_client/client.py
Line: 2818:2827

Comment:
**logic:** calling sync API `self.api.dataset_run_items.create()` directly inside an async function blocks the event loop. When running 100+ concurrent items via `asyncio.gather()` (line 2690), this blocking call defeats the purpose of async concurrency. Each blocked call prevents other coroutines from executing, effectively serializing execution and degrading performance worse than the original `asyncio.to_thread()` approach.

Use `asyncio.to_thread()` to offload the blocking call:

```suggestion
                        dataset_run_item = await asyncio.to_thread(
                            self.api.dataset_run_items.create,
                            CreateDatasetRunItemRequest(
                                runName=experiment_run_name,
                                runDescription=experiment_description,
                                metadata=experiment_metadata,
                                datasetItemId=item.id,  # type: ignore
                                traceId=trace_id,
                                observationId=span.id,
                            )
                        )
```

How can I resolve this? If you propose a fix, please make it concise.

