26 changes: 1 addition & 25 deletions pages/docs/evaluation/dataset-runs/remote-run.mdx
@@ -364,7 +364,7 @@ Please refer to the [integrations](/docs/integrations/overview) page for details

When running an experiment on a dataset, the application under test is executed once for each item in the dataset. The resulting execution trace is linked to the dataset item, which allows you to compare different runs of the same application on the same dataset. Each experiment is identified by a `run_name`.
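As an illustration of this loop, here is a minimal TypeScript sketch, assuming the JS/TS SDK's `getDataset`, `trace`, `item.link`, `score`, and `flushAsync` helpers; `runMyApp` is a hypothetical stand-in for the application under test:

```typescript
import { Langfuse } from "langfuse";

const langfuse = new Langfuse();

// Hypothetical application under test; replace with your own LLM app.
async function runMyApp(input: unknown): Promise<string> {
  return `echo: ${JSON.stringify(input)}`;
}

async function runExperiment(runName: string) {
  // Load the dataset and execute the app once per item
  const dataset = await langfuse.getDataset("<dataset_name>");

  for (const item of dataset.items) {
    // One trace per dataset item
    const trace = langfuse.trace({ name: "dataset-run", input: item.input });
    const output = await runMyApp(item.input);
    trace.update({ output });

    // Link the execution trace to the dataset item under the given run name
    await item.link(trace, runName);

    // Optionally attach a score computed by the experiment runner
    langfuse.score({ traceId: trace.id, name: "my_score", value: 1 });
  }

  // Ensure all events are sent to the server before the process exits
  await langfuse.flushAsync();
}

runExperiment("<run_name>");
```

The tabs below show the same pattern for the individual SDKs and framework integrations.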

<LangTabs items={["Python SDK", "JS/TS SDK", "Langchain (Python)", "Langchain (JS/TS)", "Vercel AI SDK", "Other frameworks"]}>
<LangTabs items={["Python SDK", "JS/TS SDK", "Langchain (JS/TS)", "Vercel AI SDK", "Other frameworks"]}>
<Tab>

You may then execute that LLM-app for each dataset item to create a dataset run:
@@ -433,30 +433,6 @@ for (const item of dataset.items) {
await langfuse.flush();
```

</Tab>
<Tab>

```python /for item in dataset.items:/
from langfuse import get_client

# Initialize the Langfuse client
langfuse = get_client()

# Load the dataset
dataset = langfuse.get_dataset("<dataset_name>")

# Loop over the dataset items
for item in dataset.items:
    # Langchain callback handler that automatically links the execution trace to the dataset item
    handler = item.get_langchain_handler(run_name="<run_name>")

    # Execute application and pass custom handler
    my_langchain_chain.run(item.input, callbacks=[handler])

    # Optionally: Add scores computed in your experiment runner, e.g. json equality check
    langfuse.score(trace_id=handler.get_trace_id(), name="my_score", value=1)

# Flush the langfuse client to ensure all data is sent to the server at the end of the experiment run
langfuse.flush()
```

</Tab>

<Tab>