399 changes: 399 additions & 0 deletions src/docs.json

Large diffs are not rendered by default.

94 changes: 0 additions & 94 deletions src/langsmith/analyze-single-experiment.mdx

This file was deleted.

2 changes: 1 addition & 1 deletion src/langsmith/bind-evaluator-to-dataset.mdx
@@ -50,5 +50,5 @@ def perform_eval(run, example):

## Next steps

-* Analyze your experiment results in the [experiments tab](/langsmith/analyze-single-experiment)
+* Analyze your experiment results in the [experiments tab](/langsmith/work-with-experiments)
* Compare your experiment results in the [comparison view](/langsmith/compare-experiment-results)
13 changes: 0 additions & 13 deletions src/langsmith/download-experiment-results-as-csv.mdx

This file was deleted.

2 changes: 1 addition & 1 deletion src/langsmith/evaluate-pairwise.mdx
@@ -78,7 +78,7 @@ Note that you should choose a feedback key that is distinct from standard feedba
The following example uses [a prompt](https://smith.langchain.com/hub/langchain-ai/pairwise-evaluation-2) which asks the LLM to decide which is better between two AI assistant responses. It uses structured output to parse the AI's response: 0, 1, or 2.

<Info>
-In the Python example below, we are pulling [this structured prompt](https://smith.langchain.com/hub/langchain-ai/pairwise-evaluation-2) from the [LangChain Hub](/langsmith/langchain-hub) and using it with a LangChain chat model wrapper.
+In the Python example below, we are pulling [this structured prompt](https://smith.langchain.com/hub/langchain-ai/pairwise-evaluation-2) from the [LangChain Hub](/langsmith/manage-prompts) and using it with a LangChain chat model wrapper.

**Usage of LangChain is totally optional.** To illustrate this point, the TypeScript example uses the OpenAI SDK directly.
</Info>
2 changes: 1 addition & 1 deletion src/langsmith/evaluation-overview.mdx
@@ -122,7 +122,7 @@ Learn [how run pairwise evaluations](/langsmith/evaluate-pairwise).

## Experiment

-Each time we evaluate an application on a dataset, we are conducting an experiment. An experiment contains the results of running a specific version of your application on the dataset. To understand how to use the LangSmith experiment view, see [how to analyze experiment results](/langsmith/analyze-single-experiment).
+Each time we evaluate an application on a dataset, we are conducting an experiment. An experiment contains the results of running a specific version of your application on the dataset. To understand how to use the LangSmith experiment view, see [how to analyze experiment results](/langsmith/work-with-experiments).

![Experiment view](/langsmith/images/experiment-view.png)

2 changes: 1 addition & 1 deletion src/langsmith/home.mdx
@@ -40,7 +40,7 @@ The quality and development speed of AI applications depends on high-quality eva

* Get started by [creating your first evaluation](/langsmith/run-evaluation-from-prompt-playground).
* Quickly assess the performance of your application using our [off-the-shelf evaluators](https://docs.smith.langchain.com/langsmith/prebuilt-evaluators) as a starting point.
-* [Analyze results](/langsmith/analyze-single-experiment) of evaluations in the LangSmith UI and [compare results](https://docs.smith.langchain.com/langsmith/compare-experiment-results) over time.
+* [Analyze results](/langsmith/work-with-experiments) of evaluations in the LangSmith UI and [compare results](https://docs.smith.langchain.com/langsmith/compare-experiment-results) over time.
* Easily collect [human feedback](/langsmith/annotation-queues) on your data to improve your application.

## Prompt Engineering
2 changes: 1 addition & 1 deletion src/langsmith/manage-datasets-in-application.mdx
@@ -121,7 +121,7 @@ In order to create and manage splits in the app, you can select some examples in

### Edit example metadata

-You can add metadata to your examples by clicking on an example and then clicking "Edit" on the top righthand side of the popover. From this page, you can update/delete existing metadata, or add new metadata. You may use this to store information about your examples, such as tags or version info, which you can then [group by](/langsmith/analyze-single-experiment#group-results-by-metadata) when analyzing experiment results or [filter by](/langsmith/manage-datasets-programmatically#list-examples-by-metadata) when you call `list_examples` in the SDK.
+You can add metadata to your examples by clicking on an example and then clicking "Edit" on the top righthand side of the popover. From this page, you can update/delete existing metadata, or add new metadata. You may use this to store information about your examples, such as tags or version info, which you can then [group by](/langsmith/work-with-experiments#group-results-by-metadata) when analyzing experiment results or [filter by](/langsmith/manage-datasets-programmatically#list-examples-by-metadata) when you call `list_examples` in the SDK.

![Add Metadata](/langsmith/images/add-metadata.gif)

2 changes: 1 addition & 1 deletion src/langsmith/manage-prompts-programmatically.mdx
@@ -214,7 +214,7 @@ Similar to pushing a prompt, you can also pull a prompt as a RunnableSequence of
</Tab>
</Tabs>

-When pulling a prompt, you can also specify a specific commit hash or [commit tag](/langsmith/prompt-tags) to pull a specific version of the prompt.
+When pulling a prompt, you can also specify a specific commit hash or [commit tag](/langsmith/manage-prompts) to pull a specific version of the prompt.

<Tabs>
<Tab title="Python">
23 changes: 0 additions & 23 deletions src/langsmith/renaming-experiment.mdx

This file was deleted.

124 changes: 124 additions & 0 deletions src/langsmith/work-with-experiments.mdx
@@ -0,0 +1,124 @@
---
title: Work with experiments
sidebarTitle: Work with experiments
---

This page describes some of the essential tasks for working with [_experiments_](/langsmith/evaluation-overview#experiment) in LangSmith:

- **[Analyze a single experiment](#analyze-a-single-experiment)**: View and interpret experiment results, customize columns, filter data, and compare runs.
- **[Download experiment results as a CSV](#download-experiment-results-as-a-csv)**: Export your experiment data for external analysis and sharing.
- **[Rename an experiment](#rename-an-experiment)**: Update experiment names in both the Playground and Experiments view.

## Analyze a single experiment

After running an experiment, you can use LangSmith's experiment view to analyze the results and draw insights about your experiment's performance.
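
For reference, the following is a minimal sketch of creating an experiment from the SDK so that it shows up in this view. It assumes the `langsmith` Python SDK's `evaluate` helper; the dataset name, target function, and evaluator below are hypothetical placeholders.

```python
from langsmith import evaluate

def target(inputs: dict) -> dict:
    # Stand-in for your application; replace with a real call.
    return {"answer": inputs["question"]}

def exact_match(run, example):
    # Toy evaluator: did the app return the reference answer verbatim?
    score = run.outputs["answer"] == example.outputs["answer"]
    return {"key": "exact_match", "score": score}

evaluate(
    target,
    data="my-qa-dataset",          # hypothetical dataset name
    evaluators=[exact_match],
    experiment_prefix="baseline",  # results appear under this experiment prefix
)
```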

### Open the experiment view

To open the experiment view, select the relevant [_dataset_](/langsmith/evaluation-overview#datasets) from the **Datasets & Experiments** page and then select the experiment you want to view.

![Open experiment view](/langsmith/images/select-experiment.png)

### View experiment results

#### Customize columns

By default, the experiment view shows the input, output, and reference output for each [example](/langsmith/evaluation-overview#examples) in the dataset, along with feedback scores from evaluations and experiment metrics such as cost, token counts, latency, and status.

You can customize the columns using the **Display** button to make it easier to interpret experiment results:

- **Break out fields from inputs, outputs, and reference outputs** into their own columns. This is especially helpful if you have long inputs/outputs/reference outputs and want to surface important fields.
- **Hide and reorder columns** to create focused views for analysis.
- **Control decimal precision on feedback scores**. By default, LangSmith displays numerical feedback scores to two decimal places, but you can increase this to as many as six.

<Tip>
You can set default configurations for an entire dataset or temporarily save settings just for yourself.
</Tip>

![Experiment view](/langsmith/images/column-config.gif)

You can also set the high, middle, and low thresholds for numeric feedback scores in your experiment, which determines whether score chips render as red or green:

![Column heatmap configuration](/langsmith/images/column-heat-map.png)

#### Sort and filter

To sort or filter feedback scores, you can use the actions in the column headers.

![Sort and filter](/langsmith/images/sort-filter.png)

#### Table views

Depending on which view is most useful for your analysis, you can change the formatting of the table by toggling between a compact view, a full view, and a diff view.

- The **Compact** view shows each run as a single-line row, making it easy to compare scores at a glance.
- The **Full** view shows the full output for each run, which is useful for digging into the details of individual runs.
- The **Diff** view shows the text difference between the reference output and the output for each run.

![Diff view](/langsmith/images/diff-mode.png)

#### View the traces

Hover over any output cell and click the trace icon to view the trace for that run. This opens the trace in the side panel.

To view the entire tracing project, click on the **View Project** button in the top right of the header.

![View trace](/langsmith/images/view-trace.png)

#### View evaluator runs

For evaluator scores, you can view the source run by hovering over the evaluator score cell and clicking the arrow icon. This opens the trace in the side panel. If you're running an [LLM-as-a-judge evaluator](/langsmith/llm-as-judge), you can view the prompt used for the evaluator in this run. If your experiment has [repetitions](/langsmith/evaluation-overview#repetitions), you can click on the aggregate average score to find links to all of the individual runs.

![View evaluator runs](/langsmith/images/evaluator-run.png)

### Group results by metadata

You can add metadata to examples to categorize and organize them. For example, if you're evaluating factual accuracy on a question answering dataset, the metadata might include which subject area each question belongs to. Metadata can be added either [via the UI](/langsmith/manage-datasets-in-application#edit-example-metadata) or [via the SDK](/langsmith/manage-datasets-programmatically#update-single-example).
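
As a rough sketch of the SDK route (this assumes the `langsmith` Python SDK's `update_example` and `list_examples` methods; the example ID, dataset name, and metadata key below are hypothetical):

```python
from langsmith import Client

client = Client()

# Attach a subject-area tag to an existing example.
client.update_example(
    example_id="11111111-2222-3333-4444-555555555555",  # hypothetical example ID
    metadata={"subject": "geography"},
)

# Later, fetch only the examples that carry that metadata.
examples = list(
    client.list_examples(
        dataset_name="my-qa-dataset",        # hypothetical dataset name
        metadata={"subject": "geography"},
    )
)
```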

To analyze results by metadata, use the **Group by** dropdown in the top right corner of the experiment view and select your desired metadata key. This displays average feedback scores, latency, total tokens, and cost for each metadata group.

<Info>
You will only be able to group by example metadata on experiments created after February 20th, 2025. Any experiments before that date can still be grouped by metadata, but only if the metadata is on the experiment traces themselves.
</Info>

![Group by](/langsmith/images/group-by.gif)

### Repetitions

If you've run your experiment with [_repetitions_](/langsmith/evaluation-overview#repetitions), there will be arrows in the output column so you can view the individual outputs in the table. To view each run from a repetition, hover over the output cell and open the expanded view.

When you run an experiment with repetitions, LangSmith displays the average of each feedback score in the table. Click a feedback score to view the scores from individual runs or the standard deviation across repetitions.

![Repetitions](/langsmith/images/repetitions.png)
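
Repetitions are configured when the experiment is created. As a minimal sketch, assuming the `evaluate` helper accepts `num_repetitions` in your SDK version (the target and dataset name are hypothetical):

```python
from langsmith import evaluate

evaluate(
    lambda inputs: {"answer": inputs["question"]},  # stand-in for your application
    data="my-qa-dataset",                           # hypothetical dataset name
    evaluators=[],                                  # add your evaluators here
    num_repetitions=3,                              # run each example three times
    experiment_prefix="baseline-3-reps",
)
```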

### Compare to another experiment

In the top right of the experiment view, you can select another experiment to compare against. This opens a comparison view where you can see how the two experiments differ. To learn more about the comparison view, see [how to compare experiment results](/langsmith/compare-experiment-results).

![Compare](/langsmith/images/compare-to-another.png)

## Download experiment results as a CSV

LangSmith lets you download experiment results as a CSV file so that you can analyze and share your results.

To download as a CSV, click the download icon at the top of the experiment view. The icon is directly to the left of the [Compact toggle](/langsmith/compare-experiment-results#adjust-the-table-display).

![Download CSV](/langsmith/images/download-experiment-results-as-csv.png)
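
If you'd rather export results programmatically, the following is a rough sketch. It assumes the `langsmith` Python SDK exposes `get_test_results`, which returns a pandas DataFrame of experiment results in recent versions; the experiment name is hypothetical.

```python
from langsmith import Client

client = Client()

# Fetch the experiment's runs and feedback scores as a pandas DataFrame,
# then write them out as a CSV file.
df = client.get_test_results(project_name="baseline-1a2b3c4d")
df.to_csv("experiment-results.csv", index=False)
```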

## Rename an experiment

<Note>
Experiment names must be unique per workspace.
</Note>

You can rename an experiment in the LangSmith UI in two places:

- **The Playground**. When running experiments in the Playground, a default name with the format `pg::prompt-name::model::uuid` (e.g., `pg::gpt-4o-mini::897ee630`) is automatically assigned.

You can rename an experiment immediately after running it by editing its name in the Playground table header.

![Edit name in playground](/langsmith/images/rename-in-playground.png)

- **The Experiments view**. When viewing results in the Experiments view, you can rename an experiment using the pencil icon beside the experiment name.

![Edit name in experiments view](/langsmith/images/rename-in-experiments-view.png)