Commit 4e6c679

docs: Consolidate experiment content for LS nav (#170)
1 parent bc0a081 · commit 4e6c679

10 files changed, +130 -136 lines

src/langsmith/analyze-single-experiment.mdx

Lines changed: 0 additions & 94 deletions
This file was deleted.

src/langsmith/bind-evaluator-to-dataset.mdx

Lines changed: 1 addition & 1 deletion

@@ -50,5 +50,5 @@ def perform_eval(run, example):

## Next steps

-* Analyze your experiment results in the [experiments tab](/langsmith/analyze-single-experiment)
+* Analyze your experiment results in the [experiments tab](/langsmith/work-with-experiments)
* Compare your experiment results in the [comparison view](/langsmith/compare-experiment-results)

src/langsmith/download-experiment-results-as-csv.mdx

Lines changed: 0 additions & 13 deletions
This file was deleted.

src/langsmith/evaluate-pairwise.mdx

Lines changed: 1 addition & 1 deletion

@@ -78,7 +78,7 @@ Note that you should choose a feedback key that is distinct from standard feedba
The following example uses [a prompt](https://smith.langchain.com/hub/langchain-ai/pairwise-evaluation-2) which asks the LLM to decide which is better between two AI assistant responses. It uses structured output to parse the AI's response: 0, 1, or 2.

<Info>
-In the Python example below, we are pulling [this structured prompt](https://smith.langchain.com/hub/langchain-ai/pairwise-evaluation-2) from the [LangChain Hub](/langsmith/langchain-hub) and using it with a LangChain chat model wrapper.
+In the Python example below, we are pulling [this structured prompt](https://smith.langchain.com/hub/langchain-ai/pairwise-evaluation-2) from the [LangChain Hub](/langsmith/manage-prompts) and using it with a LangChain chat model wrapper.

**Usage of LangChain is totally optional.** To illustrate this point, the TypeScript example uses the OpenAI SDK directly.
</Info>
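
For reference, a minimal sketch (not part of this diff) of pulling that prompt and pairing it with a chat model; the model choice and chaining details are assumptions:

```python
# Hypothetical sketch: pull the pairwise-evaluation prompt from the Hub and
# combine it with a LangChain chat model wrapper (LangChain itself is optional).
from langsmith import Client
from langchain_openai import ChatOpenAI

client = Client()

# Pull the structured prompt referenced above.
prompt = client.pull_prompt("langchain-ai/pairwise-evaluation-2")

# Any chat model wrapper works here; "gpt-4o" is a placeholder choice.
model = ChatOpenAI(model="gpt-4o")
chain = prompt | model
```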

src/langsmith/evaluation-overview.mdx

Lines changed: 1 addition & 1 deletion

@@ -122,7 +122,7 @@ Learn [how run pairwise evaluations](/langsmith/evaluate-pairwise).

## Experiment

-Each time we evaluate an application on a dataset, we are conducting an experiment. An experiment contains the results of running a specific version of your application on the dataset. To understand how to use the LangSmith experiment view, see [how to analyze experiment results](/langsmith/analyze-single-experiment).
+Each time we evaluate an application on a dataset, we are conducting an experiment. An experiment contains the results of running a specific version of your application on the dataset. To understand how to use the LangSmith experiment view, see [how to analyze experiment results](/langsmith/work-with-experiments).

![Experiment view](/langsmith/images/experiment-view.png)

src/langsmith/home.mdx

Lines changed: 1 addition & 1 deletion

@@ -40,7 +40,7 @@ The quality and development speed of AI applications depends on high-quality eva

* Get started by [creating your first evaluation](/langsmith/run-evaluation-from-prompt-playground).
* Quickly assess the performance of your application using our [off-the-shelf evaluators](https://docs.smith.langchain.com/langsmith/prebuilt-evaluators) as a starting point.
-* [Analyze results](/langsmith/analyze-single-experiment) of evaluations in the LangSmith UI and [compare results](https://docs.smith.langchain.com/langsmith/compare-experiment-results) over time.
+* [Analyze results](/langsmith/work-with-experiments) of evaluations in the LangSmith UI and [compare results](https://docs.smith.langchain.com/langsmith/compare-experiment-results) over time.
* Easily collect [human feedback](/langsmith/annotation-queues) on your data to improve your application.

## Prompt Engineering

src/langsmith/manage-datasets-in-application.mdx

Lines changed: 1 addition & 1 deletion

@@ -121,7 +121,7 @@ In order to create and manage splits in the app, you can select some examples in

### Edit example metadata

-You can add metadata to your examples by clicking on an example and then clicking "Edit" on the top righthand side of the popover. From this page, you can update/delete existing metadata, or add new metadata. You may use this to store information about your examples, such as tags or version info, which you can then [group by](/langsmith/analyze-single-experiment#group-results-by-metadata) when analyzing experiment results or [filter by](/langsmith/manage-datasets-programmatically#list-examples-by-metadata) when you call `list_examples` in the SDK.
+You can add metadata to your examples by clicking on an example and then clicking "Edit" on the top righthand side of the popover. From this page, you can update/delete existing metadata, or add new metadata. You may use this to store information about your examples, such as tags or version info, which you can then [group by](/langsmith/work-with-experiments#group-results-by-metadata) when analyzing experiment results or [filter by](/langsmith/manage-datasets-programmatically#list-examples-by-metadata) when you call `list_examples` in the SDK.

![Add Metadata](/langsmith/images/add-metadata.gif)
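
As a quick illustration of the `list_examples` metadata filter mentioned above, a minimal sketch (dataset name and metadata values are placeholders, not from this diff):

```python
# Hypothetical sketch: list dataset examples filtered by metadata via the SDK.
from langsmith import Client

client = Client()
examples = client.list_examples(
    dataset_name="my-dataset",      # placeholder dataset name
    metadata={"topic": "billing"},  # placeholder metadata filter
)
for example in examples:
    print(example.id, example.metadata)
```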

src/langsmith/manage-prompts-programmatically.mdx

Lines changed: 1 addition & 1 deletion

@@ -214,7 +214,7 @@ Similar to pushing a prompt, you can also pull a prompt as a RunnableSequence of
</Tab>
</Tabs>

-When pulling a prompt, you can also specify a specific commit hash or [commit tag](/langsmith/prompt-tags) to pull a specific version of the prompt.
+When pulling a prompt, you can also specify a specific commit hash or [commit tag](/langsmith/manage-prompts) to pull a specific version of the prompt.

<Tabs>
<Tab title="Python">
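
The tab contents themselves are not shown in this hunk; as a rough sketch of what a versioned pull can look like in Python (prompt name, hash, and tag are placeholders):

```python
# Hypothetical sketch: pull a specific prompt version by commit hash or tag.
from langsmith import Client

client = Client()
prompt_at_hash = client.pull_prompt("my-prompt:abc1234")    # placeholder commit hash
prompt_at_tag = client.pull_prompt("my-prompt:production")  # placeholder commit tag
```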

src/langsmith/renaming-experiment.mdx

Lines changed: 0 additions & 23 deletions
This file was deleted.

New file: "Work with experiments"

Lines changed: 124 additions & 0 deletions

@@ -0,0 +1,124 @@
---
title: Work with experiments
sidebarTitle: Work with experiments
---

This page describes some of the essential tasks for working with [_experiments_](/langsmith/evaluation-overview#experiment) in LangSmith:

- **[Analyze a single experiment](#analyze-a-single-experiment)**: View and interpret experiment results, customize columns, filter data, and compare runs.
- **[Download experiment results as a CSV](#download-experiment-results-as-a-csv)**: Export your experiment data for external analysis and sharing.
- **[Rename an experiment](#rename-an-experiment)**: Update experiment names in both the Playground and Experiments view.

## Analyze a single experiment

After running an experiment, you can use LangSmith's experiment view to analyze the results and draw insights about your experiment's performance.
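
For context, a minimal sketch (not part of the committed page) of how such an experiment might be created with the Python SDK; the dataset name, target, and evaluator are illustrative placeholders:

```python
# Hypothetical sketch: run an evaluation to produce an experiment to analyze.
from langsmith import Client

def target(inputs: dict) -> dict:
    # Call your application here; the return value becomes the run's outputs.
    return {"answer": "..."}

def correctness(run, example) -> dict:
    # Compare run outputs against the example's reference outputs.
    return {"key": "correctness", "score": 1.0}

client = Client()
client.evaluate(
    target,
    data="my-dataset",                  # placeholder dataset name
    evaluators=[correctness],
    experiment_prefix="my-experiment",  # placeholder experiment prefix
)
```
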

### Open the experiment view

To open the experiment view, select the relevant [_dataset_](/langsmith/evaluation-overview#datasets) from the **Dataset & Experiments** page and then select the experiment you want to view.

![Open experiment view](/langsmith/images/select-experiment.png)

### View experiment results

#### Customize columns

By default, the experiment view shows the input, output, and reference output for each [example](/langsmith/evaluation-overview#examples) in the dataset, feedback scores from evaluations, and experiment metrics like cost, token counts, latency, and status.

You can customize the columns using the **Display** button to make it easier to interpret experiment results:

- **Break out fields from inputs, outputs, and reference outputs** into their own columns. This is especially helpful if you have long inputs/outputs/reference outputs and want to surface important fields.
- **Hide and reorder columns** to create focused views for analysis.
- **Control decimal precision on feedback scores**. By default, LangSmith surfaces numerical feedback scores with a decimal precision of 2, but you can customize this setting to be up to 6 decimals.

<Tip>
You can set default configurations for an entire dataset or temporarily save settings just for yourself.
</Tip>

![Experiment view](/langsmith/images/column-config.gif)

You can also set the high, middle, and low thresholds for numeric feedback scores in your experiment, which determines whether score chips render as red or green:

![Column heatmap configuration](/langsmith/images/column-heat-map.png)

#### Sort and filter

To sort or filter feedback scores, you can use the actions in the column headers.

![Sort and filter](/langsmith/images/sort-filter.png)

#### Table views

Depending on the view most useful for your analysis, you can change the formatting of the table by toggling between a compact view, a full view, and a diff view.

- The **Compact** view shows each run as a one-line row, for ease of comparing scores at a glance.
- The **Full** view shows the full output for each run for digging into the details of individual runs.
- The **Diff** view shows the text difference between the reference output and the output for each run.

![Diff view](/langsmith/images/diff-mode.png)

#### View the traces

Hover over any of the output cells, and click on the trace icon to view the trace for that run. This will open up a trace in the side panel.

To view the entire tracing project, click on the **View Project** button in the top right of the header.

![View trace](/langsmith/images/view-trace.png)

#### View evaluator runs

For evaluator scores, you can view the source run by hovering over the evaluator score cell and clicking on the arrow icon. This will open up a trace in the side panel. If you're running an [LLM-as-a-judge evaluator](/langsmith/llm-as-judge), you can view the prompt used for the evaluator in this run. If your experiment has [repetitions](/langsmith/evaluation-overview#repetitions), you can click on the aggregate average score to find links to all of the individual runs.

![View evaluator runs](/langsmith/images/evaluator-run.png)

### Group results by metadata

You can add metadata to examples to categorize and organize them. For example, if you're evaluating factual accuracy on a question answering dataset, the metadata might include which subject area each question belongs to. Metadata can be added either [via the UI](/langsmith/manage-datasets-in-application#edit-example-metadata) or [via the SDK](/langsmith/manage-datasets-programmatically#update-single-example).
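
For the SDK route, a minimal sketch (the example ID and metadata values are placeholders):

```python
# Hypothetical sketch: attach metadata to an existing example via the SDK.
from langsmith import Client

client = Client()
client.update_example(
    example_id="00000000-0000-0000-0000-000000000000",  # placeholder example ID
    metadata={"subject": "geography"},                   # placeholder metadata
)
```
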

To analyze results by metadata, use the **Group by** dropdown in the top right corner of the experiment view and select your desired metadata key. This displays average feedback scores, latency, total tokens, and cost for each metadata group.

<Info>
You will only be able to group by example metadata on experiments created after February 20th, 2025. Any experiments before that date can still be grouped by metadata, but only if the metadata is on the experiment traces themselves.
</Info>

![Group by](/langsmith/images/group-by.gif)

### Repetitions

If you've run your experiment with [_repetitions_](/langsmith/evaluation-overview#repetitions), there will be arrows in the output results column so you can view outputs in the table. To view each run from the repetition, hover over the output cell and click the expanded view.

When you run an experiment with repetitions, LangSmith displays the average for each feedback score in the table. Click on the feedback score to view the feedback scores from individual runs, or to view the standard deviation across repetitions.

![Repetitions](/langsmith/images/repetitions.png)
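
A minimal sketch of kicking off such a run (target, dataset, and evaluators are placeholders; `num_repetitions` is the relevant setting):

```python
# Hypothetical sketch: run each example several times so the experiment view
# shows averaged feedback scores across repetitions.
from langsmith import Client

client = Client()
client.evaluate(
    lambda inputs: {"answer": "..."},  # placeholder target
    data="my-dataset",                 # placeholder dataset name
    evaluators=[],                     # add your evaluators here
    num_repetitions=3,
)
```
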

### Compare to another experiment

In the top right of the experiment view, you can select another experiment to compare to. This will open up a comparison view, where you can see how the two experiments compare. To learn more about the comparison view, see [how to compare experiment results](/langsmith/compare-experiment-results).

![Compare](/langsmith/images/compare-to-another.png)

## Download experiment results as a CSV

LangSmith lets you download experiment results as a CSV file, which allows you to analyze and share your results.

To download as a CSV, click the download icon at the top of the experiment view. The icon is directly to the left of the [Compact toggle](/langsmith/compare-experiment-results#adjust-the-table-display).

![Download CSV](/langsmith/images/download-experiment-results-as-csv.png)
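
Once downloaded, the CSV can be opened in any external tool; for instance, a small pandas sketch (the file name and columns are assumptions about the export, not taken from this diff):

```python
# Hypothetical sketch: load the exported results and inspect them locally.
import pandas as pd

df = pd.read_csv("experiment-results.csv")  # placeholder file name
print(df.columns.tolist())                  # check the actual column names first
print(df.head())
```
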

## Rename an experiment

<Note>
Experiment names must be unique per workspace.
</Note>

You can rename an experiment in the LangSmith UI in:

- The [Playground](#renaming-an-experiment-in-the-playground). When running experiments in the Playground, a default name with the format `pg::prompt-name::model::uuid` (e.g. `pg::gpt-4o-mini::897ee630`) is automatically assigned.

  You can rename an experiment immediately after running it by editing its name in the Playground table header.

  ![Edit name in playground](/langsmith/images/rename-in-playground.png)

- The [Experiments view](#renaming-an-experiment-in-the-experiments-view). When viewing results in the experiments view, you can rename an experiment by using the pencil icon beside the experiment name.

  ![Edit name in experiments view](/langsmith/images/rename-in-experiments-view.png)
