
Commit 1f15332 (1 parent: 1dc87b4)

AI Toolkit docs edit pass (#7962)

* Update metadata
* Edit pass

31 files changed: +284, -248 lines changed

docs/intelligentapps/bulkrun.md (16 additions & 14 deletions)
```diff
@@ -2,39 +2,41 @@
 Order: 4
 Area: intelligentapps
 TOCTitle: Bulk Run
-ContentId:
+ContentId: 1124d141-e893-4780-aba7-b6ca13628bc5
 PageTitle: Bulk Run Prompts
-DateApproved:
+DateApproved: 12/11/2024
 MetaDescription: Run a set of prompts in an imported dataset, individually or in a full batch towards the selected genAI models and parameters.
-MetaSocialImage:
 ---
 
 # Run multiple prompts in bulk
 
-The bulk run feature in AI Toolkit allows you to run multiple prompts in batch. When you use the playground, you can only run one prompt manually at a time, in the order they're listed. Bulk run takes a dataset as input, where each row in the dataset has a prompt as the minimal requirement. Typically, the dataset has multiple rows. Once imported, you can select any prompt to run or run all prompts on the selected model. The responses will be displayed in the same dataset view. The results from running the dataset can be exported.
+The bulk run feature in AI Toolkit enables you to run multiple prompts in batch. When you use the playground, you can only run one prompt manually at a time, in the order they're listed.
 
-To start a bulk run:
+Bulk run takes a dataset as input, where each row in the dataset has at least a prompt. Typically, the dataset has multiple rows. Once imported, you can select one or more prompts to run on the selected model. The responses are then displayed in the same dataset view. The results from running the dataset can be exported.
 
-1. In the AI Toolkit view, select **TOOLS** > **Bulk Run** to open the Bulk Run view.
+## Start a bulk run
 
+1. In the AI Toolkit view, select **TOOLS** > **Bulk Run** to open the Bulk Run view
 
-1. Select either a sample dataset or import a local JSONL file that has a `query` field to use as prompts.
+1. Select either a sample dataset or import a local [JSONL](https://jsonlines.org/) file with chat prompts
 
-   ![Select dataset](./images/bulkrun/dataset.png)
+   The JSONL file needs to have a `query` field to represent a prompt.
 
 1. Once the dataset is loaded, select **Run** or **Rerun** on any prompt to run a single prompt.
 
-   Like in the playground, you can select AI model, add context for your prompt, and change inference parameters.
+   Similar to testing a model in the playground, select a model, add context for your prompt, and change inference parameters.
 
    ![Bulk run prompts](./images/bulkrun/bulkrun_one.png)
 
-1. Select **Run all** on the top of the Bulk Run view to automatically run through queries. The responses are shown in the **response** column.
+1. Select **Run all** to automatically run through all queries.
 
-   There is an option to only run the remaining queries that have not yet been run.
+   The model responses are shown in the **response** column.
 
    ![Run all](./images/bulkrun/runall.png)
 
-1. Select the **Export** button to export the results to a JSONL format.
+   > [!TIP]
+   > There is an option to only run the remaining queries that have not yet been run.
+
+1. Select the **Export** button to export the results to a JSONL format
 
-1. Select **Import** to import another dataset in JSONL format for the bulk run.
+1. Select **Import** to import another dataset in JSONL format for the bulk run
```
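For reference, a minimal bulk-run input file is plain JSONL: one JSON object per line, each carrying at least the `query` field the docs require. The rows below are invented examples, not taken from the shipped sample datasets:

```jsonl
{"query": "What is the capital of France?"}
{"query": "Summarize the plot of Romeo and Juliet in one sentence."}
```

On export, the results presumably come back in the same row-per-line shape, with each row additionally carrying the text shown in the **response** column.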

docs/intelligentapps/evaluation.md (18 additions & 15 deletions)
```diff
@@ -2,30 +2,35 @@
 Order: 5
 Area: intelligentapps
 TOCTitle: Evaluation
-ContentId:
+ContentId: 3342b8ef-72fe-4cca-baad-64ee57c05b5f
 PageTitle: AI Evaluation
-DateApproved:
+DateApproved: 12/11/2024
 MetaDescription: Import a dataset with LLMs or SLMs output or rerun it for the queries. Run evaluation job for the popular evaluators like F1 score, relevance, coherence, similarity... find, visualize, and compare the evaluation results in tables or charts.
-MetaSocialImage:
 ---
 
 # Model evaluation
 
-AI engineers often need to evaluate models with different parameters or prompts in a dataset for comparing to ground truth and compute evaluator values from the comparisons. AI Toolkit allows you to perform evaluations with minimal effort.
+AI engineers often need to evaluate models with different parameters or prompts for comparing to ground truth and compute evaluator values from the comparisons. AI Toolkit lets you perform evaluations with minimal effort by uploading a prompts dataset.
 
 ![Start evaluation](./images/evaluation/evaluation.png)
 
 ## Start an evaluation job
 
-1. In AI Toolkit view, select **TOOLS** > **Evaluation** to open the Evaluation view.
-1. Select the **Create Evaluation** button and provide the following information:
+1. In AI Toolkit view, select **TOOLS** > **Evaluation** to open the Evaluation view
+
+1. Select **Create Evaluation**, and then provide the following information:
 
    - **Evaluation job name:** default or a name you can specify
-   - **Evaluator:** currently the built-in evaluators can be selected.
-     ![Evaluators](./images/evaluation/evaluators.png)
+
+   - **Evaluator:** currently, only the built-in evaluators can be selected.
+
+     ![Screenshot of a Quick Pick with the list of built-in evaluators](./images/evaluation/evaluators.png)
+
   - **Judging model:** a model from the list that can be selected as judging model to evaluate for some evaluators.
-   - **Dataset:** you can start with a sample dataset for learning purpose, or import a JSONL file with fields `query`,`response`,`ground truth`.
-1. Once you provide all necessary information for evaluation, a new evaluation job is created. You will be promoted to open your new evaluation job details.
+
+   - **Dataset:** select a sample dataset for learning purpose, or import a JSONL file with fields `query`,`response`,`ground truth`.
+
+1. A new evaluation job is created and you will be prompted to open your new evaluation job details
 
 ![Open evaluation](./images/evaluation/openevaluation.png)
 
```
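By way of illustration, an evaluation dataset row pairs a `query` with the model's `response` and a `ground truth` answer, per the fields named in the diff above. A minimal sketch with invented values:

```jsonl
{"query": "Who wrote Hamlet?", "response": "Hamlet was written by William Shakespeare.", "ground truth": "William Shakespeare"}
{"query": "What is the boiling point of water at sea level?", "response": "Water boils at 100 degrees Celsius at sea level.", "ground truth": "100 °C"}
```

Overlap-based evaluators such as F1 score compare `response` against `ground truth` directly, while judgment-based evaluators (relevance, coherence, similarity) presumably also consult the selected judging model.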

```diff
@@ -43,10 +48,8 @@ Each evaluation job has a link to the dataset that was used, logs from the evalu
 
 ## Find results of evaluation
 
-Select the evaluation job detail, the view has columns of selected evaluators with the numerical values. Some may have aggregate values.
-
-You can also select **Open In Data Wrangler** to open the data with the Data Wrangler extension.
+The evaluation job details view shows a table of the results for each of the selected evaluators. Note that some results may have aggregate values.
 
-> <a class="install-extension-btn" href="vscode:extension/ms-toolsai.datawrangler">Install Data Wrangler</a>
+You can also select **Open In Data Wrangler** to open the data with the [Data Wrangler extension](vscode:extension/ms-toolsai.datawrangler).
 
-![Data Wrangler](./images/evaluation/datawrangler.png)
+![Screenshot the Data Wrangler extension, showing the evaluation results.](./images/evaluation/datawrangler.png)
```
