**docs/intelligentapps/bulkrun.md** (+16 −14)
---
Order: 4
Area: intelligentapps
TOCTitle: Bulk Run
ContentId: 1124d141-e893-4780-aba7-b6ca13628bc5
PageTitle: Bulk Run Prompts
DateApproved: 12/11/2024
MetaDescription: Run a set of prompts in an imported dataset, individually or in a full batch, against the selected genAI models and parameters.
---
# Run multiple prompts in bulk

The bulk run feature in AI Toolkit enables you to run multiple prompts in batch. When you use the playground, you can run only one prompt manually at a time, in the order they're listed.
Bulk run takes a dataset as input, where each row in the dataset has at least a prompt. Typically, the dataset has multiple rows. Once imported, you can select one or more prompts to run on the selected model. The responses are then displayed in the same dataset view. The results from running the dataset can be exported.
## Start a bulk run

1. In the AI Toolkit view, select **TOOLS** > **Bulk Run** to open the Bulk Run view
1. Select either a sample dataset or import a local [JSONL](https://jsonlines.org/) file with chat prompts

    The JSONL file needs to have a `query` field to represent a prompt.
1. Once the dataset is loaded, select **Run** or **Rerun** on any prompt to run a single prompt.

    Similar to testing a model in the playground, select a model, add context for your prompt, and change inference parameters.
1. Select **Run all** to automatically run through all queries.

    The model responses are shown in the **response** column.
    > [!TIP]
    > There is an option to run only the remaining queries that have not yet been run.

1. Select the **Export** button to export the results to JSONL format
1. Select **Import** to import another dataset in JSONL format for the bulk run
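For reference, the input dataset described in the steps above is a JSONL file in which each line is a standalone JSON object carrying a `query` field. The following is a minimal sketch of writing and validating such a file in Python; the file name and prompt texts are illustrative, not prescribed by AI Toolkit.

```python
import json

# Hypothetical file name: AI Toolkit does not prescribe one.
dataset_path = "prompts.jsonl"

# Each line of a JSONL file is one standalone JSON object;
# the `query` field holds the prompt used by the bulk run.
rows = [
    {"query": "What is the capital of France?"},
    {"query": "Summarize the plot of Hamlet in one sentence."},
]

with open(dataset_path, "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Validate before importing: every line must parse as JSON
# and carry the required `query` field.
with open(dataset_path, encoding="utf-8") as f:
    for number, line in enumerate(f, start=1):
        record = json.loads(line)
        assert "query" in record, f"line {number} is missing `query`"
```

The same line-per-record shape applies to the exported results, with the model output added per row, so the validation loop above can be reused to inspect an export.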
**docs/intelligentapps/evaluation.md** (+18 −15)
---
Order: 5
Area: intelligentapps
TOCTitle: Evaluation
ContentId: 3342b8ef-72fe-4cca-baad-64ee57c05b5f
PageTitle: AI Evaluation
DateApproved: 12/11/2024
MetaDescription: Import a dataset with LLM or SLM output, or rerun it for the queries. Run an evaluation job with popular evaluators like F1 score, relevance, coherence, and similarity, then find, visualize, and compare the evaluation results in tables or charts.
---
# Model evaluation

AI engineers often need to evaluate models with different parameters or prompts, comparing model output to ground truth and computing evaluator values from the comparisons. AI Toolkit lets you perform evaluations with minimal effort by uploading a prompts dataset.
1. In the AI Toolkit view, select **TOOLS** > **Evaluation** to open the Evaluation view

1. Select **Create Evaluation**, and then provide the following information:
    - **Evaluation job name:** default, or a name that you specify
    - **Evaluator:** currently, only the built-in evaluators can be selected
    - **Judging model:** a model from the list that can be selected as the judging model for some evaluators
    - **Dataset:** select a sample dataset for learning purposes, or import a JSONL file with the fields `query`, `response`, and `ground truth`
1. A new evaluation job is created, and you're prompted to open the evaluation job details
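The dataset format above pairs each `query` with a model `response` and a `ground truth` answer. As an illustration of what an F1-style evaluator computes over such a pair, here is a sketch of token-overlap F1 in Python; the example row is invented, and AI Toolkit's built-in evaluator may use different tokenization and normalization.

```python
import json

# Hypothetical example row; field names follow the dataset schema above.
rows = [
    {
        "query": "What is the capital of France?",
        "response": "The capital of France is Paris.",
        "ground truth": "Paris is the capital of France.",
    }
]

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1: harmonic mean of precision and recall over
    whitespace tokens. A common formulation, not necessarily the one
    AI Toolkit uses internally."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Count overlapping tokens, consuming each reference token once.
    ref_pool = list(ref)
    common = 0
    for tok in pred:
        if tok in ref_pool:
            ref_pool.remove(tok)
            common += 1
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)

for row in rows:
    score = token_f1(row["response"], row["ground truth"])
    print(json.dumps({"query": row["query"], "f1": round(score, 4)}))
```

Token-level F1 rewards partial overlap, which is why it suits open-ended answers better than exact-match comparison; the judging-model evaluators mentioned above handle the cases, such as relevance or coherence, that token overlap cannot capture.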