Commit d150341

Improve docs

Signed-off-by: Claudio Spiess <[email protected]>

1 parent b7cbec5 commit d150341

3 files changed: +123 additions, −36 deletions

docs/autopdl.md

Lines changed: 97 additions & 35 deletions
@@ -7,7 +7,15 @@ hide:
 
 # AutoPDL Tutorial
 
-The following sections show how to use the AutoPDL optimizer to produce optimized PDL programs for specific tasks.
+The following sections show how to use the AutoPDL optimizer introduced by [Spiess et al. (2025)](https://openreview.net/forum?id=CAeISyE3aR) in "AutoPDL: Automatic Prompt Optimization for LLM Agents" ([arXiv](https://arxiv.org/abs/2504.04365)) to produce optimized PDL programs for specific tasks. Please ensure PDL was installed with extras, e.g.:
+
+``` { .bash .copy .annotate linenums="1" }
+pip install 'prompt-declaration-language[all]'
+# or from source
+git clone git@github.com:IBM/prompt-declaration-language.git
+cd prompt-declaration-language
+pip install -e '.[all]'
+```
 
 To optimize a PDL program, we need the program, an optimizer configuration, a dataset, and an _evaluator_. An evaluator is a Python subclass of `OptimizerEvaluator` that evaluates a candidate, which is a generated configuration instance consisting of e.g. fewshot examples. The evaluator class follows this structure:
 
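To make that structure concrete, here is a minimal sketch of what a candidate evaluator could look like. Apart from the `OptimizerEvaluator(Thread)` base class named in the hunk below, everything here (class name, constructor arguments, attributes) is a hypothetical illustration, not the actual `src/pdl/optimize` API:

```python
# Hypothetical sketch of an evaluator, for intuition only; the real base class
# lives in src/pdl/optimize and its actual method names may differ.
from threading import Thread


class Gsm8kEvaluator(Thread):
    """Scores one candidate (a sampled config: model, prompt pattern,
    fewshot demonstrations) on one dataset example."""

    def __init__(self, candidate: dict, example: dict):
        super().__init__()
        self.candidate = candidate  # e.g. {"model": ..., "prompt_pattern": "cot", "demonstrations": [...]}
        self.example = example      # e.g. {"question": ..., "reasoning": ..., "answer": ...}
        self.score = 0.0

    def run(self) -> None:
        # A real evaluator would execute the PDL program with the candidate's
        # variables inserted at runtime, then compare the extracted answer to
        # the gold answer. Here the model call is faked with the gold answer.
        model_output = self.example["answer"]
        self.score = float(model_output.strip() == self.example["answer"].strip())
```

Because evaluators are threads, the optimizer can run several of them concurrently (see the `parallelism` config field below) via the usual `start()`/`join()` protocol.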
@@ -52,39 +60,14 @@ class OptimizerEvaluator(Thread):
 
 Let's go through an example for `GSM8K`. Our PDL program uses different prompt patterns from the prompt library, and the variables `prompt_pattern`, `question`, `model`, and `demonstrations` are inserted at runtime by the evaluator.
 
-
 ```yaml title="examples/optimizer/gsm8k.pdl" linenums="1"
 --8<-- "./examples/optimizer/gsm8k.pdl"
 ```
 
-We write a configuration file for the optimizer, see `src/pdl/optimize/config_parser.py` for all fields:
-
-``` { .yaml .copy .annotate title="gsm8k_optimizer_config.yml" linenums="1" }
-benchmark: gsm8k # Name our benchmark
-budget: null # Set a budget, can be number of iterations, or a duration string e.g. "2h"
-budget_growth: double # double validation set size each iteration
-# or to_max: reach max_test_set_size by final iteration
-initial_test_set_size: 2 # size of test set in first iteration
-max_test_set_size: 10 # maximum test set size
-num_candidates: 100 # how many candidates to evaluate
-num_demonstrations: 5 # how many demonstrations to include per candidate
-parallelism: 1 # how many threads to run evaluations across
-shuffle_test: false # shuffling of test set
-test_set_name: test # name of test set
-train_set_name: train # name of train set
-validation_set_name: validation # name of validation set
-demonstrations_variable_name: demonstrations # variable name to insert demonstrations into
-variables: # define discrete options to sample from
-  model: # set ${ model } variable
-    - watsonx/meta-llama/llama-3-1-8b-instruct
-  prompt_pattern: # set ${ prompt_pattern } variable to one of these
-    - cot
-    - react
-    - rewoo
-  num_demonstrations: # overrides num demonstrations above
-    - 0
-    - 3
-    - 5
+We write a configuration file for the optimizer and save it as `gsm8k_optimizer_config.yml`. See `src/pdl/optimize/config_parser.py` for all fields.
+
+``` { .yaml .copy .annotate title="examples/optimizer/gsm8k_optimizer_config.yml" linenums="1" }
+--8<-- "./examples/optimizer/gsm8k_optimizer_config.yml"
 ```
 
 
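For intuition about what the optimizer samples: each candidate pairs one choice per entry under `variables` (plus sampled fewshot demonstrations). With one model, three prompt patterns, and three demonstration counts, that gives 1 × 3 × 3 = 9 discrete combinations, matching the "Config combinations: 9" row in the optimizer output further below. A minimal sketch of that sampling idea (an illustration only, not the optimizer's actual sampling code):

```python
# Sketch of sampling candidate configurations from the "variables" section;
# illustration only, not the optimizer's actual implementation.
import random

import yaml

with open("gsm8k_optimizer_config.yml") as f:
    config = yaml.safe_load(f)

# e.g. {"model": [...], "prompt_pattern": [...], "num_demonstrations": [...]}
variables = config["variables"]

candidates = [
    {name: random.choice(options) for name, options in variables.items()}
    for _ in range(config["num_candidates"])
]
print(candidates[0])  # e.g. {'model': '...', 'prompt_pattern': 'react', 'num_demonstrations': 3}
```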
@@ -95,20 +78,99 @@ variables: # define discrete options to sample from
 We can see an example of a script to run the optimization process in `examples/optimizer/optimize.py`.
 Usage:
 
-```
+```text
 python optimize.py optimize -h
 usage: optimize.py optimize [-h] --config CONFIG --dataset-path DATASET_PATH [--experiments-path EXPERIMENTS_PATH]
                             [--yield_output | --no-yield_output] [--dry | --no-dry]
                             pdl_file
 ```
 
-We also need a dataset to optimize against, with `train`, `test`, and `validation` splits. To produce such a dataset, we can use HuggingFace Datasets `load_dataset` and `save_to_disk`. This example requires the dataset to have columns `question`, `reasoning`, and `answer`, which can be created from the original `openai/gsm8k` dataset. Processing scripts are under development and will follow shortly.
+We also need a dataset to optimize against, with `train`, `test`, and `validation` splits. To produce such a dataset, we can use HuggingFace Datasets `load_dataset` and `save_to_disk`. This example requires the dataset to have columns `question`, `reasoning`, and `answer`, which can be created from the original `openai/gsm8k` dataset.
+
+We provide three scripts in `examples/optimizer` to create datasets, including the rule-based agentic trajectories. These are `process_gsm8k.py`, `process_fever.py`, and `process_mbpp.py`. They load the original datasets, process them, and save them to disk in the required format. Dataset-specific instructions may be found in the respective script files. Note that the scripts create a folder named `var` in the current directory, which contains the processed dataset in a format that can be used by the optimizer. Therefore, they should be run from the root of the PDL repository.
 
-We can run an example like so:
+Let's run the GSM8K dataset processing script:
 
+``` { .bash .copy .annotate linenums="1" }
+python examples/optimizer/process_gsm8k.py
 ```
+
+This should save the processed dataset in `var/gsm8k_trajectified` and output something like:
+
+```text
+Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 557195.73 examples/s]
+Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 1319/1319 [00:00<00:00, 363559.64 examples/s]
+Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 1024/1024 [00:00<00:00, 271472.56 examples/s]
+Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 71242.31 examples/s]
+Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 1024/1024 [00:00<00:00, 68826.30 examples/s]
+Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 22520.85 examples/s]
+Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 18186.53 examples/s]
+Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 698328.77 examples/s]
+Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 1319/1319 [00:00<00:00, 232468.57 examples/s]
+Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 1024/1024 [00:00<00:00, 413375.10 examples/s]
+DatasetDict({
+    train: Dataset({
+        features: ['question', 'answer', 'reasoning', 'raw_answer', 'answer_part', 'traj_keys', 'traj_values', 'rewoo_traj_keys', 'rewoo_traj_values'],
+        num_rows: 6449
+    })
+    test: Dataset({
+        features: ['question', 'answer', 'reasoning', 'raw_answer', 'answer_part'],
+        num_rows: 1319
+    })
+    validation: Dataset({
+        features: ['question', 'answer', 'reasoning', 'raw_answer', 'answer_part'],
+        num_rows: 1024
+    })
+})
+```
+
+Finally, we can run the example like so:
+
+``` { .bash .copy .annotate linenums="1" }
 cd examples/optimizer
-python optimize.py optimize --config config.yml --dataset-path datasets/gsm8k gsm8k.pdl
+python optimize.py optimize --config gsm8k_optimizer_config.yml --dataset-path ../../var/gsm8k_trajectified gsm8k.pdl
+```
+
+This will report details about the optimization process, such as the number of candidates evaluated. The output will look something like this:
+
+```text
+PDL Optimizer pdl_optimizer.py:336
+┌──────────────────────────────┬─────────────────────────────────────────────┐
+│ Config combinations          │ 9                                           │
+│ Max candidates               │ 100                                         │
+│ Num. candidates              │ 100                                         │
+│ Starting validation set size │ 2                                           │
+│ Max validation set size      │ 10                                          │
+│ Num. iterations              │ 7                                           │
+│ Total evaluations            │ 1,200                                       │
+│ Num. threads                 │ 1                                           │
+│ Validation set multiplier    │ 2                                           │
+│ Shuffle validation set       │ False                                       │
+│ Budget policy                │ None                                        │
+├──────────────────────────────┼─────────────────────────────────────────────┤
+│ model                        │ ['watsonx/meta-llama/llama-3-2-3b-instruct… │
+│ prompt_pattern               │ ['cot', 'react', 'rewoo']                   │
+│ num_demonstrations           │ [0, 3, 5]                                   │
+└──────────────────────────────┴─────────────────────────────────────────────┘
+Iteration pdl_optimizer.py:419
+┌─────────────────────┬─────┐
+│ Index               │ 0   │
+│ Validation set size │ 2   │
+│ Num. candidates     │ 100 │
+└─────────────────────┴─────┘
+Evaluation pdl_optimizer.py:601
+┌────────────────────────┬──────────────────────────────────────────┐
+│ Test set size          │ 2                                        │
+├────────────────────────┼──────────────────────────────────────────┤
+│ model                  │ watsonx/meta-llama/llama-3-2-3b-instruct │
+│ prompt_pattern         │ cot                                      │
+│ num_demonstrations     │ 0                                        │
+│ uuid                   │ enl0ertp                                 │
+│ demonstrations_indices │ 0                                        │
+│ demonstrations         │ 0                                        │
+└────────────────────────┴──────────────────────────────────────────┘
+Running without parallelism util.py:74
+  0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/1,200 [ 0:00:01 < -:--:-- , ? it/s ]
 ```
 
-Once the process is complete, a file `optimized_gsm8k.pdl` is written. This file contains the optimal configuration and is directly executable by the standard PDL interpreter.
+Once the process is complete, a file `optimized_gsm8k.pdl` is written in the same directory as the source PDL file. This file contains the optimal configuration and is directly executable by the standard PDL interpreter. A log of the optimization process is written to `experiments/` by default.
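The row counts in the output above (6,449 + 1,024 = 7,473) suggest the processing script carves the 1,024-example validation split out of the original `openai/gsm8k` train split. As a rough sketch of the basic preparation, assuming a split method and seed of our choosing, and omitting the `traj_*` trajectory columns that `process_gsm8k.py` also builds:

```python
# Rough sketch of the basic GSM8K preparation: question/reasoning/answer
# columns plus train/test/validation splits. The real script,
# examples/optimizer/process_gsm8k.py, also builds the rule-based trajectory
# columns (traj_*); the split method and seed here are arbitrary assumptions.
from datasets import DatasetDict, load_dataset

gsm8k = load_dataset("openai/gsm8k", "main")  # splits: train (7,473) and test (1,319)

def split_answer(row):
    # GSM8K stores "reasoning ... #### final_answer" in a single field.
    reasoning, _, answer = row["answer"].partition("####")
    return {"reasoning": reasoning.strip(), "answer": answer.strip(), "raw_answer": row["answer"]}

gsm8k = gsm8k.map(split_answer)

# Carve a 1,024-example validation split out of train, matching the sizes above.
split = gsm8k["train"].train_test_split(test_size=1024, seed=42)
processed = DatasetDict({
    "train": split["train"],
    "test": gsm8k["test"],
    "validation": split["test"],
})
processed.save_to_disk("var/gsm8k_trajectified")
```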
examples/optimizer/gsm8k_optimizer_config.yml

Lines changed: 25 additions & 0 deletions

@@ -0,0 +1,25 @@
+benchmark: gsm8k # Name our benchmark
+budget: null # Set a budget, can be number of iterations, or a duration string e.g. "2h"
+budget_growth: double # double validation set size each iteration
+# or to_max: reach max_test_set_size by final iteration
+initial_test_set_size: 2 # size of test set in first iteration
+max_test_set_size: 10 # maximum test set size
+num_candidates: 100 # how many candidates to evaluate
+num_demonstrations: 5 # how many demonstrations to include per candidate
+parallelism: 1 # how many threads to run evaluations across
+shuffle_test: false # shuffling of test set
+test_set_name: test # name of test set
+train_set_name: train # name of train set
+validation_set_name: validation # name of validation set
+demonstrations_variable_name: demonstrations # variable name to insert demonstrations into
+variables: # define discrete options to sample from
+  model: # set ${ model } variable
+    - watsonx/meta-llama/llama-3-2-3b-instruct
+  prompt_pattern: # set ${ prompt_pattern } variable to one of these
+    - cot
+    - react
+    - rewoo
+  num_demonstrations: # overrides num demonstrations above
+    - 0
+    - 3
+    - 5
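As a worked example of the `budget_growth: double` policy in this config: starting from `initial_test_set_size: 2` and capped at `max_test_set_size: 10`, the validation set grows 2 → 4 → 8 → 10 across iterations (how many iterations actually run also depends on how candidates are pruned). A tiny sketch of that schedule, illustrating the comments above rather than the optimizer's code:

```python
# Validation-set growth under budget_growth: double (illustration only).
size, max_size = 2, 10  # initial_test_set_size, max_test_set_size
schedule = []
while size < max_size:
    schedule.append(size)
    size *= 2
schedule.append(max_size)
print(schedule)  # [2, 4, 8, 10]
```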

mkdocs.yml

Lines changed: 1 addition & 1 deletion

@@ -47,7 +47,7 @@ nav:
   - API Reference: api_reference.md
   - Contribute: contrib.md
   - Viewer: viewer.md
-  # - AutoPDL: autopdl.md # Hide documentation for now
+  - AutoPDL: autopdl.md
 
 # Define some IBM colors
 extra_css:
