Commit f553053

docs: update AutoPDL docs (#1160)
Signed-off-by: Louis Mandel <[email protected]>
1 parent 235431c commit f553053


docs/autopdl.md

Lines changed: 22 additions & 8 deletions
@@ -21,28 +21,42 @@ cd prompt-declaration-language
pip install -e '.[all]'
```

-### Writing a PDL program to optimize
+## Writing a PDL program to optimize

The first step in using AutoPDL is to write a PDL program that has free variables. Consider, for example, the following PDL program, which queries an LLM to correct a sentence with grammatical errors ([file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/optimizer/grammar_correction.pdl)):

-```yaml
+```yaml linenums="1"
--8<-- "./examples/optimizer/grammar_correction.pdl"
```

This program starts with a definition section (a `defs` section is required), followed by a `lastOf` sequence: a list of blocks executed in order, where the result of the last block is returned as the result of the whole sequence. First, the program establishes the demonstrations obtained from the `demonstrations` variable. The `for` loop at lines 5 to 10 ensures that all demonstrations are formatted in a consistent way. On lines 11 to 16 the program formulates a prompt to correct a sentence stored in variable `input`. Lines 17 through 21 show a model call where the model id is given by variable `model`. Finally, lines 23 through 28 check whether variable `verify` is set to `true`; if so, the program makes a second model call to verify the previous response and to produce a new one if needed.

Notice that variables `input`, `model`, `demonstrations`, and `verify` are not defined. The first of these is an instance variable that will hold the different instances while the optimizer is running. The others are parameters to be optimized: we can pick among different models, different demonstrations, and especially different prompting patterns. Since PDL supports first-class functions, the program could also be made to pick the optimal function to use, thereby choosing the prompting pattern. In this example, finding an optimal value for `verify` determines whether it is best to call the model once or twice.

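Since the included `grammar_correction.pdl` is not rendered on this page, here is a minimal sketch of a PDL program with the shape described above. It is an illustration only, not the contents of the actual file; in particular, the record fields `demo.input` and `demo.ground_truth` are assumed names for the demonstration entries, and block details may differ from the real example.

```yaml
# Illustrative sketch only -- not the contents of grammar_correction.pdl.
description: Grammar correction with free variables (sketch)
defs:
  # Format the demonstrations chosen by the optimizer in a consistent way.
  # The record fields `input` and `ground_truth` are assumed names.
  formatted_demos:
    for:
      demo: ${ demonstrations }
    repeat: |
      Sentence: ${ demo.input }
      Correction: ${ demo.ground_truth }
lastOf:
  # Formulate the prompt for the sentence held by the instance variable `input`.
  - |
    ${ formatted_demos }
    Correct the grammar of the following sentence.
    Sentence: ${ input }
    Correction:
  # Call the model whose id is given by the free variable `model`.
  - def: answer
    model: ${ model }
  # Optionally verify the answer with a second model call.
  - if: ${ verify }
    then:
      model: ${ model }
      input: |
        Check whether "${ answer }" is a grammatical correction of "${ input }".
        If it is not, reply with a better correction; otherwise repeat it.
    else: ${ answer }
```

The only point of the sketch is that `demonstrations`, `input`, `model`, and `verify` remain free: AutoPDL supplies their values during the search.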
-In addition to the PDL program, AutoPDL also needs a dataset and a loss function as inputs. These will be used to perform the optimization, and as a source of demonstrations. In this example, we can use [process_grammar_correction.py](https://github.com/IBM/prompt-declaration-language/blob/main/examples/optimizer/process_grammar_correction.py) to obtain a dataset split into train/validation/test. The train split will be used to draw instances and demonstrations, the validation for checking during the optimization, and test to evaluate and obtain a final score at the end of the optimization run.

-To obtain this dataset simply run:
+
+## Dataset
+
+In addition to the PDL program, AutoPDL also needs a dataset. It is used to perform the optimization and as a source of demonstrations. The train split is used to draw instances and demonstrations, the validation split for checking during the optimization, and the test split to evaluate and obtain a final score at the end of the optimization run.
+
+In this example, we need a dataset containing sentences with mistakes and their corrected versions. We can use [process_grammar_correction.py](https://github.com/IBM/prompt-declaration-language/blob/main/examples/optimizer/process_grammar_correction.py) to obtain a dataset split into train/validation/test. Simply run:
+
```
python process_grammar_correction.py
```

+## Loss function
+
+The loss function is used to guide the optimizer towards the best solution and to evaluate the final program. It must be a PDL function named `score` that takes as input the result of the program and the ground truth, and returns a floating point number.
+In our example, we use the Levenshtein distance, which we import from the `textdistance` Python module. The `score` function is defined in the [`eval_levenshtein.pdl` file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/optimizer/eval_levenshtein.pdl):
+
+```yaml
+--8<-- "./examples/optimizer/eval_levenshtein.pdl"
+```
+
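The `eval_levenshtein.pdl` snippet above is included at documentation build time and is not visible in this diff. As a purely illustrative sketch, a `score` function built on `textdistance` could look roughly like the following; the parameter names, the `str` annotations, and the choice of a normalized similarity rather than a raw distance are assumptions, so refer to the actual file for the real definition.

```yaml
# Hypothetical sketch of a score function -- see eval_levenshtein.pdl for the
# actual definition used by the example.
defs:
  score:
    function:
      document: str       # result produced by the PDL program (assumed name)
      ground_truth: str   # expected correction from the dataset (assumed name)
    return:
      lang: python
      code: |
        import textdistance
        # Normalized Levenshtein similarity in [0, 1]; 1.0 means an exact match.
        result = textdistance.levenshtein.normalized_similarity(document, ground_truth)
```

Any metric that can be written as a PDL `score` function returning a float can be substituted in the same way.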
The final ingredient needed is a configuration file as explained in the next section.

-### Writing a configuration file
+## Writing a configuration file

An AutoPDL configuration file describes the state-space and parameters for the search. In this example, the configuration is given in the following [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/optimizer/grammar_correction.yaml):

@@ -59,12 +73,12 @@ Last but not least, `variables` indicates the domain of each variable that needs

Notice that variable `input` in the PDL program is not given a domain. This is because it will hold the different instances that will be evaluated (it was included in the `instance_columns` field).

-For a complete list of available fields in the configuration file, see [file](https://github.com/IBM/prompt-declaration-language/blob/main/src/pdl/optimize/config_parser.py):
+A complete list of available fields in the configuration file is given in the configuration parser [file](https://github.com/IBM/prompt-declaration-language/blob/main/src/pdl/optimize/config_parser.py).

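The actual `grammar_correction.yaml` configuration is included in the built documentation rather than shown in this diff. As a rough illustration of the kind of content it holds, a sketch might look like the following; apart from `variables` and `instance_columns`, which are named on this page, every field name and value here is an assumption to be checked against `config_parser.py` and the real configuration file.

```yaml
# Hypothetical sketch of an AutoPDL configuration -- field names other than
# `variables` and `instance_columns` are assumptions.
pdl_path: grammar_correction.pdl   # assumed: the PDL program to optimize
instance_columns:                  # dataset columns bound to instance variables
  - input
variables:                         # search space: the domain of each free variable
  model:                           # placeholder model ids
    - watsonx/ibm/granite-3-8b-instruct
    - ollama/granite3.2:8b
  verify:
    - true
    - false
# How the domain of `demonstrations` is expressed (for example, how many
# demonstrations to sample from the train split) is not described on this page.
```

Whatever the exact schema, the role of `variables` is to give the optimizer a finite domain to search for each free variable, while `instance_columns` tells it which dataset columns populate instance variables such as `input`.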

We are ready to run the optimizer!

-### Running AutoPDL
+## Running AutoPDL

To run the optimizer, execute the following command:

@@ -85,4 +99,4 @@ To run the optimized program, execute the command:
pdl optimized_grammar_correction.pdl
```

-A log of the optimization process is written to experiments/ by default.
+A log of the optimization process is written to `experiments/` by default.
