docs/autopdl.md
+22 -8 (22 additions & 8 deletions)
@@ -21,28 +21,42 @@ cd prompt-declaration-language
pip install -e '.[all]'
```

-###Writing a PDL program to optimize
+## Writing a PDL program to optimize

The first step in using AutoPDL is to write a PDL program that has free variables. Consider, for example, the following PDL program, which queries an LLM to correct a sentence with grammatical errors ([file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/optimizer/grammar_correction.pdl)):
This program starts with a definition section. Note that a `defs` section is necessary. This is followed by a `lastOf` sequence (a list of blocks to be executed where the result of the last block is returned as the result of the whole sequence). First, the program establishes some demonstrations obtained from a `demonstrations` variable. The `for` loop at lines 5 to 10 ensures that all demonstrations are formatted in a consistent way. On lines 11 to 16 the program formulates a prompt to correct a sentence stored in variable `input`. Lines 17 through 21 show a model call where the model id is given by variable `model`. Finally, lines 23 through 28 check if variable `verify` is set to `true`. If so, the program makes another model call to verify the previous response and to produce a new one if needed.

Notice that the variables `input`, `model`, `demonstrations`, and `verify` are not defined. The first of these is an instance variable that holds a different instance each time the optimizer evaluates the program. The rest of them are parameters to be optimized. We can pick among different models, different demonstrations, and especially different prompting patterns. PDL supports first-class functions, so the program could be made to pick the optimal function to be used, thereby choosing the prompting pattern. In this example, finding an optimal value for `verify` will determine whether it's best to call the model once or twice.

-In addition to the PDL program, AutoPDL also needs a dataset and a loss function as inputs. These will be used to perform the optimization, and as a source of demonstrations. In this example, we can use [process_grammar_correction.py](https://github.com/IBM/prompt-declaration-language/blob/main/examples/optimizer/process_grammar_correction.py) to obtain a dataset split into train/validation/test. The train split will be used to draw instances and demonstrations, the validation for checking during the optimization, and test to evaluate and obtain a final score at the end of the optimization run.

-To obtain this dataset simply run:
+
+## Dataset
+
+In addition to the PDL program, AutoPDL also needs a dataset, which is used to perform the optimization and as a source of demonstrations. The train split is used to draw instances and demonstrations, the validation split to check progress during the optimization, and the test split to evaluate the result and obtain a final score at the end of the optimization run.
+
+In this example, we need a dataset containing sentences with mistakes and their corrected versions. We can use [process_grammar_correction.py](https://github.com/IBM/prompt-declaration-language/blob/main/examples/optimizer/process_grammar_correction.py) to obtain a dataset split into train/validation/test. Simply run:
+
```
python process_grammar_correction.py
```
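
To make the three-way split concrete, here is a minimal Python sketch of cutting a list of examples into train/validation/test. It is only an illustration and does not reproduce what `process_grammar_correction.py` does: the function name `split_dataset`, the field names `input` and `output`, the 80/10/10 ratios, and the seed are all assumptions.

```python
import random


def split_dataset(rows, train_frac=0.8, validation_frac=0.1, seed=0):
    """Shuffle rows deterministically and cut them into three splits.

    Hypothetical helper: the fractions and seed are placeholder values.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_validation = int(len(shuffled) * validation_frac)
    return {
        # Source of instances and demonstrations.
        "train": shuffled[:n_train],
        # Used for checking candidates during the optimization.
        "validation": shuffled[n_train:n_train + n_validation],
        # Held out for the final score.
        "test": shuffled[n_train + n_validation:],
    }


# Toy rows with assumed column names: a sentence with a mistake and its fix.
rows = [
    {"input": f"sentence {i} with erors", "output": f"sentence {i} with errors"}
    for i in range(10)
]
splits = split_dataset(rows)
```

Keeping the test split untouched until the end is what makes the final score a fair estimate of the optimized program's quality.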

+## Loss function
+
+The loss function is used to guide the optimizer toward the best solution and to evaluate the final program. It must be a PDL function named `score` that takes the result of the program and the ground truth as input and returns a floating-point number.
+In our example, we use the Levenshtein distance, which we import from the `textdistance` Python module. The `score` function is defined in the [`eval_levenshtein.pdl` file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/optimizer/eval_levenshtein.pdl):
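
For readers who want a feel for the metric itself, the sketch below computes a Levenshtein distance in plain Python and scales it by the longer string's length. It is an illustration only: it does not reproduce `eval_levenshtein.pdl` and does not call `textdistance`, and the Python signature `score(result, ground_truth)` and the normalization to the range 0 to 1 are assumptions based on the description above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def score(result: str, ground_truth: str) -> float:
    """Hypothetical loss: edit distance divided by the longer length.

    Returns 0.0 when the two strings are identical.
    """
    longest = max(len(result), len(ground_truth))
    if longest == 0:
        return 0.0
    return levenshtein(result, ground_truth) / longest
```

A character-level distance suits grammar correction because a good correction usually differs from the ground truth by only a few characters.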
The final ingredient needed is a configuration file as explained in the next section.

-###Writing a configuration file
+## Writing a configuration file

An AutoPDL configuration file describes the state-space and parameters for the search. In this example, the configuration is given in the following [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/optimizer/grammar_correction.yaml):
@@ -59,12 +73,12 @@ Last but not least, `variables` indicates the domain of each variable that needs
Notice that variable `input` in the PDL program is not given a domain. This is because it will hold the different instances that will be evaluated (it was included in the `instance_columns` field).
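
The relationship between variable domains and instance columns can be sketched in Python as follows. This is a hedged illustration, not AutoPDL's search procedure and not the contents of `grammar_correction.yaml`: the model ids, the `num_demonstrations` name, and all values are placeholders, and a real optimizer may sample the space rather than enumerate it.

```python
from itertools import product

# Hypothetical domains for the variables to optimize (placeholder values).
variables = {
    "model": ["model-a", "model-b"],
    "num_demonstrations": [0, 3, 5],
    "verify": [True, False],
}

# Instance columns are filled from the dataset, so they get no domain.
instance_columns = ["input"]


def enumerate_candidates(domains):
    """Return every combination of values, one dict per candidate."""
    names = list(domains)
    return [
        dict(zip(names, values))
        for values in product(*(domains[name] for name in names))
    ]


candidates = enumerate_candidates(variables)
```

Each candidate fixes the optimized variables, while `input` varies per dataset row when a candidate is evaluated.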

-For a complete list of available fields in the configuration file, see [file](https://github.com/IBM/prompt-declaration-language/blob/main/src/pdl/optimize/config_parser.py):
+A complete list of available fields in the configuration file is given in the configuration parser [file](https://github.com/IBM/prompt-declaration-language/blob/main/src/pdl/optimize/config_parser.py).


We are ready to run the optimizer!

-###Running AutoPDL
+## Running AutoPDL

To run the optimizer, execute the following command:
@@ -85,4 +99,4 @@ To run the optimized program, execute the command:
pdl optimized_grammar_correction.pdl
```

-A log of the optimization process is written to experiments/ by default.
+A log of the optimization process is written to `experiments/` by default.