Commit 7390f7c

docs: link the tutorial to the code (#987)
Signed-off-by: Louis Mandel <[email protected]>
1 parent f3dfd30 commit 7390f7c

32 files changed: 63 additions, 229 deletions

docs/README.md

Lines changed: 11 additions & 127 deletions
@@ -23,7 +23,7 @@ PDL provides the following features:
 
 The PDL interpreter takes a PDL program as input and generates data by executing its instructions (calling out to models, code, etc...).
 
-See below for a quick reference, followed by [installation notes](#interpreter_installation) and an [overview](#overview) of the language. A more detailed description of the language features can be found in this [tutorial](https://ibm.github.io/prompt-declaration-language/tutorial).
+See below for a quick reference, followed by [installation notes](#interpreter-installation) and an [overview](#overview) of the language. A more detailed description of the language features can be found in this [tutorial](https://ibm.github.io/prompt-declaration-language/tutorial).
 
 
 ## Quick Reference
@@ -194,7 +194,7 @@ The same example on Replicate:
 ```yaml
 text:
 - "Hello\n"
-- model: replicate/ibm-granite/granite-3.2-8b-instruct
+- model: replicate/ibm-granite/granite-3.2-8b-instruct
   parameters:
     stop: "!"
     temperature: 0
@@ -215,24 +215,9 @@ See the [tutorial](https://ibm.github.io/prompt-declaration-language/tutorial) f
 Consider now an example from AI for code, where we want to build a prompt template for code explanation. We have a JSON file as input
 containing the source code and some information regarding the repository where it came from.
 
-For example, given the data in this JSON [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/code/data.yaml):
+For example, given the data in this JSON [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/examples/tutorial/programs/code/data.yaml):
 ```yaml
-source_code:
-  |
-  @SuppressWarnings("unchecked")
-  public static Map<String, String> deserializeOffsetMap(String lastSourceOffset) throws IOException {
-    Map<String, String> offsetMap;
-    if (lastSourceOffset == null || lastSourceOffset.isEmpty()) {
-      offsetMap = new HashMap<>();
-    } else {
-      offsetMap = JSON_MAPPER.readValue(lastSourceOffset, Map.class);
-    }
-    return offsetMap;
-  }
-repo_info:
-  repo: streamsets/datacollector
-  path: stagesupport/src/main/java/com/.../OffsetUtil.java
-  function_name: OffsetUtil.deserializeOffsetMap
+--8<-- "./examples/tutorial/programs/code/data.yaml"
 ```
 
 we would like to express the following prompt and submit it to an LLM:
@@ -259,28 +244,10 @@ public static Map<String, String> deserializeOffsetMap(String lastSourceOffset)
 }
 ```
 
-In PDL, this would be expressed as follows (see [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/code/code.pdl)):
+In PDL, this would be expressed as follows (see [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/tutorial/programs/code/code.pdl)):
 
 ```yaml
-description: Code explanation example
-defs:
-  CODE:
-    read: ./data.yaml
-    parser: yaml
-text:
-- "\n${ CODE.source_code }\n"
-- model: ollama/granite-code:8b
-  input: |
-    Here is some info about the location of the function in the repo.
-    repo:
-    ${ CODE.repo_info.repo }
-    path: ${ CODE.repo_info.path }
-    Function_name: ${ CODE.repo_info.function_name }
-
-
-    Explain the following code:
-    ```
-    ${ CODE.source_code }```
+--8<-- "./examples/tutorial/programs/code/code.pdl"
 ```
 
 In this program we first define some variables using the `defs` construct. Here `CODE` is defined to be a new variable, holding the result of the `read` block that follows.
@@ -310,48 +277,10 @@ public static Map<String, String> deserializeOffsetMap(String lastSourceOffset)
 The code is a Java method that takes a string `lastSourceOffset` as input and returns a `Map<String, String>`. The method uses the Jackson library to deserialize the JSON-formatted string into a map. If the input string is null or empty, an empty HashMap is returned. Otherwise, the string is deserialized into a Map using the `JSON_MAPPER.readValue()` method.
 ```
 
-Notice that in PDL variables are used to templatize any entity in the document, not just textual prompts to LLMs. We can add a block to this document to evaluate the quality of the output using a similarity metric with respect to our [ground truth](https://github.com/IBM/prompt-declaration-language/blob/main/examples/code/ground_truth.txt). See [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/code/code-eval.pdl):
+Notice that in PDL variables are used to templatize any entity in the document, not just textual prompts to LLMs. We can add a block to this document to evaluate the quality of the output using a similarity metric with respect to our [ground truth](https://github.com/IBM/prompt-declaration-language/blob/main/examples/tutorial/programs/code/ground_truth.txt). See [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/tutorial/programs/code/code-eval.pdl):
 
 ```yaml
-description: Code explanation example
-defs:
-  CODE:
-    read: ./data.yaml
-    parser: yaml
-  TRUTH:
-    read: ./ground_truth.txt
-text:
-- "\n${ CODE.source_code }\n"
-- model: ollama/granite-code:8b
-  def: EXPLANATION
-  input: |
-    Here is some info about the location of the function in the repo.
-    repo:
-    ${ CODE.repo_info.repo }
-    path: ${ CODE.repo_info.path }
-    Function_name: ${ CODE.repo_info.function_name }
-
-
-    Explain the following code:
-    ```
-    ${ CODE.source_code }```
-- |
-
-
-  EVALUATION:
-  The similarity (Levenshtein) between this answer and the ground truth is:
-- def: EVAL
-  lang: python
-  code: |
-    import textdistance
-    expl = """
-    ${ EXPLANATION }
-    """
-    truth = """
-    ${ TRUTH }
-    """
-    # (In PDL, set `result` to the output you wish for your code block.)
-    result = textdistance.levenshtein.normalized_similarity(expl, truth)
+--8<-- "./examples/tutorial/programs/code/code-eval.pdl"
 ```
 
 This program has an input block that reads the ground truth from filename `examples/code/ground_truth.txt` and assigns its contents to variable `TRUTH`. It also assigns the output of the model to the variable `EXPLANATION`, using a `def` construct. In PDL, any block can have a `def` to capture the result of that block in a variable. The last block is a call to Python code, which is included after the `code` field. Notice how code is included here simply as data. We collate fragments of Python with outputs obtained from previous blocks. This is one of the powerful features of PDL: the ability to specify the execution of code that is not known ahead of time. We can use LLMs to generate code that is later executed in the same programming model. This is made possible because PDL treats code as data, like any another part of the document.
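The Python block in the evaluation program relies on `textdistance.levenshtein.normalized_similarity`. As a rough, dependency-free illustration of what such a metric computes, the sketch below assumes the common normalization (one minus the edit distance divided by the length of the longer string). The function names are illustrative only, and the `textdistance` package may differ in detail.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (free on a match)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / max(len(a), len(b))."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest
```

In the PDL program, the two arguments would be the interpolated `${ EXPLANATION }` and `${ TRUTH }` strings, and the value assigned to `result` becomes the output of the code block.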
@@ -385,55 +314,10 @@ PDL allows rapid prototyping of prompts by allowing the user to change prompts a
 Finally, we can output JSON data as a result of this program, as follows:
 
 ```yaml
-description: Code explanation example
-defs:
-  CODE:
-    read: ./data.yaml
-    parser: yaml
-  TRUTH:
-    read: ./ground_truth.txt
-text:
-- model: ollama/granite-code:8b
-  def: EXPLANATION
-  contribute: []
-  input:
-    |
-    Here is some info about the location of the function in the repo.
-    repo:
-    ${ CODE.repo_info.repo }
-    path: ${ CODE.repo_info.path }
-    Function_name: ${ CODE.repo_info.function_name }
-
-
-    Explain the following code:
-    ```
-    ${ CODE.source_code }```
-  parameters:
-    temperature: 0
-- def: EVAL
-  contribute: []
-  lang: python
-  code:
-    |
-    import textdistance
-    expl = """
-    ${ EXPLANATION }
-    """
-    truth = """
-    ${ TRUTH }
-    """
-    # (In PDL, set `result` to the output you wish for your code block.)
-    result = textdistance.levenshtein.normalized_similarity(expl, truth)
-- data:
-    input: ${ CODE }
-    output: ${ EXPLANATION }
-    metric: ${ EVAL }
-
-```
-
-The data block takes various variables and combines their values into a JSON object with fields `input`, `output`, and `metric`. We mute the output of all the other blocks with `contribute` set to `[]`. The `contribute` construct can be used to specify how the result of a block is contributed to the background context.
-Setting it to `[]` means no contribution is made from this block to the background context. By default, the result of every block is contributed to the context. For the blocks in the program above, we use a `def` construct to save the intermediate result of each block.
+--8<-- "./examples/tutorial/programs/code/code-json.pdl"
+```
 
+The data block takes various variables and combines their values into a JSON object with fields `input`, `output`, and `metric`.
 The output of this program is the corresponding serialized JSON object, with the appropriate treatment of quotation marks. Similarly PDL can read jsonl files and create jsonl files by piping to a file.
 
 ## PDL Language Tutorial
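For readers less familiar with the `data` block described in the README text of this hunk, the following is a minimal Python sketch of the equivalent operation: gathering three values into one object with fields `input`, `output`, and `metric`, then serializing it as one JSON line. This is only an analogy and not how the PDL interpreter is implemented; `build_record`, `to_jsonl`, and the sample values are hypothetical.

```python
import json


def build_record(code: dict, explanation: str, metric: float) -> dict:
    """Combine previously computed values into one JSON-ready object."""
    return {"input": code, "output": explanation, "metric": metric}


def to_jsonl(records) -> str:
    """Serialize records one per line, as in a jsonl file."""
    return "".join(json.dumps(record) + "\n" for record in records)


# Hypothetical stand-ins for the CODE, EXPLANATION, and EVAL variables.
record = build_record(
    {"source_code": 'String s = "hi";', "repo_info": {"repo": "example/repo"}},
    'The code assigns "hi" to s.',
    0.5,
)
line = to_jsonl([record])
```

`json.dumps` escapes the embedded quotation marks, which corresponds to the "appropriate treatment of quotation marks" mentioned in the README text.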

docs/tutorial.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -374,10 +374,10 @@ In this program, we first define a query about the weather in some location (ass
374374

375375
## Data Block
376376

377-
PDL offers the ability to create JSON data as illustrated by the following example (described in detail in the [Overview](https://ibm.github.io/prompt-declaration-language/#overview) section). The `data` block can gather previously defined variables into a JSON structure. This feature is useful for data generation. Programs such as this one can be generalized to read jsonl files to generate data en masse by piping into another jsonl file ([file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/tutorial/programs/code-json.pdl)).
377+
PDL offers the ability to create JSON data as illustrated by the following example (described in detail in the [Overview](https://ibm.github.io/prompt-declaration-language/#overview) section). The `data` block can gather previously defined variables into a JSON structure. This feature is useful for data generation. Programs such as this one can be generalized to read jsonl files to generate data en masse by piping into another jsonl file ([file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/tutorial/programs/code/code-json.pdl)).
378378

379379
```yaml
380-
--8<-- "./examples/tutorial/programs/code-json.pdl"
380+
--8<-- "./examples/tutorial/programs/code/code-json.pdl"
381381
```
382382

383383
Notice that in the `data` block the values are interpreted as Jinja expressions. If values need to be PDL programs to be interpreted, then you need to use

examples/code/code-json.pdl

Lines changed: 0 additions & 39 deletions
This file was deleted.

examples/tutorial/programs/code-json.pdl

Lines changed: 0 additions & 39 deletions
This file was deleted.
File renamed without changes.

examples/code/code-eval.pdl renamed to examples/tutorial/programs/code/code-eval.pdl

Lines changed: 0 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,17 +1,12 @@
 description: Code explanation example
 defs:
-  # The variable `CODE` shall be the contents of the parsed YAML file
   CODE:
     read: ./data.yaml
     parser: yaml
-  # The variable `TRUTH` shall be the text of the file
   TRUTH:
     read: ./ground_truth.txt
 text:
-# Print the source code to the console
 - "\n${ CODE.source_code }\n"
-# Use ollama to invoke a Granite model with a prompt. Output AND
-# set the variable `EXPLANATION` to the output.
 - model: ollama_chat/granite3.2:2b
   def: EXPLANATION
   input: |
@@ -26,14 +21,12 @@ text:
     ```
     ${ CODE.source_code }```
   parameters:
-    # Use no LLM creativity. (Note that 0 is the default; this line has no effect)
     temperature: 0
 - |
 
 
   EVALUATION:
   The similarity (Levenshtein) between this answer and the ground truth is:
-# We aren't only defining `EVAL`, we are also executing it.
 - def: EVAL
   lang: python
   # (Use `pip install textdistance` if needed to install the textdistance package)
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+description: Code explanation example
+defs:
+  CODE:
+    read: ./data.yaml
+    parser: yaml
+  TRUTH:
+    read: ./ground_truth.txt
+  EXPLANATION:
+    model: ollama_chat/granite3.2:2b
+    input:
+      |
+      Here is some info about the location of the function in the repo.
+      repo:
+      ${ CODE.repo_info.repo }
+      path: ${ CODE.repo_info.path }
+      Function_name: ${ CODE.repo_info.function_name }
+
+
+      Explain the following code:
+      ```
+      ${ CODE.source_code }```
+  EVAL:
+    lang: python
+    code:
+      |
+      import textdistance
+      expl = """
+      ${ EXPLANATION }
+      """
+      truth = """
+      ${ TRUTH }
+      """
+      result = textdistance.levenshtein.normalized_similarity(expl, truth)
+data:
+  input: ${ CODE }
+  output: ${ EXPLANATION }
+  metric: ${ EVAL }
+

examples/code/code.pdl renamed to examples/tutorial/programs/code/code.pdl

Lines changed: 0 additions & 1 deletion
@@ -21,5 +21,4 @@ text:
     ```
     ${ CODE.source_code }```
   parameters:
-    # Use no LLM creativity. (Note that 0 is the default; this line has no effect)
     temperature: 0
File renamed without changes.
