138 changes: 11 additions & 127 deletions docs/README.md
@@ -23,7 +23,7 @@ PDL provides the following features:

The PDL interpreter takes a PDL program as input and generates data by executing its instructions (calling out to models, code, etc...).

See below for a quick reference, followed by [installation notes](#interpreter_installation) and an [overview](#overview) of the language. A more detailed description of the language features can be found in this [tutorial](https://ibm.github.io/prompt-declaration-language/tutorial).
See below for a quick reference, followed by [installation notes](#interpreter-installation) and an [overview](#overview) of the language. A more detailed description of the language features can be found in this [tutorial](https://ibm.github.io/prompt-declaration-language/tutorial).


## Quick Reference
@@ -194,7 +194,7 @@ The same example on Replicate:
```yaml
text:
- "Hello\n"
- model: replicate/ibm-granite/granite-3.2-8b-instruct
- model: replicate/ibm-granite/granite-3.2-8b-instruct
parameters:
stop: "!"
temperature: 0
@@ -215,24 +215,9 @@ See the [tutorial](https://ibm.github.io/prompt-declaration-language/tutorial) f
Consider now an example from AI for code, where we want to build a prompt template for code explanation. We have a JSON file as input
containing the source code and some information regarding the repository where it came from.

For example, given the data in this JSON [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/code/data.yaml):
For example, given the data in this JSON [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/tutorial/programs/code/data.yaml):
```yaml
source_code:
|
@SuppressWarnings("unchecked")
public static Map<String, String> deserializeOffsetMap(String lastSourceOffset) throws IOException {
Map<String, String> offsetMap;
if (lastSourceOffset == null || lastSourceOffset.isEmpty()) {
offsetMap = new HashMap<>();
} else {
offsetMap = JSON_MAPPER.readValue(lastSourceOffset, Map.class);
}
return offsetMap;
}
repo_info:
repo: streamsets/datacollector
path: stagesupport/src/main/java/com/.../OffsetUtil.java
function_name: OffsetUtil.deserializeOffsetMap
--8<-- "./examples/tutorial/programs/code/data.yaml"
```

we would like to express the following prompt and submit it to an LLM:
@@ -259,28 +244,10 @@ public static Map<String, String> deserializeOffsetMap(String lastSourceOffset)
}
```

In PDL, this would be expressed as follows (see [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/code/code.pdl)):
In PDL, this would be expressed as follows (see [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/tutorial/programs/code/code.pdl)):

```yaml
description: Code explanation example
defs:
CODE:
read: ./data.yaml
parser: yaml
text:
- "\n${ CODE.source_code }\n"
- model: ollama/granite-code:8b
input: |
Here is some info about the location of the function in the repo.
repo:
${ CODE.repo_info.repo }
path: ${ CODE.repo_info.path }
Function_name: ${ CODE.repo_info.function_name }


Explain the following code:
```
${ CODE.source_code }```
--8<-- "./examples/tutorial/programs/code/code.pdl"
```

In this program we first define some variables using the `defs` construct. Here `CODE` is defined to be a new variable, holding the result of the `read` block that follows.
@@ -310,48 +277,10 @@ public static Map<String, String> deserializeOffsetMap(String lastSourceOffset)
The code is a Java method that takes a string `lastSourceOffset` as input and returns a `Map<String, String>`. The method uses the Jackson library to deserialize the JSON-formatted string into a map. If the input string is null or empty, an empty HashMap is returned. Otherwise, the string is deserialized into a Map using the `JSON_MAPPER.readValue()` method.
```

Notice that in PDL variables are used to templatize any entity in the document, not just textual prompts to LLMs. We can add a block to this document to evaluate the quality of the output using a similarity metric with respect to our [ground truth](https://github.com/IBM/prompt-declaration-language/blob/main/examples/code/ground_truth.txt). See [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/code/code-eval.pdl):
Notice that in PDL variables are used to templatize any entity in the document, not just textual prompts to LLMs. We can add a block to this document to evaluate the quality of the output using a similarity metric with respect to our [ground truth](https://github.com/IBM/prompt-declaration-language/blob/main/examples/tutorial/programs/code/ground_truth.txt). See [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/tutorial/programs/code/code-eval.pdl):

```yaml
description: Code explanation example
defs:
CODE:
read: ./data.yaml
parser: yaml
TRUTH:
read: ./ground_truth.txt
text:
- "\n${ CODE.source_code }\n"
- model: ollama/granite-code:8b
def: EXPLANATION
input: |
Here is some info about the location of the function in the repo.
repo:
${ CODE.repo_info.repo }
path: ${ CODE.repo_info.path }
Function_name: ${ CODE.repo_info.function_name }


Explain the following code:
```
${ CODE.source_code }```
- |


EVALUATION:
The similarity (Levenshtein) between this answer and the ground truth is:
- def: EVAL
lang: python
code: |
import textdistance
expl = """
${ EXPLANATION }
"""
truth = """
${ TRUTH }
"""
# (In PDL, set `result` to the output you wish for your code block.)
result = textdistance.levenshtein.normalized_similarity(expl, truth)
--8<-- "./examples/tutorial/programs/code/code-eval.pdl"
```

This program has an input block that reads the ground truth from filename `examples/code/ground_truth.txt` and assigns its contents to variable `TRUTH`. It also assigns the output of the model to the variable `EXPLANATION`, using a `def` construct. In PDL, any block can have a `def` to capture the result of that block in a variable. The last block is a call to Python code, which is included after the `code` field. Notice how code is included here simply as data. We collate fragments of Python with outputs obtained from previous blocks. This is one of the powerful features of PDL: the ability to specify the execution of code that is not known ahead of time. We can use LLMs to generate code that is later executed in the same programming model. This is made possible because PDL treats code as data, like any other part of the document.
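
The metric computed by the final Python block can be illustrated without the `textdistance` dependency. The following is a minimal sketch, assuming that normalized Levenshtein similarity means `1 - distance / max(len(a), len(b))`; the function names `levenshtein_distance` and `normalized_similarity` are illustrative and are not part of PDL or `textdistance`.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed definition: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest
```

A value of 1.0 means the explanation and the ground truth are identical; values near 0 mean almost every character would have to change.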
@@ -385,55 +314,10 @@ PDL allows rapid prototyping of prompts by allowing the user to change prompts a
Finally, we can output JSON data as a result of this program, as follows:

```yaml
description: Code explanation example
defs:
CODE:
read: ./data.yaml
parser: yaml
TRUTH:
read: ./ground_truth.txt
text:
- model: ollama/granite-code:8b
def: EXPLANATION
contribute: []
input:
|
Here is some info about the location of the function in the repo.
repo:
${ CODE.repo_info.repo }
path: ${ CODE.repo_info.path }
Function_name: ${ CODE.repo_info.function_name }


Explain the following code:
```
${ CODE.source_code }```
parameters:
temperature: 0
- def: EVAL
contribute: []
lang: python
code:
|
import textdistance
expl = """
${ EXPLANATION }
"""
truth = """
${ TRUTH }
"""
# (In PDL, set `result` to the output you wish for your code block.)
result = textdistance.levenshtein.normalized_similarity(expl, truth)
- data:
input: ${ CODE }
output: ${ EXPLANATION }
metric: ${ EVAL }

```

The data block takes various variables and combines their values into a JSON object with fields `input`, `output`, and `metric`. We mute the output of all the other blocks with `contribute` set to `[]`. The `contribute` construct can be used to specify how the result of a block is contributed to the background context.
Setting it to `[]` means no contribution is made from this block to the background context. By default, the result of every block is contributed to the context. For the blocks in the program above, we use a `def` construct to save the intermediate result of each block.
--8<-- "./examples/tutorial/programs/code/code-json.pdl"
```

The data block takes various variables and combines their values into a JSON object with fields `input`, `output`, and `metric`.
The output of this program is the corresponding serialized JSON object, with the appropriate treatment of quotation marks. Similarly PDL can read jsonl files and create jsonl files by piping to a file.
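
For readers who want to picture the result, here is a rough Python sketch of the kind of record the `data` block describes: three fields, serialized as a single JSON line suitable for a jsonl file. The helper `make_record` and the sample values are hypothetical; this is not how the PDL interpreter is implemented.

```python
import json


def make_record(code: dict, explanation: str, metric: float) -> str:
    """Combine the three values into one JSON line (jsonl-style)."""
    record = {"input": code, "output": explanation, "metric": metric}
    # json.dumps escapes embedded quotation marks and newlines,
    # so the whole record stays on a single line.
    return json.dumps(record)


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    sample_code = {"repo_info": {"function_name": "OffsetUtil.deserializeOffsetMap"}}
    print(make_record(sample_code, 'Returns a "Map" of offsets.', 0.5))
```

Appending one such line per input is one way a jsonl dataset could be built up.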

## PDL Language Tutorial
4 changes: 2 additions & 2 deletions docs/tutorial.md
@@ -374,10 +374,10 @@ In this program, we first define a query about the weather in some location (ass

## Data Block

PDL offers the ability to create JSON data as illustrated by the following example (described in detail in the [Overview](https://ibm.github.io/prompt-declaration-language/#overview) section). The `data` block can gather previously defined variables into a JSON structure. This feature is useful for data generation. Programs such as this one can be generalized to read jsonl files to generate data en masse by piping into another jsonl file ([file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/tutorial/programs/code-json.pdl)).
PDL offers the ability to create JSON data as illustrated by the following example (described in detail in the [Overview](https://ibm.github.io/prompt-declaration-language/#overview) section). The `data` block can gather previously defined variables into a JSON structure. This feature is useful for data generation. Programs such as this one can be generalized to read jsonl files to generate data en masse by piping into another jsonl file ([file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/tutorial/programs/code/code-json.pdl)).

```yaml
--8<-- "./examples/tutorial/programs/code-json.pdl"
--8<-- "./examples/tutorial/programs/code/code-json.pdl"
```

Notice that in the `data` block the values are interpreted as Jinja expressions. If values need to be PDL programs to be interpreted, then you need to use
39 changes: 0 additions & 39 deletions examples/code/code-json.pdl

This file was deleted.

39 changes: 0 additions & 39 deletions examples/tutorial/programs/code-json.pdl

This file was deleted.

File renamed without changes.
@@ -1,17 +1,12 @@
description: Code explanation example
defs:
# The variable `CODE` shall be the contents of the parsed YAML file
CODE:
read: ./data.yaml
parser: yaml
# The variable `TRUTH` shall be the text of the file
TRUTH:
read: ./ground_truth.txt
text:
# Print the source code to the console
- "\n${ CODE.source_code }\n"
# Use ollama to invoke a Granite model with a prompt. Output AND
# set the variable `EXPLANATION` to the output.
- model: ollama_chat/granite3.2:2b
def: EXPLANATION
input: |
@@ -26,14 +21,12 @@ text:
```
${ CODE.source_code }```
parameters:
# Use no LLM creativity. (Note that 0 is the default; this line has no effect)
temperature: 0
- |


EVALUATION:
The similarity (Levenshtein) between this answer and the ground truth is:
# We aren't only defining `EVAL`, we are also executing it.
- def: EVAL
lang: python
# (Use `pip install textdistance` if needed to install the textdistance package)
38 changes: 38 additions & 0 deletions examples/tutorial/programs/code/code-json.pdl
@@ -0,0 +1,38 @@
description: Code explanation example
defs:
CODE:
read: ./data.yaml
parser: yaml
TRUTH:
read: ./ground_truth.txt
EXPLANATION:
model: ollama_chat/granite3.2:2b
input:
|
Here is some info about the location of the function in the repo.
repo:
${ CODE.repo_info.repo }
path: ${ CODE.repo_info.path }
Function_name: ${ CODE.repo_info.function_name }


Explain the following code:
```
${ CODE.source_code }```
EVAL:
lang: python
code:
|
import textdistance
expl = """
${ EXPLANATION }
"""
truth = """
${ TRUTH }
"""
result = textdistance.levenshtein.normalized_similarity(expl, truth)
data:
input: ${ CODE }
output: ${ EXPLANATION }
metric: ${ EVAL }

@@ -21,5 +21,4 @@ text:
```
${ CODE.source_code }```
parameters:
# Use no LLM creativity. (Note that 0 is the default; this line has no effect)
temperature: 0
File renamed without changes.
6 changes: 3 additions & 3 deletions pdl-live-react/demos/run.sh
@@ -14,8 +14,8 @@ no
in europe?
yes
EOF
pdl --trace "$T"/demo5.json "$TOP"/examples/talk/6-code-json.pdl
pdl --trace "$T"/demo6.json "$UI"/demos/error.pdl || true
pdl --trace "$T"/demo7.json "$TOP"/examples/talk/4-function.pdl
pdl --trace "$T"/demo5.json "$TOP"/examples/tutorial/programs/code/code-json.pdl
# pdl --trace "$T"/demo6.json "$UI"/demos/error.pdl || true
pdl --trace "$T"/demo7.json "$TOP"/examples/tutorial/function_definition.pdl
# TODO demo8...
pdl --stream none --trace "$T"/demo9.json "$TOP"/examples/gsm8k/gsm8.pdl