Commit e502e8e

jgchen and snible authored
Documentation formatting and contributing improvements (#521)
* Documentation formatting and contributing improvements
  Signed-off-by: Jing Chen <[email protected]>
* Update viewer link
  Signed-off-by: Jing Chen <[email protected]>
* Update docs/contrib.md
  Co-authored-by: Ed Snible <[email protected]>
---------
Signed-off-by: Jing Chen <[email protected]>
Co-authored-by: Ed Snible <[email protected]>
1 parent 76babb0 commit e502e8e

File tree

3 files changed: +106 −49 lines changed

docs/README.md

Lines changed: 46 additions & 38 deletions
@@ -5,11 +5,12 @@ hide:
 --- -->
 # Prompt Declaration Language
 
 LLMs will continue to change the way we build software systems. They are not only useful as coding assistants, providing snippets of code, explanations, and code transformations, but they can also help replace components that previously could only be built with rule-based systems. Whether LLMs are used as coding assistants or as software components, reliability remains an important concern. LLMs have a textual interface, and the structure of useful prompts is not captured formally. Programming frameworks do not enforce or validate such structures since they are not specified in a machine-consumable way. The purpose of the Prompt Declaration Language (PDL) is to allow developers to specify the structure of prompts and to enforce it, while providing a unified programming framework for composing LLMs with rule-based systems.
 
 PDL is based on the premise that interactions between users, LLMs, and rule-based systems form a *document*. Consider for example the interactions between a user and a chatbot. At each interaction, the exchanges form a document that gets longer and longer. Similarly, chaining models together or using tools for specific tasks results in outputs that together form a document. PDL allows users to specify the shape of data in such documents in a declarative way (in YAML), and is agnostic of any programming language. Because of its document-oriented nature, it can easily express a variety of data generation tasks (inference, data synthesis, data generation for model training, etc.).
 
 PDL provides the following features:
+
 - Ability to use any LLM locally or remotely via [LiteLLM](https://www.litellm.ai/), including [IBM's watsonx](https://www.ibm.com/watsonx)
 - Ability to templatize not only prompts for one LLM call, but also compositions of LLMs with tools (code and APIs). Templates can encompass tasks of larger granularity than a single LLM call
 - Control structures: variable definitions and use, conditionals, loops, functions
@@ -20,7 +21,7 @@ PDL provides the following features:
 - Support for chat APIs and chat templates
 - Live Document visualization
 
 The PDL interpreter takes a PDL program as input and renders it into a document by executing its instructions (calling out to models, code, etc.).
 
 See below for a quick reference, followed by [installation notes](#interpreter_installation) and an [overview](#overview) of the language. A more detailed description of the language features can be found in this [tutorial](https://ibm.github.io/prompt-declaration-language/tutorial).

@@ -51,9 +52,11 @@ pip install 'prompt-declaration-language[examples]'
 Most examples in this repository use IBM Granite models on [Replicate](https://replicate.com/).
 In order to run these examples, you need to create a free account
 on Replicate, get an API key, and store it in the environment variable:
-- `REPLICATE_API_TOKEN`
+
+- `REPLICATE_API_KEY`
 
 In order to use foundation models hosted on [watsonx](https://www.ibm.com/watsonx) via LiteLLM, you need a watsonx account (a free plan is available) and to set up the following environment variables:
+
 - `WATSONX_URL`, the API url (set to `https://{region}.ml.cloud.ibm.com`) of your watsonx instance. The region can be found by clicking in the upper right corner of the watsonx dashboard (for example, a valid region is `us-south` or `eu-gb`).
 - `WATSONX_APIKEY`, the API key (see information on [key creation](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui#create_user_key))
 - `WATSONX_PROJECT_ID`, the project hosting the resources (see information about [project creation](https://www.ibm.com/docs/en/watsonx/saas?topic=projects-creating-project) and [finding project ID](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-project-id.html?context=wx)).
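All of the credentials in this hunk are read from the environment. As a minimal, hypothetical pre-flight check (not part of PDL; only the variable names come from the README text above), one might verify they are set before launching the interpreter:

```python
import os

# Required for watsonx via LiteLLM (names taken from the README above).
WATSONX_VARS = ["WATSONX_URL", "WATSONX_APIKEY", "WATSONX_PROJECT_ID"]

def missing_vars(required, env=None):
    """Return the names of required environment variables that are unset or empty."""
    if env is None:
        env = os.environ
    return [name for name in required if not env.get(name)]

# Example: warn before running `pdl` if credentials are absent.
if missing_vars(WATSONX_VARS):
    print("Missing watsonx settings:", ", ".join(missing_vars(WATSONX_VARS)))
```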
@@ -84,7 +87,7 @@ The PDL repository has been configured so that every `*.pdl` file is associated
 
 The interpreter executes Python code specified in PDL code blocks. To sandbox the interpreter for safe execution,
 you can use the `--sandbox` flag, which runs the interpreter in a Docker-compatible container. Without this flag, the interpreter
 and all code are executed locally. To use the `--sandbox` flag, you need to have a Docker daemon running, such as
 [Rancher Desktop](https://rancherdesktop.io).
 
 The interpreter prints out a log by default in the file `log.txt`. This log contains the details of inputs and outputs to every block in the program. It is useful to examine this file when the program is behaving differently than expected. The log displays the exact prompts submitted to models by LiteLLM (after applying chat templates), which can be
@@ -108,10 +111,10 @@ This can also be done by passing a JSON or YAML file:
 pdl --data-file <JSON-or-YAML-file> <my-example>
 ```
 
-The interpreter can also output a trace file that is used by the Live Document visualization tool (see [Live Document](#live_document)):
+The interpreter can also output a trace file that is used by the Live Document visualization tool (see [Live Document](#live-document-visualizer)):
 
 ```
 pdl --trace <file.json> <my-example>
 ```
 
 For more information:
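Taken together, the hunks above describe a small CLI surface: `--data-file`, `--trace`, and `--sandbox` from the preceding section, all composable on one command line. A hypothetical helper, sketched here only to summarize that surface (it is not part of the PDL codebase):

```python
def build_pdl_command(program, data_file=None, trace=None, sandbox=False):
    """Assemble a `pdl` invocation from the flags described in the README."""
    cmd = ["pdl"]
    if sandbox:
        cmd.append("--sandbox")            # run code blocks in a Docker-compatible container
    if data_file:
        cmd += ["--data-file", data_file]  # initial data from a JSON or YAML file
    if trace:
        cmd += ["--trace", trace]          # emit a trace for the Live Document viewer
    cmd.append(program)
    return cmd

# e.g. build_pdl_command("my-example.pdl", trace="trace.json")
# -> ["pdl", "--trace", "trace.json", "my-example.pdl"]
```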
@@ -181,19 +184,19 @@ containing the source code and some information regarding the repository where i
 
 For example, given the data in this YAML [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/code/data.yaml):
 ```yaml
 source_code:
   |
   @SuppressWarnings("unchecked")
   public static Map<String, String> deserializeOffsetMap(String lastSourceOffset) throws IOException {
     Map<String, String> offsetMap;
     if (lastSourceOffset == null || lastSourceOffset.isEmpty()) {
       offsetMap = new HashMap<>();
     } else {
       offsetMap = JSON_MAPPER.readValue(lastSourceOffset, Map.class);
     }
     return offsetMap;
   }
 repo_info:
   repo: streamsets/datacollector
   path: stagesupport/src/main/java/com/.../OffsetUtil.java
   function_name: OffsetUtil.deserializeOffsetMap
@@ -203,7 +206,7 @@ we would like to express the following prompt and submit it to an LLM:
 
 ```
 Here is some info about the location of the function in the repo.
 repo:
 streamsets/datacollector
 path: stagesupport/src/main/java/com/.../OffsetUtil.java
 Function_name: OffsetUtil.deserializeOffsetMap
@@ -237,7 +240,7 @@ text:
   input:
     - |
       Here is some info about the location of the function in the repo.
       repo:
       ${ CODE.repo_info.repo }
       path: ${ CODE.repo_info.path }
       Function_name: ${ CODE.repo_info.function_name }
@@ -263,10 +266,10 @@ When we execute this program with the PDL interpreter, we obtain the following t
 @SuppressWarnings("unchecked")
 public static Map<String, String> deserializeOffsetMap(String lastSourceOffset) throws IOException {
     Map<String, String> offsetMap;
     if (lastSourceOffset == null || lastSourceOffset.isEmpty()) {
         offsetMap = new HashMap<>();
     } else {
         offsetMap = JSON_MAPPER.readValue(lastSourceOffset, Map.class);
     }
     return offsetMap;
 }
@@ -302,7 +305,7 @@ text:
   def: EXPLANATION
   input: |
     Here is some info about the location of the function in the repo.
     repo:
     ${ CODE.repo_info.repo }
     path: ${ CODE.repo_info.path }
     Function_name: ${ CODE.repo_info.function_name }
@@ -340,10 +343,10 @@ When we execute this new program, we obtain the following:
 @SuppressWarnings("unchecked")
 public static Map<String, String> deserializeOffsetMap(String lastSourceOffset) throws IOException {
     Map<String, String> offsetMap;
     if (lastSourceOffset == null || lastSourceOffset.isEmpty()) {
         offsetMap = new HashMap<>();
     } else {
         offsetMap = JSON_MAPPER.readValue(lastSourceOffset, Map.class);
     }
     return offsetMap;
 }
@@ -386,7 +389,7 @@ text:
   input:
     |
     Here is some info about the location of the function in the repo.
     repo:
     ${ CODE.repo_info.repo }
     path: ${ CODE.repo_info.path }
     Function_name: ${ CODE.repo_info.function_name }
@@ -409,7 +412,7 @@ text:
     """
     # (In PDL, set `result` to the output you wish for your code block.)
     result = textdistance.levenshtein.normalized_similarity(expl, truth)
 - data:
     input: ${ CODE }
     output: ${ EXPLANATION }
     metric: ${ EVAL }
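The `EVAL` block above calls `textdistance.levenshtein.normalized_similarity`, which is one minus the edit distance divided by the length of the longer string. A dependency-free sketch of that metric, for illustration only (not the library's actual implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalized_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 for maximally different ones."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```

For example, `normalized_similarity("kitten", "sitting")` is `1 - 3/7`, since the edit distance between the two words is 3 and the longer word has 7 characters.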
@@ -432,46 +435,51 @@ PDL has a Live Document visualizer to help in program understanding given an exe
 To produce an execution trace consumable by the Live Document, you can run the interpreter with the `--trace` argument:
 
 ```
 pdl --trace <file.json> <my-example>
 ```
 
 This produces an additional file named `my-example_trace.json` that can be uploaded to the [Live Document](https://ibm.github.io/prompt-declaration-language/viewer/) visualizer tool. Clicking on different parts of the Live Document will show the PDL code that produced that part
 in the right pane.
 
 This is similar to a spreadsheet for tabular data, where data is in the forefront and the user can inspect the formula that generates the data in each cell. In the Live Document, cells are not uniform but can take arbitrary extents. Clicking on them similarly reveals the part of the code that produced them.
## Best Practices
444447
445448
1. **Template Organization**:
446-
- Keep templates modular and reusable
447-
- Use variables for dynamic content
448-
- Document template purpose and requirements
449+
- Keep templates modular and reusable
450+
- Use variables for dynamic content
451+
- Document template purpose and requirements
449452
450453
2. **Error Handling**:
451-
- Validate model inputs/outputs
452-
- Include fallback logic
453-
- Log intermediate results
454+
455+
- Validate model inputs/outputs
456+
- Include fallback logic
457+
- Log intermediate results
454458
455459
3. **Performance**:
456-
- Cache frequent LLM calls
457-
- Use appropriate temperature settings
458-
- Implement retry logic for API calls
460+
461+
- Cache frequent LLM calls
462+
- Use appropriate temperature settings
463+
- Implement retry logic for API calls
459464
460465
4. **Security**:
461-
- Enabling sandbox mode for untrusted code
462-
- Validate all inputs
463-
- Follow API key best practices
466+
467+
- Enabling sandbox mode for untrusted code
468+
- Validate all inputs
469+
- Follow API key best practices
464470
465471
466472
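The "retry logic for API calls" item in the list above can be as simple as exponential backoff around the call. A hypothetical sketch (the helper name, delays, and exception choices are illustrative, not part of PDL):

```python
import time

def with_retries(call, max_attempts=3, base_delay=1.0,
                 retriable=(TimeoutError, ConnectionError)):
    """Invoke `call()`, retrying transient errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retriable:
            if attempt == max_attempts:
                raise                              # out of attempts: propagate
            time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

A fallback (best practice 2) fits the same shape: catch the final exception at the call site and substitute a default answer or a cheaper model.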
## Additional Notes
 
 When using Granite models, we use the following defaults for model parameters (except `granite-20b-code-instruct-r1.1`):
+
 - `decoding_method`: `greedy` (`temperature`: 0)
 - `max_new_tokens`: 1024
 - `min_new_tokens`: 1
 - `repetition_penalty`: 1.05
 
 Also, if the `decoding_method` is `sample`, then the following defaults are used:
+
 - `temperature`: 0.7
 - `top_p`: 0.85
 - `top_k`: 50
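For reference, the two default sets above collected into plain dictionaries, with a small merge helper. This is a convenience sketch mirroring the parameter lists in this section; `effective_params` is hypothetical, not a PDL API:

```python
# Defaults for Granite models (except granite-20b-code-instruct-r1.1).
GRANITE_DEFAULTS = {
    "decoding_method": "greedy",   # greedy decoding implies temperature 0
    "max_new_tokens": 1024,
    "min_new_tokens": 1,
    "repetition_penalty": 1.05,
}

# Extra defaults applied when decoding_method == "sample".
SAMPLING_DEFAULTS = {
    "temperature": 0.7,
    "top_p": 0.85,
    "top_k": 50,
}

def effective_params(overrides=None):
    """Merge user overrides onto the defaults, adding sampling defaults if requested."""
    params = dict(GRANITE_DEFAULTS)
    params.update(overrides or {})
    if params.get("decoding_method") == "sample":
        for key, value in SAMPLING_DEFAULTS.items():
            params.setdefault(key, value)  # overrides win over sampling defaults
    return params
```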

docs/contrib.md

Lines changed: 48 additions & 1 deletion
@@ -1,6 +1,6 @@
 # Contributing to PDL
 
 You can report issues or open a pull request (PR) to suggest changes.
 
 ## Reporting an issue
 
@@ -27,3 +27,50 @@ Also, please make sure that your changes pass static checks such as code styles
 ```
 pre-commit run --all-files
 ```
+
+## Development environment
+
+### PDL development
+
+Follow these instructions to set up a dev environment and get started with contributing to PDL.
+
+1. Create a fork of https://github.com/IBM/prompt-declaration-language
+2. Clone your fork
+3. Set up a Python virtual environment and install dependencies
+
+   ```
+   cd prompt-declaration-language
+   python -m venv .venv
+   source .venv/bin/activate
+   pip install -e .
+   ```
+
+4. Test that you can run an editable version of PDL
+
+   ```
+   pdl examples/hello/hello.pdl
+
+   Hello
+   Hello! How can I help you today?
+   ```
+
+You are all set!
+
+### Documentation updates
+
+When you make changes to PDL, be sure to document any new features in the docs section. You can serve the docs locally to preview changes.
+
+Install the required dependencies for documentation.
+
+```
+pip install mkdocs-get-deps
+pip install $(mkdocs-get-deps)
+```
+
+Then serve the docs to load a preview.
+
+```
+mkdocs serve
+```
+
+You are all set!

0 commit comments