|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "50f4ce4a", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# Prompt Declaration Language\n", |
| 9 | + "\n", |
| 10 | + "Prompt engineering is difficult: minor variations in prompts have large impacts on the output of LLMs and prompts are model-dependent. In recent years <i> prompt programming languages </i> have emerged to bring discipline to prompt engineering. Many of them are embedded in an imperative language such as Python or TypeScript, making it difficult for users to directly interact with prompts and multi-turn LLM interactions.\n", |
| 11 | + "\n", |
| 12 | + "The Prompt Declaration Language (PDL) is a YAML-based declarative approach to prompt programming, where prompts are at the forefront. PDL facilitates model chaining and tool use, abstracting away the plumbing necessary for such compositions, enables type checking of the input and output of models, and is based on LiteLLM to support a variety of model providers. PDL has been used with RAG, CoT, ReAct, and an agent for solving SWE-bench. PDL is [open-source](https://github.com/IBM/prompt-declaration-language) and works well with watsonx.ai and Granite models.\n", |
| 13 | + "\n", |
| 14 | + "You can use PDL stand-alone or from a Python SDK or, as shown here, in a notebook via a notebook extension. In the cell output, model-generated text is rendered in green font, and tool-generated text is rendered in purple font." |
| 15 | + ] |
| 16 | + }, |
| 17 | + { |
| 18 | + "cell_type": "code", |
| 19 | + "execution_count": null, |
| 20 | + "id": "bfc303da", |
| 21 | + "metadata": {}, |
| 22 | + "outputs": [], |
| 23 | + "source": [ |
| 24 | + "! pip install prompt-declaration-language\n", |
| 25 | + "! pip install 'prompt-declaration-language[examples]'" |
| 26 | + ] |
| 27 | + }, |
| 28 | + { |
| 29 | + "cell_type": "code", |
| 30 | + "execution_count": 2, |
| 31 | + "id": "e25a6874-54d9-4167-82ed-ab2f4fdc0a6f", |
| 32 | + "metadata": {}, |
| 33 | + "outputs": [], |
| 34 | + "source": [ |
| 35 | + "%load_ext pdl.pdl_notebook_ext" |
| 36 | + ] |
| 37 | + }, |
| 38 | + { |
| 39 | + "cell_type": "markdown", |
| 40 | + "id": "b2234ce9", |
| 41 | + "metadata": {}, |
| 42 | + "source": [ |
| 43 | + "## Model call\n", |
| 44 | + "\n", |
| 45 | + "In PDL, the user specifies step-by-step the shape of data they want to generate. In the following, the `text` construct indicates a text block containing a prompt and a model call. Implicitly, PDL builds a background conversational context (list of role/content) which is used to make model calls. Each model call uses the context built so far as its input prompt." |
| 46 | + ] |
| 47 | + }, |
| 48 | + { |
| 49 | + "cell_type": "code", |
| 50 | + "execution_count": 15, |
| 51 | + "id": "f3c62df1-0347-4711-acd7-3892cfd5df30", |
| 52 | + "metadata": {}, |
| 53 | + "outputs": [ |
| 54 | + { |
| 55 | + "name": "stdout", |
| 56 | + "output_type": "stream", |
| 57 | + "text": [ |
| 58 | + "What is the meaning of life?\n", |
| 59 | + "\u001b[32mThe meaning of life is a philosophical question that has been debated by many thinkers throughout history. There is no one definitive answer, as the answer may vary depending on one's personal beliefs, values, and experiences.\n", |
| 60 | + "\u001b[0m" |
| 61 | + ] |
| 62 | + } |
| 63 | + ], |
| 64 | + "source": [ |
| 65 | + "%%pdl --reset-context\n", |
| 66 | + "description: Model call\n", |
| 67 | + "text: \n", |
| 68 | + "- \"What is the meaning of life?\\n\"\n", |
| 69 | + "- model: replicate/ibm-granite/granite-8b-code-instruct-128k\n", |
| 70 | + " parameters:\n", |
| 71 | + " stop_sequences: \"!\"\n", |
| 72 | + " include_stop_sequence: true" |
| 73 | + ] |
| 74 | + }, |
| 75 | + { |
| 76 | + "cell_type": "markdown", |
| 77 | + "id": "c9d405f8", |
| 78 | + "metadata": {}, |
| 79 | + "source": [ |
| 80 | + "## Model chaining\n", |
| 81 | + "Model chaining can be done by simply adding to the list of models to call declaratively. Since this cell has the `%%pdl` cell magic without `--reset-context`, it executes in the context created by the previous cell." |
| 82 | + ] |
| 83 | + }, |
| 84 | + { |
| 85 | + "cell_type": "code", |
| 86 | + "execution_count": 17, |
| 87 | + "id": "d7149b3f", |
| 88 | + "metadata": {}, |
| 89 | + "outputs": [ |
| 90 | + { |
| 91 | + "name": "stdout", |
| 92 | + "output_type": "stream", |
| 93 | + "text": [ |
| 94 | + "\n", |
| 95 | + "Say it like a poem\n", |
| 96 | + "\u001b[32mThe meaning of life, a question so profound,\n", |
| 97 | + "A mystery that has puzzled men and women for so long,\n", |
| 98 | + "A path that we must tread, a goal to reach,\n", |
| 99 | + "A journey that will bring us joy and pain,\n", |
| 100 | + "\n", |
| 101 | + "A road that twists and turns, a fork in the road,\u001b[0m\u001b[32m\n", |
| 102 | + "Where we must choose, which way to go,\n", |
| 103 | + "A decision that we must make, with our souls at stake,\n", |
| 104 | + "A choice that will shape our destiny,\n", |
| 105 | + "\n", |
| 106 | + "The meaning of life, a question so grand,\n", |
| 107 | + "A goal that we must strive for, to find,\n", |
| 108 | + "A purpose that gives our hearts meaning,\n", |
| 109 | + "A reason to live, a\u001b[0m\u001b[32m reason to die,\n", |
| 110 | + "\n", |
| 111 | + "A journey that will take us far, a journey that will bring,\n", |
| 112 | + "A new understanding of the world we live in,\n", |
| 113 | + "A new perspective on life, a new way of thinking,\n", |
| 114 | + "A new path to follow, a new way to live,\n", |
| 115 | + "\n", |
| 116 | + "The meaning of life, a question so deep,\n", |
| 117 | + "A mystery that will never be solved,\n", |
| 118 | + "A journey that will\u001b[0m\u001b[32m take us far, a journey that will bring,\n", |
| 119 | + "A new understanding of the world we live in,\n", |
| 120 | + "\n", |
| 121 | + "A road that twists and turns, a fork in the road,\n", |
| 122 | + "Where we must choose, which way to go,\n", |
| 123 | + "A decision that we must make, with our souls at stake,\n", |
| 124 | + "A choice that will shape our destiny\u001b[0m\u001b[32m,\n", |
| 125 | + "\n", |
| 126 | + "The meaning of life, a question so grand,\n", |
| 127 | + "A goal that we must strive for, to find,\n", |
| 128 | + "A purpose that gives our hearts meaning,\n", |
| 129 | + "A reason to live, a reason to die,\n", |
| 130 | + "\n", |
| 131 | + "A journey that will take us far, a journey that will bring,\n", |
| 132 | + "A new understanding of the world we live in,\n", |
| 133 | + "A new perspective on life, a new way of thinking,\n", |
| 134 | + "A new\u001b[0m\u001b[32m path to follow, a new way to live,\n", |
| 135 | + "\n", |
| 136 | + "The meaning of life, a question so deep,\n", |
| 137 | + "A mystery that will never be solved,\n", |
| 138 | + "A journey that will take us far, a journey that will bring,\n", |
| 139 | + "A new understanding of the world we live in,\n", |
| 140 | + "\n", |
| 141 | + "A road that twists and turns, a fork in the road,\n", |
| 142 | + "Where we must choose, which way to go\u001b[0m\u001b[32m,\n", |
| 143 | + "A decision that we must make, with our souls at stake,\n", |
| 144 | + "A choice that will shape our destiny,\n", |
| 145 | + "\n", |
| 146 | + "The meaning of life, a question\u001b[0m\n", |
| 147 | + "\n", |
| 148 | + "What is the most important verse in this poem?\n", |
| 149 | + "\u001b[32mThe most important verse in this poem is the first one: \"The meaning of life, a question so profound.\" This line sets the tone for the entire poem and emphasizes the central theme of the question of what gives life meaning. It also highlights the idea that the answer to this question is not straightforward\u001b[0m\u001b[32m and may vary depending on one's personal beliefs and experiences.\n", |
| 150 | + "\u001b[0m" |
| 151 | + ] |
| 152 | + } |
| 153 | + ], |
| 154 | + "source": [ |
| 155 | + "%%pdl\n", |
| 156 | + "text:\n", |
| 157 | + "- \"\\nSay it like a poem\\n\"\n", |
| 158 | + "- model: replicate/ibm-granite/granite-8b-code-instruct-128k\n", |
| 159 | + "- \"\\n\\nWhat is the most important verse in this poem?\\n\"\n", |
| 160 | + "- model: replicate/ibm-granite/granite-8b-code-instruct-128k" |
| 161 | + ] |
| 162 | + }, |
| 163 | + { |
| 164 | + "cell_type": "markdown", |
| 165 | + "id": "86d5a0e1-606e-400a-90ac-6aa650e2eb1e", |
| 166 | + "metadata": {}, |
| 167 | + "source": [ |
| 168 | + "## Chat templates\n", |
| 169 | + "\n", |
| 170 | + "The following example shows a full-fledged chatbot. In PDL roles are high level annotations and PDL takes care of applying the appropriate chat templates. This example illustrates the use of control structures such as the repeat-until block and reading from files or stdin with the read block. The chatbot repeatedly prompts the user for a query, which it submits to a model, and stops when the query is quit." |
| 171 | + ] |
| 172 | + }, |
| 173 | + { |
| 174 | + "cell_type": "code", |
| 175 | + "execution_count": 18, |
| 176 | + "id": "455b2dbc-69fb-4164-9b8b-5817b3f33e9b", |
| 177 | + "metadata": {}, |
| 178 | + "outputs": [ |
| 179 | + { |
| 180 | + "name": "stdout", |
| 181 | + "output_type": "stream", |
| 182 | + "text": [ |
| 183 | + "You are Granite, an AI language model developed by IBM in 2024. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.Type `quit` to exit this chatbot.\n" |
| 184 | + ] |
| 185 | + }, |
| 186 | + { |
| 187 | + "name": "stdin", |
| 188 | + "output_type": "stream", |
| 189 | + "text": [ |
| 190 | + ">>> What is APR?\n" |
| 191 | + ] |
| 192 | + }, |
| 193 | + { |
| 194 | + "name": "stdout", |
| 195 | + "output_type": "stream", |
| 196 | + "text": [ |
| 197 | + "\u001b[32mAPR stands for Annual Percentage Rate. It is a measure of the total cost of borrowing money, including interest and fees, expressed as a yearly rate. It is commonly used in the lending industry to compare the cost of different loans and credit products.\n", |
| 198 | + "\u001b[0m\n", |
| 199 | + "\n" |
| 200 | + ] |
| 201 | + }, |
| 202 | + { |
| 203 | + "name": "stdin", |
| 204 | + "output_type": "stream", |
| 205 | + "text": [ |
| 206 | + ">>> Say it like I'm 5 years old\n" |
| 207 | + ] |
| 208 | + }, |
| 209 | + { |
| 210 | + "name": "stdout", |
| 211 | + "output_type": "stream", |
| 212 | + "text": [ |
| 213 | + "\u001b[32mThe meaning of life is like a big, big, big question mark. It's a question that has been asked for as long as people can remember, and it's still a question that people don't always know the answer to. Some people think the answer is to have fun and make friends, while others think the answer is to work hard\u001b[0m\u001b[32m and be smart. But no matter what the answer is, the question of what gives life meaning is a question that will always be with us.\n", |
| 214 | + "\u001b[0m\n", |
| 215 | + "\n" |
| 216 | + ] |
| 217 | + }, |
| 218 | + { |
| 219 | + "name": "stdin", |
| 220 | + "output_type": "stream", |
| 221 | + "text": [ |
| 222 | + ">>> quit\n" |
| 223 | + ] |
| 224 | + }, |
| 225 | + { |
| 226 | + "name": "stdout", |
| 227 | + "output_type": "stream", |
| 228 | + "text": [ |
| 229 | + "\u001b[32mThank you for chatting with me! If you have any more questions or need further assistance, feel free to ask.\n", |
| 230 | + "\u001b[0m" |
| 231 | + ] |
| 232 | + } |
| 233 | + ], |
| 234 | + "source": [ |
| 235 | + "%%pdl\n", |
| 236 | + "text:\n", |
| 237 | + "- role: system\n", |
| 238 | + " content: You are Granite, an AI language model developed by IBM in 2024. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.\n", |
| 239 | + "- \"Type `quit` to exit this chatbot.\\n\"\n", |
| 240 | + "- repeat:\n", |
| 241 | + " text:\n", |
| 242 | + " - read:\n", |
| 243 | + " message: \">>> \"\n", |
| 244 | + " def: query\n", |
| 245 | + " contribute: [context]\n", |
| 246 | + " - model: replicate/ibm-granite/granite-8b-code-instruct-128k\n", |
| 247 | + " until: ${ query == 'quit'}\n", |
| 248 | + " join:\n", |
| 249 | + " with: \"\\n\\n\"\n", |
| 250 | + "role: user\n" |
| 251 | + ] |
| 252 | + }, |
| 253 | + { |
| 254 | + "cell_type": "markdown", |
| 255 | + "id": "7e1bc1a2", |
| 256 | + "metadata": {}, |
| 257 | + "source": [ |
| 258 | + "## Chat templates\n", |
| 259 | + "\n", |
| 260 | + "The first call to the model in the above program submits the following prompt. PDL takes care of applying the appropriate chat templates and tags, and builds the background context implicitly. Chat templates make your program easier to port across models, since you do not need to specify control tokens by hand. All the user has to do is list the models they want to chain, PDL takes care of the rest.\n", |
| 261 | + "\n", |
| 262 | + "```\n", |
| 263 | + "<|start_of_role|>system<|end_of_role|>You are Granite, an AI language model developed by IBM in 2024. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.<|end_of_text|>\n", |
| 264 | + "<|start_of_role|>user<|end_of_role|>Type `quit` to exit this chatbot.\n", |
| 265 | + "What is APR?<|end_of_text|><|start_of_role|>assistant<|end_of_role|>\n", |
| 266 | + "```" |
| 267 | + ] |
| 268 | + }, |
| 269 | + { |
| 270 | + "cell_type": "markdown", |
| 271 | + "id": "cfa00cfc", |
| 272 | + "metadata": {}, |
| 273 | + "source": [ |
| 274 | + "## Data pipeline\n", |
| 275 | + "\n", |
| 276 | + "The following program shows a common prompting pattern: read some data, formulate a prompt using that data, submit to a model, and evaluate. In this program, we formulate a prompt for code explanation. The program first defines two variables: `code`, which holds the data we read, and `truth` for the ground truth. It then prints out the source code, formulates a prompts with the data, and calls a model to get an explanation. Finally, a Python code block uses the Levenshtein text distance metric and evaluate the explanation against the ground truth. This pipeline can similarly be applied to an entire data set to produce a jsonl file." |
| 277 | + ] |
| 278 | + }, |
| 279 | + { |
| 280 | + "cell_type": "code", |
| 281 | + "execution_count": 19, |
| 282 | + "id": "7f6c323b-ad1a-4434-8732-bc19c5c47883", |
| 283 | + "metadata": {}, |
| 284 | + "outputs": [ |
| 285 | + { |
| 286 | + "name": "stdout", |
| 287 | + "output_type": "stream", |
| 288 | + "text": [ |
| 289 | + "\n", |
| 290 | + "@SuppressWarnings(\"unchecked\")\n", |
| 291 | + "public static Map<String, String> deserializeOffsetMap(String lastSourceOffset) throws IOException {\n", |
| 292 | + " Map<String, String> offsetMap;\n", |
| 293 | + " if (lastSourceOffset == null || lastSourceOffset.isEmpty()) { \n", |
| 294 | + " offsetMap = new HashMap<>(); \n", |
| 295 | + " } else {\n", |
| 296 | + " offsetMap = JSON_MAPPER.readValue(lastSourceOffset, Map.class); \n", |
| 297 | + " }\n", |
| 298 | + " return offsetMap;\n", |
| 299 | + "}\n", |
| 300 | + "\n", |
| 301 | + "\u001b[32mThe function `deserializeOffsetMap` is a method that takes a string `lastSourceOffset` as input and returns a `Map` of `String` keys and `String` values. The function is used to deserialize a JSON string into a `Map` object.\n", |
| 302 | + " \n", |
| 303 | + "\n", |
| 304 | + " The function first checks if the `lastSourceOffset` is null or empty.\u001b[0m\u001b[32m If it is, it creates a new `HashMap` object and assigns it to the `offsetMap` variable. If the `lastSourceOffset` is not null or empty, the function uses the `JSON_MAPPER` object to read the JSON string and deserialize it into a `Map` object. The `JSON_MAPPER` object is assumed to be a pre-defined\u001b[0m\u001b[32m object that is used for JSON serialization and deserialization.\n", |
| 305 | + " \n", |
| 306 | + "\n", |
| 307 | + " Finally, the function returns the `offsetMap` object.\n", |
| 308 | + "\u001b[0m\n", |
| 309 | + "\n", |
| 310 | + "EVALUATION:\n", |
| 311 | + "The similarity (Levenshtein) between this answer and the ground truth is:\n", |
| 312 | + "\u001b[35m0.31163434903047094\u001b[0m" |
| 313 | + ] |
| 314 | + } |
| 315 | + ], |
| 316 | + "source": [ |
| 317 | + "%%pdl --reset-context\n", |
| 318 | + "description: Code explanation example\n", |
| 319 | + "defs:\n", |
| 320 | + " CODE:\n", |
| 321 | + " read: ./data.yaml\n", |
| 322 | + " parser: yaml\n", |
| 323 | + " TRUTH:\n", |
| 324 | + " read: ./ground_truth.txt\n", |
| 325 | + "text:\n", |
| 326 | + "- \"\\n${ CODE.source_code }\\n\"\n", |
| 327 | + "- model: replicate/ibm-granite/granite-8b-code-instruct-128k\n", |
| 328 | + " def: EXPLANATION\n", |
| 329 | + " input: |\n", |
| 330 | + " Here is some info about the location of the function in the repo.\n", |
| 331 | + " repo: \n", |
| 332 | + " ${ CODE.repo_info.repo }\n", |
| 333 | + " path: ${ CODE.repo_info.path }\n", |
| 334 | + " Function_name: ${ CODE.repo_info.function_name }\n", |
| 335 | + "\n", |
| 336 | + "\n", |
| 337 | + " Explain the following code:\n", |
| 338 | + " ```\n", |
| 339 | + " ${ CODE.source_code }```\n", |
| 340 | + "- |\n", |
| 341 | + "\n", |
| 342 | + "\n", |
| 343 | + " EVALUATION:\n", |
| 344 | + " The similarity (Levenshtein) between this answer and the ground truth is:\n", |
| 345 | + "- def: EVAL\n", |
| 346 | + " lang: python\n", |
| 347 | + " code: |\n", |
| 348 | + " import textdistance\n", |
| 349 | + " expl = \"\"\"\n", |
| 350 | + " ${ EXPLANATION }\n", |
| 351 | + " \"\"\"\n", |
| 352 | + " truth = \"\"\"\n", |
| 353 | + " ${ TRUTH }\n", |
| 354 | + " \"\"\"\n", |
| 355 | + " result = textdistance.levenshtein.normalized_similarity(expl, truth)" |
| 356 | + ] |
| 357 | + }, |
| 358 | + { |
| 359 | + "cell_type": "markdown", |
| 360 | + "id": "61c40266", |
| 361 | + "metadata": {}, |
| 362 | + "source": [ |
| 363 | + "## Conclusion\n", |
| 364 | + "\n", |
| 365 | + "Since prompts are at the forefront, PDL makes users more productive in their trial-and-error with LLMs. Try it!\n", |
| 366 | + "\n", |
| 367 | + "https://github.com/IBM/prompt-declaration-language" |
| 368 | + ] |
| 369 | + } |
| 370 | + ], |
| 371 | + "metadata": { |
| 372 | + "kernelspec": { |
| 373 | + "display_name": "Python 3 (ipykernel)", |
| 374 | + "language": "python", |
| 375 | + "name": "python3" |
| 376 | + }, |
| 377 | + "language_info": { |
| 378 | + "codemirror_mode": { |
| 379 | + "name": "ipython", |
| 380 | + "version": 3 |
| 381 | + }, |
| 382 | + "file_extension": ".py", |
| 383 | + "mimetype": "text/x-python", |
| 384 | + "name": "python", |
| 385 | + "nbconvert_exporter": "python", |
| 386 | + "pygments_lexer": "ipython3", |
| 387 | + "version": "3.12.5" |
| 388 | + } |
| 389 | + }, |
| 390 | + "nbformat": 4, |
| 391 | + "nbformat_minor": 5 |
| 392 | +} |