refactor/move-prompts-to-jinja-templates #2164
```diff
@@ -0,0 +1,10 @@
+{
+"question": "'What is the deductible for the employee plan for a visit to Overlake in Bellevue?'
+
+Sources:
+info1.txt: deductibles depend on whether you are in-network or out-of-network. In-network deductibles are $500 for employee and $1000 for family. Out-of-network deductibles are $1000 for employee and $2000 for family.
+info2.pdf: Overlake is in-network for the employee plan.
+info3.pdf: Overlake is the name of the area that includes a park and ride near Bellevue.
+info4.pdf: In-network institutions include Overlake, Swedish and others in the region.",
+"answer": "In-network deductibles are $500 for employee and $1000 for family [info1.txt] and Overlake is in-network for the employee plan [info2.pdf][info4.pdf]."
+}
```
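The `question` value above continues across several physical lines, and standard JSON forbids raw control characters such as newlines inside strings; that is the problem the `_escape_newlines_in_json` helper in the Python diff works around. As a minimal sketch (the inline sample is a shortened stand-in for the file, not the file itself), Python's `json.loads` rejects such input by default, while `strict=False` accepts literal newlines inside strings:

```python
import json

# Shortened stand-in for the few-shot file above: the "question" string
# contains raw newline characters, like the committed file does.
RAW_FEW_SHOTS = (
    '{\n'
    '"question": "\'What is the deductible for the employee plan?\'\n'
    '\n'
    'Sources:\n'
    'info1.txt: In-network deductibles are $500 for employee and $1000 for family.",\n'
    '"answer": "In-network deductibles are $500 for employee [info1.txt]."\n'
    '}\n'
)


def parse_strict(text: str) -> dict:
    """Default json.loads: raw control characters inside strings are an error."""
    return json.loads(text)


def parse_lenient(text: str) -> dict:
    """strict=False allows control characters (such as newlines) inside strings."""
    return json.loads(text, strict=False)


try:
    parse_strict(RAW_FEW_SHOTS)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

few_shots = parse_lenient(RAW_FEW_SHOTS)
print(strict_ok)
print(few_shots["question"].splitlines())
```

Storing the newlines as `\n` escapes instead would make the file strictly valid JSON, so neither `strict=False` nor a custom escaping pass would be needed.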
```diff
@@ -0,0 +1,5 @@
+You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions.
+Use 'you' to refer to the individual asking the questions even if they ask with 'I'.
+Answer the following question using only the data provided in the sources below.
+Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response.
+If you cannot answer using the sources below, say you don't know. Use below example to answer
```
Collaborator
If you install pre-commit, it should fix the missing trailing newlines. See CONTRIBUTING.md for installation instructions.

Contributor (author)
I'm not entirely sure what you mean here. Could you clarify?

Collaborator
I notice that several files don't end with a newline, which usually means that pre-commit hasn't been run, since the pre-commit hooks fix that issue (and others). There are instructions for installing pre-commit hooks here:
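On whether the missing end-of-file newline matters at runtime: by default Jinja strips a single trailing newline from a template's source when rendering, so adding the newline (as the pre-commit hooks would) should not change the rendered prompt. A minimal sketch with in-memory templates; the template names and text are illustrative only:

```python
from jinja2 import DictLoader, Environment

# Two in-memory copies of the same prompt: as committed (no newline at end of
# file) and after a hook has appended the final newline.
TEMPLATES = {
    "no_eol.jinja": "If you cannot answer using the sources below, say you don't know.",
    "with_eol.jinja": "If you cannot answer using the sources below, say you don't know.\n",
}

# keep_trailing_newline defaults to False: one trailing newline is dropped.
default_env = Environment(loader=DictLoader(TEMPLATES))
no_eol = default_env.get_template("no_eol.jinja").render()
with_eol = default_env.get_template("with_eol.jinja").render()

# Opting in keeps the final newline in the rendered output.
keeping_env = Environment(loader=DictLoader(TEMPLATES), keep_trailing_newline=True)
kept = keeping_env.get_template("with_eol.jinja").render()

print(no_eol == with_eol)
print(repr(kept[-1]))
```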
```diff
@@ -0,0 +1,7 @@
+You are an intelligent assistant helping analyze the Annual Financial Report of Contoso Ltd., The documents contain text, graphs, tables and images.
+Each image source has the file name in the top left corner of the image with coordinates (10,10) pixels and is in the format SourceFileName:<file_name>.
+Each text source starts in a new line and has the file name followed by colon and the actual information.
+Always include the source name from the image or text for each fact you use in the response in the format: [filename].
+Answer the following question using only the data provided in the sources below.
+The text and image source can be the same file name, don't use the image title when citing the image source, only use the file name as mentioned.
+If you cannot answer using the sources below, say you don't know. Return just the answer without any input texts.
```
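This prompt asks for every fact to be cited as `[filename]`. A hypothetical post-processing sketch (the `extract_citations` name is not from the codebase) that recovers the cited file names, in order of first appearance and without duplicates, using the sample answer from the few-shot file:

```python
import re

# One citation is a run of characters other than brackets, enclosed in [ ].
CITATION_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def extract_citations(answer: str) -> list[str]:
    """Return cited source names in order of first appearance."""
    cited: list[str] = []
    for name in CITATION_PATTERN.findall(answer):
        if name not in cited:
            cited.append(name)
    return cited


sample_answer = (
    "In-network deductibles are $500 for employee and $1000 for family [info1.txt] "
    "and Overlake is in-network for the employee plan [info2.pdf][info4.pdf]."
)
print(extract_citations(sample_answer))
```

Because each bracket pair is matched on its own, adjacent citations such as `[info2.pdf][info4.pdf]` come back as separate names.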
```diff
@@ -0,0 +1,7 @@
+Generate 3 very brief follow-up questions that the user would likely ask next.
+Enclose the follow-up questions in double angle brackets. Example:
+<<Are there exclusions for prescriptions?>>
+<<Which pharmacies can be ordered from?>>
+<<What is the limit for over-the-counter medication?>>
+Do not repeat questions that have already been asked.
+Make sure the last question ends with ">>".
```
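The follow-up prompt relies on a delimiter convention: each suggested question is wrapped in `<<` and `>>`. A hypothetical sketch of splitting a completion into the answer and its follow-up questions (the helper name and sample text are made up). It also shows the failure mode the last instruction guards against: a pattern that requires the closing `>>` silently skips an unterminated final question.

```python
import re

# A follow-up question is any text between << and >> that contains no angle brackets.
FOLLOW_UP_PATTERN = re.compile(r"<<([^<>]+)>>")


def split_follow_ups(completion: str) -> tuple[str, list[str]]:
    """Return (answer text, follow-up questions) for a completion (hypothetical helper)."""
    questions = [q.strip() for q in FOLLOW_UP_PATTERN.findall(completion)]
    answer = FOLLOW_UP_PATTERN.sub("", completion).strip()
    return answer, questions


completion = (
    "Sample answer text [info1.txt].\n"
    "<<Are there exclusions for prescriptions?>>\n"
    "<<Which pharmacies can be ordered from?>>\n"
    "<<What is the limit for over-the-counter medication?>>"
)
answer, questions = split_follow_ups(completion)
print(answer)
print(questions)

# A truncated last question (no closing >>) is not returned at all.
_, truncated = split_follow_ups("Answer.\n<<Is dental included?>>\n<<What about vision")
print(truncated)
```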
```diff
@@ -0,0 +1,18 @@
+[
+{
+"role": "user",
+"content": "How did crypto do last year?"
+},
+{
+"role": "assistant",
+"content": "Summarize Cryptocurrency Market Dynamics from last year"
+},
+{
+"role": "user",
+"content": "What are my health plans?"
+},
+{
+"role": "assistant",
+"content": "Show available health plans"
+}
+]
```

Collaborator
Interesting, I hadn't considered making the few-shot examples a Jinja template as well. Do you imagine passing template variables into it?

Contributor (author)
That makes more sense indeed. I initially converted it to .jinja to stay consistent with the file extensions in the prompts folder. I've now updated these to .json.
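These query-rewrite few-shots are already in chat-message form: alternating `user`/`assistant` turns that show the model how to turn a question into a search query. A minimal sketch (both function names are hypothetical) that loads such a file, checks that the roles alternate starting with `user`, and places the shots between a system prompt and the new question:

```python
import json

# The few-shot file above, inlined so the sketch is self-contained.
QUERY_FEW_SHOTS = """
[
    {"role": "user", "content": "How did crypto do last year?"},
    {"role": "assistant", "content": "Summarize Cryptocurrency Market Dynamics from last year"},
    {"role": "user", "content": "What are my health plans?"},
    {"role": "assistant", "content": "Show available health plans"}
]
"""


def load_few_shots(text: str) -> list[dict[str, str]]:
    """Parse the shots and check they alternate user/assistant, starting with user."""
    shots = json.loads(text)
    for index, shot in enumerate(shots):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if shot.get("role") != expected_role or not isinstance(shot.get("content"), str):
            raise ValueError(f"shot {index} should be a {expected_role} message with text content")
    return shots


def assemble_messages(
    system_prompt: str, shots: list[dict[str, str]], question: str
) -> list[dict[str, str]]:
    """System prompt first, then the example turns, then the real question."""
    return [{"role": "system", "content": system_prompt}, *shots, {"role": "user", "content": question}]


shots = load_few_shots(QUERY_FEW_SHOTS)
messages = assemble_messages(
    "Generate a search query based on the conversation and the new question.",
    shots,
    "Does my plan cover eye exams?",
)
print([m["role"] for m in messages])
```

In the Python diff, `build_messages` takes few-shot turns through its `few_shots` parameter together with a `max_tokens` budget; this sketch leaves token limits out.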
```diff
@@ -0,0 +1,8 @@
+Below is a history of the conversation so far, and a new question asked by the user that needs to be answered by searching in a knowledge base.
+You have access to Azure AI Search index with 100's of documents.
+Generate a search query based on the conversation and the new question.
+Do not include cited source filenames and document names e.g. info.txt or doc.pdf in the search query terms.
+Do not include any text inside [] or <<>> in the search query terms.
+Do not include any special characters like '+'.
+If the question is not in English, translate the question to English before generating the search query.
+If you cannot generate a search query, return just the number 0.
```
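The prompt reserves the bare output `0` as a signal that no search query could be generated, so the caller has to check for it. A hypothetical sketch of that check, plus a defensive pass that strips any `[...]` or `<<...>>` text the model includes despite the instructions; the function name and the fallback policy are assumptions, not code from this PR:

```python
import re

# [...] and <<...>> spans, which the prompt says must not appear in the query.
BRACKETED = re.compile(r"\[[^\]]*\]|<<[^>]*>>")


def finalize_search_query(model_output: str, user_question: str) -> str:
    """Hypothetical post-processing of the query-rewrite completion."""
    query = model_output.strip()
    if query == "0":
        # The prompt's sentinel for "no query could be generated"; falling back
        # to the user's own question is one possible policy.
        return user_question
    query = BRACKETED.sub(" ", query)
    # Collapse the whitespace left behind by the removed spans.
    return " ".join(query.split())


print(finalize_search_query("0", "What are my health plans?"))
print(finalize_search_query("available health plans [info1.txt] <<Which plan?>>", "unused"))
```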
```diff
@@ -0,0 +1,6 @@
+Assistant helps the company employees with their healthcare plan questions, and questions about the employee handbook. Be brief in your answers.
+Answer ONLY with the facts listed in the list of sources below. If there isn't enough information below, say you don't know. Do not generate answers that don't use the sources below. If asking a clarifying question to the user would help, ask the question.
+If the question is not in English, answer in the language used in the question.
+Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. Use square brackets to reference the source, for example [info1.txt]. Don't combine sources, list each source separately, for example [info1.txt][info2.pdf].
+{{ follow_up_questions_prompt }}
+{{ injected_prompt }}
```
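This template expects two variables, `follow_up_questions_prompt` and `injected_prompt`. With Jinja's default `Undefined`, any variable the caller forgets to pass renders as an empty string, which can hide mistakes; `StrictUndefined` turns the omission into an error. A minimal sketch with a shortened stand-in for the template (the values passed are illustrative):

```python
from jinja2 import Environment, StrictUndefined, UndefinedError

# Shortened stand-in for the chat system prompt above.
SOURCE = "Be brief in your answers.\n{{ follow_up_questions_prompt }}\n{{ injected_prompt }}"

lenient = Environment().from_string(SOURCE)
strict = Environment(undefined=StrictUndefined).from_string(SOURCE)

# Default Undefined: the missing injected_prompt silently renders as "".
partial = lenient.render(follow_up_questions_prompt="Generate 3 follow-up questions.")

# Passing every variable (an empty string when a feature is off) works in both modes.
full = strict.render(follow_up_questions_prompt="", injected_prompt="")

# StrictUndefined raises as soon as the missing variable is rendered.
try:
    strict.render(follow_up_questions_prompt="")
    missing_detected = False
except UndefinedError:
    missing_detected = True

print(repr(partial))
print(repr(full))
print(missing_detected)
```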
```diff
@@ -0,0 +1,19 @@
+You are an intelligent assistant helping analyze the Annual Financial Report of Contoso Ltd., The documents contain text, graphs, tables and images.
+Each image source has the file name in the top left corner of the image with coordinates (10,10) pixels and is in the format SourceFileName:<file_name>
+Each text source starts in a new line and has the file name followed by colon and the actual information
+Always include the source name from the image or text for each fact you use in the response in the format: [filename]
+Answer the following question using only the data provided in the sources below.
+If asking a clarifying question to the user would help, ask the question.
+Be brief in your answers.
+The text and image source can be the same file name, don't use the image title when citing the image source, only use the file name as mentioned
+If you cannot answer using the sources below, say you don't know. Return just the answer without any input texts.
+{follow_up_questions_prompt}
+{injected_prompt}
```
```python
"You are an intelligent assistant helping analyze the Annual Financial Report of Contoso Ltd., The documents contain text, graphs, tables and images. "
+ "Each image source has the file name in the top left corner of the image with coordinates (10,10) pixels and is in the format SourceFileName:<file_name> "
+ "Each text source starts in a new line and has the file name followed by colon and the actual information "
+ "Always include the source name from the image or text for each fact you use in the response in the format: [filename] "
+ "Answer the following question using only the data provided in the sources below. "
+ "The text and image source can be the same file name, don't use the image title when citing the image source, only use the file name as mentioned "
+ "If you cannot answer using the sources below, say you don't know. Return just the answer without any input texts "
```
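Unlike the chat prompt that uses `{{ follow_up_questions_prompt }}`, the image-aware template above writes its placeholders with single braces. Jinja treats `{follow_up_questions_prompt}` as plain text, so `render()` leaves it untouched, and the value is only filled in if a `str.format` call runs on the rendered string. A minimal sketch of the difference, using a shortened stand-in template:

```python
from jinja2 import Template

single = Template("Be brief in your answers.\n{follow_up_questions_prompt}")
double = Template("Be brief in your answers.\n{{ follow_up_questions_prompt }}")

value = "Generate 3 follow-up questions."

# Single braces are not Jinja syntax, so render() keeps the placeholder as text.
single_rendered = single.render(follow_up_questions_prompt=value)

# Double braces are a Jinja expression and are substituted by render().
double_rendered = double.render(follow_up_questions_prompt=value)

# A single-brace template is only filled if str.format runs after rendering.
formatted = single.render().format(follow_up_questions_prompt=value)

print(single_rendered)
print(double_rendered)
print(formatted)
```

Mixing the two mechanisms is fragile: `str.format` also treats every other literal `{` or `}` in the prompt as a placeholder, so a single templating step is usually easier to reason about.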
```diff
@@ -1,10 +1,14 @@
 from typing import Any, Optional
+import json
+import re
+import ast
 
 from azure.search.documents.aio import SearchClient
 from azure.search.documents.models import VectorQuery
 from openai import AsyncOpenAI
 from openai.types.chat import ChatCompletionMessageParam
 from openai_messages_token_helper import build_messages, get_token_limit
+from jinja2 import Environment, FileSystemLoader
 
 from approaches.approach import Approach, ThoughtStep
 from core.authentication import AuthenticationHelper
```
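The `FileSystemLoader` imported here is later pointed at the relative path `'approaches/prompts/ask'`, which is resolved against the process's current working directory, so the templates are found only when the app starts from the directory that contains `approaches/`. A minimal sketch of anchoring the loader to an absolute directory instead; the `make_prompt_env` helper, the temporary layout, and the `Path(__file__)`-based path mentioned in the docstring are assumptions for illustration:

```python
import os
import tempfile
from pathlib import Path

from jinja2 import Environment, FileSystemLoader


def make_prompt_env(prompts_dir: Path) -> Environment:
    """Build a Jinja environment whose loader uses an absolute directory.

    In the approach module, prompts_dir could be derived from the module's own
    location, e.g. Path(__file__).parent / "prompts" / "ask" (assumed layout),
    so template lookup no longer depends on the current working directory.
    """
    return Environment(loader=FileSystemLoader(str(prompts_dir.resolve())))


with tempfile.TemporaryDirectory() as tmp:
    # Throwaway stand-in for the prompts folder.
    prompts_dir = Path(tmp) / "prompts" / "ask"
    prompts_dir.mkdir(parents=True)
    (prompts_dir / "system_message.jinja").write_text("You are an intelligent assistant.\n")

    env = make_prompt_env(prompts_dir)
    original_cwd = os.getcwd()
    os.chdir(tempfile.gettempdir())  # simulate starting the app from another directory
    try:
        rendered = env.get_template("system_message.jinja").render()
    finally:
        os.chdir(original_cwd)

print(repr(rendered))
```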
```diff
@@ -17,26 +21,6 @@ class RetrieveThenReadApproach(Approach):
     (answer) with that prompt.
     """
 
-    system_chat_template = (
-        "You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. "
-        + "Use 'you' to refer to the individual asking the questions even if they ask with 'I'. "
-        + "Answer the following question using only the data provided in the sources below. "
-        + "Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. "
-        + "If you cannot answer using the sources below, say you don't know. Use below example to answer"
-    )
-
-    # shots/sample conversation
-    question = """
-'What is the deductible for the employee plan for a visit to Overlake in Bellevue?'
-
-Sources:
-info1.txt: deductibles depend on whether you are in-network or out-of-network. In-network deductibles are $500 for employee and $1000 for family. Out-of-network deductibles are $1000 for employee and $2000 for family.
-info2.pdf: Overlake is in-network for the employee plan.
-info3.pdf: Overlake is the name of the area that includes a park and ride near Bellevue.
-info4.pdf: In-network institutions include Overlake, Swedish and others in the region
-"""
-    answer = "In-network deductibles are $500 for employee and $1000 for family [info1.txt] and Overlake is in-network for the employee plan [info2.pdf][info4.pdf]."
-
     def __init__(
         self,
         *,
```
```diff
@@ -68,6 +52,36 @@ def __init__(
         self.query_speller = query_speller
         self.chatgpt_token_limit = get_token_limit(chatgpt_model, self.ALLOW_NON_GPT_MODELS)
 
+        self._initialize_templates()
+
+    def _initialize_templates(self):
+        self.env = Environment(loader=FileSystemLoader('approaches/prompts/ask'))
+        self.system_chat_template = self.env.get_template('system_message.jinja').render()
+        self.few_shots = self._process_few_shots()
+
+    def _process_few_shots(self) -> Any:
+        raw_few_shots = self.env.get_template('few_shots.jinja').render()
+        json_str_few_shots = self._clean_json_string(raw_few_shots)
+        processed_few_shots = self._escape_newlines_in_json(json_str_few_shots)
+        return json.loads(processed_few_shots)
+
+    @staticmethod
+    def _clean_json_string(json_str: str) -> str:
+        return re.sub(r'^\s+|\s+$', '', json_str)
+
+    @staticmethod
+    def _escape_newlines_in_json(json_str: str) -> str:
+        in_string = False
+        result = []
+        for char in json_str:
+            if char == '"' and (not result or result[-1] != '\\'):
+                in_string = not in_string
+            elif char == '\n' and in_string:
+                result.append('\\n')
+                continue
+            result.append(char)
+        return ''.join(result)
+
     async def run(
         self,
         messages: list[ChatCompletionMessageParam],
```
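`_escape_newlines_in_json` decides whether a `"` opens or closes a string by checking only the previous character. That handles `\"`, but a string that ends in an escaped backslash (`"C:\\"`) leaves the helper believing it is still inside a string, so later newlines stay raw and `json.loads` fails. A minimal sketch on a made-up input, reproducing the helper as a standalone function next to a variant that tracks escape state explicitly (both function names are hypothetical):

```python
import json


def escape_newlines_prev_char(json_str: str) -> str:
    """Standalone copy of _escape_newlines_in_json from the diff."""
    in_string = False
    result = []
    for char in json_str:
        if char == '"' and (not result or result[-1] != '\\'):
            in_string = not in_string
        elif char == '\n' and in_string:
            result.append('\\n')
            continue
        result.append(char)
    return ''.join(result)


def escape_newlines_tracking_escapes(json_str: str) -> str:
    """Variant that remembers whether the previous character started an escape."""
    in_string = False
    escaped = False  # True right after an unescaped backslash inside a string
    result = []
    for char in json_str:
        if in_string:
            if escaped:
                escaped = False
            elif char == '\\':
                escaped = True
            elif char == '"':
                in_string = False
            elif char == '\n':
                result.append('\\n')
                continue
        elif char == '"':
            in_string = True
        result.append(char)
    return ''.join(result)


# The first value ends in an escaped backslash; the second contains a raw newline.
SAMPLE = '{"path": "C:\\\\", "question": "line one\nline two"}'

try:
    json.loads(escape_newlines_prev_char(SAMPLE))
    prev_char_ok = True
except json.JSONDecodeError:
    prev_char_ok = False

fixed = json.loads(escape_newlines_tracking_escapes(SAMPLE))
print(prev_char_ok)
print(fixed["question"].splitlines())
```

For input like this, `json.loads(raw, strict=False)` also parses it directly, without any hand-written escaping pass.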
```diff
@@ -118,7 +132,7 @@ async def run(
         updated_messages = build_messages(
             model=self.chatgpt_model,
             system_prompt=overrides.get("prompt_template", self.system_chat_template),
-            few_shots=[{"role": "user", "content": self.question}, {"role": "assistant", "content": self.answer}],
+            few_shots=[{"role": "user", "content": self.few_shots["question"]}, {"role": "assistant", "content": self.few_shots["answer"]}],
             new_user_content=user_content,
             max_tokens=self.chatgpt_token_limit - response_token_limit,
             fallback_to_default=self.ALLOW_NON_GPT_MODELS,
```
NO_RESPONSE can stay as a class variable, no? There's no particular need for it to be an instance variable.
You're right, I reverted the change.