Commit 911720f
[Release] Docs Agent version 0.1.6 (google#182)
[Release] Docs Agent version 0.1.6 (google#182)

* [Release] Docs Agent version 0.1.4

  What's changed:
  - Bug fix: Update `docs_agent.py` to remove a no-longer-in-use error message variable.
  - Clean up the prompt format in Docs Agent:
    - Context is added to prompts as Markdown.
    - Remove extra new lines in context.
    - Add extra new lines after the instruction and before the question in prompts.
  - Remove the warning message in the Chroma module, which was displayed when launching the chatbot.
  - Update the embeddings diagrams in the main `README` file.
  - Minor updates in the main `README` file.

* [Release] Docs Agent version 0.1.5

  What's changed:
  - Update the `poetry.lock` file to bump up the version of `werkzeug`.
  - Update the introduction paragraph of the main `README` file.
  - Add a new diagram of Docs Agent's pre-processing flow for various doc types.
  - Update the `README` file in the `scripts` directory to include the diagram.

* [Release] Docs Agent version 0.1.6

  What's changed:
  - Update the prompt condition to be more specific and follow best practices in prompting.
  - Enable the chatbot server to provide a custom condition string to the `DocsAgent` class.
  - Bug fix: Provide a custom condition when asking the PaLM model for 5 related questions.
  - Add a new config variable to specify the log level (for example, "VERBOSE").
  - Improve the rendering of inline code and code blocks in the chat app UI.
  - Rephrase the sentence that describes the page, section, and subsection structure of Markdown pages in the `markdown_to_plain_text.py` script.
  - Update the pre-processing diagram to fix a typo (`appscripts` to `apps_script`).
1 parent 5ed2855 commit 911720f

File tree

11 files changed: +143 -58 lines changed

demos/palm/python/docs-agent/README.md

Lines changed: 10 additions & 9 deletions
@@ -24,7 +24,7 @@ and is required that you have access to Google’s [PaLM API][genai-doc-site].
 Keep in mind that this approach does not involve “fine-tuning” an LLM (large language model).
 Instead, the Docs Agent sample app uses a mixture of prompt engineering and embedding techniques,
 also known as Retrieval Augmented Generation (RAG), on top of a publicly available LLM model
-like PaLM 2.
+like PaLM 2.
 
 ![Docs Agent architecture](docs/images/docs-agent-architecture-01.png)
 
@@ -210,10 +210,10 @@ by the PaLM model:
 - Additional condition (for fact-checking):
 
   ```
-  Can you compare the text below to the context provided
-  in this prompt above and write a short message that warns the readers about
-  which part of the text they should consider fact-checking? (Please keep your
-  response concise and focus on only one important item.)"
+  Can you compare the text below to the information provided in this prompt above
+  and write a short message that warns the readers about which part of the text they
+  should consider fact-checking? (Please keep your response concise and focus on only
+  one important item.)"
   ```
 
 - Previously generated response
@@ -266,8 +266,7 @@ The following is the exact structure of this prompt:
 - Condition:
 
   ```
-  You are a helpful chatbot answering questions from users. Read the following context first
-  and answer the question at the end:
+  Read the context below and answer the question at the end:
   ```
 
 - Context:
@@ -578,8 +577,10 @@ To customize settings in the Docs Agent chat app, do the following:
    condition for your custom dataset, for example:
 
    ```
-   condition_text: "You are a helpful chatbot answering questions from developers working on
-   Flutter apps. Read the following context first and answer the question at the end:"
+   condition_text: "You are a helpful chatbot answering questions from **Flutter app developers**.
+   Read the context below first and answer the user's question at the end.
+   In your answer, provide a summary in three or five sentences. (BUT DO NOT USE
+   ANY INFORMATION YOU KNOW ABOUT THE WORLD.)"
    ```
 
 ### 2. Launch the Docs Agent chat app
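The fact-checking prompt described in this README is assembled from the pieces listed above: the context with its condition, the fact-check question, and the previously generated response. The sketch below illustrates that assembly; the helper name and exact separators are inferred from the documented prompt structure, not taken from the commit.

```python
def build_fact_check_prompt(context_with_instruction: str,
                            fact_check_question: str,
                            previous_response: str) -> str:
    # Context (with condition) first, then the fact-check question,
    # then the text to be fact-checked, separated by blank lines.
    return (
        context_with_instruction
        + "\n\n" + fact_check_question
        + "\n\n" + previous_response
    )

prompt = build_fact_check_prompt(
    "Read the context below and answer the question at the end:\n\n"
    "Docs Agent uses RAG on top of PaLM 2.",
    "Can you compare the text below to the information provided in this prompt above "
    "and write a short message that warns the readers about which part of the text "
    "they should consider fact-checking?",
    "Docs Agent fine-tunes PaLM 2.",
)
print(prompt)
```

The ordering matters: the model sees the trusted context before the text it is asked to check.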

demos/palm/python/docs-agent/chatbot/chatui.py

Lines changed: 17 additions & 7 deletions
@@ -25,6 +25,7 @@
     json,
 )
 import markdown
+import markdown.extensions.fenced_code
 from bs4 import BeautifulSoup
 import urllib
 import os
@@ -145,22 +146,31 @@ def ask_model(question):
     query_result = docs_agent.query_vector_store(question)
     context = query_result.fetch_formatted(Format.CONTEXT)
     context_with_instruction = docs_agent.add_instruction_to_context(context)
-    response = docs_agent.ask_text_model_with_context(context_with_instruction, question)
+    response = docs_agent.ask_text_model_with_context(
+        context_with_instruction, question
+    )
 
     ### PROMPT 2: FACT-CHECK THE PREVIOUS RESPONSE.
     fact_checked_response = docs_agent.ask_text_model_to_fact_check(
         context_with_instruction, response
     )
 
     ### PROMPT 3: GET 5 RELATED QUESTIONS.
-    # 1. Prepare a new question asking the model to come up with 5 related questions.
-    # 2. Ask the language model with the new question.
-    # 3. Parse the model's response into a list in HTML format.
+    # 1. Use the response from Prompt 1 as context and add a custom condition.
+    # 2. Prepare a new question asking the model to come up with 5 related questions.
+    # 3. Ask the language model with the new question.
+    # 4. Parse the model's response into a list in HTML format.
+    new_condition = "Read the context below and answer the user's question at the end."
+    new_context_with_instruction = docs_agent.add_custom_instruction_to_context(
+        new_condition, response
+    )
     new_question = (
         "What are 5 questions developers might ask after reading the context?"
     )
     new_response = markdown.markdown(
-        docs_agent.ask_text_model_with_context(response, new_question)
+        docs_agent.ask_text_model_with_context(
+            new_context_with_instruction, new_question
+        )
     )
     related_questions = parse_related_questions_response_to_html_list(new_response)
 
@@ -181,8 +191,8 @@ def ask_model(question):
     # - Convert the fact-check response from the model into HTML for rendering.
     # - A workaround to get the server's URL to work with the rewrite and like features.
     new_uuid = uuid.uuid1()
-    context_in_html = markdown.markdown(context)
-    response_in_html = markdown.markdown(response)
+    context_in_html = markdown.markdown(context, extensions=["fenced_code"])
+    response_in_html = markdown.markdown(response, extensions=["fenced_code"])
     fact_checked_response_in_html = markdown.markdown(fact_checked_response)
     server_url = request.url_root.replace("http", "https")
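The `fenced_code` change above is what improves code rendering in the chat UI: with the extension enabled, triple-backtick fences in model responses become `<pre><code>` blocks instead of plain paragraph text. A minimal sketch, assuming the Python-Markdown package (`pip install markdown`) is installed:

```python
import markdown

# A model response that contains a fenced code block.
response = "Here is an example:\n\n```\nprint('hello')\n```\n"

# Without the extension the fence is treated as ordinary text;
# with "fenced_code" it is converted to a <pre><code> block.
html = markdown.markdown(response, extensions=["fenced_code"])
print(html)
```

The CSS rule added to `style.css` below then styles the resulting `code` elements.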

demos/palm/python/docs-agent/chatbot/static/css/style.css

Lines changed: 5 additions & 0 deletions
@@ -67,6 +67,11 @@ li {
   margin: 0 0 0.3em;
 }
 
+code {
+  font-family: math;
+  color: darkgreen;
+}
+
 /* ======= Style layout by ID ======= */
 
 #callout-box {

demos/palm/python/docs-agent/chroma.py

Lines changed: 23 additions & 20 deletions
@@ -47,17 +47,25 @@ def __init__(self, chroma_dir) -> None:
     def list_collections(self):
         return self.client.list_collections()
 
-    def get_collection(self, name, embedding_function=None):
+    def get_collection(self, name, embedding_function=None, embedding_model=None):
         if embedding_function is not None:
             return ChromaCollection(
                 self.client.get_collection(name, embedding_function=embedding_function),
                 embedding_function,
             )
         # Read embedding meta information from the collection
         collection = self.client.get_collection(name, lambda x: None)
-        embedding_model = None
-        if collection.metadata:
+        if embedding_model is None and collection.metadata:
             embedding_model = collection.metadata.get("embedding_model", None)
+        if embedding_model is None:
+            # If embedding_model is not found in the metadata,
+            # use `models/embedding-gecko-001` by default.
+            logging.info(
+                "Embedding model is not specified in the metadata of "
+                "the collection %s. Using the default PaLM embedding model.",
+                name,
+            )
+            embedding_model = "models/embedding-gecko-001"
 
         if embedding_model == "local/all-mpnet-base-v2":
             base_dir = os.path.dirname(os.path.abspath(__file__))
@@ -67,24 +75,19 @@ def get_collection(self, name, embedding_function=None):
                     model_name=local_model_dir
                 )
             )
-        elif embedding_model is None or embedding_model == "palm/embedding-gecko-001":
-            if embedding_model is None:
-                logging.info(
-                    "Embedding model is not specified in the metadata of "
-                    "the collection %s. Using the default PaLM embedding model.",
-                    name,
-                )
-            palm = PaLM(embed_model="models/embedding-gecko-001", find_models=False)
-            # We can not redefine embedding_function with def and
-            # have to assign a lambda to it
-            # pylint: disable-next=unnecessary-lambda-assignment
-            embedding_function = lambda texts: [palm.embed(text) for text in texts]
-
         else:
-            raise ChromaEmbeddingModelNotSupportedError(
-                f"Embedding model {embedding_model} specified by collection {name} "
-                "is not supported."
-            )
+            print("Embedding model: " + str(embedding_model))
+            try:
+                palm = PaLM(embed_model=embedding_model, find_models=False)
+                # We cannot redefine embedding_function with def and
+                # have to assign a lambda to it
+                # pylint: disable-next=unnecessary-lambda-assignment
+                embedding_function = lambda texts: [palm.embed(text) for text in texts]
+            except:
+                raise ChromaEmbeddingModelNotSupportedError(
+                    f"Embedding model {embedding_model} specified by collection {name} "
+                    "is not supported."
+                )
 
         return ChromaCollection(
             self.client.get_collection(name, embedding_function=embedding_function),
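The reworked `get_collection` resolves the embedding model through a precedence chain: an explicit `embedding_model` argument wins, then the collection's metadata, then a hard-coded default. A standalone sketch of just that resolution logic (the helper name is mine, not from the commit):

```python
DEFAULT_EMBEDDING_MODEL = "models/embedding-gecko-001"

def resolve_embedding_model(explicit=None, metadata=None):
    """Pick an embedding model: explicit argument > collection metadata > default."""
    model = explicit
    if model is None and metadata:
        # Fall back to the model recorded when the collection was built.
        model = metadata.get("embedding_model", None)
    if model is None:
        # Nothing specified anywhere: use the default PaLM embedding model.
        model = DEFAULT_EMBEDDING_MODEL
    return model

print(resolve_embedding_model(metadata={"embedding_model": "local/all-mpnet-base-v2"}))
```

Passing the model through from `config.yaml` (see below) lets a collection built with one embedding model be queried consistently with the same model.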

demos/palm/python/docs-agent/config.yaml

Lines changed: 25 additions & 8 deletions
@@ -16,6 +16,16 @@
 
 ### Configuration for Docs Agent ###
 
+### PaLM environment
+#
+# api_endpoint: The PaLM API endpoint used by Docs Agent.
+#
+# embedding_model: The PaLM embedding model used to generate embeddings.
+#
+api_endpoint: "generativelanguage.googleapis.com"
+embedding_model: "models/embedding-gecko-001"
+
 ### Docs Agent environment
 #
 # product_name: The name of your product to appears on the chatbot UI.
@@ -31,10 +41,14 @@
 # collection_name: The name used to identify a dataset collection by
 #                  the Chroma vector database.
 #
+# log_level: The verbosity level of logs printed on the terminal
+#            by the chatbot app: NORMAL or VERBOSE
+#
 product_name: "My product"
 output_path: "data/plain_docs"
 vector_db_dir: "vector_stores/chroma"
 collection_name: "docs_collection"
+log_level: "NORMAL"
 
 
 ### Documentation sources
@@ -70,14 +84,17 @@ input:
 # model_error_message: The error message returned to the user when language
 #                      models are unable to provide responses.
 #
-condition_text: "You are a helpful chatbot answering questions from users. Read
-  the following context first and answer the question at the end:"
+condition_text: "You are a helpful chatbot answering questions from users.
+  Read the context below first and answer the user's question at the end.
+  In your answer, provide a summary in three or five sentences. (BUT DO NOT USE
+  ANY INFORMATION YOU KNOW ABOUT THE WORLD.)"
 
-fact_check_question: "Can you compare the text below to the context provided
-  in this prompt above and write a short message that warns the readers about
-  which part of the text they should consider fact-checking? (Please keep your
-  response concise and focus on only one important item.)"
+fact_check_question: "Can you compare the text below to the information
+  provided in this prompt above and write a short message that warns the readers
+  about which part of the text they should consider fact-checking? (Please keep
+  your response concise, focus on only one important item, but DO NOT USE BOLD
+  TEXT IN YOUR RESPONSE.)"
 
-model_error_message: "PaLM is not able to answer this question at the
-  moment. Rephrase the question and try asking again."
+model_error_message: "PaLM is not able to answer this question at the moment.
+  Rephrase the question and try asking again."
(2 binary image files changed: 43.8 KB and 13.4 KB; previews not shown)

demos/palm/python/docs-agent/docs_agent.py

Lines changed: 38 additions & 5 deletions
@@ -34,15 +34,16 @@
 
 # Select your PaLM API endpoint.
 PALM_API_ENDPOINT = "generativelanguage.googleapis.com"
-
-palm = PaLM(api_key=API_KEY, api_endpoint=PALM_API_ENDPOINT)
-
-BASE_DIR = os.path.dirname(os.path.abspath(__file__))
+EMBEDDING_MODEL = None
 
 # Set up the path to the chroma vector database.
+BASE_DIR = os.path.dirname(os.path.abspath(__file__))
 LOCAL_VECTOR_DB_DIR = os.path.join(BASE_DIR, "vector_stores/chroma")
 COLLECTION_NAME = "docs_collection"
 
+# Set the log level for the DocsAgent class: NORMAL or VERBOSE
+LOG_LEVEL = "NORMAL"
+
 IS_CONFIG_FILE = True
 if IS_CONFIG_FILE:
     config_values = read_config.ReadConfig()
@@ -51,10 +52,16 @@
     CONDITION_TEXT = config_values.returnConfigValue("condition_text")
     FACT_CHECK_QUESTION = config_values.returnConfigValue("fact_check_question")
     MODEL_ERROR_MESSAGE = config_values.returnConfigValue("model_error_message")
+    LOG_LEVEL = config_values.returnConfigValue("log_level")
+    PALM_API_ENDPOINT = config_values.returnConfigValue("api_endpoint")
+    EMBEDDING_MODEL = config_values.returnConfigValue("embedding_model")
 
 # Select the number of contents to be used for providing context.
 NUM_RETURNS = 5
 
+# Initialize the PaLM instance.
+palm = PaLM(api_key=API_KEY, api_endpoint=PALM_API_ENDPOINT)
+
 
 class DocsAgent:
     """DocsAgent class"""
@@ -65,7 +72,9 @@ def __init__(self):
             "Using the local vector database created at %s", LOCAL_VECTOR_DB_DIR
         )
         self.chroma = Chroma(LOCAL_VECTOR_DB_DIR)
-        self.collection = self.chroma.get_collection(COLLECTION_NAME)
+        self.collection = self.chroma.get_collection(
+            COLLECTION_NAME, embedding_model=EMBEDDING_MODEL
+        )
         # Update PaLM's custom prompt strings
         self.prompt_condition = CONDITION_TEXT
         self.fact_check_question = FACT_CHECK_QUESTION
@@ -74,6 +83,9 @@ def __init__(self):
     # Use this method for talking to PaLM (Text)
     def ask_text_model_with_context(self, context, question):
         new_prompt = f"{context}\n\nQuestion: {question}"
+        # Print the prompt for debugging if the log level is VERBOSE.
+        if LOG_LEVEL == "VERBOSE":
+            self.print_the_prompt(new_prompt)
         try:
             response = palm.generate_text(
                 prompt=new_prompt,
@@ -119,3 +131,24 @@ def add_instruction_to_context(self, context):
         new_context = ""
         new_context += self.prompt_condition + "\n\n" + context
         return new_context
+
+    # Add custom instruction as a prefix to the context
+    def add_custom_instruction_to_context(self, condition, context):
+        new_context = ""
+        new_context += condition + "\n\n" + context
+        return new_context
+
+    # Generate an embedding given text input
+    def generate_embedding(self, text):
+        return palm.embed(text)
+
+    # Print the prompt on the terminal for debugging
+    def print_the_prompt(self, prompt):
+        print("#########################################")
+        print("# PROMPT #")
+        print("#########################################")
+        print(prompt)
+        print("#########################################")
+        print("# END OF PROMPT #")
+        print("#########################################")
+        print("\n")
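The new `add_custom_instruction_to_context` method simply prefixes a caller-supplied condition to the context, and `ask_text_model_with_context` then appends the question. A standalone sketch of how the final prompt for the "5 related questions" call is assembled (free functions here instead of the class methods, for brevity):

```python
def add_custom_instruction_to_context(condition: str, context: str) -> str:
    # Prefix the condition, separated from the context by a blank line.
    return condition + "\n\n" + context

def build_prompt(context: str, question: str) -> str:
    # Mirrors the f"{context}\n\nQuestion: {question}" format used above.
    return f"{context}\n\nQuestion: {question}"

new_condition = "Read the context below and answer the user's question at the end."
previous_response = "Docs Agent uses RAG on top of PaLM 2."
new_question = "What are 5 questions developers might ask after reading the context?"

prompt = build_prompt(
    add_custom_instruction_to_context(new_condition, previous_response),
    new_question,
)
print(prompt)
```

This is the bug fix noted in the commit message: before 0.1.6, the related-questions prompt reused the previous response with no condition at all.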

demos/palm/python/docs-agent/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "docs-agent"
-version = "0.1.5"
+version = "0.1.6"
 description = ""
 authors = ["Docs Agent contributors"]
 readme = "README.md"

demos/palm/python/docs-agent/scripts/markdown_to_plain_text.py

Lines changed: 3 additions & 3 deletions
@@ -192,7 +192,7 @@ def process_page_and_section_titles(markdown_text):
         new_line = (
             '# The "'
             + page_title
-            + '" page contains the following content:\n\n'
+            + '" page includes the following information:\n'
         )
 
         if section_title:
@@ -201,7 +201,7 @@ def process_page_and_section_titles(markdown_text):
                 + page_title
                 + '" page has the "'
                 + section_title
-                + '" section that contains the following content:\n'
+                + '" section that includes the following information:\n'
             )
 
         if subsection_title:
@@ -212,7 +212,7 @@ def process_page_and_section_titles(markdown_text):
                 + section_title
                 + '" section has the "'
                 + subsection_title
-                + '" subsection that contains the following content:\n'
+                + '" subsection that includes the following information:\n'
             )
 
         if skip_this_line is False:
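The rephrased templates turn Markdown headings into natural-language lead-ins for the plain-text output that gets embedded. A standalone sketch of the resulting strings, simplified from the script (only the page and section cases, with helper names of my own):

```python
def page_heading(page_title: str) -> str:
    # e.g. '# The "Get started" page includes the following information:\n'
    return '# The "' + page_title + '" page includes the following information:\n'

def section_heading(page_title: str, section_title: str) -> str:
    # Nests the section title under its page title.
    return (
        '# The "' + page_title + '" page has the "' + section_title
        + '" section that includes the following information:\n'
    )

print(page_heading("Get started"))
print(section_heading("Get started", "Install"))
```

Phrasing the headings this way gives each chunk a self-describing first line, which helps both embedding retrieval and the model's use of the context.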

0 commit comments