Commit a72f52a
[Release] Docs Agent version 0.2.1
What's changed:

- Integrate the Semantic Retrieval API into Docs Agent.
- Turn `chroma.py` and `palm.py` into modules.
- Refactor and enhance the pre-processing scripts.
- Introduce `files_to_plain_text.py`, which will replace `markdown_to_plain_text.py`.
- Add a new doc: `doc/whats-new.md`.
1 parent 6e66b83 commit a72f52a

File tree

22 files changed: +3361 −1218 lines changed


demos/palm/python/docs-agent/README.md

Lines changed: 48 additions & 1 deletion
@@ -130,6 +130,11 @@ The following list summarizes the tasks and features of the Docs Agent sample ap
   Apps Script to convert Google Docs, PDF, and Gmail into Markdown files, which then
   can be used as input datasets for Docs Agent. For more information, see the
   [`README`][apps-script-readme] file in the `apps_script` directory.
+- **Use Gemini's Semantic Retrieval API and AQA model**: You can set up Docs Agent
+  to use Gemini's [Semantic Retrieval API][semantic-api] and [AQA model][aqa-model].
+  This API enables you to upload your source documents online, instead of using
+  a local vector database, and use Gemini's `aqa` model that is specifically
+  created for question-answering.

 ## Flow of events

@@ -314,6 +319,41 @@ use these Markdown files as additional input sources for Docs Agent. For more in

 **Figure 7**. Docs Agent's pre-processing flow for various doc types.

+### Using the Semantic Retrieval API and AQA model
+
+Docs Agent provides options to use Gemini's [Semantic Retrieval API][semantic-api] for storing text
+chunks in Google Cloud's online storage (and using this online storage for context retrieval),
+in combination with using the [AQA model][aqa-model] for question-answering.
+
+To use the Semantic Retrieval API, update the `config.yaml` file to include the following settings:
+
+```
+db_type: "ONLINE_STORAGE"
+is_aqa_used: "YES"
+```
+
+This setup stores text chunks online using the Semantic Retrieval API and uses the AQA model for question-answering.
+
+**Note**: At the moment, when `db_type` is set to `ONLINE_STORAGE`, running the
+`populate_vector_database.py` script creates and populates a local Chroma vector database
+in addition to creating and populating an online corpus using the Semantic Retrieval API.
+
+However, if you want to use only the AQA model for question-answering, without creating a
+corpus online, update the `config.yaml` file to include the following settings instead:
+
+```
+db_type: "LOCAL_DB"
+is_aqa_used: "YES"
+```
+
+This setup uses the AQA model with your local Chroma vector database. (For more information,
+see the [More Options: AQA Using Inline Passages][inline-passages] section on the
+_Semantic Retriever Quickstart_ page.)
+
+**Note**: To use the Semantic Retrieval API, you need to complete the OAuth setup for your Google
+Cloud project from your host machine. For detailed instructions, see the
+[Authentication with OAuth quickstart][oauth-quickstart] page.
+
 ## Issues identified

 The following issues have been identified and need to be worked on:
@@ -497,12 +537,15 @@ To convert Markdown files to plain text files:
 6. Run the Python script:

    ```
-   python3 scripts/markdown_to_plain_text.py
+   python3 scripts/files_to_plain_text.py
    ```

    For a large number of Markdown files, processing may take a few minutes.

+   **Important**: The `markdown_to_plain_text.py` script is being deprecated in
+   favor of the [`files_to_plain_text.py`][files-to-plain-text] script.
+
 ### 2. Populate a new vector database

 Once you have plain text files processed and stored in the `output_path` directory,
@@ -675,3 +718,7 @@ Meggin Kearney (`@Meggin`), and Kyo Lee (`@kyolee415`).
 [scripts-readme]: ./scripts/README.md
 [config-yaml]: config.yaml
 [gen-ai-docs-repo]: https://github.com/google/generative-ai-docs
+[semantic-api]: https://ai.google.dev/docs/semantic_retriever
+[aqa-model]: https://ai.google.dev/models/gemini#model_variations
+[oauth-quickstart]: https://ai.google.dev/docs/oauth_quickstart
+[inline-passages]: https://ai.google.dev/docs/semantic_retriever#more_options_aqa_using_inline_passages
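The `db_type` and `is_aqa_used` settings described in the README section above combine into three retrieval modes. A minimal sketch of that selection logic, assuming a hypothetical `select_mode` helper (not part of this commit; Docs Agent's actual dispatch lives in its own modules):

```python
def select_mode(config):
    # Read the two settings added to config.yaml in this release,
    # falling back to the defaults the commit ships with.
    db_type = config.get("db_type", "LOCAL_DB")
    is_aqa_used = config.get("is_aqa_used", "NO")
    if is_aqa_used == "YES":
        if db_type == "ONLINE_STORAGE":
            # AQA model answering over an online corpus (Semantic Retrieval API).
            return "aqa_online_corpus"
        # AQA model answering over chunks from the local Chroma database.
        return "aqa_inline_passages"
    # Default: local Chroma retrieval with a generative model.
    return "local_chroma"

print(select_mode({"db_type": "ONLINE_STORAGE", "is_aqa_used": "YES"}))
```

The defaults (`LOCAL_DB`, `NO`) keep existing deployments on the local-Chroma path unless both settings are changed.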
Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
+#
+# Copyright 2023 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""AQA module for using the Semantic Retrieval API"""
+
+import google.ai.generativelanguage as glm
+
+
+class AQA:
+    def __init__(self):
+        # Initialize clients for the Semantic Retrieval API.
+        self.generative_service_client = glm.GenerativeServiceClient()
+        self.retriever_service_client = glm.RetrieverServiceClient()
+        self.permission_service_client = glm.PermissionServiceClient()
+
+    def list_existing_corpora(self):
+        corpora_list = glm.ListCorporaRequest()
+        response = self.retriever_service_client.list_corpora(corpora_list)
+        print("List of existing corpora:\n")
+        print(response)
+
+    def delete_a_corpus(self, corpus_name):
+        try:
+            # Use `force=True` to delete the corpus even if it still contains documents.
+            delete = glm.DeleteCorpusRequest(name=corpus_name, force=True)
+            self.retriever_service_client.delete_corpus(delete)
+            print("Successfully deleted corpus: " + corpus_name)
+        except Exception:
+            print("Failed to delete the corpus: " + corpus_name)
+
+    def create_a_new_corpus(self, corpus_display, corpus_name):
+        try:
+            # Return early if the corpus already exists.
+            get_corpus_request = glm.GetCorpusRequest(name=corpus_name)
+            get_corpus_response = self.retriever_service_client.get_corpus(
+                get_corpus_request
+            )
+            print()
+            print(f"{corpus_name} exists.\n{get_corpus_response}")
+        except Exception:
+            # Create a new corpus.
+            corpus = glm.Corpus(display_name=corpus_display, name=corpus_name)
+            create_corpus_request = glm.CreateCorpusRequest(corpus=corpus)
+            create_corpus_response = self.retriever_service_client.create_corpus(
+                create_corpus_request
+            )
+            print()
+            print(f"Created a new corpus {corpus_name}.\n{create_corpus_response}")
+
+    def create_a_doc(self, corpus_name, page_title, page_url):
+        document_resource_name = ""
+        try:
+            # Create a new document with a custom display name.
+            example_document = glm.Document(display_name=page_title)
+            # Add metadata.
+            document_metadata = [glm.CustomMetadata(key="url", string_value=page_url)]
+            example_document.custom_metadata.extend(document_metadata)
+            # Make the request.
+            create_document_request = glm.CreateDocumentRequest(
+                parent=corpus_name, document=example_document
+            )
+            create_document_response = self.retriever_service_client.create_document(
+                create_document_request
+            )
+            # Set the `document_resource_name` for subsequent sections.
+            document_resource_name = create_document_response.name
+        except Exception:
+            # The document may already exist; retrieve its resource name instead.
+            get_document_request = glm.GetDocumentRequest(name=document_resource_name)
+            get_document_response = self.retriever_service_client.get_document(
+                get_document_request
+            )
+            document_resource_name = get_document_response.name
+        return document_resource_name
+
+    def create_a_chunk(self, doc_name, text):
+        response = ""
+        try:
+            chunk = glm.Chunk(data={"string_value": text})
+            create_chunk_requests = []
+            create_chunk_requests.append(
+                glm.CreateChunkRequest(parent=doc_name, chunk=chunk)
+            )
+            # Make the request.
+            request = glm.BatchCreateChunksRequest(
+                parent=doc_name, requests=create_chunk_requests
+            )
+            response = self.retriever_service_client.batch_create_chunks(request)
+            # Print the response.
+            print("Created a new text chunk:\n")
+            print(response)
+        except Exception:
+            print("[ERROR] Failed to create a text chunk for:\n\n" + text)
+            # The failure is possibly because the text is too large.
+            # Quick fix: split the text into two chunks and retry.
+            lines = text.splitlines()
+            text_01 = ""
+            text_02 = ""
+            text_size = len(lines)
+            half_size = int(text_size / 2)
+            i = 0
+            for line in lines:
+                if i < half_size:
+                    text_01 += line + "\n"
+                else:
+                    text_02 += line + "\n"
+                i += 1
+            self.create_a_chunk(doc_name, text_01)
+            self.create_a_chunk(doc_name, text_02)
+        return response
+
+    def create_a_doc_chunk(self, corpus_name, page_title, page_url, text):
+        doc_name = self.create_a_doc(corpus_name, page_title, page_url)
+        return self.create_a_chunk(doc_name, text)
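The quick fix at the end of `create_a_chunk` halves an oversized text by line count before retrying. The same split, pulled out as a pure function for clarity (a hypothetical `split_in_half` helper, not part of the commit):

```python
def split_in_half(text):
    # Split the text into two halves by line count, mirroring the
    # retry logic in AQA.create_a_chunk. The first half gets the
    # extra line when the count is odd... no, the second half does,
    # because int(n / 2) rounds down.
    lines = text.splitlines()
    half_size = len(lines) // 2
    text_01 = "".join(line + "\n" for line in lines[:half_size])
    text_02 = "".join(line + "\n" for line in lines[half_size:])
    return text_01, text_02

first, second = split_in_half("a\nb\nc\nd")
```

Because the method recurses on each half, a chunk that is still too large keeps splitting until it fits or a half is empty.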

demos/palm/python/docs-agent/chatbot/chatui.py

Lines changed: 14 additions & 2 deletions
@@ -35,7 +35,7 @@
 import uuid
 from scripts import read_config

-from chroma import Format
+from modules.chroma import Format
 from docs_agent import DocsAgent

@@ -146,7 +146,9 @@ def ask_model(question):
     query_result = docs_agent.query_vector_store(question)
     context = query_result.fetch_formatted(Format.CONTEXT)
     context_with_instruction = docs_agent.add_instruction_to_context(context)
-    if "gemini" in docs_agent.get_language_model_name():
+    if docs_agent.check_if_aqa_is_used():
+        response = docs_agent.ask_aqa_model(question)
+    elif "gemini" in docs_agent.get_language_model_name():
         response = docs_agent.ask_content_model_with_context(
             context_with_instruction, question
         )
@@ -196,11 +198,20 @@ def ask_model(question):

     ### PREPARE OTHER ELEMENTS NEEDED BY UI.
     # - Create a uuid for this request.
+    # - (Optional) Prepare the AQA model's JSON response into HTML for rendering.
     # - Convert the context returned from the database into HTML for rendering.
     # - Convert the first response from the model into HTML for rendering.
     # - Convert the fact-check response from the model into HTML for rendering.
     # - A workaround to get the server's URL to work with the rewrite and like features.
     new_uuid = uuid.uuid1()
+    aqa_response_in_html = ""
+    if docs_agent.check_if_aqa_is_used():
+        aqa_response_json = docs_agent.get_saved_aqa_response_json()
+        if aqa_response_json:
+            aqa_response_in_html = "Grounding attributions:<br/><br/>"
+            aqa_response_in_html += str(aqa_response_json.answer.grounding_attributions)
+            aqa_response_in_html += "<br/><br/>Answerable probability: "
+            aqa_response_in_html += str(aqa_response_json.answerable_probability)
     context_in_html = markdown.markdown(context, extensions=["fenced_code"])
     response_in_html = markdown.markdown(response, extensions=["fenced_code"])
     fact_checked_response_in_html = markdown.markdown(fact_checked_response)
@@ -222,6 +233,7 @@ def ask_model(question):
         product=product,
         server_url=server_url,
         uuid=new_uuid,
+        aqa_response_in_html=aqa_response_in_html,
     )

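The new branch order in `ask_model` — AQA first, then Gemini-family content models, with everything else falling through — can be sketched independently of the Flask app. The `FakeAgent` class here is a stand-in for `DocsAgent`, not the real class:

```python
def pick_answer_path(agent):
    # Mirrors the dispatch order added to ask_model() in chatui.py:
    # the AQA setting takes priority over the model-name check.
    if agent.check_if_aqa_is_used():
        return "ask_aqa_model"
    if "gemini" in agent.get_language_model_name():
        return "ask_content_model_with_context"
    return "fallback_model_path"

class FakeAgent:
    # Stand-in exposing only the two methods the dispatch reads.
    def __init__(self, aqa_used, model_name):
        self._aqa_used = aqa_used
        self._model_name = model_name
    def check_if_aqa_is_used(self):
        return self._aqa_used
    def get_language_model_name(self):
        return self._model_name

print(pick_answer_path(FakeAgent(False, "models/gemini-pro")))
```

Note that when AQA is enabled, the question is answered before the Gemini check runs, regardless of which language model is configured.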

demos/palm/python/docs-agent/chatbot/static/css/style.css

Lines changed: 10 additions & 0 deletions
@@ -327,3 +327,13 @@ code {
   margin-bottom: 0;
 }

+.accordion #aqa-label {
+  padding: 2px;
+  background: #add1e8;
+  border-radius: 0px;
+}
+
+.accordion .content #aqa-content {
+  background: #d4d4d4;
+}
+

demos/palm/python/docs-agent/chatbot/static/javascript/app.js

Lines changed: 13 additions & 2 deletions
@@ -20,14 +20,24 @@ let loadingDiv = document.getElementById('loading-div');

 if (askButton != null){
   askButton.addEventListener('click',function (){
-    console.log("here");
     if (loadingDiv.classList.contains("hidden")){
       loadingDiv.classList.remove("hidden");
-      console.log("there");
     }
   });
 }

+// Display the "aqa-box" div only if the AQA JSON response is included.
+let aqaContent = document.getElementById('aqa-content');
+let aqaBox = document.getElementById('aqa-box');
+
+if (aqaContent != null){
+  if (aqaContent.innerText.trim() != ""){
+    if (aqaBox.classList.contains("hidden")){
+      aqaBox.classList.remove("hidden");
+    }
+  }
+}
+
 // Toggle the hidden class on the `rewrite-box` div.
 let rewriteButton = document.getElementById('rewrite-button');

@@ -144,3 +154,4 @@ if (rewriteSubmitButton != null){
     rewriteSubmitButton.classList.add("disable");
   }, false);
 }
+

demos/palm/python/docs-agent/chatbot/templates/chatui/result.html

Lines changed: 12 additions & 0 deletions
@@ -27,6 +27,18 @@ <h2 class="handle">
       <h4>Reference:</h4>
       {{ clickable_urls | safe }}
     </span>
+    <br/>
+    <div class="hidden" id="aqa-box">
+      <section class="accordion">
+        <input type="checkbox" name="collapse" id="handle2">
+        <h2 class="handle">
+          <label for="handle2" id="aqa-label">AQA response JSON</label>
+        </h2>
+        <div class="content" id="aqa-content">
+          {{ aqa_response_in_html | safe }}
+        </div>
+      </section>
+    </div>
   </div>
 </section>
 <div>

demos/palm/python/docs-agent/config.yaml

Lines changed: 11 additions & 2 deletions
@@ -43,13 +43,22 @@ embedding_model: "models/embedding-001"
 # the Chroma vector database.
 #
 # log_level: The verbosity level of logs printed on the terminal
-#            by the chatbot app: NORMAL or VERBOSE
+#            by the chatbot app: NORMAL, VERBOSE, or DEBUG
+#
+# db_type: Use a local vector database or an online storage
+#          using the Semantic Retrieval API:
+#          LOCAL_DB or ONLINE_STORAGE
+#
+# is_aqa_used: Use Gemini's AQA model for question-answering:
+#              NO or YES
 #
 product_name: "My product"
 output_path: "data/plain_docs"
 vector_db_dir: "vector_stores/chroma"
 collection_name: "docs_collection"
 log_level: "NORMAL"
+db_type: "LOCAL_DB"
+is_aqa_used: "NO"


 ### Documentation sources

@@ -96,6 +105,6 @@ about which part of the text they should consider fact-checking? (Please keep
 your response concise, focus on only one important item, but DO NOT USE BOLD
 TEXT IN YOUR RESPONSE.)"

-model_error_message: "PaLM is not able to answer this question at the moment.
+model_error_message: "Gemini is not able to answer this question at the moment.
 Rephrase the question and try asking again."
