Commit 998c2f5

[Release] 0.1.3 (google#164)
* [Release] 0.1.3

  What's changed:
  - Created Apps Script to process Google docs, PDF, and Gmail threads. See `apps_script/README.md`.
  - Added `third_party/g2docsmd-html` which is the base for converting files to markdown.
  - Updated the main `README.md` for clarity.
  - Created a `scripts/README.md` to better explain content processing.

* [Fix] Fix readme links.
* [Fix] Fix symlink to third_party.
* [Fix] Fix symlink to exportmd.gs to be relative.
* [Fix] Fix symlink.
1 parent 13372ab commit 998c2f5

17 files changed: +2914 -387 lines changed

demos/palm/python/docs-agent/README.md

Lines changed: 17 additions & 5 deletions
@@ -1,6 +1,6 @@
 # Docs Agent
 
-The Docs Agent demo enables [PaLM API][genai-doc-site] users to launch a chat application
+The Docs Agent project enables [PaLM API][genai-doc-site] users to launch a chat application
 on a Linux-based host machine using their own set of documents as a source dataset.
 
 **Note**: If you're interested in setting up and launching the Docs Agent sample app on your
@@ -57,7 +57,7 @@ content from the source documents given user questions.
 Once the most relevant content is returned, the Docs Agent server uses the prompt structure
 shown in Figure 3 to augment the user question with a preset **condition** and a list of
 **context**. (When the Docs Agent server starts, the condition value is read from the
-[`config.yaml`][condition-txt] file.) Then the Docs Agent server sends this prompt to a
+[`config.yaml`][config-yaml] file.) Then the Docs Agent server sends this prompt to a
 PaLM 2 model using the PaLM API and receives a response generated by the model.
 
 ![Docs Agent prompt structure](docs/images/docs-agent-prompt-structure-01.png)
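
The prompt augmentation described in this hunk (a preset condition, a list of context, then the user question) can be sketched in Python as follows. This is a minimal illustration, not the Docs Agent implementation: the function name `build_prompt`, the `Context:` and `Question:` labels, and the ordering of the parts are assumptions, because Figure 3 is not reproduced in this diff.

```python
def build_prompt(condition: str, context_chunks: list[str], question: str) -> str:
    """Combine a preset condition, retrieved context, and the user question.

    Hypothetical helper: the labels and ordering are assumptions for illustration.
    """
    # Drop empty chunks so the prompt does not contain blank context entries.
    context = "\n\n".join(c.strip() for c in context_chunks if c.strip())
    return (
        f"{condition.strip()}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}"
    )


prompt = build_prompt(
    condition="Answer using only the context below.",
    context_chunks=["Docs Agent runs on a Linux-based host machine.", "  "],
    question="Where does Docs Agent run?",
)
print(prompt)
```

Keeping the condition separate from the context makes it possible to change the condition in `config.yaml` without touching the retrieval step.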
@@ -82,6 +82,9 @@ running on the host machine.
 The embeddings in this vector database enable the Docs Agent server to perform semantic search
 and retrieve context related to user questions for augmenting prompts.
 
+For more information on the processing of Markdown files, see the [`README`][scripts-readme]
+file in the `scripts` directory.
+
 ![Document to embeddings](docs/images/docs-agent-embeddings-01.png)
 
 **Figure 4**. A document is split into small semantic chunks, which are then used to generate
@@ -296,6 +299,13 @@ event of "like" for the response.
 The user may click this like button multiple times to toggle the state of the like button. But when
 examining the logs, only the final state of the like button will be considered for the response.
 
+### Using Google Docs, PDF, or Gmail as input sources
+
+The project includes Apps Script files that allow you to convert various sources of content
+(including Google Docs and PDF) from your Google Drive and Gmail into Markdown files. You can then
+use these Markdown files as additional input sources for Docs Agent. For more information, see the
+[`README`][apps-script-readme] file in the `apps_script` directory.
+
 ## Issues identified
 
 The following issues have been identified and need to be worked on:
@@ -427,7 +437,7 @@ To convert Markdown files to plain text files:
    cd $HOME/generative-ai-docs/demos/palm/python/docs-agent
    ```
 
-2. Open the `config.yaml` file using a text editor, for example:
+2. Open the [`config.yaml`][config-yaml] file using a text editor, for example:
 
    ```
    nano config.yaml
@@ -542,7 +552,7 @@ allowing you to easily bring up and destroy the Flask app instance.
 
 To customize settings in the Docs Agent chat app, do the following:
 
-1. Edit the `config.yaml` file to update the following field:
+1. Edit the [`config.yaml`][config-yaml] file to update the following field:
 
    ```
    product_name: "My product"
@@ -636,7 +646,6 @@ Meggin Kearney (`@Meggin`), and Kyo Lee (`@kyolee415`).
 [set-up-docs-agent]: #set-up-docs-agent
 [markdown-to-plain-text]: ./scripts/markdown_to_plain_text.py
 [populate-vector-database]: ./scripts/populate_vector_database.py
-[condition-txt]: ./config.yaml
 [context-source-01]: http://eventhorizontelescope.org
 [fact-check-section]: #using-a-palm-2-model-to-fact-check-its-own-response
 [related-questions-section]: #using-a-palm-2-model-to-suggest-related-questions
@@ -650,4 +659,7 @@ Meggin Kearney (`@Meggin`), and Kyo Lee (`@kyolee415`).
 [flutter-docs-src]: https://github.com/flutter/website/tree/main/src
 [flutter-docs-site]: https://docs.flutter.dev/
 [poetry-known-issue]: https://github.com/python-poetry/poetry/issues/1917
+[apps-script-readme]: ./apps_script/README.md
+[scripts-readme]: ./scripts/README.md
+[config-yaml]: config.yaml
 [gen-ai-docs-repo]: https://github.com/google/generative-ai-docs
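
The like-button note in the hunk at line 296 says that only the final state of the button counts when the logs are examined. A minimal Python sketch of that reduction follows. The log format is not shown in this diff, so the `(response_id, liked)` event tuples and the function name `final_like_states` are hypothetical.

```python
def final_like_states(events):
    """Return the last recorded like state for each response.

    `events` is an iterable of (response_id, liked) tuples in log order.
    Hypothetical log shape, used only for illustration.
    """
    states = {}
    for response_id, liked in events:
        # Later events overwrite earlier ones, so only the final toggle survives.
        states[response_id] = liked
    return states


log = [("r1", True), ("r2", True), ("r1", False), ("r1", True), ("r2", False)]
print(final_like_states(log))
```

In this example, `r1` was toggled three times and ends up liked, while `r2` ends up not liked.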
demos/palm/python/docs-agent/apps_script/README.md

Lines changed: 164 additions & 0 deletions
@@ -0,0 +1,164 @@
+# Convert Google Docs, PDF, and Gmail to Markdown files
+
+The collection of scripts in this `apps_script` directory allows you to convert
+the contents of Google Drive folders and Gmail to Markdown files that are
+compatible with Docs Agent.
+
+The steps are:
+
+1. [Prepare a Google Drive folder](#1-prepare-a-google-drive-folder).
+2. [Mount Google Drive on your host machine](#2-mount-google-drive-on-your-host-machine).
+3. [Create an Apps Script project](#3-create-an-apps-script-project).
+4. [Edit and run main.gs on Apps Script](#4-edit-and-run-maings-on-apps-script).
+5. [Update config.yaml to include the mounted directory](#5-update-configyaml-to-include-the-mounted-directory).
+
+## 1. Prepare a Google Drive folder
+
+First, create a new folder in Google Drive and add your Google Docs (which will be
+used as source documents for Docs Agent) to the folder.
+
+Do the following:
+
+1. Browse to https://drive.google.com/.
+1. Click **+ New** in the top left corner.
+1. Click **New folder**.
+1. Name your new folder (for example, `my source Google Docs`).
+1. To enter the newly created folder, double-click the folder.
+1. Add (or move) your source Google Docs to this new folder.
+
+## 2. Mount Google Drive on your host machine
+
+Mount your Google Drive on your host machine so that you can access the
+folders in Google Drive from your host machine (later in step 5).
+
+A variety of methods and tools available online enable this setup
+(for example, see [`google-drive-ocamlfuse`][google-drive-ocamlfuse] for Linux machines).
+
+## 3. Create an Apps Script project
+
+Create a new Apps Script project and copy all the `.gs` scripts in this
+`apps_script` directory to your new Apps Script project.
+
+Do the following:
+
+1. Browse to https://script.google.com/.
+1. Click **New Project**.
+1. At the top of the page, click **Untitled Project** and enter a meaningful
+   title (for example, `gDocs to Docs Agent`).
+1. Click the **+** icon next to **Files**.
+1. Click **Script**.
+1. Name the new script after one of the `.gs` files in this `apps_script` directory
+   (for example, `drive_to_markdown`).
+1. Copy the content of the `.gs` file to the new script in your Apps Script project.
+1. To save, click the "Save project" icon in the toolbar.
+1. Repeat these steps until all the `.gs` files are copied to your Apps Script project.
+1. Click the **+** icon next to **Services**.
+1. Scroll down and click **Drive API**.
+1. Click **Add**.
+
+You are now ready to edit the parameters in the `main.gs` file to select a folder
+in Google Drive and export emails from Gmail.
+
+![Apps Script project](../docs/images/apps-script-screenshot-01.png)
+
+**Figure 1**. A screenshot of an example Apps Script project.
+
+## 4. Edit and run main.gs on Apps Script
+
+Edit the `main.gs` file in your Apps Script project to select which functions
+(features) you want to run.
+
+Do the following:
+
+1. Browse to your project on https://script.google.com/.
+
+1. Open the `main.gs` file.
+
+1. In the `main` function, comment out any functions that you don't want to run
+   (see Figure 1):
+
+   * `convertDriveFolderToMDForDocsAgent(folderInput)`: This function converts
+     the contents of a Google Drive folder to Markdown files (currently only Google
+     Docs and PDF). Make sure to specify a valid Google Drive folder in the `folderInput`
+     variable. Use the name of the folder created in **step 1** above, for example:
+
+     ```
+     var folderInput = "my source Google Docs";
+     function main() {
+       convertDriveFolderToMDForDocsAgent(folderInput);
+       //exportEmailsToMarkdown(SEARCH_QUERY, folderOutput);
+     }
+     ```
+
+   * `exportEmailsToMarkdown(SEARCH_QUERY, folderOutput)`: This function converts
+     the emails returned from a Gmail search query into Markdown files. Make sure to
+     specify a search query in the `SEARCH_QUERY` variable. You can test this search
+     query directly in the Gmail search bar. Also, specify an output directory for the
+     resulting emails in the `folderOutput` variable.
+
+1. To save, click the "Save project" icon in the toolbar.
+
+1. Click the "Run" icon in the toolbar.
+
+   When this script runs successfully, the Execution log panel prints output similar
+   to the following:
+
+   ```
+   9:55:59 PM Notice Execution completed
+   ```
+
+Also, the script creates a new folder in your Google Drive and stores the converted
+Markdown files in this folder. The name of this new folder has `-output` as a suffix.
+For example, with the folder name `my source Google Docs`, the name of the new folder
+is `my source Google Docs-output`.
+
+With Google Drive mounted on your host machine in step 2, you can now directly access
+this folder from the host machine, for example:
+
+```
+user@hostname:~/DriveFileStream/My Drive/my source Google Docs-output$ ls
+Copy_of_My_Google_Docs_To_Be_Converted.md
+```
+
+## 5. Update config.yaml to include the mounted directory
+
+Once you have your Google Drive mounted on the host machine, you can
+specify one of its folders as an input source directory for Docs Agent.
+
+Do the following:
+
+1. In the Docs Agent project, open the [`config.yaml`][config-yaml] file
+   with a text editor.
+
+1. Specify your mounted Google Drive folder as an `input` group, for example:
+
+   ```
+   input:
+   - path: "/home/user/DriveFileStream/My Drive/my source Google Docs-output"
+     url_prefix: "docs.google.com"
+   ```
+
+   You **must** specify a value for the `url_prefix` field, such as `docs.google.com`.
+   Currently this value is used to generate hashes for the content.
+
+1. (**Optional**) Add an additional Google Drive folder for your exported emails,
+   for example:
+
+   ```
+   input:
+   - path: "/home/user/DriveFileStream/My Drive/my source Google Docs-output"
+     url_prefix: "docs.google.com"
+   - path: "/home/user/DriveFileStream/My Drive/psa-output"
+     url_prefix: "mail.google.com"
+   ```
+
+1. Save the changes in the `config.yaml` file.
+
+You're all set with a new documentation source for Docs Agent. You can now follow the
+instructions in the project's main [`README`][main-readme] file to launch the Docs Agent app.
+
+<!-- Reference links -->
+
+[config-yaml]: ../config.yaml
+[main-readme]: ../README.md
+[google-drive-ocamlfuse]: https://github.com/astrada/google-drive-ocamlfuse
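
Two details from steps 4 and 5 of this new README lend themselves to a quick pre-flight check: the `-output` folder naming convention and the requirement that every `input` entry carries a `url_prefix`. The Python sketch below is one way to check both before launching Docs Agent. It is not part of the commit: the helper names `output_folder_name` and `check_input_entries` are hypothetical, and the entries are assumed to be the already-parsed `input` list from `config.yaml` (a list of dictionaries).

```python
from pathlib import Path


def output_folder_name(source_folder: str) -> str:
    """Name of the Drive folder the script writes to, per the `-output` convention."""
    return f"{source_folder}-output"


def check_input_entries(entries):
    """Return a list of problems found in parsed `input` entries.

    Hypothetical helper: each entry is a dict with `path` and `url_prefix` keys.
    """
    problems = []
    for index, entry in enumerate(entries):
        path = entry.get("path")
        if not path:
            problems.append(f"input[{index}]: missing path")
        elif not Path(path).is_dir():
            # The mounted Drive folder must exist on the host machine.
            problems.append(f"input[{index}]: not a directory: {path}")
        if not entry.get("url_prefix"):
            # The README states that url_prefix must be set.
            problems.append(f"input[{index}]: missing url_prefix")
    return problems


mount_root = "/home/user/DriveFileStream/My Drive"  # example mount point from the README
entries = [
    {
        "path": f"{mount_root}/{output_folder_name('my source Google Docs')}",
        "url_prefix": "docs.google.com",
    },
    {"path": f"{mount_root}/psa-output"},  # url_prefix deliberately omitted
]
for problem in check_input_entries(entries):
    print(problem)
```

Returning a list of problems, instead of raising on the first one, lets you fix every misconfigured entry in a single pass.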
