
Commit b6446bf

[docs-agent] Release of Docs Agent v.0.3.3
What's changed:
- Added a new post-processing feature to delete existing chunks in databases if they are no longer found in source dataset.
- Updated the pre-processing module to make sure text chunks are not bigger than 5 KB for generating embeddings.
- Added a feature to display the distribution of chunk sizes after running `agent chunk`.
- Added more `agent` commandlines and options.
- Updated the "Rewrite" button to the "Feedback" button by default.
- Bug fixes.
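The 5 KB limit in the release notes can be pictured with a short sketch. The pre-processing module itself is not part of this excerpt, so everything below is an assumption made for illustration: the helper name `splitBySize`, reading 5 KB as 5 × 1024 bytes of UTF-8, and splitting on whitespace. It is written in JavaScript, although the project sits under a `python/` path.

```javascript
// Assumed limit: 5 KB interpreted as 5 * 1024 bytes of UTF-8 text.
const MAX_CHUNK_BYTES = 5 * 1024;

// Byte length of a string when encoded as UTF-8.
function byteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Hypothetical helper: greedily packs whitespace-separated words into
// chunks whose UTF-8 size stays at or under maxBytes.
function splitBySize(text, maxBytes = MAX_CHUNK_BYTES) {
  const chunks = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? current + " " + word : word;
    if (byteLength(candidate) <= maxBytes) {
      current = candidate;
    } else {
      if (current) chunks.push(current);
      // A single word larger than the limit is kept whole in this sketch.
      current = word;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Roughly 12 KB of sample text, split into size-capped chunks.
const sampleText = "lorem ".repeat(2000);
const sizedChunks = splitBySize(sampleText);
```

A real pipeline would more likely split on headings or sentences first and fall back to a size cap only when a section is too large.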
1 parent f16339b commit b6446bf


30 files changed (+1534, -317 lines)


examples/gemini/python/docs-agent/README.md

Lines changed: 8 additions & 5 deletions
@@ -172,7 +172,13 @@ Authorize Google Cloud credentials on your host machine:

 2. Copy the `client_secret.json` file to your host machine.

-3. To authenticate credentials, run the following command in the directory of
+3. Install the Google Cloud SDK on your host machine:
+
+```
+sudo apt install google-cloud-sdk
+```
+
+4. To authenticate credentials, run the following command in the directory of
 the host machine where the `client_secret.json` file is located:

 ```
@@ -181,10 +187,7 @@ Authorize Google Cloud credentials on your host machine:

 This command opens a browser and asks to log in using your Google account.

-**Note**: If the `gcloud` command doesn’t exist, install the Google Cloud SDK
-on your host machine: `sudo apt install google-cloud-sdk`
-
-4. Follow the instructions on the browser and click **Allow** to authenticate.
+5. Follow the instructions on the browser and click **Allow** to authenticate.

 This saves the authenticated credentials for Docs Agent
 (`application_default_credentials.json`) in the `$HOME/.config/gcloud/`

examples/gemini/python/docs-agent/apps_script/README.md

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@ Do the following:
 1. Repeat the steps until all the `.gs` files are copied to your Apps Script project.
 1. Click the **+** icon next to **Services**.
 1. Scroll down and click **Drive API**.
+1. Select **v2**.
 1. Click **Add**.

 You are now ready to edit the parameters on the `main.gs` file to select a folder

examples/gemini/python/docs-agent/apps_script/drive_to_markdown.gs

Lines changed: 59 additions & 19 deletions
@@ -13,25 +13,55 @@
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
+function convertDriveFolderToMDForDocsAgent(folderName, outputFolderName=""){
+gdoc_count = 0;
+pdf_count = 0;
+new_file_count = 0;
+updated_file_count = 0;
+unchanged_file_count = 0;
+gdoc_count, pdf_count, new_file_count, updated_file_count, unchanged_file_count = convertDriveFolder(folderName, outputFolderName=outputFolderName)
+let conversion_count = pdf_count + gdoc_count
+let file_count = new_file_count + updated_file_count + unchanged_file_count
+Logger.log("Converted a total of: " + gdoc_count + " Google Doc files.");
+Logger.log("Converted a total of: " + pdf_count + " PDF files.");
+Logger.log("Converted a grand total of: " + conversion_count + " files.");
+Logger.log("New files: " + new_file_count)
+Logger.log("Updated a total of: " + updated_file_count + " files.")
+Logger.log("Files that haven't changed: " + unchanged_file_count);
+Logger.log("Input directory had a total of: " + file_count + " files.")
+}
+
+function convertDriveFolder(folderName, outputFolderName="", indexFile="") {

-function convertDriveFolderToMDForDocsAgent(folderName) {
 //Checks if input folder exists or exits
 if(folderExistsInput(folderName)){
 var file_count = 0;
 var folders = DriveApp.getFoldersByName(folderName);
-Logger.log("Output directory: "+ folderName + "-output");
-var folderOutput = folderName + "-output";
-var output_file_name = folderName + "-index";
+if (outputFolderName=="") {
+var folderOutput = folderName + "-output";
+var output_file_name = folderName + "-index";
+}
+else {
+var folderOutput = outputFolderName + "-output";
+var output_file_name = outputFolderName + "-index";
+}
+Logger.log("Output directory: "+ folderOutput);
 folderExistsOrCreate(folderOutput);
 var folderOutputObj = DriveApp.getFoldersByName(folderOutput);
 if (folderOutputObj.hasNext()){
 var folderOutputName = folderOutputObj.next();
 }
-var sheet = checkIndexOutputOrCreate(output_file_name, folderOutputName);
-var timeZone = Session.getScriptTimeZone();
-var date = Utilities.formatDate(new Date(), timeZone, "MM-dd-yyyy HH:mm:ss z");
-sheet.appendRow(["Created: ", date])
-sheet.appendRow(["Name","ID", "URL", "Markdown ID", "Markdown Output", "Date Created", "Last Updated", "Type", "Folder", "MD5 hash", "Status"]);
+if (indexFile=="") {
+var sheet = checkIndexOutputOrCreate(output_file_name, folderOutputName);
+var timeZone = Session.getScriptTimeZone();
+var date = Utilities.formatDate(new Date(), timeZone, "MM-dd-yyyy HH:mm:ss z");
+sheet.appendRow(["Created: ", date])
+sheet.appendRow(["Name","ID", "URL", "Markdown ID", "Markdown Output", "Date Created", "Last Updated", "Type", "Folder", "MD5 hash", "Status"]);
+}
+else {
+var sheet = indexFile
+}
+// var sheet_id = sheet.getId();
 var foldersnext = folders.next();
 var myfiles = foldersnext.getFiles();
 var new_file_count = 0;
@@ -54,6 +84,22 @@ function convertDriveFolderToMDForDocsAgent(folderName) {
 else{
 var fid = myfile.getId();
 }
+if (ftype == "application/vnd.google-apps.folder") {
+var folder = DriveApp.getFolderById(fid);
+Logger.log("Sub-directory: " + folder);
+sub_gdoc_count = 0;
+sub_pdf_count = 0;
+sub_new_file_count = 0;
+sub_updated_file_count = 0;
+sub_unchanged_file_count = 0;
+sub_gdoc_count, sub_pdf_count, sub_new_file_count, sub_updated_file_count, sub_unchanged_file_count = convertDriveFolder(folder, outputFolderName=foldersnext, indexFile=sheet);
+gdoc_count += sub_gdoc_count;
+pdf_count += sub_pdf_count;
+new_file_count += sub_new_file_count;
+updated_file_count += sub_updated_file_count;
+unchanged_file_count += sub_unchanged_file_count;
+continue;
+}
 var fname = sanitizeFileName(myfile.getName());
 var fdate = myfile.getLastUpdated();
 var furl = myfile.getUrl();
@@ -158,7 +204,7 @@ function convertDriveFolderToMDForDocsAgent(folderName) {
 var saved_file_id = saved_file.getId();
 Logger.log("Finished converting file: "+ fname + " to markdown.");
 Logger.log("Markdown file: " + saved_file);
-Logger.log("Clearing temporary gdoc: " );
+Logger.log("Clearing temporary gdoc" );
 let output_file = DriveApp.getFileById(output_id);
 output_file.setTrashed(true);
 status = "New content";
@@ -182,19 +228,13 @@ function convertDriveFolderToMDForDocsAgent(folderName) {
 hash_str,
 status,
 ];
-row_number = file_count + start_data_row;
 sheet.appendRow(metadata);
+// Return final row to insertRichText into correct rows
+row_number = sheet.getLastRow();
 insertRichText(sheet, original_chip, "C", row_number);
 insertRichText(sheet, md_chip, "E", row_number);
 insertRichText(sheet, folder_chip, "I", row_number);
 }
 }
-let conversion_count = pdf_count + gdoc_count
-Logger.log("Converted a total of: " + gdoc_count + " Google Doc files.");
-Logger.log("Converted a total of: " + pdf_count + " PDF files.");
-Logger.log("Converted a grand total of: " + conversion_count + " files.");
-Logger.log("New files: " + new_file_count)
-Logger.log("Updated a total of: " + updated_file_count + " files.")
-Logger.log("Files that haven't changed: " + unchanged_file_count);
-Logger.log("Input directory had a total of: " + file_count + " files.")
+return gdoc_count, pdf_count, new_file_count, updated_file_count, unchanged_file_count
 }
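
A note on the multi-value pattern in `drive_to_markdown.gs`: in JavaScript, which Apps Script runs, `return a, b, c` evaluates the comma operator and returns only `c`, and `a, b, c = f()` assigns only to `c`. Similarly, `f(x, name=value)` is an assignment expression passed by position, not a keyword argument. The calls in this diff therefore appear to carry only the last counter between functions. The sketch below uses hypothetical function names that are not part of the commit. It shows that behavior and one possible alternative, returning an object and destructuring it, and is not the project's implementation.

```javascript
// Hypothetical stand-in for a function that "returns" several counters
// with the comma operator, in the style used by the diff.
function countsWithComma() {
  const gdocs = 2;
  const pdfs = 3;
  const unchanged = 5;
  // The comma operator evaluates each operand and yields only the last one.
  return gdocs, pdfs, unchanged;
}

// One possible alternative: return a single object holding every counter.
function countsAsObject() {
  return { gdocCount: 2, pdfCount: 3, unchangedCount: 5 };
}

// Hypothetical helper for adding a sub-folder's counters to a running total.
function addCounts(total, sub) {
  const merged = { ...total };
  for (const key of Object.keys(sub)) {
    merged[key] = (merged[key] || 0) + sub[key];
  }
  return merged;
}

// Only the last operand survives the comma-based return.
const commaResult = countsWithComma();

// Object destructuring recovers each counter by name.
const { gdocCount, pdfCount, unchangedCount } = countsAsObject();

// Aggregating a parent folder with one sub-folder.
const mergedCounts = addCounts(
  countsAsObject(),
  { gdocCount: 1, pdfCount: 0, unchangedCount: 4 }
);
```

Returning an object keeps each counter tied to its name, so recursive calls over sub-folders can be summed field by field.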

examples/gemini/python/docs-agent/apps_script/helper_functions.gs

Lines changed: 9 additions & 3 deletions
@@ -67,18 +67,24 @@ function checkFileExists(fileName,folderName){

 // Function to check if an index output sheet exists or creates it. Returns the file object
 // Specify the file output name and outputdirectory
-function checkIndexOutputOrCreate(fileName, folderOutput) {
+function checkIndexOutputOrCreate(fileName, folderOutput, indexFileID="") {
 var timeZone = Session.getScriptTimeZone();
 var date = Utilities.formatDate(new Date(), timeZone, "MM-dd-yyyy hh:mm:ss");
 let file = {title: fileName, mimeType: MimeType.GOOGLE_SHEETS, parents: [{id: folderOutput.getId()}]}
 let params = "title='" + fileName + "' and parents in '" + folderOutput.getId() + "'";
 let file_search = DriveApp.searchFiles(params);
 if (file_search.hasNext()) {
-let fileId = file_search.next().getId();
+if (indexFileID=="") {
+var fileId = file_search.next().getId();
+}
+else {
+var fileId = indexFileID;
+}
 var sheet = SpreadsheetApp.openById(fileId);
 Logger.log("File index: " + fileName + " exists.");
 var sheet_index = sheet.getSheetByName("Index");
-if (sheet.getSheetByName("Backup")){
+// Checks to see if this is a sub directory
+if (sheet.getSheetByName("Backup")) {
 var sheet_backup = sheet.getSheetByName("Backup");
 sheet.deleteSheet(sheet_backup);
 }

examples/gemini/python/docs-agent/docs/cli-reference.md

Lines changed: 52 additions & 0 deletions
@@ -26,6 +26,15 @@ by running the `agent chunk` command):
 agent populate
 ```

+### Populate a vector database and delete stale text chunks
+
+The command below deletes stale entries in the existing vector database
+before populating it with the new text chunks:
+
+```sh
+agent populate --enable_delete_chunks
+```
+
 ### Show the Docs Agent configuration

 The command below prints all the fields and values in the current
@@ -35,6 +44,15 @@ The command below prints all the fields and values in the current
 agent show-config
 ```

+### Clean up the Docs Agent development environment
+
+The command below deletes development databases specified in the
+`config.yaml` file:
+
+```sh
+agent cleanup-dev
+```
+
 ## Docs Agent chatbot web app

 ### Launch the Docs Agent web app
@@ -53,6 +71,24 @@ The command below launches the Docs Agent web app to run on port 5005:
 agent chatbot --port 5005
 ```

+### Launch the Docs Agent web app as a widget
+
+The command below launches the Docs Agent web app to use
+a widget-friendly template:
+
+```sh
+agent chatbot --app_mode widget
+```
+
+### Launch the Docs Agent web app with a log view enabled
+
+The command below launches the Docs Agent web app while enabling
+a log view page (which is accessible at `<APP_URL>/logs`):
+
+```sh
+agent chatbot --enable_show_logs
+```
+
 ## Docs Agent benchmark test

 ### Run the Docs Agent benchmark test
@@ -106,6 +142,22 @@ You may also specify multiple products, for example:
 agent tellme which modules are available? --product=Flutter --product=Angular --product=Android
 ```

+### Ask for advice
+
+The command below reads a request and a filename from the arguments,
+asks the Gemini model, and prints its response:
+
+```sh
+agent helpme <REQUEST> --file <PATH_TO_FILE>
+```
+
+Replace `REQUEST` with a prompt and `PATH_TO_FILE` with a file's
+absolute or relative path, for example:
+
+```sh
+agent helpme write comments for this C++ file? --file ../my-project/test.cc
+```
+
 ## Online corpus management

 ### List all existing online corpora
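
The stale-chunk cleanup behind `agent populate --enable_delete_chunks` is described in the CLI reference but its code is not shown in this excerpt. As a rough picture of the idea, the sketch below computes which stored chunk IDs no longer appear in the source dataset. The helper name `findStaleChunkIds` and the use of plain string IDs are assumptions for illustration, and the sketch does not reflect how Docs Agent identifies or deletes chunks.

```javascript
// Hypothetical helper: returns the IDs that exist in the database
// but are absent from the current source dataset.
function findStaleChunkIds(databaseIds, sourceIds) {
  const live = new Set(sourceIds);
  return databaseIds.filter((id) => !live.has(id));
}

// Chunks "a" and "c" are stored but no longer produced by the source.
const staleIds = findStaleChunkIds(["a", "b", "c", "d"], ["b", "d", "e"]);
```

Under these assumptions, the IDs in `staleIds` would be removed before the new chunks are added.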
