Uploading documents to a Google Cloud RAG Engine Corpus
First, while signed in to the Google Cloud Console as a user in the tenantfirstaid.com organization, open the Google Cloud Storage Buckets page. For TFA, the bucket is called "tenantfirstaid".
Find the bucket you want to upload to and click to access it. Then, click the "Upload" dropdown and select "Upload Files" to upload new documents.
Next, open the RAG Engine page and click the corpus you want to add to (for TFA, called "tenantfirstaid").
Click the "Import Data" button and choose the "Import from Google Cloud Storage" option to add the file you just uploaded. That's it!
The current system relies on a structured JSONL file, metadata.jsonl, to attach metadata such as "city" and "state" to our documents (see example below). Each line is one JSON object describing a single document. It follows the format documented at: https://docs.cloud.google.com/generative-ai-app-builder/docs/prepare-data#storage-unstructured
{"id":"ORS090","structData":{"city":"null","state":"or"},"content":{"mimeType":"text/plain","uri":"gs://<source_bucket>/ORS090.txt"}}
{"id":"ORS105","structData":{"city":"null","state":"or"},"content":{"mimeType":"text/plain","uri":"gs://<source_bucket>/ORS105.txt"}}
{"id":"PCC30_01","structData":{"city":"portland","state":"or"},"content":{"mimeType":"text/plain","uri":"gs://<source_bucket>/PCC30.01.txt"}}