A Cloud Function that processes a file uploaded to Google Cloud Storage and summarizes its contents using the PaLM API on Vertex AI
| Author(s) | Romin Irani |
This application demonstrates a Cloud Function written in Python that is triggered when a file is uploaded to a specific Google Cloud Storage bucket configured as its trigger. It does the following:
- Reads the content of the file.
- Invokes the PaLM Text Bison model with a Prompt to summarize the contents.
- Writes the summarized data into another Google Cloud Storage (GCS) bucket.
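The three steps above can be sketched, independent of the cloud SDKs, as follows. This is only an illustrative sketch: `read_file`, `write_file`, and `summarize` are hypothetical stand-ins — in the real `main.py`, reads and writes go through the `google-cloud-storage` client and the summary comes from the Vertex AI Text Bison model.

```python
def summaries_bucket(source_bucket: str) -> str:
    # The destination bucket is the source bucket name plus a "-summaries" suffix.
    return f"{source_bucket}-summaries"


def build_prompt(contents: str) -> str:
    # Hypothetical prompt wording; main.py defines its own prompt text.
    return f"Summarize the following text:\n\n{contents}"


def handle_upload(bucket: str, name: str, read_file, write_file, summarize) -> str:
    contents = read_file(bucket, name)                   # 1. read the uploaded file
    summary = summarize(build_prompt(contents))          # 2. summarize via the model
    write_file(summaries_bucket(bucket), name, summary)  # 3. write to the other bucket
    return summary
```

Passing the I/O and model calls in as parameters keeps the flow testable without touching GCS or Vertex AI.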
NOTE: Before you move forward, ensure that you have followed the instructions in SETUP.md. Additionally, ensure that you have cloned this repository and are currently in the `summarization-gcs-cloudfunction` folder. This should be your active working directory for the rest of the commands.
Your Cloud Function requires access to two environment variables:
- `GCP_PROJECT`: This is the Google Cloud project ID.
- `GCP_REGION`: This is the region in which you are deploying your Cloud Function, for example `us-central1`.
These variables are needed because the Vertex AI initialization requires the Google Cloud project ID and the region. The specific line of code from main.py is shown here:
```python
vertexai.init(project=PROJECT_ID, location=LOCATION)
```
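For illustration, the two values can be collected from the environment as below. `vertex_init_kwargs` is a hypothetical helper name, not part of the sample; main.py reads the variables and calls `vertexai.init` directly.

```python
import os


def vertex_init_kwargs() -> dict:
    # Both variables must be present in the function's environment;
    # their values are passed to vertexai.init(project=..., location=...).
    return {
        "project": os.environ["GCP_PROJECT"],
        "location": os.environ["GCP_REGION"],
    }
```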
In Cloud Shell, execute the following commands:
```shell
export GCP_PROJECT='<Your GCP Project Id>' # Change this
export GCP_REGION='us-central1' # If you change this, make sure the region is supported by Model Garden. When in doubt, keep this.
```

These variables can be set in either of the following ways:
- At the time of deploying the Google Cloud Function. We will be using this method in the next section when we deploy the Cloud Function.
- Updating the environment variables after deploying the Google Cloud Function.
We will need to create two GCS buckets:

- The first bucket will be used to upload the files to summarize. Let us call this bucket `$BUCKET_NAME`. Create an environment variable to store your bucket name as shown below:

  ```shell
  export BUCKET_NAME='Your GCS Bucket Name'
  ```

- The second bucket will have the same name with a `-summaries` suffix.
You can create a bucket either from the Google Cloud Console or from the command line via the gsutil command. Execute the commands below in Cloud Shell.
```shell
gsutil mb -l $GCP_REGION gs://"$BUCKET_NAME"
gsutil mb -l $GCP_REGION gs://"$BUCKET_NAME"-summaries
```

Assuming that you have a copy of this project on your local machine with the gcloud SDK set up, follow these steps:
- Go to the root folder of this project.
- Ensure that both the `main.py` and `requirements.txt` files are present in this folder.
- Run the following command:
```shell
gcloud functions deploy summarizeArticles \
  --gen2 \
  --runtime=python311 \
  --source=. \
  --region=$GCP_REGION \
  --project=$GCP_PROJECT \
  --entry-point=summarize_gcs_object \
  --trigger-bucket=$BUCKET_NAME \
  --set-env-vars=GCP_PROJECT=$GCP_PROJECT,GCP_REGION=$GCP_REGION \
  --max-instances=1 \
  --quiet
```
Since this Cloud Function is deployed with a GCS trigger, you will need to do the following to see the entire flow in action:
- Ensure that you have created the following GCS buckets: `$BUCKET_NAME` and `$BUCKET_NAME-summaries`.
- Upload a file with some text (a sample file `story.md` has been provided) to the `$BUCKET_NAME` bucket.
- This should trigger the `summarizeArticles` function, and within a few seconds you should see a summarized `story.md` file created in the `$BUCKET_NAME-summaries` bucket.
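For a gen2 function with a GCS trigger, the event delivered to the entry point carries the uploaded object's bucket and name. A minimal sketch of extracting them is shown below; the dict is a simplified stand-in for the real CloudEvent payload, which contains many more fields.

```python
def object_info(event_data: dict) -> tuple:
    # A GCS object-finalized event's data includes, among other fields,
    # the bucket and the object name of the uploaded file.
    return event_data["bucket"], event_data["name"]


# Simplified example payload for an upload of story.md (bucket name is illustrative):
sample_event = {"bucket": "my-articles-bucket", "name": "story.md"}
```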