This project implements a serverless data pipeline on Google Cloud Platform (GCP) using Terraform-based Infrastructure as Code. The system ingests CSV datasets uploaded to Cloud Storage, triggers event-driven processing, applies lightweight transformations, and loads structured results into BigQuery for downstream analytics.
Designed as a portfolio project, it emphasizes practical cloud architecture concerns including cost efficiency, least-privilege IAM, scalability, and operational simplicity.
- ☁️ Serverless architecture
- Cloud Storage triggers serverless compute to process CSV uploads
- Transformed data is stored in BigQuery for analysis
- 🔒 Security by design
- Least privilege IAM roles
- Service account isolation
- Secure GCS buckets
- 🏗️ Infrastructure as Code
- Modular Terraform configuration
- Remote backend with GCS for state management
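The remote-state setup above is typically declared as a standard GCS backend block; a minimal sketch, where the bucket name and prefix are placeholders rather than this repo's actual values:

```hcl
# backend.tf — keep Terraform state in a GCS bucket (names are placeholders)
terraform {
  backend "gcs" {
    bucket = "example-tf-state-bucket"
    prefix = "serverless-pipeline"
  }
}
```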
The following enhancements are intentionally deferred to keep the initial design focused on core data flow and infrastructure boundaries.
- 🚀 Deployment automation
- Add Cloud Build triggers for automated infrastructure and service deployments
- 📊 Observability
- Integrate Cloud Logging and Cloud Monitoring dashboards
- Add uptime checks for critical endpoints
- Google Cloud Platform (GCP)
- Cloud Storage
- Cloud Run Functions (Python)
- BigQuery
- IAM
- Terraform (modular setup with GCS remote backend)
- Python for serverless data transformation
```
[Client Uploads CSV] → [Cloud Storage Bucket] → [Cloud Function (ETL)]
                                                        ↓
                       [Processed Data in BigQuery] → [Downstream Analytics / BI Tools]
```
- A Cloud Storage bucket acts as the ingestion point for CSV uploads.
- A Cloud Run function is triggered on object creation:
  - Validates and transforms the data.
  - Loads the processed data into BigQuery.
- The pipeline is designed to be stateless and event-driven, with IAM-scoped service accounts and no long-lived credentials.
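The validate-and-transform step can be sketched in plain Python. This is an illustrative example only: the column names `id` and `amount` are hypothetical, not the project's actual schema.

```python
import csv
import io

def transform_csv(raw: str) -> list[dict]:
    """Validate and normalize CSV rows before loading into BigQuery.

    Drops rows missing required fields and coerces types so the load
    job receives clean, consistently typed records.
    (Field names are placeholders for illustration.)
    """
    reader = csv.DictReader(io.StringIO(raw))
    rows = []
    for row in reader:
        # Skip rows missing required fields
        if not row.get("id") or not row.get("amount"):
            continue
        rows.append({
            "id": row["id"].strip(),
            "amount": float(row["amount"]),
        })
    return rows
```

Keeping the transform a pure function of its input preserves the stateless, event-driven design: each invocation handles exactly one uploaded object, with no shared state between invocations.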
- Terraform v1.5+
- gcloud CLI
- Python 3.10+
- GCP project with billing enabled
```shell
git clone https://github.com/moondial-pal/gcp-serverless-data-pipeline.git
cd gcp-serverless-data-pipeline
```

Initialize Terraform

```shell
terraform init
```

Deploy infrastructure

```shell
terraform apply
```

Deploy the Cloud Function

```shell
gcloud functions deploy process_csv \
  --runtime python310 \
  --trigger-bucket <your-ingestion-bucket> \
  --entry-point main \
  --source cloud_run_function/
```
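Once deployed, the pipeline can be smoke-tested by uploading a sample file and checking BigQuery. A sketch, where the dataset and table names are placeholders:

```shell
# Upload a sample CSV; object creation triggers the function
gsutil cp sample.csv gs://<your-ingestion-bucket>/

# Confirm rows arrived (dataset.table below is a placeholder)
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS row_count FROM `your_dataset.processed_data`'
```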