Commit aea8405

Create a shared module for deploying simple scheduled jobs (#1)
## Summary:
This module provides a complete, reusable Terraform solution for scheduled Google Cloud Functions. It encapsulates all the infrastructure complexity into a simple, configurable module that reduces scheduled function setup from >100 lines of copy-paste code to just 10-25 lines.

This module emerged from patterns identified while building scheduled functions across multiple repositories, where teams were repeatedly copying the same complex infrastructure configurations with slight variations.

## What This Module Provides

**Complete Infrastructure Setup:**
- Cloud Function (2nd gen) with configurable runtime and resources
- Cloud Scheduler with cron-based scheduling
- PubSub topic for reliable event triggering
- Service account with least-privilege permissions
- Storage bucket with automatic lifecycle management
- Secret Manager IAM bindings for secure access

**Developer-Friendly Features:**
- Automatic dependency installation (pip, custom scripts)
- Source code change detection for redeployment
- Multiple secrets support with version control
- Configurable resource limits (memory, timeout, instances)
- File exclusion patterns for clean deployments
- Comprehensive validation and error handling

## Usage

### Basic Example

```hcl
module "daily_backup" {
  source = "git::https://github.com/Khan/terraform-scheduled-function-module.git?ref=v1.0.0"

  function_name      = "daily-backup"
  project_id         = "my-gcp-project"
  secrets_project_id = "my-secrets-project"
  source_dir         = "./functions/backup"
  main_file          = "backup.py"
  schedule           = "0 2 * * *" # 2 AM daily
  description        = "Daily database backup"

  secrets = [
    {
      env_var_name = "DATABASE_URL"
      secret_id    = "postgres-connection"
      version      = "latest"
    }
  ]
}
```

### Advanced Configuration

```hcl
module "data_processor" {
  source = "git::https://github.com/Khan/terraform-scheduled-function-module.git?ref=v1.0.0"

  function_name      = "data-processor"
  project_id         = var.project_id
  secrets_project_id = var.secrets_project_id
  source_dir         = "./functions/processor"
  main_file          = "processor.py"
  schedule           = "0 */6 * * *" # Every 6 hours

  # Resource configuration
  memory             = "4096M"
  timeout_seconds    = 300
  max_instance_count = 3

  # Multiple secrets
  secrets = [
    {
      env_var_name = "DATABASE_URL"
      secret_id    = "postgres-connection"
      version      = "latest"
    },
    {
      env_var_name = "API_KEY"
      secret_id    = "external-api-key"
      version      = "2"
    }
  ]

  # Custom dependency installation
  dependency_install_script = "pip install -r requirements.txt -t . && pip install tensorflow -t ."
}
```

## Problem Solved

**Before this module**, creating a scheduled function required:
- 145+ lines of Terraform code
- 8 individual resources to configure manually
- Complex dependency management
- Repetitive IAM and security setup
- Error-prone copy-paste between projects

**With this module**:
- 10-25 lines of declarative configuration
- Single module call with clear parameters
- Automatic best practices and validation
- Consistent patterns across all functions
- Cross-repository sharing with version control

## Cross-Repository Usage

This module is designed for sharing across multiple repositories:

```hcl
# Pin to specific version for production
module "my_function" {
  source = "git::https://github.com/Khan/terraform-scheduled-function-module.git?ref=v1.0.0"
  # ... configuration
}

# Use latest for development
module "test_function" {
  source = "git::https://github.com/Khan/terraform-scheduled-function-module.git"
  # ... configuration
}
```

## Key Benefits

- **Dramatic code reduction**: 85%+ less infrastructure code per function
- **Consistent patterns**: Same approach across all repositories
- **Built-in best practices**: Security, IAM, lifecycle management
- **Version control**: Semantic versioning with Git tags
- **Easy maintenance**: Update once, benefit everywhere
- **Faster development**: New functions in minutes, not hours

## Repository Structure

```
terraform-scheduled-function-module/
├── README.md                 # Complete documentation
├── main.tf                   # Core infrastructure resources
├── variables.tf              # Configurable parameters
├── outputs.tf                # Module outputs
└── examples/
    └── simple-function/      # Working example
        ├── main.tf
        ├── variables.tf
        └── function-code/
            ├── health_check.py
            └── requirements.txt
```

## Function Code Structure

The module expects your function code to follow this pattern:

```python
import functions_framework

@functions_framework.cloud_event
def main(cloud_event):
    """Function entry point"""
    print("Task running!")
    return "Success"
```

This module establishes a foundation for rapid, consistent scheduled function development across all Khan Academy repositories.

Issue: INFRA-10715

## Test plan:
- [x] [deploy culture cron using this module](https://github.com/Khan/culture-cron/actions/runs/16757981631/job/47445557836?pr=5)

Author: jwbron

Reviewers: csilvers, jwbron

Required Reviewers:

Approved By: csilvers

Checks: ✅ 1 check was successful

Pull Request URL: #1
1 parent 321aea8 commit aea8405

8 files changed: +705 -0 lines changed
Lines changed: 211 additions & 0 deletions
@@ -0,0 +1,211 @@
# terraform-scheduled-function-module

A reusable Terraform module for scheduled Google Cloud Functions.

## Features

Creates a complete scheduled function setup:
- Cloud Function (2nd gen) with configurable runtime
- Cloud Scheduler with cron-based scheduling
- PubSub topic for reliable triggering
- Service account with least-privilege permissions
- Storage bucket with lifecycle management
- Secret Manager IAM bindings
- Source code change detection

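The source code change detection listed among the features can be pictured as a content fingerprint of the function directory: if the fingerprint differs from the last deployment, the function needs redeploying. The module's own mechanism lives in its `main.tf`, which is not shown here, so the Python below is only a sketch of the idea and not the module's implementation. The names `source_fingerprint` and `exclude_patterns` are hypothetical.

```python
import fnmatch
import hashlib
from pathlib import Path


def source_fingerprint(source_dir, exclude_patterns=()):
    """Return a SHA-256 hex digest covering every non-excluded file in source_dir.

    Hypothetical helper for illustration; it is not part of the module.
    """
    root = Path(source_dir)
    digest = hashlib.sha256()
    # Sort paths so the digest does not depend on filesystem listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root).as_posix()
        if any(fnmatch.fnmatch(relative, pattern) for pattern in exclude_patterns):
            continue
        # Hash the relative path along with the bytes so renames count as changes.
        digest.update(relative.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Comparing the digest from the previous deployment with `source_fingerprint("./functions/my-task", ["*.pyc", "__pycache__/*"])` would show whether anything other than the excluded files has changed.
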
## Quick Start

```hcl
module "my_daily_task" {
  source = "git::https://github.com/Khan/terraform-scheduled-function-module.git?ref=v1.0.0"

  function_name      = "my-daily-task"
  project_id         = "my-gcp-project"
  secrets_project_id = "my-secrets-gcp-project"
  source_dir         = "./functions/my-task"
  main_file          = "main.py"
  schedule           = "0 9 * * 1-5" # 9 AM weekdays
  description        = "My daily automated task"

  environment_variables = {
    ENV = "production"
  }

  secrets = [
    {
      env_var_name = "API_TOKEN"
      secret_id    = "my-api-token"
      version      = "latest"
    }
  ]
}
```

## Examples

Complete working examples are available in the [`examples/`](./examples/) directory:

- **[`simple-function/`](./examples/simple-function/)** - Basic scheduled function with minimal configuration

Each example includes:
- Complete Terraform configuration
- Sample function code with `requirements.txt`
- Documentation on how to deploy and test

## Cross-Repository Usage

### Use Everywhere
```hcl
# Production: Pin to specific version
module "backup" {
  source = "git::https://github.com/YourOrg/terraform-scheduled-function-module.git?ref=v1.0.0"
  # ... config
}

# Development: Use latest
module "test_function" {
  source = "git::https://github.com/YourOrg/terraform-scheduled-function-module.git"
  # ... config
}
```

## Usage Examples

### Multiple Functions
```hcl
module "daily_backup" {
  source = "git::https://github.com/YourOrg/terraform-scheduled-function-module.git?ref=v1.0.0"

  function_name = "daily-backup"
  schedule      = "0 2 * * *" # 2 AM daily
  source_dir    = "./functions/backup"
  main_file     = "backup.py"
  # ... other config
}

module "weekly_reports" {
  source = "git::https://github.com/YourOrg/terraform-scheduled-function-module.git?ref=v1.0.0"

  function_name = "weekly-reports"
  schedule      = "0 9 * * 1" # Monday 9 AM
  source_dir    = "./functions/reports"
  main_file     = "reports.py"
  memory        = "4096M"
  # ... other config
}
```

### Advanced Configuration
```hcl
module "data_processor" {
  source = "git::https://github.com/YourOrg/terraform-scheduled-function-module.git?ref=v1.0.0"

  function_name      = "data-processor"
  project_id         = var.project_id
  secrets_project_id = var.secrets_project_id
  source_dir         = "./functions/processor"
  main_file          = "processor.py"
  schedule           = "0 */6 * * *" # Every 6 hours

  # Resource configuration
  memory             = "4096M"
  timeout_seconds    = 300
  max_instance_count = 3

  # Multiple secrets
  secrets = [
    {
      env_var_name = "DATABASE_URL"
      secret_id    = "postgres-connection"
      version      = "latest"
    },
    {
      env_var_name = "API_KEY"
      secret_id    = "external-api-key"
      version      = "2"
    }
  ]

}
```

## Requirements & Inputs

### Required
- `function_name` - Unique name for your function
- `project_id` - GCP project for resources
- `secrets_project_id` - GCP project containing secrets
- `source_dir` - Path to function code
- `main_file` - Python file name (e.g., "main.py"), relative to `source_dir`.
- `schedule` - Cron expression (e.g., "0 9 * * 1-5")
- `description` - Function description

### Optional (with defaults)
- `region` - GCP region ("us-central1")
- `runtime` - Function runtime ("python311")
- `memory` - Memory allocation ("2048M")
- `timeout_seconds` - Timeout (60)
- `environment_variables` - Environment vars ({})
- `secrets` - Secret Manager secrets ([])

## Outputs

- `function_name` - Name of deployed function
- `function_url` - Function URL
- `service_account_email` - Function service account
- `scheduler_job_name` - Scheduler job name

## Repository Structure

```
your-app-repo/
├── terraform/
│   └── main.tf              # Uses module
├── functions/
│   ├── daily-backup/
│   │   ├── main.py
│   │   └── requirements.txt
│   └── reports/
│       ├── main.py
│       └── requirements.txt
└── README.md
```

## Function Code Structure

### Required Files

Your source directory must contain:

```
functions/my-task/
├── main.py              # Entry point function
└── requirements.txt     # Python dependencies (if needed)
```

### Python Function Code

```python
# functions/my-task/main.py
import functions_framework

@functions_framework.cloud_event
def main(cloud_event):
    """Function entry point"""
    print("Task running!")
    return "Success"
```

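Because the function is triggered through a Pub/Sub topic, the `cloud_event` it receives wraps a Pub/Sub message whose body is base64-encoded under `data["message"]["data"]`. The helper below is a minimal sketch of reading that body. Whether the scheduler job created by this module sends a meaningful payload is an assumption, and `decode_pubsub_payload` is a hypothetical name, not something the module provides.

```python
import base64


def decode_pubsub_payload(event_data):
    """Return the UTF-8 text carried by a Pub/Sub-triggered CloudEvent.

    event_data is the value of cloud_event.data, i.e. a dict shaped like
    {"message": {"data": "<base64 string>", ...}, ...}.
    Returns an empty string when no message body is present.
    """
    message = (event_data or {}).get("message", {})
    encoded = message.get("data", "")
    if not encoded:
        return ""
    return base64.b64decode(encoded).decode("utf-8")
```

Inside `main`, this would be called as `payload = decode_pubsub_payload(cloud_event.data)`. Keeping the decoding in a plain function means it can be unit-tested without the Functions Framework installed.
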
### Dependencies

If your function uses external packages, include a `requirements.txt` file. Cloud Functions automatically installs dependencies from `requirements.txt` during deployment. No local installation is needed: the module simply packages your source code and lets Cloud Functions handle dependency management.

## Common Cron Patterns

| Schedule | Description |
|----------|-------------|
| `"0 9 * * 1-5"` | 9 AM weekdays |
| `"0 */6 * * *"` | Every 6 hours |
| `"0 2 * * *"` | 2 AM daily |
| `"0 9 * * 1"` | Monday 9 AM |
| `"*/15 * * * *"` | Every 15 minutes |
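The patterns in this table use the five-field unix-cron layout: minute, hour, day of month, month, day of week (with 0 meaning Sunday). As a reading aid, the sketch below expands each field into the concrete values it matches. It handles only `*`, single numbers, `a-b` ranges, comma lists, and `/step` applied to `*` or a range; it is not a full cron parser, and `expand_cron_field` and `expand_schedule` are hypothetical names.

```python
# (field name, lowest value, highest value) for each of the five cron fields.
FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),  # 0 is Sunday
]


def expand_cron_field(field, low, high):
    """Expand one cron field such as "*/15", "1-5", or "0" into a sorted list."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return sorted(values)


def expand_schedule(schedule):
    """Map each field name to the values matched by a five-field cron string."""
    fields = schedule.split()
    if len(fields) != len(FIELD_RANGES):
        raise ValueError("expected five space-separated cron fields")
    return {
        name: expand_cron_field(field, low, high)
        for field, (name, low, high) in zip(fields, FIELD_RANGES)
    }
```

For example, `expand_schedule("0 9 * * 1-5")` yields minute `[0]`, hour `[9]`, and day of week `[1, 2, 3, 4, 5]`, which is Monday through Friday at 9 AM.
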
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
"""
Simple health check function example for the scheduled function module.
This function demonstrates how to structure your code for the module.
"""

import os
import json
import logging
import functions_framework
from google.cloud import logging as cloud_logging

# Set up logging
cloud_logging.Client().setup_logging()
logger = logging.getLogger(__name__)

@functions_framework.cloud_event
def main(cloud_event):
    """Main function entry point for the scheduled health check.

    Args:
        cloud_event: Cloud Functions event object

    Returns:
        dict: Status information
    """
    logger.info("Health check function started")

    try:
        # Get environment variables
        env = os.environ.get('ENV', 'unknown')
        log_level = os.environ.get('LOG_LEVEL', 'INFO')
        api_token = os.environ.get('API_TOKEN', 'not-configured')

        # Log configuration (don't log secrets!)
        logger.info(f"Environment: {env}")
        logger.info(f"Log level: {log_level}")
        logger.info(f"API token configured: {'Yes' if api_token != 'not-configured' else 'No'}")

        # Perform health checks
        checks = {
            'environment_configured': env != 'unknown',
            'secrets_available': api_token != 'not-configured',
            'function_responsive': True,
        }

        # Determine overall health
        all_healthy = all(checks.values())

        result = {
            'status': 'healthy' if all_healthy else 'unhealthy',
            'timestamp': cloud_event['time'] if cloud_event else 'unknown',
            'checks': checks,
            'environment': env
        }

        if all_healthy:
            logger.info("All health checks passed")
        else:
            logger.warning(f"Some health checks failed: {checks}")

        return result

    except Exception as e:
        logger.error(f"Health check failed with error: {str(e)}")
        return {
            'status': 'error',
            'error': str(e),
            'timestamp': cloud_event['time'] if cloud_event else 'unknown'
        }

if __name__ == "__main__":
    # For local testing
    print("Testing health check function locally...")
    result = main(None)
    print(json.dumps(result, indent=2))
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
functions-framework>=3.0.0
google-cloud-logging>=3.0.0
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
# Simple example of using the scheduled function module
# This shows the minimum configuration needed

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 6.0.0"
    }
  }
  required_version = ">= 1.3.0"
}

provider "google" {
  project = var.project_id
  region  = var.region
}

provider "google" {
  alias   = "secrets"
  project = var.secrets_project_id
  region  = var.region
}

# Simple daily function example
module "daily_health_check" {
  # When used from another repository, this would be:
  # source = "git::https://github.com/Khan/terraform-scheduled-function-module.git?ref=v1.0.0"
  source = "../.."

  function_name      = "daily-health-check"
  project_id         = var.project_id
  secrets_project_id = var.secrets_project_id
  source_dir         = "./function-code"
  main_file          = "health_check.py"
  schedule           = "0 9 * * *" # 9 AM daily
  description        = "Daily health check function"

  environment_variables = {
    ENV       = "example"
    LOG_LEVEL = "INFO"
  }

  secrets = [
    {
      env_var_name = "API_TOKEN"
      secret_id    = "health-check-api-token"
      version      = "latest"
    }
  ]
}

# Output the function details
output "function_info" {
  description = "Information about the deployed function"
  value = {
    function_name         = module.daily_health_check.function_name
    function_url          = module.daily_health_check.function_url
    service_account_email = module.daily_health_check.service_account_email
    scheduler_job_name    = module.daily_health_check.scheduler_job_name
  }
}
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
# Variables for the simple function example

variable "project_id" {
  description = "The GCP project ID where resources will be created"
  type        = string
}

variable "region" {
  description = "The GCP region where resources will be created"
  type        = string
  default     = "us-central1"
}

variable "secrets_project_id" {
  description = "The GCP project ID where secrets are stored"
  type        = string
}
