Commit 03ac9c9

Add database backup functionality with GCS integration (#297)
Implements automated database backup functionality with Google Cloud Storage integration, including retention policies and MinIO support for local development.

## Motivation and Context

This change adds critical database backup capabilities to ensure data durability and disaster recovery. The solution provides automated backups with configurable retention policies and supports both production (GCS) and development (MinIO) environments.

## How Has This Been Tested?

- Tested backup creation and restoration with MinIO in local development
- Verified GCS bucket lifecycle policies for automatic deletion after 60 days
- Tested backup retention and cleanup logic

## Breaking Changes

No breaking changes. This is an additive feature that doesn't affect existing functionality.

## Types of changes

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [x] Documentation update

## Checklist

- [x] I have read the [MCP Documentation](https://modelcontextprotocol.io)
- [x] My code follows the repository's style guidelines
- [x] New and existing tests pass locally
- [x] I have added appropriate error handling
- [x] I have added or updated documentation as needed

## Additional context

- Implements 60-day retention period for backups as a safety net
- Uses GCS lifecycle rules for automatic cleanup
- Includes MinIO setup instructions for local development testing
- Port-forwarding commands updated for consistency across documentation
- Fixes #184
1 parent 8d7e471 commit 03ac9c9
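The additional context above leans on GCS lifecycle rules for the 60-day retention safety net. The bucket provisioning itself lives in the providers package and is not part of this diff, but a minimal Pulumi sketch of such a rule, using the pulumi-gcp v8 SDK already present in `deploy/go.mod`, would look like the following (the resource name and location are illustrative assumptions; the README changes below name the real buckets `mcp-registry-{staging|prod}-backups`):

```go
package main

import (
    "github.com/pulumi/pulumi-gcp/sdk/v8/go/gcp/storage"
    "github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
    pulumi.Run(func(ctx *pulumi.Context) error {
        // Illustrative only: the real bucket is created in the (unshown) provider code.
        _, err := storage.NewBucket(ctx, "registry-backups", &storage.BucketArgs{
            Location: pulumi.String("us-central1"), // matches the gcpRegion default
            LifecycleRules: storage.BucketLifecycleRuleArray{
                &storage.BucketLifecycleRuleArgs{
                    // Delete objects older than 60 days (the retention safety
                    // net described in this commit).
                    Action: &storage.BucketLifecycleRuleActionArgs{
                        Type: pulumi.String("Delete"),
                    },
                    Condition: &storage.BucketLifecycleRuleConditionArgs{
                        Age: pulumi.Int(60),
                    },
                },
            },
        })
        return err
    })
}
```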

File tree

12 files changed: +610 -132 lines

deploy/README.md

Lines changed: 51 additions & 0 deletions
@@ -46,6 +46,7 @@ Pre-requisites:
 gcloud projects add-iam-policy-binding mcp-registry-prod --member="serviceAccount:pulumi-svc@mcp-registry-prod.iam.gserviceaccount.com" --role="roles/container.admin"
 gcloud projects add-iam-policy-binding mcp-registry-prod --member="serviceAccount:pulumi-svc@mcp-registry-prod.iam.gserviceaccount.com" --role="roles/compute.admin"
 gcloud projects add-iam-policy-binding mcp-registry-prod --member="serviceAccount:pulumi-svc@mcp-registry-prod.iam.gserviceaccount.com" --role="roles/storage.admin"
+gcloud projects add-iam-policy-binding mcp-registry-prod --member="serviceAccount:pulumi-svc@mcp-registry-prod.iam.gserviceaccount.com" --role="roles/storage.hmacKeyAdmin"
 gcloud iam service-accounts add-iam-policy-binding $(gcloud projects describe mcp-registry-prod --format="value(projectNumber)")-compute@developer.gserviceaccount.com --member="serviceAccount:pulumi-svc@mcp-registry-prod.iam.gserviceaccount.com" --role="roles/iam.serviceAccountUser"
 gcloud iam service-accounts keys create sa-key.json --iam-account=pulumi-svc@mcp-registry-prod.iam.gserviceaccount.com
 ```
@@ -100,6 +101,7 @@ Pre-requisites:
 ├── go.sum                   # Go module checksums
 └── pkg/                     # Infrastructure packages
     ├── k8s/                 # Kubernetes deployment components
+    │   ├── backup.go        # Database backup configuration
     │   ├── cert_manager.go  # SSL certificate management
     │   ├── deploy.go        # Deployment orchestration
     │   ├── ingress.go       # Ingress controller setup
@@ -123,6 +125,7 @@ Pre-requisites:
 - Certificate manager for SSL/TLS
 - Ingress controller for external access
 - Database for data persistence
+- Backup infrastructure for database
 - MCP Registry application
 
 ## Configuration
@@ -136,6 +139,48 @@ Pre-requisites:
 | `gcpProjectId` | GCP Project ID (required when provider=gcp) | No |
 | `gcpRegion` | GCP Region (default: us-central1) | No |
 
+## Database Backups
+
+The deployment uses [K8up](https://k8up.io/) (a Kubernetes backup operator), which uses [Restic](https://restic.net/) under the hood.
+
+When running locally, backups are stored in a MinIO bucket. In staging and production, backups are stored in a GCS bucket.
+
+### Accessing Backup Files
+
+#### Local Development (MinIO)
+
+```bash
+# Expose MinIO web console
+kubectl port-forward -n minio svc/minio 9000:9000 9001:9001
+```
+
+Then open [localhost:9001](http://localhost:9001), log in with username `minioadmin` and password `minioadmin`, and navigate to the `k8up-backups` bucket.
+
+#### Staging and Production (GCS)
+
+- [Staging](https://console.cloud.google.com/storage/browser/mcp-registry-staging-backups?project=mcp-registry-staging)
+- [Production](https://console.cloud.google.com/storage/browser/mcp-registry-prod-backups?project=mcp-registry-prod)
+
+#### Decrypting and Restoring Backups
+
+Backups are encrypted using Restic. To access the backup data:
+
+1. **Download the backup files from the bucket:**
+   ```bash
+   # Local (MinIO) - ensure port-forward is active: kubectl port-forward -n minio svc/minio 9000:9000 9001:9001
+   AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin \
+     aws --endpoint-url http://localhost:9000 s3 sync s3://k8up-backups/ ./backup-files/
+
+   # GCS (staging/production)
+   gsutil -m cp -r gs://mcp-registry-{staging|prod}-backups/* ./backup-files/
+   ```
+2. **[Install Restic](https://restic.readthedocs.io/en/latest/020_installation.html)**
+3. **Restore the backup:**
+   ```bash
+   RESTIC_PASSWORD=password restic -r ./backup-files restore latest --target ./restored-files
+   ```
+
+PostgreSQL data will be in `./restored-files/data/registry-pg-1/pgdata/`.
+
 ## Troubleshooting
 
 ### Check Status
@@ -154,3 +199,9 @@ kubectl get svc -n ingress-nginx
 kubectl logs -l app=mcp-registry
 kubectl logs -l app=postgres
 ```
+
+### Check Backup Status
+```bash
+kubectl describe schedule.k8up.io
+kubectl get backup
+```

deploy/go.mod

Lines changed: 0 additions & 3 deletions
@@ -5,8 +5,6 @@ go 1.23.0
 toolchain go1.24.1
 
 require (
-	github.com/pulumi/pulumi-azure-native-sdk/containerservice v1.104.0
-	github.com/pulumi/pulumi-azure-native-sdk/resources v1.104.0
 	github.com/pulumi/pulumi-gcp/sdk/v8 v8.39.0
 	github.com/pulumi/pulumi-kubernetes/sdk/v4 v4.18.2
 	github.com/pulumi/pulumi/sdk/v3 v3.175.0
@@ -65,7 +63,6 @@ require (
 	github.com/pkg/term v1.1.0 // indirect
 	github.com/pulumi/appdash v0.0.0-20231130102222-75f619a67231 // indirect
 	github.com/pulumi/esc v0.14.2 // indirect
-	github.com/pulumi/pulumi-azure-native-sdk v1.104.0 // indirect
 	github.com/rivo/uniseg v0.4.4 // indirect
 	github.com/rogpeppe/go-internal v1.14.1 // indirect
 	github.com/sabhiram/go-gitignore v0.0.0-20210923224102-525f6e181f06 // indirect

deploy/go.sum

Lines changed: 0 additions & 6 deletions
@@ -154,12 +154,6 @@ github.com/pulumi/appdash v0.0.0-20231130102222-75f619a67231 h1:vkHw5I/plNdTr435
 github.com/pulumi/appdash v0.0.0-20231130102222-75f619a67231/go.mod h1:murToZ2N9hNJzewjHBgfFdXhZKjY3z5cYC1VXk+lbFE=
 github.com/pulumi/esc v0.14.2 h1:xHpjJXzKs1hk/QPpgwe1Rmif3VWA0QcZ7jDvTFYX/jM=
 github.com/pulumi/esc v0.14.2/go.mod h1:0dNzCWIiRUmdfFrhHdeBzU4GiDPBhSfpeWDNApZwZ08=
-github.com/pulumi/pulumi-azure-native-sdk v1.104.0 h1:vyD4PvKSOkwL1z9WTis3ZE9XC73UM/7AyMNek4Vm1+E=
-github.com/pulumi/pulumi-azure-native-sdk v1.104.0/go.mod h1:ZfkbJPR8poiJgy4IlNaa2NBjHLW37nsLY2BIbZp3lHc=
-github.com/pulumi/pulumi-azure-native-sdk/containerservice v1.104.0 h1:grLVzWH6pS5os8ZfAbEkdEbaF4BFoLMwDai9ZsOINqo=
-github.com/pulumi/pulumi-azure-native-sdk/containerservice v1.104.0/go.mod h1:QRMkKXRX3suaDR13VgN9jEkPl66YI/MZOyu0hxZzwuk=
-github.com/pulumi/pulumi-azure-native-sdk/resources v1.104.0 h1:oaqgOMuGswJooAyFFWkSn9r/m1IBVBxbEL7LIXgTjqI=
-github.com/pulumi/pulumi-azure-native-sdk/resources v1.104.0/go.mod h1:CTbJkLYp5Foi5ccHeDfowJ+lpeX9ciaz16VeIVBhqng=
 github.com/pulumi/pulumi-gcp/sdk/v8 v8.39.0 h1:3i6MPUPGkltfOkl8UphrYne0D4h6RcJ2oKBP0eqGLS8=
 github.com/pulumi/pulumi-gcp/sdk/v8 v8.39.0/go.mod h1:MJPBwFykzyl5Lp70PP3Ds32y5XrWOPvNuDcfpdXPY8o=
 github.com/pulumi/pulumi-kubernetes/sdk/v4 v4.18.2 h1:WKxxqw+94H4KhBWKRN79G9IhmBZewj8sPYQJclgHJx0=

deploy/main.go

Lines changed: 7 additions & 4 deletions
@@ -8,7 +8,6 @@ import (
 
     "github.com/modelcontextprotocol/registry/deploy/infra/pkg/k8s"
     "github.com/modelcontextprotocol/registry/deploy/infra/pkg/providers"
-    "github.com/modelcontextprotocol/registry/deploy/infra/pkg/providers/aks"
     "github.com/modelcontextprotocol/registry/deploy/infra/pkg/providers/gcp"
     "github.com/modelcontextprotocol/registry/deploy/infra/pkg/providers/local"
 )
@@ -22,8 +21,6 @@ func createProvider(ctx *pulumi.Context) (providers.ClusterProvider, error) {
     }
 
     switch providerName {
-    case "aks":
-        return &aks.Provider{}, nil
     case "gcp":
         return &gcp.Provider{}, nil
     case "local":
@@ -51,8 +48,14 @@ func main() {
             return err
         }
 
+        // Create backup storage
+        storage, err := provider.CreateBackupStorage(ctx, cluster, environment)
+        if err != nil {
+            return err
+        }
+
         // Deploy to Kubernetes
-        _, err = k8s.DeployAll(ctx, cluster, environment)
+        _, err = k8s.DeployAll(ctx, cluster, storage, environment)
         if err != nil {
             return err
         }
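The new `provider.CreateBackupStorage(ctx, cluster, environment)` call implies that the `providers.ClusterProvider` interface gained a matching method. The providers package is not shown in this diff, so the sketch below is inferred from the call site here and the nil check in `DeployK8up`; names and doc text are assumptions:

```go
// Inferred shape of the new provider hook (not part of this diff).
type ClusterProvider interface {
    // ...existing cluster-provisioning methods...

    // CreateBackupStorage provisions S3-compatible backup storage (MinIO when
    // running locally, a GCS bucket with HMAC credentials in staging and
    // production). A nil *BackupStorageInfo means "no backup storage", which
    // DeployK8up treats as a signal to skip the k8up deployment.
    CreateBackupStorage(ctx *pulumi.Context, cluster *ProviderInfo, environment string) (*BackupStorageInfo, error)
}
```

The HMAC-credentials detail is suggested by the new `roles/storage.hmacKeyAdmin` binding added in the README above, which is what lets GCS be driven through k8up's S3 backend.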

deploy/pkg/k8s/backup.go

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
package k8s

import (
    "fmt"

    "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/apiextensions"
    corev1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/core/v1"
    "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/helm/v3"
    metav1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/meta/v1"
    "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/yaml"
    "github.com/pulumi/pulumi/sdk/v3/go/pulumi"

    "github.com/modelcontextprotocol/registry/deploy/infra/pkg/providers"
)

// DeployK8up installs the k8up backup operator and configures scheduled backups
func DeployK8up(ctx *pulumi.Context, cluster *providers.ProviderInfo, environment string, storage *providers.BackupStorageInfo) error {
    if storage == nil {
        ctx.Log.Info("No backup storage configured, skipping k8up deployment", nil)
        return nil
    }

    // Install the k8up CRDs before the helm chart
    // Related: https://github.com/k8up-io/k8up/issues/1050
    k8upCRDs, err := yaml.NewConfigFile(ctx, "k8up-crds", &yaml.ConfigFileArgs{
        File: "https://github.com/k8up-io/k8up/releases/download/k8up-4.8.4/k8up-crd.yaml",
    }, pulumi.Provider(cluster.Provider))
    if err != nil {
        return fmt.Errorf("failed to install k8up CRDs: %w", err)
    }

    // Install k8up operator
    k8upValues := pulumi.Map{
        "k8up": pulumi.Map{
            "backupCommandAnnotation": pulumi.String("k8up.io/backup-command"),
            "fileExtensionAnnotation": pulumi.String("k8up.io/file-extension"),
        },
    }

    k8up, err := helm.NewChart(ctx, "k8up", helm.ChartArgs{
        Chart:   pulumi.String("k8up"),
        Version: pulumi.String("4.8.4"),
        FetchArgs: helm.FetchArgs{
            Repo: pulumi.String("https://k8up-io.github.io/k8up"),
        },
        Values: k8upValues,
    }, pulumi.Provider(cluster.Provider), pulumi.DependsOn([]pulumi.Resource{k8upCRDs}))
    if err != nil {
        return fmt.Errorf("failed to install k8up: %w", err)
    }

    // Create restic repository password secret
    repoPassword, err := corev1.NewSecret(ctx, "k8up-repo-password", &corev1.SecretArgs{
        Metadata: &metav1.ObjectMetaArgs{
            Name:      pulumi.String("k8up-repo-password"),
            Namespace: pulumi.String("default"),
            Labels: pulumi.StringMap{
                "k8up.io/backup": pulumi.String("true"),
            },
        },
        Type: pulumi.String("Opaque"),
        StringData: pulumi.StringMap{
            "password": pulumi.String("password"), // In production we use GCS, which is already encrypted
        },
    }, pulumi.Provider(cluster.Provider))
    if err != nil {
        return fmt.Errorf("failed to create repository password secret: %w", err)
    }

    // Determine schedule based on environment
    backupSchedule := "46 4 * * *" // Daily at 4:46 AM
    pruneSchedule := "46 5 * * *"  // Daily at 5:46 AM
    keepDaily := 28                // Keep daily backups for 28 days

    if environment == "local" || environment == "dev" {
        backupSchedule = "* * * * *"  // Every minute for testing
        pruneSchedule = "*/5 * * * *" // Every 5 minutes
        keepDaily = 1
    }

    // Create Schedule for automated backups
    _, err = apiextensions.NewCustomResource(ctx, "k8up-schedule", &apiextensions.CustomResourceArgs{
        ApiVersion: pulumi.String("k8up.io/v1"),
        Kind:       pulumi.String("Schedule"),
        Metadata: &metav1.ObjectMetaArgs{
            Name:      pulumi.String("backup-schedule"),
            Namespace: pulumi.String("default"),
            Labels: pulumi.StringMap{
                "environment": pulumi.String(environment),
            },
        },
        OtherFields: map[string]any{
            "spec": map[string]any{
                "backend": map[string]any{
                    "repoPasswordSecretRef": map[string]any{
                        "name": repoPassword.Metadata.Name().Elem(),
                        "key":  "password",
                    },
                    "s3": map[string]any{
                        "endpoint": storage.Endpoint,
                        "bucket":   storage.BucketName,
                        "accessKeyIDSecretRef": map[string]any{
                            "name": storage.Credentials.Metadata.Name().Elem(),
                            "key":  "AWS_ACCESS_KEY_ID",
                        },
                        "secretAccessKeySecretRef": map[string]any{
                            "name": storage.Credentials.Metadata.Name().Elem(),
                            "key":  "AWS_SECRET_ACCESS_KEY",
                        },
                    },
                },
                "backup": map[string]any{
                    "schedule": backupSchedule,
                    "podSecurityContext": map[string]any{
                        "runAsUser": 0, // Run as root to access all files
                    },
                    "successfulJobsHistoryLimit": 3,
                    "failedJobsHistoryLimit":     3,
                },
                "prune": map[string]any{
                    "schedule": pruneSchedule,
                    "retention": map[string]any{
                        "keepDaily": keepDaily,
                    },
                    "successfulJobsHistoryLimit": 1,
                    "failedJobsHistoryLimit":     1,
                },
            },
        },
    }, pulumi.Provider(cluster.Provider), pulumi.DependsOn([]pulumi.Resource{k8up, storage.Credentials, repoPassword}))
    if err != nil {
        return fmt.Errorf("failed to create k8up schedule: %w", err)
    }

    return nil
}
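`DeployK8up` reads `storage.Endpoint`, `storage.BucketName`, and `storage.Credentials`, but `providers.BackupStorageInfo` itself is defined outside this diff. A plausible shape inferred from those usages:

```go
// Plausible definition of BackupStorageInfo; the real one lives in the
// providers package, which is not part of this diff.
type BackupStorageInfo struct {
    // S3-compatible endpoint: the in-cluster MinIO service locally, or GCS's
    // S3 interoperability endpoint in staging/production. Plain strings are
    // the simplest reading, since both fields are dropped straight into the
    // Schedule's untyped spec; pulumi string outputs would also work there.
    Endpoint   string
    BucketName string
    // Kubernetes Secret carrying AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY,
    // referenced by the Schedule's accessKeyIDSecretRef/secretAccessKeySecretRef.
    Credentials *corev1.Secret
}
```

The `*corev1.Secret` type in particular is strongly implied: the value must offer `.Metadata.Name()` and also satisfy `pulumi.Resource` for the `pulumi.DependsOn` list.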

deploy/pkg/k8s/deploy.go

Lines changed: 7 additions & 1 deletion
@@ -8,7 +8,7 @@ import (
 )
 
 // DeployAll orchestrates the complete deployment of the MCP Registry to Kubernetes
-func DeployAll(ctx *pulumi.Context, cluster *providers.ProviderInfo, environment string) (service *corev1.Service, err error) {
+func DeployAll(ctx *pulumi.Context, cluster *providers.ProviderInfo, backupStorage *providers.BackupStorageInfo, environment string) (service *corev1.Service, err error) {
     // Setup cert-manager
     err = SetupCertManager(ctx, cluster)
     if err != nil {
@@ -27,6 +27,12 @@ func DeployAll(ctx *pulumi.Context, cluster *providers.ProviderInfo, environment
         return nil, err
     }
 
+    // Deploy k8up backup operator
+    err = DeployK8up(ctx, cluster, environment, backupStorage)
+    if err != nil {
+        return nil, err
+    }
+
     // Deploy MCP Registry
     service, err = DeployMCPRegistry(ctx, cluster, environment, ingressNginx, pgCluster)
     if err != nil {

deploy/pkg/k8s/ingress.go

Lines changed: 11 additions & 1 deletion
@@ -30,6 +30,16 @@ func SetupIngressController(ctx *pulumi.Context, cluster *providers.ProviderInfo
         return nil, err
     }
 
+    // Usually we would expose the ingress via a LoadBalancer service.
+    // This works in GCP and most local setups, e.g. minikube (with minikube tunnel).
+    // Kind unfortunately does not support the LoadBalancer type and hangs indefinitely; this is a workaround for that.
+    serviceType := cluster.Name.ApplyT(func(name string) string {
+        if name == "kind-kind" {
+            return "NodePort"
+        }
+        return "LoadBalancer"
+    }).(pulumi.StringOutput)
+
     // Install NGINX Ingress Controller
     ingressNginx, err := helm.NewChart(ctx, "ingress-nginx", helm.ChartArgs{
         Chart: pulumi.String("ingress-nginx"),
@@ -41,7 +51,7 @@ func SetupIngressController(ctx *pulumi.Context, cluster *providers.ProviderInfo
         Values: pulumi.Map{
             "controller": pulumi.Map{
                 "service": pulumi.Map{
-                    "type": pulumi.String("LoadBalancer"),
+                    "type": serviceType,
                     "annotations": pulumi.Map{
                         // Add Azure Load Balancer health probe annotation as otherwise it defaults to / which fails
                         "service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path": pulumi.String("/healthz"),

deploy/pkg/k8s/postgres.go

Lines changed: 2 additions & 3 deletions
@@ -40,8 +40,7 @@ func DeployPostgresDatabases(ctx *pulumi.Context, cluster *providers.ProviderInf
         return nil, err
     }
 
-    // Create PostgreSQL cluster with proper timeout handling
-    // Note: This may fail on first run until CloudNativePG operator is fully ready
+    // Create PostgreSQL cluster
     pgCluster, err := apiextensions.NewCustomResource(ctx, "registry-pg", &apiextensions.CustomResourceArgs{
         ApiVersion: pulumi.String("postgresql.cnpg.io/v1"),
         Kind:       pulumi.String("Cluster"),
@@ -67,4 +66,4 @@ func DeployPostgresDatabases(ctx *pulumi.Context, cluster *providers.ProviderInf
     }
 
     return pgCluster, nil
-}
\ No newline at end of file
+}
