Commit 879e3be

chore: refactor and productionize
1 parent d1daf2d commit 879e3be

28 files changed, +682 -190 lines changed

k8s/README.md

Lines changed: 144 additions & 38 deletions
Original file line number | Diff line number | Diff line change
@@ -1,90 +1,193 @@
11
# Service Quality Oracle - Kubernetes Deployment
22

3-
This directory contains Kubernetes manifests for deploying the Service Quality Oracle with persistent state management.
3+
This directory contains Kustomize-based Kubernetes manifests for deploying the Service Quality Oracle to multiple environments with persistent state management.
4+
5+
## Structure
6+
7+
```
8+
k8s/
9+
├── README.md # This file
10+
├── auth.sh # Global auth script (configure for your cluster)
11+
├── base/ # Common base resources
12+
│ ├── kustomization.yaml
13+
│ ├── namespace.yaml
14+
│ ├── deployment.yaml
15+
│ ├── service.yaml
16+
│ ├── servicemonitor.yaml
17+
│ ├── serviceaccount.yaml
18+
│ └── podmonitor.yaml
19+
└── environments/
20+
├── mainnet/ # Production environment
21+
│ ├── kustomization.yaml
22+
│ ├── config.yaml # Mainnet configuration
23+
│ ├── config.secret.yaml # Mainnet secrets (configure before use)
24+
│ ├── persistent-volume-claim.yaml
25+
│ ├── auth.sh
26+
│ ├── apply.sh
27+
│ ├── diff.sh
28+
│ └── restart-deployments.sh
29+
└── testnet/ # Staging environment
30+
├── kustomization.yaml
31+
├── config.yaml # Testnet configuration
32+
├── config.secret.yaml # Testnet secrets (configure before use)
33+
├── persistent-volume-claim.yaml
34+
├── auth.sh
35+
├── apply.sh
36+
├── diff.sh
37+
└── restart-deployments.sh
38+
```
439

540
## Prerequisites
641

742
- Kubernetes cluster (version 1.19+)
843
- `kubectl` configured to access your cluster
44+
- Kustomize (built into kubectl v1.14+)
945
- Docker image published to `ghcr.io/graphprotocol/service-quality-oracle`
1046
- **Storage class configured** (see Storage Configuration below)
1147

1248
## Quick Start
1349

14-
### 1. Create Secrets (Required)
50+
### 1. Configure Cluster Access
51+
52+
Update `auth.sh` with your GKE cluster details:
53+
54+
```bash
55+
# Edit auth.sh
56+
vim auth.sh
57+
58+
# Connect to your cluster
59+
./auth.sh
60+
```
61+
62+
### 2. Deploy to Testnet
63+
64+
```bash
65+
cd environments/testnet
66+
67+
# Configure secrets (replace placeholder values)
68+
vim config.secret.yaml
69+
70+
# Preview changes
71+
./diff.sh
72+
73+
# Deploy
74+
./apply.sh
75+
76+
# Monitor
77+
kubectl logs -f deployment/service-quality-oracle -n service-quality-oracle
78+
```
79+
80+
### 3. Deploy to Mainnet
1581

1682
```bash
17-
# Copy the example secrets file
18-
cp k8s/secrets.yaml.example k8s/secrets.yaml
83+
cd environments/mainnet
1984

20-
# Edit with your actual credentials
21-
# IMPORTANT: Never commit secrets.yaml to version control
22-
nano k8s/secrets.yaml
85+
# Configure secrets (replace placeholder values with production keys)
86+
vim config.secret.yaml
87+
88+
# Configure mainnet contract address
89+
vim config.yaml
90+
# Update BLOCKCHAIN_CONTRACT_ADDRESS with actual mainnet contract
91+
92+
# Preview changes
93+
./diff.sh
94+
95+
# Deploy (includes safety checks)
96+
./apply.sh
97+
98+
# Monitor
99+
kubectl logs -f deployment/service-quality-oracle -n service-quality-oracle
23100
```
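
The safety checks that `apply.sh` performs are not shown in this README. As a rough sketch of one such check, assuming unfilled values in `config.secret.yaml` carry a hypothetical marker such as `REPLACE_ME` (the real placeholder text may differ), a pre-deploy guard could look like this:

```bash
# Hypothetical pre-deploy guard; not the repository's actual apply.sh.
# Assumes unfilled secrets carry the marker REPLACE_ME (an illustrative choice).

# Return 0 if the file still contains placeholder markers, 1 otherwise.
has_placeholders() {
    grep -q 'REPLACE_ME' "$1"
}

# Refuse to continue while the secrets file is missing or unfilled.
# Returns 2 for a missing file, 1 for remaining placeholders, 0 when clean.
check_secrets_file() {
    if [ ! -f "$1" ]; then
        echo "missing secrets file: $1" >&2
        return 2
    fi
    if has_placeholders "$1"; then
        echo "placeholder values remain in $1" >&2
        return 1
    fi
    return 0
}
```

With these helpers, `check_secrets_file config.secret.yaml && kubectl apply -k .` would deploy only once the file has been filled in.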
24101

25-
**Required secrets:**
102+
## Environment Configuration
103+
104+
### Environment Differences
105+
106+
| Setting | Testnet | Mainnet |
107+
|---------|---------|---------|
108+
| Chain | Arbitrum Sepolia | Arbitrum One |
109+
| Contract | 0x6d5...91f6 | Configure in config.yaml |
110+
| Image Tag | testnet-latest | mainnet-latest |
111+
| Labels | environment: testnet, variant: staging | environment: mainnet, variant: production |
112+
113+
### Secret Configuration
114+
115+
Before deploying, you must configure the following secrets in each environment's `config.secret.yaml`:
116+
26117
- **`google-credentials`**: Service account JSON for BigQuery access
27-
- **`blockchain-private-key`**: Private key for Arbitrum Sepolia transactions
118+
- **`blockchain-private-key`**: Private key for blockchain transactions (64 hex characters, without the `0x` prefix)
119+
- **`etherscan-api-key`**: Etherscan API key
28120
- **`arbitrum-api-key`**: API key for Arbiscan contract verification
29-
- **`slack-webhook-url`**: Webhook URL for operational notifications
121+
- **`studio-api-key`**: The Graph Studio API key
122+
- **`studio-deploy-key`**: The Graph Studio deploy key
123+
- **`slack-webhook-url`**: Slack webhook for notifications
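
The private key format noted above (64 characters, no `0x` prefix) can be checked before the value goes into `config.secret.yaml`. The helper below is a minimal sketch; the function name is illustrative and not part of the repository.

```bash
# Return 0 if the argument is exactly 64 hexadecimal characters
# with no 0x prefix, 1 otherwise. Illustrative helper only.
is_valid_private_key() {
    # Length must be exactly 64.
    [ "${#1}" -eq 64 ] || return 1
    # Reject any character outside 0-9, a-f, A-F (this also rejects the "x" in 0x).
    case "$1" in
        *[!0-9a-fA-F]*) return 1 ;;
    esac
    return 0
}
```

Avoid passing a real key on the command line where it may land in shell history; reading it from a file or an environment variable is safer.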
30124

31-
### 2. Configure Storage (Required)
125+
## Storage Configuration
32126

33127
```bash
34128
# Check available storage classes
35129
kubectl get storageclass
36130

37-
# If you see a default storage class (marked with *), skip to step 3
38-
# Otherwise, edit persistent-volume-claim.yaml and uncomment the appropriate storageClassName
131+
# The manifests use 'ssd-retain' storage class by default
132+
# Edit environments/{mainnet,testnet}/persistent-volume-claim.yaml if needed
39133
```
40134

41135
**Common storage classes by platform:**
42136
- **AWS EKS**: `gp2`, `gp3`, `ebs-csi`
43-
- **Google GKE**: `standard`, `ssd`
137+
- **Google GKE**: `standard`, `ssd`
44138
- **Azure AKS**: `managed-premium`, `managed`
45139
- **Local/Development**: `hostpath`, `local-path`
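
Before applying, it can help to confirm that the storage class a claim references actually exists in the target cluster. The following is a sketch under the assumption that the PVC manifests set `storageClassName` as a plain unquoted scalar; the function names are illustrative.

```bash
# Return 0 if the named StorageClass exists in the current cluster context.
storage_class_exists() {
    kubectl get storageclass "$1" >/dev/null 2>&1
}

# Print each storageClassName value found in a PVC manifest file.
# Values are printed as written, so YAML quotes would be kept.
pvc_storage_classes() {
    sed -n 's/^[[:space:]]*storageClassName:[[:space:]]*//p' "$1"
}
```

For example, `storage_class_exists ssd-retain || echo "ssd-retain not found"` flags a cluster where the default class used by these manifests is missing.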
46140

47-
### 3. Deploy to Kubernetes
141+
## Operations
142+
143+
### Restart Deployments
48144

49145
```bash
50-
# Apply all manifests
51-
kubectl apply -f k8s/
146+
./restart-deployments.sh
147+
```
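
The contents of `restart-deployments.sh` are not shown here. A restart helper along these lines would be one plausible shape, assuming the deployment and namespace are both named `service-quality-oracle` as in the other commands in this README; the function name and the 300-second timeout are illustrative choices.

```bash
# Hypothetical restart helper; the repository's restart-deployments.sh may differ.
NAMESPACE="${NAMESPACE:-service-quality-oracle}"
DEPLOYMENT="${DEPLOYMENT:-service-quality-oracle}"

restart_and_wait() {
    # Trigger a rolling restart, then block until the rollout finishes or times out.
    kubectl rollout restart "deployment/$DEPLOYMENT" -n "$NAMESPACE" &&
        kubectl rollout status "deployment/$DEPLOYMENT" -n "$NAMESPACE" --timeout=300s
}
```

Waiting on `kubectl rollout status` makes a failed restart visible in the script's exit code instead of returning as soon as the restart is requested.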
148+
149+
### View Logs
52150

53-
# Verify deployment
54-
kubectl get pods -l app=service-quality-oracle
55-
kubectl get pvc -l app=service-quality-oracle
151+
```bash
152+
kubectl logs -f deployment/service-quality-oracle -n service-quality-oracle
56153
```
57154

58-
### 4. Monitor Deployment
155+
### Check Status
59156

60157
```bash
61-
# Check pod status
62-
kubectl describe pod -l app=service-quality-oracle
158+
kubectl get all -n service-quality-oracle
159+
```
63160

64-
# View logs
65-
kubectl logs -l app=service-quality-oracle -f
161+
### Delete Environment
66162

67-
# Check persistent volumes
68-
kubectl get pv
163+
```bash
164+
kubectl delete -k .
69165
```
70166

167+
## Monitoring
168+
169+
- Prometheus scraping enabled via annotations
170+
- ServiceMonitor and PodMonitor configured for metrics collection
171+
- Metrics exposed on port 8000 at `/metrics` endpoint
172+
- Labels applied for environment-specific alerting
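
To confirm that the `/metrics` endpoint on port 8000 is serving data, the output can be inspected by hand. The helper below is a small sketch that counts sample lines in Prometheus text format; the function name is illustrative.

```bash
# Count sample lines in Prometheus text exposition read from stdin,
# skipping blank lines and '#' comment lines (HELP/TYPE metadata).
count_metric_samples() {
    grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}
```

One way to use it: run `kubectl port-forward svc/service-quality-oracle 8000:8000 -n service-quality-oracle` in one terminal, then `curl -s http://localhost:8000/metrics | count_metric_samples` in another. A count of zero suggests the exporter is not producing samples.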
173+
71174
## Architecture
72175

73176
### Persistent Storage
74177

75178
The service uses **two persistent volumes** to maintain state across pod restarts:
76179

77-
- **`service-quality-oracle-data` (5GB)**: Circuit breaker state, last run tracking, BigQuery cache, CSV outputs
78-
- **`service-quality-oracle-logs` (2GB)**: Application logs
180+
- **`service-quality-oracle-data` (10GB)**: Circuit breaker state, last run tracking, BigQuery cache, CSV outputs
181+
- **`service-quality-oracle-logs` (5GB)**: Application logs
79182

80183
**Mount points:**
81184
- `/app/data` → Critical state files (circuit breaker, cache, outputs)
82185
- `/app/logs` → Application logs
83186

84187
### Configuration Management
85188

86-
**Non-sensitive configuration** → `ConfigMap` (`configmap.yaml`)
87-
**Sensitive credentials** → `Secret` (`secrets.yaml`)
189+
**Non-sensitive configuration** → `ConfigMap` (generated from `config.yaml`)
190+
**Sensitive credentials** → `Secret` (generated from `config.secret.yaml`)
88191

89192
This separation provides:
90193
- ✅ Easy configuration updates without rebuilding images
@@ -98,7 +201,7 @@ This separation provides:
98201
- Memory: 512M
99202

100203
**Limits (maximum):**
101-
- CPU: 1000m (1.0 core)
204+
- CPU: 1000m (1.0 core)
102205
- Memory: 1G
103206

104207
## State Persistence Benefits
@@ -127,7 +230,7 @@ kubectl describe pod -l app=service-quality-oracle
127230

128231
# Common issues:
129232
# - Missing secrets
130-
# - PVC provisioning failures
233+
# - PVC provisioning failures
131234
# - Image pull errors
132235
```
133236

@@ -151,13 +254,16 @@ kubectl exec -it deployment/service-quality-oracle -- env | grep -E "(BIGQUERY|B
151254
kubectl exec -it deployment/service-quality-oracle -- ls -la /etc/secrets
152255
```
153256

154-
## Security Best Practices
257+
## Security
155258

156-
**Secrets never committed** to version control
259+
**Never commit actual secrets** - `config.secret.yaml` files contain placeholders only
260+
**Mainnet deployment safety checks** for production secrets
261+
**Non-root containers** with dropped capabilities
157262
**Service account** with minimal BigQuery permissions
158263
**Private key** stored in Kubernetes secrets (base64 encoded, not encrypted)
159264
**Resource limits** prevent resource exhaustion
160-
**Read-only filesystem** where possible
265+
**Workload Identity** configured for secure GCP access
266+
**SSD storage with retention** for data persistence
161267

162268
## Production Considerations
163269

@@ -171,7 +277,7 @@ kubectl exec -it deployment/service-quality-oracle -- ls -la /etc/secrets
171277
## Next Steps
172278

173279
1. **Test deployment** in staging environment
174-
2. **Verify state persistence** across pod restarts
280+
2. **Verify state persistence** across pod restarts
175281
3. **Set up monitoring** and alerting
176282
4. **Configure backup** for persistent volumes
177-
5. **Enable quality checking** after successful validation
283+
5. **Enable quality checking** after successful validation

k8s/auth.sh

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
#!/bin/bash
2+
3+
# Get cluster credentials for service-quality-oracle deployment
4+
# Update the following values based on your GKE cluster configuration:
5+
# - PROJECT: Your GCP project ID
6+
# - CLUSTER: Your GKE cluster name
7+
# - ZONE: Your GKE cluster zone/region
8+
9+
PROJECT="graph-mainnet"
10+
CLUSTER="network"
11+
ZONE="us-central1-a"
12+
13+
gcloud container clusters get-credentials "$CLUSTER" --project "$PROJECT" --zone "$ZONE"

k8s/deployment.yaml renamed to k8s/base/deployment.yaml

Lines changed: 21 additions & 1 deletion
@@ -13,10 +13,20 @@ spec:
1313
metadata:
1414
labels:
1515
app: service-quality-oracle
16+
annotations:
17+
prometheus.io/scrape: "true"
18+
prometheus.io/path: "/metrics"
19+
prometheus.io/port: "8000"
1620
spec:
21+
imagePullSecrets:
22+
- name: docker-registry
1723
containers:
1824
- name: service-quality-oracle
1925
image: ghcr.io/graphprotocol/service-quality-oracle:latest
26+
imagePullPolicy: IfNotPresent
27+
ports:
28+
- containerPort: 8000
29+
name: metrics
2030
envFrom:
2131
# Load all non-sensitive configuration from ConfigMap
2232
- configMapRef:
@@ -89,11 +99,21 @@ spec:
8999
- "import os; assert os.path.exists('/app/healthcheck'), 'Healthcheck file missing'"
90100
initialDelaySeconds: 10
91101
periodSeconds: 30
102+
securityContext:
103+
allowPrivilegeEscalation: false
104+
runAsNonRoot: true
105+
runAsUser: 1000
106+
runAsGroup: 1000
107+
readOnlyRootFilesystem: false
108+
capabilities:
109+
drop:
110+
- ALL
92111
volumes:
93112
- name: data-volume
94113
persistentVolumeClaim:
95114
claimName: service-quality-oracle-data
96115
- name: logs-volume
97116
persistentVolumeClaim:
98117
claimName: service-quality-oracle-logs
99-
restartPolicy: Always
118+
serviceAccountName: service-quality-oracle
119+
restartPolicy: Always

k8s/base/kustomization.yaml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
apiVersion: kustomize.config.k8s.io/v1beta1
2+
kind: Kustomization
3+
4+
resources:
5+
- namespace.yaml
6+
- deployment.yaml
7+
- service.yaml
8+
- servicemonitor.yaml
9+
- serviceaccount.yaml
10+
- podmonitor.yaml
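
A quick sanity check on this base is to render it and list which resource kinds come out. The helper below is a sketch that summarizes top-level `kind:` keys from rendered manifests on stdin; the function name is illustrative.

```bash
# Summarize resource kinds in rendered manifests read from stdin.
# Only top-level "kind:" keys are counted; indented keys are ignored.
count_kinds() {
    sed -n 's/^kind:[[:space:]]*//p' | sort | uniq -c | awk '{print $2 "=" $1}'
}
```

Running `kubectl kustomize k8s/base | count_kinds` prints one `Kind=count` line per resource kind, which makes a missing or duplicated resource easy to spot before applying an environment overlay.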

k8s/base/namespace.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
apiVersion: v1
2+
kind: Namespace
3+
metadata:
4+
name: service-quality-oracle
5+
labels:
6+
name: service-quality-oracle

k8s/base/podmonitor.yaml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
apiVersion: monitoring.coreos.com/v1
2+
kind: PodMonitor
3+
metadata:
4+
name: service-quality-oracle
5+
labels:
6+
app: service-quality-oracle
7+
spec:
8+
selector:
9+
matchLabels:
10+
app: service-quality-oracle
11+
podMetricsEndpoints:
12+
- port: metrics
13+
path: /metrics
14+
interval: 30s
15+
scrapeTimeout: 10s

k8s/base/service.yaml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
apiVersion: v1
2+
kind: Service
3+
metadata:
4+
name: service-quality-oracle
5+
labels:
6+
app: service-quality-oracle
7+
spec:
8+
selector:
9+
app: service-quality-oracle
10+
ports:
11+
- name: metrics
12+
port: 8000
13+
targetPort: 8000
14+
protocol: TCP
15+
type: ClusterIP

k8s/base/serviceaccount.yaml

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
apiVersion: v1
2+
kind: ServiceAccount
3+
metadata:
4+
name: service-quality-oracle
5+
labels:
6+
app: service-quality-oracle
7+
annotations:
8+
iam.gke.io/gcp-service-account: [email protected]
9+
automountServiceAccountToken: false
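
The `iam.gke.io/gcp-service-account` annotation only takes effect once the GCP service account allows this Kubernetes service account to impersonate it. The sketch below builds the IAM member string that GKE Workload Identity uses and prints the matching `gcloud` command without running it. The function names are illustrative, and the project, namespace, and service account email are passed in as arguments, not taken from this commit.

```bash
# Build the IAM member string for a Kubernetes service account under
# GKE Workload Identity: serviceAccount:PROJECT.svc.id.goog[NAMESPACE/KSA]
wi_member() {
    printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# Print (without executing) the gcloud command that grants the binding.
# Arguments: GCP project, namespace, Kubernetes SA name, GCP SA email.
print_wi_binding_cmd() {
    printf 'gcloud iam service-accounts add-iam-policy-binding %s --role roles/iam.workloadIdentityUser --member "%s"\n' \
        "$4" "$(wi_member "$1" "$2" "$3")"
}
```

Printing the command first lets an operator review the binding before granting it.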
