Commit 07dffe0

Nathan Parker authored and committed
e2e testing in cloud run
1 parent 5b210dd commit 07dffe0

7 files changed, +849 -14 lines changed

.github/workflows/ci-cd.yml

Lines changed: 36 additions & 0 deletions
@@ -529,3 +529,39 @@ jobs:
           echo ""
           echo "✅ All smoke tests passed"
 
+      - name: Run comprehensive E2E tests
+        env:
+          PROJECT_ID: ${{ env.GCP_PROJECT_ID }}
+          SERVICE_URL: ${{ steps.deploy.outputs.url }}
+        run: |
+          if [ -z "${SERVICE_URL}" ]; then
+            echo "Error: SERVICE_URL is empty"
+            exit 1
+          fi
+
+          echo "Running comprehensive E2E tests against: ${SERVICE_URL}"
+
+          # Get an identity token for authenticated requests
+          echo "Getting identity token..."
+          IDENTITY_TOKEN=$(gcloud auth print-identity-token 2>&1)
+          if [ $? -ne 0 ]; then
+            echo "Error: Failed to get identity token"
+            echo "${IDENTITY_TOKEN}"
+            exit 1
+          fi
+
+          # Install Python dependencies for pytest
+          python -m pip install --upgrade pip
+          pip install pytest requests
+
+          # Run Python E2E tests
+          echo "Running Python E2E test suite..."
+          export IDENTITY_TOKEN="${IDENTITY_TOKEN}"
+          export USE_AUTH="true"
+          pytest tests/integration/test_cloud_run_e2e.py -v --tb=short || {
+            echo "❌ E2E tests failed"
+            exit 1
+          }
+
+          echo "✅ All E2E tests passed"
+
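The step above passes three environment variables to the suite: `SERVICE_URL`, `IDENTITY_TOKEN`, and `USE_AUTH`. How `tests/integration/test_cloud_run_e2e.py` consumes them is not visible in this hunk. The following is a minimal sketch, assuming the token is sent as a bearer token in the `Authorization` header; `build_request_settings` is a hypothetical helper name, not something taken from the commit.

```python
import os
from typing import Dict, Mapping, Tuple


def build_request_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, Dict[str, str]]:
    """Return (base_url, headers) derived from the variables the workflow exports.

    Hypothetical helper: the variable names match the workflow step, but the
    bearer-token header is an assumption about how the suite authenticates.
    """
    base_url = env.get("SERVICE_URL", "").rstrip("/")
    if not base_url:
        raise ValueError("SERVICE_URL is empty")

    headers: Dict[str, str] = {}
    if env.get("USE_AUTH", "false").lower() == "true":
        # Strip whitespace because command substitution output may carry a newline.
        token = env.get("IDENTITY_TOKEN", "").strip()
        if not token:
            raise ValueError("USE_AUTH is true but IDENTITY_TOKEN is empty")
        headers["Authorization"] = f"Bearer {token}"
    return base_url, headers
```

Passing the environment in as a mapping keeps the helper testable without touching the real process environment; a local run against an unauthenticated instance would simply leave `USE_AUTH` unset.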
Lines changed: 303 additions & 0 deletions
@@ -0,0 +1,303 @@
# Cloud-Agnostic Deployment Plan

## Overview

This document outlines the plan to make the supply-graph-ai system cloud-agnostic, allowing deployment to any cloud provider (GCP, AWS, Azure) or container hosting service (Digital Ocean, Heroku, etc.).

## Current State Analysis

### Already Abstracted ✅

1. **Storage Providers**: Already have abstraction layer
   - `src/core/storage/providers/` - GCP, AWS, Azure, Local
   - `StorageProvider` base class with unified interface

2. **Secrets Management**: Already have abstraction layer
   - `src/core/utils/secrets_manager.py` - Supports AWS, GCP, Azure, ENV
   - `SecretsProvider` enum with auto-detection

### GCP-Specific Components ❌

1. **Deployment Scripts**
   - `deploy/prometheus/deploy.sh` - Uses `gcloud` commands
   - CI/CD workflow (`.github/workflows/ci-cd.yml`) - GCP-specific

2. **Documentation**
   - `docs/development/gcp-cloud-run-setup.md` - GCP-specific setup
   - `docs/development/prometheus-cloud-run-setup.md` - GCP-specific

3. **Configuration**
   - Environment variables hardcoded for GCP
   - Service account setup scripts

4. **Monitoring/Logging**
   - Cloud Logging integration (GCP-specific)
   - Prometheus deployment script (GCP-specific)

## Task 1: Comprehensive End-to-End Testing

### Goals
- Test all major API endpoints on deployed Cloud Run service
- Verify authentication works correctly
- Test actual functionality (not just health checks)
- Create reusable test suite for any deployment

### Test Categories

#### 1.1 Health & System Endpoints
- [ ] `GET /health` - Basic health check
- [ ] `GET /health/readiness` - Readiness check
- [ ] `GET /` - API information
- [ ] `GET /v1` - API version info

#### 1.2 Authentication Tests
- [ ] Test with valid API key
- [ ] Test with invalid API key (401)
- [ ] Test without authentication (401)
- [ ] Test role-based access (read/write/admin)

#### 1.3 Core API Endpoints

**OKH Routes:**
- [ ] `POST /v1/api/okh/create` - Create OKH manifest
- [ ] `GET /v1/api/okh/{id}` - Retrieve OKH
- [ ] `GET /v1/api/okh` - List OKH manifests
- [ ] `PUT /v1/api/okh/{id}` - Update OKH
- [ ] `DELETE /v1/api/okh/{id}` - Delete OKH

**OKW Routes:**
- [ ] `POST /v1/api/okw/create` - Create OKW facility
- [ ] `GET /v1/api/okw/{id}` - Retrieve OKW
- [ ] `GET /v1/api/okw` - List/search OKW facilities
- [ ] `PUT /v1/api/okw/{id}` - Update OKW
- [ ] `DELETE /v1/api/okw/{id}` - Delete OKW

**Match Routes:**
- [ ] `POST /v1/api/match` - Match OKH to OKW
- [ ] `GET /v1/api/match/domains` - List domains
- [ ] `POST /v1/api/match/validate` - Validate matching

**Supply Tree Routes:**
- [ ] `POST /v1/api/supply-tree/create` - Create supply tree
- [ ] `GET /v1/api/supply-tree/{id}` - Retrieve supply tree
- [ ] `GET /v1/api/supply-tree` - List supply trees

**Utility Routes:**
- [ ] `GET /v1/api/utility/domains` - List domains
- [ ] `GET /v1/api/utility/contexts?domain=manufacturing` - Get contexts

#### 1.4 Error Handling Tests
- [ ] 404 for non-existent resources
- [ ] 422 for validation errors
- [ ] 500 error handling (if possible to trigger safely)

#### 1.5 Integration Tests
- [ ] Create OKH → Match → Generate Supply Tree workflow
- [ ] Storage operations (create, read, update, delete)
- [ ] Verify data persistence across requests

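The read-only parts of the checklist above map naturally onto a table of (method, path, expected status) entries. The sketch below shows one way to drive such a table; the paths are copied from the checklist, while the `200` expectations, the `EndpointCheck` type, and the `run_checks` function are assumptions made for illustration. The HTTP call is injected as a `fetch` callable so the logic can be exercised without a live service.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class EndpointCheck(NamedTuple):
    method: str
    path: str
    expected_status: int


# Paths come from section 1.1; the expected status codes are assumptions.
HEALTH_CHECKS: List[EndpointCheck] = [
    EndpointCheck("GET", "/health", 200),
    EndpointCheck("GET", "/health/readiness", 200),
    EndpointCheck("GET", "/", 200),
    EndpointCheck("GET", "/v1", 200),
]

# fetch(method, url, headers) -> HTTP status code
Fetch = Callable[[str, str, Dict[str, str]], int]


def run_checks(
    base_url: str,
    checks: List[EndpointCheck],
    fetch: Fetch,
    headers: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return one failure message per check whose status differs from the expectation."""
    failures: List[str] = []
    for check in checks:
        url = base_url.rstrip("/") + check.path
        status = fetch(check.method, url, dict(headers or {}))
        if status != check.expected_status:
            failures.append(
                f"{check.method} {check.path}: expected {check.expected_status}, got {status}"
            )
    return failures
```

In a pytest test, `fetch` could wrap `requests.request(method, url, headers=headers, timeout=30).status_code`, and the assertion would be `assert run_checks(base_url, HEALTH_CHECKS, fetch) == []`. The 401, 404, and 422 cases from sections 1.2 and 1.4 fit the same table once concrete request bodies are known.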
### Implementation Plan

1. **Create test script** (`scripts/test-cloud-run-e2e.sh`)
   - Accepts SERVICE_URL and API_KEY as parameters
   - Tests all endpoints systematically
   - Generates test report

2. **Create Python test suite** (`tests/integration/test_cloud_run_e2e.py`)
   - Uses pytest for structured testing
   - Can be run locally or in CI/CD
   - Generates detailed test reports

3. **Add to CI/CD pipeline**
   - Run after successful deployment
   - Fail pipeline if critical tests fail
   - Generate test reports as artifacts

## Task 2: Cloud-Agnostic Deployment Architecture

### Goals
- Abstract all cloud-specific deployment logic
- Support multiple cloud providers (GCP, AWS, Azure)
- Support container hosting (Digital Ocean, Heroku, etc.)
- Maintain backward compatibility with existing GCP deployment

### Architecture Design

#### 2.1 Deployment Abstraction Layer

```
deploy/
├── base/
│   ├── __init__.py
│   ├── deployer.py            # Base Deployer class
│   └── config.py              # Base deployment config
├── providers/
│   ├── __init__.py
│   ├── gcp/
│   │   ├── __init__.py
│   │   ├── cloud_run.py       # GCP Cloud Run deployer
│   │   ├── config.py          # GCP-specific config
│   │   └── iam.py             # GCP IAM setup
│   ├── aws/
│   │   ├── __init__.py
│   │   ├── ecs.py             # AWS ECS deployer
│   │   ├── fargate.py         # AWS Fargate deployer
│   │   └── config.py
│   ├── azure/
│   │   ├── __init__.py
│   │   ├── container_apps.py  # Azure Container Apps
│   │   └── config.py
│   └── digitalocean/
│       ├── __init__.py
│       ├── app_platform.py    # Digital Ocean App Platform
│       └── config.py
├── scripts/
│   ├── deploy.sh              # Universal deployment script
│   ├── setup-gcp.sh           # GCP-specific setup
│   ├── setup-aws.sh           # AWS-specific setup
│   └── setup-azure.sh         # Azure-specific setup
└── config/
    ├── deployment.yaml        # Deployment configuration
    └── providers/             # Provider-specific configs
        ├── gcp.yaml
        ├── aws.yaml
        └── azure.yaml
```

#### 2.2 Configuration Management

**Unified Configuration Format** (`deploy/config/deployment.yaml`):
```yaml
provider: gcp  # gcp, aws, azure, digitalocean, etc.
environment: production
region: us-west1

# Common settings
service:
  name: supply-graph-ai
  image: ghcr.io/helpfulengineering/supply-graph-ai:latest
  port: 8080
  memory: 1Gi
  cpu: 2
  min_instances: 1
  max_instances: 100
  timeout: 300

# Provider-specific settings
providers:
  gcp:
    project_id: nathan-playground-368310
    service_account: supply-graph-ai@${project_id}.iam.gserviceaccount.com
    artifact_registry: us-west1-docker.pkg.dev/${project_id}/cloud-run-source-deploy
  aws:
    cluster: supply-graph-ai-cluster
    task_definition: supply-graph-ai-task
    ecr_repository: supply-graph-ai
  azure:
    resource_group: supply-graph-ai-rg
    container_registry: supplygraphai.azurecr.io
```

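The `gcp` block above uses `${project_id}` placeholders, but the plan does not say how they are expanded. One possible approach, sketched here under the assumption that placeholders refer only to sibling keys in the same provider block, uses `string.Template` from the standard library. The function name `resolve_provider_settings` is hypothetical.

```python
from string import Template
from typing import Any, Dict


def resolve_provider_settings(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Expand ${key} references to sibling scalar values within one provider block.

    Assumption: a single substitution pass is enough (no chained references).
    Unknown placeholders are left untouched by safe_substitute.
    """
    context = {
        key: str(value)
        for key, value in settings.items()
        if isinstance(value, (str, int, float))
    }
    resolved: Dict[str, Any] = {}
    for key, value in settings.items():
        if isinstance(value, str):
            resolved[key] = Template(value).safe_substitute(context)
        else:
            resolved[key] = value
    return resolved
```

Given a block with `project_id: example-project`, the `service_account` value would resolve to `supply-graph-ai@example-project.iam.gserviceaccount.com`. Leaving unknown placeholders in place rather than raising makes typos visible in the resolved output, though a stricter variant could reject them as part of configuration validation.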
#### 2.3 Deployment Interface

**Base Deployer Class** (`deploy/base/deployer.py`):
```python
from abc import ABC, abstractmethod
from typing import Dict, Any, Optional

class BaseDeployer(ABC):
    """Base class for cloud provider deployers"""

    @abstractmethod
    def setup(self, config: Dict[str, Any]) -> None:
        """Setup cloud resources (IAM, storage, etc.)"""
        pass

    @abstractmethod
    def deploy(self, config: Dict[str, Any]) -> str:
        """Deploy service and return service URL"""
        pass

    @abstractmethod
    def get_service_url(self, service_name: str) -> str:
        """Get the deployed service URL"""
        pass

    @abstractmethod
    def update(self, config: Dict[str, Any]) -> None:
        """Update existing deployment"""
        pass

    @abstractmethod
    def delete(self, service_name: str) -> None:
        """Delete deployment"""
        pass
```

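To illustrate how a provider might plug into this interface, the sketch below pairs a condensed copy of `BaseDeployer` with an in-memory implementation and a small registry keyed by the `provider` value from the configuration. `InMemoryDeployer`, `DEPLOYERS`, and `get_deployer` are hypothetical names, and the in-memory class is a test double that calls no cloud API; it is not the planned GCP deployer.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Type


class BaseDeployer(ABC):
    """Condensed copy of the interface above so this sketch runs on its own."""

    @abstractmethod
    def setup(self, config: Dict[str, Any]) -> None: ...

    @abstractmethod
    def deploy(self, config: Dict[str, Any]) -> str: ...

    @abstractmethod
    def get_service_url(self, service_name: str) -> str: ...

    @abstractmethod
    def update(self, config: Dict[str, Any]) -> None: ...

    @abstractmethod
    def delete(self, service_name: str) -> None: ...


class InMemoryDeployer(BaseDeployer):
    """Hypothetical test double: records services in a dict instead of deploying."""

    def __init__(self) -> None:
        self.services: Dict[str, Dict[str, Any]] = {}

    def setup(self, config: Dict[str, Any]) -> None:
        return None  # nothing to provision in memory

    def deploy(self, config: Dict[str, Any]) -> str:
        name = config["service"]["name"]
        self.services[name] = dict(config["service"])
        return self.get_service_url(name)

    def get_service_url(self, service_name: str) -> str:
        if service_name not in self.services:
            raise KeyError(f"unknown service: {service_name}")
        # Placeholder URL on a reserved TLD; a real deployer would query the provider.
        return f"https://{service_name}.example.invalid"

    def update(self, config: Dict[str, Any]) -> None:
        name = config["service"]["name"]
        if name not in self.services:
            raise KeyError(f"unknown service: {name}")
        self.services[name].update(config["service"])

    def delete(self, service_name: str) -> None:
        self.services.pop(service_name, None)


# Registry keyed by the `provider` field of the deployment configuration.
DEPLOYERS: Dict[str, Type[BaseDeployer]] = {"memory": InMemoryDeployer}


def get_deployer(provider: str) -> BaseDeployer:
    """Instantiate the deployer registered for `provider`, or fail with a clear error."""
    deployer_cls = DEPLOYERS.get(provider)
    if deployer_cls is None:
        raise ValueError(f"unsupported provider: {provider!r}")
    return deployer_cls()
```

Real providers would register under keys such as `gcp` or `aws`. Keeping an in-memory implementation alongside them lets the shared deployment flow be tested without cloud credentials.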
#### 2.4 CI/CD Abstraction

**Multi-Provider CI/CD** (`.github/workflows/`):
- `ci-cd-gcp.yml` - GCP-specific workflow
- `ci-cd-aws.yml` - AWS-specific workflow
- `ci-cd-azure.yml` - Azure-specific workflow
- `ci-cd-base.yml` - Shared workflow steps (lint, test, build)

**Workflow Selection**:
- Use matrix strategy to test multiple providers
- Or use separate workflows triggered by branch/tag
- Or use environment-based selection

### Implementation Steps

#### Phase 1: Refactor Existing GCP Deployment
1. Extract GCP-specific logic into `deploy/providers/gcp/`
2. Create base deployer interface
3. Implement GCP deployer using base interface
4. Update CI/CD to use new deployer
5. Test backward compatibility

#### Phase 2: Add AWS Support
1. Implement AWS deployer (`deploy/providers/aws/`)
2. Create AWS setup scripts
3. Add AWS CI/CD workflow
4. Test AWS deployment

#### Phase 3: Add Azure Support
1. Implement Azure deployer (`deploy/providers/azure/`)
2. Create Azure setup scripts
3. Add Azure CI/CD workflow
4. Test Azure deployment

#### Phase 4: Add Container Hosting Support
1. Implement Digital Ocean deployer
2. Add support for generic Docker/Kubernetes
3. Create universal deployment script

#### Phase 5: Documentation & Testing
1. Update all deployment documentation
2. Create provider comparison guide
3. Add migration guides
4. Comprehensive testing across providers

### Key Design Principles

1. **Backward Compatibility**: Existing GCP deployment should continue to work
2. **Progressive Enhancement**: Add providers incrementally
3. **Configuration Over Code**: Use YAML/config files for provider settings
4. **Fail Fast**: Validate configuration before deployment
5. **Clear Abstractions**: Each provider implements same interface
6. **Documentation**: Each provider has setup guide

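The "Fail Fast" principle above can be made concrete with a validation pass that runs before any provider call. The sketch below checks a dictionary shaped like `deploy/config/deployment.yaml`; which keys count as required is an assumption, since the plan does not specify them, and `validate_deployment_config` is a hypothetical name.

```python
from typing import Any, Dict, List

# Assumed required keys, based on the fields shown in the sample configuration.
REQUIRED_TOP_LEVEL = ("provider", "environment", "region", "service")
REQUIRED_SERVICE_KEYS = ("name", "image", "port")


def validate_deployment_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means these checks passed."""
    errors: List[str] = []

    for key in REQUIRED_TOP_LEVEL:
        if key not in config:
            errors.append(f"missing top-level key: {key}")

    service = config.get("service")
    if isinstance(service, dict):
        for key in REQUIRED_SERVICE_KEYS:
            if key not in service:
                errors.append(f"missing service.{key}")
        low, high = service.get("min_instances"), service.get("max_instances")
        if isinstance(low, int) and isinstance(high, int) and low > high:
            errors.append("service.min_instances exceeds service.max_instances")

    # The selected provider needs a matching block under `providers`.
    provider = config.get("provider")
    if provider is not None and provider not in (config.get("providers") or {}):
        errors.append(f"no providers.{provider} block for selected provider")

    return errors
```

Collecting every problem before failing, rather than stopping at the first, lets a deployment script report the full list in one run and exit non-zero if the list is non-empty.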
### Migration Strategy

1. **Phase 1**: Refactor GCP deployment (no breaking changes)
2. **Phase 2**: Add new providers alongside GCP
3. **Phase 3**: Deprecate old GCP scripts (with migration guide)
4. **Phase 4**: Full multi-provider support

## Next Steps

1. **Start with Task 1**: Implement comprehensive E2E tests
2. **Then Task 2**: Begin cloud-agnostic refactoring
3. **Iterate**: Add providers incrementally based on needs

