Commit 9188388

docs: add E2E tests split refactoring plan
Add comprehensive refactoring document for splitting E2E tests into provision and configuration phases to address GitHub Actions network connectivity issues with LXD VMs.

- Document problem: LXD VMs lack network connectivity on GitHub runners
- Reference related issues and reproduction repositories
- Propose solution: split into provision tests (LXD) and config tests (Docker)
- Include detailed 16-step implementation plan with trackable progress
- Add risk analysis and mitigation strategies
- Reference proven Docker feasibility from virtualization support research

1 parent 51c1c64 commit 9188388

1 file changed: +231 -0 lines changed

@@ -0,0 +1,231 @@
1+
# E2E Tests Split: Provision vs Configuration
2+
3+
## Summary
4+
5+
### Problem
6+
7+
The current E2E tests fail on GitHub Actions runners because LXD virtual machines have no network connectivity in that environment. VM provisioning succeeds, but the VMs then cannot install dependencies.
8+
9+
**Related Issues:**
10+
11+
- [GitHub Actions Runner Images Issue #13003](https://github.com/actions/runner-images/issues/13003) - Network connectivity issues with LXD VMs on GitHub runners
12+
- [Reproduction Repository](https://github.com/josecelano/test-docker-install-inside-vm-in-runner) - Test repository demonstrating the network connectivity issues
13+
- [Virtualization Support Research](https://github.com/josecelano/github-actions-virtualization-support) - Comprehensive testing of virtualization tools on GitHub Actions, demonstrating Docker feasibility
14+
- [Original Virtualization Investigation](https://github.com/actions/runner-images/issues/12933) - Background context on GitHub Actions virtualization support
15+
16+
### Current Deployment Phases
17+
18+
Our deployment workflow consists of these sequential phases:
19+
20+
1. **Provision** - Create infrastructure (VMs/containers) using OpenTofu/LXD
21+
2. **Configure** - Install and configure software using Ansible
22+
3. **Release** - Deploy application artifacts
23+
4. **Run** - Start and validate running services
24+
25+
Currently, all phases are tested together in a single E2E test suite, which fails due to the network connectivity issue in phase 2 (Configure).
26+
27+
## Solution
28+
29+
Split the E2E testing into two independent test suites:
30+
31+
### 1. E2E Provision Tests (`e2e-provision`)
32+
33+
- **Scope**: Test only the provisioning phase
34+
- **Technology**: Continue using LXD VMs on GitHub Actions runners
35+
- **Coverage**:
36+
- VM/container creation
37+
- Cloud-init completion
38+
- Basic infrastructure validation
39+
- **Success Criteria**: VM is created and cloud-init has finished successfully
40+
41+
### 2. E2E Configuration Tests (`e2e-config`)
42+
43+
- **Scope**: Test configuration, release, and run phases
44+
- **Technology**: Use Docker containers instead of VMs (proven feasible per [virtualization research](https://github.com/josecelano/github-actions-virtualization-support))
45+
- **Coverage**:
46+
- Ansible playbook execution
47+
- Software installation (Docker, Docker Compose, etc.)
48+
- Application deployment
49+
- Service validation
50+
- **Success Criteria**: All software is installed and services are running correctly
51+
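The split described above can be sketched as a mapping from deployment phase to test suite. This is only an illustration of the proposal: the `Phase` and `Suite` types and the `suite_for` function are hypothetical names, not code from the repository.

```rust
/// The four deployment phases listed in this plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Provision,
    Configure,
    Release,
    Run,
}

/// The two proposed suites (type and variant names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Suite {
    /// `e2e-provision`: runs against LXD VMs.
    Provision,
    /// `e2e-config`: runs against Docker containers.
    Config,
}

/// Returns the suite responsible for a phase under the proposed split.
fn suite_for(phase: Phase) -> Suite {
    match phase {
        Phase::Provision => Suite::Provision,
        Phase::Configure | Phase::Release | Phase::Run => Suite::Config,
    }
}
```

Because the `match` is exhaustive, adding a fifth phase later would force an explicit decision about which suite covers it.
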
52+
### Benefits
53+
54+
1. **Reliability**: Provision tests keep working on GitHub Actions because they do not depend on network access from inside the VM
55+
2. **Speed**: Configuration tests are expected to run faster in Docker containers than in VMs
56+
3. **Isolation**: Issues in one test suite don't block the other
57+
4. **Maintainability**: Each test suite has a single, focused responsibility
58+
5. **Debugging**: Easier to identify whether issues are in provisioning or configuration
59+
60+
## Implementation Plan
61+
62+
### Phase A: Create E2E Provision Tests
63+
64+
#### A.1: Define naming and structure
65+
66+
- [ ] **Task**: Define binary and workflow names
67+
- Binary: `e2e-provision-tests`
68+
- Workflow: `.github/workflows/test-e2e-provision.yml`
69+
- Purpose: Test infrastructure provisioning only
70+
71+
#### A.2: Create provision-only workflow
72+
73+
- [ ] **Task**: Create `.github/workflows/test-e2e-provision.yml`
74+
- Copy structure from existing `test-e2e.yml`
75+
- Use `cargo run --bin e2e-provision-tests`
76+
- Keep all LXD/OpenTofu setup steps
77+
- Remove Ansible installation (not needed for provision-only tests)
78+
79+
#### A.3: Create provision-only binary
80+
81+
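- [ ] **Task**: Create `src/bin/e2e_provision_tests.rs`, registered in `Cargo.toml` as a `[[bin]]` target named `e2e-provision-tests` (Cargo's auto-discovery would otherwise name the binary `e2e_provision_tests`)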
82+
- Copy code from `src/bin/e2e_tests.rs`
83+
- Remove the `configure_infrastructure()` call from `run_full_deployment_test()`
84+
- Focus only on:
85+
- `cleanup_lingering_resources()`
86+
- `provision_infrastructure()`
87+
- Basic validation that VM is created and cloud-init completed
88+
- `cleanup_infrastructure()`
89+
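The provision-only flow listed in A.3 could look roughly like the sketch below. The method names mirror the functions named above, but the `ProvisionEnv` trait, the `validate_provisioned_vm` step, the `String` error type, and all signatures are assumptions made for illustration; the real `e2e_tests.rs` may be structured differently.

```rust
/// Hypothetical abstraction over the steps named in A.3.
trait ProvisionEnv {
    fn cleanup_lingering_resources(&mut self) -> Result<(), String>;
    fn provision_infrastructure(&mut self) -> Result<(), String>;
    fn validate_provisioned_vm(&mut self) -> Result<(), String>;
    fn cleanup_infrastructure(&mut self) -> Result<(), String>;
}

/// Runs the provision-only flow. Cleanup always runs once provisioning has
/// been attempted, so a failed validation does not leak a VM.
fn run_provision_test<E: ProvisionEnv>(env: &mut E) -> Result<(), String> {
    env.cleanup_lingering_resources()?;
    let outcome = env
        .provision_infrastructure()
        .and_then(|()| env.validate_provisioned_vm());
    let cleanup = env.cleanup_infrastructure();
    // Report the first failure; otherwise report the cleanup result.
    outcome.and(cleanup)
}

/// Test double that records call order and can simulate a failed validation.
#[derive(Default)]
struct RecordingEnv {
    calls: Vec<&'static str>,
    fail_validation: bool,
}

impl ProvisionEnv for RecordingEnv {
    fn cleanup_lingering_resources(&mut self) -> Result<(), String> {
        self.calls.push("cleanup_lingering_resources");
        Ok(())
    }
    fn provision_infrastructure(&mut self) -> Result<(), String> {
        self.calls.push("provision_infrastructure");
        Ok(())
    }
    fn validate_provisioned_vm(&mut self) -> Result<(), String> {
        self.calls.push("validate_provisioned_vm");
        if self.fail_validation {
            Err("cloud-init did not finish".to_owned())
        } else {
            Ok(())
        }
    }
    fn cleanup_infrastructure(&mut self) -> Result<(), String> {
        self.calls.push("cleanup_infrastructure");
        Ok(())
    }
}
```

Keeping the flow behind a trait is one way to unit-test the ordering without LXD; the real binary would supply an implementation backed by OpenTofu/LXD.
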
90+
#### A.4: Update provision test validation
91+
92+
- [ ] **Task**: Modify validation logic in provision tests
93+
- Check VM/container exists and is running
94+
- Verify cloud-init has completed successfully
95+
- Validate basic network interface setup
96+
- Skip application-level validations
97+
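One possible way to implement the cloud-init check from A.4 is to run `cloud-init status --wait` inside the instance through `lxc exec` and parse the `status:` line it prints. The sketch below only builds the command and parses text; the function names and the `CloudInitState` enum are hypothetical, and the instance name is supplied by the caller.

```rust
use std::process::Command;

/// States distinguished by this sketch; anything else maps to `Unknown`.
#[derive(Debug, PartialEq, Eq)]
enum CloudInitState {
    Done,
    Running,
    Error,
    Unknown,
}

/// Parses the output of `cloud-init status`, which contains a line of the
/// form `status: <state>` (possibly preceded by progress dots with `--wait`).
fn parse_cloud_init_status(output: &str) -> CloudInitState {
    let state = output
        .lines()
        .find_map(|line| line.trim().strip_prefix("status:"))
        .map(str::trim);
    match state {
        Some("done") => CloudInitState::Done,
        Some("running") => CloudInitState::Running,
        Some("error") => CloudInitState::Error,
        _ => CloudInitState::Unknown,
    }
}

/// Builds `lxc exec <instance> -- cloud-init status --wait`.
/// The command is returned, not executed, so callers decide how to run it.
fn cloud_init_wait_command(instance: &str) -> Command {
    let mut cmd = Command::new("lxc");
    cmd.args(["exec", instance, "--", "cloud-init", "status", "--wait"]);
    cmd
}
```

A provision test would treat only `CloudInitState::Done` as success and fail on every other state, including `Unknown`.
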
98+
#### A.5: Test and commit provision workflow
99+
100+
- [ ] **Task**: Verify provision-only workflow works
101+
- Test locally: `cargo run --bin e2e-provision-tests`
102+
- Commit changes using the Conventional Commits format
103+
- Verify new GitHub workflow passes
104+
- Update workflow status badges in README if needed
105+
106+
### Phase B: Create E2E Configuration Tests
107+
108+
#### B.1: Research Docker container approach
109+
110+
- [ ] **Task**: Design Docker-based test environment
111+
- **Reference**: Use proven approach from [virtualization support research](https://github.com/josecelano/github-actions-virtualization-support)
112+
- Create Ubuntu 24.04 base container configuration
113+
- Investigate cloud-init support in Docker (or alternative initialization)
114+
- Research testcontainers integration for Rust
115+
- Document container networking requirements for Ansible
116+
- **Advantage**: Docker is well-established and reliable on GitHub Actions
117+
118+
#### B.2: Create Docker configuration
119+
120+
- [ ] **Task**: Create `docker/test-ubuntu/Dockerfile`
121+
- Ubuntu 24.04 base image
122+
- Cloud-init installation (if feasible) or alternative init system
123+
- SSH server configuration for Ansible connectivity
124+
- Network configuration for container accessibility
125+
- Required system dependencies
126+
127+
#### B.3: Create configuration-only binary
128+
129+
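- [ ] **Task**: Create `src/bin/e2e_config_tests.rs`, registered in `Cargo.toml` as a `[[bin]]` target named `e2e-config-tests` (Cargo's auto-discovery would otherwise name the binary `e2e_config_tests`)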
130+
- Copy code from the original `src/bin/e2e_tests.rs` (the full-deployment version, not the provision-only copy)
131+
- Replace LXD VM provisioning with Docker container setup
132+
- Implement Docker container lifecycle management
133+
- Keep all configuration, release, and run phase testing
134+
- Update infrastructure cleanup to handle Docker containers
135+
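The Docker container lifecycle management mentioned in B.3 could be handled with a small RAII guard around the Docker CLI, as in the sketch below. It assumes the direct-CLI approach rather than testcontainers; `ContainerGuard`, `docker_run_args`, and the container, image, and port values passed in are all hypothetical.

```rust
use std::io;
use std::process::Command;

/// Arguments for `docker run` that start a detached container and publish
/// its SSH port (22) on `host_ssh_port` so Ansible can reach it.
fn docker_run_args(name: &str, image: &str, host_ssh_port: u16) -> Vec<String> {
    vec![
        "run".to_owned(),
        "--detach".to_owned(),
        "--name".to_owned(),
        name.to_owned(),
        "--publish".to_owned(),
        format!("{host_ssh_port}:22"),
        image.to_owned(),
    ]
}

/// Owns a running container and force-removes it when dropped.
struct ContainerGuard {
    name: String,
}

impl ContainerGuard {
    /// Starts the container; fails if `docker run` exits unsuccessfully.
    fn start(name: &str, image: &str, host_ssh_port: u16) -> io::Result<Self> {
        let status = Command::new("docker")
            .args(docker_run_args(name, image, host_ssh_port))
            .status()?;
        if status.success() {
            Ok(Self { name: name.to_owned() })
        } else {
            Err(io::Error::new(
                io::ErrorKind::Other,
                format!("docker run failed: {status}"),
            ))
        }
    }
}

impl Drop for ContainerGuard {
    fn drop(&mut self) {
        // Best effort: ignore errors so cleanup never panics while unwinding.
        let _ = Command::new("docker")
            .args(["rm", "--force", self.name.as_str()])
            .status();
    }
}
```

With a guard like this, the container is removed even when a later configuration step returns early with an error, which covers the "update infrastructure cleanup" item without a separate cleanup call.
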
136+
#### B.4: Integrate testcontainers (optional)
137+
138+
- [ ] **Task**: Evaluate and potentially integrate testcontainers-rs
139+
- Add `testcontainers` crate dependency if beneficial
140+
- Implement container management through testcontainers API
141+
- Compare with direct Docker CLI approach
142+
- Document decision and rationale
143+
144+
#### B.5: Test configuration workflow locally
145+
146+
- [ ] **Task**: Validate configuration tests work locally
147+
- Test: `cargo run --bin e2e-config-tests`
148+
- Verify container creation and networking
149+
- Validate Ansible connectivity to container
150+
- Confirm all configuration/release/run phases complete
151+
- Test cleanup procedures
152+
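Before validating Ansible connectivity, the test probably needs to wait until the container's SSH port accepts connections. A minimal sketch using only the standard library follows; the function name, polling interval, and per-attempt timeout are assumptions.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `addr` until a TCP connection succeeds or `timeout` elapses.
/// Returns `true` if a connection was established in time.
fn wait_for_port(addr: SocketAddr, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if TcpStream::connect_timeout(&addr, Duration::from_millis(500)).is_ok() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        // Short pause between attempts to avoid a busy loop.
        sleep(Duration::from_millis(200));
    }
}
```

A successful connect only shows that something is listening on the port. Running an Ansible ping against the container afterwards is the stronger check that SSH authentication and Python are actually in place.
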
153+
#### B.6: Create configuration workflow
154+
155+
- [ ] **Task**: Create `.github/workflows/test-e2e-config.yml`
156+
- Remove LXD/OpenTofu setup steps
157+
- Keep Ansible installation
158+
- Add Docker setup if needed
159+
- Use `cargo run --bin e2e-config-tests`
160+
- Configure appropriate timeout limits
161+
162+
#### B.7: Test and commit configuration workflow
163+
164+
- [ ] **Task**: Verify configuration workflow on GitHub Actions
165+
- Commit configuration test changes
166+
- Verify new GitHub workflow passes
167+
- Test that Docker containers work correctly in GitHub Actions
168+
- Validate all software installation steps complete
169+
170+
### Phase C: Integration and Documentation
171+
172+
#### C.1: Update documentation
173+
174+
- [ ] **Task**: Update relevant documentation
175+
- Update `docs/e2e-testing.md` to reflect new split approach
176+
- Document how to run each test suite independently
177+
- Update `README.md` workflow badges for both test suites
178+
- Add troubleshooting guide for each test type
179+
180+
#### C.2: Update legacy workflow
181+
182+
- [ ] **Task**: Update or deprecate original E2E workflow
183+
- Option 1: Remove `.github/workflows/test-e2e.yml` entirely
184+
- Option 2: Convert to meta-workflow that runs both new test suites
185+
- Update any CI dependencies or status checks
186+
187+
#### C.3: Cleanup old binary (optional)
188+
189+
- [ ] **Task**: Remove or repurpose `src/bin/e2e_tests.rs`
190+
- Remove if no longer needed
191+
- Or repurpose as meta-test runner for both suites
192+
- Update any related documentation
193+
194+
#### C.4: Validate complete solution
195+
196+
- [ ] **Task**: End-to-end validation
197+
- Verify both test suites pass independently
198+
- Test that they can run in parallel without conflicts
199+
- Validate comprehensive coverage across all deployment phases
200+
- Confirm GitHub Actions reliability improvements
201+
202+
## Success Criteria
203+
204+
1. **Provision Tests**: Consistently pass on GitHub Actions, testing VM creation and cloud-init
205+
2. **Configuration Tests**: Consistently pass on GitHub Actions, testing software installation and deployment
206+
3. **Independence**: Each test suite can run independently without interference
207+
4. **Coverage**: Combined test suites provide coverage equivalent to or better than the original tests
208+
5. **Performance**: Overall test execution time is equal to or shorter than that of the original suite
209+
6. **Maintainability**: Clear separation of concerns makes debugging and maintenance easier
210+
211+
## Risks and Mitigations
212+
213+
### Risk: Docker environment differs from LXD VMs
214+
215+
- **Mitigation**: Configure the Docker container to match the LXD VM environment as closely as possible
216+
- **Validation**: Cross-reference configurations between Docker and LXD templates
217+
218+
### Risk: Testcontainers adds complexity
219+
220+
- **Mitigation**: Start with direct Docker approach, only add testcontainers if clearly beneficial
221+
- **Fallback**: Direct Docker CLI integration is simpler and well-documented
222+
223+
### Risk: Loss of end-to-end coverage
224+
225+
- **Mitigation**: Ensure that provision tests validate infrastructure is ready for configuration
226+
- **Validation**: Document the interface contract between provision and configuration phases
227+
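The interface contract between the two phases could be written down as a small type that the provision tests check and the configuration tests take as input. The sketch below is one hypothetical shape for it; the `ProvisionedHost` struct, its fields, and the specific checks are assumptions, not requirements stated in this plan.

```rust
use std::net::IpAddr;

/// Hypothetical description of what a provisioned host must offer before
/// the configuration phase starts. Field names are illustrative.
#[derive(Debug, Clone)]
struct ProvisionedHost {
    ip: IpAddr,
    ssh_port: u16,
    ssh_user: String,
    cloud_init_done: bool,
}

impl ProvisionedHost {
    /// Lists every unmet requirement. An empty list means the host is
    /// considered ready for configuration.
    fn contract_violations(&self) -> Vec<&'static str> {
        let mut problems = Vec::new();
        if self.ip.is_unspecified() {
            problems.push("no usable IP address");
        }
        if self.ssh_port == 0 {
            problems.push("SSH port not set");
        }
        if self.ssh_user.is_empty() {
            problems.push("SSH user not set");
        }
        if !self.cloud_init_done {
            problems.push("cloud-init has not finished");
        }
        problems
    }
}
```

If both suites share such a type through a library module, a change to the contract becomes a compile-time change in both places.
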
228+
### Risk: Increased maintenance burden
229+
230+
- **Mitigation**: Share common code between test suites through library modules
231+
- **Best Practice**: Keep test configurations as similar as possible between suites
