Commit 63eee73 (parent aa7fb78)

docs: update Ansible testing strategy to LXD-only approach

- Update ansible-testing-strategy.md to reflect LXD-exclusive testing
- Add decision record rejecting Docker containers for Ansible testing
- Document comprehensive research comparing Docker vs LXD performance
- Establish VM reuse strategy and sequential playbook execution methodology

Key changes:

- Rejected hybrid Docker/LXD approach due to Docker-in-Docker limitations
- Adopted single-platform LXD strategy for complete testing coverage
- Performance: ~17.6s setup + ~3-28s per playbook vs Docker's limitations
- Sequential testing validates real deployment integration scenarios

Resolves infrastructure testing strategy decisions for Torrust project.

3 files changed: +1851, −51 lines (this file: 205 additions, 0 deletions)
# Decision Record: Rejecting Docker Containers for Ansible Testing

**Date**: September 2, 2025
**Status**: Accepted
**Decision Makers**: Infrastructure Team
**Related Research**: [Docker vs LXD Ansible Testing Research](../research/docker-vs-lxd-ansible-testing.md)

## Context and Problem Statement
During the development of the Torrust Testing Infrastructure project, we needed to establish a testing strategy for Ansible playbooks. The initial hypothesis was that Docker containers could provide faster testing cycles than full VMs or LXD containers, offering a significant improvement in development velocity.

The core question was: **Should we use lightweight Docker containers for Ansible playbook testing to achieve faster feedback loops?**
## Decision Drivers

- **Development Speed**: Faster test cycles enable quicker iteration
- **Resource Efficiency**: Lower resource consumption for CI/CD pipelines
- **Comprehensive Testing**: Need to test both infrastructure and application deployment playbooks
- **Production Parity**: Test environment should behave like production cloud VMs
- **Maintenance Overhead**: Simpler testing infrastructure reduces long-term costs
## Considered Options

### Option A: Docker Containers for Testing

**Approach**: Use Docker containers as Ansible testing targets

**Pros**:

- Fast container startup (~2-5 seconds)
- Lightweight resource usage
- Easy to create multiple test scenarios
- Good integration with CI/CD systems
- Familiar technology stack

**Cons**:

- Limited systemd support in containers
- Cannot test Docker-in-Docker scenarios reliably
- Restricted networking capabilities
- Missing kernel-level features
- Cannot validate cloud-init integration
### Option B: LXD Containers for Testing

**Approach**: Use LXD containers as Ansible testing targets

**Pros**:

- Full systemd support
- Real Docker daemon capabilities
- Complete networking stack
- Cloud-init compatibility
- Production-equivalent behavior

**Cons**:

- Slower than Docker containers (~17 seconds setup)
- Higher resource usage
- More complex initial setup
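As a sketch of how Option B works in practice, the following commands launch an Ubuntu LXD container and run a playbook against it. This is illustrative only and assumes an LXD host; the container name `torrust-test`, the image alias, and the playbook path are not part of this decision record:

```shell
#!/usr/bin/env bash
# Sketch: an LXD container as an Ansible test target (names are illustrative).
set -euo pipefail

# Launch an Ubuntu container to act as the test VM.
lxc launch ubuntu:24.04 torrust-test

# Wait until the container has an IPv4 address, then capture it.
until lxc list torrust-test -c 4 --format csv | grep -q .; do sleep 1; done
IP=$(lxc list torrust-test -c 4 --format csv | cut -d' ' -f1)

# Run a playbook against the container using an inline inventory.
ansible-playbook -i "${IP}," -u root playbooks/install-docker.yml
```

Because the container boots a full userland with systemd and a real network stack, the same playbooks that target production cloud VMs run unmodified.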
### Option C: Hybrid Approach

**Approach**: Use Docker for basic playbooks, LXD for complex scenarios

**Pros**:

- Optimized speed for simple tests
- Complete coverage for complex scenarios

**Cons**:

- Increased maintenance complexity
- Dual testing infrastructure
- Potential inconsistencies between environments
## Decision Outcome

**Chosen Option**: Option B - LXD Containers Exclusively

**Rationale**: After comprehensive research and testing, we reject Docker containers for Ansible testing in favor of LXD containers for the following critical reasons:
## Key Findings That Led to Rejection

### 1. Docker-in-Docker Impossibility

**Problem**: Core Torrust infrastructure requires Docker daemon functionality for application deployment.

**Evidence**:

```bash
# Docker container test result
TASK [Start Docker service] *****
fatal: [torrust-docker]: FAILED! => {
    "msg": "Could not start docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock"
}
```

**Impact**: Cannot test real application deployment scenarios that require Docker Compose stack management.
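By contrast, an LXD container with nesting enabled can run a real Docker daemon. A quick check might look like the following sketch (`security.nesting` is a standard LXD config key; the container name and package choice are illustrative):

```shell
# Sketch: confirm a real Docker daemon runs inside an LXD container.
# Requires an LXD host; names are illustrative.
lxc launch ubuntu:24.04 torrust-test -c security.nesting=true

# Install Docker from the Ubuntu archive inside the container.
lxc exec torrust-test -- sh -c "apt-get update -qq && apt-get install -y -qq docker.io"

# The daemon is managed by systemd and can run nested containers.
lxc exec torrust-test -- systemctl is-active docker
lxc exec torrust-test -- docker run --rm hello-world
```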
### 2. Systemd Service Management Failures

**Problem**: Many infrastructure playbooks require systemd service management.

**Evidence**:

```bash
# Docker container limitation
TASK [Enable UFW firewall service] *****
fatal: [torrust-docker]: FAILED! => {
    "msg": "Could not enable service ufw: System has not been booted with systemd"
}
```

**Impact**: Cannot test essential infrastructure services like firewalls, networking, or service management.
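The difference is easy to demonstrate: an LXD system container boots systemd as PID 1, while a stock Docker container runs only its entrypoint process. A sketch of the comparison (container names are illustrative):

```shell
# In an LXD container, PID 1 is systemd, so services can be enabled,
# started, and queried exactly as on a production VM.
lxc exec torrust-test -- ps -p 1 -o comm=
lxc exec torrust-test -- systemctl is-system-running --wait

# In a stock Docker container, PID 1 is whatever command was run --
# here `ps` itself -- so there is no init system to manage services.
docker run --rm ubuntu:24.04 ps -p 1 -o comm=
```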
### 3. Limited Network Configuration Testing

**Problem**: Firewall and networking configuration requires kernel-level capabilities.

**Evidence**:

- UFW firewall cannot be enabled in Docker containers
- iptables manipulation is restricted
- Network interface management is limited

**Impact**: Cannot validate network security configurations that are critical for production deployment.
### 4. Cloud-Init Integration Gap

**Problem**: Cloud-init testing cannot be properly simulated in Docker containers.

**Evidence**:

- No real cloud-init process execution
- Missing cloud metadata simulation
- Cannot test cloud-init dependent initialization sequences

**Impact**: Cannot validate the complete VM initialization process that occurs in production cloud environments.
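LXD's Ubuntu images ship with cloud-init, so user-data can be exercised for real rather than simulated. A minimal sketch (the `user.user-data` key is a standard LXD config option; the user-data content and container name are illustrative):

```shell
# Sketch: pass cloud-init user-data to an LXD container and wait for it.
lxc launch ubuntu:24.04 torrust-ci --config=user.user-data="#cloud-config
packages:
  - curl"

# Block until cloud-init finishes, then inspect the result.
lxc exec torrust-ci -- cloud-init status --wait
lxc exec torrust-ci -- cloud-init status --long
```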
## Performance Analysis

Despite Docker's speed advantage, the performance difference is not significant enough to justify the functional limitations:

| Metric               | Docker Container | LXD Container  | Difference     |
| -------------------- | ---------------- | -------------- | -------------- |
| Initial Setup        | ~3-5 seconds     | ~17.6 seconds  | +12-14 seconds |
| Playbook Execution   | ~4-5 seconds     | ~3-28 seconds  | Variable       |
| **Total Test Cycle** | ~7-10 seconds    | ~20-45 seconds | +13-35 seconds |

**Analysis**: The 13-35 second overhead is acceptable when weighed against the comprehensive testing capabilities that LXD provides.
## Alternative Approaches Considered and Rejected

### 1. Simulation-Based Testing

**Approach**: Mock Docker daemon and systemd services in Docker containers
**Rejection Reason**: Testing mocks instead of real services provides false confidence

### 2. Staged Testing Pipeline

**Approach**: Basic tests in Docker, comprehensive tests in LXD
**Rejection Reason**: Dual infrastructure complexity outweighs benefits; inconsistent results between environments

### 3. Enhanced Docker Images

**Approach**: Pre-install Docker daemon and systemd in Docker containers
**Rejection Reason**: Cannot overcome fundamental Docker-in-Docker and kernel-level limitations
## Implementation Consequences

### Positive Consequences

- **Complete Test Coverage**: All infrastructure and application playbooks can be tested
- **Production Parity**: Test results accurately predict production behavior
- **Single Testing Platform**: Reduced complexity and maintenance overhead
- **Real Integration Testing**: Can validate complete deployment workflows

### Negative Consequences

- **Slower Initial Feedback**: ~17 seconds setup vs ~3 seconds for Docker
- **Higher Resource Usage**: LXD containers consume more memory and CPU
- **Setup Complexity**: LXD requires more initial configuration than Docker

### Mitigation Strategies

- **VM Reuse**: Reuse LXD containers across multiple playbook tests to amortize setup costs
- **Sequential Testing**: Execute playbooks in deployment order to test real integration scenarios
- **Parallel CI**: Run multiple LXD containers in parallel for different test scenarios
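The first two mitigations can be sketched as a single script: the container is launched once, and the playbooks then run against it in deployment order. The container name, playbook list, and inventory path below are illustrative, not the project's actual layout:

```shell
#!/usr/bin/env bash
# Sketch: reuse one LXD container and run playbooks sequentially
# in deployment order. Names and paths are illustrative.
set -euo pipefail

CONTAINER=torrust-test

# Pay the ~17.6s setup cost once; reuse the container if it exists.
lxc info "$CONTAINER" >/dev/null 2>&1 || lxc launch ubuntu:24.04 "$CONTAINER"

# Each playbook runs against the state left by the previous one,
# mirroring a real deployment rather than isolated unit tests.
for playbook in provision.yml install-docker.yml deploy-stack.yml; do
    ansible-playbook -i inventory/lxd.yml "playbooks/${playbook}"
done
```

Amortized across a sequence of playbooks, this keeps the per-playbook cost close to the ~3-28 second execution times measured above.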
## Monitoring and Review

This decision will be reviewed if:

- Docker container technology evolves to support Docker-in-Docker reliably
- Alternative lightweight virtualization technologies emerge
- Performance requirements change significantly
- Testing requirements become less comprehensive
## References

- [Docker vs LXD Ansible Testing Research](../research/docker-vs-lxd-ansible-testing.md)
- [Ansible Testing Strategy](../research/ansible-testing-strategy.md)
- [Docker-in-Docker Limitations Documentation](https://docs.docker.com/engine/security/rootless/#known-limitations)
- [LXD vs Docker Comparison](https://ubuntu.com/blog/lxd-vs-docker)
