| 1 | +# E2E Configuration Testing with Docker Containers |
| 2 | + |
| 3 | +## Executive Summary |
| 4 | + |
| 5 | +This document covers Phase B.1 of the E2E test split. It summarizes the research findings and the proposed approach for replacing LXD VM-based testing with Docker container-based testing in the configuration, release, and run phases. |
| 6 | + |
| 7 | +## Background & Problem Statement |
| 8 | + |
| 9 | +- **Current Issue**: LXD VMs work for provisioning but fail during the configuration phase because of network connectivity issues inside the VMs when they run on GitHub Actions runners |
| 10 | +- **Root Cause**: GitHub Actions runners are themselves VMs, creating nested virtualization issues that prevent network connectivity required for software installation |
| 11 | +- **Solution**: Split testing into provision (LXD VMs) and configuration (Docker containers) phases |
| 12 | + |
| 13 | +## Configuration Testing Requirements |
| 14 | + |
| 15 | +Based on analysis of the current E2E workflow, the configuration tests need to validate: |
| 16 | + |
| 17 | +### 1. Software Installation (Configure Phase) |
| 18 | + |
| 19 | +- **Docker Installation**: Via Ansible playbook `install-docker.yml` |
| 20 | +- **Docker Compose Installation**: Via Ansible playbook `install-docker-compose.yml` |
| 21 | +- **APT Cache Updates**: Via `update-apt-cache.yml` |
| 22 | +- **Network connectivity**: For package downloads |
| 23 | + |
| 24 | +### 2. Infrastructure Validation (Test Phase) |
| 25 | + |
| 26 | +- **Cloud-init completion**: Verify initialization completed |
| 27 | +- **Docker service**: Verify Docker daemon is running |
| 28 | +- **Docker Compose**: Verify Docker Compose binary is functional |
| 29 | +- **SSH connectivity**: Ensure Ansible can connect and execute commands |
| 30 | + |
| 31 | +### 3. Current Ansible Workflow Integration |
| 32 | + |
| 33 | +- **Inventory management**: Dynamic inventory generation with container IP |
| 34 | +- **SSH-based execution**: Ansible connects via SSH to execute playbooks |
| 35 | +- **Privilege escalation**: Requires `sudo` access within the container |
| 36 | +- **Ubuntu 24.04 target**: Current templates target Ubuntu 24.04 LTS |
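
The inventory generation step above can be sketched as a small pure function. This is a minimal sketch, not the project's actual generator: the `InventoryHost` struct, the `render_inventory` function, and the example values are hypothetical. The `ansible_*` keys are standard Ansible connection variables.

```rust
/// Connection details for one Ansible target.
/// The struct and its field names are illustrative, not taken from the project.
pub struct InventoryHost<'a> {
    pub alias: &'a str,
    pub host: &'a str,
    pub port: u16,
    pub user: &'a str,
    pub private_key_path: &'a str,
}

/// Renders a single-host INI inventory.
/// `ansible_become=true` asks Ansible to escalate privileges (sudo by default).
pub fn render_inventory(h: &InventoryHost) -> String {
    format!(
        "[all]\n{} ansible_host={} ansible_port={} ansible_user={} \
         ansible_ssh_private_key_file={} ansible_become=true\n",
        h.alias, h.host, h.port, h.user, h.private_key_path
    )
}
```

When the container's SSH port is published on the host, `host` would typically be `127.0.0.1` and `port` the mapped host port rather than the container's internal IP and port 22.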
| 37 | + |
| 38 | +## Docker Container Approach |
| 39 | + |
| 40 | +### Container Requirements |
| 41 | + |
| 42 | +1. **Base Image**: Ubuntu 24.04 LTS (to match current VM environment) |
| 43 | +2. **SSH Server**: OpenSSH server for Ansible connectivity |
| 44 | +3. **Systemd**: For service management (Docker daemon, etc.) |
| 45 | +4. **Sudo Access**: For privilege escalation during software installation |
| 46 | +5. **Network Access**: For package downloads and installations |
| 47 | +6. **Init System**: Alternative to cloud-init for container initialization |
| 48 | + |
| 49 | +### Container Configuration Strategy |
| 50 | + |
| 51 | +#### Option 1: Custom Dockerfile (Recommended) |
| 52 | + |
| 53 | +- **Base**: `ubuntu:24.04` |
| 54 | +- **SSH Setup**: Install and configure OpenSSH server |
| 55 | +- **Systemd**: Enable systemd for service management |
| 56 | +- **User Setup**: Create user with sudo access |
| 57 | +- **Network**: Default Docker networking (sufficient for GitHub Actions) |
| 58 | + |
| 59 | +#### Option 2: Pre-built Image with SSH |
| 60 | + |
| 61 | +- **Base**: Existing Ubuntu images with SSH enabled |
| 62 | +- **Pros**: Faster setup, less maintenance |
| 63 | +- **Cons**: Less control, may not match exact VM environment |
| 64 | + |
| 65 | +### Cloud-init Alternative |
| 66 | + |
| 67 | +Because cloud-init targets cloud instances and VMs and does not normally run inside a Docker container, the containers need an alternative initialization path: |
| 68 | + |
| 69 | +1. **Container Init Scripts**: Custom initialization via entrypoint script |
| 70 | +2. **SSH Key Injection**: Mount SSH keys via Docker volumes or copy |
| 71 | +3. **User Provisioning**: Direct user/key setup instead of cloud-init |
| 72 | +4. **Service Initialization**: Direct systemd service management |
| 73 | + |
| 74 | +## Research References |
| 75 | + |
| 76 | +### Docker-in-VM Testing Research |
| 77 | + |
| 78 | +- **[Virtualization Support Research](https://github.com/josecelano/github-actions-virtualization-support)**: Comprehensive testing of virtualization tools on GitHub Actions, demonstrating Docker feasibility |
| 79 | +- **[Docker-in-VM Test Repository](https://github.com/josecelano/test-docker-install-inside-vm-in-runner)**: Specific research on Docker installation within VMs on GitHub Actions runners, documenting the network connectivity issues |
| 80 | + |
| 81 | +### Related Issues |
| 82 | + |
| 83 | +- **[GitHub Actions Runner Images Issue #13003](https://github.com/actions/runner-images/issues/13003)**: Network connectivity issues with LXD VMs on GitHub runners |
| 84 | +- **[Original Virtualization Investigation](https://github.com/actions/runner-images/issues/12933)**: Background context on GitHub Actions virtualization support |
| 85 | + |
| 86 | +## Testcontainers Integration Analysis |
| 87 | + |
| 88 | +### Benefits of testcontainers-rs |
| 89 | + |
| 90 | +- **Container Lifecycle Management**: Automatic startup/cleanup |
| 91 | +- **Network Management**: Automatic port mapping and network configuration |
| 92 | +- **Integration**: Well integrated with the Rust testing ecosystem |
| 93 | +- **Parallel Testing**: Multiple containers can run in parallel |
| 94 | + |
| 95 | +### Implementation Approach |
| 96 | + |
| 97 | +- **Generic Image**: Use `testcontainers::GenericImage` for Ubuntu container |
| 98 | +- **Custom Configuration**: Configure SSH, systemd, and networking |
| 99 | +- **Volume Mounting**: SSH keys and test artifacts |
| 100 | +- **Port Mapping**: SSH port (22) mapping for Ansible connectivity |
| 101 | + |
| 102 | +### Alternative: Direct Docker CLI |
| 103 | + |
| 104 | +- **Simpler Setup**: Direct `docker run` commands |
| 105 | +- **Fewer Dependencies**: No additional crates required |
| 106 | +- **Manual Management**: Explicit container lifecycle management |
| 107 | +- **More Control**: Direct control over Docker operations |
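
To make the direct Docker CLI alternative concrete, here is a minimal sketch that uses only the Rust standard library. The names `docker_run_args` and `ContainerGuard`, the image tag, and the port numbers are assumptions for illustration. The `docker` flags used (`--detach`, `--rm`, `--name`, `--publish`, and `rm --force`) are standard CLI options.

```rust
use std::io;
use std::process::Command;

/// Builds the arguments for a detached `docker run` that publishes the
/// container's SSH port (22) on the host loopback interface.
pub fn docker_run_args(name: &str, image: &str, host_ssh_port: u16) -> Vec<String> {
    vec![
        "run".to_string(),
        "--detach".to_string(),
        "--rm".to_string(),
        "--name".to_string(),
        name.to_string(),
        "--publish".to_string(),
        format!("127.0.0.1:{}:22", host_ssh_port),
        image.to_string(),
    ]
}

/// Removes the named container when dropped, so a failing test still cleans up.
/// This is the manual lifecycle management the CLI approach requires.
pub struct ContainerGuard {
    name: String,
}

impl ContainerGuard {
    /// Starts the container and returns a guard that owns its cleanup.
    pub fn start(name: &str, image: &str, host_ssh_port: u16) -> io::Result<Self> {
        let status = Command::new("docker")
            .args(docker_run_args(name, image, host_ssh_port))
            .status()?;
        if !status.success() {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("docker run exited with {}", status),
            ));
        }
        Ok(Self { name: name.to_string() })
    }
}

impl Drop for ContainerGuard {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored because Drop cannot fail.
        let _ = Command::new("docker")
            .args(["rm", "--force", self.name.as_str()])
            .status();
    }
}
```

Keeping the argument builder separate from process execution lets the command line be unit tested without a Docker daemon. testcontainers-rs would replace the guard with its own lifecycle handling.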
| 108 | + |
| 109 | +## Network Configuration |
| 110 | + |
| 111 | +### Ansible Connectivity Requirements |
| 112 | + |
| 113 | +1. **SSH Access**: Container must accept SSH connections |
| 114 | +2. **Port Mapping**: Map container SSH port to host |
| 115 | +3. **IP Address**: Deterministic container IP for Ansible inventory |
| 116 | +4. **DNS Resolution**: Container must resolve package repositories |
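
Before Ansible runs, the test harness needs to know that the SSH requirement above is actually met. The sketch below polls the mapped port until the server sends an SSH identification string. The function names and timeouts are hypothetical. Checking for the `SSH-` prefix rather than only a successful TCP connect is a design assumption: a published Docker port may accept connections before `sshd` is listening inside the container.

```rust
use std::io::Read;
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Returns true if the peer sends bytes starting with "SSH-".
/// Servers may legally send other lines first; this sketch assumes they do not,
/// which matches OpenSSH's default behaviour.
fn ssh_banner_seen(addr: &SocketAddr) -> bool {
    let mut stream = match TcpStream::connect_timeout(addr, Duration::from_millis(500)) {
        Ok(s) => s,
        Err(_) => return false,
    };
    let _ = stream.set_read_timeout(Some(Duration::from_millis(500)));
    let mut buf = [0u8; 4];
    stream.read_exact(&mut buf).is_ok() && &buf == b"SSH-"
}

/// Polls `addr` until an SSH banner is seen or `deadline` has elapsed.
pub fn wait_for_ssh_banner(addr: SocketAddr, deadline: Duration) -> bool {
    let start = Instant::now();
    while start.elapsed() < deadline {
        if ssh_banner_seen(&addr) {
            return true;
        }
        sleep(Duration::from_millis(200));
    }
    false
}
```

A caller would pass the host address and mapped port, for example `"127.0.0.1:2222".parse().unwrap()`, with a deadline of a few tens of seconds.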
| 117 | + |
| 118 | +### GitHub Actions Networking |
| 119 | + |
| 120 | +- **Docker Networking**: Works reliably on GitHub Actions |
| 121 | +- **Port Mapping**: Standard Docker port mapping supported |
| 122 | +- **Internet Access**: Containers have internet access for package downloads |
| 123 | +- **No Nested Virtualization**: Avoids the LXD VM networking issues |
| 124 | + |
| 125 | +## Implementation Plan Summary |
| 126 | + |
| 127 | +### Phase B.1 Deliverables |
| 128 | + |
| 129 | +1. **Docker Configuration**: Create `docker/test-ubuntu/Dockerfile` |
| 130 | +2. **Container Setup**: Ubuntu 24.04 with SSH, systemd, sudo user |
| 131 | +3. **Integration Strategy**: Document testcontainers vs direct Docker approach |
| 132 | +4. **Network Requirements**: Document Ansible connectivity requirements |
| 133 | +5. **Cloud-init Alternative**: Design container initialization approach |
| 134 | + |
| 135 | +### Next Steps (B.2+) |
| 136 | + |
| 137 | +1. **Docker Implementation**: Build and test Docker configuration |
| 138 | +2. **Binary Creation**: Implement `e2e-config-tests` binary |
| 139 | +3. **Container Management**: Integrate container lifecycle with tests |
| 140 | +4. **Local Testing**: Validate complete workflow locally |
| 141 | +5. **CI Integration**: Create GitHub Actions workflow |
| 142 | + |
| 143 | +## Technical Architecture |
| 144 | + |
| 145 | +```text |
| 146 | +┌─────────────────────────────────────────────────────────┐ |
| 147 | +│ GitHub Actions Runner │ |
| 148 | +│ ┌─────────────────────────────────────────────────────┐│ |
| 149 | +│ │ e2e-config-tests binary ││ |
| 150 | +│ │ ┌─────────────────────────────────────────────────┐ ││ |
| 151 | +│ │ │ Docker Container │ ││ |
| 152 | +│ │ │ ┌─────────────────────────────────────────────┐ │ ││ |
| 153 | +│ │ │ │ Ubuntu 24.04 LTS │ │ ││ |
| 154 | +│ │ │ │ - SSH Server (port 22) │ │ ││ |
| 155 | +│ │ │ │ - Systemd (service management) │ │ ││ |
| 156 | +│ │ │ │ - Sudo user (ansible connectivity) │ │ ││ |
| 157 | +│ │ │ │ - Package management (apt) │ │ ││ |
| 158 | +│ │ │ └─────────────────────────────────────────────┘ │ ││ |
| 159 | +│ │ └─────────────────────────────────────────────────┘ ││ |
| 160 | +│ │ ▲ ││ |
| 161 | +│ │ │ SSH Connection ││ |
| 162 | +│ │ ▼ ││ |
| 163 | +│ │ ┌─────────────────────────────────────────────────┐ ││ |
| 164 | +│ │ │ Ansible Client │ ││ |
| 165 | +│ │ │ - install-docker.yml │ ││ |
| 166 | +│ │ │ - install-docker-compose.yml │ ││ |
| 167 | +│ │ │ - inventory generation │ ││ |
| 168 | +│ │ └─────────────────────────────────────────────────┘ ││ |
| 169 | +│ └─────────────────────────────────────────────────────┘│ |
| 170 | +└─────────────────────────────────────────────────────────┘ |
| 171 | +``` |
| 172 | + |
| 173 | +## Risk Assessment |
| 174 | + |
| 175 | +### Low Risk |
| 176 | + |
| 177 | +- **Docker Support**: Well-established and reliable on GitHub Actions |
| 178 | +- **Network Connectivity**: Docker containers have consistent internet access |
| 179 | +- **Package Installation**: No nested virtualization issues |
| 180 | + |
| 181 | +### Medium Risk |
| 182 | + |
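| 183 | +- **Systemd in Containers**: systemd does not run in a default Docker container and usually needs extra configuration, such as elevated privileges and cgroup access |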
| 184 | +- **SSH Setup**: Need to ensure SSH server starts correctly |
| 185 | +- **Performance**: Container overhead vs VM performance |
| 186 | + |
| 187 | +### Mitigation Strategies |
| 188 | + |
| 189 | +- **Systemd**: Use proven systemd-in-Docker patterns |
| 190 | +- **SSH Testing**: Validate SSH connectivity in local testing phase |
| 191 | +- **Documentation**: Comprehensive troubleshooting documentation |
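
As an illustration of the systemd mitigation, the sketch below collects one commonly used set of `docker run` flags for booting systemd as PID 1. The flag set is an assumption, not a verified configuration for this project: the flags that are required depend on the host's cgroup version and Docker release, and `--privileged` is much broader than strictly necessary. The constant and function names are hypothetical.

```rust
/// One commonly used flag set for running systemd inside a container.
/// Assumption: a cgroup v2 host; other hosts may need a different set.
pub const SYSTEMD_FLAGS: [&str; 8] = [
    "--privileged",                      // broad, but the simplest starting point
    "--cgroupns=host",                   // share the host cgroup namespace
    "--volume",
    "/sys/fs/cgroup:/sys/fs/cgroup:rw",  // let systemd manage cgroups
    "--tmpfs",
    "/run",                              // systemd expects /run to be a tmpfs
    "--tmpfs",
    "/run/lock",
];

/// Builds a detached `docker run` command line with the systemd flags applied.
/// The image is expected to start systemd as its entrypoint or command.
pub fn systemd_run_command(image: &str) -> Vec<String> {
    let mut cmd = vec!["run".to_string(), "--detach".to_string()];
    cmd.extend(SYSTEMD_FLAGS.iter().map(|f| f.to_string()));
    cmd.push(image.to_string());
    cmd
}
```

The flag set should be validated during the local testing phase, then narrowed if a less privileged configuration proves sufficient.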
| 192 | + |
| 193 | +## Conclusion |
| 194 | + |
| 195 | +Docker containers provide a viable and reliable alternative to LXD VMs for configuration testing. The approach addresses the core network connectivity issues while maintaining compatibility with the existing Ansible-based configuration workflow. The implementation should start with a custom Ubuntu 24.04 Dockerfile and consider testcontainers-rs integration for better test lifecycle management. |