Commit be00228

docs: add DRAFT issue spec for Docker and UFW firewall security strategy
Critical security issue discovered during Grafana implementation (#246): Docker bypasses UFW firewall rules when publishing ports, exposing services even with UFW default deny policy.

This draft issue specification documents:

- Problem: Docker manipulates iptables directly, bypassing UFW
- Discovery: Prometheus port 9090 exposed despite UFW deny incoming policy
- Original assumption: UFW would secure entire instance (INVALID)
- Proposed solution: Layered approach (UFW for SSH, Docker for services)
- Questions to investigate before making architectural decision
- Required research, analysis, and ADR creation phases

Related issues:

- #246 - Grafana slice (where this was discovered)
- torrust-demo#72 - Docker bypassing systemd-resolved

Priority: CRITICAL - Affects security of all Docker-based deployments
Status: DRAFT - Needs thorough analysis before implementation
Next steps: Research → Analysis → ADR → Implementation
1 parent 99b1339 commit be00228

1 file changed: +341 -0 lines changed

@@ -0,0 +1,341 @@
# DRAFT: Docker and UFW Firewall Security Strategy

**Status**: DRAFT - Needs Analysis
**Priority**: CRITICAL - Security Issue
**Issue Type**: Architecture / Security
**Related Issues**:

- [#246 - Grafana slice](./246-grafana-slice-release-run-commands.md) (where this was discovered)
- [torrust-demo#72 - Docker bypassing systemd-resolved](https://github.com/torrust/torrust-demo/issues/72)

## Problem Statement

During implementation of issue #246 (Grafana slice), we discovered that **Docker bypasses UFW firewall rules**, exposing services even when UFW is configured with a "deny incoming" default policy.

### Current Architecture Assumption (INVALID)

The original deployment strategy assumed:

1. Use UFW firewall to secure the entire VM instance
2. Only open specific ports that should be publicly accessible
3. Avoid provider-specific firewalls to maintain provider-agnostic deployment
4. Default deny all incoming traffic except explicitly allowed services

**This assumption is INVALID** because Docker manipulates iptables directly, bypassing UFW rules.

### Discovered Security Issue

**Scenario**: Prometheus service configured in docker-compose with port binding:

```yaml
prometheus:
  ports:
    - "9090:9090" # Binds to 0.0.0.0:9090
```

**Expected Behavior**:

- UFW default policy: deny incoming
- Port 9090 NOT explicitly allowed in UFW
- Port 9090 should be inaccessible from external network

**Actual Behavior**:

- Prometheus UI accessible at `http://<vm-ip>:9090` from external network
- UFW rules completely bypassed
- Security breach - internal service exposed publicly

**Root Cause**: When a port is published on all interfaces (`0.0.0.0:<host-port>:<container-port>`, which is what the short form `9090:9090` means), Docker adds its own iptables rules: a DNAT rule in the `nat` table plus accept rules on the `FORWARD` path. UFW's allow/deny rules for incoming traffic are evaluated on the `INPUT` path, so packets destined for a published port never reach them.

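
The root cause above can be checked on a host by listing the DNAT rules Docker has installed. The following is a minimal sketch, assuming the iptables backend and Docker's `DOCKER` chain in the `nat` table; the helper name `docker_published_ports` is hypothetical and not part of this project.

```bash
#!/usr/bin/env bash
# Sketch: list host ports that Docker forwards to containers via DNAT.
# Reads rules in `iptables -S` format on stdin, so it works on live output
# or on a saved dump. The function name is hypothetical.
docker_published_ports() {
  awk '
    /-j DNAT/ {
      port = ""; dest = ""
      # Walk the rule tokens and pick up the values that follow each flag.
      for (i = 1; i < NF; i++) {
        if ($i == "--dport") port = $(i + 1)
        if ($i == "--to-destination") dest = $(i + 1)
      }
      if (port != "") print port " -> " dest
    }
  '
}

# Usage on the VM (requires root):
#   sudo iptables -t nat -S DOCKER | docker_published_ports
#   sudo iptables -S ufw-user-input   # UFW allow rules, evaluated on the INPUT path
```

Any port printed by the helper is reachable through the forwarding path regardless of what `ufw status` reports, which is why the two views should be compared rather than trusting UFW alone.
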
### Where This Was Discovered

**File**: `templates/docker-compose/docker-compose.yml.tera`
**Commit**: Security fix applied in commit 8323def
**Issue**: #246 - Grafana slice implementation

**Evidence**:

```bash
# UFW status shows port 9090 NOT allowed
$ sudo ufw status | grep 9090
# (no output - port not in UFW rules)

# But Prometheus is accessible externally
# (-I prints the response status line; any HTTP response proves the port is reachable)
$ curl -I http://10.140.190.35:9090
HTTP/1.1 405 Method Not Allowed # Accessible!
```

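
The manual check in the evidence can be turned into a repeatable probe. This is a minimal sketch, assuming bash (for its `/dev/tcp` pseudo-device) and coreutils `timeout`; the function name `tcp_port_open` and the 3-second limit are arbitrary choices for illustration.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if a TCP connection to host:port can be opened.
# Uses bash's /dev/tcp pseudo-device; the connection is closed as soon as
# the child shell exits. Name and timeout are hypothetical choices.
tcp_port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage from a machine OUTSIDE the VM:
#   if tcp_port_open 10.140.190.35 9090; then
#     echo "port 9090 is reachable from outside"
#   fi
```

The probe has to run from a different machine than the VM, because a connection made on the host itself does not take the same path as external traffic.
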
**Manual testing documentation**: [docs/e2e-testing/manual/grafana-testing-results.md](../e2e-testing/manual/grafana-testing-results.md)

## Original Security Strategy

The deployment was designed to:

1. **Use UFW exclusively** for firewall management (provider-agnostic)
2. **Avoid provider-specific firewalls** (AWS Security Groups, Hetzner Cloud Firewall, etc.)
3. **Maintain portability** across different hosting providers
4. **Simple configuration** - single firewall mechanism (UFW)

**Rationale**: Integrating with multiple provider-specific firewalls would significantly increase complexity and make deployment harder across different providers.

**NOTE**: No ADR was created for this decision initially, but it was the working assumption.

## Potential Solution (Needs Validation)

### Proposed Strategy

Use a **layered security approach** combining UFW and Docker networking:

#### Layer 1: UFW Firewall (Instance-Level Protection)

- **Purpose**: Secure the entire VM instance
- **Configuration**: Deny all incoming traffic except SSH
- **Responsibility**: Prevent unauthorized access to the instance itself

```yaml
# templates/ansible/configure-firewall.yml (intended behavior, not literal tasks)
- Set default policy: deny incoming
- Allow only SSH port (22 or custom)
- Do NOT allow application ports (tracker, grafana, etc.)
```

#### Layer 2: Docker Port Bindings (Service-Level Exposure)

- **Purpose**: Selectively expose services to external network
- **Configuration**: Only bind ports for public-facing services
- **Responsibility**: Control which services are accessible from outside

```yaml
# templates/docker-compose/docker-compose.yml.tera

# Public services - port binding
tracker:
  ports:
    - "8080:8080" # Public API
    - "6969:6969/udp" # Public tracker

grafana:
  ports:
    - "3100:3000" # Public UI

# Internal services - NO port binding
prometheus:
  # No ports section - internal only
  # Accessed via Docker network: http://prometheus:9090

mysql:
  # No ports section - internal only
  # Accessed via Docker network: mysql:3306
```

#### Layer 3: Docker Internal Networks (Inter-Service Communication)

- **Purpose**: Allow services to communicate securely within Docker
- **Configuration**: Use Docker network names for service discovery
- **Responsibility**: Internal service communication without external exposure

```yaml
networks:
  backend_network: {}

services:
  grafana:
    networks:
      - backend_network
    # Connects to Prometheus via: http://prometheus:9090

  prometheus:
    networks:
      - backend_network
    # Connects to Tracker via: http://tracker:8080
```

### Key Principle

UFW secures the instance, Docker secures the services:

- UFW closes everything except SSH (instance-level security)
- Docker port bindings control external service exposure (service-level security)
- Docker networks enable internal service communication (no external exposure)

### Benefits

1. ✅ **Provider-agnostic** - Works on any VM provider without provider-specific firewall integration
2. ✅ **Layered security** - Multiple security boundaries
3. ✅ **Explicit exposure** - Port bindings make it clear what's public vs internal
4. ✅ **Simple configuration** - No need for UFW rules per service
5. ✅ **Docker-native** - Leverages Docker's built-in networking and security

### Drawbacks

1. ⚠️ **UFW not controlling application ports** - Relies on correct docker-compose configuration
2. ⚠️ **Human error risk** - Mistakenly adding a port binding exposes the service immediately
3. ⚠️ **No defense-in-depth for Docker** - If docker-compose is misconfigured, nothing else stops the service from being exposed
4. ⚠️ **Trust in Docker networking** - Assumes Docker network isolation is secure

## Questions to Investigate

### Technical Questions

1. **Docker Network Isolation**: How secure is Docker's internal network isolation? Can containers on different networks communicate?

2. **Port Binding Risk**: What happens if a developer accidentally adds a port binding to an internal service? Is there any safeguard?

3. **iptables Priority**: Can we make host-level rules take precedence over Docker's iptables rules? Docker documents the `DOCKER-USER` chain for user-defined rules that are evaluated before its own, and the `ufw-docker` project listed under "Similar Problems in the Wild" builds on it. This needs evaluation before assuming it cannot be done without breaking Docker.

4. **Alternative Solutions**:

   - Could we use `127.0.0.1:<host-port>:<container-port>` bindings and nginx/reverse-proxy?
   - Should we integrate with provider-specific firewalls despite complexity?
   - Can we use Docker's built-in firewall features (docker-proxy, etc.)?

5. **Testing Strategy**: How do we automatically verify no unintended ports are exposed during E2E tests?

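
To make the loopback-binding alternative concrete, the difference between a public and a loopback-only binding can be read from the compose short syntax alone. The sketch below is illustrative only: the function name `binding_scope` is hypothetical, and it deliberately treats every binding without a loopback host IP as potentially public.

```bash
#!/usr/bin/env bash
# Sketch: classify a compose short-syntax port mapping.
# Prints "loopback" when the host IP is 127.x.x.x or [::1], otherwise "public".
# Mappings without a host IP (e.g. "9090:9090" or "3000") bind all interfaces.
binding_scope() {
  local spec="${1%/*}"   # drop an optional /tcp or /udp suffix
  case "$spec" in
    127.*:*:* | "[::1]":*:*) echo "loopback" ;;
    *) echo "public" ;;
  esac
}

# Usage:
#   binding_scope "9090:9090"             # public
#   binding_scope "127.0.0.1:9090:9090"   # loopback
```

A loopback binding keeps the port reachable for a reverse proxy running on the host while keeping it off external interfaces, which is what the first alternative in question 4 relies on.
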
### Security Questions

1. **Threat Model**: What attack vectors exist with this approach?

   - Misconfigured docker-compose exposing internal services
   - Docker daemon compromise
   - Container escape vulnerabilities

2. **Compliance**: Does this approach meet security best practices for production deployments?

3. **Monitoring**: How do we detect if internal services become accidentally exposed?

4. **Recovery**: If a service is exposed, what's the remediation process?

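
One possible starting point for the monitoring question is to compare the host's listening sockets against an allow-list. This is a minimal sketch, assuming `ss -Htln` output (local address in the fourth column) and a hypothetical helper name `unexpected_listeners`; the allow-list in the usage line is taken from the public ports shown in Layer 2 plus SSH.

```bash
#!/usr/bin/env bash
# Sketch: print TCP ports that listen on a non-loopback address and are not
# in the allow-list passed as arguments. Reads `ss -Htln` output on stdin.
# The function name is hypothetical.
unexpected_listeners() {
  local allowed=" $* "
  awk -v allowed="$allowed" '
    {
      addr = $4                                  # e.g. 0.0.0.0:9090 or [::]:9090
      n = split(addr, parts, ":")
      port = parts[n]
      host = substr(addr, 1, length(addr) - length(port) - 1)
      if (host ~ /^127\./ || host == "[::1]") next   # loopback-only listener
      if (index(allowed, " " port " ") == 0) print port
    }
  ' | sort -un
}

# Usage on the VM:
#   ss -Htln | unexpected_listeners 22 8080 3100
```

Published container ports only show up as host listeners while Docker's userland proxy is enabled (the default), so an external scan remains the more reliable check and this helper is only a cheap first signal.
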
### Implementation Questions

1. **Migration**: How do we update existing deployments to this strategy?

2. **Documentation**: What warnings/guidance do we provide to prevent misconfigurations?

3. **Validation**: Can we add linting/validation to detect port bindings on internal services?

4. **Testing**: How do we test the security posture in E2E tests?

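
As a rough idea of what the validation in question 3 could look like, the sketch below flags internal services that declare a `ports:` key. It is not a YAML parser: it assumes a rendered compose file with two-space indentation (service names at two spaces, their keys at four), and the function name `lint_internal_ports` and the list of internal services are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: fail when a service named as internal declares a `ports:` section.
# Usage: lint_internal_ports <compose-file> <internal-service>...
# Assumes two-space indentation; prints each offending service name.
lint_internal_ports() {
  local file="$1"; shift
  awk -v internal=" $* " '
    /^services:[ \t]*$/ { in_services = 1; next }
    /^[^ \t#]/          { in_services = 0 }      # any other top-level key
    in_services && /^  [A-Za-z0-9_.-]+:[ \t]*$/ {
      svc = $1; sub(/:$/, "", svc)               # remember the current service
    }
    in_services && /^    ports:/ && index(internal, " " svc " ") {
      print svc; found = 1
    }
    END { exit found ? 1 : 0 }
  ' "$file"
}

# Usage in CI, against a rendered (not templated) compose file:
#   lint_internal_ports docker-compose.yml prometheus mysql || exit 1
```

A real implementation would more likely parse the file with a proper YAML library or the compose tooling; the point of the sketch is only that the rule "internal services have no `ports:` key" is mechanically checkable.
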
## Required Actions

### 1. Research Phase

- [ ] Study Docker networking security model
- [ ] Review Docker iptables integration and UFW interaction
- [ ] Research how other projects handle this (Kubernetes, Docker Swarm, etc.)
- [ ] Analyze the torrust-demo#72 issue for related lessons learned
- [ ] Review security best practices for Docker deployments
- [ ] Investigate alternative firewall strategies

### 2. Analysis Phase

- [ ] Document threat model for proposed strategy
- [ ] Analyze attack vectors and security boundaries
- [ ] Compare with provider-specific firewall integration complexity
- [ ] Evaluate trade-offs: simplicity vs security vs portability
- [ ] Define clear security requirements

### 3. Design Phase

- [ ] Create comprehensive ADR for firewall security strategy
- [ ] Define explicit rules for which services get port bindings
- [ ] Design validation/linting for docker-compose security
- [ ] Create security testing strategy for E2E tests
- [ ] Document operational procedures (monitoring, incident response)

### 4. Implementation Phase

- [ ] Update all docker-compose templates with security principles
- [ ] Remove unnecessary port bindings (like Prometheus 9090)
- [ ] Add validation to prevent accidental exposures
- [ ] Implement E2E security tests
- [ ] Update documentation and user guides

### 5. Review Phase

- [ ] Security audit of implementation
- [ ] Penetration testing
- [ ] Documentation review
- [ ] Team review and sign-off

## Immediate Actions (Already Taken)

As part of issue #246 implementation:

✅ **Security fix applied** (commit 8323def):

- Removed Prometheus port binding (`9090:9090`)
- Added comments explaining internal-only services
- Updated tests to verify port NOT exposed
- Documented security issue in manual testing results

✅ **Documentation**:

- Recorded security issue discovery in [manual testing results](../e2e-testing/manual/grafana-testing-results.md)
- Explained Docker bypassing UFW in commit messages
- Created this draft issue specification

## Related Documentation

### Internal Documentation

- [Manual Grafana Testing Results](../e2e-testing/manual/grafana-testing-results.md) - Where security issue was discovered
- [Issue #246 - Grafana Slice](./246-grafana-slice-release-run-commands.md) - Implementation that revealed the issue
- [Firewall Ansible Playbook](../../templates/ansible/configure-firewall.yml) - Current UFW configuration

### External References

- [torrust-demo#72 - Docker bypassing systemd-resolved](https://github.com/torrust/torrust-demo/issues/72) - Related Docker bypass issue
- Docker Documentation: [Packet filtering and firewalls](https://docs.docker.com/network/packet-filtering-firewalls/)
- UFW and Docker: [Known interactions and issues](https://github.com/docker/for-linux/issues/690)

### Similar Problems in the Wild

- [UFW and Docker: The Problem](https://github.com/chaifeng/ufw-docker) - Community solutions
- [Docker and Firewall Issues](https://www.techrepublic.com/article/how-to-fix-the-docker-and-ufw-security-flaw/)

## Priority Justification

**CRITICAL Priority** because:

1. **Security vulnerability** - Internal services can be accidentally exposed
2. **Silent failure** - UFW shows correct configuration but doesn't protect
3. **False sense of security** - Developers may assume UFW is protecting them
4. **Production impact** - Affects all deployments using Docker
5. **Architecture foundation** - Firewall strategy is fundamental to security

**Why DRAFT**:

- Requires thorough analysis before making architectural decisions
- Need to validate proposed solution against security requirements
- Must consider all alternatives and trade-offs
- ADR required for such a fundamental decision

## Next Steps

1. **Schedule analysis session** - Dedicate time to research and analyze
2. **Consult security resources** - Review Docker security best practices
3. **Draft ADR** - Create comprehensive architectural decision record
4. **Team review** - Get feedback on proposed strategy
5. **Implement and test** - Apply solution across codebase
6. **Document** - Update all relevant documentation

## Notes

- This issue was discovered during real-world manual E2E testing
- The fix for Prometheus (removing port binding) is a band-aid, not a complete solution
- We need a coherent, documented strategy for all current and future services
- This affects not just this project but potentially all Torrust projects using Docker

## Open Questions for Discussion

1. Should we reconsider provider-specific firewall integration despite complexity?
2. Is Docker network isolation sufficient for production security?
3. What's the acceptable level of risk for accidental service exposure?
4. Should we implement automated security scanning for port bindings?
5. How do other similar projects (deployment tools for containerized apps) handle this?

---

**Created**: 2025-12-19
**Discovered During**: Issue #246 - Grafana slice implementation
**Needs**: Research → Analysis → ADR → Implementation
