
Commit af93704

docs: [#251] add ADR for docker security scan exit code zero philosophy
1 parent 92b3325 commit af93704

2 files changed: +129 -0 lines changed

docs/decisions/README.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ This directory contains architectural decision records for the Torrust Tracker D

| Status | Date | Decision | Summary |
| ------------- | ---------- | --------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
| ✅ Accepted | 2025-12-23 | [Docker Security Scan Exit Code Zero](./docker-security-scan-exit-code-zero.md) | Use exit-code 0 for security scanning - Trivy detects, GitHub Security decides, CI green |
| ✅ Accepted | 2025-12-20 | [Grafana Integration Pattern](./grafana-integration-pattern.md) | Enable Grafana by default with hard Prometheus dependency and environment variable config |
| ✅ Accepted | 2025-12-17 | [Secrecy Crate for Sensitive Data Handling](./secrecy-crate-for-sensitive-data.md) | Use secrecy crate for type-safe secret handling with memory zeroing |
| ✅ Accepted | 2025-12-14 | [Database Configuration Structure in Templates](./database-configuration-structure-in-templates.md) | Expose structured database fields in templates rather than pre-resolved connection strings |

docs/decisions/docker-security-scan-exit-code-zero.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
# Decision: Exit Code Zero for Docker Security Scanning

## Status

Accepted

## Date

2025-12-23

## Context

When implementing automated Docker vulnerability scanning with Trivy in GitHub Actions, we faced a critical decision about how the CI/CD pipeline should respond to discovered vulnerabilities.

Traditional approaches make CI fail when vulnerabilities are found, blocking all development until issues are resolved. However, this creates several problems:

1. **False Positives**: Security scanners can report issues that don't apply to our context or are accepted risks
2. **Third-Party Dependencies**: We cannot immediately fix vulnerabilities in upstream images (mysql, prometheus, grafana)
3. **Scanner Quirks**: Trivy occasionally exits with code 1 even when no vulnerabilities are found
4. **Development Flow**: Security findings should not block unrelated development work
5. **Policy Enforcement**: Security decisions should be made by security teams, not automated tooling
6. **Partial Data Loss**: If CI fails early, later scans never run and we lose visibility into other images

The initial implementation used `exit-code: "1"` which caused the workflow to fail on any HIGH or CRITICAL vulnerability, including when scanning third-party production images with known CVEs that we cannot immediately fix.

## Decision

Implement a **security-first philosophy** where:

1. **Exit Code Zero Everywhere**: All Trivy scan steps use `exit-code: "0"` - the scanner never fails the CI pipeline
2. **Dual Output Strategy**:
   - Human-readable table format in workflow logs for immediate visibility
   - SARIF format uploaded to GitHub Security tab for tracking and alerting
3. **Separation of Concerns**:
   - Trivy's role: **Detect** vulnerabilities and provide data
   - GitHub Security's role: **Decide** enforcement policies and alert routing
   - CI's role: **Stay green** and maintain development velocity
4. **Always Run Policy**: Upload job uses `if: always()` to ensure partial results are never lost
5. **Unique Categories**: Each image gets a unique SARIF category for proper alert tracking and deduplication
6. **Scheduled Scanning**: Daily cron ensures continuous monitoring without blocking code changes

This philosophy is summarized as: **"Trivy detects, GitHub Security decides, CI stays green"**
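
The six points above could be wired together roughly as follows. This is a minimal sketch, not the project's actual workflow: the workflow name, job names, cron time, image list, the `slug` matrix field, artifact names, file names, and action version tags are all assumptions made for illustration. The `image-ref`, `format`, `output`, `severity`, and `exit-code` inputs are those of `aquasecurity/trivy-action`; `sarif_file` and `category` are inputs of `github/codeql-action/upload-sarif`.

```yaml
# Hypothetical workflow sketch; names, schedule, and image list are illustrative.
name: Docker Security Scan

on:
  schedule:
    - cron: "0 6 * * *" # daily scan (time chosen arbitrarily)
  workflow_dispatch:

permissions:
  contents: read

jobs:
  scan:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false # a problem scanning one image must not cancel the others
      matrix:
        include:
          # `slug` is a hypothetical field reused for artifact and category names.
          - image: mysql:8.0
            slug: mysql
          - image: prom/prometheus:latest
            slug: prometheus
    steps:
      - name: Print findings as a table in the job log
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ matrix.image }}
          format: table
          severity: HIGH,CRITICAL
          exit-code: "0" # detect only; findings never fail the step

      - name: Write findings as SARIF
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ matrix.image }}
          format: sarif
          output: trivy-${{ matrix.slug }}.sarif
          severity: HIGH,CRITICAL
          exit-code: "0"

      - name: Keep the SARIF file for the upload job
        uses: actions/upload-artifact@v4
        with:
          name: sarif-${{ matrix.slug }}
          path: trivy-${{ matrix.slug }}.sarif

  upload:
    needs: scan
    if: always() # publish whatever results exist, even if a scan job failed
    runs-on: ubuntu-latest
    permissions:
      security-events: write # needed to publish code scanning results
      actions: read # only needed for private repositories
      contents: read
    strategy:
      fail-fast: false
      matrix:
        slug: [mysql, prometheus]
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: sarif-${{ matrix.slug }}

      - uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-${{ matrix.slug }}.sarif
          category: docker-${{ matrix.slug }} # unique per image
```

In this sketch the table step gives reviewers something to read in the job log, while the SARIF step feeds the Security tab. Splitting scan and upload into separate jobs is one way to let `if: always()` apply to publishing without touching the scan steps themselves.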

## Consequences

### Positive

- **No False Failures**: Development work is never blocked by scanner quirks or edge cases
- **Continuous Visibility**: All scans complete even if one fails, providing a complete security picture
- **Flexible Enforcement**: The security team can configure GitHub Security policies without changing code
- **Third-Party Tolerance**: Known vulnerabilities in upstream images don't block development
- **Developer Experience**: Green builds maintain team velocity while the security team reviews findings
- **Policy Separation**: Security enforcement is decoupled from the CI/CD implementation
- **Audit Trail**: All findings are recorded in the GitHub Security tab for compliance and tracking
- **Incremental Improvement**: Vulnerabilities can be addressed by priority without CI pressure

### Negative

- **Potential Complacency**: Green CI might lead to ignoring security findings (mitigated by GitHub Security alerts)
- **Requires Monitoring**: The security team must actively monitor the GitHub Security tab
- **Policy Configuration**: Enforcement requires additional GitHub Security policy setup
- **Learning Curve**: The non-traditional approach may confuse developers who expect red builds for vulnerabilities

### Risks Introduced

- **Missed Critical Issues**: If GitHub Security is not properly configured or monitored, critical vulnerabilities might go unaddressed
  - **Mitigation**: Daily scheduled scans ensure consistent monitoring; GitHub Security sends email notifications
- **Organizational Resistance**: Some organizations mandate CI failure on security issues
  - **Mitigation**: GitHub Security can be configured to block PRs or deployments if needed

## Alternatives Considered

### 1. Exit Code 1 (Fail on Vulnerabilities)

**Approach**: Use `exit-code: "1"` to fail CI when HIGH/CRITICAL vulnerabilities are found.

**Rejected Because**:

- Blocks development on third-party image vulnerabilities we cannot fix immediately
- Scanner quirks cause false CI failures even with zero vulnerabilities
- No flexibility for security team to make risk-based decisions
- Partial data loss when early scans fail
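
For comparison, this rejected alternative amounts to a single input change on the scan step. The fragment below is a sketch using the `aquasecurity/trivy-action` inputs; the step name, image, and version tag are illustrative assumptions.

```yaml
# Rejected: any HIGH or CRITICAL finding fails the step, and with it the job.
- name: Scan image (fails on findings)
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: mysql:8.0 # illustrative third-party image
    format: table
    severity: HIGH,CRITICAL
    exit-code: "1"
```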

### 2. Mixed Exit Codes (Project vs Third-Party)

**Approach**: Use `exit-code: "1"` for project images but `exit-code: "0"` for third-party images.

**Rejected Because**:

- Inconsistent philosophy creates confusion
- Project images can have legitimate accepted risks
- Still susceptible to scanner quirks on project images
- Doesn't solve the fundamental policy enforcement problem
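
One way this mixed approach might have been expressed is a per-image matrix field that feeds the `exit-code` input. The fragment below is a hypothetical sketch of part of a job: the `exit_code` field and the project image name are placeholders, not names taken from the repository.

```yaml
# Rejected: enforcement differs per image through a hypothetical matrix field.
strategy:
  matrix:
    include:
      - image: example/project-image:latest # placeholder for a project image
        exit_code: "1" # project image: findings fail the job
      - image: mysql:8.0
        exit_code: "0" # third-party image: findings are only reported
steps:
  - uses: aquasecurity/trivy-action@master
    with:
      image-ref: ${{ matrix.image }}
      severity: HIGH,CRITICAL
      exit-code: ${{ matrix.exit_code }}
```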

### 3. Continue-on-Error Pattern

**Approach**: Use `exit-code: "1"` but add `continue-on-error: true` to allow the workflow to proceed.

**Rejected Because**:

- Shows misleading "failed" status even though workflow continues
- Scanner errors appear as failures in UI, creating noise
- Doesn't fundamentally change the enforcement model
- Confusing to developers seeing "failed" steps that don't actually fail
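
A sketch of this pattern at step level, assuming the same `aquasecurity/trivy-action` inputs; the step name and image are illustrative. `continue-on-error` is a standard GitHub Actions step key.

```yaml
# Rejected: the scan step can still fail, but the job carries on regardless.
- name: Scan image (failure tolerated)
  uses: aquasecurity/trivy-action@master
  continue-on-error: true
  with:
    image-ref: mysql:8.0 # illustrative image
    severity: HIGH,CRITICAL
    exit-code: "1"
```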

### 4. CodeQL Action with Single Category

**Approach**: Upload all SARIF files using `github/codeql-action/upload-sarif` with the same category.

**Rejected Because**:

- CodeQL Action rejects multiple SARIF uploads with identical categories (as of July 2025)
- Results in "multiple SARIF runs with same category" error
- Cannot distinguish alerts between different images
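
One way this alternative might be written is a single upload step that points at a directory of SARIF files and assigns them one shared category. The directory name and category value below are hypothetical.

```yaml
# Rejected: every image's results are uploaded under one shared category.
- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: sarif-results/ # hypothetical directory, one SARIF file per image
    category: docker-images
```

The adopted design instead runs one upload per image, each with its own `category` value, so alerts stay attributable to the image they came from.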

## Related Decisions

- [GitHub Actions Workflow Structure](https://github.com/torrust/torrust-tracker-deployer/pull/256) - How the three-job structure enables this philosophy
- Future: Security Policy Configuration (to be documented when GitHub Security policies are configured)

## References

- [Issue #251: Implement basic Trivy scanning workflow](https://github.com/torrust/torrust-tracker-deployer/issues/251)
- [Pull Request #256: Implement Basic Trivy Scanning Workflow](https://github.com/torrust/torrust-tracker-deployer/pull/256)
- [Trivy Action Documentation](https://github.com/aquasecurity/trivy-action)
- [GitHub Code Scanning Documentation](https://docs.github.com/en/code-security/code-scanning)
- [GitHub Security Policy Enforcement](https://docs.github.com/en/code-security/code-scanning/managing-code-scanning-alerts)
- [Security-First Philosophy Discussion](https://github.com/torrust/torrust-tracker-deployer/pull/256#discussion) - External review recommending exit-code 0 approach
