Commit 9d8c4b8

Merge #256: Implement Basic Trivy Scanning Workflow

40dd234 docs: [#251] document Security tab viewing behavior (Jose Celano)
098add2 fix: [#251] use CodeQL action for SARIF upload with category support (Jose Celano)
af93704 docs: [#251] add ADR for docker security scan exit code zero philosophy (Jose Celano)
92b3325 refactor: [#251] apply security-first workflow philosophy with exit-code 0 (Jose Celano)
7ff69be fix: [#251] upload SARIF files with unique categories per image (Jose Celano)
797addf fix: [#251] upload SARIF files with unique categories per image (Jose Celano)
cebf2a7 fix: [#251] prevent workflow failure and artifact name conflicts (Jose Celano)
5b09357 fix: [#251] sanitize Docker image names for artifact naming (Jose Celano)
382f430 refactor: [#251] separate SARIF upload to dedicated job with minimal permissions (Jose Celano)
afb653c chore: [#251] update trivy-action from v0.28.0 to v0.33.1 (Jose Celano)
7013bc0 fix: [#251] focus CVE scanning only, display all issues for visibility (Jose Celano)
c13afa8 fix: [#251] disable secret scanning for test containers with SSH keys (Jose Celano)
c11eb10 feat: [#251] add human-readable vulnerability output to workflow logs (Jose Celano)
c032e71 fix: [#251] update docker build context and action versions (Jose Celano)
defaaaa feat: [#251] implement basic trivy scanning workflow (Jose Celano)

Pull request description:

Closes #251

## Overview

Implements a GitHub Actions workflow that uses Trivy to scan Docker images for vulnerabilities. This initial implementation uses a hardcoded list of images and provides immediate security coverage.

## Changes

### 1. GitHub Actions Workflow (`.github/workflows/docker-security-scan.yml`)

Created a new workflow with two jobs:

#### **Scan Project-Built Images**

- Builds and scans 2 project-built images:
  - `torrust-tracker-deployer/provisioned-instance`
  - `torrust-tracker-deployer/ssh-server`

#### **Scan Third-Party Images**

- Scans 4 third-party images used in docker-compose:
  - `torrust/tracker:develop`
  - `mysql:8.0`
  - `grafana/grafana:11.4.0`
  - `prom/prometheus:v3.0.1`

**Workflow Features**:

- ✅ Scans for HIGH/CRITICAL vulnerabilities only
- ✅ Fails builds when vulnerabilities detected (`exit-code: 1`)
- ✅ Uploads results to GitHub Security tab (SARIF format)
- ✅ Runs on push to main/develop
- ✅ Runs on PRs affecting Docker files
- ✅ Runs daily at 6 AM UTC
- ✅ Supports manual triggering

**Notes**:

- Added comments linking third-party images to `templates/docker-compose/docker-compose.yml.tera`
- Added TODO referencing issue #252 for future automation

### 2. README Update

Added workflow badge to display scan status.

## Testing

- ✅ Pre-commit checks passed (`./scripts/pre-commit.sh`)
- ✅ YAML linting passed
- ✅ All unit tests passed
- ✅ E2E tests passed

## Future Work

This is Phase 1 of the security scanning epic. Phase 2 (#252) will:

- Dynamically detect images from environment configuration
- Eliminate need for manual updates when images change
- Integrate with `show` command to list scanned images

## Related

- Epic: #250 - Implement Automated Docker Image Vulnerability Scanning
- Next Phase: #252 - Implement Dynamic Image Detection for Scanning
- [Trivy Documentation](https://github.com/aquasecurity/trivy)

ACKs for top commit:
  josecelano:
    ACK 40dd234

Tree-SHA512: 9c77b373d433f67c9ca07425b4343fed141a2d4474758475bb913c0c67518ae6664581fd9206550fd88c9c92194f0890175ba7b1cd0a936a85246aaea1ad7a5d

2 parents e59c4a7 + 40dd234 commit 9d8c4b8

5 files changed, +419 -61 lines changed

.github/workflows/docker-security-scan.yml

Lines changed: 218 additions & 0 deletions
@@ -0,0 +1,218 @@
+name: Docker Security Scan
+
+on:
+  push:
+    branches: [main, develop]
+    paths:
+      - "docker/**"
+      - "templates/docker-compose/**"
+      - ".github/workflows/docker-security-scan.yml"
+
+  pull_request:
+    paths:
+      - "docker/**"
+      - "templates/docker-compose/**"
+      - ".github/workflows/docker-security-scan.yml"
+
+  # Scheduled scans are important because new CVEs appear
+  # even if the code or images didn’t change
+  schedule:
+    - cron: "0 6 * * *" # Daily at 6 AM UTC
+
+  workflow_dispatch:
+
+jobs:
+  scan-project-images:
+    name: Scan Project-Built Docker Images
+    runs-on: ubuntu-latest
+    timeout-minutes: 15
+    permissions:
+      contents: read
+
+    strategy:
+      fail-fast: false
+      matrix:
+        image:
+          - dockerfile: docker/provisioned-instance/Dockerfile
+            context: docker/provisioned-instance
+            name: provisioned-instance
+          - dockerfile: docker/ssh-server/Dockerfile
+            context: docker/ssh-server
+            name: ssh-server
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      # Build images locally so Trivy scans exactly
+      # what this repository produces
+      - name: Build Docker image
+        run: |
+          docker build \
+            -t torrust-tracker-deployer/${{ matrix.image.name }}:latest \
+            -f ${{ matrix.image.dockerfile }} \
+            .
+
+      # Human-readable output in logs
+      # This NEVER fails the job; it’s only for visibility
+      - name: Display vulnerabilities (table format)
+        uses: aquasecurity/trivy-action@0.33.1
+        with:
+          image-ref: torrust-tracker-deployer/${{ matrix.image.name }}:latest
+          format: "table"
+          severity: "HIGH,CRITICAL"
+          exit-code: "0"
+
+      # SARIF generation for GitHub Code Scanning
+      #
+      # IMPORTANT:
+      # - exit-code MUST be 0
+      # - Trivy sometimes exits with 1 even when no vulns exist
+      # - GitHub Security UI is responsible for enforcement
+      - name: Generate SARIF (Code Scanning)
+        uses: aquasecurity/trivy-action@0.33.1
+        with:
+          image-ref: torrust-tracker-deployer/${{ matrix.image.name }}:latest
+          format: "sarif"
+          output: "trivy-${{ matrix.image.name }}.sarif"
+          severity: "HIGH,CRITICAL"
+          exit-code: "0"
+          scanners: "vuln"
+
+      - name: Upload SARIF artifact
+        uses: actions/upload-artifact@v4
+        if: always()
+        with:
+          name: sarif-project-${{ matrix.image.name }}-${{ github.run_id }}
+          path: trivy-${{ matrix.image.name }}.sarif
+          retention-days: 30
+
+  scan-third-party-images:
+    name: Scan Third-Party Docker Images
+    runs-on: ubuntu-latest
+    timeout-minutes: 15
+    permissions:
+      contents: read
+
+    strategy:
+      fail-fast: false
+      matrix:
+        # These must match docker-compose templates
+        # in templates/docker-compose/docker-compose.yml.tera
+        image:
+          - torrust/tracker:develop
+          - mysql:8.0
+          - grafana/grafana:11.4.0
+          - prom/prometheus:v3.0.1
+
+    steps:
+      - name: Display vulnerabilities (table format)
+        uses: aquasecurity/trivy-action@0.33.1
+        with:
+          image-ref: ${{ matrix.image }}
+          format: "table"
+          severity: "HIGH,CRITICAL"
+          exit-code: "0"
+
+      # Third-party images should NEVER block CI.
+      # We only report findings to GitHub Security.
+      - name: Generate SARIF (Code Scanning)
+        uses: aquasecurity/trivy-action@0.33.1
+        with:
+          image-ref: ${{ matrix.image }}
+          format: "sarif"
+          output: "trivy.sarif"
+          severity: "HIGH,CRITICAL"
+          exit-code: "0"
+          scanners: "vuln"
+
+      # Needed to produce stable artifact names
+      - name: Sanitize image name
+        id: sanitize
+        run: |
+          echo "name=$(echo '${{ matrix.image }}' | tr '/:' '-')" >> "$GITHUB_OUTPUT"
+
+      - name: Upload SARIF artifact
+        uses: actions/upload-artifact@v4
+        if: always()
+        with:
+          name: sarif-third-party-${{ steps.sanitize.outputs.name }}-${{ github.run_id }}
+          path: trivy.sarif
+          retention-days: 30
+
+  upload-sarif-results:
+    name: Upload SARIF Results to GitHub Security
+    runs-on: ubuntu-latest
+    needs:
+      - scan-project-images
+      - scan-third-party-images
+
+    # Always run so we don’t lose security visibility
+    if: always()
+
+    permissions:
+      security-events: write
+
+    steps:
+      - name: Download all SARIF artifacts
+        uses: actions/download-artifact@v4
+        with:
+          pattern: sarif-*-${{ github.run_id }}
+
+      # Upload each SARIF file with CodeQL Action using unique categories.
+      # The category parameter enables proper alert tracking per image.
+      # Must use CodeQL Action (not gh API) - API doesn't support category field.
+      #
+      # VIEWING RESULTS:
+      # - For pull requests: /security/code-scanning?query=pr:NUMBER+is:open
+      # - For branches: /security/code-scanning?query=is:open+branch:BRANCH-NAME
+      # - For main branch: /security/code-scanning?query=is:open+branch:main (default view)
+      # The default Security tab filters by "is:open branch:main" which only shows
+      # alerts from the main branch, not from PR branches.
+      - name: Upload project provisioned-instance SARIF
+        if: always()
+        uses: github/codeql-action/upload-sarif@v4
+        with:
+          sarif_file: sarif-project-provisioned-instance-${{ github.run_id }}/trivy-provisioned-instance.sarif
+          category: docker-project-provisioned-instance
+        continue-on-error: true
+
+      - name: Upload project ssh-server SARIF
+        if: always()
+        uses: github/codeql-action/upload-sarif@v4
+        with:
+          sarif_file: sarif-project-ssh-server-${{ github.run_id }}/trivy-ssh-server.sarif
+          category: docker-project-ssh-server
+        continue-on-error: true
+
+      - name: Upload third-party mysql SARIF
+        if: always()
+        uses: github/codeql-action/upload-sarif@v4
+        with:
+          sarif_file: sarif-third-party-mysql-8.0-${{ github.run_id }}/trivy.sarif
+          category: docker-third-party-mysql-8.0
+        continue-on-error: true
+
+      - name: Upload third-party tracker SARIF
+        if: always()
+        uses: github/codeql-action/upload-sarif@v4
+        with:
+          sarif_file: sarif-third-party-torrust-tracker-develop-${{ github.run_id }}/trivy.sarif
+          category: docker-third-party-torrust-tracker-develop
+        continue-on-error: true
+
+      - name: Upload third-party grafana SARIF
+        if: always()
+        uses: github/codeql-action/upload-sarif@v4
+        with:
+          sarif_file: sarif-third-party-grafana-grafana-11.4.0-${{ github.run_id }}/trivy.sarif
+          category: docker-third-party-grafana-grafana-11.4.0
+        continue-on-error: true
+
+      - name: Upload third-party prometheus SARIF
+        if: always()
+        uses: github/codeql-action/upload-sarif@v4
+        with:
+          sarif_file: sarif-third-party-prom-prometheus-v3.0.1-${{ github.run_id }}/trivy.sarif
+          category: docker-third-party-prom-prometheus-v3.0.1
+        continue-on-error: true
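
The upload job hardcodes artifact directory names and SARIF categories that must agree with what the sanitize step produces in the third-party scan job. A minimal sketch of that naming chain follows, assuming a placeholder `RUN_ID` in place of `${{ github.run_id }}`; the helper names `sanitize_image_name`, `artifact_name`, and `sarif_category` are hypothetical and do not exist in the workflow.

```shell
# Sketch only: mirrors the naming used by the workflow for third-party images.

# Replace '/' and ':' with '-', using the same tr call as the
# "Sanitize image name" step.
sanitize_image_name() {
  printf '%s\n' "$1" | tr '/:' '-'
}

# Artifact name as built by the "Upload SARIF artifact" step.
# $1 = image reference, $2 = run id (placeholder for github.run_id).
artifact_name() {
  printf 'sarif-third-party-%s-%s\n' "$(sanitize_image_name "$1")" "$2"
}

# Category string as hardcoded in the upload job (hypothetical helper).
sarif_category() {
  printf 'docker-third-party-%s\n' "$(sanitize_image_name "$1")"
}

RUN_ID=123456789  # placeholder value, not a real run id
for image in torrust/tracker:develop mysql:8.0 grafana/grafana:11.4.0 prom/prometheus:v3.0.1; do
  printf '%s\n  artifact: %s\n  category: %s\n' \
    "$image" "$(artifact_name "$image" "$RUN_ID")" "$(sarif_category "$image")"
done
```

Running the loop against the four matrix images is a quick way to confirm that a newly added image will need a matching `sarif_file` path and `category` in the upload job.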

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-[![Linting](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/linting.yml/badge.svg)](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/linting.yml) [![Testing](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/testing.yml/badge.svg)](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/testing.yml) [![E2E Infrastructure Tests](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/test-e2e-infrastructure.yml/badge.svg)](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/test-e2e-infrastructure.yml) [![E2E Deployment Tests](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/test-e2e-deployment.yml/badge.svg)](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/test-e2e-deployment.yml) [![Test LXD Container Provisioning](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/test-lxd-provision.yml/badge.svg)](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/test-lxd-provision.yml) [![Coverage](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/coverage.yml/badge.svg)](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/coverage.yml)
+[![Linting](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/linting.yml/badge.svg)](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/linting.yml) [![Testing](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/testing.yml/badge.svg)](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/testing.yml) [![E2E Infrastructure Tests](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/test-e2e-infrastructure.yml/badge.svg)](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/test-e2e-infrastructure.yml) [![E2E Deployment Tests](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/test-e2e-deployment.yml/badge.svg)](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/test-e2e-deployment.yml) [![Test LXD Container Provisioning](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/test-lxd-provision.yml/badge.svg)](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/test-lxd-provision.yml) [![Coverage](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/coverage.yml/badge.svg)](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/coverage.yml) [![Docker Security Scan](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/docker-security-scan.yml/badge.svg)](https://github.com/torrust/torrust-tracker-deployer/actions/workflows/docker-security-scan.yml)
 
 # Torrust Tracker Deployer
 
docs/decisions/README.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ This directory contains architectural decision records for the Torrust Tracker D
 
 | Status | Date | Decision | Summary |
 | ------------- | ---------- | --------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
+| ✅ Accepted | 2025-12-23 | [Docker Security Scan Exit Code Zero](./docker-security-scan-exit-code-zero.md) | Use exit-code 0 for security scanning - Trivy detects, GitHub Security decides, CI green |
 | ✅ Accepted | 2025-12-20 | [Grafana Integration Pattern](./grafana-integration-pattern.md) | Enable Grafana by default with hard Prometheus dependency and environment variable config |
 | ✅ Accepted | 2025-12-17 | [Secrecy Crate for Sensitive Data Handling](./secrecy-crate-for-sensitive-data.md) | Use secrecy crate for type-safe secret handling with memory zeroing |
 | ✅ Accepted | 2025-12-14 | [Database Configuration Structure in Templates](./database-configuration-structure-in-templates.md) | Expose structured database fields in templates rather than pre-resolved connection strings |

docs/decisions/docker-security-scan-exit-code-zero.md

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
+# Decision: Exit Code Zero for Docker Security Scanning
+
+## Status
+
+Accepted
+
+## Date
+
+2025-12-23
+
+## Context
+
+When implementing automated Docker vulnerability scanning with Trivy in GitHub Actions, we faced a critical decision about how the CI/CD pipeline should respond to discovered vulnerabilities.
+
+Traditional approaches make CI fail when vulnerabilities are found, blocking all development until issues are resolved. However, this creates several problems:
+
+1. **False Positives**: Security scanners can report issues that don't apply to our context or are accepted risks
+2. **Third-Party Dependencies**: We cannot immediately fix vulnerabilities in upstream images (mysql, prometheus, grafana)
+3. **Scanner Quirks**: Trivy occasionally exits with code 1 even when no vulnerabilities are found
+4. **Development Flow**: Security findings should not block unrelated development work
+5. **Policy Enforcement**: Security decisions should be made by security teams, not automated tooling
+6. **Partial Data Loss**: If CI fails early, later scans never run and we lose visibility into other images
+
+The initial implementation used `exit-code: "1"` which caused the workflow to fail on any HIGH or CRITICAL vulnerability, including when scanning third-party production images with known CVEs that we cannot immediately fix.
+
+## Decision
+
+Implement a **security-first philosophy** where:
+
+1. **Exit Code Zero Everywhere**: All Trivy scan steps use `exit-code: "0"` - the scanner never fails the CI pipeline
+2. **Dual Output Strategy**:
+   - Human-readable table format in workflow logs for immediate visibility
+   - SARIF format uploaded to GitHub Security tab for tracking and alerting
+3. **Separation of Concerns**:
+   - Trivy's role: **Detect** vulnerabilities and provide data
+   - GitHub Security's role: **Decide** enforcement policies and alert routing
+   - CI's role: **Stay green** and maintain development velocity
+4. **Always Run Policy**: Upload job uses `if: always()` to ensure partial results are never lost
+5. **Unique Categories**: Each image gets a unique SARIF category for proper alert tracking and deduplication
+6. **Scheduled Scanning**: Daily cron ensures continuous monitoring without blocking code changes
+
+This philosophy is summarized as: **"Trivy detects, GitHub Security decides, CI stays green"**
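
The dual output strategy with exit code zero can be tried locally with the Trivy CLI. The sketch below is an illustration under stated assumptions, not part of the workflow: `trivy_scan` and `DRY_RUN` are hypothetical names, and the function passes `--scanners vuln` in both modes for simplicity, whereas the workflow sets `scanners` only on the SARIF step. With `DRY_RUN=1` (the default here) the command is printed instead of executed, so the sketch runs without Trivy installed.

```shell
# Print (or run) the Trivy CLI call that roughly corresponds to one scan step.
# $1 = image reference, $2 = output format (table or sarif),
# $3 = optional output file.
trivy_scan() {
  image="$1"
  format="$2"
  output="${3:-}"

  # --exit-code 0 means findings never produce a failing exit status.
  set -- trivy image --severity HIGH,CRITICAL --exit-code 0 --scanners vuln --format "$format"
  if [ -n "$output" ]; then
    set -- "$@" --output "$output"
  fi
  set -- "$@" "$image"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # dry run: show the command only
  else
    "$@"        # real run: requires the trivy binary on PATH
  fi
}

# Table output for the logs, SARIF output for upload.
trivy_scan mysql:8.0 table
trivy_scan mysql:8.0 sarif trivy.sarif
```

Setting `DRY_RUN=0` executes the scans for real, which assumes Trivy is installed and the image can be pulled.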
+
+## Consequences
+
+### Positive
+
+- **No False Failures**: Development work never blocked by scanner quirks or edge cases
+- **Continuous Visibility**: All scans complete even if one fails, providing complete security picture
+- **Flexible Enforcement**: Security team can configure GitHub Security policies without changing code
+- **Third-Party Tolerance**: Known vulnerabilities in upstream images don't block development
+- **Developer Experience**: Green builds maintain team velocity while security team reviews findings
+- **Policy Separation**: Security enforcement decoupled from CI/CD implementation
+- **Audit Trail**: All findings recorded in GitHub Security tab for compliance and tracking
+- **Incremental Improvement**: Can address vulnerabilities based on priority without CI pressure
+
+### Negative
+
+- **Potential Complacency**: Green CI might lead to ignoring security findings (mitigated by GitHub Security alerts)
+- **Requires Monitoring**: Security team must actively monitor GitHub Security tab
+- **Policy Configuration**: Requires additional GitHub Security policy setup for enforcement
+- **Learning Curve**: Non-traditional approach may confuse developers expecting red builds for vulnerabilities
+
+### Risks Introduced
+
+- **Missed Critical Issues**: If GitHub Security is not properly configured or monitored, critical vulnerabilities might go unaddressed
+  - **Mitigation**: Daily scheduled scans ensure consistent monitoring; GitHub Security sends email notifications
+- **Organizational Resistance**: Some organizations mandate CI failure on security issues
+  - **Mitigation**: GitHub Security can be configured to block PRs or deployments if needed
+
+## Alternatives Considered
+
+### 1. Exit Code 1 (Fail on Vulnerabilities)
+
+**Approach**: Use `exit-code: "1"` to fail CI when HIGH/CRITICAL vulnerabilities are found.
+
+**Rejected Because**:
+
+- Blocks development on third-party image vulnerabilities we cannot fix immediately
+- Scanner quirks cause false CI failures even with zero vulnerabilities
+- No flexibility for security team to make risk-based decisions
+- Partial data loss when early scans fail
+
+### 2. Mixed Exit Codes (Project vs Third-Party)
+
+**Approach**: Use `exit-code: "1"` for project images but `exit-code: "0"` for third-party images.
+
+**Rejected Because**:
+
+- Inconsistent philosophy creates confusion
+- Project images can have legitimate accepted risks
+- Still susceptible to scanner quirks on project images
+- Doesn't solve the fundamental policy enforcement problem
+
+### 3. Continue-on-Error Pattern
+
+**Approach**: Use `exit-code: "1"` but add `continue-on-error: true` to allow workflow to proceed.
+
+**Rejected Because**:
+
+- Shows misleading "failed" status even though workflow continues
+- Scanner errors appear as failures in UI, creating noise
+- Doesn't fundamentally change the enforcement model
+- Confusing to developers seeing "failed" steps that don't actually fail
+
+### 4. CodeQL Action with Single Category
+
+**Approach**: Upload all SARIF files using github/codeql-action/upload-sarif with same category.
+
+**Rejected Because**:
+
+- CodeQL Action rejects multiple SARIF uploads with identical categories (as of July 2025)
+- Results in "multiple SARIF runs with same category" error
+- Cannot distinguish alerts between different images
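
Because identical categories are the failure mode described above, a pre-flight check on the category list can catch a copy-paste mistake before upload. This is a minimal sketch; `duplicate_categories` is a hypothetical helper, and the list below simply repeats the six category strings hardcoded in the workflow.

```shell
# Print every category that appears more than once among the arguments.
# Prints nothing when all categories are unique.
duplicate_categories() {
  printf '%s\n' "$@" | sort | uniq -d
}

dupes="$(duplicate_categories \
  docker-project-provisioned-instance \
  docker-project-ssh-server \
  docker-third-party-mysql-8.0 \
  docker-third-party-torrust-tracker-develop \
  docker-third-party-grafana-grafana-11.4.0 \
  docker-third-party-prom-prometheus-v3.0.1)"

if [ -n "$dupes" ]; then
  echo "duplicate SARIF categories: $dupes" >&2
else
  echo "all SARIF categories are unique"
fi
```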
+
+## Viewing Security Results
+
+Security scan results are uploaded to GitHub's Security tab, but the default view filters by `is:open branch:main`. This means:
+
+- **Pull Request Results**: Must use filter `pr:NUMBER is:open` (e.g., `/security/code-scanning?query=pr:256+is:open`)
+- **Branch Results**: Must use filter `is:open branch:BRANCH-NAME` for non-main branches
+- **Main Branch Results**: Visible in default view after merging to main
+
+Results uploaded from PR branches are not visible in the default Security tab view because the default filter excludes them. This is GitHub's standard behavior for code scanning across all analysis tools.
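
The filters above translate directly into URLs. The following sketch builds them for a given repository; `code_scanning_url` is a hypothetical helper, and the `https://github.com/OWNER/REPO` prefix is an assumption added to the relative paths quoted in this document.

```shell
# Build a code-scanning URL with the query filters described above.
# $1 = repository as OWNER/REPO, $2 = "pr" or "branch", $3 = PR number or branch name.
code_scanning_url() {
  repo="$1"
  kind="$2"
  value="$3"
  base="https://github.com/${repo}/security/code-scanning"

  case "$kind" in
    pr)     echo "${base}?query=pr:${value}+is:open" ;;
    branch) echo "${base}?query=is:open+branch:${value}" ;;
    *)      echo "usage: code_scanning_url OWNER/REPO pr|branch VALUE" >&2
            return 1 ;;
  esac
}

code_scanning_url torrust/torrust-tracker-deployer pr 256
code_scanning_url torrust/torrust-tracker-deployer branch main
```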
+
+## Related Decisions
+
+- [GitHub Actions Workflow Structure](https://github.com/torrust/torrust-tracker-deployer/pull/256) - How the three-job structure enables this philosophy
+- Future: Security Policy Configuration (to be documented when GitHub Security policies are configured)
+
+## References
+
+- [Issue #251: Implement basic Trivy scanning workflow](https://github.com/torrust/torrust-tracker-deployer/issues/251)
+- [Pull Request #256: Implement Basic Trivy Scanning Workflow](https://github.com/torrust/torrust-tracker-deployer/pull/256)
+- [Trivy Action Documentation](https://github.com/aquasecurity/trivy-action)
+- [GitHub Code Scanning Documentation](https://docs.github.com/en/code-security/code-scanning)
+- [GitHub Security Policy Enforcement](https://docs.github.com/en/code-security/code-scanning/managing-code-scanning-alerts)
+- [Security-First Philosophy Discussion](https://github.com/torrust/torrust-tracker-deployer/pull/256#discussion) - External review recommending exit-code 0 approach
