Commit 5ed35cd

Merge #247: feat: [#246] Add Grafana metrics visualization service
e2efe88 test: [#246] Fix doctests creating schema.json artifacts in root (Jose Celano)
50c0243 docs: [#246] Add comprehensive tracker verification guide (Jose Celano)
550d6f9 docs: [#246] Add note about upcoming show command for getting VM IP (Jose Celano)
0747286 docs: [#246] Update issue progress and enhance Grafana verification guide (Jose Celano)
da4fe01 feat: [#246] Add Grafana datasource auto-provisioning and health checks (Jose Celano)
1eded0b feat: [#246] Add health checks for Prometheus and Grafana services (Jose Celano)
c7a4dec docs: [#246] Add extension tasks for Grafana health checks and auto-provisioning (Jose Celano)
bd140df docs: [#246] complete Phase 4 documentation and enable Grafana in E2E tests with retry logic (Jose Celano)
8272d58 docs: [#246] mark all acceptance criteria as complete (Jose Celano)
4dd3cc5 docs: [#246] complete Phase 4 - add Grafana terms to dictionary and fix Clippy warnings (Jose Celano)
607c26b docs: [#246] update Phase 3 & 4 progress with manual testing completion and password bug fix (Jose Celano)
c03d9a6 docs: [#246] update issue progress for Phase 3 tasks (Jose Celano)
21c4e7b feat: [#246] add Grafana E2E validation (Jose Celano)
eed9c65 refactor: [#246] Remove Grafana firewall configuration (Jose Celano)
7d56581 fix: [#246] bind Prometheus to localhost for secure validation (Jose Celano)
5116f33 docs: [#246] update issue progress - Phase 3 complete, security fix applied (Jose Celano)
be00228 docs: add DRAFT issue spec for Docker and UFW firewall security strategy (Jose Celano)
99b1339 refactor: [#246] organize manual testing documentation (Jose Celano)
8323def fix: [#246] remove Prometheus port exposure for security (Jose Celano)
696fc0d docs: [#246] add manual E2E testing results for Grafana deployment (Jose Celano)
6af9efc docs: [#246] update issue documentation to reflect actual implementation details (Jose Celano)
ad0b272 docs: [#246] mark E2E test configuration tasks complete (Jose Celano)
2b07e8e feat: [#246] implement Phase 3 Grafana firewall configuration (Jose Celano)
a04b4bc feat: [#246] implement Phase 2 Docker Compose integration for Grafana slice (Jose Celano)
426f64a feat: [#246] use NonZeroU32 for Prometheus scrape interval domain model (Jose Celano)
b847f2c feat: [#246] add GrafanaRequiresPrometheus error variant (Jose Celano)
503df82 feat: [#246] add Grafana service configuration to UserInputs (Jose Celano)
78215b6 feat: [#246] add GrafanaConfig domain model (Jose Celano)

Pull request description:

## Summary

Implements Grafana as a metrics visualization service for the Torrust Tracker deployment. This PR adds Grafana to the docker-compose stack as an optional service (enabled by default) that connects to Prometheus for displaying tracker metrics through dashboards.

**Related Issue**: Closes #246

## Extension Tasks Completed

After the initial implementation, five extension tasks were identified and completed to improve automation and user experience. These enhancements are documented in [docs/issues/246-grafana-slice-release-run-commands-extension.md](https://github.com/torrust/torrust-tracker-deployer/blob/246-grafana-slice/docs/issues/246-grafana-slice-release-run-commands-extension.md).

✅ **Task 1: Prometheus Health Check** - Docker Compose health check using `/-/healthy` endpoint
✅ **Task 2: Grafana Health Check** - Docker Compose health check using `/api/health` endpoint
✅ **Task 3: Auto-Configure Prometheus Datasource** - Grafana provisioning for automatic Prometheus connection
✅ **Task 4: Preload Grafana Dashboards** - Auto-load Stats and Metrics dashboards from torrust-demo
✅ **Task 5: Enhanced Documentation** - Comprehensive E2E testing manuals and verification guides

## Key Features

✅ **Grafana Service Integration**

- Docker Compose service with grafana/grafana:11.4.0 image
- Exposed on port 3100 for web UI access
- Configurable admin credentials via environment variables
- Automatic Prometheus data source configuration
- Pre-loaded Stats and Metrics dashboards

✅ **Health Checks**

- Prometheus health check (10s interval, 5s timeout, 10s start period)
- Grafana health check (10s interval, 5s timeout, 30s start period)
- Docker-aware service readiness for better orchestration

✅ **Dependency Validation**

- Grafana requires Prometheus (enforced at environment creation)
- Clear error messages with actionable fix instructions
- Type-safe dependency checking in domain layer

✅ **Firewall Configuration**

- Port 3100 opened for Grafana UI (public access)
- UFW firewall rules applied automatically during configure step
- Step-level conditional execution (only runs when Grafana enabled)

✅ **Enabled-by-Default Pattern**

- Grafana included in generated environment templates
- Users can disable by removing configuration section
- Follows same pattern as Prometheus integration

✅ **Full Automation**

- Zero manual Grafana configuration required
- Prometheus datasource automatically provisioned
- Dashboards automatically loaded on first startup
- Production-ready deployment out of the box

## Documentation Improvements

✅ **E2E Testing Manual Updates**

- Created comprehensive [tracker-verification.md](https://github.com/torrust/torrust-tracker-deployer/blob/246-grafana-slice/docs/e2e-testing/manual/tracker-verification.md) with HTTP/UDP/API testing procedures
- Updated [grafana-verification.md](https://github.com/torrust/torrust-tracker-deployer/blob/246-grafana-slice/docs/e2e-testing/manual/grafana-verification.md) with datasource and dashboard verification
- Enhanced [main E2E manual](https://github.com/torrust/torrust-tracker-deployer/blob/246-grafana-slice/docs/e2e-testing/manual/README.md) with service index
- All commands tested against live environment with actual outputs captured

## Security Fix Included

🔒 **Critical Security Issue Discovered & Fixed**

During manual testing, discovered that Docker bypasses UFW firewall rules when publishing ports with `0.0.0.0:` binding.

**Issue**: Prometheus port 9090 was exposed to external network despite UFW default deny incoming policy.

**Fix Applied**: Removed Prometheus port mapping from docker-compose template. Prometheus is now truly internal-only (not accessible from external network), while Grafana continues to access it via Docker internal network (`http://prometheus:9090`).

**Documentation**: Created comprehensive DRAFT issue specification for future analysis: [docs/issues/DRAFT-docker-ufw-firewall-security-strategy.md](https://github.com/torrust/torrust-tracker-deployer/blob/246-grafana-slice/docs/issues/DRAFT-docker-ufw-firewall-security-strategy.md)

## Implementation Phases

### Phase 1: Domain Models & Validation ✅

- Created `GrafanaConfig` domain type
- Implemented Grafana-Prometheus dependency validation
- Added `GrafanaRequiresPrometheus` error with actionable help messages
- Integrated into `UserInputs` domain model

### Phase 2: Docker Compose Integration ✅

- Extended `DockerComposeContext` with `grafana_config` field
- Extended `EnvContext` with Grafana service configuration
- Updated templates: `docker-compose.yml.tera`, `.env.tera`
- Conditional service rendering (only when Grafana enabled)

### Phase 3: Configuration & Testing ✅

- Created `configure-grafana-firewall.yml` Ansible playbook (static)
- Implemented `ConfigureGrafanaFirewallStep` following tracker firewall pattern
- Integrated in configure command with step-level conditionals
- Created E2E test configurations (3 configs)
- Completed manual E2E testing (full workflow validated)
- Applied security fix (Prometheus port exposure)

### Phase 4: Extension Tasks ✅

- Added Prometheus and Grafana health checks
- Implemented automatic Grafana provisioning (datasource + dashboards)
- Created comprehensive E2E testing documentation
- Verified all commands against live environment

### Phase 5: Documentation ✅ (Partial)

- Updated issue specification with implementation details
- Documented manual testing results
- Created DRAFT security issue specification
- Created tracker and Grafana verification guides
- ADR and user guide deferred (not critical for MVP)

## Testing

✅ **Unit Tests**: 1563 tests passing
✅ **Linters**: All passing (markdown, yaml, toml, cspell, clippy, rustfmt, shellcheck)
✅ **Manual E2E Testing**: Complete deployment workflow validated (create → provision → configure → release → run → test)
✅ **Security Testing**: Verified Prometheus not accessible externally, Grafana accessible on port 3100
✅ **Health Check Testing**: Verified both Prometheus and Grafana report healthy status after startup
✅ **Provisioning Testing**: Verified Prometheus datasource and dashboards automatically configured

**Manual Testing Results**: [docs/e2e-testing/manual/grafana-testing-results.md](https://github.com/torrust/torrust-tracker-deployer/blob/246-grafana-slice/docs/e2e-testing/manual/grafana-testing-results.md)

## Configuration Examples

### Enable Grafana (Default)

```json
{
  "prometheus": {
    "scrape_interval_in_secs": 15
  },
  "grafana": {
    "admin_user": "admin",
    "admin_password": "secure-password"
  }
}
```

### Disable Grafana

```json
{
  "prometheus": {
    "scrape_interval_in_secs": 15
  }
  // No grafana section = disabled
}
```

### Validation Error (Grafana without Prometheus)

```json
{
  // No prometheus section
  "grafana": {
    "admin_user": "admin",
    "admin_password": "secure-password"
  }
}
```

❌ **Error**: "Grafana requires Prometheus for metrics visualization. Either enable Prometheus by adding the 'prometheus' section, or disable Grafana by removing the 'grafana' section."

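The dependency rule behind this validation error can be sketched in Rust as follows. This is a simplified illustration, not the project's actual code: the type names `GrafanaConfig` and `GrafanaRequiresPrometheus` and the `NonZeroU32` scrape interval come from the commit list, but the struct fields, the `ConfigError` enum, the `PrometheusConfig` struct, and the `validate_monitoring` function are assumptions. The password is a plain `String` here only to keep the sketch dependency-free.

```rust
use std::fmt;
use std::num::NonZeroU32;

/// Simplified stand-in for the Prometheus section (fields are assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct PrometheusConfig {
    pub scrape_interval_in_secs: NonZeroU32,
}

/// Simplified stand-in for the `GrafanaConfig` domain type (fields are assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct GrafanaConfig {
    pub admin_user: String,
    pub admin_password: String,
}

/// Hypothetical error enum holding the variant named in the PR.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    GrafanaRequiresPrometheus,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::GrafanaRequiresPrometheus => write!(
                f,
                "Grafana requires Prometheus for metrics visualization. \
                 Either enable Prometheus by adding the 'prometheus' section, \
                 or disable Grafana by removing the 'grafana' section."
            ),
        }
    }
}

impl std::error::Error for ConfigError {}

/// Rejects the single invalid combination: Grafana present, Prometheus absent.
/// A missing section is modelled as `None`, mirroring "no section = disabled".
pub fn validate_monitoring(
    prometheus: Option<&PrometheusConfig>,
    grafana: Option<&GrafanaConfig>,
) -> Result<(), ConfigError> {
    match (prometheus, grafana) {
        (None, Some(_)) => Err(ConfigError::GrafanaRequiresPrometheus),
        _ => Ok(()),
    }
}
```

Modelling each optional service as an `Option` means the three valid combinations need no special handling, and only the one invalid combination maps to an error.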
## Files Changed

**Created**:

- `src/domain/grafana/config.rs` - Domain model
- `src/application/steps/system/configure_grafana_firewall.rs` - Firewall configuration step
- `templates/ansible/configure-grafana-firewall.yml` - Ansible playbook (static)
- `templates/ansible/deploy-grafana-provisioning.yml` - Grafana provisioning deployment (static)
- `templates/grafana/provisioning/datasources/prometheus.yml.tera` - Datasource template
- `templates/grafana/provisioning/dashboards/torrust.yml` - Dashboard provider config (static)
- `templates/grafana/dashboards/stats.json` - Stats dashboard (from torrust-demo)
- `templates/grafana/dashboards/metrics.json` - Metrics dashboard (from torrust-demo)
- `docs/e2e-testing/manual/grafana-testing-results.md` - Manual testing documentation
- `docs/e2e-testing/manual/tracker-verification.md` - Tracker verification guide
- `docs/issues/DRAFT-docker-ufw-firewall-security-strategy.md` - Security issue spec
- `docs/issues/246-grafana-slice-release-run-commands-extension.md` - Extension tasks documentation

**Modified**:

- `src/domain/environment/user_inputs.rs` - Added grafana field
- `src/application/command_handlers/create/config/errors.rs` - Added validation error
- `src/application/command_handlers/configure/handler.rs` - Integrated firewall and provisioning steps
- `templates/docker-compose/docker-compose.yml.tera` - Added Grafana service with health checks, removed Prometheus port
- `templates/docker-compose/.env.tera` - Added Grafana environment variables
- `docs/e2e-testing/manual/README.md` - Added service index
- `docs/e2e-testing/manual/grafana-verification.md` - Enhanced with provisioning verification
- Multiple test files updated (1563 tests passing)

## Breaking Changes

⚠️ **Prometheus Port Change**: Prometheus port 9090 is no longer exposed to the host. This is a security fix, not a feature change. Services should access Prometheus via Docker internal network, not host port.

## Architectural Decisions

1. **Static Playbook Pattern**: Uses static `.yml` playbook with centralized variables (not `.tera` template)
2. **Step-Level Conditionals**: Decision to execute happens in handler, not task-level with variables
3. **Selective Firewall Exposure**: Only user-facing services (Grafana) exposed publicly, internal services (Prometheus) remain internal
4. **Enabled-by-Default**: Following Prometheus pattern for consistent user experience
5. **Grafana Provisioning**: Uses Grafana's built-in provisioning system for datasources and dashboards
6. **Dashboard Selection**: Uses proven dashboards from torrust-demo for immediate value

## Related Issues

- #246 - Grafana slice (this PR)
- #216 - Parent epic: Implement ReleaseCommand and RunCommand with vertical slices
- Future: Docker/UFW firewall security strategy (see DRAFT issue spec)

## Checklist

- [x] Code follows project conventions and style guide
- [x] All unit tests passing (1563 tests)
- [x] All linters passing
- [x] Manual E2E testing complete
- [x] Security issue discovered and fixed
- [x] Documentation updated (extension tasks, verification guides)
- [x] Commit messages follow conventional commits format
- [x] Branch rebased/merged with latest main (if needed)

## Deployment Notes

After deployment, Grafana UI will be available at `http://<vm-ip>:3100` with the credentials specified in the environment configuration.

**First login**: Use admin credentials from environment config. The Prometheus datasource and two dashboards (Stats and Metrics) will be automatically configured and ready to use immediately.

**Dashboards Available**:

- **Torrust Tracker Stats** - Aggregate statistics and state metrics
- **Torrust Tracker Metrics** - Detailed operational metrics and performance data

## Commits (28 total)

1-3. Phase 1: Domain models, validation, integration
4. Phase 2: Docker Compose integration
5. Phase 3: Firewall configuration
6. E2E test configurations documentation
7. Commit message correction
8. Issue documentation update
9. Manual E2E testing results
10. Security fix (Prometheus port exposure)
11. Security documentation update
12. Documentation reorganization
13. DRAFT security issue specification
14-18. Extension tasks: Health checks implementation
19-22. Extension tasks: Grafana provisioning (datasource + dashboards)
23-27. Documentation improvements: Verification guides and testing manuals

ACKs for top commit:
josecelano: ACK e2efe88

Tree-SHA512: 16c073863dea02accd3b0588eff6b4d82ac487be4fa2d73cca0db05948a2fa196fe7df83211aea422e5b665c4ce06e70e85734ff9fcf5fa60cf021bb68a6371d
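The "Step-Level Conditionals" decision in the commit message above (the handler decides whether a step runs, so the static playbook needs no per-task variable) might look roughly like the sketch below. It is only an illustration under stated assumptions: `ConfigureGrafanaFirewall` echoes the `ConfigureGrafanaFirewallStep` named in the PR, while `ConfigureStep`, `EnabledServices`, `plan_configure_steps`, and the other step names are hypothetical.

```rust
/// Hypothetical step identifiers for the configure command.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ConfigureStep {
    InstallDocker,
    ConfigureTrackerFirewall,
    ConfigureGrafanaFirewall,
}

/// Hypothetical view of which optional services the environment enables.
pub struct EnabledServices {
    pub grafana: bool,
}

/// Builds the ordered list of steps the handler will execute.
pub fn plan_configure_steps(services: &EnabledServices) -> Vec<ConfigureStep> {
    let mut steps = vec![
        ConfigureStep::InstallDocker,
        ConfigureStep::ConfigureTrackerFirewall,
    ];

    // Step-level conditional: the step is skipped entirely when Grafana is
    // disabled, instead of running a playbook whose tasks are guarded by a variable.
    if services.grafana {
        steps.push(ConfigureStep::ConfigureGrafanaFirewall);
    }

    steps
}
```

Keeping the decision in the handler leaves each playbook unconditional, so whether a playbook runs at all is visible in one place in the Rust code.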
2 parents a950252 + e2efe88 commit 5ed35cd

80 files changed, +9747 -736 lines changed

docs/decisions/README.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ This directory contains architectural decision records for the Torrust Tracker D
 
 | Status | Date | Decision | Summary |
 | ------------- | ---------- | --------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
+| ✅ Accepted | 2025-12-20 | [Grafana Integration Pattern](./grafana-integration-pattern.md) | Enable Grafana by default with hard Prometheus dependency and environment variable config |
 | ✅ Accepted | 2025-12-17 | [Secrecy Crate for Sensitive Data Handling](./secrecy-crate-for-sensitive-data.md) | Use secrecy crate for type-safe secret handling with memory zeroing |
 | ✅ Accepted | 2025-12-14 | [Database Configuration Structure in Templates](./database-configuration-structure-in-templates.md) | Expose structured database fields in templates rather than pre-resolved connection strings |
 | ✅ Accepted | 2025-12-13 | [Environment Variable Injection in Docker Compose](./environment-variable-injection-in-docker-compose.md) | Use .env file injection instead of hardcoded values for runtime configuration changes |
