Alerting rules #19
Merged
lgahdl merged 2 commits into jefferson/cow-616-cow-593-task-2-remaining-dashboards-resources-comparison on Feb 25, 2026
- Create 7 core alerting rules (latency, error rate, throughput, resources, test execution)
- Enable rule_files in Prometheus configuration
- Add alerts volume mount in Docker Compose
- Add Grafana annotations to show firing alerts on dashboard
- Add container_memory_percent metric for CriticalMemoryUsage alert
- Add implementation plan: thoughts/plans/2026-02-13-cow-598-alerting-rules.md
- Add implementation notes to ticket file documenting scope decisions
- Update INDEX.md with plan entry and document cluster reference
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
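The container_memory_percent metric mentioned in the commits above is a ratio of memory usage to the container's limit. The following is a minimal sketch of that calculation, not the code from metrics.py or exporter.py: the function name memory_percent, its parameters, and the choice to report 0.0 when no limit is set are all assumptions made for illustration.

```python
def memory_percent(usage_bytes: int, limit_bytes: int) -> float:
    """Return memory usage as a percentage of the container limit.

    Hypothetical helper: when no limit is configured (limit <= 0) there is
    nothing to divide by, so this sketch reports 0.0 instead of raising.
    """
    if limit_bytes <= 0:
        return 0.0
    return 100.0 * usage_bytes / limit_bytes


# 768 MiB used out of a 1 GiB limit
example = memory_percent(768 * 1024**2, 1024**3)
print(example)  # 75.0
```

A value like this is what a gauge such as cow_perf_container_memory_percent would carry, which lets an alert such as CriticalMemoryUsage compare against a plain percentage threshold instead of dividing two series in the rule expression.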
Merged commit 5184210 into jefferson/cow-616-cow-593-task-2-remaining-dashboards-resources-comparison
17 checks passed
Summary
Implements Prometheus alerting rules for the CoW Performance Testing Suite (COW-598). Adds 7 core alerts that notify developers when performance degrades, error rates spike, or resource utilization exceeds thresholds during performance testing.
Approach: Option A (Prometheus alerting rules + Grafana visualization) - no Alertmanager required.
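Without Alertmanager, the alert states that Prometheus computes itself (inactive, pending, firing) are what the Alerts page and the Grafana annotations show. As a rough illustration of how a rule with a `for` duration moves between those states, here is a small simulation. It is a sketch only: the AlertRule class, the rule name ExampleHighErrorRate, the 5% threshold, and the 60-second duration are hypothetical and are not taken from performance-testing.yml.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    """Hypothetical stand-in for one Prometheus alerting rule."""

    name: str
    threshold: float      # condition holds when the sampled value exceeds this
    for_seconds: float    # how long the condition must hold before firing
    active_since: Optional[float] = None

    def evaluate(self, value: float, now: float) -> str:
        """Return 'inactive', 'pending', or 'firing' for one evaluation."""
        if value <= self.threshold:
            # Condition no longer holds: the alert resets completely.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


rule = AlertRule(name="ExampleHighErrorRate", threshold=0.05, for_seconds=60)
# (timestamp in seconds, observed error rate)
samples = [(0, 0.01), (15, 0.08), (45, 0.09), (75, 0.10), (90, 0.02)]
states = [rule.evaluate(value, now) for now, value in samples]
print(states)
# ['inactive', 'pending', 'pending', 'firing', 'inactive']
```

The `for` duration is what keeps a single noisy sample from firing an alert: the condition has to hold across consecutive evaluations before the state changes from pending to firing.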
Alerts Implemented
Changes
New Files
- configs/prometheus/alerts/performance-testing.yml - Alert rules with documented parameters at top for easy customization (TODO(COW-617) for future configurability)
Modified Files
- configs/prometheus.yml - Enable rule_files: to load alert rules
- docker-compose.yml - Mount alerts directory in Prometheus container
- configs/dashboards/performance.json - Add alert annotations to show firing alerts
- src/cow_performance/prometheus/metrics.py - Add cow_perf_container_memory_percent gauge
- src/cow_performance/prometheus/exporter.py - Export memory percentage metric
Documentation
- thoughts/plans/2026-02-13-cow-598-alerting-rules.md - Implementation plan
- thoughts/tickets/COW-598-alerting-rules.md - Updated with implementation notes
- thoughts/INDEX.md - Updated with plan reference
How to Test
Start the monitoring stack:
Verify alert rules loaded in Prometheus:
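The original command for this step is not preserved here. One way to perform the check is to query the Prometheus HTTP API endpoint /api/v1/rules and list the alerting rules it reports. The sketch below assumes Prometheus is reachable at http://localhost:9090 (its default port, not confirmed by this PR); the helper names are hypothetical, and the sample payload is hand-written so the parsing can be tried without a running server.

```python
import json
import urllib.request


def fetch_rules(base_url: str = "http://localhost:9090") -> dict:
    """GET /api/v1/rules from a running Prometheus (base_url is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/rules", timeout=5) as resp:
        return json.load(resp)


def alerting_rule_names(payload: dict) -> list[str]:
    """Collect the names of all alerting rules in a /api/v1/rules response."""
    if payload.get("status") != "success":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    names = []
    for group in payload["data"]["groups"]:
        for rule in group["rules"]:
            # Recording rules share the same endpoint; skip them.
            if rule.get("type") == "alerting":
                names.append(rule["name"])
    return names


# Hand-written sample in the shape of the API response (not real output).
sample = {
    "status": "success",
    "data": {
        "groups": [
            {
                "name": "example-group",
                "rules": [
                    {"type": "alerting", "name": "CriticalMemoryUsage", "state": "inactive"},
                    {"type": "recording", "name": "example:recorded_series"},
                ],
            }
        ]
    },
}
rule_names = alerting_rule_names(sample)
print(rule_names)  # ['CriticalMemoryUsage']
```

Against the live stack, alerting_rule_names(fetch_rules()) should return the 7 alerts added by this PR if the rule file was loaded.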
Check the Prometheus alerts page:
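The same information shown on the alerts page is available from the /api/v1/alerts endpoint, which lists only alerts that are currently pending or firing. The sketch below groups them by state. As before, the base URL http://localhost:9090 is an assumption, the helper names are hypothetical, and the sample payload (including ExamplePendingAlert) is made up for illustration.

```python
import json
import urllib.request


def fetch_alerts(base_url: str = "http://localhost:9090") -> dict:
    """GET /api/v1/alerts from a running Prometheus (base_url is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/alerts", timeout=5) as resp:
        return json.load(resp)


def alerts_by_state(payload: dict) -> dict[str, list[str]]:
    """Group active alert names by their state ('pending' or 'firing')."""
    grouped: dict[str, list[str]] = {}
    for alert in payload["data"]["alerts"]:
        name = alert["labels"]["alertname"]
        grouped.setdefault(alert["state"], []).append(name)
    return grouped


# Hand-written sample in the shape of the API response (not real output).
sample = {
    "status": "success",
    "data": {
        "alerts": [
            {"labels": {"alertname": "CriticalMemoryUsage"}, "state": "firing"},
            {"labels": {"alertname": "ExamplePendingAlert"}, "state": "pending"},
        ]
    },
}
active = alerts_by_state(sample)
print(active)
# {'firing': ['CriticalMemoryUsage'], 'pending': ['ExamplePendingAlert']}
```

An empty result from alerts_by_state(fetch_alerts()) is expected on an idle stack, since inactive alerts do not appear in this endpoint.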
Run a test to generate metrics and verify Grafana annotations:
cow-perf run --prometheus-port 9091 --duration 120
open http://localhost:3000  # Navigate to Performance Overview dashboard
Checklist
- poetry run pytest tests/unit/
- poetry run ruff check src/cow_performance/prometheus/
- poetry run mypy src/cow_performance/prometheus/
Scope Decisions
Reduced scope (7 alerts instead of 15+) based on:
Deferred to COW-617: Configurable thresholds via TOML/env variables
Breaking Changes
None
Related Issues
🤖 Generated with Claude Code