Merged
Add real-time Prometheus HTTP exporter for performance test metrics:
- MetricsRegistry with order counters, latency histograms, throughput gauges
- PrometheusExporter with HTTP server and MetricsStore callback integration
- CLI --prometheus-port flag to enable exporter during test runs
- Prometheus scrape config for cow-performance-test job

Metrics exposed: cow_perf_orders_*, cow_perf_*_latency_seconds, cow_perf_orders_per_second, cow_perf_test_info, etc.
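To make the registry, callback, and HTTP server pieces concrete, here is a minimal sketch that uses only the Python standard library. It is not the code from this PR: `MiniRegistry`, `start_exporter`, `on_order_submitted`, and `scrape` are hypothetical names, and the real exporter may well be built on a Prometheus client library instead. The sketch only shows the shape of the idea: a callback increments a named value, and an HTTP handler renders all values in the Prometheus text exposition format at `/metrics`.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MiniRegistry:
    """Hypothetical stand-in for MetricsRegistry: thread-safe named values."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: dict[str, float] = {}

    def inc(self, name: str, amount: float = 1.0) -> None:
        with self._lock:
            self._values[name] = self._values.get(name, 0.0) + amount

    def set(self, name: str, value: float) -> None:
        with self._lock:
            self._values[name] = value

    def render(self) -> str:
        # Prometheus text exposition format: one "name value" sample per line.
        with self._lock:
            lines = [f"{name} {value}" for name, value in sorted(self._values.items())]
        return "\n".join(lines) + "\n"


def on_order_submitted(registry: MiniRegistry) -> None:
    """Callback a metrics store could invoke once per submitted order (assumed hook)."""
    registry.inc("cow_perf_orders_submitted_total")


def start_exporter(registry: MiniRegistry, port: int) -> ThreadingHTTPServer:
    """Serve registry.render() at /metrics on a background daemon thread."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = registry.render().encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args: object) -> None:
            pass  # silence per-request logging

    server = ThreadingHTTPServer(("127.0.0.1", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape(port: int) -> str:
    """Fetch /metrics once, bypassing any configured HTTP proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://127.0.0.1:{port}/metrics", timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Passing port `0` to `start_exporter` lets the OS pick a free port, which is convenient in tests; the bound port is then available as `server.server_address[1]`.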
Add API, resource, per-trader, and baseline comparison metrics to complete COW-591 Prometheus exporter deliverable:
- API metrics: requests counter, response time histogram, errors counter
- Resource metrics: container CPU, memory, network gauges
- Per-trader metrics: orders submitted/filled by trader index
- Comparison metrics: baseline percent change, regression detection

Update MetricsStore to pass container name with resource callbacks. Add 21 new unit tests covering all Phase 2 functionality.
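The comparison metrics mentioned above (baseline percent change and regression detection) can be illustrated with a short sketch. The function names, the 10% default threshold, and the `higher_is_worse` switch are assumptions made for illustration; the PR does not state how its thresholds are defined.

```python
def percent_change(baseline: float, current: float) -> float:
    """Relative change of current versus baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


def is_regression(
    baseline: float,
    current: float,
    threshold_percent: float = 10.0,  # assumed threshold, not taken from the PR
    higher_is_worse: bool = True,
) -> bool:
    """Flag a regression when the metric moved past the threshold in the bad direction.

    Latency-like metrics regress when they grow (higher_is_worse=True);
    throughput-like metrics regress when they shrink (higher_is_worse=False).
    """
    change = percent_change(baseline, current)
    if higher_is_worse:
        return change > threshold_percent
    return change < -threshold_percent
```

For example, a latency that moves from 200 ms to 250 ms is a 25% increase and would be flagged under the assumed threshold, while a move to 210 ms (5%) would not.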
Add two Grafana dashboards for monitoring performance tests:
- Overview dashboard: test progress, order rates, latency distributions
- API Performance dashboard: response times, throughput, error rates

Configure dashboard provisioning via docker-compose volume mount and add explicit UID to Prometheus datasource for dashboard compatibility.
Add upload_app_data_with_retry() and get_open_order_count() methods that were missing from the instrumented wrapper, causing AttributeError when used in place of the underlying OrderbookClient.
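The failure mode described above is common to wrapper classes: any method the wrapper does not define raises AttributeError even though the wrapped client has it. The commit fixes this by adding the two missing methods explicitly. As an alternative illustration only, the sketch below shows a wrapper that instruments one method and delegates everything else through `__getattr__`. `FakeOrderbookClient` and `InstrumentedClient` are hypothetical stand-ins, not the classes from this PR.

```python
import time


class FakeOrderbookClient:
    """Hypothetical stand-in for the underlying OrderbookClient."""

    def submit_order(self, order: dict) -> str:
        return "uid-1"

    def get_open_order_count(self) -> int:
        return 3


class InstrumentedClient:
    """Times selected calls and forwards all other attributes to the inner client."""

    def __init__(self, inner: FakeOrderbookClient) -> None:
        self._inner = inner
        self.api_calls: list[tuple[str, float]] = []  # (method name, seconds)

    def submit_order(self, order: dict) -> str:
        start = time.perf_counter()
        try:
            return self._inner.submit_order(order)
        finally:
            self.api_calls.append(("submit_order", time.perf_counter() - start))

    def __getattr__(self, name: str):
        # Invoked only when normal lookup fails, so instrumented methods win.
        if name == "_inner":
            # Avoid infinite recursion if _inner has not been set yet.
            raise AttributeError(name)
        return getattr(self._inner, name)
```

The trade-off is that delegated methods are not instrumented and are invisible to static type checkers, which may be why explicit methods were preferred here.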
Add three new dashboards completing the Grafana visualization suite:
- Resources dashboard: CPU, memory, network monitoring per container
- Comparison dashboard: baseline vs current with regression indicators
- Trader Activity dashboard: per-trader statistics and activity patterns

Update existing dashboards with cross-navigation links to all 5 dashboards.
… COW-593 Document Prometheus exporter phases and Grafana dashboard implementation plans to track progress on metrics infrastructure work.
- Add prometheus_port config field with default 9091
- CLI uses config default, --prometheus-port 0 to disable
- Enhance order timeout logging with status, age, token pair, lifecycle
- Improve monitoring output with status breakdown counts
- Show all terminal states in final summary (filled/expired/failed/cancelled)
- Update README and CLI docs with monitoring instructions
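The port resolution rule in the first two items (config default of 9091, CLI override, `0` to disable) could look roughly like the following. This is a sketch using `argparse`; the project's actual CLI framework is not shown in this PR page, and `resolve_prometheus_port` and the `perf-test` program name are hypothetical.

```python
import argparse
from typing import Optional, Sequence

DEFAULT_PROMETHEUS_PORT = 9091  # default stated in the commit message


def resolve_prometheus_port(
    argv: Sequence[str],
    config_port: int = DEFAULT_PROMETHEUS_PORT,
) -> Optional[int]:
    """Return the port to bind, or None when the exporter is disabled."""
    parser = argparse.ArgumentParser(prog="perf-test")  # hypothetical program name
    parser.add_argument(
        "--prometheus-port",
        type=int,
        default=None,
        help="port for the metrics exporter; 0 disables it",
    )
    args = parser.parse_args(argv)

    # The CLI flag wins when given; otherwise fall back to the config value.
    port = config_port if args.prometheus_port is None else args.prometheus_port
    if port == 0:
        return None
    if not 0 < port < 65536:
        parser.error(f"invalid port: {port}")
    return port
```

Using `None` as the flag's default, rather than 9091 directly, keeps "flag not given" distinguishable from "flag given with the default value", so the config file can still supply a different port.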
Add concurrent Prometheus metrics update loop that exports test progress and throughput metrics every second during performance test runs. This fixes "No Data" panels in the Overview dashboard. Remove redundant P50 delta panels from the comparison dashboard and adjust grid positions for cleaner layout.
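A periodic update loop of the kind described above might be sketched with `asyncio` as follows. Whether the project uses asyncio or threads is not stated here, so treat that as an assumption, along with the names `metrics_update_loop`, `get_submitted`, `publish`, and `demo`. The loop wakes once per interval, derives throughput from the change in the submitted-order count, and exits promptly when a stop event is set.

```python
import asyncio
import time
from typing import Callable


async def metrics_update_loop(
    get_submitted: Callable[[], int],
    publish: Callable[[str, float], None],
    stop: asyncio.Event,
    interval: float = 1.0,
) -> None:
    """Publish orders-per-second roughly every `interval` seconds until stopped."""
    last_count = get_submitted()
    last_time = time.monotonic()
    while not stop.is_set():
        try:
            # Sleep for one interval, but wake immediately if stop is set.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
        now = time.monotonic()
        elapsed = now - last_time
        if elapsed > 0:
            count = get_submitted()
            publish("cow_perf_orders_per_second", (count - last_count) / elapsed)
            last_count, last_time = count, now


async def demo() -> dict[str, float]:
    """Run the loop briefly against an in-memory counter and return what was published."""
    state = {"submitted": 0}
    published: dict[str, float] = {}
    stop = asyncio.Event()
    task = asyncio.create_task(
        metrics_update_loop(
            lambda: state["submitted"], published.__setitem__, stop, interval=0.01
        )
    )
    for _ in range(5):
        state["submitted"] += 1
        await asyncio.sleep(0.01)
    stop.set()
    await task
    return published
```

Because the gauge is refreshed on a timer rather than only when an order event arrives, a dashboard panel has samples to plot even during quiet periods, which matches the "No Data" fix the commit describes.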
- Create 7 core alerting rules (latency, error rate, throughput, resources, test execution)
- Enable rule_files in Prometheus configuration
- Add alerts volume mount in Docker Compose
- Add Grafana annotations to show firing alerts on dashboard
- Add container_memory_percent metric for CriticalMemoryUsage alert
- Add implementation plan: thoughts/plans/2026-02-13-cow-598-alerting-rules.md
- Add implementation notes to ticket file documenting scope decisions
- Update INDEX.md with plan entry and document cluster reference

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…aining-dashboards-resources-comparison COW-593 task 2 remaining dashboards resources comparison
…ential-dashboards-overview-api feat(grafana): add performance and API monitoring dashboards
…tended-prometheus-metrics COW-591 phase 2 extended prometheus metrics
Review Focus

Most of the line count comes from comprehensive test coverage and metric definitions. Focus review on: (1) the callback integration pattern between PrometheusExporter and MetricsStore in exporter.py:102-127, (2) Prometheus metric naming conventions in metrics.py, and (3) the CLI flag integration in run.py.

Summary
- MetricsRegistry with order counters, latency histograms, and throughput gauges
- PrometheusExporter with HTTP server and MetricsStore callback integration
- --prometheus-port CLI flag to enable exporter during test runs
- Prometheus scrape config for cow-performance-test job

Test plan
- mypy type checking passes
- ruff linting passes
- --prometheus-port 9091 and verify metrics at http://localhost:9091/metrics
- configs/prometheus.yml

Metrics exposed
- cow_perf_orders_created_total
- cow_perf_orders_submitted_total
- cow_perf_orders_filled_total
- cow_perf_orders_failed_total
- cow_perf_orders_expired_total
- cow_perf_orders_active
- cow_perf_submission_latency_seconds
- cow_perf_orderbook_latency_seconds
- cow_perf_settlement_latency_seconds
- cow_perf_order_lifecycle_seconds
- cow_perf_orders_per_second
- cow_perf_test_info

🤖 Generated with Claude Code