Skip to content

COW-591- Core prometheus exporters#15

Merged
lgahdl merged 16 commits intodevelopfrom
jefferson/cow-591-11-prometheus-exporters
Feb 26, 2026
Merged

COW-591- Core prometheus exporters#15
lgahdl merged 16 commits intodevelopfrom
jefferson/cow-591-11-prometheus-exporters

Conversation

@jeffersonBastos
Copy link
Collaborator

@jeffersonBastos jeffersonBastos commented Feb 6, 2026

Review Focus

Most of the line count comes from comprehensive test coverage and metric definitions. Focus review on: (1) the callback integration pattern between PrometheusExporter and MetricsStore in exporter.py:102-127, (2) Prometheus metric naming conventions in metrics.py, and (3) the CLI flag integration in run.py.

Summary

  • Implement Phase 1 Prometheus HTTP exporter for real-time performance test metrics
  • Add MetricsRegistry with order counters, latency histograms, and throughput gauges
  • Add PrometheusExporter with HTTP server and MetricsStore callback integration
  • Integrate --prometheus-port CLI flag to enable exporter during test runs
  • Add Prometheus scrape configuration for cow-performance-test job
  • Add M3 milestone task documentation for COW-591, COW-593, COW-598

Test plan

  • All 60 Prometheus-related unit and integration tests pass
  • mypy type checking passes
  • ruff linting passes
  • Manual verification: Start exporter with --prometheus-port 9091 and verify metrics at http://localhost:9091/metrics
  • Manual verification: Configure Prometheus to scrape the endpoint using configs/prometheus.yml

Metrics exposed

Metric Type Description
cow_perf_orders_created_total Counter Orders created
cow_perf_orders_submitted_total Counter Orders submitted to API
cow_perf_orders_filled_total Counter Orders successfully filled
cow_perf_orders_failed_total Counter Orders that failed
cow_perf_orders_expired_total Counter Orders that expired
cow_perf_orders_active Gauge Currently active orders
cow_perf_submission_latency_seconds Histogram Time to submit orders
cow_perf_orderbook_latency_seconds Histogram Time to orderbook acceptance
cow_perf_settlement_latency_seconds Histogram Time to settlement
cow_perf_order_lifecycle_seconds Histogram Full order lifecycle time
cow_perf_orders_per_second Gauge Current throughput
cow_perf_test_info Info Test metadata (scenario, git commit, etc.)

🤖 Generated with Claude Code

@linear
Copy link

linear bot commented Feb 6, 2026

Add real-time Prometheus HTTP exporter for performance test metrics:
- MetricsRegistry with order counters, latency histograms, throughput gauges
- PrometheusExporter with HTTP server and MetricsStore callback integration
- CLI --prometheus-port flag to enable exporter during test runs
- Prometheus scrape config for cow-performance-test job

Metrics exposed: cow_perf_orders_*, cow_perf_*_latency_seconds,
cow_perf_orders_per_second, cow_perf_test_info, etc.
@jeffersonBastos jeffersonBastos changed the base branch from develop to jefferson/cow-590-10-automated-reporting February 6, 2026 14:43
Add API, resource, per-trader, and baseline comparison metrics to
complete COW-591 Prometheus exporter deliverable:

- API metrics: requests counter, response time histogram, errors counter
- Resource metrics: container CPU, memory, network gauges
- Per-trader metrics: orders submitted/filled by trader index
- Comparison metrics: baseline percent change, regression detection

Update MetricsStore to pass container name with resource callbacks.
Add 21 new unit tests covering all Phase 2 functionality.
@jeffersonBastos jeffersonBastos changed the base branch from jefferson/cow-590-10-automated-reporting to develop February 6, 2026 17:00
@jeffersonBastos jeffersonBastos marked this pull request as ready for review February 6, 2026 17:59
jeffersonBastos and others added 12 commits February 10, 2026 11:35
Add two Grafana dashboards for monitoring performance tests:
- Overview dashboard: test progress, order rates, latency distributions
- API Performance dashboard: response times, throughput, error rates

Configure dashboard provisioning via docker-compose volume mount and
add explicit UID to Prometheus datasource for dashboard compatibility.
Add upload_app_data_with_retry() and get_open_order_count() methods
that were missing from the instrumented wrapper, causing AttributeError
when used in place of the underlying OrderbookClient.
Add three new dashboards completing the Grafana visualization suite:

- Resources dashboard: CPU, memory, network monitoring per container
- Comparison dashboard: baseline vs current with regression indicators
- Trader Activity dashboard: per-trader statistics and activity patterns

Update existing dashboards with cross-navigation links to all 5 dashboards.
… COW-593

Document Prometheus exporter phases and Grafana dashboard implementation
plans to track progress on metrics infrastructure work.
- Add prometheus_port config field with default 9091
- CLI uses config default, --prometheus-port 0 to disable
- Enhance order timeout logging with status, age, token pair, lifecycle
- Improve monitoring output with status breakdown counts
- Show all terminal states in final summary (filled/expired/failed/cancelled)
- Update README and CLI docs with monitoring instructions
Add concurrent Prometheus metrics update loop that exports test progress
and throughput metrics every second during performance test runs. This
fixes "No Data" panels in the Overview dashboard.

Remove redundant P50 delta panels from the comparison dashboard and
adjust grid positions for cleaner layout.
- Create 7 core alerting rules (latency, error rate, throughput, resources, test execution)
- Enable rule_files in Prometheus configuration
- Add alerts volume mount in Docker Compose
- Add Grafana annotations to show firing alerts on dashboard
- Add container_memory_percent metric for CriticalMemoryUsage alert
- Add implementation plan: thoughts/plans/2026-02-13-cow-598-alerting-rules.md
- Add implementation notes to ticket file documenting scope decisions
- Update INDEX.md with plan entry and document cluster reference

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…aining-dashboards-resources-comparison

COW-593 task 2 remaining dashboards resources comparison
…ential-dashboards-overview-api

feat(grafana): add performance and API monitoring dashboards
…tended-prometheus-metrics

COW-591 phase 2 extended prometheus metrics
@lgahdl lgahdl merged commit 3fccf49 into develop Feb 26, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants