Conversation
Try out this PR

Quick install:

```shell
pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@19e4b776998f9ee42bfb1a183e8144c718b47651
```

Recommended with a virtual environment (using uv):

```shell
uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@19e4b776998f9ee42bfb1a183e8144c718b47651
```

Last updated for commit:
Warning: Rate limit exceeded

⌛ How to resolve this issue?
After the wait time has elapsed, a review can be triggered. We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?
CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.

📥 Commits
Reviewing files that changed from the base of the PR and between 5de94a2a35eb54d6119c70a8ec0ea37cc7bcba81 and 19e4b77.

📒 Files selected for processing (12)
Walkthrough

This PR adds support for handling the UNKNOWN Prometheus metric type by treating such metrics as gauges.

Changes

Estimated code review effort
🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
d9bd238 to 5de94a2 (Compare)
When asked to parse a Prometheus endpoint that publishes unknown metrics, it fails with an error. This commit treats unknown metrics as gauges and adds unit tests.

Signed-off-by: Rafi Wiener <rwiener@nvidia.com>
5de94a2 to 19e4b77 (Compare)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/server_metrics/test_accumulator.py (1)
114-155: ⚠️ Potential issue | 🟡 Minor

Add assertions that verify UNKNOWN metrics produce gauge-equivalent output in endpoint_summaries.

The new parametrized case covers PrometheusMetricType.UNKNOWN, but the assertions only check structural properties (result type, endpoint presence, summary count). The PR's core claim — that UNKNOWN is treated identically to GAUGE — is not exercised at the assertion level: neither that metric_name exists in the per-endpoint metrics map, nor that scalar statistics (e.g. mean, min, max) are present as they would be for a GAUGE.

✅ Suggested additional assertions

```diff
     assert result.endpoint_summaries is not None
     assert len(result.endpoint_summaries) == 1
+
+    # Verify the metric is present and carries scalar (gauge-equivalent) stats
+    summary = list(result.endpoint_summaries.values())[0]
+    assert metric_name in summary.metrics
+    series = summary.metrics[metric_name].series
+    assert len(series) > 0
+    stats = series[0]
+    assert stats.mean is not None
+    assert stats.min is not None
+    assert stats.max is not None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/server_metrics/test_accumulator.py` around lines 114 - 155, The test currently only checks structural properties but doesn't assert that UNKNOWN metrics are converted to gauge-like scalars; update test_export_results_with_data to verify that for the returned ServerMetricsResults (from ServerMetricsAccumulator.export_results) the endpoint_summaries contains the metric_name in the per-endpoint metrics map and that the metric entry includes scalar statistics (e.g., mean, min, max, count) matching the supplied samples; reference the ServerMetricsResults.endpoint_summaries and the metric key (metric_name) to assert presence and verify the scalar fields (mean/min/max/count) have the expected numeric values for the generated samples so UNKNOWN behaves like GAUGE.
🧹 Nitpick comments (3)
src/aiperf/server_metrics/csv_exporter.py (1)

42-47: The STAT_KEYS_MAP entry for UNKNOWN is dead code and can be removed.

UNKNOWN metrics are correctly written to CSV because GaugeMetricData.type is hardcoded to PrometheusMetricType.GAUGE (not UNKNOWN). When the accumulator converts UNKNOWN metrics to GaugeMetricData, they are grouped under GAUGE in _group_metrics_by_type and exported in the GAUGE section. The STAT_KEYS_MAP entry for UNKNOWN will never be accessed, since metrics are only grouped by GAUGE, COUNTER, or HISTOGRAM. Remove line 44 as unnecessary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/aiperf/server_metrics/csv_exporter.py` around lines 42 - 47, Remove the dead STAT_KEYS_MAP entry for PrometheusMetricType.UNKNOWN: delete the mapping key PrometheusMetricType.UNKNOWN that points to GAUGE_STAT_KEYS inside the STAT_KEYS_MAP definition so the map only contains GAUGE, COUNTER, and HISTOGRAM; verify no other code refers to PrometheusMetricType.UNKNOWN in STAT_KEYS_MAP (search for STAT_KEYS_MAP, PrometheusMetricType.UNKNOWN, GAUGE_STAT_KEYS) and run tests to ensure CSV export behavior is unchanged.

tests/unit/server_metrics/test_export_stats_basic.py (1)
97-114: TestUnknownExportStats — logic is correct; consider also asserting the result type.

The expected values (avg=20.0, min=10.0, max=30.0) are arithmetically correct for [10.0, 20.0, 30.0]. result.stats is accessed directly after assert result is not None; if stats were None, this would surface as an AttributeError rather than a clear test-failure message. Adding assert result.stats is not None before the field assertions would give a clearer signal. Optionally, asserting isinstance(result, GaugeSeries) would also confirm the correct return type end-to-end.

♻️ Suggested improvement

```diff
     assert result is not None
+    assert result.stats is not None
+    assert isinstance(result, GaugeSeries)  # UNKNOWN routes to GaugeSeries
     assert result.stats.avg == 20.0
     assert result.stats.min == 10.0
     assert result.stats.max == 30.0
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/server_metrics/test_export_stats_basic.py` around lines 97 - 114, The test TestUnknownExportStats currently asserts result is not None then accesses result.stats directly; add an explicit assertion that result.stats is not None to surface a clear failure if stats are missing and optionally assert the concrete return type (e.g., isinstance(result, GaugeSeries)) to confirm UNKNOWN is routed to gauge logic; update the test around compute_stats (called with PrometheusMetricType.UNKNOWN, get_gauge, make_time_filter) to include assert result.stats is not None and, if desired, assert isinstance(result, GaugeSeries) before checking avg/min/max.

tests/unit/server_metrics/test_csv_exporter.py (1)
338-341: Prefer == over is to test value equality rather than object identity.

Using is asserts that both keys reference the same list object, which is an implementation detail. If the backing csv_exporter.py is ever refactored to create a copy (e.g., STAT_KEYS_MAP[UNKNOWN] = list(GAUGE_STAT_KEYS)), this test would silently break while the behavior remains correct. == expresses the actual intent — that the stat-key lists are identical in content.

♻️ Proposed fix

```diff
-    assert (
-        STAT_KEYS_MAP[PrometheusMetricType.UNKNOWN]
-        is STAT_KEYS_MAP[PrometheusMetricType.GAUGE]
-    )
+    assert (
+        STAT_KEYS_MAP[PrometheusMetricType.UNKNOWN]
+        == STAT_KEYS_MAP[PrometheusMetricType.GAUGE]
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/server_metrics/test_csv_exporter.py` around lines 338 - 341, The assertion is using object identity instead of value equality; update the test to compare contents by replacing the `is` check with an equality check so that STAT_KEYS_MAP[PrometheusMetricType.UNKNOWN] == STAT_KEYS_MAP[PrometheusMetricType.GAUGE]; this keeps the intent clear and resilient if the lists are copies (referencing STAT_KEYS_MAP and PrometheusMetricType.UNKNOWN / PrometheusMetricType.GAUGE to locate the assertion).
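The fragility the reviewer describes is easy to demonstrate in isolation. The snippet below is a standalone illustration — the names mirror the test's STAT_KEYS_MAP idea but are simplified stand-ins, not the aiperf definitions: equal lists satisfy == regardless of whether they are the same object, while an is-based check fails the moment a copy is introduced.

```python
# Standalone illustration; GAUGE_STAT_KEYS and the map are simplified
# stand-ins for the aiperf definitions, not the real ones.
GAUGE_STAT_KEYS = ["avg", "min", "max"]

# Shared reference: both identity and equality hold.
stat_keys_map = {"gauge": GAUGE_STAT_KEYS, "unknown": GAUGE_STAT_KEYS}
assert stat_keys_map["unknown"] is stat_keys_map["gauge"]
assert stat_keys_map["unknown"] == stat_keys_map["gauge"]

# After a defensive copy, identity breaks but equality still holds,
# so an `is`-based test would fail even though behavior is unchanged.
stat_keys_map["unknown"] = list(GAUGE_STAT_KEYS)
assert stat_keys_map["unknown"] is not stat_keys_map["gauge"]
assert stat_keys_map["unknown"] == stat_keys_map["gauge"]
```

This is exactly why the proposed fix swaps `is` for `==`: the test should pin down content, not object identity.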
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/server_metrics/test_export_stats_basic.py`:
- Around line 9-11: The imports in the test import block are out of alphabetical
order causing the ruff pre-commit failure: move the PrometheusMetricType import
before the TimeRangeFilter import so that aiperf.common.enums is imported prior
to aiperf.common.models (swap the order of PrometheusMetricType and
TimeRangeFilter in the import list).
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between e358413 and 5de94a2a35eb54d6119c70a8ec0ea37cc7bcba81.
📒 Files selected for processing (12)
src/aiperf/server_metrics/accumulator.py
src/aiperf/server_metrics/csv_exporter.py
src/aiperf/server_metrics/export_stats.py
src/aiperf/server_metrics/json_exporter.py
src/aiperf/server_metrics/parquet_exporter.py
src/aiperf/server_metrics/storage.py
tests/server_metrics/test_storage.py
tests/unit/server_metrics/test_accumulator.py
tests/unit/server_metrics/test_csv_exporter.py
tests/unit/server_metrics/test_export_stats_basic.py
tests/unit/server_metrics/test_export_stats_edge_cases_aggressive.py
tests/unit/server_metrics/test_storage_edge_cases.py
```python
class TestStatKeysMap:
    """Test STAT_KEYS_MAP covers all expected metric types."""

    def test_unknown_metric_type_uses_gauge_stat_keys(self):
        """Test UNKNOWN type maps to the same stat keys as GAUGE."""
        assert (
            STAT_KEYS_MAP[PrometheusMetricType.UNKNOWN]
            is STAT_KEYS_MAP[PrometheusMetricType.GAUGE]
        )
```
🧩 Analysis chain

🏁 Scripts executed:

```shell
rg -n "section_order|UNKNOWN|PrometheusMetricType" --type py src/aiperf/server_metrics/csv_exporter.py
sed -n '170,200p' src/aiperf/server_metrics/csv_exporter.py
sed -n '213,250p' src/aiperf/server_metrics/csv_exporter.py
rg -n "UNKNOWN.*csv|test.*UNKNOWN.*metric|integration.*csv" --type py tests/
rg -n "UNKNOWN|test_.*metric" tests/unit/server_metrics/test_csv_exporter.py | head -30
```
Add UNKNOWN metrics to CSV output by including it in section_order or normalizing to GAUGE.
The STAT_KEYS_MAP correctly maps PrometheusMetricType.UNKNOWN to GAUGE_STAT_KEYS (line 44), but _generate_content does not process UNKNOWN metrics. The section_order list (lines 186-189) includes only GAUGE, COUNTER, and HISTOGRAM. Since _group_metrics_by_type stores metrics keyed by their raw type without normalizing UNKNOWN to GAUGE (line 237), any metric with type=UNKNOWN will be silently omitted from CSV output. Either add PrometheusMetricType.UNKNOWN to section_order or normalize UNKNOWN types to GAUGE during the grouping step. Additionally, add an integration test that verifies UNKNOWN-typed metrics produce rows in the CSV output.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/server_metrics/test_csv_exporter.py` around lines 333 - 341, The
CSV exporter currently omits UNKNOWN metrics because section_order only lists
GAUGE/COUNTER/HISTOGRAM and _group_metrics_by_type keys metrics by their raw
type; fix by either adding PrometheusMetricType.UNKNOWN to the section_order
used by _generate_content or by normalizing UNKNOWN to
PrometheusMetricType.GAUGE inside _group_metrics_by_type (or both for clarity),
and add an integration test that creates a metric with
type=PrometheusMetricType.UNKNOWN and asserts a CSV row is emitted; reference
STAT_KEYS_MAP, PrometheusMetricType.UNKNOWN, section_order, _generate_content,
and _group_metrics_by_type when updating the code and tests.
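The normalization option suggested above can be sketched in isolation. This is a hedged illustration, not the aiperf implementation: the enum and the plain-dict metric entries are simplified stand-ins, and the function name only mirrors the `_group_metrics_by_type` helper referenced in the review.

```python
from collections import defaultdict
from enum import Enum


class PrometheusMetricType(str, Enum):  # simplified stand-in for aiperf's enum
    GAUGE = "gauge"
    COUNTER = "counter"
    HISTOGRAM = "histogram"
    UNKNOWN = "unknown"


def group_metrics_by_type(metrics):
    """Group {name: (type, value)} entries by type, folding UNKNOWN into GAUGE.

    Normalizing here means no metric is silently dropped when the exporter
    later iterates only the GAUGE/COUNTER/HISTOGRAM sections.
    """
    grouped = defaultdict(dict)
    for name, (metric_type, value) in metrics.items():
        if metric_type is PrometheusMetricType.UNKNOWN:
            metric_type = PrometheusMetricType.GAUGE  # keep it in a known section
        grouped[metric_type][name] = value
    return grouped


grouped = group_metrics_by_type(
    {
        "gpu_utilization": (PrometheusMetricType.GAUGE, 0.9),
        "vendor_custom_metric": (PrometheusMetricType.UNKNOWN, 42.0),
    }
)
assert "vendor_custom_metric" in grouped[PrometheusMetricType.GAUGE]
assert PrometheusMetricType.UNKNOWN not in grouped
```

As the author's reply below notes, the actual code normalizes earlier (in the accumulator), so a grouping-time fix like this would be belt-and-braces rather than strictly required.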
section_order will not have UNKNOWN in it, since _group_metrics_by_type in csv_exporter.py line 211 iterates endpoint_summary.metrics.items() and its type is defined as:

```python
metrics: SerializeAsAny[
    dict[
        str,
        GaugeMetricData | CounterMetricData | HistogramMetricData,
    ]
]
```

That's OK, because in accumulator.py at line 247:

```python
case PrometheusMetricType.GAUGE | PrometheusMetricType.UNKNOWN:
    metrics[base_name] = GaugeMetricData(
        description=metric_entry.description,
        series=[series_stats],
    )
```
Codecov Report
✅ All modified and coverable lines are covered by tests.
📢 Thoughts on this report? Let us know!
When asked to parse a Prometheus endpoint that publishes unknown metrics, it fails with an error. This commit treats unknown metrics as gauges and adds unit tests.
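The fallback idea in this description can be shown with a tiny, hypothetical type-resolution helper — not the actual aiperf parser: any declared type outside the modeled set, including Prometheus's untyped metrics, falls back to gauge handling instead of raising.

```python
def resolve_metric_type(declared):
    """Map a declared Prometheus metric type to a handler category.

    Hypothetical helper for illustration: unknown or untyped metrics
    fall back to "gauge" instead of raising an error.
    """
    known = {"gauge", "counter", "histogram"}
    if declared in known:
        return declared
    return "gauge"


assert resolve_metric_type("counter") == "counter"
assert resolve_metric_type("untyped") == "gauge"  # Prometheus "untyped" TYPE line
assert resolve_metric_type(None) == "gauge"       # no TYPE line at all
```

Gauge is the natural fallback because it imposes the fewest semantics: just sampled scalar values with min/max/mean-style statistics.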
Summary by CodeRabbit
Bug Fixes
Tests