feat: add metrics_aggregator service for real-time metric computation #188
nv-alicheng merged 12 commits into main
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Gemini Code Assist left a comment
Code Review
This pull request introduces a new metrics_aggregator service for real-time metric computation from ZMQ events. The implementation is well-structured, with clear separation of concerns for event processing, metric emission, and tokenization. The accompanying unit and end-to-end tests are comprehensive and cover many edge cases, ensuring the reliability of the new service.
My review includes a few suggestions to improve maintainability and performance. These include simplifying resource management using context managers, improving typing for better static analysis, and a design suggestion to potentially increase event processing throughput by offloading tokenization to a background task.
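The throughput suggestion above can be illustrated with a minimal sketch. It assumes tokenization is a blocking, CPU-bound call and uses `asyncio.to_thread` to move it off the event loop. The names `count_tokens` and `process_events` are hypothetical and are not taken from this PR; the whitespace split is only a stand-in for a real tokenizer.

```python
import asyncio


def count_tokens(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated tokens."""
    return len(text.split())


async def process_events(events: list[tuple[str, str]]) -> dict[str, int]:
    """Schedule tokenization in worker threads and collect counts by request id."""
    pending: dict[str, asyncio.Task] = {}
    for request_id, text in events:
        # asyncio.to_thread runs the blocking call in a worker thread, so the
        # event loop stays free to receive further events in the meantime.
        pending[request_id] = asyncio.create_task(
            asyncio.to_thread(count_tokens, text)
        )
    # gather preserves the order of the awaitables it was given.
    counts = await asyncio.gather(*pending.values())
    return dict(zip(pending.keys(), counts))


if __name__ == "__main__":
    sample = [("req-1", "hello world"), ("req-2", "a b c")]
    print(asyncio.run(process_events(sample)))
```

Whether this helps in practice depends on the tokenizer: a pure-Python tokenizer holding the GIL gains little from threads, while one that releases the GIL in native code may benefit more.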
src/inference_endpoint/async_utils/services/metrics_aggregator/__main__.py
src/inference_endpoint/async_utils/services/metrics_aggregator/aggregator.py
src/inference_endpoint/async_utils/services/metrics_aggregator/emitter.py
src/inference_endpoint/async_utils/services/metrics_aggregator/emitter.py
src/inference_endpoint/async_utils/services/metrics_aggregator/token_metrics.py
a1d6ba4 to b51f11b
f2ec2f8 to 14fdfdc
nvzhihanj left a comment
Review Council — Multi-AI Code Review
Found 3 issues across 3 files.
src/inference_endpoint/async_utils/services/metrics_aggregator/emitter.py
src/inference_endpoint/async_utils/services/metrics_aggregator/aggregator.py
Review Council — Multi-AI Code Review Council
Reviewed by: Claude
Found 3 issues across 3 files.
Each issue is posted as an inline comment on the relevant file and line.
*Codex independently identified "a likely batch-shutdown bug in the aggregator" before sandbox restrictions prevented it from producing structured output.
🤖 Generated with Claude Code
nvzhihanj left a comment
Review Council — Multi-AI Code Review
Found 10 issues (2 high, 6 medium, 2 low).
src/inference_endpoint/async_utils/services/metrics_aggregator/emitter.py
src/inference_endpoint/async_utils/services/metrics_aggregator/aggregator.py
tests/unit/async_utils/services/metrics_aggregator/test_aggregator.py
src/inference_endpoint/async_utils/services/metrics_aggregator/token_metrics.py
src/inference_endpoint/async_utils/services/event_logger/__main__.py
src/inference_endpoint/async_utils/services/metrics_aggregator/metrics_table.py
Review Council — Multi-AI Code Review Council
Reviewed by: Claude (Codex timed out) | Depth: thorough
Found 10 issues across 8 files.
🔴 Must Fix (high)
Issues that will cause incorrect behavior or crashes in production.
🟡 Should Fix (medium)
Real issues under specific conditions or design flaws that will compound.
🔵 Consider (low)
Valid improvements that could be follow-ups.
Each issue is posted as an inline comment on the relevant file and line.
🤖 Generated with Claude Code
14fdfdc to 43a3769
- MetricsAggregator: subscribes to ZMQ events, computes QPS/latency/TTFT/TPOT
- MetricsTable: columnar storage for efficient percentile calculations
- TokenMetrics: ISL/OSL token-level metrics from PromptData
- MetricsEmitter: periodic metric publishing
- Service entry point with CLI interface
- Comprehensive unit and e2e tests
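To make the commit description concrete, here is a minimal sketch of columnar metric storage with a percentile query, plus the usual TTFT and TPOT formulas. Everything in it is an assumption for illustration: the class `ColumnarMetrics`, the helpers `ttft` and `tpot`, the nearest-rank percentile method, and the TPOT definition (decode time divided by output tokens after the first) are not taken from this PR's `MetricsTable` or `MetricsAggregator`.

```python
import math
from array import array


class ColumnarMetrics:
    """Hypothetical columnar store: one contiguous array of doubles per metric."""

    def __init__(self) -> None:
        self.columns: dict[str, array] = {}

    def append(self, metric: str, value: float) -> None:
        # Each metric gets its own typed array, so a percentile query
        # touches only the column it needs.
        self.columns.setdefault(metric, array("d")).append(value)

    def percentile(self, metric: str, pct: float) -> float:
        """Nearest-rank percentile for 0 < pct <= 100 (an assumed method)."""
        values = sorted(self.columns[metric])
        if not values:
            raise ValueError(f"no samples recorded for {metric!r}")
        rank = max(1, math.ceil(pct / 100 * len(values)))
        return values[rank - 1]


def ttft(request_sent: float, first_token: float) -> float:
    """Time to first token: first-token timestamp minus request timestamp."""
    return first_token - request_sent


def tpot(first_token: float, last_token: float, output_tokens: int) -> float:
    """Time per output token, assumed here to exclude the first token."""
    if output_tokens < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (last_token - first_token) / (output_tokens - 1)


if __name__ == "__main__":
    table = ColumnarMetrics()
    for latency in (10.0, 20.0, 30.0, 40.0):
        table.append("latency_ms", latency)
    print(table.percentile("latency_ms", 50))
    print(ttft(1.0, 1.5), tpot(1.5, 3.5, 5))
```

Sorting on every query keeps the sketch short; a real implementation would likely cache sorted data or use an approximate quantile structure for large sample counts.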
… .tokenize call - only .token_count is used
…stop tracking events for warmup+accuracy
…n streaming mode, fix race condition in TPOT as part of this change
43a3769 to 1ce35f4
src/inference_endpoint/async_utils/services/metrics_aggregator/metrics_table.py
Fixed
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
…le from triggering