
test: refactor tests for readability and maintainability #189

Closed
nv-alicheng wants to merge 7 commits into feat/alicheng-metrics-aggregator from feat/alicheng-test-cleanup

Conversation

@nv-alicheng
Collaborator

  • Convert repetitive test methods to parametrized tests
  • Split test_reporter.py into focused modules: test_reporter_report.py, test_reporter_rollup.py, test_reporter_stop_perf.py
  • Consolidate duplicate test patterns across test suites
  • Improve test organization in transforms, extractor, types, record tests
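To illustrate the first bullet: the sketch below shows the general shape of collapsing near-identical test methods into one `pytest.mark.parametrize` test, with error cases split into their own parametrized test. `parse_duration_ms` is a hypothetical function invented for the example; it does not come from this PR.

```python
import pytest


def parse_duration_ms(text: str) -> int:
    """Hypothetical function under test: parse '150ms' or '2s' into milliseconds."""
    text = text.strip()
    if text.endswith("ms"):
        return int(text[:-2])
    if text.endswith("s"):
        return int(text[:-1]) * 1000
    raise ValueError(f"unsupported duration: {text!r}")


# Before: one method per input (test_parses_ms, test_parses_seconds, ...).
# After: a single test body driven by a table of cases.
@pytest.mark.parametrize(
    ("text", "expected"),
    [
        ("150ms", 150),
        ("2s", 2000),
        (" 5ms ", 5),
    ],
    ids=["milliseconds", "seconds", "whitespace"],
)
def test_parse_duration_ms(text, expected):
    assert parse_duration_ms(text) == expected


# Error cases get their own parametrized test so the happy path stays simple.
@pytest.mark.parametrize("text", ["", "10min", "abc"])
def test_parse_duration_ms_rejects_invalid(text):
    with pytest.raises(ValueError):
        parse_duration_ms(text)
```

The `ids` argument keeps the generated test names as readable as the separate methods they replace.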

What does this PR do?

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

- Move record.py from async_utils/transport to core/
- Add PromptData, TextModelOutput, ErrorData types with msgspec Struct
- Deprecate str as response_output type in favor of TextModelOutput
- Add msgspec struct performance flags (gc=False, array_like=True)
- Fix threading safety issues in http_client, sample handler, recorder
- Update all imports across the codebase
- Bump default worker init timeout to 60s
@nv-alicheng nv-alicheng requested a review from a team as a code owner March 18, 2026 00:56
@github-actions

github-actions bot commented Mar 18, 2026

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions bot requested review from arekay-nv and nvzhihanj March 18, 2026 00:56
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the existing test suite to enhance readability, maintainability, and efficiency. By converting repetitive tests into parametrized formats and reorganizing test files, the changes aim to streamline future test development and make it easier to understand the purpose and behavior of various components. The overall impact is a cleaner, more robust testing infrastructure.

Highlights

  • Test Refactoring: Converted numerous repetitive test methods across various modules into more concise and maintainable parametrized tests using pytest.mark.parametrize.
  • Test File Organization: Split the monolithic test_reporter.py into three focused modules: test_reporter_report.py, test_reporter_rollup.py, and test_reporter_stop_perf.py to improve clarity and manageability.
  • Test Clarity and Brevity: Renamed many test methods to be more concise and descriptive, enhancing readability without losing context.
  • Mocking External Dependencies: Introduced a FakeTokenizer class and a _mock_tokenizer fixture in test_token_metrics.py to prevent network calls during tests, making them faster and more reliable.
  • Consolidated Test Patterns: Unified common test patterns, such as error handling, serialization round-trips, and column remapping, into single parametrized tests.
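A minimal sketch of the FakeTokenizer pattern described above, assuming the code under test looks up its tokenizer loader at call time. `tokenizer_backend`, `_load_from_hub`, and `token_count` are hypothetical stand-ins; the real fixture's patch target is not shown in this thread. Only `FakeTokenizer.encode` mirrors the line quoted later in the review.

```python
from types import SimpleNamespace
from unittest import mock

import pytest


class FakeTokenizer:
    """Offline stand-in: one token id per whitespace-separated word."""

    def encode(self, text: str) -> list[int]:
        return list(range(len(text.split())))


def _load_from_hub(name: str):
    # Stands in for a loader that would download tokenizer files.
    raise RuntimeError(f"network access attempted for {name!r}")


# Hypothetical indirection point; the real fixture patches whatever the
# production module actually calls.
tokenizer_backend = SimpleNamespace(load=_load_from_hub)


def token_count(name: str, text: str) -> int:
    """Hypothetical code under test: resolves the loader at call time."""
    return len(tokenizer_backend.load(name).encode(text))


@pytest.fixture(autouse=True)
def _mock_tokenizer():
    # Every test in the module sees the fake; a new instance is built per load.
    with mock.patch.object(tokenizer_backend, "load", lambda name: FakeTokenizer()):
        yield
```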



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is a significant and valuable refactoring of the test suite. The changes greatly improve the readability and maintainability of the tests by converting many repetitive test methods into concise, parametrized tests. Splitting test_reporter.py into more focused modules is also a good structural improvement.

I've found a couple of minor issues:

  • A small bug in an f-string in an error message within test_random_data_loader.
  • A removed assertion in test_preserves_original_columns that I believe should be restored to ensure the test is fully robust.

Overall, this is an excellent cleanup of the test code. My comments are minor suggestions for improvement.

@@ -218,7 +238,6 @@ def test_preserves_original_columns(self):
assert "answer" in result.columns
assert "prompt" in result.columns
assert list(result["question"]) == ["Q1", "Q2"]

medium

This test is named test_preserves_original_columns. While it checks for the presence of the answer column, it no longer verifies that its content is preserved. It would be good to add back the assertion that checks the content of the answer column to make the test more robust and aligned with its name.

Suggested change
- assert list(result["question"]) == ["Q1", "Q2"]
+ assert list(result["question"]) == ["Q1", "Q2"]
+ assert list(result["answer"]) == ["A1", "A2"]

nv-alicheng and others added 5 commits March 17, 2026 20:41
The SGLang adapter now returns TextModelOutput, not raw str.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New EventLoggerService subscribing to ZMQ events
- JSONLWriter for file-based event logging
- SQLWriter via SQLAlchemy for DB compatibility
- Writer base class with async context manager protocol
- ZMQ context: add socket_dir parameter for configurable IPC paths
- Add sqlalchemy dependency
- MetricsAggregator: subscribes to ZMQ events, computes QPS/latency/TTFT/TPOT
- MetricsTable: columnar storage for efficient percentile calculations
- TokenMetrics: ISL/OSL token-level metrics from PromptData
- MetricsEmitter: periodic metric publishing
- Service entry point with CLI interface
- Comprehensive unit and e2e tests
- Convert repetitive test methods to parametrized tests
- Split test_reporter.py into focused modules:
  test_reporter_report.py, test_reporter_rollup.py, test_reporter_stop_perf.py
- Consolidate duplicate test patterns across test suites
- Improve test organization in transforms, extractor, types, record tests
The error message said `* 0.2` but the assertion checks `* 1.2`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
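A sketch of one way to keep an assertion and its failure message from drifting apart again: compute the bound once and interpolate the same values into the message. The helper name and the default factor of 1.2 are illustrative; the actual body of test_random_data_loader is not shown here.

```python
def assert_within_factor(observed: float, expected: float, factor: float = 1.2) -> None:
    """Hypothetical helper: fail if observed exceeds expected * factor."""
    # Compute the bound once; the assertion and the message both use it,
    # so the message can no longer claim a different multiplier.
    upper = expected * factor
    assert observed <= upper, (
        f"observed={observed} exceeds expected * {factor} = {upper}"
    )
```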
@nv-alicheng nv-alicheng force-pushed the feat/alicheng-test-cleanup branch from b723356 to bc244a8 March 18, 2026 03:43
@nv-alicheng nv-alicheng force-pushed the feat/alicheng-metrics-aggregator branch from f2ec2f8 to 14fdfdc March 18, 2026 03:43
@nvzhihanj
Collaborator

Review Council — Multi-AI Code Review

Reviewed by: Codex + Claude | Depth: thorough

Found 15 issues across 12 files.

Must Fix (high)

Issues that weaken test coverage or will cause incorrect behavior.

# File Line Category Reviewer Summary
1 tests/unit/async_utils/services/metrics_aggregator/test_token_metrics.py 41 testing Codex The autouse patch returns the same FakeTokenizer object for every call, so all TokenizePool workers share one tokeni...
2 tests/unit/dataset_manager/test_transforms.py 210 testing Codex This refactor now only checks result[output_col][0], and the multi-variable formatter case was reduced to a single-row...
3 tests/unit/metrics/test_reporter_report.py 35 testing Codex The split-out report test no longer checks report.n_samples_failed, even though this fixture contains one distinct fai...

Should Fix (medium)

Real issues under specific conditions, or gaps that will compound.

# File Line Category Reviewer Summary
4 src/inference_endpoint/openai/accumulator.py 91 bug Claude When output_chunks is empty, TextModelOutput(output=[], reasoning=None) is created. __post_init__ converts to `out...
5 src/inference_endpoint/core/types.py 250 api-contract Claude QueryResult.response_output is typed as `OUTPUT_TYPE
6 tests/unit/async_utils/services/metrics_aggregator/test_token_metrics.py 63 testing Claude The parametrized test_error_cases with action == "close_twice" has no assertion when error_type is None. The test ...
7 tests/unit/async_utils/services/metrics_aggregator/test_aggregator_e2e.py 152 testing Claude TestAggregatorE2E is marked @pytest.mark.integration but is located in tests/unit/. This class creates real ZMQ co...
8 tests/unit/core/test_record.py 34 testing Claude Test classes TestEventType, TestEventRecordConstruction, TestEncodeEventRecord, and TestEventRecordRoundTrip are...
9 tests/unit/core/test_types.py 36 testing Claude Test classes TestErrorData, TestQuerySerialization, TestQueryResultSerialization, TestStreamChunkSerialization, ...

Consider (low)

Valid improvements — could be follow-ups.

# File Line Category Reviewer Summary
10 src/inference_endpoint/utils/__init__.py 114 concurrency Claude _instance_lock is a class variable on SingletonMixin, meaning all subclasses share one lock. Not a bug (each checks ...
11 src/inference_endpoint/async_utils/transport/zmq/context.py 58 design Claude The socket_dir docstring says "This directory will be cleaned up when the context is cleaned up." This is only true fo...
12 src/inference_endpoint/async_utils/transport/zmq/context.py 87 design Claude `self.socket_dir: str
13 src/inference_endpoint/endpoint_client/worker.py 521 design Claude When error is a plain string, ErrorData is created with error_type="error" — generic and non-descriptive. Consider...
14 tests/unit/async_utils/services/event_logger/test_event_logger.py 329 testing Claude test_creates_log_dir_if_missing and test_not_a_directory_error don't test production code — they replicate logic man...
15 tests/unit/async_utils/services/metrics_aggregator/test_aggregator_e2e.py 193 testing Claude _publish_and_sleep uses hardcoded time.sleep(0.05) between publishes and time.sleep(0.5) for ZMQ slow-joiner — fla...

Note: Most issues point to source files in the base branch (feat/alicheng-metrics-aggregator), not files changed in this PR. Inline comments could not be posted on those files.
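For issue 15, one common replacement for fixed `time.sleep` calls is to poll for the expected state until a deadline. `wait_until` below is a hypothetical helper, not part of this codebase. For the ZMQ slow-joiner case the predicate still needs something observable to wait on, for example a probe message that the subscriber has actually received.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool], timeout: float = 5.0, interval: float = 0.01
) -> bool:
    """Poll predicate until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check so a condition that became true at the deadline still counts.
    return predicate()
```

In a test this would read as `assert wait_until(lambda: len(received) == expected_count)`, which returns as soon as the condition holds and only waits the full timeout on failure.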

Collaborator

@nvzhihanj nvzhihanj left a comment


test

return list(range(len(text.split())))


@pytest.fixture(autouse=True)
Collaborator


[Codex] high (testing): The autouse patch returns the same FakeTokenizer object for every call, so all TokenizePool workers share one tokenizer instance. If _get_thread_tokenizer() or its thread-local caching regresses to cross-thread reuse, test_multiple_workers and test_concurrent_thread_safe will still pass even though production code is supposed to keep one tokenizer per worker thread.
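A sketch of a test that would catch the regression described above, assuming the production helper caches one tokenizer per thread with `threading.local`. `_get_thread_tokenizer` and `load_tokenizer` here are hypothetical re-creations, not the code from this PR. The key change is that the patched loader returns a fresh `FakeTokenizer` per call, so cross-thread sharing becomes visible as repeated object ids.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeTokenizer:
    def encode(self, text: str) -> list[int]:
        return list(range(len(text.split())))


_local = threading.local()


def load_tokenizer(name: str) -> FakeTokenizer:
    # Factory stand-in: a fresh object per call, unlike a patch that returns one shared instance.
    return FakeTokenizer()


def _get_thread_tokenizer(name: str) -> FakeTokenizer:
    # Hypothetical mirror of the production helper: cache one tokenizer per thread.
    tok = getattr(_local, "tokenizer", None)
    if tok is None:
        tok = _local.tokenizer = load_tokenizer(name)
    return tok


def test_one_tokenizer_per_worker_thread(n_workers: int = 4) -> None:
    barrier = threading.Barrier(n_workers)

    def grab(_):
        tok = _get_thread_tokenizer("fake")
        barrier.wait()  # hold every worker so each task runs on a distinct thread
        # Repeated lookups on the same thread must return the cached instance.
        assert _get_thread_tokenizer("fake") is tok
        return id(tok)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        ids = list(pool.map(grab, range(n_workers)))

    # With a shared-instance patch, or a broken cache, this collapses to fewer ids.
    assert len(set(ids)) == n_workers
```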

Collaborator

@nvzhihanj nvzhihanj left a comment


test

    def test_token_count_returns_int(self):
-        with TokenizePool("gpt2", n_workers=1) as pool:
+        with TokenizePool("fake", n_workers=1) as pool:
             count = pool.token_count("Hello world")
Collaborator


[Claude] medium (testing): The parametrized test_error_cases with action == "close_twice" has no assertion when error_type is None. The test simply calls close() twice and returns without verifying behavior. Add an explicit assertion (e.g., assert pool._executor is None) to clarify test intent.
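A sketch of the suggested fix, using a hypothetical `FakePool` that models only the lifecycle the comment describes (`close()` sets `_executor` to `None`). Whether the real TokenizePool exposes `_executor` this way is an assumption taken from the comment's example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

import pytest


class FakePool:
    """Hypothetical stand-in modelling only the pool lifecycle."""

    def __init__(self, n_workers: int = 1) -> None:
        self._executor: Optional[ThreadPoolExecutor] = ThreadPoolExecutor(max_workers=n_workers)

    def token_count(self, text: str) -> int:
        if self._executor is None:
            raise RuntimeError("pool is closed")
        return self._executor.submit(lambda: len(text.split())).result()

    def close(self) -> None:
        # Idempotent: a second call is a no-op.
        if self._executor is not None:
            self._executor.shutdown(wait=True)
            self._executor = None


@pytest.mark.parametrize(
    ("action", "error_type"),
    [("use_after_close", RuntimeError), ("close_twice", None)],
)
def test_error_cases(action, error_type):
    pool = FakePool()
    pool.close()
    if action == "use_after_close":
        with pytest.raises(error_type):
            pool.token_count("hello")
    elif action == "close_twice":
        pool.close()  # must not raise
        # Explicit post-condition instead of returning without an assertion.
        assert pool._executor is None
    else:
        pytest.fail(f"unknown action: {action}")
```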

Collaborator

@nvzhihanj nvzhihanj left a comment


test

@@ -32,33 +32,37 @@


class TestEventType:
Collaborator


[Claude] medium (testing): Test classes TestEventType, TestEventRecordConstruction, TestEncodeEventRecord, and TestEventRecordRoundTrip are missing @pytest.mark.unit markers. Per project testing standards, every test needs a marker.

Collaborator

@nvzhihanj nvzhihanj left a comment


test

@@ -36,150 +36,104 @@
class TestErrorData:
Collaborator


[Claude] medium (testing): Test classes TestErrorData, TestQuerySerialization, TestQueryResultSerialization, TestStreamChunkSerialization, TestQueryResultWorkerPatterns, and TestMixedTypeSerialization are missing @pytest.mark.unit markers.
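Both marker comments could be addressed either per class or once per module. A minimal sketch, assuming the `unit` marker is already registered in the project's pytest configuration (unregistered markers trigger `PytestUnknownMarkWarning` during a test run). `TestExample` is a placeholder class, not one of the classes named above.

```python
import pytest

# Module level: pytest applies this marker to every test collected from the file,
# so a newly added class cannot be left unmarked.
pytestmark = pytest.mark.unit


# Per class: the marker is inherited by every test method in the class.
# Either form is sufficient on its own; using both simply applies "unit" twice.
@pytest.mark.unit
class TestExample:
    def test_placeholder(self):
        assert 1 + 1 == 2
```

The module-level form is the lower-maintenance option for files like test_record.py and test_types.py, where every class needs the same marker.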

@nv-alicheng nv-alicheng force-pushed the feat/alicheng-metrics-aggregator branch from 14fdfdc to 43a3769 March 20, 2026 00:34
@nv-alicheng nv-alicheng force-pushed the feat/alicheng-metrics-aggregator branch from 43a3769 to 1ce35f4 March 24, 2026 01:20
@arekay-nv
Collaborator

Closing for now. Will follow up later.

@arekay-nv arekay-nv closed this Mar 24, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Mar 24, 2026

Labels

None yet

Projects

None yet


3 participants