Skip to content

chore(http-client): cleanup types, improve coverage, remove orjson#157

Closed
viraatc wants to merge 1 commit intomainfrom
feat/viraatc-cleanup-asserts
Closed

chore(http-client): cleanup types, improve coverage, remove orjson#157
viraatc wants to merge 1 commit intomainfrom
feat/viraatc-cleanup-asserts

Conversation

@viraatc
Copy link
Copy Markdown
Collaborator

@viraatc viraatc commented Mar 9, 2026

What does this PR do?

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

Copilot AI review requested due to automatic review settings March 9, 2026 21:22
@viraatc viraatc requested a review from a team as a code owner March 9, 2026 21:22
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 9, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions bot requested review from arekay-nv and nvzhihanj March 9, 2026 21:22
@viraatc viraatc changed the title drop orjson chore(http-client): cleanup types, improve coverage, remove orjson Mar 9, 2026
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on a significant refactor of the HTTP client, aiming to streamline its interface and improve internal consistency. A major part of this effort involved migrating from the orjson library to msgspec.json for all JSON handling, which contributes to a more unified and potentially performant serialization strategy. Additionally, the client's type annotations have been cleaned up, and its synchronous operational methods (poll, drain, shutdown) have received dedicated test coverage, ensuring a more reliable and easier-to-use component.

Highlights

  • Dependency Migration: The orjson dependency has been completely removed from the project and all its usages have been migrated to msgspec.json for JSON serialization and deserialization.
  • HTTP Client Refactoring: The AsyncHttpEndpointClient has been renamed to HTTPEndpointClient and refactored to provide a unified, synchronous interface for shutdown, poll, and drain operations, simplifying its usage.
  • Type Hint and Assertion Cleanup: Numerous type hints across config.py, http.py, and worker.py have been refined, and redundant assert statements have been removed, improving code clarity and type safety.
  • Improved Test Coverage: New integration tests have been added for the HTTPEndpointClient's poll, recv, drain, and shutdown methods, significantly enhancing the client's test coverage and robustness.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • pyproject.toml
    • Removed 'orjson' from project dependencies.
  • src/inference_endpoint/endpoint_client/init.py
    • Removed 'AsyncHttpEndpointClient' from the public exports.
    • Updated module docstring to reflect the unified HTTP client implementation.
  • src/inference_endpoint/endpoint_client/config.py
    • Updated type hints for 'adapter', 'accumulator', and 'worker_pool_transport' to remove 'None' and add 'type: ignore[assignment]' for default initialization.
  • src/inference_endpoint/endpoint_client/http.py
    • Made the 'ConnectionPool.release' method idempotent to prevent errors on multiple calls.
    • Updated the type hint for 'InFlightRequest.connection' to be non-optional, with assignment ignored.
  • src/inference_endpoint/endpoint_client/http_client.py
    • Renamed 'AsyncHttpEndpointClient' to 'HTTPEndpointClient' to reflect its unified nature.
    • Removed the separate 'HTTPEndpointClient' class, integrating its functionality into the main client.
    • Updated the usage example in the docstring to reflect the new synchronous methods.
    • Refactored the 'shutdown' method to be synchronous, with an internal asynchronous helper '_shutdown_async'.
    • Optimized the 'drain' method to use 'list(iter(self.poll, None))' for efficiency.
  • src/inference_endpoint/endpoint_client/http_sample_issuer.py
    • Removed an unnecessary 'assert self.http_client.loop is not None' statement.
  • src/inference_endpoint/endpoint_client/worker.py
    • Updated type hints for internal HTTP and IPC components to be non-optional, with assignment ignored.
    • Removed several redundant 'assert' statements where types are guaranteed by initialization.
    • Added an explicit assertion message for 'event_logs_dir' when 'record_worker_events' is enabled.
  • src/inference_endpoint/evaluation/scoring.py
    • Replaced all imports and usages of 'orjson' with 'msgspec.json' for JSON operations.
    • Updated exception handling to catch 'msgspec.DecodeError' instead of 'orjson.JSONDecodeError'.
  • src/inference_endpoint/load_generator/session.py
    • Replaced all imports and usages of 'orjson' with 'msgspec.json' for JSON operations.
    • Updated JSON formatting for runtime settings and sample UUID map to use 'msgspec.json.format' and 'msgspec.json.encode'.
  • src/inference_endpoint/metrics/recorder.py
    • Replaced all imports and usages of 'orjson' with 'msgspec.json' for JSON operations.
    • Updated exception handling to catch 'msgspec.EncodeError' instead of 'orjson.JSONEncodeError'.
  • src/inference_endpoint/metrics/reporter.py
    • Replaced all imports and usages of 'orjson' with 'msgspec.json' for JSON operations.
    • Updated comments to reflect the change from 'orjson' to 'msgspec'.
    • Modified JSON serialization for reports to use 'msgspec.json.format' and 'msgspec.json.encode'.
    • Updated exception handling to catch 'msgspec.DecodeError' instead of 'orjson.JSONDecodeError'.
  • src/inference_endpoint/openai/openai_adapter.py
    • Removed 'orjson' import.
    • Replaced 'orjson.dumps' with 'msgspec.json.encode' for request encoding.
    • Replaced 'orjson.loads' with 'msgspec.json.decode' for response decoding.
  • src/inference_endpoint/utils/benchmark_httpclient.py
    • Updated imports to use 'HTTPEndpointClient' instead of 'AsyncHttpEndpointClient'.
    • Modified client creation to use the new 'HTTPEndpointClient'.
    • Changed client shutdown call to use the synchronous 'client.shutdown()' method.
  • tests/conftest.py
    • Replaced all imports and usages of 'orjson' with 'msgspec.json' for JSON operations.
  • tests/integration/endpoint_client/test_http_client.py
    • Added helper functions '_create_client' and '_make_query' for test setup.
    • Introduced a new pytest fixture 'http_client' for consistent client instantiation.
    • Added 'TestPoll' class with tests for non-blocking 'poll()' behavior.
    • Added 'TestRecv' class with asynchronous tests for blocking 'recv()' behavior.
    • Added 'TestDrain' class with tests for non-blocking 'drain()' behavior.
    • Added 'TestShutdown' class with tests for client shutdown idempotency and post-shutdown behavior.
  • tests/unit/metrics/test_recorder.py
    • Replaced all imports and usages of 'orjson' with 'msgspec.json' for JSON operations.
  • tests/unit/metrics/test_reporter.py
    • Replaced all imports and usages of 'orjson' with 'msgspec.json' for JSON operations.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the orjson dependency and migrates JSON encoding/decoding to msgspec, while also simplifying the HTTP endpoint client API (consolidating the async/sync client split) and expanding integration coverage for polling/receiving/draining responses.

Changes:

  • Replace orjson usage with msgspec.json across metrics, evaluation, loadgen reporting, and tests.
  • Refactor the HTTP endpoint client to a single HTTPEndpointClient and update benchmarks/exports accordingly.
  • Add integration tests covering poll(), recv(), drain(), and shutdown() behavior.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/unit/metrics/test_reporter.py Switch test event payload encoding from orjson to msgspec.json.
tests/unit/metrics/test_recorder.py Update unit tests to use msgspec.json.encode/decode for event row data.
tests/integration/endpoint_client/test_http_client.py Add integration tests for poll/recv/drain/shutdown and client fixture helpers.
tests/conftest.py Update test DB fixtures to encode JSON via msgspec.json.
src/inference_endpoint/utils/benchmark_httpclient.py Update benchmark to use HTTPEndpointClient and sync shutdown.
src/inference_endpoint/openai/openai_adapter.py Migrate request/response JSON encode/decode to msgspec.
src/inference_endpoint/metrics/reporter.py Replace JSON dump/load operations with msgspec.json formatting/decoding.
src/inference_endpoint/metrics/recorder.py Use msgspec.json.encode and msgspec.EncodeError in event recording.
src/inference_endpoint/load_generator/session.py Write JSON artifacts using msgspec formatting/encoding.
src/inference_endpoint/evaluation/scoring.py Replace JSON parsing/serialization with msgspec.json and update error handling.
src/inference_endpoint/endpoint_client/worker.py Simplify request/response handling and rely on idempotent pool release.
src/inference_endpoint/endpoint_client/http_sample_issuer.py Remove redundant loop assertion when scheduling response handler.
src/inference_endpoint/endpoint_client/http_client.py Consolidate client class and make shutdown synchronous via internal async helper.
src/inference_endpoint/endpoint_client/http.py Make ConnectionPool.release() explicitly idempotent; adjust InFlightRequest typing.
src/inference_endpoint/endpoint_client/config.py Make adapter/accumulator/transport “defaulted in post_init” (typing adjustments).
src/inference_endpoint/endpoint_client/init.py Update exports to only expose HTTPEndpointClient.
pyproject.toml Remove orjson dependency (keep msgspec).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 45 to 54

Usage:
with ManagedZMQContext.scoped() as zmq_ctx:
client = AsyncHttpEndpointClient(config, zmq_context=zmq_ctx)
client = HTTPEndpointClient(config, zmq_context=zmq_ctx)
client.issue(query)
response = await client.recv()
await client.shutdown()
response = client.poll() # Non-blocking, returns None if nothing ready
responses = client.drain() # Drain all available responses
# response = await client.recv() # Blocking; only if caller provides its own loop
client.shutdown() # Blocks until workers stop
"""
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that AsyncHttpEndpointClient/HTTPEndpointClient have been collapsed into a single class that typically owns a background event-loop thread (LoopManager.create_loop for non-"default" loops), the public issue() method needs to be thread-safe. With ZmqWorkerPoolTransport, send() may call loop.add_writer() via _ZmqSenderTransport, which must run on the event loop thread; calling it directly from the caller thread can sporadically raise or misbehave under backpressure.

Make issue() schedule the actual pool.send(next(self._worker_cycle), query) onto self.loop via call_soon_threadsafe (and advance the cycle on that same thread).

Copilot uses AI. Check for mistakes.
@viraatc viraatc closed this Mar 9, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Mar 9, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants