Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Pull request overview
This PR removes aiohttp as a dependency and replaces it with a custom HTTP client implementation built directly on asyncio's Protocol/Transport API, using httptools for HTTP parsing. The change also removes numerous ZMQ configuration parameters from test fixtures and client instantiation, replacing them with a cleaner transport abstraction.
Changes:
- Replaced aiohttp with custom HTTP client using httptools (llhttp parser)
- Refactored ZMQ transport into protocol-based abstraction with factory pattern
- Added CPU affinity utilities and automatic worker count detection based on NUMA topology
Reviewed changes
Copilot reviewed 40 out of 40 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/inference_endpoint/endpoint_client/http.py | New custom HTTP client implementation with connection pooling |
| src/inference_endpoint/endpoint_client/transport/ | New transport abstraction layer for worker IPC |
| src/inference_endpoint/endpoint_client/worker.py | Refactored to use new HTTP client and transport abstractions |
| src/inference_endpoint/endpoint_client/configs.py | Removed AioHttpConfig and ZMQConfig, added new configuration options |
| src/inference_endpoint/utils/cpu_affinity.py | New CPU affinity utilities for NUMA-aware worker pinning |
| tests/unit/endpoint_client/test_http.py | Unit tests for HTTP request template builder |
| tests/unit/endpoint_client/transport/test_zmq.py | Unit tests for ZMQ transport layer |
| Multiple test files | Updated to use simplified client instantiation |
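The cpu_affinity.py utilities listed above suggest NUMA-aware worker pinning. A minimal sketch of how such pinning is commonly done on Linux, using sysfs and `os.sched_setaffinity`; the function names here are hypothetical, not the module's real API:

```python
# Hypothetical sketch of NUMA-aware CPU pinning (Linux only).
import os


def parse_cpulist(text: str) -> set[int]:
    """Parse a kernel cpulist string such as '0-3,8-11' into a set of ids."""
    cpus: set[int] = set()
    for part in text.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def numa_node_cpus(node: int) -> set[int]:
    """Read the CPU list for one NUMA node from sysfs."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        return parse_cpulist(f.read().strip())


def pin_worker(pid: int, cpus: set[int]) -> None:
    """Restrict a worker process to the given CPUs."""
    os.sched_setaffinity(pid, cpus)
```

Deriving the worker count from the NUMA topology (e.g. one worker per node, pinned to that node's CPUs) keeps each worker's memory accesses local, which is the usual motivation for this kind of utility.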
Summary of Changes (Gemini Code Assist): This pull request significantly overhauls the client-side networking and worker management infrastructure. The core purpose is to achieve higher performance and greater control over HTTP requests and inter-process communication by moving away from aiohttp.
Code Review
This pull request introduces a significant and impressive refactoring by replacing aiohttp with a custom, high-performance HTTP client. The new implementation is well-structured, introducing a clean transport abstraction layer and performance-oriented features like a custom connection pool, CPU affinity management, and GC tuning. The code is modular, and the tests have been diligently updated to reflect the new architecture. My feedback includes a few suggestions for improvement, such as simplifying the async handling in the probe command and preventing potential duplicate HTTP headers. Overall, this is a high-quality contribution that significantly enhances the client's performance and maintainability.
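The review above suggests preventing potential duplicate HTTP headers. One common way to do that when merging client-wide defaults with per-request headers is to normalize on the case-insensitive header name; this is a hedged sketch, not the fix the PR actually adopted:

```python
# Header names are case-insensitive in HTTP, so deduplicate on the
# lowercased name; per-request overrides win. Illustrative only.
def merge_headers(
    defaults: dict[str, str], overrides: dict[str, str]
) -> dict[str, str]:
    merged = {k.lower(): v for k, v in defaults.items()}
    for k, v in overrides.items():
        merged[k.lower()] = v  # replaces any default with the same name
    return merged
```

Without this normalization, a default `Content-Type` and a per-request `content-type` would both be serialized into the request, and servers may reject or mishandle the duplicate.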
Pull request overview
Copilot reviewed 40 out of 40 changed files in this pull request and generated no new comments.
will review after rebase.
Force-pushed from 21c0fea to db18136.
Force-pushed from db18136 to be224de.
Pull request overview
Copilot reviewed 39 out of 39 changed files in this pull request and generated no new comments.
Force-pushed from be224de to 1f33f7d.
Force-pushed from b68d3df to c29c860.
Pull request overview
Copilot reviewed 32 out of 32 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 35 out of 35 changed files in this pull request and generated 4 comments.
Pull request overview
Copilot reviewed 34 out of 34 changed files in this pull request and generated 4 comments.
Force-pushed from e0f8601 to 6069095.
Copilot reviewed 36 out of 36 changed files in this pull request and generated 3 comments.
Force-pushed from 77c044b to caa3e38.
Pull request overview
Copilot reviewed 38 out of 38 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
tests/integration/endpoint_client/test_worker.py:1
- The port number changed from 99999 to 59999. Port 99999 is outside the valid port range (1-65535), so 59999 is correct. However, consider using a port that is less likely to be in use, such as a higher ephemeral port, or documenting why 59999 was chosen.
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
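As an alternative to the hard-coded test port discussed above, tests often ask the OS for a free port by binding to port 0, which avoids collisions entirely. A sketch of that pattern (not what the PR does):

```python
# Binding to port 0 makes the kernel pick an unused ephemeral port.
import socket


def free_port() -> int:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

The caveat is a small race window between closing this probe socket and the test binding the port again, which is why some suites instead pass the still-open socket to the server under test.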
Pull request overview
Copilot reviewed 38 out of 38 changed files in this pull request and generated no new comments.
What does this PR do?
requires both:
MR includes:
new implementation (http.py) improves over the old one (aiohttp) by:
dependencies updated: removed aiohttp, added httptools directly
(1) better error rates when oversubscribing the ephemeral port limit.
example: offline mode with 60k queries:
better error rates at high issue rates.
example: offline mode with 20k queries (within the ephemeral port limit):
(2) higher throughput
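The error-rate improvements above relate to ephemeral port exhaustion: without keep-alive reuse, each request can consume a client-side port until its TIME_WAIT expires, so issuing 60k queries overruns the default budget. A Linux-specific sketch of computing that budget (the helper name is illustrative):

```python
# The kernel's ephemeral port range bounds how many concurrent client-side
# connections one (src IP, dst IP, dst port) tuple can use. Linux-only sketch.
def ephemeral_port_budget(
    path: str = "/proc/sys/net/ipv4/ip_local_port_range",
) -> int:
    with open(path) as f:
        lo, hi = map(int, f.read().split())
    return hi - lo + 1  # default 32768..60999 gives 28232 ports
```

A pooled client with keep-alive connections sidesteps this limit by multiplexing many requests over a small, fixed number of sockets, which is consistent with the lower error rates reported above.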
Type of change
Related issues
Testing
Checklist