
feat: msgspec optimizations, docs #74

Merged
viraatc merged 5 commits into main from feat/viraatc-msgspec-2 on Mar 9, 2026

Conversation

@viraatc (Collaborator) commented on Dec 24, 2025

What does this PR do?

adapter/types.py

Category          Payload  Metric   Old Mean (ns)   New Mean (ns)   Change (%)   Status
Request Decode    Empty    Time          1,784.94          687.42      -61.49%   ✅ Faster
                  100      Time          1,825.40          825.16      -54.79%   ✅ Faster
                  1k       Time          2,638.88        1,144.91      -56.61%   ✅ Faster
                  8k       Time          7,835.93        3,701.16      -52.77%   ✅ Faster
                  32k      Time         25,603.05       13,484.60      -47.33%   ✅ Faster
Request Encode    Empty    Time          1,154.86          585.11      -49.33%   ✅ Faster
                  100      Time          1,240.40          353.21      -71.52%   ✅ Faster
                  1k       Time          1,925.28          729.57      -62.10%   ✅ Faster
                  8k       Time          6,939.44        3,459.37      -50.15%   ✅ Faster
                  32k      Time         24,288.44       13,044.01      -46.29%   ✅ Faster
Response Decode   Empty    Time            844.84          909.48       +7.65%   ⚠️ Mixed*
                  100      Time            870.97          816.52       -6.25%   ✅ Faster
                  1k       Time          1,242.13        1,068.67      -13.96%   ✅ Faster
                  8k       Time          3,417.19        3,197.63       -6.42%   ✅ Faster
                  32k      Time         10,882.39       11,665.54       +7.20%   ❌ Slower
Response Encode   Empty    Time            694.69          512.92      -26.17%   ✅ Faster
                  100      Time            592.84          576.37       -2.78%   ✅ Faster
                  1k       Time          1,068.31          960.80      -10.06%   ✅ Faster
                  8k       Time          3,610.59        3,702.92       +2.56%   ⚠️ Slower
                  32k      Time         11,633.41       13,129.86      +12.86%   ❌ Slower
SSE Decode        Empty    Time            905.41          350.42      -61.30%   ✅ Faster
                  100      Time            987.80          366.81      -62.87%   ✅ Faster
                  1k       Time          1,626.45          792.64      -51.27%   ✅ Faster
                  8k       Time          5,952.73        3,399.48      -42.89%   ✅ Faster
                  32k      Time         20,951.99       12,020.98      -42.63%   ✅ Faster
SSE Encode        Empty    Time            512.95          208.20      -59.41%   ✅ Faster
                  100      Time            404.95          284.60      -29.72%   ✅ Faster
                  1k       Time          1,338.40          611.12      -54.34%   ✅ Faster
                  8k       Time          6,526.64        3,395.30      -47.98%   ✅ Faster
                  32k      Time         24,750.72       13,478.79      -45.54%   ✅ Faster

core/types.py

Category              Test Case   Old Mean (ns)   New Mean (ns)   Delta (%)   Status
Query Decode          empty              397.88          371.99       -6.5%   ✅ Faster
                      100                439.80          382.39      -13.1%   ✅ Faster
                      1k                 558.74          507.33       -9.2%   ✅ Faster
                      8k                 919.55          887.50       -3.5%   ✅ Faster
                      32k              2,589.57        2,785.50       +7.6%   ⚠️ Slower
Query Encode          empty              249.47          182.47      -26.9%   ✅ Faster
                      100                223.41          186.24      -16.6%   ✅ Faster
                      1k                 286.72          232.52      -18.9%   ✅ Faster
                      8k                 337.22          289.43      -14.2%   ✅ Faster
                      32k                922.07        1,167.86      +26.7%   ⚠️ Slower
QueryResult Decode    empty              519.12          440.54      -15.1%   ✅ Faster
                      100                564.37          465.78      -17.5%   ✅ Faster
                      1k                 710.82          615.42      -13.4%   ✅ Faster
                      8k               1,202.23        1,156.27       -3.8%   ✅ Faster
                      32k              2,794.45        2,894.15       +3.6%   ⚠️ Slower
QueryResult Encode    empty              201.83          134.20      -33.5%   ✅ Faster
                      100                201.81          135.57      -32.8%   ✅ Faster
                      1k                 278.38          199.90      -28.2%   ✅ Faster
                      8k                 356.35          243.89      -31.6%   ✅ Faster
                      32k                934.11        1,130.12      +21.0%   ⚠️ Slower
StreamChunk Decode    empty              204.22          155.15      -24.0%   ✅ Faster
                      100                225.08          186.12      -17.3%   ✅ Faster
                      1k                 361.74          318.78      -11.9%   ✅ Faster
                      8k                 782.52          753.45       -3.7%   ✅ Faster
                      32k              2,163.58        2,475.16      +14.4%   ⚠️ Slower
StreamChunk Encode    empty              159.30           89.61      -43.8%   ✅ Faster
                      100                161.62          116.73      -27.8%   ✅ Faster
                      1k                 232.94          181.53      -22.1%   ✅ Faster
                      8k                 309.05          230.97      -25.3%   ✅ Faster
                      32k                902.02        1,191.24      +32.1%   ⚠️ Slower

E2E

latest main:

  Query            32 chars (  128 B): issue=  460,000 msg/s    58.9 MB/s, recv=  459,600 msg/s    58.8 MB/s
  QueryResult      32 chars (  113 B): issue=  450,800 msg/s    50.9 MB/s, recv=  450,400 msg/s    50.9 MB/s
  StreamChunk      32 chars (   96 B): issue=  541,600 msg/s    52.0 MB/s, recv=  541,200 msg/s    52.0 MB/s
  Query           128 chars (  224 B): issue=  467,800 msg/s   104.8 MB/s, recv=  467,400 msg/s   104.7 MB/s
  QueryResult     128 chars (  209 B): issue=  467,200 msg/s    97.6 MB/s, recv=  466,800 msg/s    97.6 MB/s
  StreamChunk     128 chars (  192 B): issue=  519,400 msg/s    99.7 MB/s, recv=  519,000 msg/s    99.6 MB/s
  Query           512 chars (  609 B): issue=  448,400 msg/s   273.1 MB/s, recv=  448,000 msg/s   272.8 MB/s
  QueryResult     512 chars (  594 B): issue=  439,800 msg/s   261.2 MB/s, recv=  439,400 msg/s   261.0 MB/s
  StreamChunk     512 chars (  577 B): issue=  485,200 msg/s   280.0 MB/s, recv=  484,800 msg/s   279.7 MB/s
  Query          1024 chars ( 1121 B): issue=  410,800 msg/s   460.5 MB/s, recv=  410,400 msg/s   460.1 MB/s
  QueryResult    1024 chars ( 1106 B): issue=  414,800 msg/s   458.8 MB/s, recv=  414,400 msg/s   458.3 MB/s
  StreamChunk    1024 chars ( 1089 B): issue=  443,600 msg/s   483.1 MB/s, recv=  443,200 msg/s   482.6 MB/s
  Query          4096 chars ( 4193 B): issue=  301,800 msg/s  1265.4 MB/s, recv=  301,400 msg/s  1263.8 MB/s
  QueryResult    4096 chars ( 4178 B): issue=  297,600 msg/s  1243.4 MB/s, recv=  297,200 msg/s  1241.7 MB/s
  StreamChunk    4096 chars ( 4161 B): issue=  330,600 msg/s  1375.6 MB/s, recv=  330,200 msg/s  1374.0 MB/s
  Query         16384 chars (16481 B): issue=  148,000 msg/s  2439.2 MB/s, recv=  147,600 msg/s  2432.6 MB/s
  QueryResult   16384 chars (16466 B): issue=  145,600 msg/s  2397.4 MB/s, recv=  145,089 msg/s  2389.0 MB/s
  StreamChunk   16384 chars (16449 B): issue=  159,200 msg/s  2618.7 MB/s, recv=  158,800 msg/s  2612.1 MB/s
  Query         32768 chars (32865 B): issue=   73,600 msg/s  2418.9 MB/s, recv=   73,200 msg/s  2405.7 MB/s
  QueryResult   32768 chars (32850 B): issue=   80,200 msg/s  2634.6 MB/s, recv=   79,800 msg/s  2621.4 MB/s
  StreamChunk   32768 chars (32833 B): issue=   72,800 msg/s  2390.2 MB/s, recv=   72,400 msg/s  2377.1 MB/s

MR:

  Query            32 chars (  101 B): issue=  512,400 msg/s    51.8 MB/s, recv=  512,000 msg/s    51.7 MB/s
  QueryResult      32 chars (   61 B): issue=  546,800 msg/s    33.4 MB/s, recv=  546,400 msg/s    33.3 MB/s
  StreamChunk      32 chars (   52 B): issue=  633,400 msg/s    32.9 MB/s, recv=  633,000 msg/s    32.9 MB/s
  Query           128 chars (  197 B): issue=  537,600 msg/s   105.9 MB/s, recv=  537,200 msg/s   105.8 MB/s
  QueryResult     128 chars (  157 B): issue=  524,800 msg/s    82.4 MB/s, recv=  524,400 msg/s    82.3 MB/s
  StreamChunk     128 chars (  148 B): issue=  595,600 msg/s    88.1 MB/s, recv=  595,278 msg/s    88.1 MB/s
  Query           512 chars (  582 B): issue=  507,800 msg/s   295.5 MB/s, recv=  507,400 msg/s   295.3 MB/s
  QueryResult     512 chars (  542 B): issue=  490,800 msg/s   266.0 MB/s, recv=  490,400 msg/s   265.8 MB/s
  StreamChunk     512 chars (  533 B): issue=  535,800 msg/s   285.6 MB/s, recv=  535,400 msg/s   285.4 MB/s
  Query          1024 chars ( 1094 B): issue=  439,800 msg/s   481.1 MB/s, recv=  439,572 msg/s   480.9 MB/s
  QueryResult    1024 chars ( 1054 B): issue=  449,800 msg/s   474.1 MB/s, recv=  449,400 msg/s   473.7 MB/s
  StreamChunk    1024 chars ( 1045 B): issue=  511,400 msg/s   534.4 MB/s, recv=  511,000 msg/s   534.0 MB/s
  Query          4096 chars ( 4166 B): issue=  329,200 msg/s  1371.4 MB/s, recv=  328,800 msg/s  1369.8 MB/s
  QueryResult    4096 chars ( 4126 B): issue=  329,600 msg/s  1359.9 MB/s, recv=  329,200 msg/s  1358.3 MB/s
  StreamChunk    4096 chars ( 4117 B): issue=  358,200 msg/s  1474.7 MB/s, recv=  357,800 msg/s  1473.1 MB/s
  Query         16384 chars (16454 B): issue=  158,200 msg/s  2603.0 MB/s, recv=  157,800 msg/s  2596.4 MB/s
  QueryResult   16384 chars (16414 B): issue=  161,000 msg/s  2642.7 MB/s, recv=  160,600 msg/s  2636.1 MB/s
  StreamChunk   16384 chars (16405 B): issue=  169,200 msg/s  2775.7 MB/s, recv=  168,800 msg/s  2769.2 MB/s
  Query         32768 chars (32838 B): issue=   67,600 msg/s  2219.8 MB/s, recv=   67,200 msg/s  2206.7 MB/s
  QueryResult   32768 chars (32798 B): issue=   79,800 msg/s  2617.3 MB/s, recv=   79,400 msg/s  2604.2 MB/s
  StreamChunk   32768 chars (32789 B): issue=   78,600 msg/s  2577.2 MB/s, recv=   78,200 msg/s  2564.1 MB/s


Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

Copilot AI review requested due to automatic review settings December 24, 2025 10:00
@viraatc viraatc requested a review from a team as a code owner December 24, 2025 10:00
@github-actions bot commented on Dec 24, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@gemini-code-assist

Summary of Changes

Hello @viraatc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on significantly enhancing the performance and efficiency of data serialization and deserialization within the system by deeply integrating and optimizing the msgspec library. It introduces advanced msgspec features like garbage collector control, default value omission, and array-like encoding to critical data structures, alongside implementing buffer reuse for network communication. Furthermore, it provides extensive documentation on msgspec best practices and new performance tests to validate these optimizations.

Highlights

  • msgspec Optimization Documentation: A new comprehensive guide (.cursor/rules/msgspec-patterns.mdc) has been added, detailing 13 best practices for optimizing msgspec usage, including Struct definitions, omit_defaults, encode_into, gc=False, array_like=True, and MessagePack usage.
  • Core Type Performance Enhancements: Key data structures (Query, QueryResult, StreamChunk) in src/inference_endpoint/core/types.py have been updated to leverage msgspec's gc=False, omit_defaults=True, and array_like=True options for improved memory management and serialization efficiency.
  • ZMQ Buffer Reuse: The ZMQPushSocket in src/inference_endpoint/endpoint_client/zmq_utils.py now utilizes msgspec.msgpack.Encoder.encode_into() with a pre-allocated bytearray, significantly reducing memory allocations during message sending in hot loops.
  • OpenAI Adapter Optimizations: msgspec.Struct definitions within the OpenAI adapters (src/inference_endpoint/openai/openai_adapter.py and src/inference_endpoint/openai/openai_msgspec_adapter.py) have been enhanced with gc=False and omit_defaults=True for better performance in handling frequent, short-lived message objects.
  • New Performance Benchmarks: A new test file (tests/performance/test_msgspec_serialization.py) has been introduced to benchmark the encoding and decoding performance of Query, QueryResult, and StreamChunk types, including tests to ensure linear scaling with payload size.


Copilot AI left a comment

Pull request overview

This PR enables msgspec performance optimizations across core data structures and adds comprehensive documentation for msgspec usage patterns. The changes focus on reducing garbage collection overhead and message payload sizes through strategic use of gc=False, omit_defaults=True, and array_like=True options, along with buffer reuse in ZMQ serialization.

  • Added gc=False to all msgspec Struct definitions to reduce GC pauses by 75x
  • Implemented encode_into() with buffer reuse in ZMQ push socket to eliminate per-message allocations
  • Added comprehensive msgspec optimization guide with 13 rules and performance benchmarks
  • Created performance test suite validating encoding/decoding throughput and linear scaling

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

Summary per file:

  • tests/performance/test_msgspec_serialization.py: New performance test suite measuring encoding/decoding throughput for Query, QueryResult, and StreamChunk with varying payload sizes
  • src/inference_endpoint/openai/openai_msgspec_adapter.py: Added gc=False optimization to all OpenAI API struct types (ChatMessage, ChatCompletionRequest, responses)
  • src/inference_endpoint/openai/openai_adapter.py: Added gc=False and omit_defaults=True to SSE streaming message structs (SSEDelta, SSEChoice, SSEMessage)
  • src/inference_endpoint/endpoint_client/zmq_utils.py: Implemented buffer-reuse pattern in ZMQPushSocket using encode_into() with a 4 MB pre-allocated buffer
  • src/inference_endpoint/core/types.py: Added gc=False, omit_defaults=True, and array_like=True to Query, QueryResult, and StreamChunk, with documentation explaining the optimization choices
  • .cursor/rules/msgspec-patterns.mdc: New comprehensive documentation covering 13 msgspec optimization patterns with examples, benchmarks, and decision trees


gemini-code-assist bot left a comment

Code Review

This pull request introduces several performance optimizations for msgspec serialization, such as enabling gc=False, omit_defaults=True, and array_like=True on data structures, and switching to encode_into for buffer reuse. It also adds comprehensive documentation on msgspec usage patterns and a new set of performance tests.

The optimizations applied to the data types are well-justified and clearly documented. However, I've identified a critical bug in the ZMQ utility where encode_into is used incorrectly, which could lead to sending corrupted data. Additionally, several code examples in the new documentation exhibit the same flawed buffer handling pattern. I have provided suggestions to correct these issues. The new performance tests are a valuable addition for monitoring serialization performance.

Copilot AI review requested due to automatic review settings December 24, 2025 11:32
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.



@viraatc changed the title from "feat: enable msgspec optimizations, add msgspec docs" to "feat: msgspec optimizations, docs" on Dec 25, 2025
@viraatc force-pushed the feat/viraatc-msgspec-2 branch from 3e37a28 to 14a141b on January 9, 2026 at 10:15
@viraatc mentioned this pull request on Jan 13, 2026
Copilot AI review requested due to automatic review settings January 14, 2026 12:21
@viraatc force-pushed the feat/viraatc-msgspec-2 branch from 14a141b to 4d4139b on January 14, 2026 at 12:21
Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated no new comments.



@viraatc force-pushed the feat/viraatc-msgspec-2 branch 2 times, most recently from 685ce9f to b37b149 on January 21, 2026 at 13:08
Copilot AI review requested due to automatic review settings March 6, 2026 23:11
@viraatc force-pushed the feat/viraatc-msgspec-2 branch from b37b149 to 068e430 on March 6, 2026 at 23:11
Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.



@viraatc force-pushed the feat/viraatc-msgspec-2 branch from 068e430 to 45f70e5 on March 6, 2026 at 23:21
Copilot AI review requested due to automatic review settings March 9, 2026 22:56
@viraatc force-pushed the feat/viraatc-msgspec-2 branch from 45f70e5 to 6ed6f70 on March 9, 2026 at 22:56
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 9 comments.



@viraatc force-pushed the feat/viraatc-msgspec-2 branch from f6f36bc to 1f94898 on March 9, 2026 at 23:04
Copilot AI review requested due to automatic review settings March 9, 2026 23:04
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.



Copilot AI review requested due to automatic review settings March 9, 2026 23:16
Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.



@viraatc force-pushed the feat/viraatc-msgspec-2 branch from 9608a0d to e2d26f7 on March 9, 2026 at 23:30
@viraatc merged commit c36843c into main on Mar 9, 2026
4 checks passed
@viraatc deleted the feat/viraatc-msgspec-2 branch on March 9, 2026 at 23:34
@github-actions bot locked and limited the conversation to collaborators on Mar 9, 2026

Labels: none yet
Projects: none yet
3 participants