Optimize MemoryStream Base64 serialization in JSON marshallers #4335

Merged

muhammad-othman merged 1 commit into development from muhamoth/DOTNET-6445-SES-service-allocates-huge-amounts-of-memory on Feb 21, 2026
Conversation

muhammad-othman (Member) commented on Feb 18, 2026

Description

This PR reduces memory allocations when serializing MemoryStream (blob) properties in JSON-based service marshallers by eliminating an unnecessary intermediate Base64 string allocation.

When marshalling MemoryStream properties (e.g., raw email data in SESV2), the generated code previously called:
```csharp
context.Writer.WriteStringValue(StringUtils.FromMemoryStream(requestObject.Data));
```

This created a two-step process:

  1. FromMemoryStream() converts the entire stream to a Base64 string
  2. WriteStringValue() re-encodes that string to UTF-8 bytes in the JSON output

This intermediate string allocation is entirely unnecessary, since Utf8JsonWriter natively supports writing Base64 directly from binary data via WriteBase64StringValue(ReadOnlySpan<byte>), which is what this PR adds.
This improvement applies to all JSON-based services that marshal MemoryStream/blob properties, not just SESV2.
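Conceptually, the change looks like the sketch below (hypothetical helper names; the buffering mirrors the ArrayPool rent-and-read pattern in the snippet reviewed later in this PR, not the exact generated code for any one service):

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text.Json;

static class BlobSerializationSketch
{
    // Old shape: materialize a Base64 string, then let the writer
    // re-encode that string to UTF-8 in the JSON output.
    public static void WriteBlobOld(Utf8JsonWriter writer, MemoryStream value)
    {
        byte[] array = value.ToArray();                          // copy #1: stream -> byte[]
        writer.WriteStringValue(Convert.ToBase64String(array));  // copy #2: byte[] -> string, then UTF-8 re-encode
    }

    // New shape: rent a pooled buffer and write Base64 straight from
    // the bytes, skipping the intermediate string entirely.
    public static void WriteBlobNew(Utf8JsonWriter writer, MemoryStream value)
    {
        byte[] array = ArrayPool<byte>.Shared.Rent((int)value.Length);
        try
        {
            value.Position = 0;
            value.Read(array, 0, (int)value.Length);
            writer.WriteBase64StringValue(new ReadOnlySpan<byte>(array, 0, (int)value.Length));
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(array);
        }
    }
}
```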

Comparing the old FromMemoryStream + WriteStringValue approach vs the new WriteBase64StringValue approach for serializing a 5MB MIME message:

| Method | Job | Runtime | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Old_FromMemoryStream_WriteStringValue | .NET 6.0 | .NET 6.0 | 29.147 ms | 0.6410 ms | 1.8391 ms | 1.00 | 0.09 | 2000.0000 | 2000.0000 | 2000.0000 | 195.37 MB | 1.00 |
| New_WriteBase64StringValue | .NET 6.0 | .NET 6.0 | 7.105 ms | 0.2020 ms | 0.5829 ms | 0.24 | 0.02 | 1000.0000 | 1000.0000 | 1000.0000 | 49.13 MB | 0.25 |
| Old_FromMemoryStream_WriteStringValue | .NET 8.0 | .NET 8.0 | 37.141 ms | 4.7770 ms | 14.0850 ms | 1.16 | 0.65 | 2000.0000 | 2000.0000 | 2000.0000 | 195.37 MB | 1.00 |
| New_WriteBase64StringValue | .NET 8.0 | .NET 8.0 | 6.491 ms | 0.2993 ms | 0.8685 ms | 0.20 | 0.08 | - | - | - | 33.12 MB | 0.17 |
| Old_FromMemoryStream_WriteStringValue | .NET Core 3.1 | .NET Core 3.1 | 16.626 ms | 0.3524 ms | 0.9882 ms | 1.00 | 0.08 | 1000.0000 | 1000.0000 | 1000.0000 | 61.59 MB | 1.00 |
| New_WriteBase64StringValue | .NET Core 3.1 | .NET Core 3.1 | 6.498 ms | 0.3580 ms | 1.0498 ms | 0.39 | 0.07 | 1000.0000 | 1000.0000 | 1000.0000 | 34.22 MB | 0.56 |

On .NET 8.0: about 80% faster (time ratio 0.20), 83% less memory allocated, and zero GC pressure (Gen0/1/2 collections all eliminated).
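A BenchmarkDotNet harness along these lines can reproduce the comparison (illustrative only; the payload, class, and method names below are assumptions, not the PR's actual benchmark code):

```csharp
using System;
using System.IO;
using System.Text.Json;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class Base64SerializationBenchmarks
{
    private MemoryStream _payload;

    [GlobalSetup]
    public void Setup()
    {
        var bytes = new byte[5 * 1024 * 1024]; // 5 MB payload, as in the tables above
        new Random(42).NextBytes(bytes);
        _payload = new MemoryStream();         // expandable stream so GetBuffer() is accessible
        _payload.Write(bytes, 0, bytes.Length);
    }

    [Benchmark(Baseline = true)]
    public void Old_FromMemoryStream_WriteStringValue()
    {
        using var writer = new Utf8JsonWriter(Stream.Null);
        _payload.Position = 0;
        // Intermediate byte[] copy plus a ~6.7 MB Base64 string allocation.
        writer.WriteStringValue(Convert.ToBase64String(_payload.ToArray()));
    }

    [Benchmark]
    public void New_WriteBase64StringValue()
    {
        using var writer = new Utf8JsonWriter(Stream.Null);
        // Base64-encodes directly from the stream's bytes into the JSON output.
        writer.WriteBase64StringValue(new ReadOnlySpan<byte>(_payload.GetBuffer(), 0, (int)_payload.Length));
    }

    public static void Main() => BenchmarkRunner.Run<Base64SerializationBenchmarks>();
}
```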

Comparing SES V1 vs V2 sending a 5MB email with attachment before this fix:

| Method | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SimpleEmail_SendRawEmail | 792.0 ms | 23.96 ms | 66.39 ms | 794.4 ms | 1.01 | 0.12 | 2000.0000 | 2000.0000 | 2000.0000 | 277.6 MB | 1.00 |
| SimpleEmailV2_SendEmail | 829.8 ms | 85.64 ms | 252.50 ms | 745.5 ms | 1.05 | 0.33 | 1000.0000 | 1000.0000 | 1000.0000 | 67.5 MB | 0.24 |

Same benchmark after applying this fix:

| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| SimpleEmail_SendRawEmail | 752.5 ms | 15.61 ms | 44.02 ms | 1.00 | 0.08 | 2000.0000 | 2000.0000 | 2000.0000 | 277.61 MB | 1.00 |
| SimpleEmailV2_SendEmail | 541.9 ms | 16.71 ms | 47.68 ms | 0.72 | 0.08 | - | - | - | 33.24 MB | 0.12 |

After the fix, SES V2 allocates only 33.24 MB (down from 67.5 MB — a 51% reduction), is 28% faster, and generates zero GC pressure. Compared to V1, V2 now allocates 88% less memory.

Note: The V1 and V2 benchmarks were run separately to avoid assembly referencing issues between the two SDK versions.

Motivation and Context

#1922

Testing

  • Benchmarked using BenchmarkDotNet comparing old vs new serialization approaches with a 5MB MIME email payload
  • DRY_RUN-7b49ff35-34e6-4522-903a-9f9804867aad.
  • Protocol tests affected by this change succeeded in the dry run.
  • Added unit tests comparing the output of WriteBase64StringValue against FromMemoryStream.

Breaking Changes Assessment

  1. Identify all breaking changes including the following details:
    • What functionality was changed?
    • How will this impact customers?
    • Why does this need to be a breaking change and what are the most notable non-breaking alternatives?
    • Are best practices being followed?
    • How have you tested this breaking change?
  2. Has a senior/+ engineer been assigned to review this PR?

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have read the README document
  • I have added tests to cover my changes
  • All new and existing tests passed

License

  • I confirm that this pull request can be released under the Apache 2 license

Copilot AI (Contributor) left a comment

Pull request overview

This pull request optimizes MemoryStream Base64 serialization in JSON-based service marshallers by eliminating an unnecessary intermediate Base64 string allocation. The change modifies the code generation templates to use Utf8JsonWriter.WriteBase64StringValue(ReadOnlySpan<byte>) directly instead of converting to a string first with StringUtils.FromMemoryStream().

Changes:

  • Added new StringUtils.WriteBase64StringValue() method that writes directly to Utf8JsonWriter
  • Updated JsonRPCStructureMarshaller.tt template to use the new optimized method
  • Added dev config file with patch-level versioning

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| sdk/src/Core/Amazon.Runtime/Internal/Util/StringUtils.cs | Added new WriteBase64StringValue method to optimize MemoryStream serialization |
| generator/ServiceClientGeneratorLib/Generators/Marshallers/JsonRPCStructureMarshaller.tt | Updated template to call WriteBase64StringValue instead of FromMemoryStream + WriteStringValue |
| generator/ServiceClientGeneratorLib/Generators/Marshallers/JsonRPCStructureMarshaller.cs | Generated code from template with path changes (non-functional) |
| generator/.DevConfigs/5ba72673-5d59-4b0a-9206-6237d869c0f0.json | Added dev config with patch versioning and changelog message |

@muhammad-othman force-pushed the muhamoth/DOTNET-6445-SES-service-allocates-huge-amounts-of-memory branch from 0fe969c to 71ed8d4 on February 19, 2026 17:48
@peterrsongg peterrsongg self-requested a review February 19, 2026 18:01
@GarrettBeatty GarrettBeatty self-requested a review February 19, 2026 18:26
"updateMinimum": true,
"type": "patch",
"changeLogMessages": [
"Optimize MemoryStream Base64 serialization in JSON marshallers"
Contributor:

nit, could do "Fix #1922" as part of the changelog message, that would link the issue too

```csharp
{
    value.Position = 0;
    value.Read(array, 0, (int)value.Length);
    writer.WriteBase64StringValue(new ReadOnlySpan<byte>(array, 0, (int)value.Length));
```
Contributor:

I think there are two issues here.

  1. Shouldn't we use the actual bytesRead here instead of (int)value.Length? The Read operation can read less than you requested. It's actually quite common for very large memory stream reads.

  2. Also, we should keep reading until the bytesRead is zero, or else we may do an incomplete read of the stream and only partially write

Contributor:

Suggested code in the try block:

```csharp
try
{
    value.Position = 0;
    int totalBytesRead = 0;
    int bytesToRead = (int)value.Length;

    while (totalBytesRead < bytesToRead)
    {
        int bytesRead = value.Read(array, totalBytesRead, bytesToRead - totalBytesRead);
        if (bytesRead == 0)
            throw new IOException($"Stream ended prematurely: expected {bytesToRead} bytes, read {totalBytesRead}");
        totalBytesRead += bytesRead;
    }

    writer.WriteBase64StringValue(new ReadOnlySpan<byte>(array, 0, totalBytesRead));
}
```

muhammad-othman (Member, author):

Good point about Stream.Read potentially returning fewer bytes than requested.

However, the original FromMemoryStream method uses the same single-read pattern (value.Read(array, 0, (int)value.Length) followed by Convert.ToBase64String(array, 0, (int)value.Length)), and we haven't encountered partial reads so far.
The reason is that in practice, MemoryStream.Read always returns the full requested byte count. Looking at the implementation (https://github.com/dotnet/runtime/blob/960dca4391a731a20b25a92cdb500ef737bfcbbd/src/libraries/System.Private.CoreLib/src/System/IO/MemoryStream.cs#L320), Read performs a single Buffer.BlockCopy at the end; it never produces partial reads, since it's purely an in-memory operation backed by a byte array.

The Stream.Read contract does allow partial reads, and the official docs state: "An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached." So while the current MemoryStream implementation won't exhibit this behavior, relying on it is technically relying on an implementation detail rather than the documented contract.

But since both methods (FromMemoryStream and WriteBase64StringValue) accept MemoryStream (not Stream), and the parameter type constrains this to the known implementation, I think the current approach is safe.
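The distinction under discussion can be made concrete with a small sketch (the ChunkedStream class is hypothetical, written only to show that a conforming Stream may legally return fewer bytes than requested, while MemoryStream itself fills the whole request in one call):

```csharp
using System;
using System.IO;

// Hypothetical stream that returns at most 10 bytes per Read call.
// This is legal under the Stream.Read contract and is exactly the
// case the suggested read loop above guards against.
class ChunkedStream : MemoryStream
{
    public ChunkedStream(byte[] data) : base(data) { }

    public override int Read(byte[] buffer, int offset, int count)
        => base.Read(buffer, offset, Math.Min(count, 10));
}

class Demo
{
    static void Main()
    {
        var data = new byte[100];
        var buf = new byte[100];

        // MemoryStream.Read copies everything in a single BlockCopy...
        Console.WriteLine(new MemoryStream(data).Read(buf, 0, 100)); // 100

        // ...but a conforming Stream is free to return less per call.
        Console.WriteLine(new ChunkedStream(data).Read(buf, 0, 100)); // 10
    }
}
```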

Contributor:

Okay, taking a look at the source code for MemoryStream makes me feel better, but I am still a bit concerned because the docs state that an implementation may return fewer bytes when the data isn't fully available. Also, good point about it being that way originally, but if we can make it more resilient I don't see any harm in that. @normj what do you think?

@muhammad-othman force-pushed the muhamoth/DOTNET-6445-SES-service-allocates-huge-amounts-of-memory branch from 71ed8d4 to 8db63f8 on February 20, 2026 21:10
@muhammad-othman muhammad-othman merged commit ec1d129 into development Feb 21, 2026
7 checks passed
@muhammad-othman muhammad-othman deleted the muhamoth/DOTNET-6445-SES-service-allocates-huge-amounts-of-memory branch February 21, 2026 05:30