Optimize MemoryStream Base64 serialization in JSON marshallers#4335
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request optimizes MemoryStream Base64 serialization in JSON-based service marshallers by eliminating an unnecessary intermediate Base64 string allocation. The change modifies the code generation templates to use Utf8JsonWriter.WriteBase64StringValue(ReadOnlySpan<byte>) directly instead of converting to a string first with StringUtils.FromMemoryStream().
Changes:
- Added new
StringUtils.WriteBase64StringValue()method that writes directly to Utf8JsonWriter - Updated
JsonRPCStructureMarshaller.tttemplate to use the new optimized method - Added dev config file with patch-level versioning
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sdk/src/Core/Amazon.Runtime/Internal/Util/StringUtils.cs | Added new WriteBase64StringValue method to optimize MemoryStream serialization |
| generator/ServiceClientGeneratorLib/Generators/Marshallers/JsonRPCStructureMarshaller.tt | Updated template to call WriteBase64StringValue instead of FromMemoryStream + WriteStringValue |
| generator/ServiceClientGeneratorLib/Generators/Marshallers/JsonRPCStructureMarshaller.cs | Generated code from template with path changes (non-functional) |
| generator/.DevConfigs/5ba72673-5d59-4b0a-9206-6237d869c0f0.json | Added dev config with patch versioning and changelog message |
0fe969c to
71ed8d4
Compare
| "updateMinimum": true, | ||
| "type": "patch", | ||
| "changeLogMessages": [ | ||
| "Optimize MemoryStream Base64 serialization in JSON marshallers" |
There was a problem hiding this comment.
nit, could do "Fix #1922" as part of the changelog message, that would link the issue too
| { | ||
| value.Position = 0; | ||
| value.Read(array, 0, (int)value.Length); | ||
| writer.WriteBase64StringValue(new ReadOnlySpan<byte>(array, 0, (int)value.Length)); |
There was a problem hiding this comment.
I think there are two issues here.
-
Shouldn't we use the actual
bytesReadhere instead of(int)value.Length? TheReadoperation can read less than you requested. It's actually quite common for very large memory stream reads. -
Also, we should keep reading until the bytesRead is zero, or else we may do an incomplete read of the stream and only partially write
There was a problem hiding this comment.
suggested code in the try block
try
{
value.Position = 0;
int totalBytesRead = 0;
int bytesToRead = (int)value.Length;
while (totalBytesRead < bytesToRead)
{
int bytesRead = value.Read(array, totalBytesRead, bytesToRead - totalBytesRead);
if (bytesRead == 0)
throw new IOException($"Stream ended prematurely: expected {bytesToRead} bytes, read {totalBytesRead}");
totalBytesRead += bytesRead;
}
writer.WriteBase64StringValue(new ReadOnlySpan<byte>(array, 0, totalBytesRead));
}
There was a problem hiding this comment.
Good point about Stream.Read potentially returning fewer bytes than requested.
However the original FromMemoryStream method uses the same single-read pattern value.Read(array, 0, (int)value.Length) followed by Convert.ToBase64String(array, 0, (int)value.Length), and we didn't face partial reads so far.
The reason for non partial reads so far is that in practice, MemoryStream.Read always returns the full requested bytes. Looking at the code https://github.com/dotnet/runtime/blob/960dca4391a731a20b25a92cdb500ef737bfcbbd/src/libraries/System.Private.CoreLib/src/System/IO/MemoryStream.cs#L320, the Read does a single Buffer.BlockCopy at the end, it never produces partial reads since it's purely an in-memory operation backed by a byte array.
The Stream.Read contract does allow partial reads, and the official docs state: An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached. So while the current MemoryStream implementation won't exhibit this behavior, relying on it is technically relying on an implementation detail rather than the documented contract.
But since both methods (FromMemoryStream and WriteBase64StringValue) accept MemoryStream (not Stream), and the parameter type constrains this to the known implementation, I think the current approach is safe.
There was a problem hiding this comment.
Okay, taking a look at the source code for MemoryStream makes me feel better, but i am still a bit concerned b/c the docs state that if the data isn't fully available then it wouldn't do that. Also, good point about it being that way originally, but if we can make it more resilient i don't see a harm in that. @normj what do you think?
71ed8d4 to
8db63f8
Compare
Description
This PR reduces memory allocations when serializing
MemoryStream(blob) properties in JSON-based service marshallers by eliminating an unnecessary intermediate Base64 string allocation.When marshalling
MemoryStreamproperties (e.g., raw email data inSESV2), the generated code previously called:context.Writer.WriteStringValue(StringUtils.FromMemoryStream(requestObject.Data));This created a two-step process:
FromMemoryStream()converts the entire stream to a Base64stringWriteStringValue()re-encodes that string to UTF-8 bytes in the JSON outputThis intermediate string allocation is entirely unnecessary since
Utf8JsonWriternatively supports writing Base64 directly from binary data viaWriteBase64StringValue(ReadOnlySpan<byte>)which is what was add in this PR.This improvement applies to all JSON-based services that marshal
MemoryStream/blob properties, not justSESV2.Comparing the old
FromMemoryStream+WriteStringValueapproach vs the newWriteBase64StringValueapproach for serializing a 5MB MIME message:On .NET 8.0: 75% faster, 83% less memory allocated, and zero GC pressure (Gen0/1/2 all eliminated).
Comparing SES V1 vs V2 sending a 5MB email with attachment before this fix:
Same benchmark after applying this fix:
After the fix, SES V2 allocates only 33.24 MB (down from 67.5 MB — a 51% reduction), is 28% faster, and generates zero GC pressure. Compared to V1, V2 now allocates 88% less memory.
Motivation and Context
#1922
Testing
BenchmarkDotNetcomparing old vs new serialization approaches with a 5MB MIME email payloadDRY_RUN-7b49ff35-34e6-4522-903a-9f9804867aad.WriteBase64StringValuecompared toFromMemoryStream.Breaking Changes Assessment
Screenshots (if appropriate)
Types of changes
Checklist
License