Open
Labels
area-System.IO.Compression
enhancement: Product code improvement that does NOT require public API changes/additions
in-pr: There is an active PR which will close this issue when it is merged
tenet-performance: Performance related issue
Description
When writing to a Brotli stream one line at a time, the compressed output is actually larger than the uncompressed input.
Repro:
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace BrotliTest
{
    class Program
    {
        static async Task Main(string[] args)
        {
            const int NUMBER_OF_LINES = 10000;
            const string SAMPLE_STRING = "hello this should compress well\n";

            var inputString = string.Concat(Enumerable.Repeat(SAMPLE_STRING, NUMBER_OF_LINES));
            var inputBytes = Encoding.UTF8.GetBytes(inputString);
            Console.WriteLine($"Input size: {inputBytes.Length} bytes");

            // Case 1: write the whole input in a single call.
            using (var outputMemory = new MemoryStream())
            using (var outputStream = new BrotliStream(outputMemory, CompressionLevel.Fastest))
            {
                await outputStream.WriteAsync(inputBytes);
                Console.WriteLine($"Output size (all at once): {outputMemory.Length} bytes");
            }

            // Case 2: write the same data one line per call.
            using (var outputMemory = new MemoryStream())
            using (var outputStream = new BrotliStream(outputMemory, CompressionLevel.Fastest))
            {
                var bytes = Encoding.UTF8.GetBytes(SAMPLE_STRING);
                for (var i = 0; i < NUMBER_OF_LINES; i++)
                {
                    await outputStream.WriteAsync(bytes);
                }
                Console.WriteLine($"Output size (line by line): {outputMemory.Length} bytes");
            }

            // Case 3: write the same data one byte per call.
            using (var outputMemory = new MemoryStream())
            using (var outputStream = new BrotliStream(outputMemory, CompressionLevel.Fastest))
            {
                foreach (var inputByte in inputBytes)
                {
                    await outputStream.WriteAsync(new[] { inputByte });
                }
                Console.WriteLine($"Output size (byte by byte): {outputMemory.Length} bytes");
            }

            Console.ReadKey();
        }
    }
}
Output:
Input size: 320000 bytes
Output size (all at once): 105 bytes
Output size (line by line): 350000 bytes
Output size (byte by byte): 1280000 bytes
Buffering the entire contents in memory and then writing it all in a single call avoids the issue, but that defeats the purpose of using a stream: you may as well call a function that compresses a byte array and avoid streams entirely.
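A possible middle ground between buffering everything and writing line by line is to place a bounded `BufferedStream` in front of the `BrotliStream`, so that many small writes are coalesced into larger ones while memory use stays fixed. The sketch below assumes the overhead in the output above comes from each individual write being emitted separately, which the issue does not confirm. The class name `BufferedBrotliSketch`, the helper `CompressLineByLine`, and the 64 KiB buffer size are hypothetical choices for illustration, not a recommendation from the library.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class BufferedBrotliSketch
{
    // Hypothetical helper: writes `line` to a Brotli stream `count` times
    // through a bounded buffer and returns the compressed size in bytes.
    // The 64 KiB default is an arbitrary assumption, not a tuned value.
    public static long CompressLineByLine(byte[] line, int count, int bufferSize = 64 * 1024)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps `output` usable after the Brotli stream is disposed.
            using (var brotli = new BrotliStream(output, CompressionLevel.Fastest, leaveOpen: true))
            using (var buffered = new BufferedStream(brotli, bufferSize))
            {
                for (var i = 0; i < count; i++)
                {
                    // Small writes accumulate in the buffer and reach
                    // the BrotliStream only in larger chunks.
                    buffered.Write(line, 0, line.Length);
                }
            }

            // Both wrappers are disposed here, so all compressed data has been written.
            return output.Length;
        }
    }
}
```

Calling `BufferedBrotliSketch.CompressLineByLine(Encoding.UTF8.GetBytes("hello this should compress well\n"), 10000)` mirrors the line-by-line case from the repro. Unlike the repro, the size is read only after the streams are disposed, so the two numbers are not directly comparable.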