Brotli very inefficient with smaller writes #36245

@Daniel15

Description

When writing to a Brotli stream one line at a time, the compressed output is actually larger than the uncompressed input.

Repro:

using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace BrotliTest
{
    class Program
    {
        static async Task Main(string[] args)
        {
            const int NUMBER_OF_LINES = 10000;
            const string SAMPLE_STRING = "hello this should compress well\n";

            var inputString = string.Concat(Enumerable.Repeat(SAMPLE_STRING, NUMBER_OF_LINES));
            var inputBytes = Encoding.UTF8.GetBytes(inputString);
            Console.WriteLine($"Input size: {inputBytes.Length} bytes");

            using (var outputMemory = new MemoryStream())
            using (var outputStream = new BrotliStream(outputMemory, CompressionLevel.Fastest))
            {
                await outputStream.WriteAsync(inputBytes);
                Console.WriteLine($"Output size (all at once): {outputMemory.Length} bytes");
            }

            using (var outputMemory = new MemoryStream())
            using (var outputStream = new BrotliStream(outputMemory, CompressionLevel.Fastest))
            {
                var bytes = Encoding.UTF8.GetBytes(SAMPLE_STRING);
                for (var i = 0; i < NUMBER_OF_LINES; i++)
                {
                    await outputStream.WriteAsync(bytes);
                }
                Console.WriteLine($"Output size (line by line): {outputMemory.Length} bytes");
            }

            using (var outputMemory = new MemoryStream())
            using (var outputStream = new BrotliStream(outputMemory, CompressionLevel.Fastest))
            {
                foreach (var inputByte in inputBytes)
                {
                    await outputStream.WriteAsync(new[] { inputByte });
                }

                Console.WriteLine($"Output size (byte by byte): {outputMemory.Length} bytes");
            }

            Console.ReadKey();
        }
    }
}

Output:

Input size: 320000 bytes
Output size (all at once): 105 bytes
Output size (line by line): 350000 bytes
Output size (byte by byte): 1280000 bytes

Buffering the entire contents in memory and then writing them in a single call avoids the issue, but that defeats the purpose of using a stream: you might as well call a function that compresses a byte array and avoid streams entirely.
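
A possible middle ground, sketched here as an assumption rather than a confirmed fix, is to put a fixed-size `BufferedStream` between the caller and the `BrotliStream`, so that many small writes are coalesced into fewer, larger ones without holding the whole input in memory. The class name `BufferedBrotliSketch`, the method `CompressLineByLine`, and the 64 KiB buffer size are all hypothetical choices for illustration. The sketch reads the size only after the streams are disposed, so the final block is included in the count.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class BufferedBrotliSketch
{
    // Hypothetical helper: writes `count` copies of `line` one small write at a time
    // and returns the final compressed size in bytes.
    // The 64 KiB default buffer size is an arbitrary assumption, not a tuned value.
    public static long CompressLineByLine(byte[] line, int count, int bufferSize = 64 * 1024)
    {
        using var outputMemory = new MemoryStream();

        // leaveOpen: true keeps the MemoryStream usable after the BrotliStream is disposed.
        using (var brotli = new BrotliStream(outputMemory, CompressionLevel.Fastest, leaveOpen: true))
        using (var buffered = new BufferedStream(brotli, bufferSize))
        {
            for (var i = 0; i < count; i++)
            {
                // Small writes accumulate in the BufferedStream and reach
                // the BrotliStream only when the buffer fills or is flushed.
                buffered.Write(line, 0, line.Length);
            }
        } // Disposing flushes the buffer and lets Brotli emit its final block.

        return outputMemory.Length;
    }
}
```

Called as `BufferedBrotliSketch.CompressLineByLine(Encoding.UTF8.GetBytes("hello this should compress well\n"), 10000)`, this mirrors the line-by-line case from the repro. Memory use stays bounded by the buffer size, so the stream-based design is preserved.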

Metadata

Labels

area-System.IO.Compression
enhancement: Product code improvement that does NOT require public API changes/additions
in-pr: There is an active PR which will close this issue when it is merged
tenet-performance: Performance related issue
