BOM Gone? #553

@r2d2Proton

As best I can tell, Loader.cs, StorageFileLoader.cs, and LottieCompositionReader.cs in Lottie-Windows make an effort to handle both UTF and non-UTF files:

public static LottieComposition? ReadLottieCompositionFromJsonStream(Stream stream, Options options, out IReadOnlyList<(string Code, string Description)> issues)
{
    ReadStreamToUTF8(stream, out var utf8Text);
    return ReadLottieCompositionFromJson(utf8Text, options, out issues);
}

static void ReadStreamToUTF8(Stream stream, out ReadOnlySpan<byte> utf8Text)
{
    // This buffer size is chosen to be about 50% larger than
    // the average file size in our corpus, so most of the time
    // we don't need to reallocate and copy.
    var buffer = new byte[150000];
    var bytesRead = stream.Read(buffer, 0, buffer.Length);
    var spaceLeftInBuffer = buffer.Length - bytesRead;

    while (spaceLeftInBuffer == 0)
    {
        // Might be more to read. Expand the buffer.
        var newBuffer = new byte[buffer.Length * 2];
        spaceLeftInBuffer = buffer.Length;
        var totalBytesRead = buffer.Length;
        Array.Copy(buffer, 0, newBuffer, 0, totalBytesRead);
        buffer = newBuffer;
        bytesRead = stream.Read(buffer, totalBytesRead, buffer.Length - totalBytesRead);
        spaceLeftInBuffer -= bytesRead;
    }

    utf8Text = new ReadOnlySpan<byte>(buffer);
    NormalizeTextToUTF8(ref utf8Text);
}

static void NormalizeTextToUTF8(ref ReadOnlySpan<byte> text)
{
    if (text.Length >= 1)
    {
        switch (text[0])
        {
            case 0xEF:
                // Possibly start of UTF8 BOM.
                if (text.Length >= 3 && text[1] == 0xBB && text[2] == 0xBF)
                {
                    // UTF8 BOM. Step over the UTF8 BOM.
                    text = text.Slice(3, text.Length - 3);
                }
                break;  
        }
    }
}

As best I can tell, when a UTF-8 file is loaded with:

var filePicker = new FileOpenPicker{};
StorageFile? file = await filePicker.PickSingleFileAsync();

the BOM has already been consumed by some function before this code is called. The buffer begins directly with the "{" that opens the JSON, so the BOM check in NormalizeTextToUTF8 never matches.
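One way to confirm this observation is to look at the first three raw bytes of the stream before any normalization runs. The following is only a minimal sketch: BomCheck, StartsWithUtf8Bom, and StreamStartsWithUtf8Bom are hypothetical names introduced here for illustration and are not part of Lottie-Windows.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical diagnostic helper, not part of Lottie-Windows.
static class BomCheck
{
    // Returns true if the data begins with the UTF-8 BOM (EF BB BF).
    public static bool StartsWithUtf8Bom(ReadOnlySpan<byte> data)
    {
        ReadOnlySpan<byte> bom = Encoding.UTF8.GetPreamble();
        return data.StartsWith(bom);
    }

    // Reads up to the first three bytes of the stream and checks them.
    // Loops because Stream.Read may return fewer bytes than requested.
    // Note: this advances the stream position by the bytes it reads.
    public static bool StreamStartsWithUtf8Bom(Stream stream)
    {
        var head = new byte[3];
        var total = 0;
        int n;
        while (total < head.Length &&
               (n = stream.Read(head, total, head.Length - total)) > 0)
        {
            total += n;
        }

        return StartsWithUtf8Bom(head.AsSpan(0, total));
    }
}
```

Calling StreamStartsWithUtf8Bom on the stream that is handed to ReadLottieCompositionFromJsonStream, for a file known to be saved with a BOM, would show whether the BOM is still present at that point. Because the helper consumes bytes, a seekable stream would need its position reset afterwards.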

Simplified version:

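static void ReadStreamToUTF8(Stream stream, out ReadOnlySpan<byte> utf8Text)
{
    // Size the buffer to the whole stream up front. This requires a seekable stream.
    var buffer = new byte[stream.Length];
    var totalBytesRead = 0;
    // Stream.Read may return fewer bytes than requested, so keep reading until done.
    int bytesRead;
    while (totalBytesRead < buffer.Length &&
           (bytesRead = stream.Read(buffer, totalBytesRead, buffer.Length - totalBytesRead)) > 0)
    {
        totalBytesRead += bytesRead;
    }
    utf8Text = new ReadOnlySpan<byte>(buffer, 0, totalBytesRead);
    NormalizeTextToUTF8(ref utf8Text);
}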
