Decoders with custom offset #22

@SirJayEarn

Hi!

First of all: thanks a lot for bringing Elm into the world! This language has kept me motivated to dig deeper and deeper into functional programming, whereas other languages left me overwhelmed and discouraged pretty quickly.

Short

Recently I've been writing a parser for MIDI files in Elm, using elm/bytes. Overall it worked pretty nicely, but I was missing one thing: a way to write a custom decoder that starts at a given offset.

Issue

Parts of a MIDI file contain a list-like structure. Items are (potentially) compressed: one byte has to be read in order to know how to process that same byte and the bytes following it. Depending on its most significant bit, this byte may carry meta-information. If it doesn't, the current list item has the same meta-information as the previous item, so the meta-information byte is simply omitted. This implies that the byte just read is not to be treated as a standalone byte; how to read it depends on the previous meta-information.

This leads to a situation where some kind of lookahead would be useful. Something that goes like: OK, the byte has this structure, so keep it and read it as two nibbles, which define how to read the following bytes. Or: oh, the byte has this other structure, so forget about it, use the most recent meta-information and continue as normal, BUT start where the byte we just read started, because it is not the meta-info byte but already part of the data. Otherwise we are one byte off.
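The decision described above can be sketched as a few pure functions. This is only an illustration: `isStatusByte`, `splitNibbles`, `resolve` and the `Resolved` record are hypothetical names, not part of elm/bytes or of my parser, and I am assuming the meta-information byte is recognised by its most significant bit being set (value >= 128).

```elm
module StatusByte exposing (Resolved, isStatusByte, resolve, splitNibbles)

import Bitwise


{-| A byte carries meta-information when its most significant bit is set.
-}
isStatusByte : Int -> Bool
isStatusByte byte =
    byte >= 0x80


{-| Split a byte into ( high nibble, low nibble ).
-}
splitNibbles : Int -> ( Int, Int )
splitNibbles byte =
    ( Bitwise.and 0x0F (Bitwise.shiftRightZfBy 4 byte)
    , Bitwise.and 0x0F byte
    )


{-| Hypothetical result type: which meta-information applies, and whether
the byte that was just read is already part of the data.
-}
type alias Resolved =
    { status : Int
    , firstByteIsData : Bool
    }


{-| Decide which meta-information applies to the current item.
-}
resolve : Int -> Int -> Resolved
resolve previousStatus byte =
    if isStatusByte byte then
        { status = byte, firstByteIsData = False }

    else
        -- Compressed item: reuse the previous meta-information.
        { status = previousStatus, firstByteIsData = True }
```

The `firstByteIsData` flag is exactly the "we are one byte off" case: when it is `True`, the following decoder would ideally start at the offset of the byte that was just read.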

Solution (possibly)

I think in this case it would be nice to just set back the offset, so we effectively forget about the byte we just decoded. Because there is no way to reset the offset when using andThen, I carried the already-read byte around and had to provide it to the following decoders, where it was conditionally prepended. This made the code harder to read, understand and reuse.
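To make the cost of this workaround concrete, here is a minimal sketch of the duplication it forces, using only the public elm/bytes API. The names `step`, `variableLength` and `variableLengthFrom` are made up for this example, and I am assuming the usual variable-length encoding where each byte holds 7 payload bits and a set high bit means "more bytes follow".

```elm
module VariableLength exposing (variableLength, variableLengthFrom)

import Bitwise
import Bytes.Decode as Decode exposing (Decoder, Step(..))


{-| Fold one byte into the accumulator: the low 7 bits are payload,
and a set high bit means another byte follows.
-}
step : Int -> Int -> Step Int Int
step acc byte =
    let
        next =
            Bitwise.or (Bitwise.shiftLeftBy 7 acc) (Bitwise.and 0x7F byte)
    in
    if Bitwise.and 0x80 byte /= 0 then
        Loop next

    else
        Done next


{-| The reusable decoder: every byte is read from the input.
-}
variableLength : Decoder Int
variableLength =
    Decode.loop 0 (\acc -> Decode.map (step acc) Decode.unsignedInt8)


{-| The "pre-read" twin: the caller already consumed the first byte,
so it has to be passed in and handled before touching the input.
-}
variableLengthFrom : Int -> Decoder Int
variableLengthFrom firstByte =
    case step 0 firstByte of
        Done value ->
            Decode.succeed value

        Loop acc ->
            Decode.loop acc (\a -> Decode.map (step a) Decode.unsignedInt8)
```

Every decoder that may come directly after the lookahead byte needs such a twin, which is what makes the code harder to reuse.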

In the source code of elm/bytes, andThen and mapN use an offset internally. I guess exposing the data constructor Decoder (Bytes -> Int -> (Int, a)), instead of just the type Decoder a, would be all that is needed to build custom map / andThen decoders for this kind of lookahead decoding.

Illustration

Maybe my explanation was a little confusing, so this code hopefully makes it easier to understand:

Bytes.Decode.unsignedInt8
    |> Bytes.Decode.andThen
        (\currentPotentialStatusByte ->
            let
                isCompressed =
                    currentPotentialStatusByte < 128

                currentStatusByte =
                    if isCompressed then
                        previousStatusByte

                    else
                        currentPotentialStatusByte

                ( mEventName, channel ) =
                    statusByteToNibbles currentStatusByte

                readFirstByte =
                    if isCompressed then
                        Bytes.Decode.succeed currentPotentialStatusByte

                    else
                        Bytes.Decode.unsignedInt8
...
            readFirstByte
                |> Bytes.Decode.andThen preReadVariableLengthValueDecoder
                |> Bytes.Decode.andThen Bytes.Decode.string
                |> Bytes.Decode.map ((++) "System Exclusive Begin Event" >> NotYetSupportedEvent)

With this approach I need to carry readFirstByte around and adapt all the following decoders. I think this would be nicer:

Bytes.Decode.Decoder
    (\bites offset ->
        let
            (Bytes.Decode.Decoder uint8Decode) =
                Bytes.Decode.unsignedInt8

            -- The internal function returns ( newOffset, value ), matching Bytes -> Int -> (Int, a).
            ( newOffset, currentPotentialStatusByte ) =
                uint8Decode bites offset

            isCompressed =
                currentPotentialStatusByte < 128

            currentStatusByte =
                if isCompressed then
                    previousStatusByte

                else
                    currentPotentialStatusByte

            ( mEventName, channel ) =
                statusByteToNibbles currentStatusByte

            nextOffset =
                if isCompressed then
                    offset

                else
                    newOffset
...
        withOffset nextOffset readVariableLengthValueDecoder
            |> Bytes.Decode.andThen Bytes.Decode.string
            |> Bytes.Decode.map ((++) "System Exclusive Begin Event" >> NotYetSupportedEvent)

Where withOffset would be something like:

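withOffset : Int -> Bytes.Decode.Decoder a -> Bytes.Decode.Decoder a
withOffset offset (Bytes.Decode.Decoder decode) =
    -- Ignore the offset passed in by the caller and run the wrapped decoder at the given one.
    Bytes.Decode.Decoder <| \bites _ -> decode bites offset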

So here I could just use readVariableLengthValueDecoder, which is also used in other places, and I would not need to create a preReadVariableLengthValueDecoder, which does the same thing but either reads the first byte or doesn't, depending on the given argument.
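For comparison, the closest thing I can see with the public API today is a sketch like the following, which only works at the top level where the whole Bytes value is still in hand. `decodeAt` is a hypothetical helper name; it skips `offset` bytes with Bytes.Decode.bytes and then runs the given decoder. It cannot be used from inside an andThen chain, because a decoder has no access to the input or to the current offset, so it is not a replacement for withOffset.

```elm
module DecodeAt exposing (decodeAt)

import Bytes exposing (Bytes)
import Bytes.Decode as Decode exposing (Decoder)


{-| Run a decoder starting at an absolute offset into the input.
The skipped bytes are decoded as a throwaway Bytes value.
-}
decodeAt : Int -> Decoder a -> Bytes -> Maybe a
decodeAt offset decoder input =
    Decode.decode
        (Decode.bytes offset |> Decode.andThen (\_ -> decoder))
        input
```

Using this for lookahead would mean tracking the offset by hand and restarting decoding from the outside for every item, which seems worse than carrying the pre-read byte around.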

Edit: during the past few days I watched a bunch of Elm talks (mostly by Richard Feldman). I now understand much better why the Decoder type is opaque. Still, I think having a way to adjust the offset would be very nice, maybe by adding a utility function like withOffset.
