Skip to content

Optional fields do not follow protobuf specification #293

@johroj

Description

@johroj

I am aware that this package was mostly written mid-2022. From this thread, it appears that optional was not mentioned in the proto3 specification until somewhat later. But the current protobuf 3 specification states:

optional: (recommended) An optional field is in one of two possible states:

  • the field is set, and contains a value that was explicitly set or parsed from the wire. It will be serialized to the wire.
  • the field is unset, and will return the default value. It will not be serialized to the wire.

You can check to see if the value was explicitly set.

implicit: (not recommended) An implicit field has no explicit cardinality label and behaves as follows:

  • if the field is a message type, it behaves just like an optional field.
  • if the field is not a message, it has two states:
    • the field is set to a non-default (non-zero) value that was explicitly set or parsed from the wire. It will be serialized to the wire.
    • the field is set to the default (zero) value. It will not be serialized to the wire. In fact, you cannot determine whether the default (zero) value was set or parsed from the wire or not provided at all. For more on this subject, see Field Presence.

When encoding, ProtoBuf.jl seems to treat optional singular fields (proto2 or proto3) as the implicit fields of proto3, i.e. the value is only serialized if it is not the default. When decoding, ProtoBuf.jl always returns all values, but gives no information about which values were received and which were implicitly set to the default.

As a result, we cannot signal that a field was explicitly set or not, which can be an important source of information.

Example

syntax = "proto3";

message MyMessage {
    optional int64 value = 2;
}

results in

struct MyMessage
    value::Int64
end

function PB.decode(d::PB.AbstractProtoDecoder, ::Type{<:MyMessage})
    value = zero(Int64)
    while !PB.message_done(d)
        field_number, wire_type = PB.decode_tag(d)
        if field_number == 2
            value = PB.decode(d, Int64)
        else
            Base.skip(d, wire_type)
        end
    end
    return MyMessage(value) # How should we figure out if value was on the wire or not? 
end

function PB.encode(e::PB.AbstractProtoEncoder, x::MyMessage) # How can we tell that we only want to serialize a subset of the fields?
    initpos = position(e.io)
    x.value != zero(Int64) && PB.encode(e, 2, x.value) # For an optional message, we should unconditionally encode here!
    return position(e.io) - initpos
end

Proposal

I think not encoding optional fields when the value equals the default should be considered a bug.

The ability to represent partial messages is more of a missing feature. It could be solved by adding Union{Nothing, ...} around all types and encode/decode accordingly. The drawback of this approach is that the user would have to check something whenever accessing a field. One better option could be some optional wrapper Partial{MyMessage} which, in addition to the message itself, also stores a BitVector denoting which members are assigned.

I'm leaving this ticket here in case someone would like to give some input. Might draft a proposal in case I cannot work around this limitation some other way.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions