Enum Forward Compatibility Not Working According to Avro Specification #93

@michelecoco

Description

When deserializing Avro data with an enum value that doesn't exist in the reader's schema, the library violates the Avro specification.

Expected Behavior

According to the Avro specification:

if the writer’s symbol is not present in the reader’s enum and the reader has a default value, then that value is used, otherwise an error is signalled.

Therefore, when deserializing, if the writer’s enum symbol is missing from the reader’s enum:

  • If the reader has a default enum value, that default should be used.
  • If no default is defined, an error should be thrown.
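
The rule above can be sketched in a few lines of plain PHP. This is only an illustration of the resolution logic the spec describes, not the library's actual implementation: `resolveEnumSymbol` and its parameters are hypothetical names introduced for this example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): resolves a writer's enum
 * symbol against the reader's enum, following the rule quoted from the spec.
 *
 * @param string[] $readerSymbols symbols declared in the reader's enum
 * @param ?string  $readerDefault the reader enum's "default", or null if none
 */
function resolveEnumSymbol(
    string $writerSymbol,
    array $readerSymbols,
    ?string $readerDefault = null
): string {
    // The symbol is known to the reader: use it unchanged.
    if (in_array($writerSymbol, $readerSymbols, true)) {
        return $writerSymbol;
    }

    // Unknown symbol, but the reader's enum declares a default: use it.
    if ($readerDefault !== null) {
        return $readerDefault;
    }

    // Unknown symbol and no default: signal an error.
    throw new UnexpectedValueException(sprintf(
        'Enum symbol "%s" is not in the reader schema and no default is defined',
        $writerSymbol
    ));
}

$readerSymbols = ['Unknown', 'First', 'Second'];

echo resolveEnumSymbol('First', $readerSymbols, 'Unknown'), PHP_EOL; // First
echo resolveEnumSymbol('Third', $readerSymbols, 'Unknown'), PHP_EOL; // Unknown

try {
    resolveEnumSymbol('Third', $readerSymbols); // no default given
} catch (UnexpectedValueException $e) {
    echo 'error: ', $e->getMessage(), PHP_EOL;
}
```

Note that the default consulted here is the one declared on the reader's enum type itself, which is distinct from the record field's default used when a field is missing entirely.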

Actual Behavior

The library currently returns the writer’s enum value even when it does not exist in the reader’s schema, ignoring the reader’s default value if present. Moreover, it does not throw an error when no default value is defined, violating the spec.

How to Reproduce

// DTO used by the test; it lives in the App\Tests\DTO namespace (see the test body)
final class DefaultChoiceDTO
{
    public function __construct(
        public string $choice = "Unknown"
    ) {
    }
}

// Serializer setup, run inside the test case (e.g. in setUp()), hence $this
$normalizers = [
    new ObjectNormalizer(),
    new PropertyNormalizer(),
    new GetSetMethodNormalizer()
];
$encoders = [
    new AvroSerDeEncoder($this->recordSerializer)
];

$this->serializer = new Serializer($normalizers, $encoders);

public function testAddEnumSymbolsWithDefaultForwardCompatible(): void
{
    // V1 Schema (reader) - enum with 3 symbols and default
    $schemaV1 = AvroSchema::parse(json_encode([
        "type" => "record",
        "name" => "Event",
        "namespace" => "app.tests.dto",
        "fields" => [[
            "name" => "choice",
            "type" => [
                "type" => "enum",
                "name" => "Choices",
                "symbols" => ["Unknown", "First", "Second"],
                "default" => "Unknown"
            ],
            "default" => "Unknown"
        ]]
    ]));

    // V2 Schema (writer) - enum with 4 symbols
    $schemaV2 = AvroSchema::parse(json_encode([
        "type" => "record",
        "name" => "Event",
        "namespace" => "app.tests.dto",
        "fields" => [[
            "name" => "choice",
            "type" => [
                "type" => "enum",
                "name" => "Choices",
                "symbols" => ["Unknown", "First", "Second", "Third"],
                "default" => "Unknown"
            ],
            "default" => "Unknown"
        ]]
    ]));

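    $data = new \App\Tests\DTO\DefaultChoiceDTO("Third");

    // $subject is not defined in the original report; placeholder value for illustration only
    $subject = "event-value";
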
    // Serialize with V2 schema
    $serialized = $this->serializer->serialize(
        $data,
        AvroSerDeEncoder::FORMAT_AVRO,
        [
            AvroSerDeEncoder::CONTEXT_ENCODE_SUBJECT => $subject,
            AvroSerDeEncoder::CONTEXT_ENCODE_WRITERS_SCHEMA => $schemaV2,
        ]
    );

    // Deserialize with V1 schema
    $deserialized = $this->serializer->deserialize(
        $serialized,
        \App\Tests\DTO\DefaultChoiceDTO::class,
        AvroSerDeEncoder::FORMAT_AVRO,
        [AvroSerDeEncoder::CONTEXT_DECODE_READERS_SCHEMA => $schemaV1]
    );

    // This assertion fails:
    // Expected: "Unknown" (the default)
    // Actual: "Third" (the writer's value)
    $this->assertEquals("Unknown", $deserialized->choice);
}
