bijection-avro sometimes deserializes objects to GenericData.Record instead of the requested type #265

@rabejens

I defined an Avro schema and used SBT Avrohugger to generate the Scala code. Serialization and deserialization work on my local machine so far. I am doing something like this:

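import com.twitter.bijection.avro.SpecificAvroCodecs

val x: Array[Byte] = ... // Get the serialized data
// invert returns a Try wrapping the deserialized object
val myThing = SpecificAvroCodecs.toBinary[MyAvroThing](MyAvroThing.SCHEMA$).invert(x)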

When I run this locally, it works perfectly. I then created a Spark job, packaged with the SBT Assembly plugin, that can be submitted to Spark. When I submit this job locally (using spark-submit --master local[*]), deserialization works. However, when I submit it to a real Spark cluster, I get a ClassCastException:

Exception in thread "main" java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to com.example.avro.MyAvroThing

So the deserializer does not produce the requested class and instead deserializes the data to a generic Avro record. I double-checked that all necessary Avro libraries and Twitter's Bijection-Avro are correctly embedded in my resulting JAR.
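Checking the contents of the assembly JAR does not show which copy of a class the JVM actually uses at runtime, and a Spark installation ships libraries of its own. A minimal sketch for checking this, assuming a hypothetical helper named ClassOrigin (not part of Avro or Bijection), could print where a class was loaded from and by which class loader:

```scala
// Hypothetical diagnostic helper; the name ClassOrigin is an assumption for illustration.
object ClassOrigin {
  // Describes the code source (usually a JAR or directory URL) and the class loader of `cls`.
  def describe(cls: Class[_]): String = {
    val location = Option(cls.getProtectionDomain)
      .flatMap(pd => Option(pd.getCodeSource))
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)
      .getOrElse("<bootstrap or unknown>")
    // getClassLoader returns null for classes loaded by the bootstrap loader.
    val loader = Option(cls.getClassLoader).map(_.toString).getOrElse("<bootstrap>")
    s"${cls.getName} loaded from $location by $loader"
  }
}
```

Calling ClassOrigin.describe on classOf[MyAvroThing] and on classOf[org.apache.avro.specific.SpecificData], both locally and on the cluster, would show whether the Avro classes come from the assembly JAR or from somewhere else.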

As a next investigation step, I analyzed the GenericData.Record I get by doing:

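import scala.collection.JavaConverters._
import scala.util.Try
import org.apache.avro.generic.GenericData
import com.twitter.bijection.avro.SpecificAvroCodecs

// Widen the static type to Try[Any] so the runtime class can be inspected
val mystery = SpecificAvroCodecs.toBinary[MyAvroThing](MyAvroThing.SCHEMA$).invert(x).asInstanceOf[Try[Any]]
mystery.get match {
  case _: MyAvroThing => println("ok!")
  // getFields returns a java.util.List, so convert it before mapping
  case r: GenericData.Record => println("Got a generic record with schema: " + r.getSchema.getFields.asScala.map(_.name()).mkString(", "))
  case _ => println("Got something completely different")
}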

When I run this locally, it prints ok! because it correctly gets a MyAvroThing. When I run this on the Spark cluster, I get:

Got a generic record with schema: foo, bar, quux

This means my schema is honored by the deserializer and the data is decoded correctly; only the conversion to the resulting class does not happen.

When I query the record's fields by name, I get the correct data I expect in my MyAvroThing.

What is going wrong here?
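One possible explanation, not confirmed in this report, is a class-loading mismatch: Avro's specific-record support looks up the target class by the schema's full name through a class loader and falls back to a generic record when that lookup fails. A minimal sketch for testing this, assuming a hypothetical helper named ClassVisibility, checks whether a class name can be resolved through a given loader:

```scala
// Hypothetical diagnostic helper; the name ClassVisibility is an assumption for illustration.
object ClassVisibility {
  // Returns true if `className` can be resolved through `loader`.
  def isVisibleFrom(className: String, loader: ClassLoader): Boolean =
    try {
      // initialize = false: only look the class up, do not run static initializers
      Class.forName(className, false, loader)
      true
    } catch {
      case _: ClassNotFoundException => false
      case _: LinkageError => false // e.g. NoClassDefFoundError
    }
}
```

On the cluster, evaluating ClassVisibility.isVisibleFrom("com.example.avro.MyAvroThing", classOf[org.apache.avro.specific.SpecificData].getClassLoader) would show whether the loader that loaded Avro can see the generated class. A result of false would be consistent with the GenericData.Record fallback described above.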
