Union types break service serialization #323

@chris05atm

Description

What happened?

When we upgraded our conjure-python dependency, we ran into runtime pyspark serialization issues. We could previously serialize a service object, but after the conjure-python upgrade the same service was no longer serializable.

We suspect #320 or #221 broke serde behavior for us.

The pyspark error was:

py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/palantir/services/.4229/var/tmp/asset-install/85af169544daf00da129a002813aba21/spark/python/lib/pyspark.zip/pyspark/worker.py", line 413, in main
    func, profiler, deserializer, serializer = read_command(pickleSer, infile)
  File "/opt/palantir/services/.4229/var/tmp/asset-install/85af169544daf00da129a002813aba21/spark/python/lib/pyspark.zip/pyspark/worker.py", line 68, in read_command
    command = serializer._read_with_length(file)
  File "/opt/palantir/services/.4229/var/tmp/asset-install/85af169544daf00da129a002813aba21/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 173, in _read_with_length
    return self.loads(obj)
  File "/opt/palantir/services/.4229/var/tmp/asset-install/85af169544daf00da129a002813aba21/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 697, in loads
    return pickle.loads(obj, encoding=encoding)
AttributeError: type object 'AlertFailureResponse' has no attribute '_service_exception'

This was thrown when passing our service through a map function, and it occurred even with zero data in the RDD; only the service code was involved, and it had previously worked.
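
A minimal sketch of the failing shape (all names here are hypothetical, not our actual code): the service is built on the driver and captured by the closure passed to map, so Spark has to pickle it into every task.

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    service = build_alert_service()  # hypothetical driver-side factory

    # Fails while deserializing the task with the AttributeError above, even
    # on an empty RDD: it is the pickled closure itself that cannot be restored.
    sc.parallelize([]).map(lambda row: service.handle(row)).collect()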

The relevant conjure definitions:


      AlertResponse:
        union:
          failureResponse: AlertFailureResponse
          successResponse: AlertSuccessResponse

      AlertFailureResponse:
        fields:
          serviceException: ServiceException
      AlertSuccessResponse:
        fields:
          uuid: uuid
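
For context, conjure-python renames the camelCase conjure fields to snake_case properties on the generated classes, typically backed by underscore-prefixed attributes, which is the naming the missing _service_exception in the traceback points at. Roughly (this is our sketch, not the actual generated source):

    class AlertFailureResponse:
        def __init__(self, service_exception):
            self._service_exception = service_exception

        @property
        def service_exception(self):
            return self._service_exception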

Our __conjure_generator_version__ is 3.12.1.

We mitigated the issue by building our Conjure service inside a mapPartitions function, which is likely a better practice anyway.
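
Roughly, the mitigation looks like this (again with hypothetical names): the service is constructed once per partition on the executor, so the driver never has to pickle it.

    def process_partition(rows):
        service = build_alert_service()  # hypothetical factory; now runs on the executor
        for row in rows:
            yield service.handle(row)  # hypothetical per-row call

    results = rdd.mapPartitions(process_partition)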

What did you want to happen?

We are not entirely sure why these new type definitions are not serializable. I believe the fields are renamed in a way that pyspark's serialization cannot resolve, but that is conjecture at this point.
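
The "type object ... has no attribute" wording suggests an attribute lookup on the class itself at unpickling time. A contrived but self-contained illustration of the mechanism we suspect (our guess, not conjure-python's actual generated code): cloudpickle can serialize certain class-level attributes as a lookup that is replayed with getattr at load time, and that replay raises exactly this AttributeError once the attribute has been renamed or removed.

    import pickle

    class AlertFailureResponse:
        pass  # stands in for a class whose internal attribute was renamed away

    class ClassAttrRef:
        # Mimic a pickle payload that records a class attribute as a
        # (getattr, (cls, name)) lookup to be replayed at load time.
        def __reduce__(self):
            return (getattr, (AlertFailureResponse, "_service_exception"))

    payload = pickle.dumps(ClassAttrRef())
    pickle.loads(payload)
    # AttributeError: type object 'AlertFailureResponse' has no attribute '_service_exception'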
