PythonObjectSerializer raise 'utf-8' codec can't decode byte #639

@MASARIwot

Hello team,

When I tried to cache API responses, serialization sometimes failed with a UnicodeDecodeError.

Looking deeper, I found that the error comes from this line in PythonObjectSerializer: out.write_string(cPickle.dumps(obj, 0).decode("utf-8")). Pickle output is a byte string that is not guaranteed to be valid UTF-8, so the decode can fail.
Full code:

class PythonObjectSerializer(BaseSerializer):
    def read(self, inp):
        str = inp.read_string().encode()
        return cPickle.loads(str)

    def write(self, out, obj):
        out.write_string(cPickle.dumps(obj, 0).decode("utf-8"))

    def get_type_id(self):
        return PYTHON_TYPE_PICKLE

Issue example:

>>> import pickle
>>> pickle.dumps("\u00e4").decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
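The example above uses the default pickle protocol, while PythonObjectSerializer.write passes protocol 0. A minimal sketch, using only the standard library, suggests that protocol 0 hits the same problem for this input, and that a byte-preserving codec such as Latin-1 would round-trip the payload. The variable names here are purely illustrative and are not part of the client.

```python
import pickle

value = "\u00e4"  # "ä", a single non-ASCII character

# Protocol 0 is the protocol passed in PythonObjectSerializer.write.
payload = pickle.dumps(value, 0)

# Pickle output is bytes, not text; decoding it as UTF-8 may fail.
try:
    payload.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 maps every byte value 0-255 to one code point, so any byte
# string survives a decode/encode round trip unchanged.
text = payload.decode("latin-1")
restored = pickle.loads(text.encode("latin-1"))

print("utf-8 decode succeeded:", utf8_ok)
print("latin-1 round trip matches:", restored == value)
```

This only illustrates the encoding mismatch; whether changing the codec inside the client is acceptable for compatibility with already stored data is a separate question.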

As a workaround, I created a custom serializer:

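import json

from hazelcast.serialization.api import StreamSerializer


class HazelcastJsonSerializer(StreamSerializer):
    def read(self, inp):
        # Parse the JSON text back into a Python object.
        return json.loads(inp.read_string())

    def write(self, out, obj):
        # JSON is plain text, so it is safe to pass through write_string.
        out.write_string(json.dumps(obj))

    def get_type_id(self):
        …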
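The JSON workaround only handles JSON-compatible values. A possible variant that keeps pickle, sketched here under assumptions, encodes the pickled bytes as Base64 so that only ASCII text goes through write_string. Base64PickleSerializer and InMemoryStream are hypothetical names for illustration; the stream class is a stand-in for the client's input and output objects and only mimics the read_string and write_string calls used above. In real use the serializer would subclass StreamSerializer and also define get_type_id, which is left out here.

```python
import base64
import pickle


class Base64PickleSerializer:
    """Sketch only: in real use this would subclass StreamSerializer."""

    def read(self, inp):
        # Base64 text -> raw pickle bytes -> Python object.
        return pickle.loads(base64.b64decode(inp.read_string()))

    def write(self, out, obj):
        # Pickle to bytes, then wrap the bytes in ASCII-only Base64 text.
        raw = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
        out.write_string(base64.b64encode(raw).decode("ascii"))


class InMemoryStream:
    """Hypothetical stand-in for the client's input/output objects."""

    def __init__(self):
        self._value = None

    def write_string(self, value):
        self._value = value

    def read_string(self):
        return self._value


stream = InMemoryStream()
serializer = Base64PickleSerializer()

# Includes a non-ASCII string, raw bytes, and a tuple (not JSON-native).
original = {"name": "\u00e4", "blob": b"\x80\x03", "when": (2020, 1, 1)}
serializer.write(stream, original)
roundtrip = serializer.read(stream)

print(roundtrip == original)
```

The trade-off is size: Base64 grows the payload by roughly a third compared with the raw pickle bytes, and the usual caution about unpickling untrusted data still applies.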

Is there any better solution?

Python version: 3.6
