
Improve (de-)serialization performance for scalar arrays #515

@124C41p

One of my use cases for betterproto is calling SciPy functions from other languages that lack comparable math libraries. That is, I have a Python gRPC service that essentially receives large float arrays, does math on them, and sends large result arrays in return. Unfortunately, serializing and deserializing (numpy) arrays with betterproto does not seem to be as efficient as it could be.

When a float array is serialized to the protobuf format, the payload of the serialized message (everything after the short field header) is already in exactly the byte format that numpy can interpret as an array, as the following example shows:

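from dataclasses import dataclass
from typing import List

import betterproto
import numpy as np

@dataclass
class Array(betterproto.Message):
    values: List[float] = betterproto.double_field(1)

proto_array = Array(values=[1.23, 2.34, 3.45, 4.56])
serialized_array = bytes(proto_array)
# For this 4-element array the header (field tag + payload length) is 2 bytes;
# skipping it leaves the raw 64-bit doubles.
np_array = np.frombuffer(serialized_array[2:])

print(np_array)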
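The `[2:]` offset in the example holds only while the payload is shorter than 128 bytes, because the length prefix is a varint that grows with the payload. The following is a minimal, standard-library-only sketch of that wire layout, not betterproto's actual implementation; the helper names `encode_varint`, `read_varint`, `encode_packed_doubles`, and `packed_payload` are hypothetical and exist only for this illustration.

```python
import struct


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int):
    """Decode a varint starting at pos; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_packed_doubles(field_number: int, values) -> bytes:
    """Hand-encode a packed repeated double field (hypothetical helper)."""
    payload = struct.pack(f"<{len(values)}d", *values)  # little-endian float64
    tag = encode_varint((field_number << 3) | 2)  # wire type 2: length-delimited
    return tag + encode_varint(len(payload)) + payload


def packed_payload(message: bytes) -> memoryview:
    """Return a zero-copy view of the first field's payload."""
    tag, pos = read_varint(message, 0)
    if tag & 0x07 != 2:
        raise ValueError("first field is not length-delimited")
    length, pos = read_varint(message, pos)
    return memoryview(message)[pos:pos + length]


small = encode_packed_doubles(1, [1.23, 2.34, 3.45, 4.56])       # 32-byte payload
large = encode_packed_doubles(1, [float(i) for i in range(16)])  # 128-byte payload

small_header = len(small) - len(packed_payload(small))
large_header = len(large) - len(packed_payload(large))
print(small_header, large_header)
```

With the header length computed this way, the payload view could be handed to `np.frombuffer(..., dtype="<f8")` regardless of the array size, instead of relying on a fixed offset.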

However, when betterproto deserializes the protobuf message, it converts the array into a Python list right away. If I need a numpy array, I have no choice but to convert the list back (and do the reverse for serialization), which is computationally expensive.

I have two ideas for solving that issue:

Idea 1: Instead of storing protobuf scalar arrays as Python lists, you could store their byte representation inside a slim wrapper that behaves like a list but can also be converted into a numpy array at essentially no cost:

class Float64Array:
    __data: bytes

    def __len__(self):
        ...
    
    def __getitem__(self, i):
        ...

    def to_numpy_array(self):
        import numpy as np
        return np.frombuffer(self.__data)
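To make Idea 1 more concrete, here is one way the elided methods could be filled in. This is only a sketch under assumptions, not a proposal for betterproto's actual internals: it assumes the wrapper receives the raw little-endian payload bytes, it derives from `collections.abc.Sequence` to get iteration and membership tests for free, and it returns plain lists for slices to keep the code short.

```python
import struct
from collections.abc import Sequence


class Float64Array(Sequence):
    """Read-only, list-like view over packed little-endian float64 bytes."""

    _ITEM = struct.Struct("<d")  # one little-endian 64-bit float

    def __init__(self, data: bytes):
        if len(data) % self._ITEM.size:
            raise ValueError("payload length must be a multiple of 8 bytes")
        self._data = data

    def __len__(self):
        return len(self._data) // self._ITEM.size

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Assumption: slices are materialized as ordinary lists.
            return [self[j] for j in range(*i.indices(len(self)))]
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("Float64Array index out of range")
        # Decode a single element on demand; the rest stays as bytes.
        return self._ITEM.unpack_from(self._data, i * self._ITEM.size)[0]

    def to_numpy_array(self):
        # Imported lazily so numpy stays an optional dependency.
        import numpy as np
        # frombuffer shares memory with the bytes object; the result is read-only.
        return np.frombuffer(self._data, dtype="<f8")


arr = Float64Array(struct.pack("<3d", 1.5, 2.5, 3.5))
print(len(arr), arr[0], arr[-1], list(arr))
```

The trade-off in this sketch is that element access decodes on every call, so code that indexes heavily pays more than it would with a list, while code that only needs `to_numpy_array()` avoids the per-element conversion entirely.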

Idea 2: You could introduce an optional protoc compiler flag for letting the caller decide whether scalar arrays should be stored as Python lists or as numpy arrays.

Labels: enhancement (New feature or request)
