Description
One of my personal use cases for betterproto is to call SciPy functions from other languages, which do not have such nice math libraries. That is, I have a Python gRPC service which essentially receives large float arrays, does math with them, and sends large result arrays in return. Unfortunately, serializing and deserializing (numpy) arrays with betterproto does not seem to be as efficient as it could be.
When serializing a float array to the protobuf format, the serialized protobuf message happens to be exactly in the right byte format to be interpreted as a numpy array, as the following example shows:
from dataclasses import dataclass
from typing import List

import betterproto
import numpy as np


@dataclass
class Array(betterproto.Message):
    values: List[float] = betterproto.double_field(1)


proto_array = Array(values=[1.23, 2.34, 3.45, 4.56])
serialized_array = bytes(proto_array)

# Skip the 2-byte header (field tag + payload length) to get the raw doubles.
np_array = np.frombuffer(serialized_array[2:])
print(np_array)  # [1.23 2.34 3.45 4.56]
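The byte layout claimed above can be checked without betterproto or numpy. This is a stdlib-only sketch, not betterproto internals; the `0x0A` tag byte (field 1, wire type 2) and the single length byte are assumptions that only hold for a payload shorter than 128 bytes:

```python
import struct

# Build the same packed repeated-double message by hand: one tag byte,
# one varint length byte, then the raw little-endian IEEE-754 doubles.
values = [1.23, 2.34, 3.45, 4.56]
payload = struct.pack(f"<{len(values)}d", *values)
message = bytes([0x0A, len(payload)]) + payload

# Skipping the 2-byte header recovers the original values exactly,
# with no protobuf parsing at all.
decoded = list(struct.unpack(f"<{len(values)}d", message[2:]))
print(decoded)  # [1.23, 2.34, 3.45, 4.56]
```

For longer arrays the length prefix grows to a multi-byte varint, so the fixed `[2:]` offset in the example above is a simplification that happens to work for small messages.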
However, when deserializing the protobuf message with betterproto, the array is converted into a Python list right away. If I need a numpy array, I have no choice but to convert it back and forth (and likewise for serialization), which is computationally expensive.
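The difference between the two paths can be sketched with the stdlib `array` module standing in for numpy (all names here are illustrative, not betterproto API): reinterpreting the packed payload is one bulk copy, while going through a Python list touches every element twice.

```python
import struct
from array import array

values = [1.23, 2.34, 3.45, 4.56]
payload = struct.pack(f"<{len(values)}d", *values)

# Cheap: bulk reinterpretation of the packed bytes,
# analogous to np.frombuffer on the serialized payload.
bulk = array("d")
bulk.frombytes(payload)

# Expensive: element-wise round trip through a Python list,
# which is what parsing to a list and rebuilding an array forces.
as_list = list(bulk)
rebuilt = array("d", as_list)

assert bulk == rebuilt
```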
I have two ideas for solving that issue:
Idea 1: Instead of storing protobuf scalar arrays as Python lists, you could store their byte representation inside a slim wrapper which behaves like a list, but which can also be converted into a numpy array without effort:
import struct


class Float64Array:
    __data: bytes

    def __init__(self, data: bytes):
        self.__data = data

    def __len__(self):
        return len(self.__data) // 8

    def __getitem__(self, i):
        # Each element is an 8-byte little-endian IEEE-754 double.
        return struct.unpack_from("<d", self.__data, i * 8)[0]

    def to_numpy_array(self):
        import numpy as np
        return np.frombuffer(self.__data)
Idea 2: You could introduce an optional protoc compiler flag that lets the caller decide whether scalar arrays should be stored as Python lists or as numpy arrays.
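A minimal sketch of what such an opt-in could mean for the decoder, using the stdlib `array` module in place of numpy; `decode_doubles` and `use_array` are hypothetical names, not proposed betterproto API:

```python
import struct
from array import array


def decode_doubles(payload: bytes, use_array: bool = False):
    """Decode a packed repeated-double payload into the caller's container."""
    if use_array:
        out = array("d")  # contiguous buffer, zero-conversion to numpy
        out.frombytes(payload)
        return out
    # Default path: element-wise conversion into a Python list.
    count = len(payload) // 8
    return list(struct.unpack(f"<{count}d", payload))


payload = struct.pack("<4d", 1.23, 2.34, 3.45, 4.56)
assert decode_doubles(payload) == [1.23, 2.34, 3.45, 4.56]
assert list(decode_doubles(payload, use_array=True)) == [1.23, 2.34, 3.45, 4.56]
```

The flag would only change the container the generated code fills in; the wire format stays identical either way.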