Description
I was looking for a compact and efficient way to store UUIDs in Pandas DataFrames. The easy way is a column of uuid.UUID objects (56 bytes each). Since a UUID can be represented in 128 bits (16 bytes), it would be nice for a column to be a contiguous array of 16-byte values.
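To make the idea concrete, here is a minimal sketch of packing UUIDs into a contiguous NumPy array with 16 bytes per element. The helper names `uuids_to_array` and `array_to_uuids` and the two-field `hi`/`lo` layout are assumptions made for illustration; they are not taken from cyberpandas or Pandas.

```python
import uuid

import numpy as np

# Assumed layout: two big-endian 64-bit halves, 16 bytes per element.
UUID_DTYPE = np.dtype([("hi", ">u8"), ("lo", ">u8")])


def uuids_to_array(values):
    """Pack uuid.UUID objects into a contiguous array, 16 bytes each."""
    buf = b"".join(u.bytes for u in values)
    # frombuffer returns a read-only view of the bytes; copy to own the memory.
    return np.frombuffer(buf, dtype=UUID_DTYPE).copy()


def array_to_uuids(arr):
    """Rebuild uuid.UUID objects from the packed array."""
    raw = arr.tobytes()
    return [uuid.UUID(bytes=raw[i:i + 16]) for i in range(0, len(raw), 16)]


ids = [uuid.uuid4() for _ in range(3)]
packed = uuids_to_array(ids)
print(packed.dtype.itemsize, packed.nbytes)  # 16 bytes per UUID, 48 in total
```

Because the halves are stored big-endian, the raw bytes of each element match `uuid.UUID.bytes`, so the round trip is lossless.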
Since the cyberpandas IPv6 extension array also stores 128-bit values (IP addresses), I was thinking of reusing the work done here for IPv6 to support UUIDs.
A potential future step would be an extension type that supports any numpy "Sn" fixed-width field, with efficient implementations of the low-level Pandas array operations, plus a mechanism to easily register high-level representations and accessor methods (e.g. IPv6, UUID, and so forth).
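A rough sketch of what such a registration mechanism might look like, using plain NumPy only. Everything here (`CODECS`, `register_codec`, `pack`, `unpack`) is hypothetical and not an existing cyberpandas or Pandas API. The sketch uses the raw "Vn" (void) dtype rather than "Sn", because NumPy strips trailing NUL bytes when reading items from an "Sn" array, which would corrupt values that end in zero bytes.

```python
import ipaddress
import uuid

import numpy as np

# Hypothetical registry: name -> (width in bytes, encode, decode).
CODECS = {}


def register_codec(name, width, encode, decode):
    """Register converters between Python objects and fixed-width bytes."""
    CODECS[name] = (width, encode, decode)


def pack(name, values):
    """Store values contiguously, `width` bytes per element."""
    width, encode, _ = CODECS[name]
    chunks = [encode(v) for v in values]
    if any(len(c) != width for c in chunks):
        raise ValueError(f"every encoded value must be {width} bytes")
    return np.frombuffer(b"".join(chunks), dtype=f"V{width}").copy()


def unpack(name, arr):
    """Decode a packed array back into Python objects."""
    _, _, decode = CODECS[name]
    width = arr.dtype.itemsize
    raw = arr.tobytes()
    return [decode(raw[i:i + width]) for i in range(0, len(raw), width)]


register_codec("uuid", 16, lambda u: u.bytes, lambda b: uuid.UUID(bytes=b))
register_codec("ipv6", 16, lambda a: a.packed, lambda b: ipaddress.IPv6Address(b))
```

A real extension type would also need to implement the Pandas extension array interface (take, concatenation, missing-value handling, and so on); this only illustrates the storage and registration side.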
Tom, could you say how you see this project evolving? Is it essentially "done" as it is today, with IPv4 and IPv6, or do you see it as a place where similar extension arrays can be added, as semi-standard additions to the Pandas ecosystem?
Thanks
Stephen