Description
In some places JVector serializes data using the platform's native byte order.
For example, here it writes float vectors using the native byte order rather than BIG_ENDIAN, which is what DataInput specifies.
The main issue is that an index written this way cannot be used on a little-endian machine: the floats are stored in little-endian order, but most reading code assumes BIG_ENDIAN (which is also in line with DataInput/DataOutput).
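To make the mismatch concrete, here is a minimal standalone sketch (not JVector code; the class and method names are made up for illustration). It serializes a float with an explicit byte order and then reads it back through DataInputStream, which always decodes big-endian:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of JVector.
public class EndianMismatchDemo {
    /** Serializes one float using the given byte order. */
    public static byte[] writeFloat(float value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(Float.BYTES).order(order);
        buf.putFloat(value);
        return buf.array();
    }

    /** Reads one float the way DataInput does: always big-endian. */
    public static float readFloatAsDataInput(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readFloat();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        float original = 1.5f;
        float fromBig = readFloatAsDataInput(writeFloat(original, ByteOrder.BIG_ENDIAN));
        float fromLittle = readFloatAsDataInput(writeFloat(original, ByteOrder.LITTLE_ENDIAN));
        // The big-endian round trip preserves the value; the little-endian one does not.
        System.out.println("big-endian round trip:    " + fromBig);
        System.out.println("little-endian round trip: " + fromLittle);
    }
}
```

Writing with ByteOrder.nativeOrder() behaves like the LITTLE_ENDIAN case on x86 and most ARM machines, which is why the problem only shows up on those platforms.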
A secondary problem is that implementations of RandomAccessReader cannot detect or assume the correct byte ordering, even when the IndexWriter used to write the index does enforce one byte order.
This is not easy for consumers of the library to address at the moment because of how VectorTypeSupport is populated/referenced globally.
This issue could be fixed by forcing BIG_ENDIAN everywhere (in accordance with DataInput, at the least). However, it seems like a better solution would be to avoid relying on DataInput altogether and instead plumb IndexWriter through, so that consumers can choose to override the endianness for their application. Adding a writeFloats method would avoid the performance penalty of writing one float at a time, and would also let consumers extend the writer with their own endianness if necessary.
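As a rough sketch of what a bulk writeFloats with a configurable byte order could look like: the class below is hypothetical and is not JVector's IndexWriter API. It only assumes a plain OutputStream as the sink and uses standard java.nio calls to encode the whole array in one pass.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical writer for illustration only; names are assumptions, not JVector's API.
public class FloatBulkWriter {
    private final OutputStream out;
    private final ByteOrder order;

    public FloatBulkWriter(OutputStream out, ByteOrder order) {
        this.out = out;
        this.order = order;
    }

    /** Writes length floats starting at offset in a single bulk operation. */
    public void writeFloats(float[] values, int offset, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length * Float.BYTES).order(order);
        // The float view inherits the byte order set on the backing ByteBuffer.
        buf.asFloatBuffer().put(values, offset, length);
        out.write(buf.array());
    }

    /** Convenience helper: encodes an array to bytes in the given order. */
    public static byte[] encode(float[] values, ByteOrder order) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            new FloatBulkWriter(bytes, order).writeFloats(values, 0, values.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

With something along these lines, a consumer could pass ByteOrder.BIG_ENDIAN to stay compatible with DataInput-based readers, or choose a different order as long as the matching reader uses the same one.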