Use consistent byte-order when writing to disk #576

@abernardi597

In some places JVector uses platform-dependent byte-order to serialize data.
For example, here it writes float vectors using the native byte order rather than BIG_ENDIAN, which is the order DataInput specifies.

The main issue is that an index written with the native byte order on a little-endian machine cannot be read back correctly, because most reading code assumes BIG_ENDIAN (which is also in line with DataInput/DataOutput).
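To make the mismatch concrete, here is a minimal standalone sketch. It does not use any JVector classes: `ByteOrderMismatch`, `encode`, and `decodeAsDataInput` are hypothetical names made up for illustration. It serializes a few floats with an explicit byte order (standing in for a writer that uses the platform order) and then reads them back through `DataInputStream`, which always decodes big-endian.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration class; not part of JVector.
public class ByteOrderMismatch {
    /** Serializes floats with the given byte order, standing in for a writer that uses the platform order. */
    public static byte[] encode(float[] values, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Float.BYTES).order(order);
        for (float v : values) {
            buffer.putFloat(v);
        }
        return buffer.array();
    }

    /** Reads floats the way DataInput does: always big-endian, regardless of how they were written. */
    public static float[] decodeAsDataInput(byte[] bytes) {
        float[] out = new float[bytes.length / Float.BYTES];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            for (int i = 0; i < out.length; i++) {
                out[i] = in.readFloat();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] vector = {1.0f, 2.5f, -3.75f};
        // Written big-endian: the round trip preserves the values.
        System.out.println(Arrays.toString(decodeAsDataInput(encode(vector, ByteOrder.BIG_ENDIAN))));
        // Written little-endian (the native order on x86 and most ARM hosts): the values come back corrupted.
        System.out.println(Arrays.toString(decodeAsDataInput(encode(vector, ByteOrder.LITTLE_ENDIAN))));
    }
}
```

The second line of output shows values that do not match the input, which is the failure mode described above when the writer and reader disagree on byte order.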
A secondary problem is that implementations of RandomAccessReader cannot detect or assume the correct byte ordering, even when the IndexWriter used to write the index does enforce one byte order.

This is not easy for consumers of the library to address at the moment because of how VectorTypeSupport is populated/referenced globally.

This issue could be fixed by forcing BIG_ENDIAN everywhere (in accordance with DataInput, at the least). However, it seems like a better solution would be to avoid relying on DataInput altogether and instead plumb IndexWriter through, so that consumers can choose to override the endianness for their application. Adding a writeFloats method would avoid the performance penalty of writing one float at a time, and would also let consumers extend it with their own endianness if necessary.
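As a rough sketch of what a bulk, endianness-aware write could look like: the class below is hypothetical and is not JVector's IndexWriter; the names `EndianAwareWriter`, `writeFloats`, and `readFloats` and their signatures are assumptions made for illustration. It encodes a whole float array in one pass through a `ByteBuffer` with an explicit byte order, so the caller (or a subclass) decides the endianness rather than the platform.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch; names and signatures are assumptions, not JVector's API.
public class EndianAwareWriter {
    private final OutputStream out;
    private final ByteOrder order;

    public EndianAwareWriter(OutputStream out, ByteOrder order) {
        this.out = out;
        this.order = order;
    }

    /** Writes count floats starting at offset in a single bulk operation, using the configured byte order. */
    public void writeFloats(float[] values, int offset, int count) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(count * Float.BYTES).order(order);
        // The float view inherits the byte order set on the underlying buffer.
        buffer.asFloatBuffer().put(values, offset, count);
        out.write(buffer.array(), 0, buffer.capacity());
    }

    /** Matching bulk read; the reader must be told the same byte order the writer used. */
    public static float[] readFloats(byte[] bytes, ByteOrder order) {
        float[] result = new float[bytes.length / Float.BYTES];
        ByteBuffer.wrap(bytes).order(order).asFloatBuffer().get(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        float[] vector = {1.0f, 2.5f, -3.75f};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        new EndianAwareWriter(sink, ByteOrder.LITTLE_ENDIAN).writeFloats(vector, 0, vector.length);
        float[] roundTrip = readFloats(sink.toByteArray(), ByteOrder.LITTLE_ENDIAN);
        System.out.println(Arrays.equals(vector, roundTrip)); // prints "true"
    }
}
```

With `ByteOrder.BIG_ENDIAN` this produces the same bytes as repeated `DataOutput.writeFloat` calls, so a big-endian default would stay compatible with existing readers while still allowing an override.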
