Use consistent byte-order when writing to disk

In some places JVector serializes data using the platform-dependent (native) byte order.
For example, [here](https://github.com/datastax/jvector/blob/02fea879d63d4c84af83dca96c4dd0bd832007b4/jvector-base/src/main/java/io/github/jbellis/jvector/vector/ArrayVectorProvider.java#L62) it writes float vectors in the native byte order rather than `BIG_ENDIAN`, which is the order specified by `DataInput`.

The main issue is that an index cannot be used on a little-endian machine: the vectors are written in native (little-endian) order, while most reading code assumes `BIG_ENDIAN`, in line with `DataInput`/`DataOutput`.
A secondary problem is that implementations of `RandomAccessReader` can neither detect nor assume the correct byte order, even when the `IndexWriter` used to write the index does enforce a single byte order.
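
To make the mismatch concrete, here is a minimal, self-contained sketch. It does not use any JVector classes; `ByteOrderMismatch`, `writeFloats`, and `readWithDataInput` are hypothetical names for illustration only. It writes the same floats once as `BIG_ENDIAN` and once as `LITTLE_ENDIAN` (standing in for the native order on most machines), then reads both back through `DataInputStream`, which always decodes big-endian.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class ByteOrderMismatch {
    // Hypothetical stand-in for a writer: serialize floats in an explicit byte order.
    static byte[] writeFloats(float[] values, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Float.BYTES).order(order);
        for (float v : values) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    // Read floats the way DataInput does: always big-endian.
    static float[] readWithDataInput(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            float[] out = new float[bytes.length / Float.BYTES];
            for (int i = 0; i < out.length; i++) {
                out[i] = in.readFloat();
            }
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        float[] vector = {1.0f, 2.5f, -3.0f};

        // Written big-endian: the values round-trip through DataInput.
        float[] fromBigEndian = readWithDataInput(writeFloats(vector, ByteOrder.BIG_ENDIAN));
        // Written little-endian: DataInput decodes the bytes in the wrong order.
        float[] fromLittleEndian = readWithDataInput(writeFloats(vector, ByteOrder.LITTLE_ENDIAN));

        System.out.println("big-endian write:    " + Arrays.toString(fromBigEndian));
        System.out.println("little-endian write: " + Arrays.toString(fromLittleEndian));
    }
}
```

Only the big-endian write reproduces the original vector; the little-endian write decodes to unrelated values without raising any error, which is why the problem is easy to miss.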

This is not easy for consumers of the library to address at the moment because of how `VectorTypeSupport` is populated/referenced globally.

This issue could be fixed by forcing `BIG_ENDIAN` everywhere (in accordance with `DataInput`, at the least). A better solution, though, seems to be to avoid relying on `DataInput` altogether and instead plumb `IndexWriter` through, so that consumers can choose to override the endianness for their application. Adding a `writeFloats` method would avoid the performance penalty of writing one float at a time, and would also let consumers extend it with their own endianness if necessary.
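
As a rough sketch of what a bulk `writeFloats` could look like, the class below encodes a whole float array into one buffer with a caller-chosen byte order and writes it in a single call. This is not JVector's `IndexWriter` API: `BulkFloatWriter`, its constructor, the `writeFloats(float[], int, int)` signature, and the `encode` helper are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical writer, not part of JVector: the byte order is chosen once by the consumer.
public class BulkFloatWriter {
    private final OutputStream out;
    private final ByteOrder order;

    BulkFloatWriter(OutputStream out, ByteOrder order) {
        this.out = out;
        this.order = order;
    }

    // Encode `count` floats starting at `offset` and write them with one bulk call,
    // instead of issuing one write per float.
    void writeFloats(float[] values, int offset, int count) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(count * Float.BYTES).order(order);
        // The float view inherits the byte order set on the backing ByteBuffer.
        buf.asFloatBuffer().put(values, offset, count);
        out.write(buf.array(), 0, count * Float.BYTES);
    }

    // Convenience helper for the example: encode an array into a fresh byte[].
    static byte[] encode(float[] values, ByteOrder order) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new BulkFloatWriter(bytes, order).writeFloats(values, 0, values.length);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        float[] vector = {1.0f, 2.5f, -3.0f};
        byte[] big = encode(vector, ByteOrder.BIG_ENDIAN);
        byte[] little = encode(vector, ByteOrder.LITTLE_ENDIAN);
        System.out.println("bytes written (big-endian):    " + big.length);
        System.out.println("bytes written (little-endian): " + little.length);
    }
}
```

The point of the design is that the byte order becomes an explicit property of the writer rather than of the platform, so a matching reader can be configured with the same order regardless of where the index was built.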


Use consistent byte-order when writing to disk #576
