SOLR-17948: Support indexing primitive float[] values for DenseVectorField #3747
https://issues.apache.org/jira/browse/SOLR-17948
Description
Currently, DenseVectorParser rejects primitive float[]/double[] values in JavaBin requests, even though JavaBin encoders (e.g., SolrJ's codec) can emit primitive arrays. Other Solr loaders (JSON, CSV, XML) typically represent vector values as lists of boxed numbers when parsed, so accepting primitive float[]/double[] mainly benefits JavaBin use cases, where clients that can produce primitive arrays gain a more compact serialization path. With this change, DenseVectorParser accepts primitive arrays and treats them equivalently to List-based inputs.
These changes are based on investigation and testing by @noblepaul
Solution
I have extended DenseVectorParser to handle float[] and double[] inputs in addition to the existing List-based formats.
Reference: SearchScale/solr-javabin-generator#1 (example producer of primitive-array JavaBin).
This lets all Solr users send smaller JavaBin updates by avoiding boxed lists (typically a reduction of about 20%).
It also keeps behavior consistent with other loaders: vectors are parsed, indexed, and stored correctly whether they arrive as lists or primitive arrays.
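To illustrate the idea of treating primitive arrays and List-based inputs equivalently, here is a minimal sketch. It is not the actual DenseVectorParser code: the class name VectorInputSketch, the method toFloatArray, and the dimension check are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not Solr's DenseVectorParser. */
final class VectorInputSketch {

  /**
   * Normalizes a float[], double[], or List of Numbers into a float[]
   * and checks that it has the expected number of dimensions.
   */
  static float[] toFloatArray(Object value, int dimension) {
    float[] out;
    if (value instanceof float[]) {
      // Primitive float[]: copy so the caller's array is not shared.
      out = ((float[]) value).clone();
    } else if (value instanceof double[]) {
      // Primitive double[]: narrow each element to float.
      double[] src = (double[]) value;
      out = new float[src.length];
      for (int i = 0; i < src.length; i++) {
        out[i] = (float) src[i];
      }
    } else if (value instanceof List<?>) {
      // Boxed list: every element must be a Number.
      List<?> src = (List<?>) value;
      out = new float[src.size()];
      for (int i = 0; i < out.length; i++) {
        Object element = src.get(i);
        if (!(element instanceof Number)) {
          throw new IllegalArgumentException("non-numeric vector element at index " + i);
        }
        out[i] = ((Number) element).floatValue();
      }
    } else {
      String type = (value == null) ? "null" : value.getClass().getName();
      throw new IllegalArgumentException("unsupported vector input type: " + type);
    }
    if (out.length != dimension) {
      throw new IllegalArgumentException(
          "expected " + dimension + " dimensions but got " + out.length);
    }
    return out;
  }

  public static void main(String[] args) {
    float[] fromFloats = toFloatArray(new float[] {1f, 2f, 3f}, 3);
    float[] fromDoubles = toFloatArray(new double[] {1d, 2d, 3d}, 3);
    float[] fromList = toFloatArray(List.of(1, 2.0, 3f), 3);
    // All three input shapes should yield the same vector.
    System.out.println(
        Arrays.equals(fromFloats, fromDoubles) && Arrays.equals(fromDoubles, fromList));
  }
}
```

Normalizing every input shape to one internal representation up front means the downstream indexing path does not need to know how the vector arrived.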
Tests
I added a small helper method that serializes a SolrInputDocument into JavaBin format and feeds it through JavabinLoader, so the test (DenseVectorFieldTest.testIndexingViaJavaBin) can simulate a real client sending JavaBin data to Solr.
Manual test:
An end-to-end test writes JavaBin payloads using both a List and a primitive float[].
Both payloads are then indexed and searched to validate the index.
This is done with the SolrJ client.
Script used: https://gist.github.com/punAhuja/c77cc60e396ccf7aa5a55a92ba23ffc3
JavaBin sizes:
List : 63.1 MB (66188931 bytes)
float[] : 51.1 MB (53588931 bytes)
Savings : 12.0 MB (19.04% smaller)
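The reported savings can be checked with simple arithmetic, and a toy model suggests where a reduction of this size could come from. The sketch below is not the JavaBin codec: the tag constants, the header layout, and the 768-dimension vector are assumptions made for illustration. It assumes each boxed element costs one type-tag byte plus four payload bytes, while a primitive array pays for its header once.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Locale;

/** Toy size model for illustration only; tags and layout are hypothetical, not JavaBin's. */
final class VectorSizeSketch {
  static final int TAG_LIST = 1;        // hypothetical tag values
  static final int TAG_FLOAT = 2;
  static final int TAG_FLOAT_ARRAY = 3;

  /** List-style encoding: header, then a type tag in front of every element. */
  static int boxedListSize(float[] vector) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeByte(TAG_LIST);
      out.writeInt(vector.length);
      for (float f : vector) {
        out.writeByte(TAG_FLOAT);
        out.writeFloat(f);
      }
    }
    return bytes.size();
  }

  /** Array-style encoding: one header, then raw 4-byte floats. */
  static int primitiveArraySize(float[] vector) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeByte(TAG_FLOAT_ARRAY);
      out.writeInt(vector.length);
      for (float f : vector) {
        out.writeFloat(f);
      }
    }
    return bytes.size();
  }

  /** Percentage by which {@code after} is smaller than {@code before}. */
  static double savingsPercent(long before, long after) {
    return 100.0 * (before - after) / before;
  }

  public static void main(String[] args) throws IOException {
    float[] vector = new float[768]; // assumed dimension, not from the PR
    int boxed = boxedListSize(vector);
    int primitive = primitiveArraySize(vector);
    System.out.println(String.format(Locale.ROOT,
        "toy model: %d vs %d bytes (%.2f%% smaller)",
        boxed, primitive, savingsPercent(boxed, primitive)));
    // Byte counts reported in the manual test above.
    System.out.println(String.format(Locale.ROOT,
        "reported: %.2f%% smaller", savingsPercent(66188931L, 53588931L)));
  }
}
```

Under these assumptions the per-element overhead falls from five bytes to four, which puts the vector payload's savings just under 20%. The measured 19.04% is in the same range, with the rest of each document presumably accounting for the difference.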