punAhuja
Contributor

@punAhuja punAhuja commented Oct 9, 2025

https://issues.apache.org/jira/browse/SOLR-17948

Description

Currently, DenseVectorParser rejects primitive float[]/double[] values in JavaBin requests, even though JavaBin encoders (e.g., SolrJ's codec) can emit primitive arrays. The other Solr loaders (JSON, CSV, XML) typically produce lists (e.g., List<Float> or List<Double>) when they parse vector values, so accepting primitive float[]/double[] mainly benefits JavaBin, where clients that can produce primitive arrays get a more compact serialization path. With this change, DenseVectorParser accepts primitive arrays and treats them equivalently to List-based inputs, enabling more compact JavaBin updates for any client that emits primitive arrays.

These changes are based on investigation and testing by @noblepaul

Solution

I have extended DenseVectorParser to handle float[] and double[] inputs in addition to the existing List-based formats.
Reference: SearchScale/solr-javabin-generator#1 (example producer of primitive-array JavaBin).

This lets Solr users send smaller JavaBin updates by avoiding boxed lists (usually a ~20% reduction).

It also keeps behavior consistent with other loaders: vectors are parsed, indexed, and stored correctly regardless of whether they arrive as lists or primitive arrays.
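
To illustrate the idea of treating primitive arrays and lists equivalently, here is a minimal sketch. It is not the actual DenseVectorParser code: the class `VectorInputSketch`, the method `toFloatArray`, and the `dimension` parameter are hypothetical names chosen for this example, and the real parser may validate and convert differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT Solr's DenseVectorParser. It only illustrates
// normalizing float[], double[], and List-of-Number inputs to one float[].
final class VectorInputSketch {

  static float[] toFloatArray(Object input, int dimension) {
    float[] vector;
    if (input instanceof float[]) {
      // Primitive float array: copy so the caller's array is not shared.
      vector = ((float[]) input).clone();
    } else if (input instanceof double[]) {
      // Primitive double array: narrow each element to float.
      double[] doubles = (double[]) input;
      vector = new float[doubles.length];
      for (int i = 0; i < doubles.length; i++) {
        vector[i] = (float) doubles[i];
      }
    } else if (input instanceof List) {
      // Boxed list, as produced by list-based loaders.
      List<?> list = (List<?>) input;
      vector = new float[list.size()];
      for (int i = 0; i < vector.length; i++) {
        Object element = list.get(i);
        if (!(element instanceof Number)) {
          throw new IllegalArgumentException(
              "Non-numeric vector element at index " + i + ": " + element);
        }
        vector[i] = ((Number) element).floatValue();
      }
    } else {
      throw new IllegalArgumentException(
          "Unsupported vector input: "
              + (input == null ? "null" : input.getClass().getName()));
    }
    if (vector.length != dimension) {
      throw new IllegalArgumentException(
          "Expected " + dimension + " elements but got " + vector.length);
    }
    return vector;
  }

  public static void main(String[] args) {
    float[] fromFloats = toFloatArray(new float[] {1f, 2f, 3f}, 3);
    float[] fromDoubles = toFloatArray(new double[] {1d, 2d, 3d}, 3);
    float[] fromList = toFloatArray(Arrays.asList(1f, 2f, 3f), 3);
    // All three input shapes normalize to the same vector.
    System.out.println(
        Arrays.equals(fromFloats, fromDoubles) && Arrays.equals(fromFloats, fromList));
  }
}
```

Normalizing every accepted shape to a single `float[]` up front means that downstream indexing and storing code only has to handle one representation.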

Tests

I added a small helper method that serializes a SolrInputDocument into JavaBin format and feeds it through JavabinLoader, so the test (DenseVectorFieldTest.testIndexingViaJavaBin) can simulate a real client sending JavaBin data to Solr.

Manual test:
An end-to-end test writes JavaBin payloads with both List and primitive float[] vectors.
We then index both payloads and search against each of them to validate the index.
This is done with the SolrJ client.

Script used: https://gist.github.com/punAhuja/c77cc60e396ccf7aa5a55a92ba23ffc3

JavaBin sizes:
List : 63.1 MB (66188931 bytes)
float[] : 51.1 MB (53588931 bytes)
Savings : 12.0 MB (19.04% smaller)
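
The savings figure can be rechecked from the two byte counts reported above. The following is only a small arithmetic sketch; the class and method names are made up for this example.

```java
import java.util.Locale;

// Recomputes the reported savings from the two payload sizes listed above.
final class JavaBinSizeSavings {

  static long savedBytes(long listBytes, long arrayBytes) {
    return listBytes - arrayBytes;
  }

  static double savedPercent(long listBytes, long arrayBytes) {
    return 100.0 * (listBytes - arrayBytes) / listBytes;
  }

  public static void main(String[] args) {
    long listBytes = 66_188_931L;  // List payload size from the manual test
    long arrayBytes = 53_588_931L; // float[] payload size from the manual test
    System.out.printf(
        Locale.ROOT,
        "saved %d bytes (%.2f%%)%n",
        savedBytes(listBytes, arrayBytes),
        savedPercent(listBytes, arrayBytes));
    // prints: saved 12600000 bytes (19.04%)
  }
}
```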

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

Contributor

chatman commented Oct 9, 2025

@punAhuja There are existing tests where a SolrInputField uses a List for the vector field. With this change, the expectation would be that float[] should also work. Can you please modify the tests to make sure that works too?

}

@Test
public void indexing_floatPrimitiveArray_viaJavaBin_shouldIndexAndReturnStored()
Contributor

@chatman chatman Oct 9, 2025

Tests in Solr usually don't have the "_shouldIndexAndReturnStored" part in the method name. Also, camel casing is preferable for test names. I think for the sake of consistency, we should name this test "testIndexingFloatPrimitiveArrayViaJavaBin", or simply "testIndexingViaJavaBin", and make sure both float[] and List work (maybe through randomization). Also, can we fold both the double[] and float[] tests into a single test?
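
One way the randomization suggested here could look is sketched below. This is an assumption-laden illustration, not the test that was merged: `RandomVectorRepresentation` and `pick` are hypothetical names, and a plain `java.util.Random` stands in for the seeded random source that a Solr test would normally take from the test framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical test helper: returns the same vector as a float[], a double[],
// or a List<Float>, chosen at random, so one test can cover every input shape.
final class RandomVectorRepresentation {

  static Object pick(float[] vector, Random random) {
    int choice = random.nextInt(3);
    if (choice == 0) {
      return vector.clone();
    }
    if (choice == 1) {
      double[] doubles = new double[vector.length];
      for (int i = 0; i < vector.length; i++) {
        doubles[i] = vector[i];
      }
      return doubles;
    }
    List<Float> boxed = new ArrayList<>(vector.length);
    for (float value : vector) {
      boxed.add(value);
    }
    return boxed;
  }

  public static void main(String[] args) {
    Random random = new Random(42L); // fixed seed only to make the demo repeatable
    float[] vector = {1f, 2f, 3f, 4f};
    for (int i = 0; i < 5; i++) {
      System.out.println(pick(vector, random).getClass().getSimpleName());
    }
  }
}
```

A single test could then set the vector field to the value returned by `pick` and run the same indexing and stored-value assertions regardless of which representation was chosen.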

@chatman chatman merged commit 06a3b5e into apache:main Oct 16, 2025
2 of 3 checks passed
@chatman chatman deleted the puneet/SOLR-17948-javabin branch October 16, 2025 20:13
2 participants