Use native vectorization in Lucene#15508
Conversation
|
Sharing some JMH benchmark results from Graviton 2 and Graviton 3 with this PR
Graviton 2 Graviton 3 Apple M2 Pro Note : |
|
I do have some minor comments but the large one is: how do we handle native code for end-users? Do we ship multiple precompiled binaries? Do we ship none? The current build modifications works "for you" but to compile for a matrix of all the possibilities is a bit of a nightmare. |
dweiss
left a comment
There was a problem hiding this comment.
I'm not sure about this. The integration with native libs here is... tight. And to get a working version, you pretty much have to recompile from scratch on the target machine.
It would be more elegant to have it somehow wrapped in a service and a separate (optional) module. I'm not sure if it's possible.
Also - as I mentioned - some thought needs to be given to what the public artifacts are going to be - are binaries going to be shipped, for which cpus, how are they going to be compiled and what dev requirements this entails (installing cross-compilation environments is probably not going to fly with many).
build.gradle
Outdated
| plugins { | ||
| id "base" | ||
| id "lucene.root-project.setup" | ||
| id "c" |
There was a problem hiding this comment.
this can be moved to the project which actually uses the plugin (misc?).
lucene/core/build.gradle
Outdated
| test { | ||
| dependsOn ':lucene:misc:buildNative' | ||
| systemProperty( | ||
| "java.library.path", | ||
| project(":lucene:misc").layout.buildDirectory.get().asFile.absolutePath + "/libs/dotProduct/shared" | ||
| ) | ||
| } | ||
|
|
There was a problem hiding this comment.
All of these changes should be pulled into a single gradle java plugin that handles the configuration across all the involved projects. If native libs are not enabled, these changes shouldn't be applied at all. A single plugin will make it easier to see the set of changes applied globally; currently they're scattered around.
There was a problem hiding this comment.
Yeah, makes sense
|
For the record, any Java library that ships with native libs has similar integration problems. You can look at jansi or any other lib that has native components for inspiration (and to get an idea) how hairy it becomes. Here's jansi's docker-based cross-compilation build. |
|
Thanks @dweiss for taking a look. I completely agree with maintaining different native bindings in Lucene can get really challenging as you mentioned and I propose we don't ship don't ship any pre-built binaries (or to say at least in this PR today if someone feels otherwise). I got some cross environment binaries working but I agree we might not be able to support all permutations/combinations of the different environments in the best possible way. While I got some cross-platform binaries working for few selective environments, supporting all environment permutations will be really painful. I added the build configuration to test across different environments on my end. I'm more happy to remove the Gradle build configurations for binary generation. One important point here is that GCC's auto-vectorization of simple C code performs comparably to (and sometimes better than) hand-written NEON/SVE implementations(except some cases/runs for SVE). This allows us to keep things simple on Lucene's end and let users experiment. This is inline with what @rmuir proposed in this comment. So here is what I propose we can do and would look like for user :
Contract: This PR would enable the interface to interact with binaries for dot product computations but providing that binary is in user's court. Users need to provide the binary with the required signature/impl:
Let me know if this makes it clear for what it would look like from Lucene user's perspective. |
|
To expand on what exactly a Lucene user would look like on their end to test/enable this :
// Linux/Unix
gcc -shared -O3 -march=native -funroll-loops -o /home/simd/libdotProduct.so /home/apachelucene/lucene/misc/src/c/dotProduct.c
// macOS
gcc -shared -O3 -march=native -funroll-loops -o /home/simd/libdotProduct.dylib /home/apachelucene/lucene/misc/src/c/dotProduct.c
// Windows
gcc -shared -O3 -march=native -funroll-loops -o /home/simd/dotProduct.dll /home/apachelucene/lucene/misc/src/c/dotProduct.c// dotProduct.c
int32_t dot8s(int8_t vec1[], int8_t vec2[], int32_t limit) {
// dot product impl
}
./gradlew test // PASSES, since by default native dot product is not testes
./gradlew test -Ptest.native.dotProduct=false // PASSES, switched off
./gradlew test \
-Ptest.native.dotProduct=true \
-Ptests.jvmargs="-Djava.library.path=/home/simd" // PASSES
./gradlew test -Ptest.native.dotProduct=true // FAILS, needs library to be linked
java --enable-native-access=ALL-UNNAMED \
--enable-preview \
-Djava.library.path="/home/simd" \
-jar lucene/benchmark-jmh/build/benchmarks/lucene-benchmark-jmh-11.0.0-SNAPSHOT.jar \
regexp "binaryDotProductVector|dot8sNative" |
|
I get the performance improvement is nice but it'll be difficult to maintain and test properly the way it's currently implemented. Not to mention most folks out there use Lucene without the possibility to recompile from sources.... I don't mind adding a native implementation but maybe this should be integrated via more standard mechanism (like service lookup) and then everything would live in that optional "native" module? This would open up the possibility to compile such a module independently. Sorry for my lack of enthusiasm about this but I can tell it'll be a problem to maintain even by looking at it (properties, etc.). |
i'd use zig |
|
I looked again and I still think a nicer way to plug this native impl. into Lucene would be to make another implementation of VectorUtilSupport ("NativeVectorUtilSupport") and then use it instead of the default Panama implementation, if it is available. If you used service lookup for picking the implementation, you could select between them when classes are initialized. Then the native implementation could go into its own module and there'd be no need for hacks like system properties, internals opened for benchmarks, etc. I may be missing something of course, but I think it'd be a much easier way forward. What do you think? |
|
@dweiss I tried to make it more pluggable and added the I removed all the gradle related changes and not generate any binary at all for now. We can add support for more functions as we go but I think this is a good start?. There is no requirement to have a system property, though users could still pass Note : I kept the C code (as is in misc) even though we are not building since it could be used as a reference for users or provides a good starting point.
|
Very Interesting! I'd love to explore this path and would be good in general if we could have some reliable way to provide efficient binaries to users. Maybe we could pursue this as a followup and keep this change just focussed on adding support to use the provided binary (next we could try adding actual binary support)? |
|
It's not what I had in mind, sorry for being vague. I have a strong feeling that the entire thing should be implemented using standard ServiceLoader mechanisms. So the default singleton in VectorizationProvider.Holder.INSTANCE should be instantiated using a service and the logic of this method - should be moved to individual service implementations (verification if a particular implementation can be used in the current environment, along with the instantiation of that implementation). The "singleton lookup" should only load all service providers, filter out what cannot be used for whatever reason and then pick one of the remaining candidates in their preferred order (which can be controlled by a system property, for example lucene.vectorization-provider=*,panama,default would indicate the desired ordering among available implementations). So, by default we'd have the "default" (DefaultVectorizationProvider) fallback and "panama" (PanamaVectorizationProvider) but you could add another service implementation (native). The service would need to implement two methods - one to check if it can be used and the other to provide an instance of VectorizedProvider. It isn't a straightforward patch because a lot of the code is currently package private and intentionally hidden (and won't allow subclassing). But I have a gut feeling it's possible and it would be a lot more elegant in the long run. It's also related to PRs like this one - #15294 which would like to "know" which service implementation is being used. This could be the name (or class) of the service provider, for example. I'm unfortunately away this week and won't be able to contribute directly. I would hold this patch until the above avenue can be explored though (either turns out to indeed work out nicely or won't work, for some odd reason). |
|
This is a quick and dirty PoC that I quickly wrote to better show what I mean - https://github.com/apache/lucene/compare/main...dweiss:lucene:vector-prov-service?expand=1 it doesn't fully work (there are some clashes between service loader and the module system, plus a ton of cleanups to be made) but if you take a look at VectorizationProvider class it'll give you an idea what I think those losely-pluggable implementations should be loaded like. Now... it's just an idea - I'm not really heavily advocating to switch to it but I think it'd be better in the long term if we somehow decoupled those multiple implementations (and this includes potentially removing the need for multi-release jars, which are difficult to work with). |
|
Hi @shubhamvishu . I've tried to untangle this on my branch but I failed, sorry. My motivation was to make those alternative vector support implementations fairly independent but they're all so tightly coupled that I couldn't make any progress. Native delegates to panama, panama delegates to default... I'm not happy with the outcome of my refactoring and I suggest we roll back the changes you made (following my suggestions) to minimize the surface of this patch over current main. |
|
Thank you @dweiss, really appreciate you taking the time to test it out! So should we revert to the earlier approach (pre |
|
Yes, I think so. |
|
@dweiss I made those changes in the latest revision. Thank you! |
| @@ -178,9 +178,9 @@ static VectorizationProvider lookup(boolean testMode) { | |||
| } | |||
| } catch (NoSuchMethodException | IllegalAccessException e) { | |||
| throw new LinkageError( | |||
| "PanamaVectorizationProvider is missing correctly typed constructor", e); | |||
| "NativeVectorizationProvider is missing correctly typed constructor", e); | |||
| } catch (ClassNotFoundException cnfe) { | |||
| throw new LinkageError("PanamaVectorizationProvider is missing in Lucene JAR file", cnfe); | |||
| throw new LinkageError("NativeVectorizationProvider is missing in Lucene JAR file", cnfe); | |||
| } | |||
There was a problem hiding this comment.
This will always attempt to load the native library, won't it? Maybe this native vectorization provider should be an opt-in here (controlled by an explicit property?), followed by the panama provider that we already have?
This addition really should have a minimum surface impact over the current codebase because we don't have any test support for this native wrapper.
I think it'll confuse people if we start to emit "NativeVectorizationProvider" and they don't know what it is or where to get it.
If you make it an opt-in property, you could also remove test.native.dotProduct and use that property to enable native provider in tests and benchmarks.
There was a problem hiding this comment.
Makes sense, I'll make it opt-in here.
| if (int4DotProduct$MH != null) { | ||
| return int4DotProduct(MemorySegment.ofArray(a), MemorySegment.ofArray(b)); | ||
| } | ||
| return delegateVectorUtilSupport.int4DotProduct(a, b); |
There was a problem hiding this comment.
The same here. It adds another branch/ delegation layer... I'd be more comfortable with some code duplication (scorers) but either calling the "native" code or panama code - without all these conditionals.
There was a problem hiding this comment.
I agree we can have some scorers(dedicated to native; something like NativeDotProductScorer etc) but I wonder if the current way of branching is better for a scenario where a user do not need to override all the VectorUtil impls and it automatically fallback to Panama if there isn't a implemented function in the library. Not having this branching would mean user needs to implement all the functions from VectorUtil compulsorily(since there will be no fallback). Do you think we should mandate that or maybe what you are mentioning solves that problem also or you have some ideas?.
I kinda liked the fact that I was able to override just a specific dot product implementation out of many (which is apparently slow on AArch machine) but it might not be true for all?
Let me know what you think.
There was a problem hiding this comment.
If that's the case then sure - leave the branching - but only in those classes that are dedicated to this "native" vector util implementation. The bottom line for me is that if it isn't available (can't be loaded) there should be no change compared to the current code - it should flow directly to panama. It should also make it easier to untagle those implementations, should someone retry my efforts with better outcome. ;)
Let me know if it makes sense.
There was a problem hiding this comment.
I agree, I'll soon push a version that is least intrusive.
|
@dweiss I pushed a version that's much less intrusive and tried to simplify/refactor some bits as you suggested. Now we do not even initialize the Let me know your thoughts! |
|
Thank you @dweiss ! I'll wait for a few days to see if anyone has more feedback and then merge. |
|
Thank you. |
lucene/core/src/java25/org/apache/lucene/internal/vectorization/NativeVectorUtilSupport.java
Show resolved
Hide resolved
uschindler
left a comment
There was a problem hiding this comment.
Please update this to make it consistent. No class except the official entry points in the internal vectorization package should be public
lucene/core/src/java25/org/apache/lucene/internal/vectorization/NativeVectorUtilSupport.java
Show resolved
Hide resolved
...ne/core/src/java25/org/apache/lucene/internal/vectorization/NativeVectorizationProvider.java
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java
Show resolved
Hide resolved
|
Thanks @uschindler ! I opened #15866 to make the classes pkg-private and also fix the method @ExE-Boss pointed out above(we can eventually get rid of it with #15840 as a followup). Looking for your thoughts and review. |
Description
This is continuation of the work @goankur started in #13572. Quoting the comment below from the initial PR.
To make use of faster SIMD instructions,
PanamaVectorUtilSupport#{dotProduct|uint8DotProduct}will switch to native dot product implementation whenorg.apache.lucene.util.Constants#NATIVE_DOT_PRODUCT_ENABLEDistrue(i.e. system property-Dlucene.useNativeDotProduct=trueis passed).GCC version : >= 10