Panama vector accelerated optimized scalar quantization #127118
Conversation
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Hi @benwtrent, I've created a changelog YAML for you.
if (vector.length > 2 * FLOAT_SPECIES.length()) {
    FloatVector vecMeanVec = FloatVector.zero(FLOAT_SPECIES);
    FloatVector m2Vec = FloatVector.zero(FLOAT_SPECIES);
    FloatVector norm2Vec = FloatVector.zero(FLOAT_SPECIES);
    FloatVector minVec = FloatVector.broadcast(FLOAT_SPECIES, Float.MAX_VALUE);
    FloatVector maxVec = FloatVector.broadcast(FLOAT_SPECIES, -Float.MAX_VALUE);
    int count = 0;
    for (; i < FLOAT_SPECIES.loopBound(vector.length); i += FLOAT_SPECIES.length()) {
        ++count;
        FloatVector v = FloatVector.fromArray(FLOAT_SPECIES, vector, i);
        FloatVector c = FloatVector.fromArray(FLOAT_SPECIES, centroid, i);
        FloatVector centeredVec = v.sub(c);
        FloatVector deltaVec = centeredVec.sub(vecMeanVec);
        norm2Vec = fma(centeredVec, centeredVec, norm2Vec);
        vecMeanVec = vecMeanVec.add(deltaVec.div(count));
        FloatVector delta2Vec = centeredVec.sub(vecMeanVec);
        m2Vec = fma(deltaVec, delta2Vec, m2Vec);
        minVec = minVec.min(centeredVec);
        maxVec = maxVec.max(centeredVec);
        centeredVec.intoArray(centered, i);
    }
    min = minVec.reduceLanes(MIN);
    max = maxVec.reduceLanes(MAX);
    norm2 = norm2Vec.reduceLanes(ADD);
    vecMean = vecMeanVec.reduceLanes(ADD) / FLOAT_SPECIES.length();
    FloatVector d2Mean = vecMeanVec.sub(vecMean);
    m2Vec = fma(d2Mean, d2Mean, m2Vec);
    vectCount = count * FLOAT_SPECIES.length();
    vecVar = m2Vec.reduceLanes(ADD);
}
float tailMean = 0;
float tailM2 = 0;
int tailCount = 0;
// handle the tail
for (; i < vector.length; i++) {
    centered[i] = vector[i] - centroid[i];
    float delta = centered[i] - tailMean;
    ++tailCount;
    tailMean += delta / tailCount;
    float delta2 = centered[i] - tailMean;
    tailM2 = fma(delta, delta2, tailM2);
    min = Math.min(min, centered[i]);
    max = Math.max(max, centered[i]);
    norm2 = fma(centered[i], centered[i], norm2);
}
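For reference, the scalar tail loop above is a direct transcription of Welford's online update: after k elements of the centered vector x,

$$\mu_k = \mu_{k-1} + \frac{x_k - \mu_{k-1}}{k}, \qquad M_{2,k} = M_{2,k-1} + (x_k - \mu_{k-1})(x_k - \mu_k),$$

with the population variance recovered as M_{2,k} / k. The vectorized loop runs the same recurrence independently in each lane; whether the per-lane means and M2 accumulators are then combined correctly is exactly what the review comments below are checking.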
@tveasey could you take a look here? I think I did the variance calculation correctly, but I might have missed something.
The formulas look correct to me. Of course, checking vector lengths 10-100 against a slow calculation, and confirming the results agree to within a few epsilon, will prove there are no errors. Specifically, I would add testing of the vectorised stats calculation against the super simple versions, i.e. compute the mean, then compute the mean of squared residuals, etc., so there is no chance of errors. If you're very close to that on a bunch of random-length vectors, you're good. (This may be what the Lucene reference does, but if it is using an online calculation my inclination would be to simplify further so there is no chance of errors.)
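As a rough illustration of the two-pass reference check described above (not the test actually added in this PR), something like the sketch below could be compared against the vectorized stats for random-length vectors. The class name OSQStatsReferenceCheck and the commented-out vectorizedStats call are hypothetical stand-ins for whichever entry point the PR exposes:

import java.util.Random;

class OSQStatsReferenceCheck {
    // Plain two-pass statistics over the centered vector: mean first, then
    // mean of squared residuals, plus min/max and the squared norm.
    static float[] referenceStats(float[] vector, float[] centroid) {
        int n = vector.length;
        float[] centered = new float[n];
        float mean = 0, min = Float.MAX_VALUE, max = -Float.MAX_VALUE, norm2 = 0;
        for (int i = 0; i < n; i++) {
            centered[i] = vector[i] - centroid[i];
            mean += centered[i];
            min = Math.min(min, centered[i]);
            max = Math.max(max, centered[i]);
            norm2 += centered[i] * centered[i];
        }
        mean /= n;
        float var = 0;
        for (int i = 0; i < n; i++) {
            float d = centered[i] - mean;
            var += d * d;
        }
        var /= n;
        return new float[] { mean, var, min, max, norm2 };
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        // Random-length vectors in the 10-100 range, per the suggestion above.
        for (int len = 10; len <= 100; len++) {
            float[] vector = new float[len];
            float[] centroid = new float[len];
            for (int i = 0; i < len; i++) {
                vector[i] = random.nextFloat();
                centroid[i] = random.nextFloat();
            }
            float[] expected = referenceStats(vector, centroid);
            // float[] actual = vectorizedStats(vector, centroid); // hypothetical call into the PR's code
            // assert each component of `actual` matches `expected` to within a few epsilon
            System.out.println("len=" + len + " mean=" + expected[0] + " var=" + expected[1]);
        }
    }
}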
…nwtrent/elasticsearch into feature/panama-vector-accelerated-osq
ChrisHegarty
left a comment
This looks good to me.
I will run the benchmark on my Linux machine, but that can be done post merge as a follow up.
john-wagster
left a comment
lgtm
centerAndCalculateOSQStatsEuclidean and centerAndCalculateOSQStatsDp were a bit difficult to follow in places, but reading through each they make sense and I didn't see anything obviously wrong.
tveasey
left a comment
Statistics calculations look correct to me. I think you can make the tail handling a bit cleaner and a bit faster, but functionally it looks good to me.
float min = Float.MAX_VALUE;
float max = -Float.MAX_VALUE;
int i = 0;
int vectCount = 0;
nit for consistency:
- int vectCount = 0;
+ int vecCount = 0;
FloatVector d2Mean = vecMeanVec.sub(vecMean);
m2Vec = fma(d2Mean, d2Mean, m2Vec);
vectCount = count * FLOAT_SPECIES.length();
vecVar = m2Vec.reduceLanes(ADD);
My inclination is to add the tail handling directly on the reduced vector stats; it simplifies matters...

// Note i will be equal to vector.length if it is a multiple of FLOAT_SPECIES.length().
for (; i < vector.length; i++) {
    centered[i] = vector[i] - centroid[i];
    float delta = centered[i] - vecMean;
    ++vecCount;
    vecMean += delta / vecCount;
    float delta2 = centered[i] - vecMean;
    vecVar = fma(delta, delta2, vecVar);
    min = Math.min(min, centered[i]);
    max = Math.max(max, centered[i]);
    norm2 = fma(centered[i], centered[i], norm2);
}

and job done, so no need for extra steps to combine.
💔 Backport failed
You can use sqren/backport to manually backport by running:
💚 All backports created successfully
Questions? Please refer to the Backport tool documentation.
… (#127269)
* Panama vector accelerated optimized scalar quantization (#127118)
* Adds accelerates optimized scalar quantization with vectorized functions
* Adding benchmark
* Update docs/changelog/127118.yaml
* adjusting benchmark and delta (cherry picked from commit 059f91c)
* fixing compilation
* reverting unnecessary change
When scalar quantizing for bbq, optimizing the intervals can take time, especially at higher bit sizes.
While for bbq the impact will be marginal, it's still a frustrating bottleneck, especially at query time where the bit size is larger (e.g. 4 bits).
Here are some results from the new JMH benchmark, run on my laptop: the Panama vector implementation is 3-4x faster. It can likely be made even faster; my Panama vector skills aren't the absolute best.
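For readers unfamiliar with how such a comparison is measured, here is a minimal, hypothetical JMH sketch. It is not the benchmark added in this PR: the class name, parameters, and the workload (only centering plus min/max/norm2, not the full OSQ interval optimization) are simplifications chosen for illustration.

import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;
import org.openjdk.jmh.annotations.*;

import java.util.Random;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
@Fork(value = 1, jvmArgsAppend = "--add-modules=jdk.incubator.vector")
public class CenterStatsBenchmark {

    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    @Param({ "768", "1024" })
    int dims;

    float[] vector;
    float[] centroid;
    float[] centered;

    @Setup
    public void setup() {
        Random random = new Random(42);
        vector = new float[dims];
        centroid = new float[dims];
        centered = new float[dims];
        for (int i = 0; i < dims; i++) {
            vector[i] = random.nextFloat();
            centroid[i] = random.nextFloat();
        }
    }

    // Plain scalar baseline: center the vector and track min/max/norm2.
    @Benchmark
    public float scalar() {
        float min = Float.MAX_VALUE, max = -Float.MAX_VALUE, norm2 = 0;
        for (int i = 0; i < dims; i++) {
            centered[i] = vector[i] - centroid[i];
            min = Math.min(min, centered[i]);
            max = Math.max(max, centered[i]);
            norm2 = Math.fma(centered[i], centered[i], norm2);
        }
        return min + max + norm2;
    }

    // Panama Vector API version of the same work, with a scalar tail.
    @Benchmark
    public float panama() {
        FloatVector minVec = FloatVector.broadcast(SPECIES, Float.MAX_VALUE);
        FloatVector maxVec = FloatVector.broadcast(SPECIES, -Float.MAX_VALUE);
        FloatVector norm2Vec = FloatVector.zero(SPECIES);
        int i = 0;
        for (; i < SPECIES.loopBound(dims); i += SPECIES.length()) {
            FloatVector v = FloatVector.fromArray(SPECIES, vector, i);
            FloatVector c = FloatVector.fromArray(SPECIES, centroid, i);
            FloatVector centeredVec = v.sub(c);
            minVec = minVec.min(centeredVec);
            maxVec = maxVec.max(centeredVec);
            norm2Vec = centeredVec.fma(centeredVec, norm2Vec);
            centeredVec.intoArray(centered, i);
        }
        float min = minVec.reduceLanes(VectorOperators.MIN);
        float max = maxVec.reduceLanes(VectorOperators.MAX);
        float norm2 = norm2Vec.reduceLanes(VectorOperators.ADD);
        for (; i < dims; i++) { // handle the tail
            centered[i] = vector[i] - centroid[i];
            min = Math.min(min, centered[i]);
            max = Math.max(max, centered[i]);
            norm2 = Math.fma(centered[i], centered[i], norm2);
        }
        return min + max + norm2;
    }
}

The relative throughput reported by JMH for the two methods is the kind of ratio the speedup figure above refers to; absolute numbers depend on the CPU's vector width and the JDK in use.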