
Commit d48cf56

tyronecai and dweiss authored
improve BytesRefHash.sort performance by rearranging ids (#15772)
* improve BytesRefHash.sort performance

  Adjust the order of the entries in ids during compaction to improve data-access locality and reduce cache misses. This speeds up sort by 20% in million-term tests.

* Update changes. Added a new entry to CHANGES.txt to document the performance improvement in BytesRefHash.sort.
* add comment in compact
* Fix comment typo in compact method
* update comment

---------

Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
1 parent 6657172 commit d48cf56
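The commit message attributes the speedup to better data-access locality. As a rough, hypothetical illustration (not Lucene code; the class `PoolOrderDemo` and its simplified pool, where ASCII terms are stored back to back with no per-term header, are assumptions), the sketch below assumes terms are appended to a pool in insertion order, so a term's start offset grows with its id. It then counts how often visiting ids in a given order jumps backwards in the pool.

```java
// Hypothetical illustration of how id order affects pool read order; not Lucene code.
class PoolOrderDemo {

  // Start offset of each term, assuming ASCII terms appended back to back in
  // insertion order (id 0 first, id 1 next, ...) with no per-term header.
  static int[] bytesStart(String[] terms) {
    int[] starts = new int[terms.length];
    int offset = 0;
    for (int id = 0; id < terms.length; id++) {
      starts[id] = offset;
      offset += terms[id].length();
    }
    return starts;
  }

  // Counts how often visiting ids in the given order moves backwards in the
  // pool: a crude proxy for non-sequential memory access.
  static int backwardJumps(int[] order, int[] starts) {
    int jumps = 0;
    for (int k = 1; k < order.length; k++) {
      if (starts[order[k]] < starts[order[k - 1]]) {
        jumps++;
      }
    }
    return jumps;
  }

  public static void main(String[] args) {
    String[] terms = {"apple", "kiwi", "banana", "fig", "cherry"};
    int[] starts = bytesStart(terms);
    int[] hashSlotOrder = {2, 0, 3, 1, 4}; // arbitrary order, as a hash-table scan might yield
    int[] insertionOrder = {0, 1, 2, 3, 4}; // ids in the order they were added
    System.out.println(backwardJumps(hashSlotOrder, starts));
    System.out.println(backwardJumps(insertionOrder, starts));
  }
}
```

With these sample terms the arbitrary order jumps backwards twice, while insertion order never does. The count only models access order; it is not a measurement of cache misses.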


2 files changed: 7 additions, 10 deletions


lucene/CHANGES.txt

Lines changed: 2 additions & 0 deletions
@@ -235,6 +235,8 @@ Optimizations
 
 * GITHUB#15729: Lucene90DocValuesProducer is not prefetching any data for DocValueSkippers anymore (Alexander Reelsen)
 
+* GITHUB#15772: improve BytesRefHash.sort performance by rearranging ids#15772 (tyronecai)
+
 * GITHUB#15742: Optimize int4 dotProduct and squareDistance computations by replacing vector conversions with reinterpret casting + bit manipulation. (Trevor McCulloch, Kaival Parikh)
 
 Bug Fixes

lucene/core/src/java/org/apache/lucene/util/BytesRefHash.java

Lines changed: 5 additions & 10 deletions
@@ -178,18 +178,13 @@ public BytesRef get(int bytesID, BytesRef ref) {
    */
   public int[] compact() {
     assert bytesStart != null : "bytesStart is null - not initialized";
-    int upto = 0;
-    for (int i = 0; i < hashSize; i++) {
-      if (ids[i] != -1) {
-        ids[upto] = ids[i] & hashMask;
-        if (upto < i) {
-          ids[i] = -1;
-        }
-        upto++;
-      }
+
+    // id is the sequence number when bytes added to the pool
+    for (int i = 0; i < count; i++) {
+      ids[i] = i;
     }
+    Arrays.fill(ids, count, hashSize, -1);
 
-    assert upto == count;
     lastCount = count;
     return ids;
   }
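The new loop relies on the fact stated in its comment: each id is the sequence number assigned when its bytes were added to the pool, so after compaction the live ids are exactly 0 through count - 1 and no scan of the hash table is needed. The sketch below is a hypothetical toy model (the class `CompactDemo`, `toyTable`, and its made-up slot function are not Lucene code) that runs both versions of the loop on a 16-slot table holding five ids. Unlike the real method, `compactNew` allocates a fresh array instead of reusing `ids` in place.

```java
import java.util.Arrays;

// Hypothetical toy model of the compaction step in the diff; not Lucene code.
class CompactDemo {

  // Builds a 16-slot table holding ids 0..4 at scattered slots. The slot
  // function (id * 7 + 3) & 15 is made up and happens not to collide here.
  static int[] toyTable() {
    int[] table = new int[16];
    Arrays.fill(table, -1);
    for (int id = 0; id < 5; id++) {
      table[(id * 7 + 3) & 15] = id;
    }
    return table;
  }

  // Old loop: scan every slot and pack live ids to the front, keeping
  // hash-slot order. Works on a copy so the input table is left intact.
  static int[] compactOld(int[] table, int hashMask, int count) {
    int[] ids = table.clone();
    int upto = 0;
    for (int i = 0; i < ids.length; i++) {
      if (ids[i] != -1) {
        ids[upto] = ids[i] & hashMask;
        if (upto < i) {
          ids[i] = -1;
        }
        upto++;
      }
    }
    if (upto != count) {
      throw new IllegalStateException("expected " + count + " ids, found " + upto);
    }
    return ids;
  }

  // New loop: ids were assigned as 0, 1, ..., count - 1 in insertion order,
  // so write them directly and mark the remaining slots as empty.
  static int[] compactNew(int hashSize, int count) {
    int[] ids = new int[hashSize];
    for (int i = 0; i < count; i++) {
      ids[i] = i;
    }
    Arrays.fill(ids, count, hashSize, -1);
    return ids;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(compactOld(toyTable(), 15, 5)));
    System.out.println(Arrays.toString(compactNew(16, 5)));
  }
}
```

Both results hold the same five ids in the first `count` slots followed by -1. Only the order differs: the old loop yields hash-slot order (2, 0, 3, 1, 4 here), the new one yields insertion order.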
