Commit 9545daf
Modest memory savings in date_histogram>terms (#68592)
This saves 16 bytes of memory per bucket for some aggregations. Specifically, it kicks in when there is a parent bucket, we have a good estimate of its upper-bound cardinality, we have a good estimate of the per-bucket cardinality of this aggregation, and both of those upper bounds fit together into a single `long`. That sounds unlikely, but there is a fairly common case where we have it: a `date_histogram` followed by a `terms` aggregation powered by global ordinals. This is common enough that we already had at least two rally operations for it:

* `date-histo-string-terms-via-global-ords`
* `filtered-date-histo-string-terms-via-global-ords`

Running those rally tracks shows that the space savings yield a small but statistically significant performance bump. The 90th percentile service time drops by about 4% in the unfiltered case and 1% in the filtered case. That's not great, but it's good to know that saving 16 bytes doesn't slow us down.

```
| 50th percentile latency       | date-histo          | 3185.77 | 3028.65 | -157.118  | ms |
| 90th percentile latency       | date-histo          | 3237.07 | 3101.32 | -135.752  | ms |
| 100th percentile latency      | date-histo          | 3270.53 | 3178.7  |  -91.8319 | ms |
| 50th percentile service time  | date-histo          | 3181.55 | 3024.32 | -157.238  | ms |
| 90th percentile service time  | date-histo          | 3232.91 | 3097.67 | -135.238  | ms |
| 100th percentile service time | date-histo          | 3266.63 | 3175.08 |  -91.5494 | ms |
| 50th percentile latency       | filtered-date-histo | 1349.22 | 1331.94 |  -17.2717 | ms |
| 90th percentile latency       | filtered-date-histo | 1402.71 | 1383.7  |  -19.0131 | ms |
| 100th percentile latency      | filtered-date-histo | 1412.41 | 1397.7  |  -14.7139 | ms |
| 50th percentile service time  | filtered-date-histo | 1345.18 | 1326.2  |  -18.9806 | ms |
| 90th percentile service time  | filtered-date-histo | 1397.24 | 1378.14 |  -19.1031 | ms |
| 100th percentile service time | filtered-date-histo | 1406.69 | 1391.63 |  -15.0529 | ms |
```

The microbenchmarks for `LongKeyedBucketOrds`, the interface we're targeting, show a boost of about 13% on the method in the hot path. That is obviously not the entire hot path, given that the 13% savings translated into a 4% improvement over the whole agg. But it's something.

```
Benchmark                                            Mode  Cnt   Score   Error  Units
multiBucketMany                                      avgt    5  10.038 ± 0.009  ns/op
multiBucketManySmall                                 avgt    5   8.738 ± 0.029  ns/op
singleBucketIntoMulti                                avgt    5   7.701 ± 0.073  ns/op
singleBucketIntoSingleImmutableBimorphicInvocation   avgt    5   6.160 ± 0.029  ns/op
singleBucketIntoSingleImmutableMonmorphicInvocation  avgt    5   6.571 ± 0.043  ns/op
singleBucketIntoSingleMutableBimorphicInvocation     avgt    5   7.714 ± 0.010  ns/op
singleBucketIntoSingleMutableMonmorphicInvocation    avgt    5   7.459 ± 0.017  ns/op
```

While I was touching the JMH benchmarks for `LongKeyedBucketOrds` I took the opportunity to make the runs that collect from a single bucket more comparable to the ones that collect from many buckets. It only seemed fair.
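The core of the trick described above is packing the owning bucket's ordinal and the value into one `long` instead of storing them separately. A minimal standalone sketch of that idea, assuming a fixed number of low bits reserved for the value (the class and method names here are illustrative, not the actual `LongKeyedBucketOrds` API):

```java
// Illustrative sketch only: pack (owningBucketOrd, value) into one long
// when both upper bounds are known to fit in 64 bits combined.
public class PackedKeySketch {
    private final int valueBits; // low bits reserved for the value

    public PackedKeySketch(int valueBits) {
        this.valueBits = valueBits;
    }

    /** Pack the owning bucket ordinal and the value into a single long key. */
    public long pack(long owningBucketOrd, long value) {
        return (owningBucketOrd << valueBits) | value;
    }

    /** Recover the owning bucket ordinal from the packed key. */
    public long owningBucketOrd(long packed) {
        return packed >>> valueBits;
    }

    /** Recover the value from the packed key. */
    public long value(long packed) {
        return packed & ((1L << valueBits) - 1);
    }

    public static void main(String[] args) {
        PackedKeySketch keys = new PackedKeySketch(20); // values fit in 20 bits
        long packed = keys.pack(3, 12345);
        System.out.println(keys.owningBucketOrd(packed)); // 3
        System.out.println(keys.value(packed));           // 12345
    }
}
```

Storing one packed `long` per entry rather than a `long` pair is where a fixed per-bucket saving of the kind measured above can come from.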
1 parent 5f676ec commit 9545daf

File tree: 6 files changed (+391, −80 lines)

benchmarks/src/main/java/org/elasticsearch/benchmark/search/aggregations/bucket/terms/LongKeyedBucketOrdsBenchmark.java

Lines changed: 72 additions & 15 deletions
```diff
@@ -41,14 +41,18 @@ public class LongKeyedBucketOrdsBenchmark {
     /**
      * The number of distinct values to add to the buckets.
      */
-    private static final long DISTINCT_VALUES = 10;
+    private static final long DISTINCT_VALUES = 210;
     /**
      * The number of buckets to create in the {@link #multiBucket} case.
      * <p>
-     * If this is not relatively prime to {@link #DISTINCT_VALUES} then the
-     * values won't be scattered evenly across the buckets.
+     * If this is not relatively prime to {@link #DISTINCT_VALUES_IN_BUCKETS}
+     * then the values won't be scattered evenly across the buckets.
      */
     private static final long DISTINCT_BUCKETS = 21;
+    /**
+     * Number of distinct values to add to values within buckets.
+     */
+    private static final long DISTINCT_VALUES_IN_BUCKETS = 10;
 
     private final PageCacheRecycler recycler = new PageCacheRecycler(Settings.EMPTY);
     private final BigArrays bigArrays = new BigArrays(recycler, null, "REQUEST");
@@ -63,6 +67,7 @@ public class LongKeyedBucketOrdsBenchmark {
     public void forceLoadClasses(Blackhole bh) {
         bh.consume(LongKeyedBucketOrds.FromSingle.class);
         bh.consume(LongKeyedBucketOrds.FromMany.class);
+        bh.consume(LongKeyedBucketOrds.FromManySmall.class);
     }
 
     /**
@@ -75,6 +80,9 @@ public void singleBucketIntoSingleImmutableMonmorphicInvocation(Blackhole bh) {
             for (long i = 0; i < LIMIT; i++) {
                 ords.add(0, i % DISTINCT_VALUES);
             }
+            if (ords.size() != DISTINCT_VALUES) {
+                throw new IllegalArgumentException("Expected [" + DISTINCT_VALUES + "] but found [" + ords.size() + "]");
+            }
             bh.consume(ords);
         }
     }
@@ -83,11 +91,14 @@ public void singleBucketIntoSingleImmutableMonmorphicInvocation(Blackhole bh) {
      * Emulates the way that most aggregations use {@link LongKeyedBucketOrds}.
      */
     @Benchmark
-    public void singleBucketIntoSingleImmutableBimorphicInvocation(Blackhole bh) {
+    public void singleBucketIntoSingleImmutableMegamorphicInvocation(Blackhole bh) {
         try (LongKeyedBucketOrds ords = LongKeyedBucketOrds.build(bigArrays, CardinalityUpperBound.ONE)) {
             for (long i = 0; i < LIMIT; i++) {
                 ords.add(0, i % DISTINCT_VALUES);
             }
+            if (ords.size() != DISTINCT_VALUES) {
+                throw new IllegalArgumentException("Expected [" + DISTINCT_VALUES + "] but found [" + ords.size() + "]");
+            }
             bh.consume(ords);
         }
     }
@@ -106,6 +117,10 @@ public void singleBucketIntoSingleMutableMonmorphicInvocation(Blackhole bh) {
             }
             ords.add(0, i % DISTINCT_VALUES);
         }
+        if (ords.size() != DISTINCT_VALUES) {
+            ords.close();
+            throw new IllegalArgumentException("Expected [" + DISTINCT_VALUES + "] but found [" + ords.size() + "]");
+        }
         bh.consume(ords);
         ords.close();
     }
@@ -116,7 +131,7 @@ public void singleBucketIntoSingleMutableMonmorphicInvocation(Blackhole bh) {
      * {@link #singleBucketIntoSingleMutableMonmorphicInvocation monomorphic invocation}.
      */
     @Benchmark
-    public void singleBucketIntoSingleMutableBimorphicInvocation(Blackhole bh) {
+    public void singleBucketIntoSingleMutableMegamorphicInvocation(Blackhole bh) {
         LongKeyedBucketOrds ords = LongKeyedBucketOrds.build(bigArrays, CardinalityUpperBound.ONE);
         for (long i = 0; i < LIMIT; i++) {
             if (i % 100_000 == 0) {
@@ -125,7 +140,9 @@ public void singleBucketIntoSingleMutableBimorphicInvocation(Blackhole bh) {
                 ords = LongKeyedBucketOrds.build(bigArrays, CardinalityUpperBound.ONE);
             }
             ords.add(0, i % DISTINCT_VALUES);
-
+        }
+        if (ords.size() != DISTINCT_VALUES) {
+            throw new IllegalArgumentException("Expected [" + DISTINCT_VALUES + "] but found [" + ords.size() + "]");
         }
         bh.consume(ords);
         ords.close();
@@ -134,28 +151,68 @@ public void singleBucketIntoSingleMutableBimorphicInvocation(Blackhole bh) {
     /**
      * Emulates an aggregation that collects from a single bucket "by accident".
      * This can happen if an aggregation is under, say, a {@code terms}
-     * aggregation and there is only a single value for that term in the index.
+     * aggregation and there is only a single value for that term in the index
+     * but we can't tell that up front.
      */
     @Benchmark
     public void singleBucketIntoMulti(Blackhole bh) {
         try (LongKeyedBucketOrds ords = LongKeyedBucketOrds.build(bigArrays, CardinalityUpperBound.MANY)) {
-            for (long i = 0; i < LIMIT; i++) {
-                ords.add(0, i % DISTINCT_VALUES);
-            }
+            singleBucketIntoMultiSmall(ords);
             bh.consume(ords);
         }
     }
 
+    /**
+     * Emulates an aggregation that collects from a single bucket "by accident"
+     * and gets a "small" bucket ords. This can happen to a {@code terms} inside
+     * of another {@code terms} when the "inner" terms only even has a single
+     * bucket.
+     */
+    @Benchmark
+    public void singleBucketIntoMultiSmall(Blackhole bh) {
+        try (LongKeyedBucketOrds ords = new LongKeyedBucketOrds.FromManySmall(bigArrays, 60)) {
+            singleBucketIntoMultiSmall(ords);
+            bh.consume(ords);
+        }
+    }
+
+    private void singleBucketIntoMultiSmall(LongKeyedBucketOrds ords) {
+        for (long i = 0; i < LIMIT; i++) {
+            ords.add(0, i % DISTINCT_VALUES);
+        }
+        if (ords.size() != DISTINCT_VALUES) {
+            throw new IllegalArgumentException("Expected [" + DISTINCT_VALUES + "] but found [" + ords.size() + "]");
+        }
+    }
+
+    /**
+     * Emulates an aggregation that collects from many buckets with a known
+     * bounds on the values.
+     */
+    @Benchmark
+    public void multiBucketManySmall(Blackhole bh) {
+        try (LongKeyedBucketOrds ords = new LongKeyedBucketOrds.FromManySmall(bigArrays, 5)) {
+            multiBucket(bh, ords);
+        }
+    }
+
     /**
      * Emulates an aggregation that collects from many buckets.
     */
     @Benchmark
-    public void multiBucket(Blackhole bh) {
+    public void multiBucketMany(Blackhole bh) {
         try (LongKeyedBucketOrds ords = LongKeyedBucketOrds.build(bigArrays, CardinalityUpperBound.MANY)) {
-            for (long i = 0; i < LIMIT; i++) {
-                ords.add(i % DISTINCT_BUCKETS, i % DISTINCT_VALUES);
-            }
-            bh.consume(ords);
+            multiBucket(bh, ords);
         }
     }
+
+    private void multiBucket(Blackhole bh, LongKeyedBucketOrds ords) {
+        for (long i = 0; i < LIMIT; i++) {
+            ords.add(i % DISTINCT_BUCKETS, i % DISTINCT_VALUES_IN_BUCKETS);
+        }
+        if (ords.size() != DISTINCT_VALUES) {
+            throw new IllegalArgumentException("Expected [" + DISTINCT_VALUES + "] but found [" + ords.size() + "]");
+        }
+        bh.consume(ords);
+    }
 }
```
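The javadoc in the diff above insists that `DISTINCT_BUCKETS` be relatively prime to `DISTINCT_VALUES_IN_BUCKETS`. A standalone sketch of why that scatters values evenly: since 21 and 10 are coprime, stepping `i` and taking `(i % 21, i % 10)` cycles through all 21 × 10 = 210 pairs, which is exactly the `DISTINCT_VALUES = 210` the benchmark checks assert on. This demo is not part of the benchmark itself:

```java
import java.util.HashSet;
import java.util.Set;

// Demonstrates that coprime moduli visit every (bucket, value) pair,
// so every bucket ends up holding every distinct value.
public class CoprimeScatterDemo {
    public static void main(String[] args) {
        long buckets = 21;          // DISTINCT_BUCKETS
        long valuesInBuckets = 10;  // DISTINCT_VALUES_IN_BUCKETS
        Set<Long> pairs = new HashSet<>();
        for (long i = 0; i < 1_000_000; i++) {
            // Encode the (bucket, value) pair as a single number.
            pairs.add((i % buckets) * valuesInBuckets + (i % valuesInBuckets));
        }
        System.out.println(pairs.size()); // 210 distinct pairs
    }
}
```

If the two constants shared a factor, only a fraction of the pairs would ever be generated and the per-bucket ord counts would be skewed.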

server/src/internalClusterTest/java/org/elasticsearch/search/profile/aggregation/AggregationProfilerIT.java

Lines changed: 1 addition & 1 deletion
```diff
@@ -219,7 +219,7 @@ public void testMultiLevelProfile() {
     }
 
     private void assertRemapTermsDebugInfo(ProfileResult termsAggResult) {
-        assertThat(termsAggResult.getDebugInfo(), hasEntry(COLLECTION_STRAT, "remap"));
+        assertThat(termsAggResult.getDebugInfo(), hasEntry(COLLECTION_STRAT, "remap using many bucket ords"));
         assertThat(termsAggResult.getDebugInfo(), hasEntry(RESULT_STRAT, "terms"));
         assertThat(termsAggResult.getDebugInfo(), hasEntry(HAS_FILTER, false));
         // TODO we only index single valued docs but the ordinals ends up with multi valued sometimes
```

server/src/main/java/org/elasticsearch/search/aggregations/bucket/terms/GlobalOrdinalsStringTermsAggregator.java

Lines changed: 2 additions & 2 deletions
```diff
@@ -464,12 +464,12 @@ private class RemapGlobalOrds extends CollectionStrategy {
         private final LongKeyedBucketOrds bucketOrds;
 
         private RemapGlobalOrds(CardinalityUpperBound cardinality) {
-            bucketOrds = LongKeyedBucketOrds.build(bigArrays(), cardinality);
+            bucketOrds = LongKeyedBucketOrds.buildForValueRange(bigArrays(), cardinality, 0, valueCount - 1);
        }
 
         @Override
         String describe() {
-            return "remap";
+            return "remap using " + bucketOrds.decribe();
         }
 
         @Override
```