-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Fix CPU stall in geotile_grid aggregation (#20413) #20461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -31,6 +31,8 @@ | |
|
|
||
| package org.opensearch.geo.search.aggregations.bucket.geogrid; | ||
|
|
||
| import org.apache.logging.log4j.LogManager; | ||
| import org.apache.logging.log4j.Logger; | ||
| import org.apache.lucene.index.LeafReaderContext; | ||
| import org.apache.lucene.index.SortedNumericDocValues; | ||
| import org.apache.lucene.search.ScoreMode; | ||
|
|
@@ -57,7 +59,9 @@ | |
| * | ||
| * @opensearch.api | ||
| */ | ||
|
|
||
| public abstract class GeoGridAggregator<T extends BaseGeoGrid> extends BucketsAggregator { | ||
| private static final Logger logger = LogManager.getLogger(GeoGridAggregator.class); | ||
|
|
||
| protected final int requiredSize; | ||
| protected final int shardSize; | ||
|
|
@@ -94,13 +98,21 @@ public ScoreMode scoreMode() { | |
| public LeafBucketCollector getLeafCollector(LeafReaderContext ctx, final LeafBucketCollector sub) throws IOException { | ||
| SortedNumericDocValues values = valuesSource.longValues(ctx); | ||
| return new LeafBucketCollectorBase(sub, null) { | ||
| final int MAX_TILES_PER_DOCUMENT = 10000; | ||
|
|
||
| @Override | ||
| public void collect(int doc, long owningBucketOrd) throws IOException { | ||
| if (values.advanceExact(doc)) { | ||
| final int valuesCount = values.docValueCount(); | ||
|
|
||
| if (valuesCount > MAX_TILES_PER_DOCUMENT) { | ||
| // Log a warning so the user knows why data is missing | ||
| logger.warn("Skipping doc [{}] in aggregation [{}] due to excessive tiles: [{}]", doc, name, valuesCount); | ||
| return; | ||
| } | ||
|
Comment on lines
108
to
112
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Silent data omission may mislead users with incomplete results. Skipping documents without any indication to the user means aggregation results will be incomplete without any warning. Users relying on these results won't know data is missing, which could lead to incorrect conclusions. Consider:
At minimum, the commented-out logging should be enabled rather than left as dead code. Proposed fix: Enable logging if (valuesCount > MAX_TILES_PER_DOCUMENT) {
- // Log a warning so the user knows why data is missing
- // logger.warn("Skipping doc [{}] in aggregation [{}] due to excessive tiles: [{}]", doc, name, valuesCount);
+ logger.warn("Skipping doc [{}] in aggregation [{}] due to excessive tiles: [{}]", doc, name(), valuesCount);
return;
}Note: You'll need to add a logger field at the class level: private static final Logger logger = LogManager.getLogger(GeoGridAggregator.class);🤖 Prompt for AI Agents |
||
| long previous = Long.MAX_VALUE; | ||
| for (int i = 0; i < valuesCount; ++i) { | ||
|
|
||
| final long val = values.nextValue(); | ||
| if (previous != val || i == 0) { | ||
| long bucketOrdinal = bucketOrds.add(owningBucketOrd, val); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A user is unlikely to see the server-side logs. Instead they will see their query complete with no indication that they are getting incomplete/incorrect results.