@@ -31,6 +31,8 @@

package org.opensearch.geo.search.aggregations.bucket.geogrid;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SortedNumericDocValues;
import org.apache.lucene.search.ScoreMode;
@@ -57,7 +59,9 @@
*
* @opensearch.api
*/

public abstract class GeoGridAggregator<T extends BaseGeoGrid> extends BucketsAggregator {
private static final Logger logger = LogManager.getLogger(GeoGridAggregator.class);

protected final int requiredSize;
protected final int shardSize;
@@ -94,13 +98,21 @@ public ScoreMode scoreMode() {
public LeafBucketCollector getLeafCollector(LeafReaderContext ctx, final LeafBucketCollector sub) throws IOException {
    SortedNumericDocValues values = valuesSource.longValues(ctx);
    return new LeafBucketCollectorBase(sub, null) {
        final int MAX_TILES_PER_DOCUMENT = 10000;

        @Override
        public void collect(int doc, long owningBucketOrd) throws IOException {
            if (values.advanceExact(doc)) {
                final int valuesCount = values.docValueCount();

                if (valuesCount > MAX_TILES_PER_DOCUMENT) {
                    // Log a warning so the user knows why data is missing
                    logger.warn("Skipping doc [{}] in aggregation [{}] due to excessive tiles: [{}]", doc, name, valuesCount);
Member:
A user is unlikely to see the server-side logs. Instead they will see their query complete with no indication that they are getting incomplete/incorrect results.

                    return;
                }
Comment on lines 108 to 112
Contributor:

⚠️ Potential issue | 🟠 Major

Silent data omission may mislead users with incomplete results.

Skipping documents without any indication to the user means aggregation results will be incomplete without any warning. Users relying on these results won't know data is missing, which could lead to incorrect conclusions.

Consider:

  1. Enable the logging (uncomment line 106) so operators can at least see warnings in logs
  2. Add a response header or metadata to indicate documents were skipped
  3. Track a metric for observability (e.g., count of skipped documents)

At minimum, the commented-out logging should be enabled rather than left as dead code.

Proposed fix: Enable logging
             if (valuesCount > MAX_TILES_PER_DOCUMENT) {
-                // Log a warning so the user knows why data is missing
-                // logger.warn("Skipping doc [{}] in aggregation [{}] due to excessive tiles: [{}]", doc, name, valuesCount);
+                logger.warn("Skipping doc [{}] in aggregation [{}] due to excessive tiles: [{}]", doc, name(), valuesCount);
                 return;
             }

Note: You'll need to add a logger field at the class level:

private static final Logger logger = LogManager.getLogger(GeoGridAggregator.class);
🤖 Prompt for AI Agents
In `@modules/geo/src/main/java/org/opensearch/geo/search/aggregations/bucket/geogrid/GeoGridAggregator.java` around lines 104-108, the code currently silently returns in GeoGridAggregator when a document exceeds MAX_TILES_PER_DOCUMENT. Restore the commented logger.warn call and add a class-level logger (e.g., private static final Logger logger = LogManager.getLogger(GeoGridAggregator.class)) so skipped documents are recorded (include doc, name, and valuesCount in the message), and additionally increment a skipped-documents counter field (e.g., skippedDocs or skippedTilesCount) inside the same branch; finally, expose that counter in the aggregation metadata or response header from the aggregator result creation path so clients can detect that documents were omitted.
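The counter approach the bot suggests can be sketched in isolation. The class, method, and field names below (`SkipTracking`, `collectAll`, `skippedDocs`) are hypothetical illustrations, not part of the PR; only `MAX_TILES_PER_DOCUMENT` mirrors the actual change:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: instead of silently dropping oversized documents,
// count the skips and return the count so callers can surface it
// (e.g., in aggregation metadata), rather than relying on server-side logs.
public class SkipTracking {
    static final int MAX_TILES_PER_DOCUMENT = 10_000;

    // docTileValues[doc] stands in for a document's tile doc-values;
    // returns the number of skipped documents.
    static int collectAll(long[][] docTileValues, List<Integer> collectedDocs) {
        int skippedDocs = 0;
        for (int doc = 0; doc < docTileValues.length; doc++) {
            int valuesCount = docTileValues[doc].length;
            if (valuesCount > MAX_TILES_PER_DOCUMENT) {
                skippedDocs++; // tracked, not silently dropped
                continue;
            }
            collectedDocs.add(doc); // bucket collection would happen here
        }
        return skippedDocs;
    }

    public static void main(String[] args) {
        long[][] docs = { new long[5], new long[10_001], new long[3] };
        List<Integer> collected = new ArrayList<>();
        int skipped = collectAll(docs, collected);
        System.out.println("collected=" + collected + " skipped=" + skipped);
        // prints: collected=[0, 2] skipped=1
    }
}
```

Returning the count (rather than only logging) is what lets the result-creation path attach it to the response so clients can detect the omission.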

                long previous = Long.MAX_VALUE;
                for (int i = 0; i < valuesCount; ++i) {

                    final long val = values.nextValue();
                    if (previous != val || i == 0) {
                        long bucketOrdinal = bucketOrds.add(owningBucketOrd, val);