craigtaverner
Contributor

@craigtaverner craigtaverner commented Jun 17, 2025

The PR at #125143 added support for ST_GEOHASH, ST_GEOTILE and ST_GEOHEX. However, since it used long as the internal type for the grid id, many additional functions were needed for converting the long to and from keyword, as well as for generating geo_shape cell bounds for map display. Each of these involved many more files (for the functions, their docs and the generated evaluators). With inspiration from PostGIS, we decided to take a different direction and instead use a new internal type for each grid:

  • geohash for the ST_GEOHASH function, created from literal using either TO_GEOHASH(hash) or hash::geohash
  • geotile for the ST_GEOTILE function, created from literal using either TO_GEOTILE(tile) or tile::geotile
  • geohex for the ST_GEOHEX function, created from literal using either TO_GEOHEX(h3) or h3::geohex

This also leads to much stricter type checking, as we can no longer use the grid id as a plain long in all functions that accept longs, or inadvertently use a geohash in a geotile function. However, adding new types involves a lot of boilerplate, especially considering the large number of functions that operate on all types and need to be informed of the existence of these three new types.
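
To illustrate the type-checking point, here is a minimal Java sketch, not the actual ES|QL implementation. The classes `GeohashId` and `GeotileId` and the two `describeTile` methods are hypothetical names invented for this example. It only shows how wrapping the raw long in distinct types turns a mix-up into an error the compiler catches; in ES|QL the equivalent check would happen when the query is verified.

```java
// Hypothetical sketch: none of these names come from the Elasticsearch code base.
final class GridTypeSketch {
    /** A geohash cell id; wraps the raw long so it cannot be mistaken for a plain number. */
    static final class GeohashId {
        final long value;
        GeohashId(long value) { this.value = value; }
    }

    /** A geotile cell id, deliberately unrelated to GeohashId in the type hierarchy. */
    static final class GeotileId {
        final long value;
        GeotileId(long value) { this.value = value; }
    }

    /** Accepts only geotile ids; passing a GeohashId or a bare long does not compile. */
    static String describeTile(GeotileId tile) {
        return "geotile:" + tile.value;
    }

    /** With raw longs, nothing stops a geohash id from being passed as a tile id. */
    static String describeTileUntyped(long tileId) {
        return "geotile:" + tileId;
    }

    public static void main(String[] args) {
        GeohashId hash = new GeohashId(42L);
        GeotileId tile = new GeotileId(42L);
        System.out.println(describeTile(tile));
        // describeTile(hash);  // would not compile: GeohashId is not a GeotileId
        System.out.println(describeTileUntyped(hash.value)); // compiles, but is a silent mix-up
    }
}
```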

One of the main goals of this work was to also support the concept of a geogrid search. This means "find all documents that intersect a grid cell", similar to the Query DSL version. This is achieved by enabling the use of these three new types in the ST_INTERSECTS function, but that will be done in a follow-up PR.
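
A rough sketch of what a grid search has to decide for a single point, assuming the standard Web Mercator tiling formula and the `zoom/x/y` tile key format. `GeotileSearchSketch`, `tileKey` and `intersects` are hypothetical names, and this is not the Elasticsearch implementation. The idea is to compute the cell that contains the point at the requested cell's own precision and compare the two.

```java
// Hypothetical sketch of a point-in-geotile check; not taken from Elasticsearch.
final class GeotileSearchSketch {
    /** Returns the "zoom/x/y" key of the Web Mercator tile containing the point. */
    static String tileKey(double lat, double lon, int zoom) {
        int tiles = 1 << zoom; // number of tiles per axis at this zoom level
        double latRad = Math.toRadians(lat);
        int x = (int) Math.floor((lon + 180.0) / 360.0 * tiles);
        int y = (int) Math.floor(
            (1.0 - Math.log(Math.tan(latRad) + 1.0 / Math.cos(latRad)) / Math.PI) / 2.0 * tiles);
        // Clamp so that lon == 180 or extreme latitudes still land on a valid tile.
        x = Math.min(Math.max(x, 0), tiles - 1);
        y = Math.min(Math.max(y, 0), tiles - 1);
        return zoom + "/" + x + "/" + y;
    }

    /** A point intersects a tile when its own tile at the same zoom has the same key. */
    static boolean intersects(double lat, double lon, String tile) {
        int zoom = Integer.parseInt(tile.substring(0, tile.indexOf('/')));
        return tileKey(lat, lon, zoom).equals(tile);
    }

    public static void main(String[] args) {
        System.out.println(tileKey(0.1, 0.1, 3));
        System.out.println(intersects(0.1, 0.1, "3/4/3"));
        System.out.println(intersects(-0.1, 0.1, "3/4/3"));
    }
}
```

Under these assumptions a point just north-east of the equator and prime meridian, such as (0.1, 0.1), falls in tile 3/4/3, while a point just south of the equator does not.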

Checklist

  • Move all new functions to SNAPSHOT
  • Move spatial search capabilities into a separate PR (ST_INTERSECTS and related) - Support geohash, geotile and geohex grid types in ST_INTERSECTS #133546
  • Make PR to move out of SNAPSHOT and link docs
  • Add issues for missing pieces:
    • Support for shapes
    • Lucene pushdown
    • Missing evaluators (literals vs fields, separate evaluators for grid types, etc.)

@craigtaverner craigtaverner added >enhancement :Analytics/Geo Indexing, search aggregations of geo points and shapes Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v9.1.0 labels Jun 17, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @craigtaverner, I've created a changelog YAML for you.

long geoGridId = getGridId(point, gridId, gridType);
return gridId == geoGridId;
} else {
throw new IllegalArgumentException(
Contributor

This should be easy to implement, though. It is OK to do it in a follow-up.

Contributor

Moreover, I think we should make it generic; there is no point in distinguishing a point from any other geometry, and we already have code that does this.

Contributor

@iverase iverase left a comment

I like this approach a lot, as a grid cell cannot really be represented by a geometry. I am +1 on this approach as long as the ESQL folks are good with it.

Contributor

@alex-spies alex-spies left a comment

We need to carefully consider how to deal with the breaking change that this proposes, because we remove some functions altogether. Update: Breaking change itself was performed in #129839, this PR isn't breaking anymore but adds functionality.

Additionally, the new functions added by this aren't snapshot-only, either, and will be made available immediately. It's better to have them bake a little before un-snapshotting them in a dedicated PR.

Finally, this adds multiple data types and wires them immediately - I much prefer we do that in a dedicated PR, this is a big change and can easily introduce bugs.

alex-spies added a commit that referenced this pull request Jun 23, 2025
#125143 added 9 spatial grid functions and released them into Serverless. We think this is not the best long-term approach and the functions in #129581 are likely better.

As a first step, remove the spatial grid functions added in #125143 from release builds so they don't get released into 8.19/9.1.

---------

Co-authored-by: Craig Taverner <[email protected]>
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Jun 23, 2025
elastic#125143 added 9 spatial grid functions and released them into Serverless. We think this is not the best long-term approach and the functions in elastic#129581 are likely better.

As a first step, remove the spatial grid functions added in elastic#125143 from release builds so they don't get released into 8.19/9.1.

---------

Co-authored-by: Craig Taverner <[email protected]>
(cherry picked from commit efb1397)

# Conflicts:
#	docs/reference/query-languages/esql/_snippets/lists/spatial-functions.md
#	docs/reference/query-languages/esql/functions-operators/spatial-functions.md
Contributor

@alex-spies alex-spies left a comment

This is lovely. Thanks for splitting this @craigtaverner, much much easier to digest this way.

I found one inconsistency in the behavior of the conversion functions that should be addressed: you cannot convert a geotile type to itself, in contrast to our other data types.

Otherwise I only have minor test suggestions.

Please proceed at your own discretion and feel free to address anything in follow-up PRs if it's easier to get this chonky PR merged, first; I imagine the merge conflicts are not fun to deal with.

Contributor

Do we plan to support count_distinct in a follow-up? It currently raises a verification exception when used with geohash.

Contributor Author

I investigated and it appears CountDistinct uses isExact to ensure that only actual fields (not reference attributes) are ever used. Not sure why this distinction is there, but it does block all types that are not fields (and the grid types are never ES fields). So if we ever want to support this in future, it is a bigger lift because it means supporting non-field types.

Contributor Author

Sorry, my mistake, I read the code wrong. It does not require fields, but does have explicit support for each type, unlike Count. However, I can see that none of the spatial types are supported. It would be easy to support all spatial types as BytesRef, and grid types as long, but I think we should do that in a separate PR.

Contributor

Since we add support for this, let's also add a csv test?

We could index some documents with keyword fields that hold geohashes, some of them null. Then `from idx | eval x = hash_as_keyword::geohash | stats count(x)` should return the non-null count. The same applies to hex and tiles.

Also applies to the other agg functions that get geotile/hash/hex support in this PR.
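
The expected counting behaviour can be sketched in plain Java, assuming (as the comment above does) that a null keyword converts to a null grid value. `GridCountSketch` and its methods are hypothetical names for illustration only; the sketch contrasts counting every row with counting only non-null values, which is the difference between `COUNT(*)` and `COUNT(x)`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the counting semantics; not ES|QL aggregation code.
final class GridCountSketch {
    /** Mirrors COUNT(*): every row counts, whether or not the value is null. */
    static long countRows(List<String> hashes) {
        return hashes.size();
    }

    /** Mirrors COUNT(x): only non-null values count. */
    static long countNonNull(List<String> hashes) {
        return hashes.stream().filter(Objects::nonNull).count();
    }

    public static void main(String[] args) {
        // Arrays.asList is used because List.of rejects null elements.
        List<String> hashes = Arrays.asList("u3bu", null, "u3bv", null, "u3bu");
        System.out.println(countRows(hashes));
        System.out.println(countNonNull(hashes));
    }
}
```

A test that only uses `COUNT(*)` would pass even if the aggregation did not accept the grid type at all, which is why counting the grid column itself is the more meaningful check.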

Contributor Author

Done. Updated the existing tests to use COUNT(grid) instead of COUNT(*).

Contributor

I'd add a csv test that uses this with geotiles/hex/hashes.

Contributor Author

Done

Contributor

The grid converters are missing the trivial conversion for the same type to itself. Our other converters support this, but for instance `row x = ("u3bu"::geohash)::geohash` raises a verification exception.
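
A minimal sketch of the requested behaviour, using hypothetical names (`GridConversionSketch`, `GeohashValue`, `toGeohash`) rather than the real ES|QL converter classes: a converter that accepts a string should also accept a value that already has the target type and return it unchanged.

```java
// Hypothetical sketch of an identity-aware converter; not the ES|QL converter code.
final class GridConversionSketch {
    /** Stand-in for a typed geohash value. */
    static final class GeohashValue {
        final String hash;
        GeohashValue(String hash) { this.hash = hash; }
    }

    /** Converts supported inputs to a geohash; a geohash converts to itself. */
    static GeohashValue toGeohash(Object input) {
        if (input instanceof GeohashValue) {
            return (GeohashValue) input; // identity case: already the target type
        }
        if (input instanceof String) {
            return new GeohashValue((String) input);
        }
        throw new IllegalArgumentException("cannot convert [" + input + "] to geohash");
    }

    public static void main(String[] args) {
        GeohashValue once = toGeohash("u3bu");
        GeohashValue twice = toGeohash(once); // analogous to ("u3bu"::geohash)::geohash
        System.out.println(once == twice);
    }
}
```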

Contributor Author

Done

Contributor

Let's also add csv tests with the MV_... functions and the new data types.

Contributor Author

Done. Added an MV test for each type, testing a few MV functions, including the new MV_CONTAINS.

Contributor

Also here, a csv test to confirm this working e2e would be nice.

Contributor Author

Done

Contributor

I think this one doesn't have a corresponding test update, neither in union tests nor in csv tests.

Contributor Author

Oh dear, this one is a bit of a can of worms, as there seems to be wider support in the operator than is covered by the tests, even outside of grid support. Notably datetime. I think a separate issue to resolve all inconsistencies is needed here.

@craigtaverner craigtaverner merged commit f7dd604 into elastic:main Aug 30, 2025
33 checks passed
craigtaverner added a commit that referenced this pull request Sep 1, 2025
The PR at #129581 did not carefully check that all tests also work as release tests. This led to some CI failures, which we are fixing here.
craigtaverner added a commit that referenced this pull request Sep 5, 2025

In the work at #129581 we added new geo-grid types: `GEOHASH`, `GEOTILE` and `GEOHEX`, as well as support functions for creating these from `geo_point`, `long` and `keyword`.

However, one of the key use cases we wish to replicate from the Query DSL is the grid search, and this involves the need to include the grid id inside a search predicate, in particular the `ST_INTERSECTS` function. For example:

```
FROM airports
| WHERE ST_INTERSECTS(location, TO_GEOTILE("3/4/3"))
| STATS
    count = COUNT(*),
    centroid = ST_CENTROID_AGG(location)
```

Since `ST_INTERSECTS` and `ST_DISJOINT` are the converse of each other, we also added this support to `ST_DISJOINT`.
elasticsearchmachine pushed a commit that referenced this pull request Sep 9, 2025
The PR at #129581 added
three new types: `geohash`, `geotile` and `geohex`, and support
functions for creating these from strings, longs and geo_point fields.
However, all this was done under SNAPSHOT. Now it's time to move it into
`tech-preview`.
Labels

:Analytics/ES|QL AKA ESQL :Analytics/Geo Indexing, search aggregations of geo points and shapes >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0

5 participants