NOTE: This sample uses the term 'UDF' throughout, which is just a synonym for Spanner Remote Functions.
This sample demonstrates how to perform geo-spatial indexing and querying on Google Cloud Spanner using the S2 Geometry Library. Spanner does not natively support spatial data types or spatial indexes, so we use S2 to encode geographic coordinates into indexable 64-bit cell IDs.
The sample includes two approaches to querying:
- Client-side S2: The application computes S2 coverings and binds cell ID ranges as query parameters.
- Remote UDFs: Spanner calls Cloud Functions server-side to compute coverings and distances, so the query is self-contained SQL with no client-side S2 dependency.
- Java 17+
- Maven 3.8+
- A Google Cloud project with a Spanner instance and database
gcloudCLI installed and authenticated (gcloud auth application-default login)- For Remote UDFs: permissions to deploy Cloud Functions (Gen 2)
-
Create the Spanner schema:
gcloud spanner databases ddl update YOUR_DATABASE \ --instance=YOUR_INSTANCE \ --ddl-file=infra/schema.sql
-
Build the project:
mvn compile
-
Run the demo:
mvn exec:java -Dexec.args="YOUR_PROJECT YOUR_INSTANCE YOUR_DATABASE" -
(Optional) Deploy Remote UDFs for the server-side S2 query demos:
./deploy/setup.sh --project YOUR_PROJECT ./deploy/deploy-function.sh --project YOUR_PROJECT ./deploy/grant-permissions.sh --project YOUR_PROJECT
After deployment,
deploy-function.shprints the function URLs. Updateinfra/udf_definition.sqlwith the actual URLs (replacingPLACEHOLDER_URL), then apply the DDL:gcloud spanner databases ddl update YOUR_DATABASE \ --instance=YOUR_INSTANCE \ --ddl-file=infra/udf_definition.sql
We walk through three progressively refined schema designs. The recommended pattern is v3 (token index table).
v1: Naive Lat/Lng Columns (schemas/v1_naive.sql)
Store raw coordinates with a composite index. Simple but inefficient for radius queries -- scans an entire latitude band.
v2: Single S2 Cell ID (schemas/v2_single_cell.sql)
Add an S2 Cell ID column at a fixed level (e.g., level 16, ~150m cells). Better, but a fixed level means either too coarse or too fine for different query radii.
v3: Interleaved Token Index (Recommended) (schemas/v3_token_index.sql)
The canonical pattern. Store multiple S2 tokens per location at varying cell levels in an interleaved child table. This balances precision vs. index size and supports queries at any radius. The production schema (infra/schema.sql) uses this design.
CREATE TABLE PointOfInterest (
PoiId STRING(36) NOT NULL,
Name STRING(MAX),
Category STRING(256),
Latitude FLOAT64 NOT NULL,
Longitude FLOAT64 NOT NULL,
) PRIMARY KEY (PoiId);
-- One row per S2 token per location, at levels 12, 14, and 16
CREATE TABLE PointOfInterestLocationIndex (
PoiId STRING(36) NOT NULL,
S2CellId INT64 NOT NULL,
CellLevel INT64 NOT NULL,
) PRIMARY KEY (PoiId, S2CellId),
INTERLEAVE IN PARENT PointOfInterest ON DELETE CASCADE;
CREATE INDEX LocationIndexByS2Cell
ON PointOfInterestLocationIndex(S2CellId)
STORING (CellLevel);Why interleaving? The child table rows are physically co-located with their parent row in Spanner's storage. This minimizes the number of splits traversed during a join-back from the index to the parent.
Why multiple levels? Level 12 cells (~3.3km) are good for coarse regional queries; level 16 cells (~150m) provide fine-grained precision. The covering algorithm picks the right level automatically.
Gotcha -- Signed vs. Unsigned: S2 Cell IDs are unsigned 64-bit integers, but Spanner's
INT64is signed. We store the raw bit pattern reinterpreted as a signed long. This preserves sort order for range scans within the same face of the S2 cube. The application code handles the sign bit transparently (Java'slongis signed, andS2CellId.id()returns the raw bits).
All queries follow the covering + post-filter pattern:
- Compute a covering: Find the set of S2 cells that cover the search region.
- Query the index: Match covering cells against
PointOfInterestLocationIndex.S2CellIdvia theLocationIndexByS2Cellsecondary index. - Post-filter: The covering is an approximation -- some cells extend beyond the search region. Filter candidates with exact distance or bounds.
These queries require the application to compute S2 coverings and bind cell ID ranges as parameters.
- Radius search (
queries/radius_search.sql) -- Find all POIs within a given distance from a point. - Bounding box search (
queries/bbox_search.sql) -- Find all POIs within a rectangle. - Approximate k-NN (
queries/knn_approx.sql) -- Find the k closest POIs using iterative radius expansion.
With Remote UDFs deployed, queries become self-contained SQL -- no client-side S2 library needed. The client only provides coordinates and a radius or bounding box.
- Radius search (
queries/udf_query.sql) -- Usesgeo.s2_covering()for covering cells andgeo.s2_distance()for post-filtering. - Bounding box search (
queries/udf_bbox_query.sql) -- Usesgeo.s2_covering_rect()for covering cells with exact lat/lng post-filter. - Approximate k-NN (
queries/udf_knn_query.sql) -- Usesgeo.s2_covering()andgeo.s2_distance()withLIMIT @k. Iterative expansion is handled in application code.
Here is the UDF radius search query as an example. The client only provides (lat, lng, radius):
WITH candidates AS (
SELECT DISTINCT poi.PoiId, poi.Name, poi.Category, poi.Latitude, poi.Longitude
FROM (SELECT geo.s2_covering(@centerLat, @centerLng, @radiusMeters) AS cells),
UNNEST(cells) AS covering_cell
JOIN PointOfInterestLocationIndex idx ON idx.S2CellId = covering_cell
JOIN PointOfInterest poi ON poi.PoiId = idx.PoiId
),
with_distance AS (
SELECT c.PoiId, c.Name, c.Category, c.Latitude, c.Longitude,
geo.s2_distance(c.Latitude, c.Longitude, @centerLat, @centerLng) AS distance_meters
FROM candidates c
)
SELECT * FROM with_distance
WHERE distance_meters <= @radiusMeters
ORDER BY distance_meters;Three Remote UDFs power these queries:
| UDF | Purpose |
|---|---|
geo.s2_covering(lat, lng, radius) |
Returns ARRAY<INT64> of S2 cell IDs covering a search circle |
geo.s2_covering_rect(minLat, minLng, maxLat, maxLng) |
Returns ARRAY<INT64> of S2 cell IDs covering a bounding box |
geo.s2_distance(lat1, lng1, lat2, lng2) |
Returns great-circle distance in meters between two points |
Note: Remote UDFs must live in a named schema (Spanner does not allow them in the default schema). This sample uses the
geoschema. Additionally,UNNESTof a Remote UDF result requires materializing the array in a subquery first --UNNEST(geo.s2_covering(...))directly inFROMis not supported.
Remote UDFs push S2 logic into Spanner so queries don't require a client-side S2 library. Three Cloud Functions back the UDFs:
| Cloud Function | Entry Point | Spanner UDF |
|---|---|---|
s2-covering |
S2CoveringFunction |
geo.s2_covering() |
s2-covering-rect |
S2CoveringRectFunction |
geo.s2_covering_rect() |
s2-distance |
S2DistanceFunction |
geo.s2_distance() |
The Cloud Functions live in cloud-function/ as a separate Maven project:
S2CoveringFunction.java-- Computes S2 coverings for a circular region at levels 12, 14, 16. Returns cell IDs as JSON strings (not numbers) because S2 cell IDs exceed JSON's safe integer limit of 2^53.S2CoveringRectFunction.java-- Computes S2 coverings for a rectangular region (bounding box) at levels 12, 14, 16. Same wire protocol and cell ID encoding asS2CoveringFunction.S2DistanceFunction.java-- Computes great-circle distance usingS2LatLng.getDistance().
All three implement the Spanner Remote UDF wire protocol:
- Request:
{"requestId": "...", "calls": [[args_row1], [args_row2], ...]} - Response:
{"replies": [result1, result2, ...]}(array length must matchcalls)
Deployment scripts are in deploy/. Run them in order:
# 1. Enable required GCP APIs (Cloud Functions, Cloud Build, Cloud Run, Spanner, Artifact Registry)
./deploy/setup.sh --project YOUR_PROJECT
# 2. Deploy all three Cloud Functions (builds with Maven, then deploys)
./deploy/deploy-function.sh --project YOUR_PROJECT
# 3. Grant Spanner's service agent permission to invoke the functions
./deploy/grant-permissions.sh --project YOUR_PROJECTAfter deployment, deploy-function.sh prints the function URLs. Update infra/udf_definition.sql with the actual URLs (replacing PLACEHOLDER_URL), then apply the DDL:
gcloud spanner databases ddl update YOUR_DATABASE \
--instance=YOUR_INSTANCE \
--ddl-file=infra/udf_definition.sql./deploy/teardown.sh --project YOUR_PROJECTThis deletes the Cloud Functions and removes the project-level roles/spanner.serviceAgent IAM binding that was granted to the Spanner service agent. To also remove the UDF definitions from Spanner:
DROP FUNCTION IF EXISTS geo.s2_covering;
DROP FUNCTION IF EXISTS geo.s2_covering_rect;
DROP FUNCTION IF EXISTS geo.s2_distance;
DROP SCHEMA IF EXISTS geo;The Java application demonstrates the full workflow: seeding data, indexing with S2, and running all six query types (three client-side, three Remote UDF).
| Class | Purpose |
|---|---|
App.java |
Entry point -- seeds data and runs all demo queries |
S2Util.java |
S2 helper: cell ID encoding at levels 12/14/16, covering computation for circles and rectangles, Haversine distance |
SpannerGeoDao.java |
Spanner data access with parameterized queries -- both client-side and UDF variants |
Location.java |
POJO for a geo-tagged point of interest |
When saving a location, the app:
- Computes S2 Cell IDs at levels 12, 14, and 16 using
S2Util.encodeCellIds() - Writes the parent
PointOfInterestrow and onePointOfInterestLocationIndexrow per cell level - Uses
INSERT_OR_UPDATEmutations so re-running the demo is idempotent (no duplicate rows)
All mutations happen in a single Spanner transaction.
Client-side approach (e.g., SpannerGeoDao.radiusSearch()):
S2Util.computeCovering()returns a list of S2 cell ID ranges- The DAO builds a parameterized query with
BETWEENclauses per range - Results are post-filtered by exact Haversine distance and sorted
Remote UDF approach (e.g., SpannerGeoDao.radiusSearchWithUdf()):
- The DAO sends a single parameterized query with
@centerLat,@centerLng,@radiusMeters geo.s2_covering()computes covering cells server-sidegeo.s2_distance()computes exact distances server-side- No S2 library dependency in the DAO method -- pure SQL
Remote UDF bounding box (SpannerGeoDao.bboxSearchWithUdf()):
- Sends a single query with
@minLat,@minLng,@maxLat,@maxLng geo.s2_covering_rect()computes covering cells server-side- Post-filters with exact lat/lng bounds -- no distance computation needed
Remote UDF k-NN (SpannerGeoDao.knnSearchWithUdf()):
- Wraps
radiusSearchWithUdf()with iterative radius doubling (starts at 500m, up to 8 attempts) - Returns the top k results by distance
The UDF demos in App.java are wrapped in try-catch, so the app runs cleanly whether or not UDFs are deployed.
=== Seeding sample data (San Francisco landmarks) ===
Inserted 15 locations.
============================================================
CLIENT-SIDE QUERIES (S2 computed on the application)
============================================================
--- Radius Search ---
Center: (37.7880, -122.4075), Radius: 2000m
Found 8 location(s):
0m - Union Square (shopping) [37.788000, -122.407500]
345m - Chinatown Gate (landmark) [37.790800, -122.405800]
724m - Moscone Center (convention) [37.784000, -122.401000]
901m - Transamerica Pyramid (landmark) [37.795200, -122.402800]
1418m - City Hall (government) [37.779300, -122.419300]
1492m - Ferry Building (landmark) [37.795600, -122.393500]
1608m - Coit Tower (landmark) [37.802400, -122.405800]
1911m - AT&T Park (Oracle Park) (sports) [37.778600, -122.389300]
--- Bounding Box Search ---
Box: (37.775, -122.420) to (37.795, -122.400)
Found 4 location(s):
...
--- Approximate k-NN Search (k=3) ---
Center: (37.7880, -122.4075)
Closest 3 location(s):
0m - Union Square (shopping) [37.788000, -122.407500]
345m - Chinatown Gate (landmark) [37.790800, -122.405800]
724m - Moscone Center (convention) [37.784000, -122.401000]
============================================================
REMOTE UDF QUERIES (S2 computed server-side)
============================================================
--- Radius Search with Remote UDFs ---
Center: (37.7880, -122.4075), Radius: 2000m
Found 8 location(s):
0m - Union Square (shopping) [37.788000, -122.407500]
345m - Chinatown Gate (landmark) [37.790800, -122.405800]
...
--- Bounding Box Search with Remote UDFs ---
Box: (37.775, -122.420) to (37.795, -122.400)
Found 4 location(s):
...
--- Approximate k-NN Search with Remote UDFs (k=3) ---
Center: (37.7880, -122.4075)
Closest 3 location(s):
...
Done.
If Remote UDFs are not deployed, the UDF section prints a message and continues:
--- Radius Search with Remote UDFs ---
Center: (37.7880, -122.4075), Radius: 2000m
Remote UDFs not deployed, skipping radius search.
Deploy with: ./deploy/deploy-function.sh
Error: ...
sample/
├── README.md # This file
├── pom.xml # Maven project (main application)
│
├── infra/
│ ├── schema.sql # Production schema (v3 token index)
│ └── udf_definition.sql # Remote UDF DDL (geo.s2_covering, geo.s2_covering_rect, geo.s2_distance)
│
├── schemas/ # All schema iterations (for reference)
│ ├── v1_naive.sql
│ ├── v2_single_cell.sql
│ └── v3_token_index.sql
│
├── queries/
│ ├── radius_search.sql # Client-side radius search
│ ├── bbox_search.sql # Client-side bounding box query
│ ├── knn_approx.sql # Client-side approximate k-NN query
│ ├── udf_query.sql # UDF radius search
│ ├── udf_bbox_query.sql # UDF bounding box search
│ └── udf_knn_query.sql # UDF approximate k-NN search
│
├── cloud-function/ # Cloud Functions backing Remote UDFs
│ ├── pom.xml # Separate Maven project
│ └── src/main/java/.../functions/
│ ├── S2CoveringFunction.java
│ ├── S2CoveringRectFunction.java
│ └── S2DistanceFunction.java
│
├── deploy/ # Deployment & IAM scripts
│ ├── setup.sh # Enable GCP APIs
│ ├── deploy-function.sh # Deploy Cloud Functions
│ ├── grant-permissions.sh # IAM: Spanner service agent -> project-level serviceAgent role
│ └── teardown.sh # Clean up Cloud Functions
│
└── src/main/java/.../
├── App.java # Entry point -- seeds data, runs all queries
├── S2Util.java # S2 geometry utilities
├── SpannerGeoDao.java # Spanner data access (client-side + UDF queries)
└── model/
└── Location.java # Location POJO
| Level | Approx. Cell Size | Good For |
|---|---|---|
| 12 | ~3.3 km | City-wide queries, coarse regional filtering |
| 14 | ~800 m | Neighborhood-level queries |
| 16 | ~150 m | Street-level queries, fine-grained precision |
| 20 | ~10 m | Building-level (not used in this sample) |
| 30 | ~1 cm | Maximum precision (leaf cell) |
Remote Functions must live in a named schema. Spanner does not allow Remote Functions in the default schema. We use CREATE SCHEMA IF NOT EXISTS geo and qualify all calls as geo.s2_covering(), geo.s2_covering_rect(), and geo.s2_distance().
UNNEST of a Remote Function result requires a subquery. Writing FROM UNNEST(geo.s2_covering(...)) directly in the FROM clause does not work. You must materialize the array in a subquery first: FROM (SELECT geo.s2_covering(...) AS cells), UNNEST(cells). This is a peculiarity of how UNNEST works with remote functions in Spanner today.
Signed vs Unsigned integers. S2 Cell IDs are unsigned 64-bit integers. Spanner’s INT64 is signed. We store the raw bit pattern, which means some Cell IDs appear as negative numbers in Spanner. Java's long is signed too, and S2CellId.id() returns the same raw bits. Range scans within the same S2 cube face (which is the common case for geographically bounded queries) sort correctly regardless of the sign interpretation.
Cold start. You may see increased latency in your queries due to cold start time associated with Java based Cloud functions. You can address this by setting a minimum number of instances for use by Cloud Run functions to 1 (it's 0 by default) as recommended here.