Skip to content

Add MultiPoint datatype support #4662

Open
Gopalverma062 wants to merge 8 commits intobabelfish-for-postgresql:BABEL_5_X_DEVfrom
Gopalverma062:BABEL_6310_MULTIPOINT
Open

Add MultiPoint datatype support #4662
Gopalverma062 wants to merge 8 commits intobabelfish-for-postgresql:BABEL_5_X_DEVfrom
Gopalverma062:BABEL_6310_MULTIPOINT

Conversation

@Gopalverma062
Copy link
Copy Markdown
Contributor

@Gopalverma062 Gopalverma062 commented Mar 19, 2026

Description

Currently, Babelfish supports geometry and geography spatial types for Point, LineString, and Polygon geometries. Attempting to use MultiPoint geometry type results in an "unsupported geometry type" error across all input/output paths including WKT text input, binary (varbinary) conversion, and spatial functions like STGeomFromText and STMPointFromText.

With this change, Babelfish now fully supports the MultiPoint geometry type for both geometry and geography data types. Users can create, store, retrieve, convert, and validate MultiPoint geometries through all existing spatial interfaces, matching T-SQLServer behavior.

Why: MultiPoint is a fundamental OGC geometry type used in real-world spatial applications to represent collections of discrete point locations (e.g., sensor networks, store locations, sample sites). Supporting it closes a significant gap in spatial type coverage and moves Babelfish closer to full T-SQL spatial compatibility.

How the code was changed:

  1. WKT Parser (grammar + rewrite functions): Added grammar rules for parsing MULTIPOINT((x y), ...) and MULTIPOINT Z/M/ZM(...) syntax in both parenthesized and bare coordinate formats. Added rewrite_multipoint_wkt() and rewrite_dim_multipoint_wkt() functions for WKT normalization.

  2. T-SQL CLR Binary ↔ PostGIS WKB conversion (spatialtypes.c):

    • Bytea → PostGIS (T-SQL input): Added CLR figure/shape metadata parsing for MultiPoint, validation of STROKE figures and POINT child shapes, and conversion from CLR columnar coordinate layout to PostGIS WKB MultiPoint format.
    • PostGIS → Bytea (T-SQL output): Added de-interleaving of PostGIS per-point WKB into CLR columnar format (XY block, Z block, M block), plus CLR metadata serialization (figures + shapes arrays).
  3. Geography validation: Added MultiPoint coordinate validation for latitude/longitude range constraints using ST_GeometryN PostGIS accessor — consistent with the pattern used for other geometry types.

  4. SQL functions (geometry.sql): Added STMPointFromText functions for both geometry and geography types with proper NULL handling, SRID validation, and type checking.

  5. Empty geometry handling: Added support for empty MultiPoint in both CLR binary detection (props byte 0x04 with EMPTY_COORD pattern and npoints=0 with Z/M flags) and PostGIS WKB output.

  6. Test coverage: Added 12–13 MultiPoint test cases across geometry and geography ODBC tests covering 2D, Z, M, ZM, mixed dimensions, NULL coordinate stripping, single-point, empty, and different SRID scenarios.


Issues Resolved

  • MultiPoint geometry type support for sys.geometry and sys.geography data types
  • geometry::STGeomFromText('MULTIPOINT(...)') now works
  • geography::STGeomFromText('MULTIPOINT(...)') now works with latitude/longitude validation
  • geometry::STMPointFromText() and geography::STMPointFromText() functions added
  • Binary round-trip (varbinarygeometry/geography) for MultiPoint
  • STAsText() and STAsBinary() output for MultiPoint
  • Empty MultiPoint (MULTIPOINT EMPTY) handling in all paths
  • MultiPoint with Z, M, and ZM dimensions including mixed-dimension point collections

Test Scenarios Covered

  • Use case based -

    • Insert/Select MultiPoint via STGeomFromText with geometry and geography types
    • Insert/Select MultiPoint via STMPointFromText with type validation
    • Binary round-trip: geometryvarbinarygeometry for all dimension variants
    • WKT round-trip: text → geometry → text for MultiPoint
    • View creation and querying with MultiPoint columns
    • Update operations replacing other geometry types with MultiPoint and vice versa
  • Boundary conditions -

    • Single-point MultiPoint: MULTIPOINT((1 2)) — minimum valid collection
    • Empty MultiPoint: MULTIPOINT EMPTY — zero points
    • Large point count MultiPoint collections
    • Mixed dimensions: points with Z/M mixed with points without Z/M in same collection
    • NULL coordinate handling: MULTIPOINT((1 2 NULL), (3 4 NULL)) stripped to 2D
    • NULL Z and NULL M: MULTIPOINT((1 2 NULL NULL), (3 4 NULL NULL)) stripped to 2D
  • Arbitrary inputs -

    • 2D MultiPoint: MULTIPOINT((1 2), (3 4))
    • 3D (Z) MultiPoint: MULTIPOINT((1 2 3), (4 5 6))
    • M MultiPoint: MULTIPOINT((1 2 NULL 3), (4 5 NULL 6))
    • ZM MultiPoint: MULTIPOINT((1 2 3 4), (5 6 7 8))
    • Mixed Z: MULTIPOINT((1 2 3), (4 5)) — second point gets Z=NaN
    • Mixed M: MULTIPOINT((1 2 NULL 3), (4 5)) — second point gets M=NaN
    • Mixed ZM: MULTIPOINT((1 2 3 4), (5 6)) — second point gets Z=NaN, M=NaN
    • Bare coordinate format: MULTIPOINT(1 2, 3 4) — without inner parentheses
    • Different SRID: MULTIPOINT((1 2), (3 4)) with SRID=0 (geometry only)
  • Negative test cases -

    • STMPointFromText with non-MultiPoint input (e.g., POINT, LINESTRING) — expects error
    • STMPointFromText with NULL SRID — expects parameter error
    • STMPointFromText with negative SRID — expects error
    • Geography MultiPoint with latitude > 90 or < -90 — expects error
    • Geography MultiPoint with longitude > 15069 or < -15069 — expects error
    • Geography MultiPoint with NaN/Infinity coordinates — expects error
    • Invalid varbinary data conversion to MultiPoint — expects conversion error
    • Unsupported geometry type passed through MultiPoint path — expects error
  • Minor version upgrade tests -

    • Existing Point, LineString, and Polygon tests continue to pass unchanged
    • No changes to existing binary format for other geometry types
  • Major version upgrade tests -

    • N/A — new feature addition, no schema migration required
  • Performance tests -

    • MultiPoint with multiple points verified for correct CLR metadata serialization
    • No additional PostGIS function calls per point in binary conversion path (columnar batch processing)
  • Tooling impact -

    • ODBC driver compatibility verified through psqlodbc test framework
    • No changes to ODBC driver required — uses existing UDT binary protocol
  • Client tests -

    • ODBC insert, update, select, and view tests for both geometry and geography MultiPoint
    • Binary representation verified byte-by-byte against MSSQL Server output

Check List

  • Commits are signed per the DCO using --gopalgv@amazon.com

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Mohitraj-g Mohitraj-g closed this Mar 23, 2026
@Mohitraj-g Mohitraj-g reopened this Mar 23, 2026
@Gopalverma062 Gopalverma062 force-pushed the BABEL_6310_MULTIPOINT branch from d85e56b to 5dfc3ce Compare March 23, 2026 12:10
RAISE EXCEPTION '''geography::STMPointFromText'' failed because parameter 2 is not allowed to be null.';
ELSIF $1 IS NULL THEN
RETURN NULL;
ELSIF $2 < 0 THEN
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this check since we already have SRID validity check later on

RAISE EXCEPTION '''geometry::STMPointFromText'' failed because parameter 2 is not allowed to be null.';
ELSIF $1 IS NULL THEN
RETURN NULL;
ELSIF $2 < 0 THEN
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Comment on lines +255 to +260
| MPOINT_TOK Z_TOK EMPTY_TOK
{ $$ = pstrdup("MULTIPOINT EMPTY"); }
| MPOINT_TOK M_TOK EMPTY_TOK
{ $$ = pstrdup("MULTIPOINT EMPTY"); }
| MPOINT_TOK ZM_TOK EMPTY_TOK
{ $$ = pstrdup("MULTIPOINT EMPTY"); }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need rules for these like : MULTIPOINT M EMPTY, TSQL doesn't allow it, does postgis create these ?

Comment on lines +158 to +160
#define WKB_POINT_TYPE 1
#define WKB_LINESTRING_TYPE 2
#define WKB_POLYGON_TYPE 3
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can resuse already defined macros e.g.#define POINT_TYPE

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or better to convert this to an ENUM.

StringInfoData output;

if (!pa)
return NULL;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we return pstrdup("MULTIPOINT EMPTY") instead similar to rewrite_dim_multipoint_wkt?



char*
rewrite_dim_multipoint_wkt(PointArray *pa)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the above function share a lot of common code, can we extract the logic to helper function?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +129 to +132
#define EMPTY_POINT_Binary "\x01\x04\x00\x00\x00\x00\x00\x00\x00" /* Binary for empty point */
#define EMPTY_LINE_Binary "\x01\x02\x00\x00\x00\x00\x00\x00\x00" /* Binary for empty linestring */
#define EMPTY_POLYGON_Binary "\x01\x03\x00\x00\x00\x00\x00\x00\x00" /* Binary for empty polygon */
#define EMPTY_MULTIPOINT_Binary "\x01\x04\x00\x00\x00\x00\x00\x00\x00"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should rather use _Bytes instead of _Binary

Comment on lines +158 to +160
#define WKB_POINT_TYPE 1
#define WKB_LINESTRING_TYPE 2
#define WKB_POLYGON_TYPE 3
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or better to convert this to an ENUM.

@@ -0,0 +1,82 @@
# Geometry Views — Properties
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add this new test to lastest/schedule file

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@rishabhtanwar29
Copy link
Copy Markdown
Contributor

Let's fix the PR title.

@Gopalverma062 Gopalverma062 changed the title BABEL_6348_MULTIPOINT_SUPPORT Add MultiPoint datatype support Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants