Commit 35572f9

[doc] Add geo type documentation (#2663)
## Versions - [x] dev - [ ] 3.0 - [ ] 2.1 - [ ] 2.0 ## Languages - [ ] Chinese - [ ] English ## Docs Checklist - [ ] Checked by AI - [ ] Test Cases Built
1 parent bd8ea6f commit 35572f9

6 files changed

+774
-0
lines changed

Lines changed: 395 additions & 0 deletions
@@ -0,0 +1,395 @@
---
{
    "title": "GEO_TYPE",
    "language": "en"
}
---
# GEO Type Documentation

Geospatial types are data types used to store and manipulate geospatial data. They represent geometric objects such as points, lines, and polygons.

- Core purposes:
  - Store geographic location information (e.g., longitude and latitude).
  - Support spatial queries (e.g., distance calculation, containment tests, intersection tests).
  - Support geospatial analysis (e.g., buffer analysis, path planning).

Geographic Information Systems (GIS) are widely used in map services, logistics scheduling, location-based social networking, meteorological monitoring, and similar fields. Their core requirement is to store massive spatial data efficiently and to support low-latency spatial computation.

# Core Encoding Technologies

## S2 Geometry Library

S2 Geometry is a spherical geometry library developed by Google. Its core idea is to index global geospatial data efficiently by projecting the sphere onto planar faces and subdividing those faces hierarchically.

### Core Principles

- Spherical projection: Project the Earth's sphere onto the 6 faces of an enclosing cube, converting 3D spherical data into 2D planar data.
- Hierarchical grid division: Each face is recursively divided into quadrilateral cells, and each cell can be subdivided into 4 smaller child cells. This forms a hierarchy with levels 0 through 30 (the higher the level, the smaller the cell and the higher the precision).
- 64-bit encoding: Each cell is assigned a unique 64-bit ID, which makes it possible to locate a position and test spatial relationships quickly.
- Hilbert curve ordering: Cells are numbered along a Hilbert space-filling curve, so cells with nearby IDs are also nearby in space, which makes range queries efficient.

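
The 64-bit encoding and the parent/child hierarchy can be illustrated with a minimal Python sketch. It assumes the publicly documented S2 cell ID layout (3 face bits, then 2 bits per level, then a single terminating 1 bit). The helper names are hypothetical and are not the S2 library's API; the sketch also takes the child indexes as given instead of deriving them from a Hilbert curve traversal.

```python
MAX_LEVEL = 30  # deepest S2 level

def make_cell_id(face, child_positions):
    """Build a 64-bit cell ID from a cube face (0-5) and one child index (0-3) per level."""
    level = len(child_positions)
    cell_id = face << 61                          # top 3 bits: cube face
    for i, pos in enumerate(child_positions):
        cell_id |= pos << (61 - 2 * (i + 1))      # 2 bits per level
    cell_id |= 1 << (2 * (MAX_LEVEL - level))     # terminating 1 bit marks the level
    return cell_id

def lowest_set_bit(cell_id):
    return cell_id & -cell_id

def level_of(cell_id):
    """The level follows from the position of the terminating bit."""
    trailing_zeros = lowest_set_bit(cell_id).bit_length() - 1
    return MAX_LEVEL - trailing_zeros // 2

def parent(cell_id, level):
    """Truncate the ID to a coarser level and re-add the terminating bit."""
    new_lsb = 1 << (2 * (MAX_LEVEL - level))
    return (cell_id & -new_lsb) | new_lsb

def contains(ancestor, cell_id):
    """All descendants of a cell fall in one contiguous ID range."""
    lsb = lowest_set_bit(ancestor)
    return ancestor - (lsb - 1) <= cell_id <= ancestor + (lsb - 1)

cell = make_cell_id(2, [3, 0, 1])
print(level_of(cell))                    # 3
print(contains(parent(cell, 1), cell))   # True
```

Because every descendant of a cell lies in a single ID interval, a containment test reduces to two integer comparisons, which is the property the "64-bit encoding" bullet relies on.
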
### Advantages

- High precision with smooth transitions: Levels range from whole cube faces (level 0) down to roughly centimeter-sized cells (level 30). Each level splits a cell into only 4 children, so a suitable precision is available for most scenarios.
- Efficient global range queries: Large-scale spatial queries (e.g., cross-continental or cross-country regional analysis) run without significant performance degradation.
- Efficient spatial relationship calculation: Containment, intersection, and similar relationships can be tested quickly through cell IDs, avoiding complex geometric operations.

## GeoHash Encoding

GeoHash is a geocoding method based on the equirectangular projection. It builds a spatial index by converting longitude and latitude into a short string.

### Core Principles

- Planar projection: Treat the Earth's surface as a longitude/latitude plane and recursively bisect the longitude and latitude ranges.
- Rectangular grid division: Divide the Earth's surface into rectangular cells of different precisions. The string length determines the precision (up to 12 characters). Each additional character adds 5 bits and splits a cell into 32 sub-cells.
- Z-order curve encoding: Interleave the binary bits of longitude and latitude to form a Z-order curve, then Base32-encode the result, converting 2D coordinates into a 1D string.

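
The bisection and bit-interleaving steps above can be sketched in a few lines of Python. This is an illustrative implementation of the standard public GeoHash algorithm, not code taken from Doris; the function name `geohash_encode` is a hypothetical helper.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=12):
    """Encode a latitude/longitude pair as a GeoHash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate: longitude first, then latitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:            # upper half: emit 1 and keep the upper range
            value = (value << 1) | 1
            rng[0] = mid
        else:                       # lower half: emit 0 and keep the lower range
            value <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:               # every 5 bits become one Base32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)

print(geohash_encode(42.6, -5.6, precision=5))  # ezs42
```

A longer code for the same location always starts with the shorter code, which is why prefix matching finds enclosing cells.
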
### Features

- Indexing convenience: Nearby areas can be found through string prefix matching (GeoHash codes that share a prefix fall within the same enclosing cell).
- Limitations:
  - Limited precision levels: At most 12 levels, with a 32-fold change in cell size between adjacent levels, which makes fine-grained, smooth subdivision difficult.
  - Discontinuities in the Z-order curve: Spatially adjacent areas may receive very different codes where the curve jumps, which affects the accuracy of range queries.
  - Low efficiency in large-scale queries: Querying a global range requires scanning a large number of discrete cells, resulting in poor performance.

## Comparison and Selection

After comparing the S2 Geometry Library with GeoHash, we chose the S2 Geometry Library as the third-party dependency for geospatial processing, mainly for the following reasons:

- Suitability for global range queries: S2's hierarchical grid design is better suited to large-scale spatial analysis, while GeoHash has performance bottlenecks in cross-region queries.
- Precision and smoothness: S2's 30 levels of subdivision give a smooth transition from global to centimeter-level precision and meet the needs of many scenarios, which is better than GeoHash's 12 levels.
- Spatial continuity: Hilbert curves have better spatial continuity than Z-order curves, which reduces redundant work in range queries.

# Introduction to WKT

WKT (Well-Known Text) is a standard text format for representing geospatial data.

## Definition

- Text format: Describes the structure and coordinates of geometric objects as text strings.
- Features: Human-readable, easy to edit, and suitable for manual input or simple data exchange.

## Syntax Structure

- Basic format: GeometryType(CoordinateValues)
- Common geometric types:
  - Point: POINT(longitude latitude)
    Example: POINT(112.46 45.23) represents a point at longitude 112.46 and latitude 45.23. The two coordinates are separated by a space, not a comma.
  - LineString: LINESTRING(point1, point2, ...)
    Example: LINESTRING(0 0, 1 1) represents a line segment connecting two points.
  - Polygon: POLYGON((point1, point2, ..., point1))
    Example: POLYGON((0 0, 10 0, 10 10, 0 10, 0 0)) represents a square. The ring is closed, so the last point repeats the first.

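
As a small illustration of the syntax rules above, the following Python sketch builds WKT strings from coordinate tuples. The helper names are hypothetical and exist only for this example; they simply apply the "space inside a point, comma between points, closed ring" conventions.

```python
def point_wkt(lon, lat):
    """A point is two coordinates separated by a space."""
    return f"POINT({lon} {lat})"

def linestring_wkt(points):
    """Points in a line string are separated by commas."""
    coords = ", ".join(f"{x} {y}" for x, y in points)
    return f"LINESTRING({coords})"

def polygon_wkt(ring):
    """A polygon ring must be closed: the last point repeats the first."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"

print(point_wkt(112.46, 45.23))                           # POINT(112.46 45.23)
print(linestring_wkt([(0, 0), (1, 1)]))                   # LINESTRING(0 0, 1 1)
print(polygon_wkt([(0, 0), (10, 0), (10, 10), (0, 10)]))  # POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))
```
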
# Introduction to WKB

WKB (Well-Known Binary) is a standard binary format for representing geospatial data.

## Definition

- Binary format: Represents geometric objects in a binary encoding that is more compact than WKT.
- Features: Optimized for machine storage and transmission, saving space and enabling fast parsing.

## Encoding Structure

WKB consists of the following parts:

- Byte order (1 byte):
  - 0x00: Big Endian (network byte order)
  - 0x01: Little Endian (common on Intel/AMD architectures)
- Geometry type (4-byte integer):
  - 1: Point
  - 2: LineString
  - 3: Polygon
  - ... (other types)
- Coordinate values (each coordinate is an 8-byte IEEE 754 double):
  - Point: x, y (or x, y, z)
  - LineString: a 4-byte point count, followed by the coordinates of point1, point2, ...
  - Polygon: a 4-byte ring count, then for each ring a 4-byte point count followed by the coordinates of point1, point2, ...

### Example

```text
01 01 00 00 00 00 00 00 00 00 00 F0 3F 00 00 00 00 00 00 00 40

01                        byte order: Little Endian
01 00 00 00               geometry type: 1 (Point)
00 00 00 00 00 00 F0 3F   x = 1.0
00 00 00 00 00 00 00 40   y = 2.0
```

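
The layout above can be checked with Python's standard `struct` module. This is a minimal sketch that handles only 2D points; `decode_wkb_point` is a hypothetical helper written for this example, not a Doris or library function.

```python
import struct

def decode_wkb_point(data):
    """Decode a 2D WKB point into an (x, y) tuple."""
    byte_order = "<" if data[0] == 1 else ">"   # 0x01 little endian, 0x00 big endian
    (geom_type,) = struct.unpack_from(byte_order + "I", data, 1)
    if geom_type != 1:
        raise ValueError(f"expected geometry type 1 (Point), got {geom_type}")
    x, y = struct.unpack_from(byte_order + "2d", data, 5)
    return x, y

# The 21-byte example: byte order, geometry type, x, y
raw = bytes.fromhex("01" "01000000" "000000000000F03F" "0000000000000040")
print(decode_wkb_point(raw))  # (1.0, 2.0)
```

The same helper can decode the hexadecimal payloads used in the GeoPoint examples once the leading `\x` prefix is removed.
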
# GeoPoint Type

1. Storing WKT Format Using String or Varchar

```sql
CREATE TABLE simple_point (id INT, wkt STRING);
INSERT INTO simple_point VALUES (1, 'POINT(121.4737 31.2304)');

-- Alternative: the same table with a VARCHAR column
CREATE TABLE simple_point (id INT, wkt VARCHAR(255));
INSERT INTO simple_point VALUES (1, 'POINT(121.4737 31.2304)');
```

Querying WKT Format

```sql
select st_astext(st_geometryfromtext(wkt)) from simple_point;
+-------------------------------------+
| st_astext(st_geometryfromtext(wkt)) |
+-------------------------------------+
| POINT (121.4737 31.2304)            |
+-------------------------------------+
```

2. Storing Using WKB Format

```sql
CREATE TABLE simple_point (id INT, wkb STRING);
INSERT INTO simple_point VALUES (1, '\x01010000005f07ce19515e5e4097ff907efb3a3f40');

-- Alternative: the same table with a VARCHAR column
CREATE TABLE simple_point (id INT, wkb VARCHAR(255));
INSERT INTO simple_point VALUES (1, '\x01010000005f07ce19515e5e4097ff907efb3a3f40');
```

Querying WKB Format

```sql
select st_astext(st_geometryfromwkb(wkb)) from simple_point;
+------------------------------------+
| st_astext(st_geometryfromwkb(wkb)) |
+------------------------------------+
| POINT (121.4737 31.2304)           |
+------------------------------------+
```

3. Storing Coordinates Using Floating-Point Numbers (x for longitude, y for latitude)

```sql
CREATE TABLE simple_point_double (id INT, x DOUBLE, y DOUBLE);
INSERT INTO simple_point_double VALUES (0, 1, 2);
```

Querying Floating-Point Format

```sql
select st_astext(st_point(x,y)) from simple_point_double;
+--------------------------+
| st_astext(st_point(x,y)) |
+--------------------------+
| POINT (1 2)              |
+--------------------------+
```

# GeoLine Type

1. Storing WKT Format Using String or Varchar

```sql
CREATE TABLE simple_line (id INT, wkt STRING);
INSERT INTO simple_line VALUES (1, 'LINESTRING(116.4074 39.9042, 121.4737 31.2304)');

-- Alternative: the same table with a VARCHAR column
CREATE TABLE simple_line (id INT, wkt VARCHAR(255));
INSERT INTO simple_line VALUES (1, 'LINESTRING(116.4074 39.9042, 121.4737 31.2304)');
```

Querying WKT Format

```sql
select st_astext(st_linefromtext(wkt)) from simple_line;
+-------------------------------------------------+
| st_astext(st_linefromtext(wkt))                 |
+-------------------------------------------------+
| LINESTRING (116.4074 39.9042, 121.4737 31.2304) |
+-------------------------------------------------+
```

2. Storing Using WKB Format

```sql
CREATE TABLE simple_line (id INT, wkb STRING);
INSERT INTO simple_line VALUES (1, '\x010200000002000000fc1873d7121a5d4088855ad3bcf343405f07ce19515e5e4097ff907efb3a3f40');

-- Alternative: the same table with a VARCHAR column
CREATE TABLE simple_line (id INT, wkb VARCHAR(255));
INSERT INTO simple_line VALUES (1, '\x010200000002000000fc1873d7121a5d4088855ad3bcf343405f07ce19515e5e4097ff907efb3a3f40');
```

Querying WKB Format

```sql
select st_astext(st_geometryfromwkb(wkb)) from simple_line;
+-------------------------------------------------+
| st_astext(st_geometryfromwkb(wkb))              |
+-------------------------------------------------+
| LINESTRING (116.4074 39.9042, 121.4737 31.2304) |
+-------------------------------------------------+
```

# GeoPolygon Type

1. Storing WKT Format Using String or Varchar

```sql
CREATE TABLE simple_polygon (id INT, wkt STRING);
INSERT INTO simple_polygon VALUES (1, 'POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))');

-- Alternative: the same table with a VARCHAR column
CREATE TABLE simple_polygon (id INT, wkt VARCHAR(255));
INSERT INTO simple_polygon VALUES (1, 'POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))');
```

Querying WKT Format

```sql
select st_astext(st_polygon(wkt)) from simple_polygon;
+------------------------------------------+
| st_astext(st_polygon(wkt))               |
+------------------------------------------+
| POLYGON ((10 0, 10 10, 0 10, 0 0, 10 0)) |
+------------------------------------------+
```

Note that the vertices are returned in a different order and with a different starting point than they were inserted. The ring still describes the same polygon.

2. Storing Using WKB Format

```sql
CREATE TABLE simple_polygon_wkb (id INT, wkb STRING);
INSERT INTO simple_polygon_wkb VALUES (1, '\x010300000001000000050000000000000000002440000000000000000000000000000024400000000000002440000000000000000000000000000024400000000000000000000000000000000000000000000024400000000000000000');

-- Alternative: the same table with a VARCHAR column
CREATE TABLE simple_polygon_wkb (id INT, wkb VARCHAR(255));
INSERT INTO simple_polygon_wkb VALUES (1, '\x010300000001000000050000000000000000002440000000000000000000000000000024400000000000002440000000000000000000000000000024400000000000000000000000000000000000000000000024400000000000000000');
```

Querying WKB Format

```sql
select st_astext(st_geometryfromwkb(wkb)) from simple_polygon_wkb;
+------------------------------------------+
| st_astext(st_geometryfromwkb(wkb))       |
+------------------------------------------+
| POLYGON ((10 0, 10 10, 0 10, 0 0, 10 0)) |
+------------------------------------------+
```

# GeoMultiPolygon Type

1. Storing WKT Format Using String or Varchar

```sql
CREATE TABLE simple_multipolygon (id INT, wkt STRING);
INSERT INTO simple_multipolygon VALUES (1, 'MULTIPOLYGON(((0 0, 0 10, 10 10, 10 0, 0 0)),((20 20, 20 30, 30 30, 30 20, 20 20)))');

-- Alternative: the same table with a VARCHAR column.
-- The first ring list is the first polygon, the second ring list is the second polygon.
CREATE TABLE simple_multipolygon (id INT, wkt VARCHAR(255));
INSERT INTO simple_multipolygon VALUES (1, 'MULTIPOLYGON(((0 0, 0 10, 10 10, 10 0, 0 0)),((20 20, 20 30, 30 30, 30 20, 20 20)))');
```

Querying WKT Format

```sql
select st_astext(st_geometryfromtext(wkt)) from simple_multipolygon;
+----------------------------------------------------------------------------------------+
| st_astext(st_geometryfromtext(wkt))                                                    |
+----------------------------------------------------------------------------------------+
| MULTIPOLYGON (((10 0, 10 10, 0 10, 0 0, 10 0)), ((30 20, 30 30, 20 30, 20 20, 30 20))) |
+----------------------------------------------------------------------------------------+
```

Note: WKB format conversion for GeoMultiPolygon is not yet supported.

# GeoCircle Type

Storage Method (Storing Center Coordinates and Radius Using Floating-Point Numbers)

Since WKT and WKB do not define a circle type, a circle is stored as three floating-point numbers: the center coordinates (X for longitude, Y for latitude) and the radius (R, in meters):

```sql
CREATE TABLE simple_circle (id INT, X DOUBLE, Y DOUBLE, R DOUBLE);
INSERT INTO simple_circle VALUES (1, 1.0, 1.0, 2);
```

Querying the Circle

```sql
select st_astext(st_circle(X,Y,R)) from simple_circle;
+-----------------------------+
| st_astext(st_circle(X,Y,R)) |
+-----------------------------+
| CIRCLE ((1 1), 2)           |
+-----------------------------+
```

# Constraints

## Index

Doris does not implement a native Geo type. Geospatial values are stored as WKT, WKB, or plain numeric columns and converted at query time, so geospatial queries cannot be accelerated with indexes.

## Precision

When WKT is converted to a Geo value and printed back, only about 13 digits after the decimal point are guaranteed to be preserved:

```sql
mysql> SELECT ST_AsText(ST_GeometryFromText("POINT (1 3.1415926535897223)"));
+----------------------------------------------------------------+
| ST_AsText(ST_GeometryFromText("POINT (1 3.1415926535897223)")) |
+----------------------------------------------------------------+
| POINT (1 3.14159265358972)                                     |
+----------------------------------------------------------------+
```

The same limit of about 13 digits after the decimal point applies when WKB is converted to a Geo value and printed back:

```sql
mysql> select ST_AsText(ST_GeomFromWKB(ST_AsBinary(ST_Point(24.7,3.141592653589793))));
+--------------------------------------------------------------------------+
| ST_AsText(ST_GeomFromWKB(ST_AsBinary(ST_Point(24.7,3.141592653589793)))) |
+--------------------------------------------------------------------------+
| POINT (24.7 3.1415926535898)                                             |
+--------------------------------------------------------------------------+
```

# Common Uses and Methods of Geo Types in Doris

## Calculating Distance Between Two Points on Earth

Distance from Beijing to Shanghai

Coordinates of Beijing (116.4074, 39.9042) and Shanghai (121.4737, 31.2304), given as (longitude, latitude). The result is in meters, about 1,067 km:

```sql
select ST_DISTANCE_SPHERE(116.4074, 39.9042, 121.4737, 31.2304);
+----------------------------------------------------------+
| ST_DISTANCE_SPHERE(116.4074, 39.9042, 121.4737, 31.2304) |
+----------------------------------------------------------+
|                                       1067311.8461903075 |
+----------------------------------------------------------+
```

![Distance from Beijing to Shanghai](/images/BeijingToShanghai.png)

Distance from Beijing to New York

Coordinates of Beijing (116.4074, 39.9042) and New York (-74.0060, 40.7128), given as (longitude, latitude). The result is in meters, about 10,989 km:

```sql
select ST_DISTANCE_SPHERE(116.4074, 39.9042, -74.0060, 40.7128);
+----------------------------------------------------------+
| ST_DISTANCE_SPHERE(116.4074, 39.9042, -74.0060, 40.7128) |
+----------------------------------------------------------+
|                                       10989107.361809434 |
+----------------------------------------------------------+
```

![Distance from Beijing to New York](/images/BeijingToNewyork.png)

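
For readers who want to sanity-check such results outside the database, the great-circle distance can be approximated with the haversine formula. The sketch below is an assumption-laden illustration: it uses a mean Earth radius of 6371008.8 m and the haversine formula, and the formula and radius that Doris uses internally may differ, so small deviations from the SQL results are expected.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Same argument order as ST_DISTANCE_SPHERE: lon1, lat1, lon2, lat2
print(haversine_m(116.4074, 39.9042, 121.4737, 31.2304))   # Beijing to Shanghai
print(haversine_m(116.4074, 39.9042, -74.0060, 40.7128))   # Beijing to New York
```
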
## Calculating Area of a Region on the Earth's Sphere

Estimating New York's Area

Outline the New York area roughly with a polygon and calculate its area. The result is in square kilometers:

```sql
SELECT ST_AREA_SQUARE_KM(
    ST_GeomFromText('POLYGON((
        -74.2591 40.9155,
        -73.8726 40.9147,
        -73.7004 40.7506,
        -73.9442 40.5840,
        -74.0817 40.6437,
        -74.1502 40.6110,
        -74.0984 40.6550,
        -74.0431 40.7290,
        -74.0136 40.7903,
        -73.9352 40.8448,
        -74.2591 40.9155
    ))'));

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ST_AREA_SQUARE_KM( ST_GeomFromText('POLYGON((-74.2591 40.9155, -73.8726 40.9147, -73.7004 40.7506, -73.9442 40.5840, -74.0817 40.6437,-74.1502 40.6110,-74.0984 40.6550,-74.0431 40.7290,-74.0136 40.7903, -73.9352 40.8448, -74.2591 40.9155))' )) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 744.3806189617659 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```

![Polygon outlining the New York area](/images/Newyork.png)

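
For a city-sized region, the spherical area can be cross-checked with a planar approximation. The sketch below is not how Doris computes the area; it assumes a mean Earth radius of 6371.0088 km, scales longitude by the cosine of the mean latitude, and applies the shoelace formula. The function name `approx_area_km2` is hypothetical, and the approximation degrades for large regions or near the poles.

```python
import math

KM_PER_DEGREE = math.pi * 6371.0088 / 180.0  # assumed mean Earth radius in km

def approx_area_km2(ring):
    """Approximate area of a small closed (lon, lat) ring in square kilometers."""
    vertices = ring[:-1]  # the last point repeats the first
    mean_lat = sum(lat for _, lat in vertices) / len(vertices)
    scale_x = KM_PER_DEGREE * math.cos(math.radians(mean_lat))
    pts = [(lon * scale_x, lat * KM_PER_DEGREE) for lon, lat in ring]
    # Shoelace formula over consecutive vertex pairs
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    return abs(twice_area) / 2

new_york = [
    (-74.2591, 40.9155), (-73.8726, 40.9147), (-73.7004, 40.7506),
    (-73.9442, 40.5840), (-74.0817, 40.6437), (-74.1502, 40.6110),
    (-74.0984, 40.6550), (-74.0431, 40.7290), (-74.0136, 40.7903),
    (-73.9352, 40.8448), (-74.2591, 40.9155),
]
print(approx_area_km2(new_york))
```

The value printed by this sketch should land close to the `ST_AREA_SQUARE_KM` result above, with a small difference caused by the planar approximation.
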
