Commit aa4f80a

docs: add sedonadb programming guide (#64)
1 parent b380e30 commit aa4f80a

2 files changed

+388
-0
lines changed

docs/programming-guide.ipynb

Lines changed: 387 additions & 0 deletions
@@ -0,0 +1,387 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "1932983e-1cd2-41d0-a5eb-0537b3ac3feb",
6+
"metadata": {},
7+
"source": [
8+
"# SedonaDB Guide\n",
9+
"\n",
10+
"This page explains how to process vector data with SedonaDB.\n",
11+
"\n",
12+
"You will learn how to create SedonaDB DataFrames, run spatial queries, and perform I/O operations with various types of files.\n",
13+
"\n",
14+
"Let’s start by establishing a SedonaDB connection.\n",
15+
"\n",
16+
"## Establish SedonaDB connection\n",
17+
"\n",
18+
"Here’s how to create the SedonaDB connection:"
19+
]
20+
},
21+
{
22+
"cell_type": "code",
23+
"execution_count": 2,
24+
"id": "53c3b7a8-c42a-407a-a454-6ee1e943fbcc",
25+
"metadata": {},
26+
"outputs": [],
27+
"source": [
28+
"import sedonadb\n",
29+
"\n",
30+
"sd = sedonadb.connect()"
31+
]
32+
},
33+
{
34+
"cell_type": "markdown",
35+
"id": "7aeaa60f-2325-418c-8e72-4344bd4a75fe",
36+
"metadata": {},
37+
"source": [
38+
"Now let’s see how to create SedonaDB DataFrames.\n",
39+
"\n",
40+
"## Create SedonaDB DataFrame\n",
41+
"\n",
42+
"**Manually creating SedonaDB DataFrame**\n",
43+
"\n",
44+
"Here’s how to manually create a SedonaDB DataFrame:"
45+
]
46+
},
47+
{
48+
"cell_type": "code",
49+
"execution_count": 3,
50+
"id": "b3377767-d747-407c-92c0-8786c1998131",
51+
"metadata": {},
52+
"outputs": [],
53+
"source": [
54+
"df = sd.sql(\"\"\"\n",
55+
"SELECT * FROM (VALUES\n",
56+
" ('one', ST_GeomFromWkt('POINT(1 2)')),\n",
57+
" ('two', ST_GeomFromWkt('POLYGON((-74.0 40.7, -74.0 40.8, -73.9 40.8, -73.9 40.7, -74.0 40.7))')),\n",
58+
" ('three', ST_GeomFromWkt('LINESTRING(-74.0060 40.7128, -73.9352 40.7306, -73.8561 40.8484)')))\n",
59+
"AS t(val, point)\"\"\")"
60+
]
61+
},
62+
{
63+
"cell_type": "markdown",
64+
"id": "0f9e1319-2e7a-4d98-9df0-47a9a73cfff3",
65+
"metadata": {},
66+
"source": [
67+
"Check the type of the DataFrame."
68+
]
69+
},
70+
{
71+
"cell_type": "code",
72+
"execution_count": 4,
73+
"id": "e8be30ab-4818-4db8-bae2-83e973ad1b77",
74+
"metadata": {},
75+
"outputs": [
76+
{
77+
"data": {
78+
"text/plain": [
79+
"sedonadb.dataframe.DataFrame"
80+
]
81+
},
82+
"execution_count": 4,
83+
"metadata": {},
84+
"output_type": "execute_result"
85+
}
86+
],
87+
"source": [
88+
"type(df)"
89+
]
90+
},
91+
{
92+
"cell_type": "markdown",
93+
"id": "8225ed1f-45a4-4915-a582-8ae191ec53ed",
94+
"metadata": {},
95+
"source": [
96+
"**Create SedonaDB DataFrame from files in S3**\n",
97+
"\n",
98+
"For most production applications, you will create SedonaDB DataFrames by reading data from a file. Let’s see how to read GeoParquet files in AWS S3 into a SedonaDB DataFrame."
99+
]
100+
},
101+
{
102+
"cell_type": "code",
103+
"execution_count": 5,
104+
"id": "151df287-4b2d-433e-9769-c3378df03b1b",
105+
"metadata": {},
106+
"outputs": [],
107+
"source": [
108+
"sd.read_parquet(\n",
109+
" \"s3://overturemaps-us-west-2/release/2025-08-20.0/theme=divisions/type=division_area/\",\n",
110+
" options={\"aws.skip_signature\": True, \"aws.region\": \"us-west-2\"},\n",
111+
").to_view(\"division_area\")"
112+
]
113+
},
114+
{
115+
"cell_type": "markdown",
116+
"id": "858fcc66-816d-4c71-8875-82b74169eccd",
117+
"metadata": {},
118+
"source": [
119+
"Let’s now run some spatial queries.\n",
120+
"\n",
121+
"**Read from GeoPandas DataFrame**\n",
122+
"\n",
123+
"This section shows how to convert a GeoPandas DataFrame into a SedonaDB DataFrame.\n",
124+
"\n",
125+
"Start by reading a FlatGeoBuf file into a GeoPandas DataFrame:"
126+
]
127+
},
128+
{
129+
"cell_type": "code",
130+
"execution_count": 12,
131+
"id": "b81549f2-0f58-49e4-9011-8de6578c2b0e",
132+
"metadata": {},
133+
"outputs": [],
134+
"source": [
135+
"import geopandas as gpd\n",
136+
"\n",
137+
"path = \"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities.fgb\"\n",
138+
"gdf = gpd.read_file(path)"
139+
]
140+
},
141+
{
142+
"cell_type": "markdown",
143+
"id": "2265f94b-ccb3-4634-8c52-a8799c68c76a",
144+
"metadata": {},
145+
"source": [
146+
"Now convert the GeoPandas DataFrame to a SedonaDB DataFrame and view three rows of content:"
147+
]
148+
},
149+
{
150+
"cell_type": "code",
151+
"execution_count": 7,
152+
"id": "0e4819db-bf58-42d7-8b5b-f272d0f19266",
153+
"metadata": {},
154+
"outputs": [
155+
{
156+
"name": "stdout",
157+
"output_type": "stream",
158+
"text": [
159+
"┌──────────────┬──────────────────────────────┐\n",
160+
"│ name ┆ geometry │\n",
161+
"│ utf8 ┆ geometry │\n",
162+
"╞══════════════╪══════════════════════════════╡\n",
163+
"│ Vatican City ┆ POINT(12.4533865 41.9032822) │\n",
164+
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
165+
"│ San Marino ┆ POINT(12.4417702 43.9360958) │\n",
166+
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
167+
"│ Vaduz ┆ POINT(9.5166695 47.1337238) │\n",
168+
"└──────────────┴──────────────────────────────┘\n"
169+
]
170+
}
171+
],
172+
"source": [
173+
"df = sd.create_data_frame(gdf)\n",
174+
"df.show(3)"
175+
]
176+
},
177+
{
178+
"cell_type": "markdown",
179+
"id": "6890bcc3-f3bd-4c47-bf86-2607bed5e480",
180+
"metadata": {},
181+
"source": [
182+
"## Spatial queries\n",
183+
"\n",
184+
"Let’s see how to run spatial operations like filtering, joins, and clustering algorithms.\n",
185+
"\n",
186+
"***Spatial filtering***\n",
187+
"\n",
188+
"Let’s run a spatial filtering operation to fetch all the objects in the following polygon:"
189+
]
190+
},
191+
{
192+
"cell_type": "code",
193+
"execution_count": 8,
194+
"id": "8c8a4b48-8c4e-412e-900f-8c0f6f4ccc1d",
195+
"metadata": {},
196+
"outputs": [
197+
{
198+
"name": "stdout",
199+
"output_type": "stream",
200+
"text": [
201+
"┌──────────┬──────────┬────────────────────────────────────────────────────────────────────────────┐\n",
202+
"│ country ┆ region ┆ geometry │\n",
203+
"│ utf8view ┆ utf8view ┆ geometry │\n",
204+
"╞══════════╪══════════╪════════════════════════════════════════════════════════════════════════════╡\n",
205+
"│ CA ┆ CA-NS ┆ POLYGON((-66.0528452 43.4531336,-66.0883401 43.3978188,-65.9647654 43.361… │\n",
206+
"├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
207+
"│ CA ┆ CA-NS ┆ POLYGON((-66.0222822 43.5166842,-66.0252286 43.5100071,-66.0528452 43.453… │\n",
208+
"├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
209+
"│ CA ┆ CA-NS ┆ POLYGON((-65.7451389 43.5336263,-65.7450818 43.5347004,-65.7449545 43.535… │\n",
210+
"└──────────┴──────────┴────────────────────────────────────────────────────────────────────────────┘\n"
211+
]
212+
}
213+
],
214+
"source": [
215+
"nova_scotia_bbox_wkt = (\n",
216+
" \"POLYGON((-66.5 43.4, -66.5 47.1, -59.8 47.1, -59.8 43.4, -66.5 43.4))\"\n",
217+
")\n",
218+
"\n",
219+
"ns = sd.sql(f\"\"\"\n",
220+
"SELECT country, region, geometry\n",
221+
"FROM division_area\n",
222+
"WHERE ST_Intersects(geometry, ST_SetSRID(ST_GeomFromText('{nova_scotia_bbox_wkt}'), 4326))\n",
223+
"\"\"\")\n",
224+
"\n",
225+
"ns.show(3)"
226+
]
227+
},
228+
{
229+
"cell_type": "markdown",
230+
"id": "32076e01-d807-40ed-8457-9d8c4244e89f",
231+
"metadata": {},
232+
"source": [
233+
"You can see it only includes the divisions in the Nova Scotia area. Skip to the visualization section to see how this data can be graphed on a map.\n",
234+
"\n",
235+
"***K-nearest neighbors (KNN) joins***\n",
236+
"\n",
237+
"Create `restaurants` and `customers` tables so we can demonstrate the KNN join functionality."
238+
]
239+
},
240+
{
241+
"cell_type": "code",
242+
"execution_count": 9,
243+
"id": "deaa36db-2fee-4ba2-ab79-1dc756cb1655",
244+
"metadata": {},
245+
"outputs": [],
246+
"source": [
247+
"df = sd.sql(\"\"\"\n",
248+
"SELECT name, ST_Point(lng, lat) AS location\n",
249+
"FROM (VALUES \n",
250+
" (101, -74.0, 40.7, 'Pizza Palace'),\n",
251+
" (102, -73.99, 40.69, 'Burger Barn'),\n",
252+
" (103, -74.02, 40.72, 'Taco Town'),\n",
253+
" (104, -73.98, 40.75, 'Sushi Spot'),\n",
254+
" (105, -74.05, 40.68, 'Deli Direct')\n",
255+
") AS t(id, lng, lat, name)\n",
256+
"\"\"\")\n",
257+
"sd.sql(\"drop view if exists restaurants\")\n",
258+
"df.to_view(\"restaurants\")\n",
259+
"\n",
260+
"df = sd.sql(\"\"\"\n",
261+
"SELECT name, ST_Point(lng, lat) AS location\n",
262+
"FROM (VALUES \n",
263+
" (1, -74.0, 40.7, 'Alice'),\n",
264+
" (2, -73.9, 40.8, 'Bob'),\n",
265+
" (3, -74.1, 40.6, 'Carol')\n",
266+
") AS t(id, lng, lat, name)\n",
267+
"\"\"\")\n",
268+
"sd.sql(\"drop view if exists customers\")\n",
269+
"df.to_view(\"customers\")"
270+
]
271+
},
272+
{
273+
"cell_type": "code",
274+
"execution_count": 10,
275+
"id": "e3bc4976-4245-432f-b265-7f6aa13f35b9",
276+
"metadata": {},
277+
"outputs": [
278+
{
279+
"name": "stdout",
280+
"output_type": "stream",
281+
"text": [
282+
"┌───────┬───────────────────┐\n",
283+
"│ name ┆ location │\n",
284+
"│ utf8 ┆ geometry │\n",
285+
"╞═══════╪═══════════════════╡\n",
286+
"│ Alice ┆ POINT(-74 40.7) │\n",
287+
"├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
288+
"│ Bob ┆ POINT(-73.9 40.8) │\n",
289+
"├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
290+
"│ Carol ┆ POINT(-74.1 40.6) │\n",
291+
"└───────┴───────────────────┘\n"
292+
]
293+
}
294+
],
295+
"source": [
296+
"df.show()"
297+
]
298+
},
299+
{
300+
"cell_type": "markdown",
301+
"id": "9df227d6-0972-457a-87e3-5a89802c460f",
302+
"metadata": {},
303+
"source": [
304+
"Perform a KNN join to identify the two restaurants that are nearest to each customer:"
305+
]
306+
},
307+
{
308+
"cell_type": "code",
309+
"execution_count": 11,
310+
"id": "05565e15-ee18-431c-8fd2-673291d8d0ee",
311+
"metadata": {},
312+
"outputs": [
313+
{
314+
"name": "stdout",
315+
"output_type": "stream",
316+
"text": [
317+
"┌──────────┬──────────────┐\n",
318+
"│ customer ┆ restaurant │\n",
319+
"│ utf8 ┆ utf8 │\n",
320+
"╞══════════╪══════════════╡\n",
321+
"│ Alice ┆ Burger Barn │\n",
322+
"├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
323+
"│ Alice ┆ Pizza Palace │\n",
324+
"├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
325+
"│ Bob ┆ Pizza Palace │\n",
326+
"├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
327+
"│ Bob ┆ Sushi Spot │\n",
328+
"├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
329+
"│ Carol ┆ Deli Direct │\n",
330+
"├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
331+
"│ Carol ┆ Pizza Palace │\n",
332+
"└──────────┴──────────────┘\n"
333+
]
334+
}
335+
],
336+
"source": [
337+
"sd.sql(\"\"\"\n",
338+
"SELECT\n",
339+
" c.name AS customer,\n",
340+
" r.name AS restaurant\n",
341+
"FROM customers c, restaurants r\n",
342+
"WHERE ST_KNN(c.location, r.location, 2, false)\n",
343+
"ORDER BY c.name, r.name;\n",
344+
"\"\"\").show()"
345+
]
346+
},
347+
{
348+
"cell_type": "markdown",
349+
"id": "2e93fe6a-b0a7-4ec0-952c-dde9edcacdc4",
350+
"metadata": {},
351+
"source": [
352+
"Notice how each customer has two rows - one for each of the two closest restaurants.\n",
353+
"\n",
354+
"## Files\n",
355+
"\n",
356+
"You can read GeoParquet files with SedonaDB, see the following example:\n",
357+
"\n",
358+
"```python\n",
359+
"df = sd.read_parquet(\"some_file.parquet\")\n",
360+
"```\n",
361+
"\n",
362+
"Once you read the file, you can easily expose it as a view and query it with spatial SQL, as we demonstrated in the example above."
363+
]
364+
}
365+
],
366+
"metadata": {
367+
"kernelspec": {
368+
"display_name": "Python 3 (ipykernel)",
369+
"language": "python",
370+
"name": "python3"
371+
},
372+
"language_info": {
373+
"codemirror_mode": {
374+
"name": "ipython",
375+
"version": 3
376+
},
377+
"file_extension": ".py",
378+
"mimetype": "text/x-python",
379+
"name": "python",
380+
"nbconvert_exporter": "python",
381+
"pygments_lexer": "ipython3",
382+
"version": "3.12.4"
383+
}
384+
},
385+
"nbformat": 4,
386+
"nbformat_minor": 5
387+
}
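
The KNN join result in the notebook can be cross-checked without SedonaDB. The following is a minimal brute-force sketch in plain Python, assuming planar (Euclidean) distance on the raw longitude/latitude values, which is what the final `false` argument to `ST_KNN` appears to select. The helper name `knn_join` is hypothetical and not part of the SedonaDB API; the coordinates are copied from the `restaurants` and `customers` cells.

```python
import math

# Coordinates copied from the notebook's `restaurants` and `customers` cells.
restaurants = {
    "Pizza Palace": (-74.0, 40.7),
    "Burger Barn": (-73.99, 40.69),
    "Taco Town": (-74.02, 40.72),
    "Sushi Spot": (-73.98, 40.75),
    "Deli Direct": (-74.05, 40.68),
}
customers = {
    "Alice": (-74.0, 40.7),
    "Bob": (-73.9, 40.8),
    "Carol": (-74.1, 40.6),
}


def knn_join(queries, candidates, k):
    """Hypothetical helper: pair each query point with its k nearest candidates.

    Uses planar Euclidean distance (an assumption, not SedonaDB's implementation).
    """
    pairs = []
    for q_name, q_xy in queries.items():
        # Rank every candidate by distance to this query point and keep the first k.
        nearest = sorted(
            candidates,
            key=lambda c_name: math.dist(q_xy, candidates[c_name]),
        )[:k]
        pairs.extend((q_name, c_name) for c_name in nearest)
    # Mirror the notebook's ORDER BY c.name, r.name.
    return sorted(pairs)


for customer, restaurant in knn_join(customers, restaurants, k=2):
    print(customer, restaurant)
```

This sketch compares every customer with every restaurant, so it only suits small inputs; it is meant to make the join semantics concrete, not to describe how SedonaDB executes the query.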

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ site_description: "Documentation for Apache SedonaDB"
33
nav:
44
- Home: index.md
55
- SedonaDB Guides:
6+
- SedonaDB Guide: programming-guide.ipynb
67
- CLI Quickstart: quickstart-cli.md
78
- Python Quickstart: quickstart-python.ipynb
89
- Development: development.md
