Commit 824328d: Merge pull request #71 from VIDA-NYU/feat/line_polygon_geometries

feat(loader): Add Support for geometry_column in UrbanMapper Loaders

2 parents: a840447 + fbbeb90

45 files changed: +1004 −327 lines (only a subset of the file diffs is shown below)

docs/api/filters.md (1 addition, 1 deletion)

@@ -2,7 +2,7 @@
 
 !!! tip "What is the Filter module?"
     The `filter` module is responsible for filtering geospatial datasets based on specific criteria or conditions out
-    of your `urban layer`.
+    of your `urban layer`, based on information from latitude-longitude data columns or geometry specified in [WKT format](https://libgeos.org/specifications/wkt/).
 
 Meanwhile, we recommend to look through the [`Example`'s Filter](../copy_of_examples/1-Per-Module/4-filter/) for a more hands-on introduction about
 the Filter module and its usage.

docs/api/imputers.md (1 addition, 1 deletion)

@@ -1,7 +1,7 @@
 # Imputers
 
 !!! tip "What is the Imputer module?"
-    The `imputer` module is responsible for handling missing data in geospatial datasets.
+    The `imputer` module is responsible for handling missing data in geospatial datasets, dealing with missing information in latitude-longitude data columns or geometry specified in [WKT format](https://libgeos.org/specifications/wkt/).
 
 Meanwhile, we recommend to look through the [`Example`'s Imputer](../copy_of_examples/1-Per-Module/3-imputer/) for a more hands-on introduction about
 the Imputer module and its usage.

docs/api/loaders.md (1 addition, 1 deletion)

@@ -4,9 +4,9 @@
     The `loader` module is responsible for loading geospatial data into `UrbanMapper`.
     It provides a unified interface for loading various data formats, including `shapefiles`, `parquet`, and `CSV` files
     with geospatial information.
-
     `UrbanMapper` steps support using multiple datasets. The user can create multiple loader instances, one for each dataset,
     combine them in a single dictionary with suitable keys, and use it in your pipeline.
+    Besides, geolocation can be loaded from latitude-longitude data columns or geometry specified in [WKT format](https://libgeos.org/specifications/wkt/).
 
 Meanwhile, we recommend to look through the [`Example`'s Loader](../copy_of_examples/1-Per-Module/1-loader/) for a more hands-on introduction about
 the Loader module and its usage.
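To make the new option concrete: a loader that accepts either latitude-longitude columns or a WKT `geometry_column` ultimately has to turn each record into coordinates. The sketch below is illustrative only (it is not UrbanMapper's implementation, and `coords_from_record` is a hypothetical helper); real loaders would delegate WKT parsing to shapely/GEOS, which also handle `LINESTRING` and `POLYGON` geometries.

```python
# Minimal sketch: derive (longitude, latitude) from either lat/lon columns
# or a WKT geometry column such as "POINT (-73.98 40.75)".
import re

def coords_from_record(record, longitude_column=None, latitude_column=None,
                       geometry_column=None):
    """Return a (longitude, latitude) pair from one data record."""
    if geometry_column is not None:
        # Toy WKT POINT parser; a real loader uses shapely/GEOS instead,
        # which supports the full WKT grammar (lines, polygons, ...).
        match = re.match(r"POINT\s*\(\s*(-?[\d.]+)\s+(-?[\d.]+)\s*\)",
                         record[geometry_column])
        if match is None:
            raise ValueError("unsupported or malformed WKT geometry")
        return float(match.group(1)), float(match.group(2))
    return float(record[longitude_column]), float(record[latitude_column])

row_latlon = {"longitude": "-73.98", "latitude": "40.75"}
row_wkt = {"WKT Geometry": "POINT (-73.98 40.75)"}
print(coords_from_record(row_latlon, "longitude", "latitude"))        # (-73.98, 40.75)
print(coords_from_record(row_wkt, geometry_column="WKT Geometry"))    # (-73.98, 40.75)
```

Either input style yields the same coordinate pair, which is why the two `with_columns(...)` variants in the diffs below are interchangeable.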

docs/api/urban_layers.md (1 addition, 0 deletions)

@@ -4,6 +4,7 @@
     The `urban_layer` module is responsible for the spatial canvases on which your datasets are displayed. These layers
     provide structure for urban insights, such as mapping taxi trips to busy `intersections` or analysing `neighbourhood`
     demographics.
+    A dataset is projected over an `urban_layer` based on latitude-longitude data columns or geometry specified in [WKT format](https://libgeos.org/specifications/wkt/).
 
 Meanwhile, we recommend to look through the [`Example`'s Urban layer](../copy_of_examples/1-Per-Module/2-urban_layer/) for a more hands-on introduction about
 the Urban layer module and its usage.

docs/copy_of_examples/1-Per-Module/1-loader.ipynb (37 additions, 3 deletions)

@@ -43,7 +43,7 @@
 "source": [
 "## Loading CSV Data\n",
 "\n",
-"First up, let’s load a CSV file with PLUTO data. We’ll tell UrbanMapper where to find the longitude and latitude columns so it knows what’s what and can make sure those colums are well formatted prior any analysis.\n",
+"First up, let’s load a CSV file with PLUTO data. We’ll tell UrbanMapper where to find the longitude-latitude or geometry columns so it knows what’s what and can make sure those columns are well formatted prior to any analysis.\n",
 "\n",
 "Note that below we employ a given csv, but you can put your own path, try it out!"
 ]
@@ -59,6 +59,7 @@
 " .loader # From the loader module\n",
 " .from_file(\"<path_to>/pluto.csv\") # To update with your own path\n",
 " .with_columns(longitude_column=\"longitude\", latitude_column=\"latitude\") # Inform your long and lat columns\n",
+"# .with_columns(geometry_column=\"<geometry_column_name>\") # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
 ")\n",
 "\n",
 "gdf = csv_loader.load() # Load the data and create a geodataframe's instance\n",
@@ -87,6 +88,7 @@
 " loader. # From the loader module\n",
 " from_file(\"<path_to>/taxisvis5M.parquet\") # To update with your own path\n",
 " .with_columns(\"pickup_longitude\", \"pickup_latitude\") # Inform your long and lat columns\n",
+"# .with_columns(geometry_column=\"<geometry_column_name>\") # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
 ")\n",
 "\n",
 "gdf = parquet_loader.load() # Load the data and create a geodataframe's instance\n",
@@ -163,6 +165,7 @@
 " .loader # From the loader module\n",
 " .from_dataframe(df) # To update with your dataframe\n",
 " .with_columns(longitude_column=\"longitude\", latitude_column=\"latitude\") # Inform your long and lat columns\n",
+"# .with_columns(geometry_column=\"<geometry_column_name>\") # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
 ")\n",
 "\n",
 "gdf = df_loader.load() # Load the data and create a geodataframe's instance\n",
@@ -187,7 +190,30 @@
 "outputs": [],
 "source": [
 "# Load a full dataset directly from Hugging Face\n",
-"loader = mapper.loader.from_huggingface(\"oscur/pluto\", number_of_rows=100).with_columns(longitude_column=\"longitude\", latitude_column=\"latitude\")\n",
+"loader = (\n",
+" mapper\n",
+" .loader\n",
+" .from_huggingface(\"oscur/pluto\", number_of_rows=100)\n",
+" .with_columns(longitude_column=\"longitude\", latitude_column=\"latitude\")\n",
+"# .with_columns(geometry_column=\"<geometry_column_name>\") # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
+")\n",
+"gdf = loader.load()\n",
+"gdf # Next steps: analyze or visualize the data"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"# Load a dataset with a WKT geometry column directly from Hugging Face\n",
+"loader = (\n",
+" mapper\n",
+" .loader\n",
+" .from_huggingface(\"oscur/NYC_raised_crosswalk\", number_of_rows=100)\n",
+" .with_columns(geometry_column=\"WKT Geometry\") # Inform your geometry column instead of longitude and latitude columns.\n",
+")\n",
 "gdf = loader.load()\n",
 "gdf # Next steps: analyze or visualize the data"
 ]
@@ -224,12 +250,20 @@
 "outputs": [],
 "source": [
 "# Load datasets directly from Hugging Face\n",
-"pluto_data = mapper.loader.from_huggingface(\"oscur/pluto\", number_of_rows=100).with_columns(longitude_column=\"longitude\", latitude_column=\"latitude\").load()\n",
+"pluto_data = (\n",
+" mapper\n",
+" .loader\n",
+" .from_huggingface(\"oscur/pluto\", number_of_rows=100)\n",
+" .with_columns(longitude_column=\"longitude\", latitude_column=\"latitude\")\n",
+"# .with_columns(geometry_column=\"<geometry_column_name>\") # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
+" .load()\n",
+")\n",
 "taxi_data = (\n",
 " mapper\n",
 " .loader\n",
 " .from_huggingface(\"oscur/taxisvis1M\", number_of_rows=100)\n",
 " .with_columns(longitude_column=\"pickup_longitude\", latitude_column=\"pickup_latitude\")\n",
+"# .with_columns(geometry_column=\"<geometry_column_name>\") # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
 " .with_map({\"pickup_longitude\": \"longitude\", \"pickup_latitude\": \"latitude\"}) ## Routines like layer.map_nearest_layer needs datasets with the same longitude_column and latitude_column\n",
 " .load()\n",
 ")\n",

docs/copy_of_examples/1-Per-Module/3-imputer.ipynb (14 additions, 5 deletions)

@@ -48,9 +48,12 @@
 "data = (\n",
 " mapper\n",
 " .loader\n",
-" .from_huggingface(\"oscur/pluto\", number_of_rows=20000, streaming=True).with_columns(\"longitude\", \"latitude\").load()\n",
-" # From the loader module, from the following file within the OSCUR HuggingFace datasets hub and with the `longitude` and `latitude`\n",
-")"
+" .from_huggingface(\"oscur/pluto\", number_of_rows=20000, streaming=True)\n",
+" .with_columns(\"longitude\", \"latitude\")\n",
+"# .with_columns(geometry_column=\"<geometry_column_name>\") # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
+" .load()\n",
+" # From the loader module, from the following file within the OSCUR HuggingFace datasets hub and with the `longitude` and `latitude` or only with `geometry`\n",
+")"
 ]
 },
 {
@@ -88,6 +91,7 @@
 " .imputer # From the imputer module\n",
 " .with_type(\"SimpleGeoImputer\") # With the type SimpleGeoImputer\n",
 " .on_columns(longitude_column=\"longitude\", latitude_column=\"latitude\") # On the columns longitude and latitude\n",
+"# .on_columns(geometry_column=\"<geometry_column_name>\") # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
 " .transform(data, layer) # All imputers require access to the urban layer in case they need to extract information from it.\n",
 ")\n",
 "\n",
@@ -136,8 +140,11 @@
 "data1 = (\n",
 " mapper\n",
 " .loader\n",
-" .from_huggingface(\"oscur/pluto\", number_of_rows=1000, streaming=True).with_columns(\"longitude\", \"latitude\").load()\n",
-" # From the loader module, from the following file and with the `longitude` and `latitude`\n",
+" .from_huggingface(\"oscur/pluto\", number_of_rows=1000, streaming=True)\n",
+" .with_columns(\"longitude\", \"latitude\")\n",
+"# .with_columns(geometry_column=\"<geometry_column_name>\") # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
+" .load()\n",
+" # From the loader module, from the following file and with the `longitude` and `latitude` or only `geometry`\n",
 ")\n",
 "\n",
 "# Load Parquet data\n",
@@ -146,6 +153,7 @@
 " .loader\n",
 " .from_huggingface(\"oscur/taxisvis1M\", number_of_rows=1000, streaming=True) # To update with your own path\n",
 " .with_columns(\"pickup_longitude\", \"pickup_latitude\").load() # Inform your long and lat columns\n",
+"# .with_columns(geometry_column=\"<geometry_column_name>\") # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
 ")\n",
 "\n",
 "data = {\n",
@@ -161,6 +169,7 @@
 " .imputer # From the imputer module\n",
 " .with_type(\"SimpleGeoImputer\") # With the type SimpleGeoImputer\n",
 " .on_columns(longitude_column=\"longitude\", latitude_column=\"latitude\") # On the columns longitude and latitude\n",
+"# .on_columns(geometry_column=\"<geometry_column_name>\") # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
 " .with_data(data_id=\"pluto_data\") # On a specific data from the dictionary\n",
 " .transform(data, layer) # All imputers require access to the urban layer in case they need to extract information from it.\n",
 ")"

docs/copy_of_examples/1-Per-Module/4-filter.ipynb (3 additions, 3 deletions)

@@ -50,7 +50,7 @@
 " mapper\n",
 " .loader\n",
 " .from_huggingface(\"oscur/pluto\", number_of_rows=5000, streaming=True).with_columns(\"longitude\", \"latitude\").load()\n",
-" # From the loader module, from the following file within the HuggingFace OSCUR datasets hub and with the `longitude` and `latitude`\n",
+" # From the loader module, from the following file within the HuggingFace OSCUR datasets hub and with the `longitude` and `latitude` or only `geometry`\n",
 ")\n",
 "\n",
 "# Create urban layer\n",
@@ -127,15 +127,15 @@
 " mapper\n",
 " .loader\n",
 " .from_huggingface(\"oscur/pluto\", number_of_rows=1000, streaming=True).with_columns(\"longitude\", \"latitude\").load()\n",
-" # From the loader module, from the following file and with the `longitude` and `latitude`\n",
+" # From the loader module, from the following file and with the `longitude` and `latitude` or only `geometry`\n",
 ")\n",
 "\n",
 "# Load Parquet data\n",
 "data2 = (\n",
 " mapper\n",
 " .loader\n",
 " .from_huggingface(\"oscur/taxisvis1M\", number_of_rows=1000, streaming=True) # To update with your own path\n",
-" .with_columns(\"pickup_longitude\", \"pickup_latitude\").load() # Inform your long and lat columns\n",
+" .with_columns(\"pickup_longitude\", \"pickup_latitude\").load() # Inform your long and lat columns or only geometry\n",
 ")\n",
 "\n",
 "data = {\n",

docs/copy_of_examples/1-Per-Module/5-enricher.ipynb (11 additions, 7 deletions)

@@ -54,7 +54,7 @@
 " mapper\n",
 " .loader\n",
 " .from_huggingface(\"oscur/pluto\", number_of_rows=5000, streaming=True).with_columns(\"longitude\", \"latitude\").load()\n",
-" # From the loader module, from the following file within the HuggingFace OSCUR datasets hub and with the `longitude` and `latitude`\n",
+" # From the loader module, from the following file within the HuggingFace OSCUR datasets hub and with the `longitude` and `latitude` or only `geometry`\n",
 ")\n",
 "\n",
 "# Create urban layer\n",
@@ -98,8 +98,8 @@
 "# so that we can take into account when enriching.\n",
 "_, mapped_data = layer.map_nearest_layer(\n",
 " data,\n",
-" longitude_column=\"longitude\",\n",
-" latitude_column=\"latitude\",\n",
+" longitude_column=\"longitude\", latitude_column=\"latitude\",\n",
+"# geometry_column=\"<geometry_column_name>\", # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
 " output_column=\"nearest_intersection\", # Will create this column in the data, so that we can re-use that throughout the enriching process below.\n",
 ")\n",
 "\n",
@@ -182,8 +182,11 @@
 "data1 = (\n",
 " mapper\n",
 " .loader\n",
-" .from_huggingface(\"oscur/pluto\", number_of_rows=1000, streaming=True).with_columns(\"longitude\", \"latitude\").load()\n",
-" # From the loader module, from the following file and with the `longitude` and `latitude`\n",
+" .from_huggingface(\"oscur/pluto\", number_of_rows=1000, streaming=True)\n",
+" .with_columns(\"longitude\", \"latitude\")\n",
+"# .with_columns(geometry_column=\"<geometry_column_name>\") # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
+" .load()\n",
+" # From the loader module, from the following file and with the `longitude` and `latitude` or only `geometry`\n",
 ")\n",
 "\n",
 "# Load Parquet data\n",
@@ -192,6 +195,7 @@
 " .loader\n",
 " .from_huggingface(\"oscur/taxisvis1M\", number_of_rows=1000, streaming=True) # To update with your own path\n",
 " .with_columns(\"pickup_longitude\", \"pickup_latitude\") # Inform your long and lat columns\n",
+"# .with_columns(geometry_column=\"<geometry_column_name>\") # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
 " .with_map({\"pickup_longitude\": \"longitude\", \"pickup_latitude\": \"latitude\"}) ## Routines like layer.map_nearest_layer needs datasets with the same longitude_column and latitude_column\n",
 " .load()\n",
 ")\n",
@@ -215,8 +219,8 @@
 "# so that we can take into account when enriching.\n",
 "_, mapped_data = layer.map_nearest_layer(\n",
 " data,\n",
-" longitude_column=\"longitude\",\n",
-" latitude_column=\"latitude\",\n",
+" longitude_column=\"longitude\", latitude_column=\"latitude\",\n",
+"# geometry_column=\"<geometry_column_name>\", # Replace <geometry_column_name> with the actual name of your geometry column instead of latitude and longitude columns.\n",
 " output_column=\"nearest_intersection\", # Will create this column in the data, so that we can re-use that throughout the enriching process below.\n",
 ")\n",
 "\n",
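The enricher cells above call `layer.map_nearest_layer(...)`, which writes into `output_column` the identity of the layer feature closest to each data point. The sketch below conveys that idea only; the function name, signature, and linear scan are illustrative assumptions, not UrbanMapper's API (a real implementation would project geometries and use a spatial index rather than a brute-force search):

```python
# Hedged sketch of a map_nearest_layer-style routine: assign each record the
# id of the nearest layer feature (e.g. a street intersection).
import math

def map_nearest(records, layer_points, longitude_column="longitude",
                latitude_column="latitude", output_column="nearest_intersection"):
    """layer_points maps feature id -> (lon, lat); each record gains output_column."""
    for record in records:
        lon, lat = record[longitude_column], record[latitude_column]
        # Brute-force nearest neighbour; production code would use an R-tree.
        record[output_column] = min(
            layer_points,
            key=lambda fid: math.hypot(layer_points[fid][0] - lon,
                                       layer_points[fid][1] - lat),
        )
    return records

intersections = {"A": (-73.99, 40.75), "B": (-73.95, 40.78)}
trips = [{"longitude": -73.991, "latitude": 40.751}]
print(map_nearest(trips, intersections)[0]["nearest_intersection"])  # A
```

With a `geometry_column` instead of lat/lon columns, the same idea applies after reducing each geometry to a representative point (or measuring point-to-geometry distance directly).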
