-
Notifications
You must be signed in to change notification settings - Fork 6
Expand file tree
/
Copy pathspatialdataDIY.qmd
More file actions
306 lines (214 loc) · 8.84 KB
/
spatialdataDIY.qmd
File metadata and controls
306 lines (214 loc) · 8.84 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
# Do-It-Yourself {#sec-spatial-data-DIY .unnumbered}
In this session, we will practice your skills in mapping with `R` and `Python`. Create a new quarto or jupyter notebook document you can edit interactively, and let's do this!
::: {.panel-tabset group="language"}
## R
```{r}
#| warning: false
#| message: false
library(dplyr)
library(ggplot2)
library(sf)
library(viridis)
library(viridis)
library(osmdata)
```
## Python
```{python}
#| warning: true
import geopandas as gpd
import pandas as pd
import osmnx as ox
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from matplotlib import cm
from matplotlib.ticker import FuncFormatter
```
:::
## Data preparation
### Polygons
For this section, you will have to push yourself out of the comfort zone when it comes to sourcing the data. As nice as it is to be able to pull a dataset directly from the web at the stroke of a url address, most real-world cases are not that straight forward. Instead, you usually have to download a dataset manually and store it locally on your computer before you can get to work.
We are going to use data from the Consumer Data Research Centre (CDRC) about Liverpool, in particular an extract from the Census. You can download a copy of the data at:
::: callout-important
You will need a username and password to download the data. Create it for free at:
<https://data.cdrc.ac.uk/user/register>
Then [download](https://data.cdrc.ac.uk/system/files/Census_Residential_Data_Pack_2011/Census_Residential_Data_Pack_2011_E08000012.zip) the Liverpool Census'11 Residential data pack
:::
Once you have the `.zip` file on your computer, right-click and "Extract all". The resulting folder will contain all you need. Create a folder called `Liverpool` in `data` folder you created in the first Lab.
::: {.panel-tabset group="language"}
## R
```{r}
#| message: false
library(sf)
lsoas <- read_sf("data/Liverpool/Census_Residential_Data_Pack_2011/Local_Authority_Districts/E08000012/shapefiles/E08000012.shp")
```
## Python
```{python}
lsoas = gpd.read_file("data/Liverpool/Census_Residential_Data_Pack_2011/Local_Authority_Districts/E08000012/shapefiles/E08000012.shp")
```
:::
### Lines
For a line layer, we are going to use a different bit of `osmdata` functionality that will allow us to extract all the highways. Note the code cell below requires internet connectivity.
::: {.panel-tabset group="language"}
## R
```{r roads query}
highway <- opq("Liverpool, U.K.") %>%
add_osm_feature(key = "highway",
value = c("primary", "secondary", "tertiary")) %>%
osmdata_sf()
ggplot() +
geom_sf(data = highway$osm_lines, color = 'darkorange') + theme_minimal()
```
## Python
```{python}
tags = {"highway": True} #OSM tags
roads = ox.features_from_address("Liverpool, United Kingdom", tags = tags, dist = 2000)
roads = roads.reset_index()
# sometimes building footprints are represented by Points, let's disregard them
roads = roads[roads.geometry.geom_type == 'LineString']
fig, ax = plt.subplots(1, 1, figsize=(15, 10))
ax.set_title("Roads in Liverpool")
ax.set_axis_off() # we don't need the ticks function
# only roads within the extent of the buildings layer
roads.plot(ax=ax, color = 'darkorange', lw = 0.5) #linewidth can be also passed as lw
# Display the plot
plt.show()
```
:::
### Points
For points, we will find some POI (Points of Interest) : pubs in Liverpool, as recorded by OpenStreetMap. Note the code cell below requires internet connectivity.
::: {.panel-tabset group="language"}
## R
```{r bar query}
bars <- opq("Liverpool, U.K.") %>%
add_osm_feature(key = "amenity",
value = c("bar")) %>%
osmdata_sf()
ggplot() +
geom_sf(data = bars$osm_points) + theme_minimal()
```
## Python
```{python}
query = "Liverpool, United Kingdom"
bars = ox.features_from_place(query, tags={"amenity": ["bar"]})
bars.plot()
plt.show()
```
:::
## Tasks
### Task I: Tweak your map
With those three layers, try to complete the following tasks:
- Make a map of the Liverpool neighborhoods that includes the following characteristics:
1. Features a title
2. Does not include axes frame
3. Polygons are all in color `#525252` and 50% transparent
4. Boundary lines ("edges") have a width of 0.3 and are of color `#B9EBE3`
5. Includes a basemap different from the one used in class
::: callout-note
Not all of the requirements above are not equally hard to achieve. If you can get some but not all of them, that's also great! The point is you learn something every time you try.
:::
### Task II: Non-spatial manipulations
For this one we will combine some of the ideas we learnt in the previous block with this one.
Focus on the LSOA liverpool layer and use it to do the following:
1. Calculate the area of each neighbourhood
2. Find the five smallest areas in the table. Create a new object (e.g. smallest with them only)
3. Create a multi-layer map of Liverpool where the five smallest areas are coloured in red, and the rest appear in grey.
### Task III: Average price per district
::: {.panel-tabset group="language"}
## R
```{r}
districts <- read_sf("data/London/Polygons/districts.shp")
housesales <- read.csv("data/London/Tables/housesales.csv") # import housesales data from csv
housesales_clean = housesales %>%
filter(price < 500000) %>%
st_as_sf(coords = c(17,18)) %>%
st_set_crs(27700)
```
## Python
```{python}
districts = gpd.read_file("data/London/Polygons/districts.shp")
housesales = pd.read_csv("data/London/Tables/housesales.csv")
housesales_f = housesales[housesales['price'] < 500000]
housesales_gdf = gpd.GeoDataFrame(
housesales_f, geometry=gpd.points_from_xy(housesales_f.greastings, housesales_f.grnorthing), crs="EPSG:27700"
)
```
:::
This one is a bit more advanced, so don't despair if you can't get it on your first try. It relies on the London data you used in the Lab. Here is the questions for you to answer:
**What is the district with the highest housing prices in London?**
Answering this questions involve 3 steps:
1\. Performing a spatial join (`st_join`) between the district layer (polygons) and the households (points).
2\. Aggregating the data at district level: `group_by` & `summarise()`
3\. Figure out the district with the highest price
**Really try** not to open the **answer** below right away.
::: {.callout-tip collapse="true"}
## R Answer
Spatial overlay between points and polygons
```{r spatial join}
housesales_districts <- st_join(districts, housesales_clean)
```
Aggregate at district level
```{r aggregate}
housesales_districts_agg <- housesales_districts %>%
group_by(DIST_CODE, DIST_NAME) %>% # group at district level
summarise(count_sales = n(), # create count
mean_price = mean(price)) # average price
head(housesales_districts_agg)
```
:::
::: {.callout-tip collapse="true"}
## Python Answer
```{python}
# Spatial join
housesales_districts = housesales_gdf.sjoin(districts, how="inner", predicate='intersects')
# Aggregate at district level
housesales_districts_agg = housesales_districts.groupby(['DIST_CODE', 'DIST_NAME']).agg(
count_sales=('price', 'size'), # count number of sales
mean_price=('price', 'mean') # calculate average price
).reset_index()
# Merge the aggregated data back with the original GeoDataFrame to retain geometry
housesales_districts_agg = districts[['DIST_CODE', 'geometry']].drop_duplicates().merge(
housesales_districts_agg, on='DIST_CODE'
)
```
:::
Once that's done, create a map of the data
::: {.callout-tip collapse="true"}
## R Answer
Use `ggplot` if you are working in `R` and if you're feeling adventurous the function `scale_fill_viridis()` to make your map look especially good.
```{r mapoutput2}
# map housesales by wards
map3 <- ggplot()+
geom_sf(data = housesales_districts_agg, inherit.aes = FALSE, aes(fill = mean_price)) + # add the district level housing price
scale_fill_viridis("Price", direction = -1, labels = scales::dollar_format(prefix = "£"), option = "magma" )+ # change the legend scale to £ and the colour to magma
xlab("") +
ylab("") +
theme_minimal() # choose a nicer theme https://ggplot2.tidyverse.org/reference/ggtheme.html
map3
```
:::
::: {.callout-tip collapse="true"}
## Python Answer
```{python}
# Function to format currency with £
def currency(x, pos):
return f'£{int(x):,}'
# Create a colormap similar to Viridis magma
cmap = plt.colormaps.get_cmap('magma_r')
# Plotting the map
fig, ax = plt.subplots(figsize=(10, 10))
# Plot the district level housing price
housesales_districts_agg.plot(column='mean_price', ax=ax, legend=True,
cmap=cmap, edgecolor='black')
# Modify the legend scale to £
formatter = FuncFormatter(currency)
cbar = ax.get_figure().get_axes()[1] # Get the color bar
cbar.yaxis.set_major_formatter(formatter)
# Remove x and y axis labels
ax.set_xlabel('')
ax.set_ylabel('')
# Set theme to minimal
ax.set_axis_off()
# Show the plot
plt.show()
```
:::