See [openstreetmap.org](https://www.openstreetmap.org/#map=19/53.80689/-1.55637) or search for other open access datasets for more ideas

-<!-- 2. Work through the transport chapter of Geocomputation with R: https://r.geocompx.org/transport.html -->

-<!-- See https://github.com/ITSLeeds/TDS/blob/master/practicals/2-software.md -->
-
-<!-- - In terms of future work in an evolving job market? -->
-
-<!-- - In terms of the kinds of problems you want to solve? -->
-
-<!-- ## Sketching research methods (in groups of 2-4, 30 minutes) -->
-
-<!-- Starting with the 1000 'desire lines' dataset of Leeds, sketch-out some research ideas that cover -->
-
-<!-- 1) Hypotheses: generate two hypotheses that are falsifiable and 2 hypotheses that are not falsifiable -->
-
-<!-- 2) Input data: draw schematic representations of additional datasets that you could use alongside the desire lines dataset, with at least one at each of these levels: -->
-
-<!-- - Zones -->
-
-<!-- - Points -->
-
-<!-- - Routes -->
-
-<!-- - Route networks -->
-
-<!-- - Individual -->
-
-<!-- What temporal and spatial resolution could each one have? -->
-
-<!-- 3) Methods: using a flow diagram (e.g. as shown below) -->
-<!-- ## Practical, group computer task (30 minutes) -->
-
-<!-- Create a github account (all). See: https://github.com -->
-
-<!-- Building on the follow code chunk (but with no copy-and-pasting), create a data frame that contains the names, coffee habits and like/dislike of bus travel for everyone in your group (just 1 computer per group): -->
-<!-- When you are complete, add your code to https://github.com/ITSLeeds/TDS/blob/master/code-r/01-person-data.R -->
-
-<!-- ## Learning outcomes -->
-
-```{r, echo=FALSE}
-# Identify available datasets and access and clean them
-# Combine datasets from multiple sources
-# Understand what machine learning is, which problems it is appropriate for compared with traditional statistical approaches, and how to implement machine learning techniques
-# Visualise and communicate the results of transport data science, and know about setting-up interactive web applications
-# Deciding when to use local computing power vs cloud services
-```
-
-<!-- - Articulate the relevance and limitations of data-centric analysis applied to transport problems, compared with other methods -->

 # Data Science foundations

@@ -241,7 +152,12 @@ crashes[[2]]

 ## Data science on real data

-To get some larger datasets, try the following (from Chapter 8 of RSRR)
+Work through the following example on road traffic data (recommended for most people) or the NTS data (for people more interested in travel survey data).
+You can do both if you have time.
+
+### UK Road Safety Data
+
+To get some larger datasets, try the following (from Chapter 8 of [RSRR](https://itsleeds.github.io/rrsrr/)):

 ::: {.panel-tabset group="language"}
 ## R
@@ -282,6 +198,166 @@ Let's go through these exercises together:

 - We'll explore this together

+### UK National Travel Survey (NTS) data
+
+<details>
+
+Note: you will need to download the modified NTS 2022 data from your Minerva module page and place it in your working directory for this section to work.
+Visualising datasets is important when dealing with large volumes of data, as visualisations help convey complex information in an easily interpretable format. Consider the histogram plots of average trip lengths over a week in the UK.
+# Note: This requires ggplot2 library to be loaded first
+library(tidyverse) # Tidyverse contains ggplot2 and other useful packages
+ggplot(NTS_data, aes(x = avg_trip_length)) +
+  geom_histogram(binwidth = 1, fill = "darkgrey") +
+  labs(
+    title = "Avg. Trip Length in Whole Week",
+    x = "Trip Length (km)",
+    y = "Number of Individuals"
+  ) +
+  theme_minimal() +
+  xlim(0, 50)
+```
+
+Data exploration or "exploratory data analysis" (EDA) involves examining datasets in depth to uncover underlying patterns or differences. The direction of this investigation is largely guided by the research question.
+
+Consider different histogram plots for weekdays and weekends. Can you identify any differences between them? (Clue: check the number of individuals between 0 and 1 km.)
+
+Think: what could be plausible reasons for such a difference?
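
One way to set up this weekday/weekend comparison is to facet a single histogram by day type, so both panels share the same bins and axes. The sketch below is illustrative only: it uses simulated data, and the data frame `toy_nts` and the column `day_type` are assumptions, not names from the NTS file.

```r
library(ggplot2)

set.seed(42)
# Simulated stand-in for the NTS data (values are made up for illustration):
# one row per individual and day type.
toy_nts <- data.frame(
  day_type = rep(c("Weekday", "Weekend"), each = 500),
  avg_trip_length = c(rexp(500, rate = 1 / 8), rexp(500, rate = 1 / 10))
)

# One panel per day type, with identical 1 km bins starting at 0
p <- ggplot(toy_nts, aes(x = avg_trip_length)) +
  geom_histogram(binwidth = 1, boundary = 0, fill = "darkgrey") +
  facet_wrap(~day_type, ncol = 1) +
  coord_cartesian(xlim = c(0, 50)) +
  labs(x = "Trip length (km)", y = "Number of individuals") +
  theme_minimal()

# Number of individuals in the 0-1 km bin for each day type
short_trips <- table(toy_nts$day_type[toy_nts$avg_trip_length < 1])
print(short_trips)
```

Printing `p` draws the two panels. `coord_cartesian()` is used here because it only zooms the view, whereas `xlim()` drops observations outside the range before binning.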
+In a Quarto document (.qmd file), which you will use to write and submit your coursework, you can include the saved figure like this (we will come onto this later in the module):
+
+```
+
+```
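
As a rough sketch of the save-then-include workflow, the code below writes a plot to disk with `ggsave()` and notes two common ways to reference the file from a .qmd document. The data, the object names and the file name `trip_length_hist.png` are hypothetical and are not taken from the module materials.

```r
library(ggplot2)

# Made-up values standing in for NTS_data$avg_trip_length
toy <- data.frame(avg_trip_length = c(0.5, 1.2, 3.4, 7.8, 12.1, 20.3))

p <- ggplot(toy, aes(x = avg_trip_length)) +
  geom_histogram(binwidth = 1, fill = "darkgrey") +
  theme_minimal()

# Hypothetical file name, written to the current working directory
fig_path <- "trip_length_hist.png"
ggsave(fig_path, plot = p, width = 6, height = 4, dpi = 150)

# In the body of the .qmd file, the saved image can then be referenced with
# markdown image syntax, for example:
#   ![Average trip length](trip_length_hist.png)
# or from inside an R chunk with:
#   knitr::include_graphics("trip_length_hist.png")
```

A relative path works when the image sits in the same folder as the .qmd file.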
+
+
+Don't they largely look the same? Can you stop here and infer that the trip length distributions for weekdays and weekends are largely similar? You might, depending on the resources at your disposal, but from an academic point of view we need to consider other dimensions along which they could differ.
+
+Consider different histogram plots for the standard deviation (SD) of trip lengths over weekdays and weekends. Can you identify any differences between them? (Clue: again, check the number of individuals with an SD of 0-2 km.)
+
+Think: what could be plausible reasons for such a difference?
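
To make the idea of a per-individual SD concrete, here is a minimal sketch that computes the mean and SD of trip length for each person and day type from trip-level records. The source does not show how the SD columns in the NTS file were derived, so the data frame `trips` and its columns `individual_id`, `day_type` and `trip_length_km` are assumptions for illustration.

```r
library(dplyr)

# Hypothetical trip-level records, one row per trip (values are made up)
trips <- data.frame(
  individual_id = c(1, 1, 1, 2, 2, 2, 3, 3),
  day_type = c("Weekday", "Weekday", "Weekend", "Weekday",
               "Weekday", "Weekday", "Weekend", "Weekend"),
  trip_length_km = c(2, 4, 10, 5, 5, 5, 1, 3)
)

sd_by_person <- trips %>%
  group_by(individual_id, day_type) %>%
  summarise(
    n_trips = n(),
    mean_length = mean(trip_length_km),
    sd_length = sd(trip_length_km), # NA when a person has only one trip
    .groups = "drop"
  )

print(sd_by_person)
```

Individual 2 makes three identical trips, so their SD is 0, while individual 1 has a single weekend trip and therefore an SD of `NA`. Individuals with similar mean trip lengths can have very different SDs, which is why the SD histograms can reveal differences that the histograms of averages hide.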
+  geom_histogram(binwidth = 0.5, fill = "darkred") +
+  labs(
+    title = "SD of Trip Length on Weekends",
+    x = "SD of trip length (km)",
+    y = "Number of Individuals"
+  ) +
+  theme_minimal() +
+  xlim(0, 25) + ylim(0, 1000)
+```
+
+</details>
+
 # Self-study practical (1 hr)

 **Read and try to complete the exercises in Chapters 1 to 5 of the book [Reproducible Road Safety Research with R](https://itsleeds.github.io/rrsrr/).**
@@ -311,4 +387,4 @@ For details on installing packages see [here](https://docs.ropensci.org/stats19/

 - Think of a research question that you could answer with data science, and write it down in a .qmd file. Include a sketch of the data you would need to answer the question.

-- Sign-up to the Cadence platform as outlined at [itsleeds.github.io/tds/s2/#the-cadence-platform](https://itsleeds.github.io/tds/s2/#the-cadence-platform)
+- Sign-up to the Cadence platform as outlined at [itsleeds.github.io/tds/s2/#the-cadence-platform](https://itsleeds.github.io/tds/s2/#the-cadence-platform)