available_languages: ["en_gb"]
---
<p class="wp-block-paragraph">The last time I preprocessed the whole GeoLife dataset, I loaded it into PostGIS. Today, I want to share a new, much faster workflow that creates a (Geo)Parquet file.</p>
<p class="wp-block-paragraph">“This GPS trajectory dataset was collected in (Microsoft Research Asia) Geolife project by 182 users in a period of over three years (from April 2007 to August 2012). A GPS trajectory of this dataset is represented by a sequence of time-stamped points, each of which contains the information of latitude, longitude and altitude. This dataset contains 17,621 trajectories with a total distance of about 1.2 million kilometers and a total duration of 48,000+ hours. These trajectories were recorded by different GPS loggers and GPS-phones, and have a variety of sampling rates. 91 percent of the trajectories are logged in a dense representation, e.g. every 1~5 seconds or every 5~10 meters per point.”</p>
</blockquote>
<p class="wp-block-paragraph">The <a href="https://www.microsoft.com/en-us/download/details.aspx?id=52367">GeoLife GPS Trajectories</a> download contains 182 directories full of .plt files:</p>
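<p class="wp-block-paragraph">To give an idea of what is inside these files, here is a minimal standard-library Python sketch of a .plt reader. The layout it assumes (six header lines, then comma-separated rows of latitude, longitude, an unused zero field, altitude in feet with -777 marking invalid values, a fractional day count since 1899-12-30, a date string, and a time string) is based on the GeoLife user guide, so please double-check it against your copy of the data. The names <code>parse_plt</code> and <code>PltPoint</code> are hypothetical and made up for this example.</p>

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed GeoLife .plt layout (per the GeoLife user guide): 6 header lines,
# then rows of: lat, lon, 0, altitude_ft, days_since_1899_12_30, date, time
PLT_HEADER_LINES = 6
INVALID_ALTITUDE = -777.0


class PltPoint(NamedTuple):
    """One time-stamped GPS fix (hypothetical container for this sketch)."""
    lat: float
    lon: float
    altitude_ft: Optional[float]  # None where the logger wrote -777
    t: datetime


def parse_plt(lines: Iterable[str]) -> Iterator[PltPoint]:
    """Yield PltPoint records from the text lines of one .plt file."""
    for i, line in enumerate(lines):
        if i < PLT_HEADER_LINES or not line.strip():
            continue  # skip the fixed header block and any blank lines
        fields = line.strip().split(",")
        altitude = float(fields[3])
        # Build the timestamp from the date and time strings rather than
        # from the fractional day count.
        t = datetime.strptime(f"{fields[5]} {fields[6]}", "%Y-%m-%d %H:%M:%S")
        yield PltPoint(
            lat=float(fields[0]),
            lon=float(fields[1]),
            altitude_ft=None if altitude == INVALID_ALTITUDE else altitude,
            t=t,
        )
```

<p class="wp-block-paragraph">Because <code>parse_plt</code> accepts any iterable of strings, an open file handle can be passed to it directly.</p>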
<p class="wp-block-paragraph">Following the <a href="https://duckdb.org/install/?platform=macos&environment=cli">official instructions</a>, installation is straightforward:</p>
<p class="wp-block-paragraph">The <a href="https://duckdb.org/docs/stable/core_extensions/spatial/overview">spatial extension</a> is a DuckDB core extension, so it’s readily available. We can create a spatial db with:</p>
<p class="wp-block-paragraph"><em>I haven’t tested reading directly from ZIP archives yet, but there seems to be a <a href="https://duckdb.org/community_extensions/extensions/zipfs.html">community extension (zipfs)</a> for this exact purpose.</em></p>
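<p class="wp-block-paragraph">Outside of DuckDB, a plain-Python alternative is to iterate over the archive with the standard library's <code>zipfile</code> module instead of extracting it first. This is a swapped-in technique, not zipfs. The member layout it assumes (<code>Data/&lt;user_id&gt;/Trajectory/*.plt</code> somewhere below the archive root) and the helper name <code>iter_plt_members</code> are hypothetical.</p>

```python
import zipfile
from typing import BinaryIO, Iterator, List, Tuple, Union


def iter_plt_members(
    archive: Union[str, BinaryIO],
) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (user_id, member_name, lines) for each .plt file in a ZIP archive.

    Assumes members look like <root>/Data/<user_id>/Trajectory/<name>.plt.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".plt"):
                continue  # skip labels.txt, directory entries, etc.
            parts = name.split("/")
            # The user id is the directory two levels above the .plt file
            user_id = parts[-3] if len(parts) >= 3 else ""
            # Decode in memory; nothing is extracted to disk
            text = zf.read(name).decode("utf-8")
            yield user_id, name, text.splitlines()
```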
<h2 class="wp-block-heading">Ready for QGIS</h2>
<p class="wp-block-paragraph">GeoParquet files can be dragged and dropped into QGIS:</p>
<p class="wp-block-paragraph"><em>I’m running QGIS 3.42.1-Münster from conda-forge on Linux Mint.</em></p>
<p class="wp-block-paragraph">Yes, it takes a while to render all 25 million points … But you know what? It gets really snappy once we zoom in closer, e.g. to the situation in Germany:</p>
<p class="wp-block-paragraph">Let’s have a closer look at what’s going on here.</p>
<h3 class="wp-block-heading">Trajectools time</h3>
<p class="wp-block-paragraph">After selecting the 9,438 points in this extent, let’s compute movement metrics (speed and direction) and create trajectory lines:</p>
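<p class="wp-block-paragraph">Trajectools takes care of these computations inside QGIS. For readers curious about what the metrics boil down to, here is a minimal standard-library Python sketch, not Trajectools' actual implementation: speed as haversine distance divided by the time between consecutive points, direction as the initial great-circle bearing, and a trajectory line as the time-ordered points written as WKT. The spherical Earth radius, the choice to leave the first point's metrics empty, and all function names are assumptions for illustration.</p>

```python
import math
from datetime import datetime
from typing import List, NamedTuple, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0  # assumption: spherical Earth, mean radius


class Point(NamedTuple):
    t: datetime
    lat: float
    lon: float


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in metres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def bearing_deg(a: Point, b: Point) -> float:
    """Initial bearing from a to b, in degrees clockwise from north [0, 360)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dlmb = math.radians(b.lon - a.lon)
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


Metric = Tuple[Optional[float], Optional[float]]


def movement_metrics(points: List[Point]) -> List[Metric]:
    """(speed in m/s, direction in degrees) per point, in time order.

    The first point has no predecessor, so it gets (None, None).
    """
    pts = sorted(points, key=lambda p: p.t)
    if not pts:
        return []
    out: List[Metric] = [(None, None)]
    for prev, cur in zip(pts, pts[1:]):
        dt = (cur.t - prev.t).total_seconds()
        # Duplicate timestamps would divide by zero, so report no speed
        speed = haversine_m(prev, cur) / dt if dt > 0 else None
        out.append((speed, bearing_deg(prev, cur)))
    return out


def to_wkt_linestring(points: List[Point]) -> str:
    """Time-ordered points as a WKT LINESTRING (x = lon, y = lat)."""
    pts = sorted(points, key=lambda p: p.t)
    if len(pts) < 2:
        raise ValueError("a LINESTRING needs at least two points")
    coords = ", ".join(f"{p.lon} {p.lat}" for p in pts)
    return f"LINESTRING ({coords})"
```

<p class="wp-block-paragraph">Leaving the first point empty is just one convention; other tools may fill it differently, for example by copying the second point's values.</p>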
<p class="wp-block-paragraph">When we zoom in to Darmstadt and enable the trajectories layer, we can see each individual trip. It looks like car trips on the highway and walks through the city:</p>
<p class="wp-block-paragraph">DuckDB has been great for this ETL workflow. I didn’t use much of its geospatial capabilities here, but I was pleasantly surprised by how smooth the GeoParquet creation process has been. Geometries are handled without any special magic and are recognized by QGIS. The same goes for the timestamps. Everything is ready for heavier spatiotemporal analysis with <a href="https://plugins.qgis.org/plugins/processing_trajectory/#plugin-versions">Trajectools</a>.</p>
<p class="wp-block-paragraph">If you haven’t tried DuckDB or GeoParquet yet, give them a try, particularly if you’re collaborating with data scientists from other domains and want to exchange data.</p>