Commit 0cb3ab3

Xpirix authored and github-actions[bot] committed
Posts scraped and committed via a GitHub Action.
1 parent cf87402 commit 0cb3ab3

12 files changed

+112 -112 lines changed

content/posts/analyzing-gtfs-realtime-data-for-public-transport-insights.md

Lines changed: 38 additions & 38 deletions
Large diffs are not rendered by default.

content/posts/geolife-gps-track-collection-processing-with-duckdb-qgis-trajectools.md

Lines changed: 27 additions & 27 deletions
@@ -12,49 +12,49 @@ languages: ["en_gb"]
1212
available_languages: ["en_gb"]
1313
---
1414

15-
<p>The last time I preprocessed the whole GeoLife dataset, I loaded it into PostGIS. Today, I want to share a new workflow that creates a (Geo)Parquet file and that is much faster. </p>
15+
<p class="wp-block-paragraph">The last time I preprocessed the whole GeoLife dataset, I loaded it into PostGIS. Today, I want to share a new workflow that creates a (Geo)Parquet file and that is much faster. </p>
1616
<h2 class="wp-block-heading">The dataset (GeoLife)</h2>
1717
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
18-
<p>“This GPS trajectory dataset was collected in (Microsoft Research Asia) Geolife project by 182 users in a period of over three years (from April 2007 to August 2012). A GPS trajectory of this dataset is represented by a sequence of time-stamped points, each of which contains the information of latitude, longitude and altitude. This dataset contains 17,621 trajectories with a total distance of about 1.2 million kilometers and a total duration of 48,000+ hours. These trajectories were recorded by different GPS loggers and GPS-phones, and have a variety of sampling rates. 91 percent of the trajectories are logged in a dense representation, e.g. every 1~5 seconds or every 5~10 meters per point.”</p>
18+
<p class="wp-block-paragraph">“This GPS trajectory dataset was collected in (Microsoft Research Asia) Geolife project by 182 users in a period of over three years (from April 2007 to August 2012). A GPS trajectory of this dataset is represented by a sequence of time-stamped points, each of which contains the information of latitude, longitude and altitude. This dataset contains 17,621 trajectories with a total distance of about 1.2 million kilometers and a total duration of 48,000+ hours. These trajectories were recorded by different GPS loggers and GPS-phones, and have a variety of sampling rates. 91 percent of the trajectories are logged in a dense representation, e.g. every 1~5 seconds or every 5~10 meters per point.”</p>
1919
</blockquote>
20-
<p>The <a href="https://www.microsoft.com/en-us/download/details.aspx?id=52367">GeoLife GPS Trajectories</a> download contains 182 directories full of .plt files: </p>
20+
<p class="wp-block-paragraph">The <a href="https://www.microsoft.com/en-us/download/details.aspx?id=52367">GeoLife GPS Trajectories</a> download contains 182 directories full of .plt files: </p>
2121
<figure class="wp-block-image size-large"><img alt="" class="wp-image-9584" height="229" src="/img/subscribers/anita_graser/geolife-gps-track-collection-processing-with-duckdb-qgis-trajectools/image.webp" width="531"/></figure>
22-
<p>Basically, CSV files with a custom header: </p>
22+
<p class="wp-block-paragraph">Basically, CSV files with a custom header: </p>
2323
<figure class="wp-block-image size-large"><img alt="" class="wp-image-9585" height="251" src="/img/subscribers/anita_graser/geolife-gps-track-collection-processing-with-duckdb-qgis-trajectools/image-1.webp" width="541"/></figure>
2424
<h2 class="wp-block-heading">Creating the (Geo)Parquet using DuckDB</h2>
2525
<h3 class="wp-block-heading">DuckDB installation</h3>
26-
<p>Following the <a href="https://duckdb.org/install/?platform=macos&amp;environment=cli">official instructions</a>, installation is straightforward:</p>
26+
<p class="wp-block-paragraph">Following the <a href="https://duckdb.org/install/?platform=macos&amp;environment=cli">official instructions</a>, installation is straightforward:</p>
2727
<div class="wp-block-syntaxhighlighter-code"><pre class="brush: bash; title: ; notranslate">
2828
curl https://install.duckdb.org | sh
2929
</pre></div>
30-
<p>From there, I’ve been using the GUI which we can launch using:</p>
30+
<p class="wp-block-paragraph">From there, I’ve been using the GUI which we can launch using:</p>
3131
<div class="wp-block-syntaxhighlighter-code"><pre class="brush: bash; title: ; notranslate">
3232
duckdb -ui
3333
</pre></div>
34-
<p>The <a href="https://duckdb.org/docs/stable/core_extensions/spatial/overview">spatial extension</a> is a DuckDB core extension, so it’s readily available. We can create a spatial db with: </p>
34+
<p class="wp-block-paragraph">The <a href="https://duckdb.org/docs/stable/core_extensions/spatial/overview">spatial extension</a> is a DuckDB core extension, so it’s readily available. We can create a spatial db with: </p>
3535
<div class="wp-block-syntaxhighlighter-code"><pre class="brush: sql; title: ; notranslate">
3636
ATTACH IF NOT EXISTS ':memory:' AS memory;
3737
INSTALL spatial;
3838
LOAD spatial;
3939
</pre></div>
4040
<figure class="wp-block-image size-large"><a href="https://anitagraser.com/wp-content/uploads/2025/10/image-2.png"><img alt="" class="wp-image-9592" height="278" src="/img/subscribers/anita_graser/geolife-gps-track-collection-processing-with-duckdb-qgis-trajectools/image-2.webp" width="733"/></a></figure>
41-
<p>Reading a spatial file is as simple as:</p>
41+
<p class="wp-block-paragraph">Reading a spatial file is as simple as:</p>
4242
<div class="wp-block-syntaxhighlighter-code"><pre class="brush: sql; title: ; notranslate">
4343
SELECT *
4444
FROM '/home/anita/Documents/Codeberg/trajectools/sample_data/geolife.gpkg'
4545
</pre></div>
46-
<p>thanks to the <a href="https://duckdb.org/docs/stable/core_extensions/spatial/gdal">GDAL integration</a>.</p>
47-
<p>But today, we want to get a bit more involved …</p>
46+
<p class="wp-block-paragraph">thanks to the <a href="https://duckdb.org/docs/stable/core_extensions/spatial/gdal">GDAL integration</a>.</p>
47+
<p class="wp-block-paragraph">But today, we want to get a bit more involved …</p>
4848
<h3 class="wp-block-heading">DuckDB SQL magic</h3>
49-
<p>The issues we need to solve are:</p>
49+
<p class="wp-block-paragraph">The issues we need to solve are:</p>
5050
<ol class="wp-block-list">
5151
<li>Read all CSV files from all subdirectories</li>
5252
<li>Parse the CSV, ignoring the first couple of lines, while assigning proper column names</li>
5353
<li>Assign the CSV file name as the trajectory ID (because there is no ID in the original files)</li>
5454
<li>Create point geometries that will work with our GeoParquet file </li>
5555
<li>Create proper datetimes from the separate date and time fields</li>
5656
</ol>
57-
<p>Luckily, DuckDB’s read_csv function comes with the necessary features built-in. Putting it all together: </p>
57+
<p class="wp-block-paragraph">Luckily, DuckDB’s read_csv function comes with the necessary features built-in. Putting it all together: </p>
5858
<div class="wp-block-syntaxhighlighter-code"><pre class="brush: sql; title: ; notranslate">
5959
CREATE OR REPLACE TABLE geolife AS
6060
SELECT
@@ -74,30 +74,30 @@ FROM read_csv('/home/anita/Documents/Geodata/Geolife/Geolife Trajectories 1.3/Da
7474
'time': 'VARCHAR'
7575
});
7676
</pre></div>
77-
<p>It’s blazingly fast: </p>
77+
<p class="wp-block-paragraph">It’s blazingly fast: </p>
7878
<figure class="wp-block-image size-large is-resized"><a href="https://anitagraser.com/wp-content/uploads/2025/10/image-3.png"><img alt="" class="wp-image-9598" height="794" src="/img/subscribers/anita_graser/geolife-gps-track-collection-processing-with-duckdb-qgis-trajectools/image-3.webp" style="width: 824px; height: auto;" width="824"/></a></figure>
79-
<p><em>I haven’t tested reading directly from ZIP archives yet, but there seems to be a <a href="https://duckdb.org/community_extensions/extensions/zipfs.html">community extension (zipfs)</a> for this exact purpose. </em></p>
79+
<p class="wp-block-paragraph"><em>I haven’t tested reading directly from ZIP archives yet, but there seems to be a <a href="https://duckdb.org/community_extensions/extensions/zipfs.html">community extension (zipfs)</a> for this exact purpose. </em></p>
8080
<h2 class="wp-block-heading">Ready to QGIS</h2>
81-
<p>GeoParquet files can be drag-n-dropped into QGIS:</p>
81+
<p class="wp-block-paragraph">GeoParquet files can be drag-n-dropped into QGIS:</p>
8282
<figure class="wp-block-image size-large"><a href="https://anitagraser.com/wp-content/uploads/2025/10/image-4.png"><img alt="" class="wp-image-9601" height="348" src="/img/subscribers/anita_graser/geolife-gps-track-collection-processing-with-duckdb-qgis-trajectools/image-4.webp" width="1024"/></a></figure>
83-
<p><em>I’m running QGIS 3.42.1-Münster from conda-forge on Linux Mint.</em></p>
84-
<p>Yes, it takes a while to render all 25 million points … But you know what? It gets really snappy once we zoom in closer, e.g. to the situation in Germany: </p>
83+
<p class="wp-block-paragraph"><em>I’m running QGIS 3.42.1-Münster from conda-forge on Linux Mint.</em></p>
84+
<p class="wp-block-paragraph">Yes, it takes a while to render all 25 million points … But you know what? It gets really snappy once we zoom in closer, e.g. to the situation in Germany: </p>
8585
<figure class="wp-block-image size-large"><a href="https://anitagraser.com/wp-content/uploads/2025/10/image-5.png"><img alt="" class="wp-image-9603" height="504" src="/img/subscribers/anita_graser/geolife-gps-track-collection-processing-with-duckdb-qgis-trajectools/image-5.webp" width="889"/></a></figure>
86-
<p>Let’s have a closer look at what’s going on here. </p>
86+
<p class="wp-block-paragraph">Let’s have a closer look at what’s going on here. </p>
8787
<h3 class="wp-block-heading">Trajectools time</h3>
88-
<p>Selecting the 9,438 points in this extent, let’s compute movement metrics (speed &amp; direction) and create trajectory lines: </p>
88+
<p class="wp-block-paragraph">Selecting the 9,438 points in this extent, let’s compute movement metrics (speed &amp; direction) and create trajectory lines: </p>
8989
<figure class="wp-block-image size-large"><a href="https://anitagraser.com/wp-content/uploads/2025/10/image-8.png"><img alt="" class="wp-image-9608" height="1004" src="/img/subscribers/anita_graser/geolife-gps-track-collection-processing-with-duckdb-qgis-trajectools/image-8.webp" width="928"/></a></figure>
90-
<p>Looks like we have some high-speed sections in there (with those red &gt; 100 km/h streaks): </p>
90+
<p class="wp-block-paragraph">Looks like we have some high-speed sections in there (with those red &gt; 100 km/h streaks): </p>
9191
<figure class="wp-block-image size-large"><a href="https://anitagraser.com/wp-content/uploads/2025/10/image-9.png"><img alt="" class="wp-image-9610" height="422" src="/img/subscribers/anita_graser/geolife-gps-track-collection-processing-with-duckdb-qgis-trajectools/image-9.webp" width="808"/></a></figure>
92-
<p>When we zoom in to Darmstadt and enable the trajectories layer, we can see each individual trip. Looks like car trips on the highway and walks through the city: </p>
92+
<p class="wp-block-paragraph">When we zoom in to Darmstadt and enable the trajectories layer, we can see each individual trip. Looks like car trips on the highway and walks through the city: </p>
9393
<figure class="wp-block-image size-large"><a href="https://anitagraser.com/wp-content/uploads/2025/10/image-10.png"><img alt="" class="wp-image-9612" height="607" src="/img/subscribers/anita_graser/geolife-gps-track-collection-processing-with-duckdb-qgis-trajectools/image-10.webp" width="833"/></a></figure>
94-
<p>That looks like quite the long round trip: </p>
94+
<p class="wp-block-paragraph">That looks like quite the long round trip: </p>
9595
<figure class="wp-block-image size-large"><a href="https://anitagraser.com/wp-content/uploads/2025/10/image-11.png"><img alt="" class="wp-image-9614" height="456" src="/img/subscribers/anita_graser/geolife-gps-track-collection-processing-with-duckdb-qgis-trajectools/image-11.webp" width="1024"/></a></figure>
96-
<p>Let’s see where they might have stopped to have a break: </p>
96+
<p class="wp-block-paragraph">Let’s see where they might have stopped to have a break: </p>
9797
<figure class="wp-block-image size-large"><a href="https://anitagraser.com/wp-content/uploads/2025/10/image-13.png"><img alt="" class="wp-image-9617" height="1003" src="/img/subscribers/anita_graser/geolife-gps-track-collection-processing-with-duckdb-qgis-trajectools/image-13.webp" width="997"/></a></figure>
98-
<p>If I had to guess, I’d say they stayed at the Best Western: </p>
98+
<p class="wp-block-paragraph">If I had to guess, I’d say they stayed at the Best Western: </p>
9999
<figure class="wp-block-image size-large"><a href="https://anitagraser.com/wp-content/uploads/2025/10/image-15.png"><img alt="" class="wp-image-9621" height="333" src="/img/subscribers/anita_graser/geolife-gps-track-collection-processing-with-duckdb-qgis-trajectools/image-15.webp" width="1024"/></a></figure>
100100
<h2 class="wp-block-heading">Conclusion</h2>
101-
<p>DuckDB has been great for this ETL workflow. I didn’t use much of its geospatial capabilities here but I was pleasantly surprised how smooth the GeoParquet creation process has been. Geometries are handled without any special magic and are recognized by QGIS. Same with the timestamps. All ready for more heavy spatiotemporal analysis with <a href="https://plugins.qgis.org/plugins/processing_trajectory/#plugin-versions">Trajectools</a>. </p>
102-
<p>If you haven’t tried DuckDB or GeoParquet yet, give it a try, particularly if you’re collaborating with data scientists from other domains and want to exchange data. </p>
103-
<p></p>
101+
<p class="wp-block-paragraph">DuckDB has been great for this ETL workflow. I didn’t use much of its geospatial capabilities here but I was pleasantly surprised how smooth the GeoParquet creation process has been. Geometries are handled without any special magic and are recognized by QGIS. Same with the timestamps. All ready for more heavy spatiotemporal analysis with <a href="https://plugins.qgis.org/plugins/processing_trajectory/#plugin-versions">Trajectools</a>. </p>
102+
<p class="wp-block-paragraph">If you haven’t tried DuckDB or GeoParquet yet, give it a try, particularly if you’re collaborating with data scientists from other domains and want to exchange data. </p>
103+
<p class="wp-block-paragraph"></p>
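
The five steps listed under "DuckDB SQL magic" in the post above (collect the files, skip the custom header, use the file name as trajectory ID, build point coordinates, merge date and time) can also be sketched in plain Python. This is not the post's DuckDB query, whose middle part is elided in this diff; it is a standard-library illustration only. The six-line header, the column order, the function names `parse_plt` and `load_all`, the sample rows, and the sample path are all assumptions made for the example.

```python
import csv
from datetime import datetime
from pathlib import Path, PurePosixPath

# Assumption: each .plt file starts with six header lines before the data rows.
HEADER_LINES = 6
# Assumed column order; the names are illustrative, not taken from the post.
COLUMNS = ["lat", "lon", "unused", "altitude", "days", "date", "time"]


def parse_plt(text, file_path):
    """Turn the text of one .plt file into a list of point records."""
    # Step 3: the file name (without extension) becomes the trajectory ID.
    trajectory_id = PurePosixPath(file_path).stem
    # Step 2: drop the custom header, then parse the remaining lines as CSV.
    data_lines = text.splitlines()[HEADER_LINES:]
    records = []
    for row in csv.DictReader(data_lines, fieldnames=COLUMNS):
        records.append({
            "trajectory_id": trajectory_id,
            # Step 4: point coordinates, x = longitude and y = latitude.
            "x": float(row["lon"]),
            "y": float(row["lat"]),
            # Step 5: merge the separate date and time strings into one datetime.
            "t": datetime.strptime(f"{row['date']} {row['time']}", "%Y-%m-%d %H:%M:%S"),
        })
    return records


def load_all(root):
    """Step 1: walk all subdirectories of root and parse every .plt file."""
    points = []
    for path in sorted(Path(root).rglob("*.plt")):
        points.extend(parse_plt(path.read_text(), path.as_posix()))
    return points


# Made-up sample in the assumed layout: six header lines, then two data rows.
sample = """Geolife trajectory
WGS 84
Altitude is in Feet
Reserved 3
0,2,255,My Track,0,0,2,8421376
0
39.984702,116.318417,0,492,39744.1201851852,2008-10-23,02:53:04
39.984683,116.31845,0,492,39744.1202546296,2008-10-23,02:53:10
"""

points = parse_plt(sample, "Data/000/Trajectory/20081023025304.plt")
for point in points:
    print(point["trajectory_id"], point["x"], point["y"], point["t"].isoformat())
```

A row-by-row loop like this will be far slower than DuckDB on 25 million points; its only purpose is to make the individual transformation steps explicit.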

content/posts/linvestissement-open-source-doslandia.md

Lines changed: 3 additions & 3 deletions
@@ -12,7 +12,7 @@ languages: ["en_gb"]
1212
available_languages: ["en_gb"]
1313
---
1414

15-
<div class="wpb_row vc_row-fluid vc_row standard_section" id="fws_6982938dec0ef" style="padding-top: 0px; padding-bottom: 0px;"><div class="row-bg-wrap"><div class="inner-wrap"> <div class="row-bg"></div></div> </div><div class="col span_12 dark left">
15+
<div class="wpb_row vc_row-fluid vc_row standard_section" id="fws_698338986a591" style="padding-top: 0px; padding-bottom: 0px;"><div class="row-bg-wrap"><div class="inner-wrap"> <div class="row-bg"></div></div> </div><div class="col span_12 dark left">
1616
<div class="vc_col-sm-12 wpb_column column_container vc_column_container col no-extra-padding">
1717
<div class="vc_column-inner">
1818
<div class="wpb_wrapper">
@@ -42,7 +42,7 @@ available_languages: ["en_gb"]
4242
</div>
4343
</div>
4444
</div></div>
45-
<div class="wpb_row vc_row-fluid vc_row standard_section" id="fws_6982938deca03" style="padding-top: 0px; padding-bottom: 0px;"><div class="row-bg-wrap"><div class="inner-wrap"> <div class="row-bg"></div></div> </div><div class="col span_12 dark left">
45+
<div class="wpb_row vc_row-fluid vc_row standard_section" id="fws_698338986af28" style="padding-top: 0px; padding-bottom: 0px;"><div class="row-bg-wrap"><div class="inner-wrap"> <div class="row-bg"></div></div> </div><div class="col span_12 dark left">
4646
<div class="vc_col-sm-12 wpb_column column_container vc_column_container col no-extra-padding">
4747
<div class="vc_column-inner">
4848
<div class="wpb_wrapper">
@@ -55,7 +55,7 @@ available_languages: ["en_gb"]
5555
</div>
5656
</div>
5757
</div></div>
58-
<div class="wpb_row vc_row-fluid vc_row standard_section" id="fws_6982938decc00" style="padding-top: 0px; padding-bottom: 0px;"><div class="row-bg-wrap"><div class="inner-wrap"> <div class="row-bg"></div></div> </div><div class="col span_12 dark left">
58+
<div class="wpb_row vc_row-fluid vc_row standard_section" id="fws_698338986b14b" style="padding-top: 0px; padding-bottom: 0px;"><div class="row-bg-wrap"><div class="inner-wrap"> <div class="row-bg"></div></div> </div><div class="col span_12 dark left">
5959
<div class="vc_col-sm-12 wpb_column column_container vc_column_container col no-extra-padding">
6060
<div class="vc_column-inner">
6161
<div class="wpb_wrapper">
