---
description: 'Dataset with over 100 million records containing information about places on a map, such as shops,
restaurants, parks, playgrounds, and monuments.'
sidebar_label: 'Foursquare places'
slug: /getting-started/example-datasets/foursquare-places
title: 'Foursquare places'
keywords: ['visualizing']
---

import Image from '@theme/IdealImage';
import visualization_1 from '@site/static/images/getting-started/example-datasets/visualization_1.png';
import visualization_2 from '@site/static/images/getting-started/example-datasets/visualization_2.png';
import visualization_3 from '@site/static/images/getting-started/example-datasets/visualization_3.png';
import visualization_4 from '@site/static/images/getting-started/example-datasets/visualization_4.png';

## Dataset {#dataset}

This dataset from Foursquare is available to [download](https://docs.foursquare.com/data-products/docs/access-fsq-os-places)
and use for free under the Apache 2.0 license.

It contains over 100 million records of points of interest (POIs),
such as shops, restaurants, parks, playgrounds, and monuments. It also includes
additional metadata about those places, such as categories and social media
information.

## Data exploration {#data-exploration}

To explore the data, we'll use [`clickhouse-local`](https://clickhouse.com/blog/extracting-converting-querying-local-files-with-sql-clickhouse-local), a small command-line tool
that provides the full ClickHouse engine, although you could also use
ClickHouse Cloud, `clickhouse-client`, or even `chDB`.

Run the following query to select a single row from the S3 bucket where the data is stored:

```sql title="Query"
SELECT * FROM s3('s3://fsq-os-places-us-east-1/release/dt=2025-04-08/places/parquet/*') LIMIT 1
```

```response title="Response"
Row 1:
──────
fsq_place_id: 4e1ef76cae60cd553dec233f
name: @VirginAmerica In-flight Via @Gogo
latitude: 37.62120111687914
longitude: -122.39003793803701
address: ᴺᵁᴸᴸ
locality: ᴺᵁᴸᴸ
region: ᴺᵁᴸᴸ
postcode: ᴺᵁᴸᴸ
admin_region: ᴺᵁᴸᴸ
post_town: ᴺᵁᴸᴸ
po_box: ᴺᵁᴸᴸ
country: US
date_created: 2011-07-14
date_refreshed: 2018-07-05
date_closed: 2018-07-05
tel: ᴺᵁᴸᴸ
website: ᴺᵁᴸᴸ
email: ᴺᵁᴸᴸ
facebook_id: ᴺᵁᴸᴸ
instagram: ᴺᵁᴸᴸ
twitter: ᴺᵁᴸᴸ
fsq_category_ids: ['4bf58dd8d48988d1f7931735']
fsq_category_labels: ['Travel and Transportation > Transport Hub > Airport > Plane']
placemaker_url: https://foursquare.com/placemakers/review-place/4e1ef76cae60cd553dec233f
geom: �^��a�^@B�
bbox: (-122.39003793803701,37.62120111687914,-122.39003793803701,37.62120111687914)
```

Quite a few fields are `ᴺᵁᴸᴸ`, so we can add some conditions
to the query to get back a row with more complete data:

```sql title="Query"
SELECT * FROM s3('s3://fsq-os-places-us-east-1/release/dt=2025-04-08/places/parquet/*')
    WHERE address IS NOT NULL AND postcode IS NOT NULL AND instagram IS NOT NULL LIMIT 1
```

```response title="Response"
Row 1:
──────
fsq_place_id: 59b2c754b54618784f259654
name: Villa 722
latitude: ᴺᵁᴸᴸ
longitude: ᴺᵁᴸᴸ
address: Gijzenveldstraat 75
locality: Zutendaal
region: Limburg
postcode: 3690
admin_region: ᴺᵁᴸᴸ
post_town: ᴺᵁᴸᴸ
po_box: ᴺᵁᴸᴸ
country: ᴺᵁᴸᴸ
date_created: 2017-09-08
date_refreshed: 2020-01-25
date_closed: ᴺᵁᴸᴸ
tel: ᴺᵁᴸᴸ
website: https://www.landal.be
email: ᴺᵁᴸᴸ
facebook_id: 522698844570949 -- 522.70 trillion
instagram: landalmooizutendaal
twitter: landalzdl
fsq_category_ids: ['56aa371be4b08b9a8d5734e1']
fsq_category_labels: ['Travel and Transportation > Lodging > Vacation Rental']
placemaker_url: https://foursquare.com/placemakers/review-place/59b2c754b54618784f259654
geom: ᴺᵁᴸᴸ
bbox: (NULL,NULL,NULL,NULL)
```

Run the following query to view the automatically inferred schema of the data using
the `DESCRIBE` statement:

```sql title="Query"
DESCRIBE s3('s3://fsq-os-places-us-east-1/release/dt=2025-04-08/places/parquet/*')
```

```response title="Response"
    ┌─name────────────────┬─type────────────────────────┐
 1. │ fsq_place_id        │ Nullable(String)            │
 2. │ name                │ Nullable(String)            │
 3. │ latitude            │ Nullable(Float64)           │
 4. │ longitude           │ Nullable(Float64)           │
 5. │ address             │ Nullable(String)            │
 6. │ locality            │ Nullable(String)            │
 7. │ region              │ Nullable(String)            │
 8. │ postcode            │ Nullable(String)            │
 9. │ admin_region        │ Nullable(String)            │
10. │ post_town           │ Nullable(String)            │
11. │ po_box              │ Nullable(String)            │
12. │ country             │ Nullable(String)            │
13. │ date_created        │ Nullable(String)            │
14. │ date_refreshed      │ Nullable(String)            │
15. │ date_closed         │ Nullable(String)            │
16. │ tel                 │ Nullable(String)            │
17. │ website             │ Nullable(String)            │
18. │ email               │ Nullable(String)            │
19. │ facebook_id         │ Nullable(Int64)             │
20. │ instagram           │ Nullable(String)            │
21. │ twitter             │ Nullable(String)            │
22. │ fsq_category_ids    │ Array(Nullable(String))     │
23. │ fsq_category_labels │ Array(Nullable(String))     │
24. │ placemaker_url      │ Nullable(String)            │
25. │ geom                │ Nullable(String)            │
26. │ bbox                │ Tuple(                     ↴│
    │                     │↳    xmin Nullable(Float64),↴│
    │                     │↳    ymin Nullable(Float64),↴│
    │                     │↳    xmax Nullable(Float64),↴│
    │                     │↳    ymax Nullable(Float64)) │
    └─────────────────────┴─────────────────────────────┘
```

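The inferred types don't all match the table created in the next section: the three date columns arrive as `Nullable(String)` and `facebook_id` as `Nullable(Int64)`, while the table stores them as `Nullable(Date)` and `String`. `INSERT INTO ... SELECT` converts each value to the type of its target column. As a rough sketch, assuming those conversions behave like an explicit `CAST`, the query below applies them to values from the sample rows above:

```sql
-- Approximates the conversions applied when the S3 data is inserted into
-- foursquare_mercator; the literals come from the sample rows above
SELECT
    CAST('2011-07-14' AS Nullable(Date))             AS date_created,
    CAST(NULL AS Nullable(Date))                     AS date_closed,
    CAST(toInt64(522698844570949) AS String)         AS facebook_id,
    toTypeName(CAST('2011-07-14' AS Nullable(Date))) AS date_created_type
```
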
## Loading the data into ClickHouse {#loading-the-data}

If you'd like to persist the data on disk, you can use `clickhouse-server`
or ClickHouse Cloud.

To create the table, run the following command:

```sql title="Query"
CREATE TABLE foursquare_mercator
(
    fsq_place_id String,
    name String,
    latitude Float64,
    longitude Float64,
    address String,
    locality String,
    region LowCardinality(String),
    postcode LowCardinality(String),
    admin_region LowCardinality(String),
    post_town LowCardinality(String),
    po_box LowCardinality(String),
    country LowCardinality(String),
    date_created Nullable(Date),
    date_refreshed Nullable(Date),
    date_closed Nullable(Date),
    tel String,
    website String,
    email String,
    facebook_id String,
    instagram String,
    twitter String,
    fsq_category_ids Array(String),
    fsq_category_labels Array(String),
    placemaker_url String,
    geom String,
    bbox Tuple(
        xmin Nullable(Float64),
        ymin Nullable(Float64),
        xmax Nullable(Float64),
        ymax Nullable(Float64)
    ),
    category LowCardinality(String) ALIAS fsq_category_labels[1],
    mercator_x UInt32 MATERIALIZED 0xFFFFFFFF * ((longitude + 180) / 360),
    mercator_y UInt32 MATERIALIZED 0xFFFFFFFF * ((1 / 2) - ((log(tan(((latitude + 90) / 360) * pi())) / 2) / pi())),
    INDEX idx_x mercator_x TYPE minmax,
    INDEX idx_y mercator_y TYPE minmax
)
ENGINE = MergeTree
ORDER BY mortonEncode(mercator_x, mercator_y)
```

Note the use of the [`LowCardinality`](/sql-reference/data-types/lowcardinality)
data type for several columns, which changes their internal representation to be
dictionary-encoded. Operating on dictionary-encoded data significantly
improves the performance of `SELECT` queries in many applications.

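As a small illustration of dictionary encoding, the query below converts a few made-up country codes (not taken from the dataset) with `toLowCardinality`. `lowCardinalityIndices` returns each row's position in the column's dictionary, so repeated values should report the same position; because `LowCardinality` dictionaries are built per data part, positions for the same value can differ between parts of a real table.

```sql
-- Repeated values share one dictionary entry; each row stores only its position
SELECT
    country,
    toTypeName(country)            AS type,
    lowCardinalityIndices(country) AS dictionary_position
FROM
(
    -- Made-up sample values, not taken from the dataset
    SELECT toLowCardinality(arrayJoin(['US', 'BE', 'US', 'US', 'BE'])) AS country
)
```
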
Additionally, two `UInt32` `MATERIALIZED` columns, `mercator_x` and `mercator_y`, map the lat/lon coordinates
to the [Web Mercator projection](https://en.wikipedia.org/wiki/Web_Mercator_projection),
which makes it easier to segment the map into tiles:

```sql
mercator_x UInt32 MATERIALIZED 0xFFFFFFFF * ((longitude + 180) / 360),
mercator_y UInt32 MATERIALIZED 0xFFFFFFFF * ((1 / 2) - ((log(tan(((latitude + 90) / 360) * pi())) / 2) / pi())),
```

Let's break down what is happening above for each column.

**mercator_x**

This column converts a longitude value into an X coordinate in the Mercator projection:

- `longitude + 180` shifts the longitude range from [-180, 180] to [0, 360]
- Dividing by 360 normalizes this to a value between 0 and 1
- Multiplying by `0xFFFFFFFF` (hexadecimal for 4294967295, the maximum 32-bit unsigned integer) scales this normalized value to the full range of a 32-bit integer

**mercator_y**

This column converts a latitude value into a Y coordinate in the Mercator projection:

- `latitude + 90` shifts latitude from [-90, 90] to [0, 180]
- Dividing by 360 and multiplying by `pi()` turns this into π/4 + φ/2, where φ is the latitude in radians, which is the angle the Mercator formula passes to `tan`
- The `log(tan(...))` part is the core of the Mercator projection formula
- Dividing by 2 and then by `pi()`, and subtracting the result from `1 / 2`, normalizes the value to [0, 1] for latitudes between roughly -85.05° and 85.05°, with 0 at the northern edge of the map
- Multiplying by `0xFFFFFFFF` scales this to the full 32-bit integer range

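Written out, with λ as the longitude and φ as the latitude in degrees, the two expressions compute the normalized Web Mercator coordinates x and y, which are then multiplied by 2^32 - 1 (`0xFFFFFFFF`). The second line shows why the SQL adds 90 and divides by 360 before calling `tan`; the third checks that the northern edge of the map, φ_max = arctan(sinh π) ≈ 85.0511°, maps to y = 0 (by symmetry, -φ_max maps to y = 1):

```latex
x = \frac{\lambda + 180}{360},
\qquad
y = \frac{1}{2} - \frac{\ln \tan\left(\frac{\pi}{4} + \frac{\varphi_{\mathrm{rad}}}{2}\right)}{2\pi},
\qquad
\varphi_{\mathrm{rad}} = \frac{\pi \varphi}{180}

\frac{\varphi + 90}{360}\,\pi = \frac{\pi}{4} + \frac{\pi \varphi}{360} = \frac{\pi}{4} + \frac{\varphi_{\mathrm{rad}}}{2}

\tan \varphi_{\max,\mathrm{rad}} = \sinh \pi
\;\Rightarrow\;
\ln \tan\left(\frac{\pi}{4} + \frac{\varphi_{\max,\mathrm{rad}}}{2}\right) = \operatorname{arsinh}(\tan \varphi_{\max,\mathrm{rad}}) = \pi
\;\Rightarrow\;
y = \frac{1}{2} - \frac{\pi}{2\pi} = 0

\mathtt{mercator\_x} = (2^{32} - 1)\,x,
\qquad
\mathtt{mercator\_y} = (2^{32} - 1)\,y
```
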
Specifying `MATERIALIZED` makes sure that ClickHouse calculates the values for these
columns when we `INSERT` the data, without having to specify these columns (which are not
part of the original data schema) in the `INSERT` statement.

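A minimal sketch of this behaviour, on a hypothetical throwaway table `mercator_demo` that reuses the two expressions above: the `INSERT` supplies only `latitude` and `longitude` (taken from the first sample row), and ClickHouse computes `mercator_x` and `mercator_y`. Since `SELECT *` does not return `MATERIALIZED` columns, the final query asks for them by name.

```sql
DROP TABLE IF EXISTS mercator_demo;

-- Hypothetical demo table using the same expressions as foursquare_mercator
CREATE TABLE mercator_demo
(
    latitude Float64,
    longitude Float64,
    mercator_x UInt32 MATERIALIZED 0xFFFFFFFF * ((longitude + 180) / 360),
    mercator_y UInt32 MATERIALIZED 0xFFFFFFFF * ((1 / 2) - ((log(tan(((latitude + 90) / 360) * pi())) / 2) / pi()))
)
ENGINE = MergeTree
ORDER BY tuple();

-- Only the ordinary columns are supplied (coordinates of the first sample row)
INSERT INTO mercator_demo VALUES (37.62120111687914, -122.39003793803701);

-- SELECT * omits MATERIALIZED columns, so they are listed explicitly
SELECT *, mercator_x, mercator_y FROM mercator_demo;
```
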
The table is ordered by `mortonEncode(mercator_x, mercator_y)`, which interleaves the bits of
`mercator_x` and `mercator_y` to produce a Z-order space-filling curve. This ordering largely
keeps places that are close to each other on the map close to each other on disk, which
significantly improves geospatial query performance:

```sql
ORDER BY mortonEncode(mercator_x, mercator_y)
```

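To see the Z shape, the sketch below sorts the 16 cells of a 4 × 4 grid by their Morton code. `mortonEncode` interleaves the bits of its arguments, with the first argument in the least significant position: `mortonEncode(1, 2)` interleaves x = `01` and y = `10` into `1001`, which is 9, and `mortonDecode(2, 9)` turns it back into `(1, 2)`. Sorted by `z`, the cells (0, 0), (1, 0), (0, 1) and (1, 1) come first, forming the 2 × 2 block at the origin, before the curve moves on to the block that starts at (2, 0). The table applies the same interleaving to the full 32-bit `mercator_x` and `mercator_y` values.

```sql
-- Cells of a 4 x 4 grid sorted by Morton code: each 2 x 2 block is
-- visited completely before the curve moves on to the next one
SELECT
    x,
    y,
    mortonEncode(x, y) AS z,
    mortonDecode(2, z) AS decoded
FROM
(
    SELECT
        toUInt32(number % 4) AS x,
        toUInt32(intDiv(number, 4)) AS y
    FROM numbers(16)
)
ORDER BY z
```
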
Two `minmax` data-skipping indices are also created to speed up range filters on
`mercator_x` and `mercator_y`:

```sql
INDEX idx_x mercator_x TYPE minmax,
INDEX idx_y mercator_y TYPE minmax
```

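A `minmax` index stores the smallest and largest value of its column for each block of granules, so a range filter on `mercator_x` and `mercator_y` lets ClickHouse skip blocks that cannot contain matching rows. As a sketch, the query below turns a hypothetical bounding box around central Amsterdam (the coordinates are illustrative) into the Mercator ranges such a filter would use. Because y grows southward, the northern edge yields the smaller `mercator_y`. The resulting values can go into `WHERE mercator_x BETWEEN x_min AND x_max AND mercator_y BETWEEN y_min AND y_max` on `foursquare_mercator`, and running that query with `EXPLAIN indexes = 1` should show how many granules ClickHouse is able to skip.

```sql
-- Hypothetical bounding box (in degrees) around central Amsterdam
WITH
    4.85 AS lon_west,
    4.95 AS lon_east,
    52.40 AS lat_north,
    52.33 AS lat_south
SELECT
    toUInt32(0xFFFFFFFF * ((lon_west + 180) / 360)) AS x_min,
    toUInt32(0xFFFFFFFF * ((lon_east + 180) / 360)) AS x_max,
    -- y grows southward, so the northern edge gives the smaller value
    toUInt32(0xFFFFFFFF * ((1 / 2) - ((log(tan(((lat_north + 90) / 360) * pi())) / 2) / pi()))) AS y_min,
    toUInt32(0xFFFFFFFF * ((1 / 2) - ((log(tan(((lat_south + 90) / 360) * pi())) / 2) / pi()))) AS y_max
```
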
As you can see, ClickHouse provides the building blocks you need for real-time
mapping applications.

Run the following query to load the data:

```sql
INSERT INTO foursquare_mercator
SELECT * FROM s3('s3://fsq-os-places-us-east-1/release/dt=2025-04-08/places/parquet/*')
```

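The second sample row above has `NULL` latitude and longitude, but the table declares those columns as `Float64`, and most text columns as `String`, neither of them `Nullable`. With the `insert_null_as_default` setting, which is enabled by default, `INSERT ... SELECT` writes the column's default value (0 for numbers, an empty string for `String`) instead of `NULL`. A minimal sketch on a hypothetical throwaway table follows; after loading the real data, `SELECT countIf(latitude = 0 AND longitude = 0) FROM foursquare_mercator` gives a rough idea of how many places, like `Villa 722` above, ended up without usable coordinates.

```sql
DROP TABLE IF EXISTS null_default_demo;

-- Hypothetical demo table with non-Nullable columns, like foursquare_mercator
CREATE TABLE null_default_demo
(
    latitude Float64,
    name String
)
ENGINE = MergeTree
ORDER BY tuple();

-- Enabled by default; set explicitly here to make the behaviour visible
SET insert_null_as_default = 1;

-- NULLs from a Nullable source are replaced by the column defaults (0 and '')
INSERT INTO null_default_demo
SELECT CAST(NULL AS Nullable(Float64)), CAST(NULL AS Nullable(String));

SELECT * FROM null_default_demo;
```
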
## Visualizing the data {#data-visualization}

To see what's possible with this dataset, check out [adsb.exposed](https://adsb.exposed/?dataset=Places&zoom=5&lat=52.3488&lng=4.9219).
adsb.exposed was originally built by ClickHouse co-founder and CTO Alexey Milovidov to visualize ADS-B (Automatic Dependent Surveillance-Broadcast)
flight data, which is 1000 times larger than this dataset. During a company hackathon, Alexey added the Foursquare data to the tool.

Some of our favourite visualizations are reproduced below for you to enjoy.

<Image img={visualization_1} size="md" alt="Density map of points of interest in Europe"/>

<Image img={visualization_2} size="md" alt="Sake bars in Japan"/>

<Image img={visualization_3} size="md" alt="ATMs"/>

<Image img={visualization_4} size="md" alt="Map of Europe with points of interest categorised by country"/>
