Add Generation Script for londonTubeLines.json Dataset #667

@dsmedia

The londonTubeLines.json dataset, showcased in this example:

  1. has a complex lineage
  2. lacks a generation script

Given the significant community interest in geospatial visualization, keeping geographic datasets reproducible seems a worthwhile priority. Adding a script to the repo that generates (or updates) londonTubeLines.json from its original source, believed to be OpenStreetMap, would help secure the dataset's long-term viability. Input from those with geospatial data expertise would be welcome.

Background and Current Status

As I understand it, the londonTubeLines.json dataset is a TopoJSON file representing selected London Underground rail lines. It appears to have been added to the repository in this commit. The dataset's description, sources, and license are currently being expanded in pull request #663.

The commit history and related documentation suggest the following lineage:

  1. Original Source (Likely OpenStreetMap): The data was likely originally sourced from OpenStreetMap, although a direct link could not be found.
  2. Intermediate Source 1 (oobrien/vis): User @oobrien appears to have processed the data into a simplified GeoJSON format, tfl_lines.json. The file can be found in this commit of the oobrien/vis repository, which cites OpenStreetMap. This file represents a simplified view of London transport lines from the original source.
  3. Intermediate Source 2 (gicentre/litvis): @jwoLondon documented the process of converting tfl_lines.json to a TopoJSON file (similar to londonTubeLines.json) in this tutorial. This involved filtering specific lines and mapping properties using ndjson-cli and topojson. When I attempted to follow the instructions (quoted below, with code), I wasn't quite able to reproduce this repo's version. Also, this approach still relies on an intermediate source, not the original source.

TopoJSON files are not limited to areal units. Here, for example, we can import a file containing the geographical routes of selected London Underground tube lines. The conversion of tfl_lines.json follows a similar pattern to the conversion of the borough boundary files, but with some minor differences:

  • The file is already in unprojected geoJSON format so does not need reprojecting or conversion from a shapefile.
  • ndjson-cat converts the original geoJSON file to a single line necessary for further processing.
  • The file contains details of more rail lines than we need to map, so ndjson-filter is used with a regular expression to select data for tube and DLR lines only.
  • The property we will use for the id (the tube line name) is inside the first element of an array, so we reference it with [0]. Where the array has more than one element, more than one named tube line shares the same physical line.
ndjson-cat < tfl_lines.json \
  | ndjson-split 'd.features' \
  | ndjson-filter 'd.properties.lines[0].name.match("Ci.*|Di.*|No.*|Ce.*|DLR|Ha.*|Ba.*|Ju.*|Me.*|Pi.*|Vi.*|Wa.*")' \
  | ndjson-map 'd.id = d.properties.lines[0].name,delete d.properties,d' \
  | geo2topo -n -q 1e4 line="-" \
  > londonTubeLines.json
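
To make the selection rule in the quoted pipeline easier to reason about, here is a minimal Python sketch of its ndjson-filter and ndjson-map steps. It is not part of the tutorial. The function name and the three sample features are hypothetical and are not taken from tfl_lines.json. It assumes that JavaScript's String match with a string pattern behaves like an unanchored regular-expression search, which re.search reproduces for this pattern.

```python
import re

# Same alternation as in the ndjson-filter expression quoted above.
TUBE_PATTERN = re.compile(
    "Ci.*|Di.*|No.*|Ce.*|DLR|Ha.*|Ba.*|Ju.*|Me.*|Pi.*|Vi.*|Wa.*"
)


def select_and_relabel(features):
    """Keep features whose FIRST listed line matches, set id, drop properties."""
    kept = []
    for feature in features:
        first_name = feature["properties"]["lines"][0]["name"]
        if TUBE_PATTERN.search(first_name):
            # Mirror `d.id = ..., delete d.properties, d`: copy every key
            # except properties, then attach the id.
            relabelled = {k: v for k, v in feature.items() if k != "properties"}
            relabelled["id"] = first_name
            kept.append(relabelled)
    return kept


# Hypothetical features for illustration only.
segment = {"type": "LineString", "coordinates": [[-0.14, 51.50], [-0.13, 51.51]]}
features = [
    {"type": "Feature", "geometry": segment,
     "properties": {"lines": [{"name": "Victoria"}]}},
    {"type": "Feature", "geometry": segment,
     "properties": {"lines": [{"name": "London Overground"}, {"name": "Bakerloo"}]}},
    {"type": "Feature", "geometry": segment,
     "properties": {"lines": [{"name": "Circle"}, {"name": "District"}]}},
]

result = select_and_relabel(features)
print([f["id"] for f in result])  # ['Victoria', 'Circle']
```

In this sketch the second feature is dropped because only the first array element is tested, even though a tube line appears later in the array, and the shared Circle/District segment is labelled with the first name only.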

An initial attempt was made to create a generation script using @oobrien's tfl_lines.json as a starting point. The script used ndjson-cli, topojson, and d3-geo-centroid, but its output did not exactly match the existing londonTubeLines.json in vega-datasets.

1. Setup Commands
npm install -g shapefile ndjson-cli topojson d3-geo-centroid
apt-get install gdal-bin

wget https://raw.githubusercontent.com/oobrien/vis/master/tubecreature/data/tfl_lines.json
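
Before choosing a filter, it may help to see which line names the downloaded file actually contains. The following is a rough Python sketch, assuming the file is a GeoJSON FeatureCollection whose features carry names under properties.lines, as the ndjson expressions in this issue imply. The helper names and the sample data are hypothetical.

```python
import json
from collections import Counter


def count_line_names(feature_collection):
    """Count how many features list each name under properties.lines."""
    counts = Counter()
    for feature in feature_collection["features"]:
        # GeoJSON allows properties to be null, so fall back to an empty dict.
        properties = feature.get("properties") or {}
        for line in properties.get("lines", []):
            counts[line["name"]] += 1
    return counts


def count_line_names_in_file(path):
    """Load a FeatureCollection from disk, e.g. tfl_lines.json, and count names."""
    with open(path, encoding="utf-8") as handle:
        return count_line_names(json.load(handle))


# Hypothetical sample for illustration; not real data from tfl_lines.json.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"lines": [{"name": "Circle"}, {"name": "District"}]}},
        {"type": "Feature", "geometry": None,
         "properties": {"lines": [{"name": "District"}]}},
        {"type": "Feature", "geometry": None,
         "properties": {"lines": [{"name": "London Overground"}]}},
    ],
}

sample_counts = count_line_names(sample)
for name, count in sorted(sample_counts.items()):
    print(f"{name}: {count}")
```

Running count_line_names_in_file("tfl_lines.json") after the download would give the real inventory of names to check the filter against.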

ndjson-cat tfl_lines.json \
  | ndjson-split 'd.features' \
  | ndjson-filter 'd.properties.lines.some((l) => l.name == "DLR" || l.name == "Bakerloo" || l.name == "District" || l.name == "Piccadilly" || l.name == "Northern" || l.name == "Hammersmith & City" || l.name == "Jubilee" || l.name == "Circle" || l.name == "Waterloo & City" || l.name == "Victoria" || l.name == "Metropolitan" || l.name == "Central") && !d.properties.lines.some((l) => l.name == "London Overground")' \
  | ndjson-map 'd.id = d.properties.lines[0].name + (d.id ? "_" + d.id : ""), d' \
  > tfl_lines_filtered.ndjson

geo2topo -n -q 1e4 line=tfl_lines_filtered.ndjson > londonTubeLines.json
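
The commands in this issue differ from the tutorial's in a few visible ways: the filter tests every entry in properties.lines and excludes segments shared with London Overground, the id gains a "_" suffix when a feature already has an id, and properties are not deleted. To find where a generated file departs from the existing one, a coarse structural comparison could be a starting point. The Python sketch below assumes both files are TopoJSON topologies with an object named line, as in the geo2topo commands. All function names and the two tiny topologies are hypothetical.

```python
import json
from collections import Counter


def load_topology(path):
    """Read a TopoJSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(topology, object_name="line"):
    """Collect coarse facts about one TopoJSON object for comparison."""
    geometries = topology["objects"][object_name].get("geometries", [])
    return {
        "geometry_count": len(geometries),
        "arc_count": len(topology.get("arcs", [])),
        "ids": Counter(str(g.get("id")) for g in geometries),
        "has_properties": any("properties" in g for g in geometries),
        "transform": topology.get("transform"),
    }


def diff_summaries(reference, candidate):
    """Return only the fields that differ, as (reference, candidate) pairs."""
    return {
        key: (reference[key], candidate[key])
        for key in reference
        if reference[key] != candidate[key]
    }


# Hypothetical miniature topologies, for illustration only.
reference = {
    "type": "Topology",
    "arcs": [[[0, 0], [10, 10]]],
    "objects": {"line": {"type": "GeometryCollection", "geometries": [
        {"type": "LineString", "arcs": [0], "id": "Victoria"},
    ]}},
}
candidate = {
    "type": "Topology",
    "arcs": [[[0, 0], [10, 10]]],
    "objects": {"line": {"type": "GeometryCollection", "geometries": [
        {"type": "LineString", "arcs": [0], "id": "Victoria_12",
         "properties": {"lines": [{"name": "Victoria"}]}},
    ]}},
}

changed = diff_summaries(summarize(reference), summarize(candidate))
print(sorted(changed))  # ['has_properties', 'ids']
```

With real files, summarize(load_topology(path)) would be called once for the repository's londonTubeLines.json and once for the generated output. Matching summaries would not prove the files are identical, but differing ones point to the step worth revisiting.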
