This repository was archived by the owner on Aug 18, 2025. It is now read-only.
Replies: 2 comments
-
More catalogues for feeds. So far I haven't seen many examples of archiving GTFS-RT data.
-
Some people have used GitHub + GitHub Actions to scrape feeds and store the data as a git repository.
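The scrape-into-a-git-repo approach described above can be sketched as a small script run on a schedule (e.g. by a GitHub Actions cron trigger). This is only an illustrative sketch: the feed URL and the date-partitioned directory layout are assumptions, not something specified in this discussion.

```python
# Hypothetical GTFS-RT snapshot scraper, intended to run on a schedule
# (e.g. a GitHub Actions cron job) with a `git commit` step afterwards.
import datetime
import pathlib
import urllib.request

# Illustrative placeholder; substitute a real GTFS-RT endpoint.
FEED_URL = "https://example.com/gtfs-rt/vehicle-positions.pb"


def snapshot_path(base: pathlib.Path, now: datetime.datetime) -> pathlib.Path:
    """Date-partition snapshots (YYYY/MM/DD/HHMMSS.pb) so git history stays browsable."""
    return base / now.strftime("%Y/%m/%d") / f"{now:%H%M%S}.pb"


def fetch_snapshot(base: pathlib.Path) -> pathlib.Path:
    """Download the current feed message and write it under a timestamped path."""
    now = datetime.datetime.now(datetime.timezone.utc)
    path = snapshot_path(base, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(FEED_URL) as resp:
        path.write_bytes(resp.read())
    return path
```

A workflow would then commit the new file, so the repository itself becomes the archive and GitHub absorbs the storage and bandwidth.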
-
As part of my research for this project, I started looking at prior work in the GTFS transit data space. First, the GTFS documentation has a great list of resources to start from.
One resource stuck out to me: Transitland, an open-data platform built on thousands of public-transit data feeds from around the world. It is the largest and most feature-rich aggregator of GTFS, GTFS Realtime, and GBFS data feeds, and it is operated by Interline. They already offer free GTFS archive downloads for hobbyists and academics here. Here's an example feed with historical archives. It's unclear whether they archive only static GTFS data or GTFS-RT data as well: the MTA realtime bus feed, for example, does not list any archived data. Perhaps they do archive it but make downloads available only to paid users. In any case, simply maintaining a catalogue of GTFS feeds from around the world is useful on its own. They also have a git repository of tools for handling GTFS and GTFS-RT data.
One challenge they highlight in offering free transit archive downloads is the cost of bandwidth. They write: "Some users are taking advantage of this openness by scraping the entire contents of Transitland's feed version archive. These mass downloads put load on Transitland servers and bandwidth, increasing our operating expenses." For this project, one way to reduce hosting costs would be to publish the datasets for free on machine-learning dataset sites like Kaggle or Hugging Face, with an update once a month. That would limit the cost of maintaining the archive to our own storage, without paying bandwidth for others to download the data.
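The monthly-publishing idea above needs a packaging step before uploading to a dataset host. A minimal sketch, assuming the date-partitioned snapshot layout and the `.pb` file extension are as described (both illustrative choices, not specified in this discussion):

```python
# Bundle one month of GTFS-RT snapshots into a single zip for upload
# to a dataset host such as Kaggle or Hugging Face.
import pathlib
import zipfile


def pack_month(snapshot_dir: pathlib.Path, out_zip: pathlib.Path) -> int:
    """Collect every .pb snapshot under snapshot_dir into out_zip.

    Archive paths are kept relative to snapshot_dir so the day/time
    directory structure survives inside the zip. Returns the file count.
    """
    count = 0
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(snapshot_dir.rglob("*.pb")):
            zf.write(path, path.relative_to(snapshot_dir).as_posix())
            count += 1
    return count
```

The upload itself would use whichever API the chosen host provides; the point is that the host, not us, then serves the download bandwidth.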