functions-python/reverse_geolocation/README.md
+23 -12 (23 additions & 12 deletions)
@@ -52,20 +52,25 @@ Currently, storing `stops.txt` in GCP is a temporary implementation and may be m

 ## 3. `reverse_geolocation_process` Function

-This function performs the core reverse geolocation logic. It processes each stop in `stops.txt` and determines its geographic location.
+This function performs the core reverse geolocation logic. It processes location data from GTFS or GBFS feeds to determine their geographic context and stores it accordingly.

 ### Parameters:
-- `stable_id`: Identifies the GTFS feed.
-- `dataset_id`: Identifies the dataset being processed.
-- `stops_url`: URL of the `stops.txt` file.
+- `stable_id`: Identifies the feed (GTFS or GBFS).
+- `dataset_id`: Required if `data_type` is not provided or is `gtfs`. Identifies the dataset being processed.
+- `stops_url`: Required if `data_type` is not provided or is `gtfs`. URL of the GTFS `stops.txt` file.
+- `station_information_url`: Required if `data_type` is `gbfs` and `vehicle_status_url` is omitted. URL of the GBFS `station_information.json` file.
+- `vehicle_status_url`: Required if `data_type` is `gbfs` and `station_information_url` is omitted. URL of the GBFS `vehicle_status.json` file.
+- `data_type`: Optional. Specifies the type of data being processed. Can be `gtfs` or `gbfs`. If not provided, the function will attempt to determine the type based on the URLs provided.
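The parameter rules in the added lines can be read as a small decision procedure. The sketch below is one possible reading, not the function's actual code: `resolve_data_type` is a hypothetical helper, the request payload is assumed to be a plain mapping, and the inference order used when `data_type` is missing (GTFS if `stops_url` is present, otherwise GBFS if either GBFS URL is present) is an assumption.

```python
from typing import Mapping, Optional

GTFS = "gtfs"
GBFS = "gbfs"


def resolve_data_type(params: Mapping[str, Optional[str]]) -> str:
    """Return 'gtfs' or 'gbfs' and check the parameters that type requires.

    Hypothetical helper: the inference order below is an assumption.
    """
    data_type = params.get("data_type")
    has_gbfs_url = bool(
        params.get("station_information_url") or params.get("vehicle_status_url")
    )

    if data_type is None:
        # No explicit type: infer it from whichever URLs were supplied.
        if params.get("stops_url"):
            data_type = GTFS
        elif has_gbfs_url:
            data_type = GBFS
        else:
            raise ValueError("cannot infer data_type: no URL was provided")

    data_type = data_type.lower()
    if data_type == GTFS:
        # GTFS needs both the dataset identifier and the stops.txt URL.
        missing = [key for key in ("dataset_id", "stops_url") if not params.get(key)]
        if missing:
            raise ValueError("missing GTFS parameters: " + ", ".join(missing))
    elif data_type == GBFS:
        # GBFS needs at least one of the two location URLs.
        if not has_gbfs_url:
            raise ValueError(
                "GBFS requires station_information_url or vehicle_status_url"
            )
    else:
        raise ValueError(f"unsupported data_type: {data_type}")
    return data_type
```

For example, a payload holding only `stable_id` and `vehicle_status_url` would resolve to `gbfs` under these assumptions, while a `gtfs` payload without `dataset_id` would be rejected.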

 ### Processing Steps:

-1. **Load Stop Data**
-   - The function reads `stops.txt` into a Pandas DataFrame, ensuring unique longitude-latitude combinations to avoid redundant processing.
+1. **Load Location Data**
+   - For GTFS: the function reads `stops.txt` into a Pandas DataFrame, ensuring unique longitude-latitude pairs.
+   - For GBFS: location data is extracted from `station_information.json` (preferred) or `vehicle_status.json` (fallback), also ensuring uniqueness.
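As a rough illustration of the loading step, the sketch below extracts unique coordinate pairs from both feed types. It deliberately uses the standard library `csv` module instead of Pandas so that it runs without dependencies; the helper names are hypothetical. The `stop_lat`/`stop_lon` columns and the `data.stations` / `data.vehicles` arrays follow the GTFS and GBFS specifications, but how the real function handles missing coordinates is an assumption.

```python
import csv
import io
from typing import Iterable, List, Optional, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude)


def unique_coordinates(points: Iterable[Coordinate]) -> List[Coordinate]:
    """Drop repeated pairs while keeping first-seen order."""
    return list(dict.fromkeys(points))


def gtfs_coordinates(stops_txt: str) -> List[Coordinate]:
    """Read stop_lat/stop_lon from the text of a GTFS stops.txt file."""
    points = []
    for row in csv.DictReader(io.StringIO(stops_txt)):
        lat, lon = row.get("stop_lat"), row.get("stop_lon")
        if lat and lon:  # assumption: rows without coordinates are skipped
            points.append((float(lat), float(lon)))
    return unique_coordinates(points)


def gbfs_coordinates(
    station_information: Optional[dict], vehicle_status: Optional[dict]
) -> List[Coordinate]:
    """Prefer station_information.json; fall back to vehicle_status.json."""
    if station_information is not None:
        items = station_information.get("data", {}).get("stations", [])
    elif vehicle_status is not None:
        items = vehicle_status.get("data", {}).get("vehicles", [])
    else:
        items = []
    points = [
        (float(item["lat"]), float(item["lon"]))
        for item in items
        if item.get("lat") is not None and item.get("lon") is not None
    ]
    return unique_coordinates(points)
```

Deduplicating before any geographic lookup means each distinct coordinate is resolved once, however many stops, stations, or vehicles share it.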

-2. **Update Dataset Bounding Box**
-   - The dataset's bounding box is updated using only the extreme coordinate values, forming a rectangular boundary.
+2. **Update Bounding Box**
+   - For GTFS: the bounding box is derived from stop coordinates, and the dataset's bounding box is updated in the database.
+   - For GBFS: the bounding box is derived from the extracted station or vehicle coordinates. No database update is performed. From this point on, the term `stop` refers to both GTFS stops and GBFS stations/vehicles.
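A rectangular bounding box of this kind only needs the extreme latitude and longitude values. The following is a minimal sketch under that assumption; `BoundingBox` and `bounding_box` are hypothetical names, and feeds that cross the antimeridian are not handled.

```python
from typing import Iterable, NamedTuple, Tuple


class BoundingBox(NamedTuple):
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float


def bounding_box(points: Iterable[Tuple[float, float]]) -> BoundingBox:
    """Build the rectangle spanned by the extreme (lat, lon) values."""
    pts = list(points)
    if not pts:
        raise ValueError("cannot compute a bounding box without coordinates")
    lats = [lat for lat, _ in pts]
    lons = [lon for _, lon in pts]
    return BoundingBox(min(lats), min(lons), max(lats), max(lons))
```

The same helper would serve both branches: for GTFS the result would then be written to the dataset record, while for GBFS it would only be used in memory.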

 3. **Check for Previously Processed Stops**
    - Stops are matched against existing `Stop` entities in PostgreSQL using geographic coordinates (not `stop_id`).
@@ -79,11 +84,17 @@ This function performs the core reverse geolocation logic. It processes each sto
 5. **Store Results in PostgreSQL**
    - Unique location aggregates are identified, and stop counts per location are recorded.
    - `Location` entities are created based on the extracted administrative hierarchy.
+
 6. **GeoJSON Generation**
-   - The function creates a **GeoJSON file** containing location aggregates and their corresponding stop counts.
-   - The file is stored in **GCP Storage** following a consistent path format:
-     - **`<feed_stable_id>/geolocation.geojson`**
-   - This file always reflects the **latest dataset** results and is used for **location heatmap visualization on the front end**.
+   - A **GeoJSON file** is created representing the aggregated locations and their counts.
+   - It is stored in the GTFS or GBFS GCS bucket, depending on the data type, under:
+     - **`<stable_id>/geolocation.geojson`**
+   - The file includes:
+     - Extracted locations,
+     - Timestamp of extraction,
+     - URL used for data extraction.
+   - This file always reflects the results for the most recent GTFS dataset or GBFS version and powers the **location heatmap visualization** on the front end.
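To make the file contents more concrete, here is a hedged sketch of how such a document could be assembled. Only the `<stable_id>/geolocation.geojson` path comes from the README; the member names `extracted_at` and `extraction_url`, the `name` and `stop_count` properties, and the shape of each aggregate are illustrative assumptions.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Mapping, Optional


def geojson_object_path(stable_id: str) -> str:
    """Object path inside the bucket, as given in the README."""
    return f"{stable_id}/geolocation.geojson"


def build_geolocation_geojson(
    aggregates: Iterable[Mapping],
    extraction_url: str,
    extracted_at: Optional[datetime] = None,
) -> dict:
    """Assemble a FeatureCollection with one feature per location aggregate.

    Each aggregate is assumed to carry 'name', 'stop_count' and a GeoJSON
    'geometry' dict; these keys are hypothetical.
    """
    extracted_at = extracted_at or datetime.now(timezone.utc)
    features = [
        {
            "type": "Feature",
            "geometry": aggregate["geometry"],
            "properties": {
                "name": aggregate["name"],
                "stop_count": aggregate["stop_count"],
            },
        }
        for aggregate in aggregates
    ]
    return {
        "type": "FeatureCollection",
        # Extra top-level members (hypothetical names) carry the metadata.
        "extracted_at": extracted_at.isoformat(),
        "extraction_url": extraction_url,
        "features": features,
    }


if __name__ == "__main__":
    square = {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
    }
    document = build_geolocation_geojson(
        [{"name": "Example City", "stop_count": 3, "geometry": square}],
        extraction_url="https://example.com/stops.txt",
    )
    print(geojson_object_path("example-feed"))
    print(json.dumps(document, indent=2))
```

Because the object path depends only on `stable_id`, each run overwrites the previous file, which matches the statement that the file always reflects the most recent results.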
+
 ### Location Mapping:
 - **`country_code` / `country`**: Taken from the `Geopolygon` with an ISO 3166-1 code.
 - **`subdivision_name`**: Derived from the lowest administrative `Geopolygon` with an ISO 3166-2 code, but no ISO 3166-1 code.