fix: fix logic to check for nested GTFS files in ZIP #1972
Merged
qcdyx merged 4 commits into MobilityData:master on Feb 12, 2025
qcdyx
reviewed
Feb 7, 2025
Contributor
Hey @skalexch, could you take a look at the 14 datasets that contain new errors? (You can see all of them by clicking on the arrow.) New Errors (14 out of 1808 datasets, ~1%) ✅ Details of new errors due to code change, which is less than the provided threshold of 1%.
Contributor
@qcdyx the screenshot below shows the affected datasets and, above them, the folders that I extracted from them. I also included mdb-2854 as a control. For all of the affected datasets, the GTFS files do seem to exist within a subfolder. For the control dataset, the extracted folder has the same name as the ZIP file, which means that the files reside in the root directory. Please note that I could not download mdb-612 and mdb-1324.
c1342e4 to
13bf3f5
qcdyx
approved these changes
Feb 12, 2025
Contributor
qcdyx
left a comment
LGTM. Thanks for your contribution!

Summary:
This PR fixes a bug with our logic to check whether a ZIP file we're loading has GTFS files in a subfolder. It looks like `ZipInputStream.getNextEntry` doesn't always return subfolders, depending on how the ZIP file was created. The subfolder and ZIP file having the same name in #1912 was a red herring.

Details
```
$ unzip -l piercetransit-wa-us--flex-v2.zip
Archive:  piercetransit-wa-us--flex-v2.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      170  11-28-2023 15:22   piercetransit-wa-us--flex-v2/timetables.txt
       81  11-28-2023 15:22   piercetransit-wa-us--flex-v2/fare_attributes.txt
       18  11-28-2023 15:22   piercetransit-wa-us--flex-v2/stop_attributes.txt
       56  11-28-2023 15:22   piercetransit-wa-us--flex-v2/transfers.txt
      183  11-28-2023 15:22   piercetransit-wa-us--flex-v2/agency.txt
       12  11-28-2023 15:22   piercetransit-wa-us--flex-v2/areas.txt
       54  11-28-2023 15:22   piercetransit-wa-us--flex-v2/fare_rules.txt
      437  11-28-2023 15:22   piercetransit-wa-us--flex-v2/calendar_dates.txt
     4367  11-28-2023 15:22   piercetransit-wa-us--flex-v2/stop_times.txt
      374  11-28-2023 15:22   piercetransit-wa-us--flex-v2/location_groups.txt
      137  11-28-2023 15:22   piercetransit-wa-us--flex-v2/directions.txt
       53  11-28-2023 15:22   piercetransit-wa-us--flex-v2/frequencies.txt
       18  11-28-2023 15:22   piercetransit-wa-us--flex-v2/farezone_attributes.txt
      895  11-28-2023 15:22   piercetransit-wa-us--flex-v2/shapes.txt
      983  11-28-2023 15:22   piercetransit-wa-us--flex-v2/trips.txt
      355  11-28-2023 15:22   piercetransit-wa-us--flex-v2/feed_info.txt
     2051  11-28-2023 15:22   piercetransit-wa-us--flex-v2/locations.geojson
      104  11-28-2023 15:22   piercetransit-wa-us--flex-v2/runcut.txt
     2170  11-28-2023 15:22   piercetransit-wa-us--flex-v2/stops.txt
      117  11-28-2023 15:22   piercetransit-wa-us--flex-v2/linked_datasets.txt
      131  11-28-2023 15:22   piercetransit-wa-us--flex-v2/calendar_attributes.txt
       62  11-28-2023 15:22   piercetransit-wa-us--flex-v2/timetable_stop_order.txt
     1745  11-28-2023 15:22   piercetransit-wa-us--flex-v2/booking_rules.txt
      265  11-28-2023 15:22   piercetransit-wa-us--flex-v2/calendar.txt
      520  11-28-2023 15:22   piercetransit-wa-us--flex-v2/routes.txt
---------                     -------
    15358                     25 files

$ mv piercetransit-wa-us--flex-v2.zip foobar.zip
$ unzip -l foobar.zip
Archive:  foobar.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      170  11-28-2023 15:22   piercetransit-wa-us--flex-v2/timetables.txt
       81  11-28-2023 15:22   piercetransit-wa-us--flex-v2/fare_attributes.txt
       18  11-28-2023 15:22   piercetransit-wa-us--flex-v2/stop_attributes.txt
       56  11-28-2023 15:22   piercetransit-wa-us--flex-v2/transfers.txt
      183  11-28-2023 15:22   piercetransit-wa-us--flex-v2/agency.txt
       12  11-28-2023 15:22   piercetransit-wa-us--flex-v2/areas.txt
       54  11-28-2023 15:22   piercetransit-wa-us--flex-v2/fare_rules.txt
      437  11-28-2023 15:22   piercetransit-wa-us--flex-v2/calendar_dates.txt
     4367  11-28-2023 15:22   piercetransit-wa-us--flex-v2/stop_times.txt
      374  11-28-2023 15:22   piercetransit-wa-us--flex-v2/location_groups.txt
      137  11-28-2023 15:22   piercetransit-wa-us--flex-v2/directions.txt
       53  11-28-2023 15:22   piercetransit-wa-us--flex-v2/frequencies.txt
       18  11-28-2023 15:22   piercetransit-wa-us--flex-v2/farezone_attributes.txt
      895  11-28-2023 15:22   piercetransit-wa-us--flex-v2/shapes.txt
      983  11-28-2023 15:22   piercetransit-wa-us--flex-v2/trips.txt
      355  11-28-2023 15:22   piercetransit-wa-us--flex-v2/feed_info.txt
     2051  11-28-2023 15:22   piercetransit-wa-us--flex-v2/locations.geojson
      104  11-28-2023 15:22   piercetransit-wa-us--flex-v2/runcut.txt
     2170  11-28-2023 15:22   piercetransit-wa-us--flex-v2/stops.txt
      117  11-28-2023 15:22   piercetransit-wa-us--flex-v2/linked_datasets.txt
      131  11-28-2023 15:22   piercetransit-wa-us--flex-v2/calendar_attributes.txt
       62  11-28-2023 15:22   piercetransit-wa-us--flex-v2/timetable_stop_order.txt
     1745  11-28-2023 15:22   piercetransit-wa-us--flex-v2/booking_rules.txt
      265  11-28-2023 15:22   piercetransit-wa-us--flex-v2/calendar.txt
      520  11-28-2023 15:22   piercetransit-wa-us--flex-v2/routes.txt
---------                     -------
    15358                     25 files
```

Closes #1912
Expected behavior:
We get an invalid_input_files_in_subfolder notice even if the subfolder is not treated as a standalone entry.
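
For illustration only, here is a minimal sketch of how such a check could work without relying on directory entries. It is not the code from this PR: the class `NestedZipCheck` and the helpers `hasFilesInSubfolder` and `zipOf` are hypothetical names, and the approach (looking for a `/` in the name of each file entry) is an assumption about one reasonable way to detect nesting. The test archive is built with file entries only, mirroring the `unzip -l` listing above, where no standalone `piercetransit-wa-us--flex-v2/` entry appears.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of the validator code base.
final class NestedZipCheck {

  /** Returns true if any file entry in the ZIP lives below the archive root. */
  static boolean hasFilesInSubfolder(InputStream in) throws IOException {
    try (ZipInputStream zip = new ZipInputStream(in)) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        // Do not wait for a directory entry such as "feed/": some archives
        // only contain file entries like "feed/agency.txt".
        if (!entry.isDirectory() && entry.getName().contains("/")) {
          return true;
        }
      }
    }
    return false;
  }

  /** Builds an in-memory ZIP that contains file entries only, no directory entries. */
  static byte[] zipOf(String... names) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      for (String name : names) {
        zip.putNextEntry(new ZipEntry(name));
        zip.write("x".getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
      }
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] nested = zipOf("feed/agency.txt", "feed/stops.txt");
    byte[] flat = zipOf("agency.txt", "stops.txt");
    System.out.println(hasFilesInSubfolder(new ByteArrayInputStream(nested))); // true
    System.out.println(hasFilesInSubfolder(new ByteArrayInputStream(flat)));   // false
  }
}
```

Because the check looks only at file entry names, the result does not depend on whether the tool that created the ZIP wrote an explicit entry for the subfolder, nor on the name of the ZIP file itself.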
Testing:
Before:

After:

Please make sure these boxes are checked before submitting your pull request - thanks!
`gradle test` to make sure you didn't break anything