feat: route type consistency validators#2093

Open
cka-y wants to merge 6 commits into master from feat/1925

@cka-y cka-y commented Feb 11, 2026

Summary:

This pull request introduces two new GTFS validation rules to ensure consistency of route types for block IDs and in-seat transfers, along with corresponding tests and a minor schema annotation update.

New validation rules:

  • Added InconsistentRouteTypeForBlockIdValidator, which checks that all trips sharing a block_id also share the same route_type. If not, a warning notice is generated.
  • Added InconsistentRouteTypeForInSeatTransferValidator, which ensures that an in-seat transfer (transfer_type 4) only occurs between routes of the same route_type.
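
The block_id rule described above can be sketched in plain Java as follows. This is a minimal illustration, not the validator's code: the Trip record, the inconsistentBlocks helper, and the route-type lookup map are hypothetical stand-ins for the real table classes, and the sketch assumes that trips with an empty block_id or an unknown route_id are simply skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Standalone sketch; none of these names come from the validator itself. */
final class BlockRouteTypeSketch {

  /** Hypothetical stand-in for a trips.txt row; block_id is optional, so it may be null or empty. */
  record Trip(String tripId, String routeId, String blockId) {}

  /** Returns the block_ids whose trips resolve to more than one route_type, in first-seen order. */
  static List<String> inconsistentBlocks(
      List<Trip> trips, Map<String, Integer> routeTypeByRouteId) {
    Map<String, Set<Integer>> typesByBlock = new LinkedHashMap<>();
    for (Trip trip : trips) {
      if (trip.blockId() == null || trip.blockId().isEmpty()) {
        continue; // trip does not belong to a block
      }
      Integer routeType = routeTypeByRouteId.get(trip.routeId());
      if (routeType == null) {
        continue; // unknown route_id: assumed to be reported by another rule
      }
      typesByBlock.computeIfAbsent(trip.blockId(), k -> new TreeSet<>()).add(routeType);
    }
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Set<Integer>> entry : typesByBlock.entrySet()) {
      if (entry.getValue().size() > 1) {
        result.add(entry.getKey());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Integer> routeTypes = Map.of("bus1", 3, "bus2", 3, "tram1", 0);
    List<Trip> trips = List.of(
        new Trip("t1", "bus1", "A"),
        new Trip("t2", "bus2", "A"),
        new Trip("t3", "bus1", "B"),
        new Trip("t4", "tram1", "B"),
        new Trip("t5", "tram1", ""));
    System.out.println(inconsistentBlocks(trips, routeTypes)); // prints [B]
  }
}
```

Block A mixes two bus routes with the same route_type and passes; block B mixes a bus and a tram and is reported.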

Schema and test updates:

  • Annotated the transferType() field in GtfsTransferSchema with @Index to improve lookup performance for transfer type-based validations.
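
To illustrate why a lookup by transfer type helps the in-seat transfer rule, here is a hand-rolled sketch. It does not use the validator's generated table classes: the Transfer record, indexByType, and mismatchedInSeatTransfers are hypothetical names, and the constant 4 is the in-seat transfer value from the GTFS reference. The sketch assumes transfers without both route ids are skipped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Standalone sketch; none of these names come from the validator itself. */
final class InSeatTransferSketch {

  /** GTFS reference: transfer_type 4 marks an in-seat transfer. */
  static final int IN_SEAT_TRANSFER = 4;

  /** Hypothetical stand-in for a transfers.txt row; both route ids are optional. */
  record Transfer(String fromRouteId, String toRouteId, int transferType) {}

  /** Groups rows by transfer_type once, so a rule can fetch one type without rescanning. */
  static Map<Integer, List<Transfer>> indexByType(List<Transfer> transfers) {
    Map<Integer, List<Transfer>> index = new HashMap<>();
    for (Transfer transfer : transfers) {
      index.computeIfAbsent(transfer.transferType(), k -> new ArrayList<>()).add(transfer);
    }
    return index;
  }

  /** Returns in-seat transfers whose two routes have different route_type values. */
  static List<Transfer> mismatchedInSeatTransfers(
      Map<Integer, List<Transfer>> index, Map<String, Integer> routeTypeByRouteId) {
    List<Transfer> result = new ArrayList<>();
    for (Transfer transfer : index.getOrDefault(IN_SEAT_TRANSFER, List.of())) {
      if (transfer.fromRouteId() == null || transfer.toRouteId() == null) {
        continue; // route ids are optional; nothing to compare
      }
      Integer from = routeTypeByRouteId.get(transfer.fromRouteId());
      Integer to = routeTypeByRouteId.get(transfer.toRouteId());
      if (from != null && to != null && !from.equals(to)) {
        result.add(transfer);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Integer> routeTypes = Map.of("bus1", 3, "bus2", 3, "tram1", 0);
    List<Transfer> transfers = List.of(
        new Transfer("bus1", "bus2", 4),   // same route_type: fine
        new Transfer("bus1", "tram1", 4),  // bus to tram in-seat: reported
        new Transfer("bus1", "tram1", 2)); // not an in-seat transfer: ignored
    System.out.println(mismatchedInSeatTransfers(indexByType(transfers), routeTypes).size()); // prints 1
  }
}
```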

Expected behavior:
[Two screenshots; images not available in this capture]

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with gradle test to make sure you didn't break anything
  • Add or update any needed documentation to the repo
  • Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification (https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

@github-actions

This comment was marked as outdated.

@cka-y cka-y requested a review from skalexch February 16, 2026 16:19
@MobilityData MobilityData deleted a comment from github-actions bot Feb 16, 2026
@MobilityData MobilityData deleted a comment from github-actions bot Feb 16, 2026
@github-actions

📝 Acceptance Test Report

📋 Summary

❌ The rule acceptance test has failed for commit 362fbe2
Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (0 out of 987 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Errors (0 out of 987 datasets, ~0%) ✅

No changes were detected due to the code change.

New Warnings (13 out of 987 datasets, ~1%) ❌

Details of new warnings due to the code change, which are above the provided threshold of 1%.

| Dataset | Notice Code |
| --- | --- |
| mdb-1003 | inconsistent_route_type_for_block_id |
| mdb-1013 | inconsistent_route_type_for_block_id |
| mdb-1094 | inconsistent_route_type_for_block_id |
| mdb-1098 | inconsistent_route_type_for_block_id |
| mdb-2155 | inconsistent_route_type_for_block_id |
| mdb-2253 | inconsistent_route_type_for_block_id |
| mdb-2653 | inconsistent_route_type_for_block_id |
| mdb-2826 | inconsistent_route_type_for_block_id |
| mdb-2832 | inconsistent_route_type_for_block_id |
| mdb-2886 | inconsistent_route_type_for_block_id |
| mdb-660 | inconsistent_route_type_for_block_id |
| mdb-782 | inconsistent_route_type_for_block_id |
| mdb-893 | inconsistent_route_type_for_block_id |

Dropped Warnings (0 out of 987 datasets, ~0%) ✅

No changes were detected due to the code change.

🛡️ Corruption Check

16 out of 1003 sources (~2 %) are corrupted.
Dataset Ref Report Exists Ref Report Readable Latest Report Exists Latest Report Readable
mdb-1114
mdb-1123
mdb-1332
mdb-1808
mdb-1953
mdb-227
mdb-383
mdb-55
mdb-606
mdb-609
mdb-655
mdb-780
mdb-789
mdb-806
mdb-9
mdb-907

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) |
| --- | --- | --- | --- | --- |
| Average | -- | 5.37 | 5.63 | ⬆️+0.26 |
| Median | -- | 1.56 | 1.86 | ⬆️+0.29 |
| Standard Deviation | -- | 21.19 | 21.20 | ⬆️+0.01 |
| Minimum in Reference Reports | mdb-2360 | 0.45 | 0.53 | ⬆️+0.08 |
| Maximum in Reference Reports | mdb-2014 | 570.95 | 571.90 | ⬆️+0.95 |
| Minimum in Latest Reports | mdb-1788 | 0.48 | 0.50 | ⬆️+0.01 |
| Maximum in Latest Reports | mdb-2014 | 570.95 | 571.90 | ⬆️+0.95 |

📜 Memory Consumption

| Metric | Dataset ID | Reference | Latest | Difference |
| --- | --- | --- | --- | --- |
| Average | -- | 563.08 MiB | 565.66 MiB | ⬆️+2.57 MiB |
| Median | -- | 327.93 MiB | 327.93 MiB | ⬇️0 bytes |
| Standard Deviation | -- | 1005.33 MiB | 973.24 MiB | ⬇️-32.09 MiB |
| Minimum in Reference Reports | mdb-1984 | 41.74 MiB | 39.79 MiB | ⬇️-1.94 MiB |
| Maximum in Reference Reports | mdb-2014 | 9.43 GiB | 9.43 GiB | ⬆️+1.44 MiB |
| Minimum in Latest Reports | mdb-2034 | 391.93 MiB | 38.39 MiB | ⬇️-353.53 MiB |
| Maximum in Latest Reports | mdb-2014 | 9.43 GiB | 9.43 GiB | ⬆️+1.44 MiB |

@skalexch skalexch left a comment

LGTM!

@github-actions

📝 Acceptance Test Report

📋 Summary

❌ The rule acceptance test has failed for commit f59aa07
Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (0 out of 987 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Errors (0 out of 987 datasets, ~0%) ✅

No changes were detected due to the code change.

New Warnings (13 out of 987 datasets, ~1%) ❌

Details of new warnings due to the code change, which are above the provided threshold of 1%.

| Dataset | Notice Code |
| --- | --- |
| mdb-1003 | inconsistent_route_type_for_block_id |
| mdb-1013 | inconsistent_route_type_for_block_id |
| mdb-1094 | inconsistent_route_type_for_block_id |
| mdb-1098 | inconsistent_route_type_for_block_id |
| mdb-2155 | inconsistent_route_type_for_block_id |
| mdb-2253 | inconsistent_route_type_for_block_id |
| mdb-2653 | inconsistent_route_type_for_block_id |
| mdb-2826 | inconsistent_route_type_for_block_id |
| mdb-2832 | inconsistent_route_type_for_block_id |
| mdb-2886 | inconsistent_route_type_for_block_id |
| mdb-660 | inconsistent_route_type_for_block_id |
| mdb-782 | inconsistent_route_type_for_block_id |
| mdb-893 | inconsistent_route_type_for_block_id |

Dropped Warnings (0 out of 987 datasets, ~0%) ✅

No changes were detected due to the code change.

🛡️ Corruption Check

16 out of 1003 sources (~2 %) are corrupted.
Dataset Ref Report Exists Ref Report Readable Latest Report Exists Latest Report Readable
mdb-1114
mdb-1123
mdb-1332
mdb-1808
mdb-1953
mdb-227
mdb-383
mdb-55
mdb-606
mdb-609
mdb-655
mdb-780
mdb-789
mdb-806
mdb-9
mdb-907

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) |
| --- | --- | --- | --- | --- |
| Average | -- | 5.20 | 5.94 | ⬆️+0.73 |
| Median | -- | 1.58 | 1.83 | ⬆️+0.25 |
| Standard Deviation | -- | 13.46 | 24.02 | ⬆️+10.56 |
| Minimum in Reference Reports | mdb-2234 | 0.48 | 2.59 | ⬆️+2.10 |
| Maximum in Reference Reports | mdb-2014 | 212.59 | 665.06 | ⬆️+452.48 |
| Minimum in Latest Reports | mdb-6 | 0.48 | 0.50 | ⬆️+0.01 |
| Maximum in Latest Reports | mdb-2014 | 212.59 | 665.06 | ⬆️+452.48 |

📜 Memory Consumption

| Metric | Dataset ID | Reference | Latest | Difference |
| --- | --- | --- | --- | --- |
| Average | -- | 568.90 MiB | 565.56 MiB | ⬇️-3.34 MiB |
| Median | -- | 323.93 MiB | 327.93 MiB | ⬆️+4.00 MiB |
| Standard Deviation | -- | 1.01 GiB | 987.26 MiB | ⬇️-48.42 MiB |
| Minimum in Reference Reports | mdb-61 | 39.99 MiB | 415.93 MiB | ⬆️+375.94 MiB |
| Maximum in Reference Reports | mdb-2014 | 11.86 GiB | 9.56 GiB | ⬇️-2.31 GiB |
| Minimum in Latest Reports | mdb-2794 | 399.93 MiB | 39.55 MiB | ⬇️-360.38 MiB |
| Maximum in Latest Reports | mdb-2014 | 11.86 GiB | 9.56 GiB | ⬇️-2.31 GiB |

this.tripTable = tripTable;
this.routeTable = routeTable;
}

@jcpitre jcpitre Feb 17, 2026

There should be a shouldCallValidate method for more efficiency. For example, if there is no block_id column in trips.txt (it is optional), there is no point in calling validate.

Admittedly, in that case the call to byBlockIdMap should probably return an empty map, so the loop exits almost immediately.
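
A minimal sketch of the suggested guard follows. It is self-contained and does not reproduce the validator's base classes: the Table record, the Validator interface, and the run helper are hypothetical stand-ins, and only the method name shouldCallValidate is taken from the comment above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Standalone sketch; Table, Validator and run are hypothetical, not validator classes. */
final class ShouldCallValidateSketch {

  /** Stand-in for a loaded table: only the header names it was parsed with. */
  record Table(Set<String> columns) {
    boolean hasColumn(String name) {
      return columns.contains(name);
    }
  }

  /** Assumed contract: the runner asks shouldCallValidate() before calling validate(). */
  interface Validator {
    default boolean shouldCallValidate() {
      return true;
    }

    void validate(List<String> notices);
  }

  /** Skips the rule entirely when trips.txt has no block_id column. */
  static final class BlockIdRule implements Validator {
    private final Table tripTable;

    BlockIdRule(Table tripTable) {
      this.tripTable = tripTable;
    }

    @Override
    public boolean shouldCallValidate() {
      return tripTable.hasColumn("block_id");
    }

    @Override
    public void validate(List<String> notices) {
      notices.add("block_id rule ran"); // placeholder for the real per-block check
    }
  }

  static void run(Validator validator, List<String> notices) {
    if (validator.shouldCallValidate()) {
      validator.validate(notices);
    }
  }

  public static void main(String[] args) {
    List<String> notices = new ArrayList<>();
    run(new BlockIdRule(new Table(Set.of("trip_id", "route_id"))), notices);
    System.out.println(notices.size()); // prints 0: no block_id column, rule skipped
    run(new BlockIdRule(new Table(Set.of("trip_id", "route_id", "block_id"))), notices);
    System.out.println(notices.size()); // prints 1: rule ran once
  }
}
```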

this.transferTable = transferTable;
this.routeTable = routeTable;
}

Consider adding a shouldCallValidate method for more efficiency.
For example, transfers.txt could have no from_route_id or to_route_id column (both are optional). In that case there is no point in calling validate.
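
For the transfer rule the guard depends on two optional columns rather than one. A compact sketch, assuming the set of header names of transfers.txt is available (the class and method shown here are hypothetical helpers, not validator API):

```java
import java.util.List;
import java.util.Set;

/** Standalone sketch; the helper below is hypothetical, not validator API. */
final class TransferColumnsGuardSketch {

  /** Both columns must be present for a route_type comparison to be possible. */
  static final List<String> REQUIRED_COLUMNS = List.of("from_route_id", "to_route_id");

  static boolean shouldCallValidate(Set<String> transferColumns) {
    return transferColumns.containsAll(REQUIRED_COLUMNS);
  }

  public static void main(String[] args) {
    // Stop-level transfers only: the rule has nothing to compare.
    System.out.println(
        shouldCallValidate(Set.of("from_stop_id", "to_stop_id", "transfer_type"))); // prints false
    // Route-level columns present: the rule should run.
    System.out.println(
        shouldCallValidate(Set.of("from_route_id", "to_route_id", "transfer_type"))); // prints true
  }
}
```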

@jcpitre jcpitre left a comment

I am approving so you can decide if the changes I mentioned are worth it. Their effect is minor.

@github-actions

📝 Acceptance Test Report

📋 Summary

❌ The rule acceptance test has failed for commit 7172c85
Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (0 out of 987 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Errors (0 out of 987 datasets, ~0%) ✅

No changes were detected due to the code change.

New Warnings (13 out of 987 datasets, ~1%) ❌

Details of new warnings due to the code change, which are above the provided threshold of 1%.

| Dataset | Notice Code |
| --- | --- |
| mdb-1003 | inconsistent_route_type_for_block_id |
| mdb-1013 | inconsistent_route_type_for_block_id |
| mdb-1094 | inconsistent_route_type_for_block_id |
| mdb-1098 | inconsistent_route_type_for_block_id |
| mdb-2155 | inconsistent_route_type_for_block_id |
| mdb-2253 | inconsistent_route_type_for_block_id |
| mdb-2653 | inconsistent_route_type_for_block_id |
| mdb-2826 | inconsistent_route_type_for_block_id |
| mdb-2832 | inconsistent_route_type_for_block_id |
| mdb-2886 | inconsistent_route_type_for_block_id |
| mdb-660 | inconsistent_route_type_for_block_id |
| mdb-782 | inconsistent_route_type_for_block_id |
| mdb-893 | inconsistent_route_type_for_block_id |

Dropped Warnings (0 out of 987 datasets, ~0%) ✅

No changes were detected due to the code change.

🛡️ Corruption Check

16 out of 1003 sources (~2 %) are corrupted.
Dataset Ref Report Exists Ref Report Readable Latest Report Exists Latest Report Readable
mdb-1114
mdb-1123
mdb-1332
mdb-1808
mdb-1953
mdb-227
mdb-383
mdb-55
mdb-606
mdb-609
mdb-655
mdb-780
mdb-789
mdb-806
mdb-9
mdb-907

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) |
| --- | --- | --- | --- | --- |
| Average | -- | 5.55 | 5.68 | ⬆️+0.12 |
| Median | -- | 1.66 | 1.93 | ⬆️+0.27 |
| Standard Deviation | -- | 25.63 | 20.22 | ⬇️-5.41 |
| Minimum in Reference Reports | mdb-2235 | 0.49 | 0.63 | ⬆️+0.14 |
| Maximum in Reference Reports | mdb-2014 | 736.07 | 544.07 | ⬇️-192.01 |
| Minimum in Latest Reports | mdb-2238 | 0.50 | 0.54 | ⬆️+0.03 |
| Maximum in Latest Reports | mdb-2014 | 736.07 | 544.07 | ⬇️-192.01 |

📜 Memory Consumption

| Metric | Dataset ID | Reference | Latest | Difference |
| --- | --- | --- | --- | --- |
| Average | -- | 569.69 MiB | 581.74 MiB | ⬆️+12.04 MiB |
| Median | -- | 327.93 MiB | 327.93 MiB | ⬇️0 bytes |
| Standard Deviation | -- | 988.67 MiB | 1.00 GiB | ⬆️+38.93 MiB |
| Minimum in Reference Reports | mdb-86 | 40.61 MiB | 415.93 MiB | ⬆️+375.31 MiB |
| Maximum in Reference Reports | mdb-2014 | 8.48 GiB | 8.59 GiB | ⬆️+110.88 MiB |
| Minimum in Latest Reports | mdb-77 | 46.25 MiB | 39.27 MiB | ⬇️-6.99 MiB |
| Maximum in Latest Reports | mdb-2014 | 8.48 GiB | 8.59 GiB | ⬆️+110.88 MiB |

Development

Successfully merging this pull request may close these issues.

Feature Request: Add Rule for Block Consistency in route_type Values Across block_id

3 participants