Improvements to island detection and performance of OMF import by yamilbknsu · Pull Request #117 · NatLabRockies/bambam

yamilbknsu · 2026-03-12T19:53:57Z

In this PR I'm addressing the issues encountered during a national run of the OMF import pipeline. In particular, this means:

1. Changing the predicate of the extent matching right after download from `contains` to `intersects`. This slightly reduces the accuracy of the filter but allows for much faster operation.
2. Implementing a process to remove orphan vertices after the island detection algorithm.

For point 2, an important assumption is that island detection is performed with all the edge lists together (something we want to expand in the future, #116) and that we use double adjacency for component search. Under these assumptions, any vertex adjacent to an edge marked as "island" is necessarily an orphan vertex in the clean graph. That significantly reduces the complexity of the algorithm: the complete list of vertices to remove is simply the set of vertices adjacent to island edges.
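A minimal sketch of that shortcut, assuming a simplified edge type with bare `usize` vertex indices and one boolean island mask per edge list. `Edge` and `remap_vertices_after_islands` are hypothetical names for illustration, not the PR's actual types or signature. It leans on the same assumption stated above: a vertex touching an island edge has no surviving edges, so it can be dropped without inspecting the rest of the graph.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified edge: just the endpoint vertex indices.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Edge {
    pub src: usize,
    pub dst: usize,
}

/// Returns, for each old vertex index, its new index (`None` = removed).
pub fn remap_vertices_after_islands(
    num_vertices: usize,
    edge_lists: &[Vec<Edge>],
    island_masks: &[Vec<bool>], // island_masks[list][edge] == true => island edge
) -> Vec<Option<usize>> {
    // Assumption (all edge lists + double adjacency): any vertex touching an
    // island edge is an orphan once island edges are dropped.
    let mut removed: HashSet<usize> = HashSet::new();
    for (edges, mask) in edge_lists.iter().zip(island_masks) {
        for (edge, &is_island) in edges.iter().zip(mask) {
            if is_island {
                removed.insert(edge.src);
                removed.insert(edge.dst);
            }
        }
    }
    // Renumber surviving vertices contiguously, preserving their original order.
    let mut next_id = 0;
    (0..num_vertices)
        .map(|v| {
            if removed.contains(&v) {
                None
            } else {
                next_id += 1;
                Some(next_id - 1)
            }
        })
        .collect()
}
```

Because survivors keep their relative order, the resulting `Vec<Option<usize>>` can be applied directly to every edge's `src`/`dst`.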

I'm including a small tree generated by running an import over Lakewood and then running bambam on the generated output, as evidence that this is working correctly:

(Red lines are isochrones, purple lines are drive traversals, and yellow/green lines are walk/bike traversals; I'm not sure which is which at the moment.)


yamilbknsu · 2026-03-12T19:57:14Z

I'm keeping in mind the need to handle the potential conflicts with the other PRs

Copilot

Pull request overview

This PR improves the OMF import pipeline’s runtime characteristics and graph cleanliness for large (“national run”) imports by widening the spatial extent predicate for segment filtering and introducing vertex cleanup after island-edge removal.

Changes:

- Switched extent filtering from `contains` to `intersects` (with a bbox short-circuit) to speed up segment filtering.
- Added a vertex-remapping step to remove vertices associated with detected island edges and to renumber remaining vertices/edges.
- Increased use of Rayon parallel iterators across several graph/vector operations for throughput improvements.
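To illustrate what a "bbox short-circuit" buys in an `intersects` filter, here is a sketch that assumes the extent is an axis-aligned rectangle and each segment is a polyline of `(x, y)` points. The PR's real extent is a general geometry, and `Rect`, `contains`, and `intersects` here are hypothetical stand-ins, not the bambam or `geo` API. Disjoint bounding boxes are rejected immediately; only the remaining candidates pay for the exact segment-crossing test.

```rust
/// Hypothetical axis-aligned extent; the PR's real extent is a general geometry.
#[derive(Debug, Clone, Copy)]
pub struct Rect {
    pub min_x: f64,
    pub min_y: f64,
    pub max_x: f64,
    pub max_y: f64,
}

pub type Pt = (f64, f64);

impl Rect {
    fn contains_point(&self, p: Pt) -> bool {
        p.0 >= self.min_x && p.0 <= self.max_x && p.1 >= self.min_y && p.1 <= self.max_y
    }
    fn overlaps(&self, o: &Rect) -> bool {
        self.min_x <= o.max_x && o.min_x <= self.max_x && self.min_y <= o.max_y && o.min_y <= self.max_y
    }
}

/// Bounding box of a polyline, or `None` if it has no points.
fn bbox(line: &[Pt]) -> Option<Rect> {
    let &(x0, y0) = line.first()?;
    Some(line.iter().fold(
        Rect { min_x: x0, min_y: y0, max_x: x0, max_y: y0 },
        |r, &(x, y)| Rect { min_x: r.min_x.min(x), min_y: r.min_y.min(y), max_x: r.max_x.max(x), max_y: r.max_y.max(y) },
    ))
}

/// Signed area test: > 0 left turn, < 0 right turn, 0 collinear.
fn orient(a: Pt, b: Pt, c: Pt) -> f64 {
    (b.0 - a.0) * (c.1 - a.1) - (b.1 - a.1) * (c.0 - a.0)
}

/// For a point known to be collinear with a-b, is it within the segment's extent?
fn on_segment(a: Pt, b: Pt, p: Pt) -> bool {
    p.0 >= a.0.min(b.0) && p.0 <= a.0.max(b.0) && p.1 >= a.1.min(b.1) && p.1 <= a.1.max(b.1)
}

fn segments_intersect(p1: Pt, p2: Pt, q1: Pt, q2: Pt) -> bool {
    let (d1, d2) = (orient(q1, q2, p1), orient(q1, q2, p2));
    let (d3, d4) = (orient(p1, p2, q1), orient(p1, p2, q2));
    let straddles = |a: f64, b: f64| (a > 0.0 && b < 0.0) || (a < 0.0 && b > 0.0);
    if straddles(d1, d2) && straddles(d3, d4) {
        return true;
    }
    // Touching / collinear cases.
    (d1 == 0.0 && on_segment(q1, q2, p1))
        || (d2 == 0.0 && on_segment(q1, q2, p2))
        || (d3 == 0.0 && on_segment(p1, p2, q1))
        || (d4 == 0.0 && on_segment(p1, p2, q2))
}

/// `contains`: every vertex inside (sufficient for a convex extent).
pub fn contains(extent: &Rect, line: &[Pt]) -> bool {
    !line.is_empty() && line.iter().all(|&p| extent.contains_point(p))
}

/// `intersects`: bbox short-circuit first, exact test only for candidates.
pub fn intersects(extent: &Rect, line: &[Pt]) -> bool {
    let Some(b) = bbox(line) else { return false };
    if !extent.overlaps(&b) {
        return false; // cheap rejection, no per-segment work
    }
    if line.iter().any(|&p| extent.contains_point(p)) {
        return true;
    }
    // No vertex inside: some polyline segment must cross the extent boundary.
    let c = [
        (extent.min_x, extent.min_y),
        (extent.max_x, extent.min_y),
        (extent.max_x, extent.max_y),
        (extent.min_x, extent.max_y),
    ];
    line.windows(2)
        .any(|w| (0..4).any(|i| segments_intersect(w[0], w[1], c[i], c[(i + 1) % 4])))
}
```

Note that a line crossing the extent boundary passes `intersects` but fails `contains`, which is the accuracy trade-off described in the PR.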

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| `rust/bambam-omf/src/graph/serialize_ops.rs` | Adds vertex remapping and updates edge-list cleanup to remap vertex IDs and re-index edges; introduces additional parallel iteration. |
| `rust/bambam-omf/src/graph/omf_graph.rs` | Integrates island detection post-processing by applying vertex remapping and cleaning edge lists; adds step-level logging. |
| `rust/bambam-omf/src/app/network.rs` | Updates extent application to segments using `intersects` (bbox + geometry) and changes connector filtering strategy to keep only connectors referenced by retained segments. |
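The new connector strategy in `network.rs` can be sketched as "collect the connector ids referenced by retained segments, then keep only those connectors". `Segment`, `Connector`, and `retain_referenced_connectors` are hypothetical, simplified types, not the OMF schema used in the PR. Because no spatial test is applied to connectors, a connector just outside the extent survives whenever a retained segment points at it, which avoids the "connectors referenced by edges but filtered out" failure noted in commit ae5927f.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified OMF segment: an id plus the connectors it references.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub id: String,
    pub connector_ids: Vec<String>,
}

/// Hypothetical, simplified OMF connector.
#[derive(Debug, Clone, PartialEq)]
pub struct Connector {
    pub id: String,
}

/// Keep only connectors referenced by at least one retained segment,
/// preserving the input order of `connectors`.
pub fn retain_referenced_connectors(segments: &[Segment], connectors: Vec<Connector>) -> Vec<Connector> {
    let referenced: HashSet<&str> = segments
        .iter()
        .flat_map(|s| s.connector_ids.iter().map(String::as_str))
        .collect();
    connectors
        .into_iter()
        .filter(|c| referenced.contains(c.id.as_str()))
        .collect()
}
```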


rust/bambam-omf/src/app/network.rs

rust/bambam-omf/src/graph/serialize_ops.rs

Copilot · 2026-03-12T19:58:47Z

rust/bambam-omf/src/graph/serialize_ops.rs

```diff
+            .par_iter()
             .enumerate()
             .filter_map(|(idx, edge)| mask[idx].then_some(*edge))
-            .collect::<Vec<Edge>>()
+            // correct indices after some edges were removed
+            .collect::<Vec<_>>()
+            .into_iter()
```


`par_iter().enumerate().filter_map(...).collect::<Vec<_>>()` does not guarantee preserving the original order (the iterator is no longer indexed after `filter_map`). If order changes here, the subsequent `enumerate()` will assign new edge IDs in a non-deterministic order and can also desynchronize edges vs. geometries/classes/speeds/bearings/omf_segment_ids, which are filtered similarly below. Consider using a stable-order approach (e.g., collect a `Vec<Option<T>>` in parallel and then `flatten()` sequentially), or keep this cleanup step sequential to preserve alignment.
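The "keep this cleanup step sequential" option amounts to applying one mask, with a plain `iter()`, to every vector that must stay aligned, and only then re-indexing the surviving edges. A sketch with hypothetical, simplified types: `Edge`, `filter_by_mask`, and `clean_edges` are illustrative names, and a single `speeds` vector stands in for the geometries/classes/speeds/bearings/omf_segment_ids that the PR filters the same way.

```rust
/// Hypothetical, simplified edge record.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Edge {
    pub edge_id: usize,
    pub src: usize,
    pub dst: usize,
}

/// Keep items whose mask entry is true. Sequential `iter()`, so order is preserved.
pub fn filter_by_mask<T: Clone>(items: &[T], mask: &[bool]) -> Vec<T> {
    assert_eq!(items.len(), mask.len(), "mask must align with items");
    items
        .iter()
        .zip(mask)
        .filter(|&(_, &keep)| keep)
        .map(|(item, _)| item.clone())
        .collect()
}

/// Apply the same mask to edges and an aligned attribute vector, then
/// re-assign contiguous edge ids to the surviving edges.
pub fn clean_edges(edges: &[Edge], speeds: &[f64], mask: &[bool]) -> (Vec<Edge>, Vec<f64>) {
    let kept_edges: Vec<Edge> = filter_by_mask(edges, mask)
        .into_iter()
        .enumerate()
        .map(|(new_id, e)| Edge { edge_id: new_id, ..e })
        .collect();
    let kept_speeds = filter_by_mask(speeds, mask);
    (kept_edges, kept_speeds)
}
```

Since every vector goes through the same sequential filter, index `i` in each output still refers to the same edge.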

@copilot I will change this `par_iter` to `iter`. Based strictly on the official Rayon documentation, can we conclude that this is the only iterator in this pipeline that would not correctly maintain the order of the original vector?

@robfitzgerald bringing this to your attention. I keep finding evidence that `par_iter` should maintain order, but given that it can behave unexpectedly in cases like this (when using `enumerate`), do you think we should just avoid it in these order-critical operations?

I'm curious to see Copilot's research here.

yes. this is a concern i've raised before, too. typically in these scenarios, you have to enumerate then scatter:

`my_collection.into_iter().enumerate().into_par_iter().map()...`

and then on the tail end, you're presented with the problem of re-sorting the result.
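To make "enumerate, scatter, then re-sort" concrete without pulling in Rayon, here is a sketch that uses `std::thread::scope` plus an `mpsc` channel as a stand-in for a parallel iterator. Each item is tagged with its index before fan-out, results arrive in whatever order the workers finish, and a final `sort_by_key` on the index restores input order. `scatter_and_resort` and its round-robin distribution are hypothetical choices for illustration, not Rayon API.

```rust
use std::sync::mpsc;
use std::thread;

/// Tag items with their index, process them on `n_workers` threads
/// (completion order is arbitrary), then sort by index to restore order.
pub fn scatter_and_resort<T, U, F>(items: Vec<T>, n_workers: usize, f: F) -> Vec<U>
where
    T: Send,
    U: Send,
    F: Fn(T) -> U + Sync,
{
    let n_workers = n_workers.max(1);
    // Enumerate first, so every item carries its original position.
    let mut buckets: Vec<Vec<(usize, T)>> = (0..n_workers).map(|_| Vec::new()).collect();
    for (idx, item) in items.into_iter().enumerate() {
        buckets[idx % n_workers].push((idx, item));
    }

    let (tx, rx) = mpsc::channel::<(usize, U)>();
    let f = &f;
    thread::scope(|s| {
        for bucket in buckets {
            let tx = tx.clone();
            s.spawn(move || {
                for (idx, item) in bucket {
                    // Arrival order depends on scheduling, not on `idx`.
                    tx.send((idx, f(item))).expect("receiver is alive");
                }
            });
        }
    });
    drop(tx); // all worker senders are gone once the scope ends

    // The "tail end" problem: re-sort by the original index.
    let mut tagged: Vec<(usize, U)> = rx.into_iter().collect();
    tagged.sort_by_key(|&(idx, _)| idx);
    tagged.into_iter().map(|(_, u)| u).collect()
}
```

The extra O(n log n) sort is the cost mentioned above; it only pays off when the per-item work dominates.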

i read what the feedback was from copilot though. there is one par iterator that does maintain order, and the problem was filter_map, right? from the other PR:

Rayon's filter_map produces an unindexed ParallelIterator, which — per official Rayon docs — does not guarantee order preservation (only IndexedParallelIterator does)

since that could change the size of the collection, which breaks whatever mechanism they are using under the hood to reconstruct ordering. perhaps do the "map" with this indexed par iter, and then perform the filter synchronously, after you have gathered the parallel result? filter is a pretty cheap operation; it's mostly whatever is in the .map that has the performance implications.
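A sketch of that suggestion, again using `std::thread::scope` in place of Rayon so it stays dependency-free. The expensive step runs in parallel over contiguous chunks and yields `Vec<Option<U>>`, the chunk results are joined back in spawn order, and the cheap filter (`flatten()`) runs sequentially. `par_map_then_filter` and its chunking scheme are assumptions for illustration; with Rayon, the analogous shape would be an indexed `par_iter().map(...)` collected into a `Vec<Option<_>>`, followed by a sequential `into_iter().flatten()`.

```rust
use std::thread;

/// Parallel "map" over contiguous chunks, order-preserving join, then a
/// sequential "filter". `f` returns `None` for items that should be dropped.
pub fn par_map_then_filter<T, U, F>(items: &[T], n_chunks: usize, f: F) -> Vec<U>
where
    T: Sync,
    U: Send,
    F: Fn(&T) -> Option<U> + Sync,
{
    if items.is_empty() {
        return Vec::new();
    }
    let n = n_chunks.max(1);
    let chunk_size = (items.len() + n - 1) / n; // ceiling division, always >= 1 here
    let f = &f;

    let mapped: Vec<Option<U>> = thread::scope(|s| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(f).collect::<Vec<Option<U>>>()))
            .collect();
        // Joining in spawn order keeps chunks in their original order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    });

    // Cheap sequential filter: trivially order-preserving.
    mapped.into_iter().flatten().collect()
}
```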

Ok this makes sense. I'll go with the assumption that into_par_iter().enumerate() is a no-go. I'm not super concerned with the linear scans that we do to filter in these cases so I'll just remove the par_ to avoid headaches. I looked for other instances and we should be good since we don't use enumerate in other places.

rust/bambam-omf/src/graph/omf_graph.rs

rust/bambam-omf/src/app/network.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot · 2026-03-12T20:04:12Z

@yamilbknsu I've opened a new pull request, #118, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot · 2026-03-12T20:05:13Z

@yamilbknsu I've opened a new pull request, #119, to work on those changes. Once the pull request is ready, I'll request review from you.

Co-authored-by: yamilbknsu <1139432+yamilbknsu@users.noreply.github.com>

Fix stale `vertex_lookup` after island detection vertex remapping

yamilbknsu · 2026-03-17T19:58:10Z

@robfitzgerald Alright, I have patched the problematic iterators to separate the filter and map steps. I ran the pipeline before and after the change and compared the SHA sums of the outputs, and they match. This also produces valid bambam output after adjusting to the new aggregate opportunities output format, so it should be ready for your review!

robfitzgerald

good progress. i have a concern about the final set of VertexIds stored in the Edges, could you read and reply to that? if that concern isn't correct, then i will go ahead and accept, but if the concern is valid, then let's find the minimal requirement to address the concern so we can wrap this up! thanks Yamil.

rust/bambam-omf/src/app/network.rs

rust/bambam-omf/src/graph/component_algorithm.rs

robfitzgerald · 2026-03-19T13:44:14Z

rust/bambam-omf/src/graph/omf_graph.rs

```diff
+            }
+
+            // Compute and apply vertex remapping
+            let vertex_remapping = compute_vertex_remapping(&vertices, &edge_lists, &island_edges)?;
```


we have loaded all segments and connectors and built Edges and Vertices from them. here we drop island vertices and reassign a VertexId sequentially. this invalidates the previous list of vertex ids which was used to assign src/dst VertexIds in each Edge, but, i don't see a spot after this where those identifiers are updated accordingly.
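The concern here is that edges built before island removal still carry the old `VertexId`s. A minimal sketch of what updating them involves, assuming the remapping is a `Vec<Option<usize>>` indexed by old vertex id (`None` for removed vertices). `Edge` and `apply_vertex_remapping` are hypothetical, simplified names; per the reply below, the PR applies its remapping inside `clean_omf_list`. Returning an error rather than silently dropping an edge surfaces any disagreement between the island mask and the set of removed vertices.

```rust
/// Hypothetical, simplified edge: endpoint vertex ids only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Edge {
    pub src: usize,
    pub dst: usize,
}

/// Rewrite every edge's src/dst through an old -> new vertex mapping.
/// Errors if a kept edge still points at a removed or unknown vertex.
pub fn apply_vertex_remapping(edges: &[Edge], remap: &[Option<usize>]) -> Result<Vec<Edge>, String> {
    let lookup = |v: usize| -> Result<usize, String> {
        remap
            .get(v)
            .copied()
            .flatten()
            .ok_or_else(|| format!("vertex {v} was removed or is out of range"))
    };
    edges
        .iter()
        .map(|e| -> Result<Edge, String> { Ok(Edge { src: lookup(e.src)?, dst: lookup(e.dst)? }) })
        .collect()
}
```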

i have been wondering, couldn't all of this island generation + pre-processing take place over the collection of SegmentSplits? with those, we haven't yet invented the VertexIds/EdgeIds/EdgeListIds. if we can remove SegmentSplits due to island detection, and then we build the final Vertices/Edges (and their associated ids), then we don't need to apply this kind of correction. edit: don't let this part of the comment send you down a rabbit hole; this could be a tech debt issue instead.

yamilbknsu · 2026-03-19T19:25:30Z

@robfitzgerald As discussed above and IRL, I have refactored the visit_edges function and created several issues for the tech debt milestone (they are all related):

- #116: Island detection for individual edge lists
- #123: Change the vertex removal if we modify the weak connectivity logic
- #124: Refactor island detection to be performed at segment-connector level instead of fully qualified edge-vertex level

Also, to address the comment above, the vertex_remapping is applied to the edge lists in the clean_omf_list function. Another sign that a refactor would be beneficial.

The output of this import at a national scale is currently running on kestrel 🤞

robfitzgerald

sounds good! will 🙏 for our national dataset generation result. i just went ahead and clicked the "merge" button to merge in the GTFS Flex changes, sorry for going over the line there, it was impulsive! i'm giving this 👍 so when it passes CI it's good to merge. thanks Yamil, and thanks again for adding those tech debt issues as well.

yamilbknsu · 2026-03-19T21:51:25Z

i just went ahead and clicked the "merge"

I'll make sure to get my revenge some day 😈

yamilbknsu added 8 commits March 3, 2026 09:11

add par_ and debug!

e98d94e

change debug! to info!

cac83cd

optimized extent application and changed to intersects

2a26ff9

change predicate back to contains following errors where connectors were referenced by edges but were filtered out

ae5927f

implemented intersects + ids optimization

c89d1f0

fixed edge-ids

93ccb0a

cleaning island vertices

30b154a

Merge branch 'main' into yep/lakewood_tests

791985b

yamilbknsu requested review from Copilot and robfitzgerald March 12, 2026 19:53

Copilot started reviewing on behalf of yamilbknsu March 12, 2026 19:54

Copilot AI reviewed Mar 12, 2026

Update rust/bambam-omf/src/graph/serialize_ops.rs

f355487

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI mentioned this pull request Mar 12, 2026

Fix order-unsafe parallel iterators in clean_omf_edge_list #118

Closed

Initial plan

7ab2c2c

Copilot AI mentioned this pull request Mar 12, 2026

Fix stale vertex_lookup after island detection vertex remapping #119

Merged

yamilbknsu and others added 9 commits March 12, 2026 16:07

remove par_ that breaks order

c991d4a

Fix stale vertex_lookup after island detection vertex remapping

555a9d2

Co-authored-by: yamilbknsu <1139432+yamilbknsu@users.noreply.github.com>

Remove unwrap use

9ead189

clippy

a090e96

Merge pull request #119 from NatLabRockies/copilot/sub-pr-117-again

6e8ad67

Fix stale `vertex_lookup` after island detection vertex remapping

remove par_ from iter().enumerate()

9e105c5

Merge branch 'main' into yep/lakewood_tests

11304cb

Merge branch 'main' into yep/lakewood_tests

75dea2b

patch iter operations

cadb167

bugfix (sink and sources in island algorithm)

ec98da8

robfitzgerald reviewed Mar 19, 2026

yamilbknsu added 3 commits March 19, 2026 15:04

refactor visit_edge operation

d7ebc42

comment fix

a108dcc

Merge branch 'main' into yep/lakewood_tests

0cc5025

Merge branch 'main' into yep/lakewood_tests

9dbae3f

robfitzgerald approved these changes Mar 19, 2026

yamilbknsu merged commit cd745b1 into main Mar 19, 2026
1 check passed

robfitzgerald deleted the yep/lakewood_tests branch March 20, 2026 16:45


4 participants
