froydnj
approved these changes
Jan 18, 2026
Collaborator
froydnj
left a comment
I was looking at this last night and was going to write up a Prism patch that exposes the newline vector, which will probably make this faster? But until we have that information, this is an excellent fix. (Prism release schedules are also not conducive to getting Rust crate changes out quickly.)
Collaborator
Author
Yeah that would be an excellent addition to the Prism bindings -- it's always felt a little silly that we have to do this, so it'd be great to have that done during parsing for us.
At the very beginning of each file, we have to construct a `LineIndex` to get the offsets of where each line starts, and we also look up which lines have Ruby vs. just comments/whitespace. Each pass we do is very expensive, since it requires us to iterate byte-by-byte over the entire source, and we currently unnecessarily do two passes instead of one.

This PR collapses those into a single pass. It also changes `lines_with_ruby` to be a sorted `Vec` instead of a `BTreeSet`. `BTreeSet` doesn't really make much sense here, since lookups are the same speed as a `Vec` + `binary_search`, but insertion is also O(log n). We could use a `HashSet`, but unless we're formatting 100k-line Ruby files, the hashing overhead seems to outweigh the difference in lookup performance -- `Vec` is better for almost all normal usage.

This also pulls in the `memchr` crate, which has specialized instructions for doing character searches -- it seems to be a few times faster than a standard `.iter()` for our use case of searching for newline characters. `memchr` is already a transitive dependency for several crates, so this doesn't end up adding much to compilation time or anything.

Looking at profiling data, this was previously taking about 3% of total CPU time on average, and this PR cuts that at least in half on my laptop.
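For readers who want a concrete picture, here is a minimal sketch of the combined pass, not the code from this PR. `LineInfo`, `scan`, `has_code`, and the field names are hypothetical, and the "has Ruby" check is a rough approximation (first non-whitespace byte is not `#`). To keep the sketch free of external dependencies, the newline search uses `iter().position`; with the `memchr` crate that call would be `memchr::memchr(b'\n', haystack)`.

```rust
/// Hypothetical stand-in for the per-file index; names are illustrative
/// and not taken from the actual codebase.
pub struct LineInfo {
    /// Byte offset at which each line starts (line 0 starts at offset 0).
    pub line_starts: Vec<usize>,
    /// Zero-based numbers of lines that hold something other than
    /// whitespace or a `#` comment, in ascending order.
    pub lines_with_code: Vec<usize>,
}

/// Rough approximation of "this line has Ruby on it": a first
/// non-whitespace byte exists and it is not `#`.
fn has_code(line: &[u8]) -> bool {
    match line.iter().find(|b| !b.is_ascii_whitespace()) {
        Some(&b) => b != b'#',
        None => false,
    }
}

/// Builds both vectors in one loop over the source.
pub fn scan(source: &[u8]) -> LineInfo {
    let mut line_starts = vec![0];
    let mut lines_with_code = Vec::new();
    let mut start = 0;
    let mut line_no = 0;

    loop {
        // Assumption: with the memchr crate this search would be
        // `memchr::memchr(b'\n', &source[start..])`.
        let newline = source[start..].iter().position(|&b| b == b'\n');
        let end = newline.map_or(source.len(), |i| start + i);

        // Lines are visited in order, so pushing keeps the Vec sorted.
        if has_code(&source[start..end]) {
            lines_with_code.push(line_no);
        }

        match newline {
            Some(_) => {
                start = end + 1;
                line_starts.push(start);
                line_no += 1;
            }
            None => break,
        }
    }

    LineInfo { line_starts, lines_with_code }
}

impl LineInfo {
    /// O(log n) membership test on the sorted Vec.
    pub fn line_has_code(&self, line: usize) -> bool {
        self.lines_with_code.binary_search(&line).is_ok()
    }

    /// Maps a byte offset to the zero-based line that contains it.
    pub fn line_of_offset(&self, offset: usize) -> usize {
        match self.line_starts.binary_search(&offset) {
            Ok(line) => line,
            // `next` is at least 1 because `line_starts[0]` is 0.
            Err(next) => next - 1,
        }
    }
}
```

Because lines are visited in ascending order, each `push` is amortized O(1) and the `Vec` never needs an explicit sort, which is the property that makes it a drop-in replacement for the `BTreeSet` lookups.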