[prism] Don't panic on invalid UTF-8 strings in heredocs #758
Conversation
froydnj
left a comment
Thanks for this! My efforts over the weekend to format individual files to try to find the invalid UTF-8 were weirdly not crashing (unlike the format-everything-in-one-go invocation, which was), and I hadn't yet debugged what I did wrong. Ideally this will just magically fix things.
```rust
let raw_leading = raw.iter().take_while(|&&b| b == b' ' || b == b'\t').count();
let unescaped_leading = unescaped
    .iter()
    .take_while(|&&b| b == b' ' || b == b'\t')
```
I know that Sorbet's heredoc bits in the parser have to take into account a tab being 8 spaces. Do we have to do that same accounting here as well?
Answering my own question, I guess not, because the numbers of spaces and tabs in `raw` and `unescaped` are essentially coming from the same source, so things should just match up?
Yeah, I think that since they come from the same source in this case, it doesn't really matter. And if you had a tab beyond the common indent, it would be part of the string contents and just get rendered anyway.
The way we determine the "common indent" for a heredoc (that is, for squiggly heredocs, the farthest-left line that determines the indentation level) is to look for lines where the `unescaped` and the `content_loc` have different leading whitespace. We previously did this by converting them to strings and comparing their lengths, but this panics on invalid UTF-8 because all the `*_to_str` functions assume they are given valid UTF-8 strings. Instead, this PR just counts the whitespace bytes directly at the beginning of each slice, which achieves the same thing without panicking on invalid UTF-8.
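A minimal standalone sketch of the idea (the `leading_whitespace` helper is hypothetical, not the PR's actual function name): converting the bytes to a `&str` first fails on invalid UTF-8, while counting leading space/tab bytes directly never needs the bytes to be valid UTF-8 at all.

```rust
// Hypothetical helper mirroring the PR's approach: count leading
// spaces and tabs directly on the byte slice, never converting to &str.
fn leading_whitespace(bytes: &[u8]) -> usize {
    bytes.iter().take_while(|&&b| b == b' ' || b == b'\t').count()
}

fn main() {
    // A heredoc-like line whose contents are not valid UTF-8
    // (0xFF can never appear in a valid UTF-8 sequence).
    let raw: &[u8] = b"  \t\xFFcontent";

    // The old string-based approach would fail here: from_utf8 rejects the bytes.
    assert!(std::str::from_utf8(raw).is_err());

    // Byte-level counting still works: two spaces plus one tab.
    assert_eq!(leading_whitespace(raw), 3);
}
```

Since both `raw` and `unescaped` are sliced from the same source, comparing these byte counts matches up without any tab-width accounting.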