[prism] Don't panic on invalid UTF-8 strings in heredocs by reese · Pull Request #758 · fables-tales/rubyfmt

reese · 2026-01-05T16:56:26Z

The way we determine the "common indent" for a heredoc (that is, for squiggly heredocs, the farthest-left line that determines the indentation level) is to look for lines where the unescaped and the content_loc have different amounts of leading whitespace. We previously did this by converting them to strings and looking at their lengths, but this panics on invalid UTF-8 because all the *_to_str functions assume they are given valid UTF-8 strings.

Instead, this PR just counts the whitespace chars directly at the beginning of each slice, which should achieve the same thing without panicking on invalid UTF-8.
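As a rough illustration of the byte-level approach described above, here is a minimal sketch. It is not the code from this PR: the helper names `leading_whitespace_len` and `stripped_indent` are hypothetical, and the sketch assumes each heredoc line is available as a raw `&[u8]` slice and an unescaped `&[u8]` slice.

```rust
/// Count the leading space and tab bytes of a slice.
/// This only inspects individual bytes, so it never needs the
/// slice to be valid UTF-8.
/// (Hypothetical helper, named for illustration only.)
fn leading_whitespace_len(bytes: &[u8]) -> usize {
    bytes
        .iter()
        .take_while(|&&b| b == b' ' || b == b'\t')
        .count()
}

/// How many leading whitespace bytes the raw line has beyond the
/// unescaped line. `saturating_sub` avoids underflow if the unescaped
/// slice unexpectedly has more leading whitespace than the raw one.
/// (Hypothetical helper; the real comparison in rubyfmt may differ.)
fn stripped_indent(raw: &[u8], unescaped: &[u8]) -> usize {
    leading_whitespace_len(raw).saturating_sub(leading_whitespace_len(unescaped))
}
```

For example, with a raw line `b"    \xFF text"` and an unescaped line `b"  \xFF text"`, `std::str::from_utf8` rejects both slices because `0xFF` never appears in valid UTF-8, while `stripped_indent` still compares the leading whitespace without any string conversion.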

froydnj

Thanks for this! My efforts over the weekend to format individual files to try to find the invalid UTF-8 were weirdly not crashing (vs. the format-in-one-go invocation that was), and I hadn't yet debugged what I did wrong. Ideally this will just magically fix things.

froydnj · 2026-01-05T17:01:42Z

librubyfmt/src/format_prism.rs

+                    let raw_leading = raw.iter().take_while(|&&b| b == b' ' || b == b'\t').count();
+                    let unescaped_leading = unescaped
+                        .iter()
+                        .take_while(|&&b| b == b' ' || b == b'\t')


I know that Sorbet's heredoc bits in the parser have to take into account a tab being 8 spaces. Do we have to do that same accounting here as well?

Answering my own question, I guess not, because the number of spaces and tabs in the raw and unescaped are essentially coming from the same source, so things should just match up?

Yeah, I think in this case, since they come from the same source, it doesn't really matter. I also think that if you had a tab beyond the common indent, it would be part of the string contents and just get rendered anyway.

[prism] Don't panic on invalid UTF-8 strings in heredocs

1a18cdd

froydnj approved these changes Jan 5, 2026

reese merged commit 66b27c0 into trunk Jan 5, 2026
8 checks passed

reese deleted the reese-invalid-utf-8-panic branch January 5, 2026 17:07

reese mentioned this pull request Jan 5, 2026

[prism] panics on invalid UTF-8 #751

Closed
