Fix docs_from_attrs truncating mid-UTF-8 codepoint #1769

Merged
leighmcculloch merged 8 commits into main from docs-from-attrs
Mar 24, 2026
@leighmcculloch
Member

What

Use floor_char_boundary to truncate doc comments at a valid UTF-8 character boundary in docs_from_attrs. Add a test that confirms a doc string where a multi-byte character straddles the truncation boundary produces valid UTF-8.
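As a rough illustration of the boundary-safe truncation described above, here is a minimal sketch. It is not the code from this PR: the helper name `truncate_at_char_boundary` is hypothetical, and it walks back with `str::is_char_boundary` instead of calling `floor_char_boundary`, so that it also compiles on toolchains where that method is not available. The intended result is the same: the largest cut point at or below the limit that does not fall inside a character.

```rust
/// Hypothetical helper (not from the PR): returns the longest prefix of `s`
/// that is at most `max_len` bytes and ends on a UTF-8 character boundary.
fn truncate_at_char_boundary(s: &str, max_len: usize) -> &str {
    // Nothing to cut if the string already fits.
    if max_len >= s.len() {
        return s;
    }
    let mut end = max_len;
    // Step back until `end` sits on a character boundary.
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

With the three-byte string `"a\u{e9}"` (an ASCII `a` followed by a two-byte `é`), a limit of 2 lands in the middle of `é`, so the sketch backs up and keeps only `"a"`.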

Why

The current implementation truncates doc bytes with Vec::truncate at an arbitrary byte offset, which could split multi-byte UTF-8 codepoints and store invalid UTF-8 in the contract's spec XDR.
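To make the failure mode concrete, the sketch below reproduces the kind of byte-level truncation described above in isolation. The function name `truncate_bytes` is hypothetical and the surrounding macro code is omitted; it only assumes that the doc string is copied into a `Vec<u8>` and cut with `Vec::truncate`.

```rust
/// Hypothetical stand-in for the pre-fix behaviour: copy the doc string's
/// bytes and cut them at an arbitrary byte offset, ignoring char boundaries.
fn truncate_bytes(s: &str, max_len: usize) -> Vec<u8> {
    let mut bytes = s.as_bytes().to_vec();
    // Vec::truncate knows nothing about UTF-8; it just drops trailing bytes.
    bytes.truncate(max_len);
    bytes
}

/// Reports whether the truncated bytes still decode as UTF-8.
fn is_valid_utf8_after_truncation(s: &str, max_len: usize) -> bool {
    std::str::from_utf8(&truncate_bytes(s, max_len)).is_ok()
}
```

For `"a\u{e9}"`, whose bytes are `0x61 0xC3 0xA9`, truncating to 2 bytes leaves a dangling lead byte `0xC3`, so the result no longer decodes as UTF-8, while truncating to 1 byte happens to land on a boundary and stays valid.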

Close #1768

@leighmcculloch leighmcculloch requested a review from a team March 17, 2026 12:44
@leighmcculloch leighmcculloch marked this pull request as ready for review March 17, 2026 12:44
Copilot AI review requested due to automatic review settings March 17, 2026 12:44
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a UTF-8 correctness bug in soroban-sdk-macros where doc-comment truncation could split multi-byte characters, potentially encoding invalid UTF-8 into the contract spec XDR.

Changes:

  • Truncate doc strings using floor_char_boundary so truncation always occurs at a valid UTF-8 boundary.
  • Add a unit test covering the case where a multi-byte character straddles the truncation boundary.

Comment thread Cargo.toml
@leighmcculloch leighmcculloch added this pull request to the merge queue Mar 23, 2026
Merged via the queue into main with commit 0e14f5c Mar 24, 2026
192 of 194 checks passed
@leighmcculloch leighmcculloch deleted the docs-from-attrs branch March 24, 2026 00:04

Development

Successfully merging this pull request may close these issues.

docs_from_attrs can produce invalid UTF-8 when truncating multi-byte characters

3 participants