fix: use mb_strlen for multibyte characters in LineLength sniff #3971

Closed
dmzoneill wants to merge 1 commit into squizlabs:master from dmzoneill-forks:fix/issue-3923-multibyte-line-length
Conversation

@dmzoneill
Contributor

Summary

Fixes #3923.

Changed Generic.Files.LineLength to use mb_strlen() instead of strlen() when calculating comment line lengths, so that multibyte UTF-8 characters are counted as characters rather than as bytes.

Changes

  • replaced strlen() with mb_strlen() using UTF-8 encoding
  • replaced strrpos() with mb_strrpos() using UTF-8 encoding
  • added fallback to the byte-based functions when the mb_* functions are not available
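
The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the diff from this PR: the helper names `lineLengthInChars()` and `lastPosition()` are hypothetical, and the scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helpers for illustration only; the names are not taken from the PR.

/**
 * Length of $text in characters when mbstring is loaded, otherwise in bytes.
 */
function lineLengthInChars(string $text): int
{
    if (function_exists('mb_strlen') === true) {
        return mb_strlen($text, 'UTF-8');
    }

    // Fallback: byte count, which over-reports for multibyte characters.
    return strlen($text);
}

/**
 * Offset of the last occurrence of $needle in $text, or false if absent.
 * The offset is in characters with mbstring, in bytes without it.
 */
function lastPosition(string $text, string $needle)
{
    if (function_exists('mb_strrpos') === true) {
        return mb_strrpos($text, $needle, 0, 'UTF-8');
    }

    return strrpos($text, $needle);
}
```

Both helpers have to switch together: a length from mb_strlen() and an offset from strrpos() would be in different units (characters versus bytes) and could not safely be combined.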

Test

Tested with the reproduction case from the issue: Norwegian text containing å, æ, and ø is now measured as 32 characters instead of 35 bytes.
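
The gap between the two counts is easy to reproduce by hand. The snippet below uses an illustrative string, not the reproduction case from the issue, and assumes the file is saved as UTF-8 and the mbstring extension is loaded.

```php
<?php
// Illustrative comment line; NOT the reproduction case from the issue.
// Assumes this file is saved as UTF-8 and mbstring is available.
$comment = '// å, æ, ø';

$bytes = strlen($comment);              // counts bytes
$chars = mb_strlen($comment, 'UTF-8');  // counts characters (code points)

// Each of å, æ, ø occupies two bytes in UTF-8, so the counts differ by three.
printf("bytes=%d chars=%d\n", $bytes, $chars);
// prints: bytes=13 chars=10
```

The difference of three matches the three two-byte letters in the string, the same kind of offset as the 35 versus 32 reported above.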

…squizlabs#3923)

The Generic.Files.LineLength sniff was using strlen() to calculate
comment line lengths, which counts bytes instead of characters.
This caused false positives when comments contained multibyte UTF-8
characters like Norwegian letters (å, æ, ø).

Changed to use mb_strlen() and mb_strrpos() with UTF-8 encoding
when available, falling back to strlen() and strrpos() when the
mb_* functions are not available.
@jrfnl closed this on Feb 11, 2026
Development

Successfully merging this pull request may close these issues.

Generic.Files.LineLength miscalculates the length of line containing multibyte characters
