Improve Unicode support with grapheme clusters by jquast · Pull Request #396 · peterbrittain/asciimatics

jquast · 2026-01-27T06:49:37Z

I read the contributing guidelines, which is why this PR also includes some flake8 and typing/mypy fixes.
I also included comprehensive tests, many of them written TDD-style (failing before the change, passing after it).

Issues fixed by this PR

I did not find any open issues for this, which surprised me!
You must not have many CJK and emoji users.

What does this implement/fix?

Problem: asciimatics text utilities wrongly split up grapheme clusters (emoji ZWJ sequences like 👨‍👩‍👧, regional flags like 🇨🇦, skin tone modifiers, combining characters, etc.). Because grapheme support is missing, this causes display corruption in SpeechBubble and incorrect width calculations and padding elsewhere.
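The root cause is that Python's `len()`, slicing and plain iteration operate on code points, while a terminal shows grapheme clusters. The short snippet below (standard library only, independent of asciimatics and wcwidth) illustrates this for the three kinds of cluster named above.

```python
# Three strings that each display as ONE user-perceived character.
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # man ZWJ woman ZWJ girl
flag = "\U0001F1E8\U0001F1E6"  # REGIONAL INDICATOR C + REGIONAL INDICATOR A
accented = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# len() and plain iteration work on code points, not grapheme clusters.
print(len(family), len(flag), len(accented))  # 5 2 2

# Slicing by code point cuts the family emoji apart, leaving a lone
# "man" followed by a dangling ZERO WIDTH JOINER.
broken = family[:2]
print([f"U+{ord(ch):04X}" for ch in broken])  # ['U+1F468', 'U+200D']
```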

Solution: Integrate with wcwidth>=0.5.0 by using:

- wcwidth.iter_graphemes() for iteration in _enforce_width_ext(), _find_min_start(), and _get_offset().
- wcwidth.wrap() for grapheme-aware word wrapping in _split_text().
  - Note that **kwargs can also be passed through to wcwidth.wrap() to expose all of its additional built-in features, as requested by "Add a way to let the user provide a custom text wrapping function" #386.
- wcwidth.ljust() for line padding in SpeechBubble.
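To show what "grapheme-aware" means for truncation and padding, here is a minimal, stdlib-only sketch. It is NOT wcwidth's implementation and NOT the full UAX #29 segmentation algorithm: `iter_clusters`, `cluster_width`, `truncate_clusters` and `ljust_clusters` are hypothetical helpers using a simplified heuristic (combining marks, ZWJ sequences, skin-tone modifiers and regional-indicator pairs only), standing in for the roles wcwidth.iter_graphemes() and wcwidth.ljust() play in this PR.

```python
import unicodedata
from typing import Iterator, List

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def _is_regional_indicator(ch: str) -> bool:
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def _is_skin_tone(ch: str) -> bool:
    # EMOJI MODIFIER FITZPATRICK TYPE-1-2 .. TYPE-6
    return 0x1F3FB <= ord(ch) <= 0x1F3FF


def _extends_cluster(ch: str) -> bool:
    # Combining marks (variation selectors are category Mn too), ZWJ and
    # skin-tone modifiers attach to the preceding character.
    return (unicodedata.category(ch) in ("Mn", "Mc", "Me")
            or ch == ZWJ or _is_skin_tone(ch))


def iter_clusters(text: str) -> Iterator[str]:
    """Yield approximate grapheme clusters (heuristic, not full UAX #29)."""
    cluster = ""
    for ch in text:
        joins = bool(cluster) and (
            _extends_cluster(ch)
            # Simplification: anything after a ZWJ joins (UAX #29 only
            # does this for pictographic characters).
            or cluster[-1] == ZWJ
            # Two regional indicators form one flag.
            or (len(cluster) == 1 and _is_regional_indicator(cluster)
                and _is_regional_indicator(ch))
        )
        if joins:
            cluster += ch
        else:
            if cluster:
                yield cluster
            cluster = ch
    if cluster:
        yield cluster


def cluster_width(cluster: str) -> int:
    """Terminal columns for one cluster, judged mostly by its first code point."""
    if len(cluster) == 2 and all(_is_regional_indicator(c) for c in cluster):
        return 2  # flag
    base = cluster[0]
    if unicodedata.category(base) in ("Mn", "Me", "Cf"):
        return 0  # stray combining mark or format character
    return 2 if unicodedata.east_asian_width(base) in ("W", "F") else 1


def truncate_clusters(text: str, max_width: int) -> str:
    """Keep whole clusters only, never exceeding max_width columns."""
    kept: List[str] = []
    used = 0
    for cluster in iter_clusters(text):
        width = cluster_width(cluster)
        if used + width > max_width:
            break
        kept.append(cluster)
        used += width
    return "".join(kept)


def ljust_clusters(text: str, width: int, fillchar: str = " ") -> str:
    """Pad text on the right to `width` display columns."""
    used = sum(cluster_width(c) for c in iter_clusters(text))
    return text + fillchar * max(0, width - used)


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"
print(len(list(iter_clusters(family + "!"))))  # 2
```

Because truncation stops at a cluster boundary, the family emoji is either kept whole or dropped whole, and padding is computed from display columns rather than from `len()`.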

Any other comments?

I notice that there is an option to "ignore unicode" for better performance. I'd suggest it is no longer needed: wcwidth has many "fast path" checks (for example, returning len(string) directly for pure-ASCII strings), along with lru_cache, so the cost of always supporting Unicode is negligible. Related downstream automated benchmark results from the wcwidth upgrade are below:

Benchmarks of wcwidth 0.2.14 (`BASE`) vs. 0.5.0 (`HEAD`):

| Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|
| `test_center_ascii` | 104.8 µs | 62.1 µs | +68.87% |
| `test_rjust_cjk` | 760.6 µs | 640.2 µs | +18.81% |
| `test_center_cjk` | 763.5 µs | 642.6 µs | +18.81% |
| `test_truncate_ascii` | 33,519.5 µs | 83 µs | ×400 |
| `test_center_ansi` | 678.1 µs | 359.2 µs | +88.78% |
| `test_truncate_cjk` | 14.2 ms | 6.8 ms | ×2.1 |
| `test_length_cjk` | 749.7 µs | 633 µs | +18.45% |
| `test_truncate_emoji_zwj` | 5.2 ms | 1.6 ms | ×3.4 |
| `test_rjust_ansi` | 673.7 µs | 356.1 µs | +89.19% |
| `test_truncate_ansi` | 46.1 ms | 23.2 ms | +99.18% |
| `test_length_ansi` | 672.2 µs | 488.9 µs | +37.5% |
| `test_length_ascii` | 99.2 µs | 60.4 µs | +64.23% |
| `test_rjust_ascii` | 99.8 µs | 58.9 µs | +69.33% |
| `test_ljust_ansi` | 670.5 µs | 355.3 µs | +88.74% |
| `test_ljust_ascii` | 100.2 µs | 58.9 µs | +70.1% |
| `test_ljust_cjk` | 759.6 µs | 640.7 µs | +18.55% |
| `test_length_emoji_vs16` | 739.7 µs | 670.7 µs | +10.28% |
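The "fast path plus lru_cache" idea mentioned above can be sketched as follows. This is an illustration of the general technique, not wcwidth's code: `display_width` and `_slow_width` are hypothetical names, and `_slow_width` is a deliberately simplified per-code-point stand-in that ignores grapheme clusters.

```python
import unicodedata
from functools import lru_cache


def _slow_width(text: str) -> int:
    # Simplified stand-in for full Unicode measurement: counts per code
    # point (wide = 2, combining/format = 0) and ignores grapheme clusters.
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


@lru_cache(maxsize=1024)
def display_width(text: str) -> int:
    # Fast path: every printable ASCII character occupies exactly one column,
    # so no per-character Unicode lookups are needed.
    if text.isascii() and text.isprintable():
        return len(text)
    return _slow_width(text)
```

Pure-ASCII text never reaches the slow path, and repeated calls with the same string (plausibly common when a UI redraws the same labels) are answered from the cache.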

It would also be possible to make wcwidth an optional co-dependency, using try/except around `import wcwidth` plus hasattr() checks for the newer functions. That would help with "Reducing number of dependencies" #95 and "Non-unicode version? Wcwidth is a resource hog." #155.
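A minimal sketch of that optional-dependency pattern, assuming `wcwidth.iter_graphemes(text)` yields each grapheme cluster as a `str` (this signature is an assumption; check the wcwidth API docs). `iter_text` and `text_width` are hypothetical helper names; `wcwidth.wcswidth()` exists in all wcwidth releases and returns -1 for strings containing non-printable characters.

```python
from typing import Iterator

try:
    import wcwidth as _wcwidth
except ImportError:  # wcwidth not installed: degrade gracefully
    _wcwidth = None

# Feature-detect the grapheme API instead of comparing version strings.
HAS_GRAPHEMES = _wcwidth is not None and hasattr(_wcwidth, "iter_graphemes")


def iter_text(text: str) -> Iterator[str]:
    """Iterate over grapheme clusters when possible, else code points."""
    if HAS_GRAPHEMES:
        # Assumption: iter_graphemes(text) yields each cluster as a str.
        return iter(_wcwidth.iter_graphemes(text))
    return iter(text)  # fallback: one code point at a time


def text_width(text: str) -> int:
    """Display width via wcwidth.wcswidth(), else fall back to len()."""
    if _wcwidth is not None:
        width = _wcwidth.wcswidth(text)
        if width >= 0:  # wcswidth returns -1 for non-printable text
            return width
    return len(text)
```

Feature detection with hasattr() means an older wcwidth, or none at all, still works, just without grapheme awareness.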

Let me know if you would like any such changes or additional PRs; happy to help.

**Problem**: asciimatics text utilities wrongly split up grapheme clusters (emoji ZWJ sequences like 👨‍👩‍👧, regional flags like 🇨🇦, skin tone modifiers, combining characters), causing display corruption and incorrect width calculations.

**Solution**: Integrate with wcwidth >= 0.5.0 by using:

- https://wcwidth.readthedocs.io/en/latest/api.html#wcwidth.iter_graphemes for iteration in _enforce_width_ext(), _find_min_start(), _get_offset()
- https://wcwidth.readthedocs.io/en/latest/api.html#wcwidth.wrap for grapheme-aware word wrapping in _split_text()
- https://wcwidth.readthedocs.io/en/latest/api.html#wcwidth.ljust for line padding in SpeechBubble

I notice that there is an option to "ignore unicode" for better performance, but wcwidth has many "fast path" checks (for example, returning len(string) directly for pure-ASCII strings), along with lru_cache, so the cost of always supporting Unicode is negligible.

jquast force-pushed the jq/wcwidth-integration branch from ac684a5 to 944238c (January 27, 2026 06:57)

jquast force-pushed the jq/wcwidth-integration branch from 944238c to f49700d (January 27, 2026 07:02)

jquast wants to merge 1 commit into peterbrittain:master from jquast:jq/wcwidth-integration
