Add support for extended grapheme clusters. #37

Merged
salt-die merged 11 commits into main from egcs
Apr 17, 2025

@salt-die
Owner

Add support for extended grapheme clusters (egcs).

To facilitate egcs, the Cell type was changed from:

Cell = np.dtype(
    [
        ("char", "U1"),
        ("bold", "?"),
        ("italic", "?"),
        ("underline", "?"),
        ("strikethrough", "?"),
        ("overline", "?"),
        ("reverse", "?"),
        ("fg_color", "u1", (3,)),
        ("bg_color", "u1", (3,)),
    ]
)

to

Cell = np.dtype(
    [
        ("ord", "uint32"),
        ("style", "u1"),
        ("fg_color", "u1", (3,)),
        ("bg_color", "u1", (3,)),
    ]
)

The "style" field is now a bit field of the SGR parameters of a terminal cell. The style int flags are given by:

from enum import IntFlag

class Style(IntFlag):
    DEFAULT = 0
    BOLD = 0b1
    ITALIC = 0b10
    UNDERLINE = 0b100
    STRIKETHROUGH = 0b1000
    OVERLINE = 0b10000
    REVERSE = 0b100000
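
As a rough illustration of how the bit field might be used, the sketch below packs several flags into the "style" byte of a canvas and reads them back. The Cell dtype and Style class are copied from this description; the canvas shape and the variable names are made up for the example and are not part of the library.

```python
from enum import IntFlag

import numpy as np


class Style(IntFlag):
    DEFAULT = 0
    BOLD = 0b1
    ITALIC = 0b10
    UNDERLINE = 0b100
    STRIKETHROUGH = 0b1000
    OVERLINE = 0b10000
    REVERSE = 0b100000


Cell = np.dtype(
    [
        ("ord", "uint32"),
        ("style", "u1"),
        ("fg_color", "u1", (3,)),
        ("bg_color", "u1", (3,)),
    ]
)

# Hypothetical 2x3 canvas; every cell starts with Style.DEFAULT (0).
canvas = np.zeros((2, 3), dtype=Cell)

# Combine flags with | and store them in a single byte.
canvas["style"][0, 0] = int(Style.BOLD | Style.UNDERLINE)

# Add a flag to a whole row without disturbing the flags already set.
canvas["style"][0] |= int(Style.ITALIC)

# Boolean mask of the cells that are bold.
is_bold = (canvas["style"] & int(Style.BOLD)) != 0

# Decode one cell back into a Style value.
decoded = Style(int(canvas["style"][0, 0]))
```

All six flags fit in the low six bits, so a single "u1" field replaces the six boolean fields of the old dtype.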

The "ord" field now stores the Unicode code point for a cell or, if the cell holds an extended grapheme cluster (egc), an index into a pool of egcs plus EGC_BASE. EGC_BASE is greater than sys.maxunicode, so the offset marks the value as an egc and it cannot collide with a real code point.

Additional changes:

  • Text gadgets now have a chars property that views the "ord" field of their canvas as Unicode strings, so characters can still be assigned directly to the canvas as before, e.g.,
    my_text_gadget.chars[0, 0] = "a"
    which is equivalent to
    my_text_gadget.canvas["ord"][0, 0] = ord("a")
    This method should NOT be used to add egcs to the canvas.
  • To add an egc directly to a Text canvas, use the new function text_tools.egc_ord, e.g.,
    my_text_gadget.canvas[0, 0]["ord"] = egc_ord("👩🏽‍🔬")
    However, this does not take the egc's column width into account, so the egc could overlap adjacent cells. It's recommended to keep using text_tools.add_text or Text.add_str instead, both of which now account for egcs, e.g.,
    my_text_gadget.add_str("👩🏽‍🔬")
  • The inverse of egc_ord, egc_char, is also provided in text_tools.
  • Added the excellent dependencies uwcwidth and ugrapheme and dropped the custom char_width and str_width functions.
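
To illustrate the chars property from the first bullet, here is a minimal sketch of how a uint32 field can be re-viewed as one-character Unicode strings with plain NumPy. It only assumes the Cell dtype given in this description; the standalone canvas and chars variables stand in for a Text gadget's attributes and may differ from how the property is actually implemented.

```python
import numpy as np

Cell = np.dtype(
    [
        ("ord", "uint32"),
        ("style", "u1"),
        ("fg_color", "u1", (3,)),
        ("bg_color", "u1", (3,)),
    ]
)

# Hypothetical 2x3 canvas filled with spaces.
canvas = np.zeros((2, 3), dtype=Cell)
canvas["ord"] = ord(" ")

# "U1" and uint32 are both 4 bytes wide, so the field can be
# reinterpreted in place; no data is copied.
chars = canvas["ord"].view("U1")

# Writing a character through the view updates the "ord" field...
chars[0, 0] = "a"

# ...and writing a code point to "ord" is visible through the view.
canvas["ord"][0, 1] = ord("b")
```

Because each element of the view holds a single code point, a multi-code-point egc cannot be assigned this way, which matches the warning in the first bullet.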

@salt-die salt-die merged commit 9fbdf28 into main Apr 17, 2025
5 checks passed
@salt-die salt-die deleted the egcs branch April 17, 2025 15:33