Commit d433c70

thaafoxlauft authored and committed
Add emoji display width support and width-aware string truncation
mk_wcwidth() returns 0 for emoji codepoints (U+1F300-U+1FAFF) because they fall through as unassigned. Modern terminals render these as 2-cell-wide characters, causing column misalignment in any application that relies on mk_wcwidth() for layout.

Add explicit width mappings:
- Emoji pictographs (U+1F300-U+1F9FF, U+1FA00-U+1FAFF): width 2
- Arrows, geometric shapes, dingbats, misc symbols: width 1
- Variation selectors (U+FE00-U+FE0F): width 0

Add utf8_truncate_to_width() which truncates a UTF-8 string to fit within a target number of display columns, ensuring multi-byte sequences are never split mid-character.
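The misalignment described above can be illustrated with a small padding calculation. This is only a sketch: `sketch_cell_width` and `sketch_padding` are hypothetical names, not functions from this commit, and the width function assumes that every codepoint outside the listed ranges occupies one cell.

```cpp
#include <string>

// Hypothetical stand-in for mk_wcwidth(), covering only the ranges named in
// the commit message. Assumption: everything else is one cell wide.
static int sketch_cell_width (char32_t cp)
{
  if (cp >= 0x1F300 && cp <= 0x1FAFF) return 2;  // emoji pictographs
  if (cp >= 0xFE00 && cp <= 0xFE0F)   return 0;  // variation selectors
  return 1;
}

// Hypothetical helper: number of spaces needed to pad `text` so that it
// fills `column_width` terminal cells.
unsigned int sketch_padding (const std::u32string& text, unsigned int column_width)
{
  unsigned int used = 0;
  for (char32_t cp : text)
    used += static_cast<unsigned int> (sketch_cell_width (cp));

  return used >= column_width ? 0 : column_width - used;
}
```

With a width of 2 for an emoji, a 6-cell column holding a single emoji needs 4 spaces of padding. If the width function reported 0 instead, the same calculation would emit 6 spaces and the row would end up two cells too wide.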
1 parent 3da1534 commit d433c70

2 files changed: 44 additions, 0 deletions

src/utf8.cpp

Lines changed: 42 additions & 0 deletions
@@ -299,6 +299,31 @@ std::string utf8_substr (
   return result;
 }
 
+////////////////////////////////////////////////////////////////////////////////
+// Truncate a UTF-8 string to fit within a target display width.
+// Unlike substr which counts characters, this counts display columns.
+const std::string utf8_truncate_to_width (
+  const std::string& input,
+  unsigned int target_width)
+{
+  unsigned int current_width = 0;
+  std::string::size_type i = 0;
+  std::string::size_type last_safe = 0;
+  unsigned int c;
+
+  while ((c = utf8_next_char (input, i)))
+  {
+    int w = mk_wcwidth (c);
+    if (w < 0) w = 0;
+    if (current_width + w > target_width)
+      break;
+    current_width += w;
+    last_safe = i;
+  }
+
+  return input.substr (0, last_safe);
+}
+
 ////////////////////////////////////////////////////////////////////////////////
 int mk_wcwidth(wchar_t ucs)
 {
@@ -316,6 +341,23 @@ int mk_wcwidth(wchar_t ucs)
   if (width == widechar_ambiguous)
     return 1;
 
+  // Emoji pictographs (U+1F300+) — width 2 in modern terminals
+  if ((ucs >= 0x1F300 && ucs <= 0x1F9FF) ||   // Misc Symbols, Pictographs, Emoticons, Supplemental
+      (ucs >= 0x1FA00 && ucs <= 0x1FAFF))     // Symbols and Pictographs Extended-A
+    return 2;
+
+  // Common symbols — width 1 in modern terminals
+  if ((ucs >= 0x2190 && ucs <= 0x21FF) ||     // Arrows
+      (ucs >= 0x2300 && ucs <= 0x23FF) ||     // Misc Technical
+      (ucs >= 0x25A0 && ucs <= 0x25FF) ||     // Geometric Shapes
+      (ucs >= 0x2600 && ucs <= 0x26FF) ||     // Misc Symbols
+      (ucs >= 0x2700 && ucs <= 0x27BF))       // Dingbats
+    return 1;
+
+  // Variation selectors — zero width
+  if (ucs >= 0xFE00 && ucs <= 0xFE0F)
+    return 0;
+
   // All other negative values
   return 0;
 }
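To try the truncation loop outside the code base, the following self-contained sketch mirrors its shape. The names `sketch_width`, `sketch_next_char`, and `sketch_truncate_to_width` are hypothetical stand-ins, not the project's functions: the decoder assumes well-formed UTF-8, and the width function is a simplification that treats every codepoint outside the emoji and variation-selector ranges as one cell.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for mk_wcwidth(): 2 for U+1F300-U+1FAFF,
// 0 for variation selectors, 1 for everything else (an assumption).
static int sketch_width (std::uint32_t cp)
{
  if (cp >= 0x1F300 && cp <= 0x1FAFF) return 2;
  if (cp >= 0xFE00 && cp <= 0xFE0F)   return 0;
  return 1;
}

// Hypothetical stand-in for utf8_next_char(): decodes the codepoint at byte
// offset i and advances i past it. Returns 0 at end of input. Assumes the
// input is well-formed UTF-8; no validation is attempted.
static std::uint32_t sketch_next_char (const std::string& s, std::string::size_type& i)
{
  if (i >= s.size ()) return 0;

  unsigned char b = static_cast<unsigned char> (s[i]);
  int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : (b >= 0xC0) ? 1 : 0;
  std::uint32_t cp = (extra == 3) ? (b & 0x07)
                   : (extra == 2) ? (b & 0x0F)
                   : (extra == 1) ? (b & 0x1F)
                   : b;
  ++i;
  for (int k = 0; k < extra && i < s.size (); ++k, ++i)
    cp = (cp << 6) | (static_cast<unsigned char> (s[i]) & 0x3F);

  return cp;
}

// Same loop shape as the diff: remember the byte offset just past the last
// codepoint that still fits, then cut the string there.
std::string sketch_truncate_to_width (const std::string& input, unsigned int target_width)
{
  unsigned int current_width = 0;
  std::string::size_type i = 0;
  std::string::size_type last_safe = 0;

  while (std::uint32_t cp = sketch_next_char (input, i))
  {
    unsigned int w = static_cast<unsigned int> (sketch_width (cp));
    if (current_width + w > target_width)
      break;
    current_width += w;
    last_safe = i;
  }

  return input.substr (0, last_safe);
}
```

For the input "ab" followed by U+1F600 and "cd", a target of 3 columns yields "ab": the emoji needs two cells and only one remains, so the cut falls before its four-byte sequence, not inside it. A target of 4 keeps the emoji as well.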

src/utf8.h

Lines changed: 2 additions & 0 deletions
@@ -39,6 +39,8 @@ unsigned int utf8_width (const std::string& str);
 unsigned int utf8_text_width (const std::string&);
 std::string utf8_substr (const std::string&, unsigned int, unsigned int length = 0);
 
+const std::string utf8_truncate_to_width (const std::string&, unsigned int target_width);
+
 int mk_wcwidth (wchar_t);
 
 #endif
