Upgrade to Unicode 16.0 and add support for almost all gemoji aliases #214

Open
shane-tw wants to merge 2 commits into vdurmont:master from shane-tw:master
@shane-tw shane-tw commented Mar 22, 2025

Upgrades the emoji database to Unicode 16.0 and adds support for almost all gemoji aliases.

What's included

  • +330 new emojis from Unicode 10.0–16.0 that were missing
  • +742 new aliases (2711 total, up from 1969), sourced from gemoji
  • FE0F (U+FE0F, VARIATION SELECTOR-16) handled transparently: the trie accepts emoji sequences with or without FE0F. For all existing emojis not covered by the alias mapping changes below, getUnicode() return values and emojiChar values are unchanged from v5.1.1.
  • Bugfix: erroneous leading ♾ removed from :pirate_flag: / :jolly_roger:
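
To illustrate what "with or without FE0F" means for a lookup, here is a minimal sketch. It is not the trie from this PR: it swaps in a simpler technique (strip U+FE0F, then look up in a map), and the class name, the `aliasFor` helper, and the two-entry table are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: FE0F-insensitive lookup by normalising the key.
// The PR itself handles FE0F inside the trie; this map is a stand-in.
class Fe0fLookupSketch {
    // U+FE0F VARIATION SELECTOR-16 requests emoji presentation.
    private static final int VS16 = 0xFE0F;

    // Returns the input with every U+FE0F code point removed.
    static String stripVs16(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().filter(cp -> cp != VS16).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Illustrative table keyed by the FE0F-free sequence.
    private static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put(stripVs16("\u2709"), "envelope");
        ALIASES.put(stripVs16("\uD83C\uDFF4\u200D\u2620\uFE0F"), "pirate_flag");
    }

    // Looks up an alias, ignoring any FE0F in the input.
    static String aliasFor(String emoji) {
        return ALIASES.get(stripVs16(emoji));
    }

    public static void main(String[] args) {
        System.out.println(aliasFor("\u2709"));        // envelope
        System.out.println(aliasFor("\u2709\uFE0F"));  // envelope
        System.out.println(aliasFor("\uD83C\uDFF4\u200D\u2620")); // pirate_flag
    }
}
```

Normalising the key keeps the stored data untouched, which is the same property the bullet above claims for the existing getUnicode() and emojiChar values.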

Alias mapping changes

Six existing aliases now point to different emoji. These are intentional corrections:

| Alias | Before | After | Reason |
| --- | --- | --- | --- |
| :bride_with_veil: | 👰 | 👰‍♀️ | Corrected to gender-explicit ZWJ form |
| :guardsman: | 💂 | 💂‍♂️ | Corrected to gender-explicit ZWJ form |
| :man_in_tuxedo: | 🤵 | 🤵‍♂️ | Corrected to gender-explicit ZWJ form |
| :man_with_turban: | 👳 | 👳‍♂️ | Corrected to gender-explicit ZWJ form |
| :ok_woman: | 🙆 | 🙆‍♀️ | Corrected to gender-explicit ZWJ form |
| :email: | ✉ | 📧 | Aligned with gemoji (:envelope: now maps to ✉) |

Note: getUnicode() for the gendered entries returns the ZWJ sequence without FE0F (e.g. 💂‍♂); FE0F is included in emojiChar for correct emoji rendering.
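
The difference described in the note is a single trailing code point. The sketch below prints the code points of the guardsman sequence in both forms; it uses only the JDK, and the class and method names are hypothetical, not part of emoji-java.

```java
import java.util.stream.Collectors;

// Hypothetical sketch: show the code points of a gendered ZWJ sequence
// with and without the trailing U+FE0F.
class ZwjSequenceSketch {
    // Formats each code point of s as U+XXXX, separated by spaces.
    static String hex(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // GUARDSMAN + ZERO WIDTH JOINER + MALE SIGN, no variation selector.
    static String withoutFe0f() {
        return new StringBuilder()
                .appendCodePoint(0x1F482)
                .appendCodePoint(0x200D)
                .appendCodePoint(0x2642)
                .toString();
    }

    public static void main(String[] args) {
        String bare = withoutFe0f();          // form the note gives for getUnicode()
        String qualified = bare + "\uFE0F";   // form the note gives for emojiChar
        System.out.println(hex(bare));        // U+1F482 U+200D U+2642
        System.out.println(hex(qualified));   // U+1F482 U+200D U+2642 U+FE0F
        System.out.println(bare.equals(qualified)); // false
    }
}
```

Because the two strings are not equal, code that compares getUnicode() output against text copied from a rendered page may need to strip U+FE0F first.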

All other existing alias targets are preserved.

Differences from gemoji

This repo intentionally diverges from gemoji on 5 aliases for backward compatibility. The gemoji-preferred emoji is still reachable via an alternative alias:

| Alias | This repo | gemoji | Equivalent alias for gemoji emoji |
| --- | --- | --- | --- |
| :beetle: | 🐞 | 🪲 | :stag_beetle: |
| :jar: | 🏺 | 🫙 | :mason_jar: |
| :ng: | 🇳🇬 | 🆖 | :squared_ng: |
| :om: | 🇴🇲 | 🕉️ | :om_symbol: |
| :satellite: | 🛰 | 📡 | :satellite_antenna: |
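
One way to picture the backward-compatibility rule in the table above is a legacy table that takes precedence over gemoji's mapping. This is only a sketch under that assumption: the PR does not say how emojis.json was generated, and the class, maps, and `resolve` helper are hypothetical. Only the :beetle: row is modelled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: legacy alias targets win over gemoji targets,
// and the gemoji emoji stays reachable through an alternative alias.
class AliasOverlaySketch {
    // Builds a String from raw code points.
    static String cp(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    static final String LADY_BEETLE = cp(0x1F41E); // this repo's :beetle:
    static final String BEETLE = cp(0x1FAB2);      // gemoji's :beetle:

    private static final Map<String, String> GEMOJI = new HashMap<>();
    private static final Map<String, String> LEGACY = new HashMap<>();
    static {
        GEMOJI.put("beetle", BEETLE);
        GEMOJI.put("stag_beetle", BEETLE); // alternative alias from the table
        LEGACY.put("beetle", LADY_BEETLE); // pre-existing mapping is preserved
    }

    // Legacy entries shadow gemoji entries with the same alias.
    static String resolve(String alias) {
        String legacy = LEGACY.get(alias);
        return legacy != null ? legacy : GEMOJI.get(alias);
    }

    public static void main(String[] args) {
        System.out.println(resolve("beetle").equals(LADY_BEETLE));  // true
        System.out.println(resolve("stag_beetle").equals(BEETLE));  // true
    }
}
```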

This repo also carries ~800 aliases that gemoji doesn't have, including 238 two-letter country-code flag aliases (:ac:, :ad:, …) and ~560 others that predate this PR.

@shane-tw shane-tw changed the title Upgrade to Unicode 16.0 Upgrade to Unicode 16.0 and add support for almost all gemoji aliases Mar 24, 2025
@shane-tw shane-tw force-pushed the master branch 4 times, most recently from d0545a0 to cb64ec6 Compare March 24, 2025 11:20
@shane-tw shane-tw force-pushed the master branch 4 times, most recently from 67c2d17 to e6167a2 Compare May 20, 2026 22:53
@shane-tw
Author

Made various changes to improve reviewability and reduce breaking changes in this PR.

@shane-tw shane-tw force-pushed the master branch 3 times, most recently from 8c9b327 to 6750954 Compare May 20, 2026 23:10
- Add 330 new emojis from Unicode 10.0–16.0
- Add 742 new gemoji aliases (2711 total, up from 1969)
- Gender-explicit aliases corrected to ZWJ variants:
  :bride_with_veil:, :guardsman:, :man_in_tuxedo:, :man_with_turban:,
  :ok_woman: now map to 👰‍♀️, 💂‍♂️, 🤵‍♂️, 👳‍♂️, 🙆‍♀️
- Fix erroneous leading ♾ on 🏴‍☠️ / :jolly_roger:
- emojis.json ordered with existing entries in original positions and
  new entries appended, using the original \uXXXX surrogate-pair encoding
- FE0F handled transparently at the trie level: the trie accepts emoji
  sequences with or without FE0F; EmojiParser stores actual emojiEndIndex
  from getEmojiEndPos() so positions are correct when FE0F is in input
- Bump version to 5.1.2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@shane-tw shane-tw force-pushed the master branch 2 times, most recently from 14e587d to 1832975 Compare May 27, 2026 16:22
218 existing entries had FE0F appended to their emojiChar field (e.g.
© → ©️, ® → ®️, ☺ → ☺️, ✂ → ✂️, etc.) without a corresponding
change to the emoji field used at runtime. One entry (satellite) also
had FE0F added to its emoji field (🛰 → 🛰️).

Both changes are incorrect for pre-existing entries: the emoji field
(getUnicode() return value) must match emojiChar or the EMOJIS.md docs
become misleading. Newly-added emojis that include FE0F are unaffected.

Regenerate EMOJIS.md from the corrected emojis.json.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>