From 61a437430d4a3218fc3a4e2d0e1fa4e9e0413f13 Mon Sep 17 00:00:00 2001
From: Roozbeh Pournader
Date: Mon, 4 Aug 2025 11:01:53 -0700
Subject: [PATCH 1/3] Add comment about CJK compatibility ideographs to DoNotEmit.txt

---
 unicodetools/data/ucd/dev/DoNotEmit.txt | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/unicodetools/data/ucd/dev/DoNotEmit.txt b/unicodetools/data/ucd/dev/DoNotEmit.txt
index b43b856fdb..969f86f3a7 100644
--- a/unicodetools/data/ucd/dev/DoNotEmit.txt
+++ b/unicodetools/data/ucd/dev/DoNotEmit.txt
@@ -1,5 +1,5 @@
 # DoNotEmit-17.0.0.txt
-# Date: 2025-07-30
+# Date: 2025-08-04
 # © 2025 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -42,6 +42,14 @@
 # Sequences for Egyptian Hieroglyphs are not listed here. See
 # the kEH_AltSeq property in UAX #57 for that information.
 #
+# CJK compatibility ideographs are not listed here either. Each CJK
+# compatibility ideograph is canonically equivalent to a CJK unified
+# ideograph, which means that distinctions would be lost in normalization.
+# The preferred form for applications that intend to keep the distinction is
+# using a standardized variation sequence instead of a CJK compatibility
+# ideograph. For a comprehensive list of such sequences, see the section
+# "CJK compatibility ideographs" in StandardizedVariants.txt.
+#
 # Note that some sequences could be considered recursive, in the way that
 # the preferred sequence to use may be a subsequence of the "Do Not Emit"
 # sequence. This may have implications for some implementations who may want

From f3e191deac0f136c2ca2b5e3a349bbd730ee3a1c Mon Sep 17 00:00:00 2001
From: Roozbeh Pournader
Date: Mon, 4 Aug 2025 19:11:34 -0700
Subject: [PATCH 2/3] Expand the explanation

---
 unicodetools/data/ucd/dev/DoNotEmit.txt | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/unicodetools/data/ucd/dev/DoNotEmit.txt b/unicodetools/data/ucd/dev/DoNotEmit.txt
index 969f86f3a7..6c8eb4b08d 100644
--- a/unicodetools/data/ucd/dev/DoNotEmit.txt
+++ b/unicodetools/data/ucd/dev/DoNotEmit.txt
@@ -42,13 +42,15 @@
 # Sequences for Egyptian Hieroglyphs are not listed here. See
 # the kEH_AltSeq property in UAX #57 for that information.
 #
-# CJK compatibility ideographs are not listed here either. Each CJK
-# compatibility ideograph is canonically equivalent to a CJK unified
-# ideograph, which means that distinctions would be lost in normalization.
-# The preferred form for applications that intend to keep the distinction is
-# using a standardized variation sequence instead of a CJK compatibility
-# ideograph. For a comprehensive list of such sequences, see the section
-# "CJK compatibility ideographs" in StandardizedVariants.txt.
+# CJK compatibility ideographs are not listed here either. Most of the CJK
+# compatibility ideographs are canonically equivalent to a CJK unified
+# ideograph, which means that distinctions between compatibility ideographs
+# and the unified ideogreaphs that they are canonically equivalent to would
+# be lost in normalization. The preferred form for applications that intend
+# to keep such distinctions is using a standardized variation sequence
+# instead of a CJK compatibility ideograph. For a comprehensive list of
+# these standardized variations sequences, see the section "CJK
+# compatibility ideographs" in StandardizedVariants.txt.
 #
 # Note that some sequences could be considered recursive, in the way that
 # the preferred sequence to use may be a subsequence of the "Do Not Emit"

From c1234d6a5450ece6b0d4d6b97fb22bbf355f347e Mon Sep 17 00:00:00 2001
From: Roozbeh Pournader
Date: Mon, 4 Aug 2025 21:02:23 -0700
Subject: [PATCH 3/3] Fix two typos

---
 unicodetools/data/ucd/dev/DoNotEmit.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/unicodetools/data/ucd/dev/DoNotEmit.txt b/unicodetools/data/ucd/dev/DoNotEmit.txt
index 6c8eb4b08d..b9c1d8f4a6 100644
--- a/unicodetools/data/ucd/dev/DoNotEmit.txt
+++ b/unicodetools/data/ucd/dev/DoNotEmit.txt
@@ -45,11 +45,11 @@
 # CJK compatibility ideographs are not listed here either. Most of the CJK
 # compatibility ideographs are canonically equivalent to a CJK unified
 # ideograph, which means that distinctions between compatibility ideographs
-# and the unified ideogreaphs that they are canonically equivalent to would
+# and the unified ideographs that they are canonically equivalent to would
 # be lost in normalization. The preferred form for applications that intend
 # to keep such distinctions is using a standardized variation sequence
 # instead of a CJK compatibility ideograph. For a comprehensive list of
-# these standardized variations sequences, see the section "CJK
+# these standardized variation sequences, see the section "CJK
 # compatibility ideographs" in StandardizedVariants.txt.
 #
 # Note that some sequences could be considered recursive, in the way that
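
Illustration only, not part of the patch series. The new comment says that a CJK compatibility ideograph loses its distinction under normalization while a standardized variation sequence keeps it. A minimal sketch of that behavior, using Python's standard `unicodedata` module, might look like the following. The helper name `survives_normalization` is hypothetical, and the pairing of U+F900 with the sequence <U+8C48, U+FE00> is an assumption that should be checked against the "CJK compatibility ideographs" section of StandardizedVariants.txt.

```python
import unicodedata

# U+F900 is a CJK compatibility ideograph; its canonical decomposition
# is the unified ideograph U+8C48.
COMPAT = "\uF900"
UNIFIED = "\u8C48"

# Assumed standardized variation sequence for U+F900 (unified ideograph
# followed by VARIATION SELECTOR-1). Verify against StandardizedVariants.txt.
SVS = "\u8C48\uFE00"

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def survives_normalization(text: str) -> bool:
    """Hypothetical helper: True if text is unchanged by every normalization form."""
    return all(unicodedata.normalize(form, text) == text for form in FORMS)


if __name__ == "__main__":
    # The compatibility ideograph is replaced by the unified ideograph.
    print(unicodedata.normalize("NFC", COMPAT) == UNIFIED)
    # The variation sequence is left intact, so the distinction is kept.
    print(survives_normalization(SVS))
    print(survives_normalization(COMPAT))
```

Under these assumptions, the first two checks should report that the compatibility ideograph collapses to the unified ideograph and that the variation sequence is stable, which is the reason the comment recommends variation sequences for applications that need to keep the distinction.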