@@ -108,7 +108,7 @@ <h1>Unicode® NamesList File Format</h1>
108108 </ tr >
109109 < tr >
110110 < td > Date</ td >
111- < td > 2024-08-19 </ td >
111+ < td > 2024-08-21 </ td >
112112 </ tr >
113113 < tr >
114114 < td > This Version</ td >
@@ -159,8 +159,8 @@ <h2 id="Introduction">1.0 <a href="#Introduction">Introduction</a></h2>
159159 draft versions of the NamesList.txt file. The support for UTF-8 encoded files and the syntax for the UTF-8 charset
160160 declaration in a comment at the head of the file were introduced after Unicode
161161 6.1.0 was published, as was the syntax for the specification of variation sequences and alternate glyphs and their respective summaries. The repertoire restriction
162- in comments and aliases in the names list format was loosened from the prior
163- limitation to U+0020..U+00FF, to include the wider range U+0020..U+02FF, as of Unicode 11.0.</ p >
162+ in comments and aliases in the names list format was loosened from the earlier
163+ limitation to U+0020..U+00FF, to include the wider range U+0020..U+02FF, as of Unicode 11.0, and dropped entirely as of Unicode 16.0.0.</ p >
164164
165165 < p > The same input file can be used for the preparation of drafts and final editions for ISO/IEC
166166 10646. Earlier versions of that standard used a different style, referred to below as ISO-style. That style necessitated the presence of some
@@ -281,10 +281,18 @@ <h2 id="FileStructure">2.0 <a href="#FileStructure">NamesList File Structure</a>
281281 charset declaration (see below). Alternatively, or in addition, a BOM may be
282282 present at the very beginning of the file, forcing the encoding to be
283283 interpreted as UTF-16 (little-endian only) or UTF-8. When
284- declared as UTF-8, the names list format will support use of characters in
285- the range U+0020..U+02FF in LINE and LABEL elements. Otherwise,
284+ declared as UTF-8, the names list format supports the use of any Unicode character in
285+ STRING and LABEL elements. Otherwise,
286286 the supported repertoire is limited to Latin-1, and attempted use of characters outside
287287 the Latin-1 range will result in data corruption.</ p >
288+ < p > The NamesList file format does not support styled text; each line or other element
289+ will usually be displayed in a specific font selected for it. To allow CHAR elements
290+ that normally use chart glyphs to coexist better with running text in LABEL and STRING
291+ elements, a user-defined limit can be set, below which the normal selection of (chart) glyphs
292+ for the CHAR element is overridden in favor of equivalent glyphs from a font selected for better
293+ readability in running text. Any character in running text above that limit will use standard chart
294+ glyphs, which may result in a ransom note effect. For production of the Unicode Standard,
295+ Version 16.0.0 and later, the limit is set to U+1EFF.</ p >
288296 < p > Several of these elements, while part of the formal definition of the
289297 file format, do not occur in final published versions of
290298 NamesList.txt in the < a href ="https://www.unicode.org/Public/UCD/latest/ "> UCD</ a > .</ p >
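The Limit behavior described in the added paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Unibook implementation: the names `LIMIT`, `split_font_runs`, and `has_ransom_note_effect` are hypothetical, the range U+0020..Limit is assumed to be inclusive at both ends, and the exemption for CHAR elements inside an EXPAND_LINE is not modeled.

```python
# Illustrative sketch only; the names below are hypothetical, not part of Unibook.
LIMIT = 0x1EFF  # value the text gives for production of Unicode 16.0.0 and later


def split_font_runs(text, limit=LIMIT):
    """Group text into (uses_text_font, substring) runs.

    Assumption: a character uses the running-text font when its code point
    lies in U+0020..limit (inclusive); anything else falls back to chart glyphs.
    """
    runs = []
    for ch in text:
        uses_text_font = 0x20 <= ord(ch) <= limit
        if runs and runs[-1][0] == uses_text_font:
            # Same font as the previous character: extend the current run.
            runs[-1] = (uses_text_font, runs[-1][1] + ch)
        else:
            runs.append((uses_text_font, ch))
    return runs


def has_ransom_note_effect(text, limit=LIMIT):
    """True when the text would switch between the text font and chart glyphs."""
    return len(split_font_runs(text, limit)) > 1
```

A check like `has_ransom_note_effect` could be run over draft comments and aliases to flag text that mixes the two fonts before it reaches chart production.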
@@ -514,14 +522,14 @@ <h3 id="FileElements">2.1 <a href="#FileElements">NamesList File Elements</a></h
514522 < li > Because a LINE or an EXPAND_LINE can itself start with a special character followed
515523 by a SP or LF, an "unmarked" COMMENT_LINE should match the input in lower priority than line
516524 types that require a special character or have a more restrictive set of characters than EXPAND_LINE.
517- Similarly, a SUBHEADER containing TAB "!" LF should match with a higher priority than those
525+ Similarly, a SUBHEADER containing TAB "!" LF should match with a higher priority than one
518526 where the TAB is followed by a LINE.</ li >
519527 </ ul >
520528
521529
522530 < h3 id ="FilePrimitives "> 2.2 < a href ="#FilePrimitives "> NamesList File Primitives</ a > </ h3 >
523531
524- < p > The following are the primitives and terminals for the NamesList syntax.</ p >
532+ < p > The following are the primitives and terminals for the NamesList syntax. "Limit" is a user-defined value; see the notes below for a discussion of its implications.</ p >
525533
526534 < pre > < strong > LINE</ strong > : < strong > STRING LF
527535 COMMENT: "(" LABEL ")"
@@ -533,8 +541,8 @@ <h3 id="FilePrimitives">2.2 <a href="#FilePrimitives">NamesList File Primitives<
533541
534542 < strong > TAG</ strong > : <sequence of ASCII letters>
535543 < strong > LCTAG</ strong > : <sequence of lowercase ASCII letters>
536- < strong > STRING</ strong > : <sequence of characters in the range U+0020..U+02FF , except controls>
537- < strong > LABEL</ strong > : <sequence of characters in the range U+0020..U+02FF , except controls, "(" or ")">
544+ < strong > STRING</ strong > : <sequence of characters, except controls>
545+ < strong > LABEL</ strong > : <sequence of characters, except controls, "(" or ")">
538546 < strong > VARSEL</ strong > : < strong > CHAR
539547 | "ALT" ( "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" )</ strong >
540548< strong > VARSEL_LIST</ strong > : < strong > "{" CHAR_LIST "}"</ strong >
@@ -580,19 +588,27 @@ <h3 id="FilePrimitives">2.2 <a href="#FilePrimitives">NamesList File Primitives<
580588 of following characters.</ li >
581589 < li > The hyphen in a character range CHAR-CHAR is replaced by an EN DASH on
582590 output.</ li >
583- < li > In a STRING or LABEL, a Unicode character outside the range
584- U+0000..U+02FF is displayed as is, with a glyph matching
585- the chart font, and not with the font that is otherwise defined for that element.</ li >
586591 < li > The NamesList.txt file is encoded in UTF-8 if the < i > first line</ i > is a
587592 FILE_COMMENT containing the declaration "UTF-8" or any casemap variation
588593 thereof. Otherwise the file is encoded in Latin-1 (older versions). Beyond
589594 detecting the charset declaration (typically: "; charset=utf-8") the
590595 remainder of that comment is ignored.
591- If the file is not encoded as
592- UTF-8, the character repertoire for running text (anything
593- other than CHAR) is effectively restricted to the repertoire of Latin-1.
594- Otherwise, characters in the range U+0020..U+02FF
595- are allowed in STRING or LABEL elements, and elements derived from them.</ li >
596+ When declared as UTF-8, the NamesList format will support any Unicode character
597+ in STRING or LABEL elements, but see further implications below.</ li >
598+ < li > In a STRING or LABEL element, a Unicode character outside the range
599+ U+0020..Limit is displayed with a glyph matching
600+ the chart font, and not with the font that is otherwise defined for that element.
601+ The Limit value is user-defined.
602+ For production of the Unicode Standard, Version 16.0.0 and later, the Limit
603+ value is set to U+1EFF.
604+ All code points within the range U+0020..Limit can be mapped onto a font selected for best
605+ results in running text. However, any CHAR elements contained in an EXPAND_LINE
606+ are exempt from this and are always displayed with a glyph matching the chart font.
607+ The net effect is a workaround for the fact that the NamesList format does
608+ not support style runs within any element that encompasses a single unit of flowed text.</ li >
609+ < li > When drafting STRING or LABEL elements, note that text containing
610+ characters outside the range U+0020..Limit may result in a ransom note effect,
611+ as the regular text font and the chart font would alternate. This is best avoided.</ li >
596612 < li > The code chart layout program
597613 (< a href ="https://www.unicode.org/unibook/ "> Unibook</ a > )
598614 can accept files in several other formats. These include little-endian UTF-16,
@@ -613,6 +629,8 @@ <h2 id="Modifications"><a href="#Modifications">Modifications</a></h2>
613629 < p > < b > Version 16.0.0</ b > </ p >
614630 < ul >
615631 < li > Reissued for Unicode 16.0.0</ li >
632+ < li > Reflected the wider range of possible values for the user-defined Limit.</ li >
633+ < li > Added an explanation of the effect of the Limit value.</ li >
616634 </ ul >
617635
618636 < p > < b > Version 15.1.0</ b > </ p >
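The encoding rules touched by this change (a BOM forces UTF-8 or little-endian UTF-16; otherwise a first-line file comment containing "UTF-8" in any case variation selects UTF-8; otherwise Latin-1) can be sketched as follows. This is a hedged illustration rather than the Unibook logic: the function name `detect_nameslist_encoding` is hypothetical, and the assumption that a file comment starts with ";" is taken from the typical declaration "; charset=utf-8" quoted in the text.

```python
import codecs


def detect_nameslist_encoding(data: bytes) -> str:
    """Return a Python codec name for a NamesList-style file (hypothetical helper)."""
    # A BOM at the very beginning of the file takes precedence.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"  # the codec reads the BOM; only little-endian is expected

    # Otherwise inspect the first line. Latin-1 maps every byte to a character,
    # so this decode cannot fail regardless of the real encoding.
    first_line = data.split(b"\n", 1)[0].decode("latin-1")

    # Assumption: a file comment starts with ";". Any case variation of
    # "UTF-8" counts; the rest of the comment is ignored.
    if first_line.startswith(";") and "utf-8" in first_line.lower():
        return "utf-8"

    # Older files without a declaration are treated as Latin-1.
    return "latin-1"
```

A caller would read the file in binary mode and then decode it with the returned codec, for example `data.decode(detect_nameslist_encoding(data))`.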