Commit 17015b4

translations don't always use unicode code points now
1 parent 337b800 commit 17015b4

1 file changed: 7 additions, 1 deletion

supervisor/shared/translate/compressed_string.h

Lines changed: 7 additions & 1 deletion
@@ -38,12 +38,18 @@
 // 9 in some translations sometime in the future. This length excludes
 // the trailing NUL, though notably decompress_length includes it.
 //
-// - followed by the huffman encoding of the individual UTF-16 code
+// - followed by the huffman encoding of the individual code
 // points that make up the string. The trailing "\0" is not
 // represented by a huffman code, but is implied by the length.
 // (building the huffman encoding on UTF-16 code points gave better
 // compression than building it on UTF-8 bytes)
 //
+// - If possible, the code points are represented as uint8_t values, with
+// 0..127 representing themselves and 160..255 representing another range
+// of Unicode, controlled by translation_offset and translation_offstart.
+// If this is not possible, uint16_t values are used. At present, no translation
+// requires code points not in the BMP, so this is adequate.
+//
 // - code points starting at 128 (word_start) and potentially extending
 // to 255 (word_end) (but never interfering with the target
 // language's used code points) stand for dictionary entries in a
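
The added comment describes a narrow uint8_t representation whose upper range is shifted into another Unicode block. The mapping might look roughly like the sketch below. This is an illustration only, not the code from this commit: the function names `expand_code_point` and `narrow_code_point` are hypothetical, and the exact comparison and arithmetic applied to `translation_offstart` and `translation_offset` are assumptions inferred from the comment text.

```c
#include <stdint.h>

/*
 * Hypothetical sketch, not the project's decoder.
 * Assumption: stored values below `offstart` stand for themselves, and
 * stored values at or above `offstart` are shifted up by `offset` to reach
 * the translation's Unicode range. Values 128..159 are passed through
 * unchanged here; the dictionary-entry handling described in the header
 * comment is outside the scope of this sketch.
 */
static uint16_t expand_code_point(uint8_t value, uint8_t offstart, uint16_t offset) {
    if (value < offstart) {
        return value;                     /* ASCII and other unshifted values */
    }
    return (uint16_t)(value + offset);    /* shifted into the target block */
}

/*
 * Inverse direction: returns the uint8_t value for a code point, or -1 when
 * the code point does not fit the narrow form (a uint16_t table would then
 * be needed, as the comment says).
 */
static int narrow_code_point(uint16_t cp, uint8_t offstart, uint16_t offset) {
    if (cp < 128) {
        return cp;                        /* 0..127 represent themselves */
    }
    if (cp >= offstart + offset && cp <= 255 + offset) {
        return cp - offset;               /* lands in offstart..255 */
    }
    return -1;                            /* not representable in 8 bits */
}
```

With the assumed values `offstart = 160` and `offset = 0x400 - 160`, a stored 160 would expand to U+0400 (the start of the Cyrillic block) and a stored 255 to U+045F, while a code point such as U+20AC would not fit and would force the uint16_t form.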
