Commit 5c23e28

committed
add explanation for newer compression features
1 parent 4d8b354 commit 5c23e28

1 file changed: +15 -0 lines changed

supervisor/shared/translate/compressed_string.h

Lines changed: 15 additions & 0 deletions
@@ -53,13 +53,28 @@
 // speaking, words. They're just spans of code points that frequently
 // occur together. They are ordered shortest to longest.
 //
+// - If the translation uses a lot of code points or widely spaced code points,
+// then the huffman table entries are UTF-16 code points. But if the translation
+// uses only ASCII 7-bit code points plus a SMALL range of higher code points that
+// still fit in 8 bits, translation_offset and translation_offstart are used to
+// renumber the code points so that they still fit within 8 bits. (it's very beneficial
+// for mchar_t to be 8 bits instead of 16!)
+//
 // - dictionary entries are non-overlapping, and the _ending_ index of each
 // entry is stored in an array. A count of words of each length, from
 // minlen to maxlen, is given in the array called wlencount. From
 // this small array, the start and end of the N'th word can be
 // calculated by an efficient, small loop. (A bit of time is traded
 // to reduce the size of this table indicating lengths)
 //
+// - Value 1 ('\1') is used to indicate that a QSTR number follows. the
+// QSTR is encoded as a fixed number of bits (translation_qstr_bits), e.g.,
+// 10 bits if the highest core qstr is from 512 to 1023 inclusive.
+// (maketranslationdata uses a simple heuristic where any qstr >= 3
+// characters long is encoded in this way; this is simple but probably not
+// optimal. In fact, the rule of >= 2 characters is better for SOME languages
+// on SOME boards.)
+//
 // The "data" / "tail" construct is so that the struct's last member is a
 // "flexible array". However, the _only_ member is not permitted to be
 // a flexible member, so we have to declare the first byte as a separate
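
The wlencount bullet in the comment describes finding the N'th dictionary word from per-length counts alone. A minimal sketch of such a loop follows, assuming words are stored back to back and ordered shortest to longest as the comment says. The `ex_` names and the three-word dictionary are hypothetical and are not taken from the header.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t mchar_t; /* assumed 8-bit, as the comment recommends */

/* Hypothetical dictionary: "th", "in" (length 2) and "ing" (length 3),
 * stored back to back with no separators, shortest first. */
enum { EX_MINLEN = 2, EX_NWORDS = 3 };
static const mchar_t ex_words[] = { 't', 'h', 'i', 'n', 'i', 'n', 'g' };
static const uint8_t ex_wlencount[] = { 2, 1 }; /* counts for lengths 2, 3 */

/* Find the n'th word (0-based) by skipping whole groups of equal length. */
static void ex_get_word(unsigned n, const mchar_t **pos, const mchar_t **end) {
    unsigned len = EX_MINLEN;
    unsigned i = 0;
    const mchar_t *p = ex_words;
    assert(n < EX_NWORDS);
    while (ex_wlencount[i] <= n) {
        n -= ex_wlencount[i];        /* skip every word of this length */
        p += len * ex_wlencount[i];
        i++;
        len++;
    }
    *pos = p + len * n;              /* n is now an index within the group */
    *end = *pos + len;
}
```

The table costs one small count per word length instead of one index per word, which is the size-for-time trade the comment mentions.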

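
The first added bullet says translation_offset and translation_offstart renumber code points so that they fit in an 8-bit mchar_t. The exact rule is not spelled out in the comment, so the sketch below assumes one plausible scheme: stored values below the start value are plain ASCII, and values at or above it are shifted by a constant offset. The `ex_` names and the Cyrillic starting point U+0410 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t mchar_t; /* the 8-bit case described in the comment */

/* Hypothetical values: ASCII plus a short run starting at U+0410. */
static const mchar_t ex_offstart = 0x80;
static const uint16_t ex_offset = 0x0410 - 0x80;

/* Expand a stored 8-bit value to the code point it stands for (assumed rule). */
static uint16_t ex_to_code_point(mchar_t c) {
    return (c >= ex_offstart) ? (uint16_t)(c + ex_offset) : c;
}

/* Inverse mapping, as a build-time tool might apply it. */
static mchar_t ex_from_code_point(uint16_t cp) {
    if (cp < ex_offstart) {
        return (mchar_t)cp;
    }
    /* The code point must lie in the small renumbered range. */
    assert(cp >= ex_offstart + ex_offset && cp - ex_offset <= 0xFF);
    return (mchar_t)(cp - ex_offset);
}
```

Under this assumption every table entry takes one byte instead of two, which is the benefit the comment points out.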
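
The QSTR bullet gives 10 bits as the width when the highest core qstr is between 512 and 1023. The sketch below shows how such a width can be derived and how a fixed-width number could be read from a bit stream. The MSB-first bit order and the `ex_` helper names are assumptions for illustration, not the header's actual decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bits needed to represent every value up to highest_qstr. */
static unsigned ex_qstr_bits(unsigned highest_qstr) {
    unsigned bits = 0;
    while (highest_qstr != 0) {
        bits++;
        highest_qstr >>= 1;
    }
    return bits;
}

/* Read nbits from buf starting at *bitpos, most significant bit first
 * (assumed bit order), and advance *bitpos past them. */
static unsigned ex_read_bits(const uint8_t *buf, size_t *bitpos, unsigned nbits) {
    unsigned value = 0;
    assert(nbits <= 16);
    while (nbits-- > 0) {
        size_t p = (*bitpos)++;
        unsigned bit = (buf[p >> 3] >> (7 - (p & 7))) & 1u;
        value = (value << 1) | bit;
    }
    return value;
}
```

A decoder that sees the '\1' marker would then read `ex_qstr_bits(highest)` bits to recover the qstr number, so the width never has to be stored per string.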

Comments
 (0)