
Commit 167d852

add docs explaining the u16 limitation
Signed-off-by: Connor Tsui <[email protected]>
1 parent 241f71d, commit 167d852

1 file changed: +11 -1 lines changed

vortex-layout/src/layouts/dict/writer.rs

Lines changed: 11 additions & 1 deletion
@@ -49,11 +49,21 @@ use crate::sequence::SequentialStreamAdapter;
 use crate::sequence::SequentialStreamExt;
 
 /// Constraints for dictionary layout encoding.
+///
+/// Note that [`max_len`](Self::max_len) is limited to `u16` (65,535 entries) by design. Since
+/// layout chunks are typically ~8k elements, having more than 64k unique values in a dictionary
+/// means dictionary encoding provides little compression benefit. If a column has very high
+/// cardinality, the fallback encoding strategy should be used instead.
 #[derive(Clone)]
 pub struct DictLayoutConstraints {
     /// Maximum size of the dictionary in bytes.
     pub max_bytes: usize,
-    /// Maximum dictionary length.
+    /// Maximum dictionary length. Limited to `u16` because dictionaries with more than 64k unique
+    /// values provide diminishing compression returns given typical chunk sizes (~8k elements).
+    ///
+    /// The codes dtype is chosen dynamically based on the actual dictionary size:
+    /// - [`PType::U8`] when the dictionary has at most 255 entries
+    /// - [`PType::U16`] when the dictionary has more than 255 entries
     pub max_len: u16,
 }
 
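The diff only documents the two limits on `DictLayoutConstraints`. It does not show how the writer consults them. The following minimal sketch assumes the writer compares the running dictionary's entry count and byte size against both limits and falls back once either is exceeded. Only the field names `max_bytes` and `max_len` come from the diff. The `admits` method is hypothetical and is not part of the Vortex API.

```rust
/// Local mirror of `DictLayoutConstraints` for illustration only.
/// The field names match the diff; `admits` is a hypothetical helper.
#[derive(Clone, Debug)]
pub struct DictLayoutConstraints {
    /// Maximum size of the dictionary in bytes.
    pub max_bytes: usize,
    /// Maximum dictionary length; the `u16` type caps this at 65,535.
    pub max_len: u16,
}

impl DictLayoutConstraints {
    /// Returns true if a dictionary with `len` unique values occupying
    /// `bytes` bytes satisfies both limits. Under this sketch's assumption,
    /// a writer would switch to the fallback encoding once this is false.
    pub fn admits(&self, len: usize, bytes: usize) -> bool {
        // `From<u16> for usize` is lossless, so no cast truncation can occur.
        len <= usize::from(self.max_len) && bytes <= self.max_bytes
    }
}
```

With `max_len: u16::MAX`, a dictionary of 65,535 entries is admitted and one of 65,536 is not. That boundary is the `u16` ceiling the doc comment describes. No configuration of the field can raise it.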

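The new doc comment on `max_len` says the codes dtype is picked from the actual dictionary size: `U8` for at most 255 entries, `U16` above that. The sketch below encodes those two thresholds. `CodesPType` and `codes_ptype_for` are hypothetical names that stand in for Vortex's `PType` and for whatever selection logic the writer actually uses.

```rust
/// Hypothetical stand-in for the two `PType` variants the doc comment mentions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CodesPType {
    U8,
    U16,
}

/// Hypothetical helper: picks the codes type for a dictionary with
/// `dict_len` entries, using the thresholds documented on `max_len`.
/// Returns `None` above `u16::MAX` entries, a size the `max_len: u16`
/// limit should already have routed to the fallback encoding.
pub fn codes_ptype_for(dict_len: usize) -> Option<CodesPType> {
    if dict_len <= usize::from(u8::MAX) {
        // At most 255 entries: one byte per code.
        Some(CodesPType::U8)
    } else if dict_len <= usize::from(u16::MAX) {
        // 256..=65,535 entries: two bytes per code.
        Some(CodesPType::U16)
    } else {
        None
    }
}
```

A dictionary that satisfies `max_len` never reaches the `None` branch. That branch only keeps the function total. Capping `max_len` at `u16` is what guarantees codes never need more than two bytes each.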