@@ -468,6 +468,32 @@ fn FASTCOVER_convertToFastCoverParams(
     fastCoverParams.shrinkDict = coverParams.shrinkDict;
 }
 
+/// Train a dictionary from an array of samples using a modified version of the COVER algorithm.
+///
+/// Samples must be stored concatenated in a single flat buffer `samplesBuffer`, supplied with an
+/// array of sizes `samplesSizes`, providing the size of each sample, in order.
+///
+/// Only parameters `d` and `k` are required. All other parameters will use default values if not
+/// provided.
+///
+/// The resulting dictionary will be saved into `dictBuffer`.
+///
+/// In general, a reasonable dictionary has a size of ~100 KB. A smaller or larger size can be
+/// selected just by specifying `dictBufferCapacity`. It's generally recommended to provide a few
+/// thousand samples, though this can vary a lot. It's also recommended that the total size of all
+/// samples be about 100 times the target dictionary size.
+///
+/// # Returns
+///
+/// - on success, the size of the dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
+/// - on failure, an error code, which can be tested with [`crate::ZDICT_isError`]
+///
+/// Dictionary training will fail if there are not enough samples to construct a dictionary, or if
+/// most of the samples are too small (< 8 bytes being the lower limit). If dictionary training
+/// fails, you should use zstd without a dictionary, as the dictionary would've been ineffective
+/// anyway. If you believe your samples would benefit from a dictionary, please open an issue with
+/// details, and we can look into it.
+///
 /// # Safety
 ///
 /// Behavior is undefined if any of the following conditions are violated:
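The flat sample layout described in the first doc comment (`samplesBuffer` plus `samplesSizes`), together with its rules of thumb (training can fail when most samples are shorter than 8 bytes, and the total sample size should be about 100 times the target dictionary size), can be illustrated with a small standalone Rust sketch. The helpers `flatten_samples`, `sample_at`, and `check_samples` and the `SampleReport` struct are hypothetical and not part of this crate's API. The sketch only prepares and inspects the inputs; it does not call the trainer.

```rust
/// Hypothetical helper (not part of this crate): concatenate samples into one
/// flat buffer and record each sample's length, in order. This mirrors the
/// `samplesBuffer` / `samplesSizes` layout described in the doc comment.
fn flatten_samples(samples: &[&[u8]]) -> (Vec<u8>, Vec<usize>) {
    let total: usize = samples.iter().map(|s| s.len()).sum();
    let mut buffer = Vec::with_capacity(total);
    let mut sizes = Vec::with_capacity(samples.len());
    for sample in samples {
        buffer.extend_from_slice(sample);
        sizes.push(sample.len());
    }
    (buffer, sizes)
}

/// Hypothetical helper: recover sample `i` from the flat layout by summing
/// the sizes of all samples that precede it.
fn sample_at<'a>(buffer: &'a [u8], sizes: &[usize], i: usize) -> &'a [u8] {
    let offset: usize = sizes[..i].iter().sum();
    &buffer[offset..offset + sizes[i]]
}

/// Hypothetical pre-flight statistics based on the doc comment's rules of thumb.
struct SampleReport {
    total_bytes: usize,
    /// Number of samples below the documented 8-byte lower limit.
    tiny_samples: usize,
    /// Total sample bytes divided by the dictionary capacity (about 100 is suggested).
    ratio_to_dict: f64,
}

fn check_samples(sizes: &[usize], dict_capacity: usize) -> SampleReport {
    let total_bytes: usize = sizes.iter().sum();
    let tiny_samples = sizes.iter().filter(|&&n| n < 8).count();
    SampleReport {
        total_bytes,
        tiny_samples,
        ratio_to_dict: total_bytes as f64 / dict_capacity as f64,
    }
}

fn main() {
    let samples: [&[u8]; 3] = [&b"GET /index.html"[..], &b"GET /about.html"[..], &b"OK"[..]];
    let (buffer, sizes) = flatten_samples(&samples);

    // The flat buffer must be exactly as long as the sum of the sizes.
    assert_eq!(buffer.len(), sizes.iter().sum::<usize>());
    // Each sample can be recovered from its position in the sizes array.
    assert_eq!(sample_at(&buffer, &sizes, 1), samples[1]);

    // Compare against a ~100 KB target dictionary.
    let report = check_samples(&sizes, 100 * 1024);
    println!(
        "{} samples, {} bytes, {} under 8 bytes, ratio {:.6}",
        sizes.len(),
        report.total_bytes,
        report.tiny_samples,
        report.ratio_to_dict
    );
}
```

With a tiny input like this the ratio is far below 100, which is exactly the situation the doc comment warns may yield an ineffective dictionary.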
@@ -604,6 +630,31 @@ fn train_from_buffer_fastcover(
     dictionarySize
 }
 
+/// This function tries many parameter combinations (specifically, `k` and `d` combinations) and
+/// picks the best parameters.
+///
+/// `*parameters` is filled with the best parameters found, and the dictionary constructed with
+/// those parameters is stored in `dictBuffer`.
+///
+/// The parameters `d`, `k`, `steps`, and `accel` are optional:
+/// - If `d` is zero, we check `d` = 6 and `d` = 8.
+/// - If `k` is zero, we check `k` values in 50..=2000.
+/// - If `steps` is zero, its default value (40) is used.
+/// - If `accel` is zero, its default value (1) is used.
+///
+/// # Returns
+///
+/// - on success, the size of the dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
+/// - on failure, an error code, which can be tested with [`crate::ZDICT_isError`]
+///
+/// Dictionary training will fail if there are not enough samples to construct a dictionary, or if
+/// most of the samples are too small (< 8 bytes being the lower limit). If dictionary training
+/// fails, you should use zstd without a dictionary, as the dictionary would've been ineffective
+/// anyway. If you believe your samples would benefit from a dictionary, please open an issue with
+/// details, and we can look into it.
+///
+/// On success, `*parameters` contains the selected parameters.
+///
 /// # Safety
 ///
 /// Behavior is undefined if any of the following conditions are violated:
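The search space described in the second doc comment can be sketched as a standalone Rust function that lists the `(k, d)` pairs to try. `candidate_params` is hypothetical; it only enumerates candidates and does not train or score dictionaries. The doc comment gives the ranges and the default of `steps`, but not how `steps` spaces out the `k` values: the stride `max((2000 - 50) / steps, 1)` used here is an assumption modeled on upstream zstd's optimizer. `accel` does not affect which pairs are tried, so it is omitted.

```rust
/// Hypothetical sketch: enumerate the (k, d) pairs the optimizer is documented
/// to try. A non-zero `k` or `d` pins that parameter to a single value.
fn candidate_params(k: u32, d: u32, steps: u32) -> Vec<(u32, u32)> {
    // Documented search ranges used when a parameter is left at zero.
    let (k_min, k_max) = if k == 0 { (50, 2000) } else { (k, k) };
    let (d_min, d_max) = if d == 0 { (6, 8) } else { (d, d) };
    // Documented default for `steps`.
    let steps = if steps == 0 { 40 } else { steps };
    // Assumption (modeled on upstream zstd): k advances by max((k_max - k_min) / steps, 1).
    let k_step = ((k_max - k_min) / steps).max(1);

    let mut grid = Vec::new();
    // When `d` is not fixed, it takes the values 6 and 8 (stride 2).
    for d_val in (d_min..=d_max).step_by(2) {
        for k_val in (k_min..=k_max).step_by(k_step as usize) {
            grid.push((k_val, d_val));
        }
    }
    grid
}

fn main() {
    let defaults = candidate_params(0, 0, 0);
    // prints "82 candidates, first (50, 6), last (1970, 8)"
    println!(
        "{} candidates, first {:?}, last {:?}",
        defaults.len(),
        defaults[0],
        defaults[defaults.len() - 1]
    );

    // Fixing both `k` and `d` leaves a single candidate.
    // prints "[(200, 8)]"
    println!("{:?}", candidate_params(200, 8, 0));
}
```

Under these assumptions, the defaults give a `k` stride of 48, so 41 values of `k` are paired with each of the two values of `d`. Raising `steps` refines the `k` grid at the cost of more candidates.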