@robertbastian
Member

Clients want to do custom dictionary/LSTM loading without doing full custom data loading for segmenters. We can extend Segmenter::new_for_non_complex_scripts with methods load_lstm and load_dictionary, so that the non-complex script data can be loaded from a different source, or behind a Cargo feature, without jumping through the hoops of defining a branching data provider or reloading data that has already been loaded.
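
A rough sketch of the call pattern this is aiming for, using stand-in types only. Nothing below is the real icu_segmenter API: `Segmenter`, `DictionarySource`, `InMemoryDictionary`, `with_dictionary`, `with_lstm`, and this `DataError` are hypothetical names made up for illustration. The idea is that the base constructor carries no dictionary or LSTM data, and each piece is attached afterwards from whatever source the client chooses.

```rust
// Stand-in types for illustration; these are NOT the icu_segmenter types.
#[derive(Debug, PartialEq)]
pub struct DataError(pub &'static str);

/// Hypothetical source of dictionary data (a stand-in for a data provider).
pub trait DictionarySource {
    fn load_dictionary(&self) -> Result<Vec<String>, DataError>;
}

/// Hypothetical source backed by an in-memory word list.
pub struct InMemoryDictionary(pub Vec<String>);

impl DictionarySource for InMemoryDictionary {
    fn load_dictionary(&self) -> Result<Vec<String>, DataError> {
        if self.0.is_empty() {
            Err(DataError("empty dictionary"))
        } else {
            Ok(self.0.clone())
        }
    }
}

/// Hypothetical segmenter that starts without any optional data.
#[derive(Debug, Default)]
pub struct Segmenter {
    pub dictionary: Option<Vec<String>>,
    pub lstm: Option<Vec<f32>>,
}

impl Segmenter {
    /// Base constructor: no dictionary or LSTM data attached.
    pub fn new_for_non_complex_scripts() -> Self {
        Self::default()
    }

    /// Fallible step: attaches dictionary data from a caller-chosen source.
    pub fn with_dictionary<D: DictionarySource>(mut self, source: &D) -> Result<Self, DataError> {
        self.dictionary = Some(source.load_dictionary()?);
        Ok(self)
    }

    /// Infallible step: attaches LSTM weights that the caller already holds.
    pub fn with_lstm(mut self, weights: Vec<f32>) -> Self {
        self.lstm = Some(weights);
        self
    }
}

/// Example chain: only the dictionary is attached; the LSTM slot stays empty.
pub fn demo() -> Result<Segmenter, DataError> {
    let source = InMemoryDictionary(vec!["word".to_string()]);
    Segmenter::new_for_non_complex_scripts().with_dictionary(&source)
}
```

Because each step consumes and returns the segmenter, a client can attach only the data it needs, and a step can sit behind its own Cargo feature without affecting the base constructor.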

@robertbastian robertbastian marked this pull request as ready for review February 5, 2026 13:16
@robertbastian robertbastian reopened this Feb 5, 2026
@sffc
Member

sffc commented Feb 5, 2026

Or they can just implement a custom DataProvider?

If this case is common enough I guess these methods are harmless.

Member

@sffc sffc left a comment

We use this builder pattern in DateTimeNames, which is a power-user API, but it seems okay here too.

/// ✨ *Enabled with the `compiled_data` and `lstm` Cargo features.*
#[cfg(feature = "lstm")]
#[cfg(feature = "compiled_data")]
pub fn with_lstm_unstable(mut self) -> Self {
Member

Observation: _unstable is on the owned type and compiled data is on the Borrowed type.


/// Loads dictionary data for a [`WordSegmenter`] constructed with
/// [`WordSegmenter::new_for_non_complex_scripts`].
pub fn with_dictionary_unstable<D>(mut self, provider: &D) -> Result<Self, DataError>
Member

Question: Should we have the buffer versions, too?

Member

@Manishearth Manishearth left a comment

Defer to Shane, broadly speaking looks fine


3 participants