-
Notifications
You must be signed in to change notification settings - Fork 262
Add methods to load LSTM or dictionary data for existing segmenters #7590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Or they can just implement a custom DataProvider? If this case is common enough I guess these methods are harmless. |
sffc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use this builder pattern in DateTimeNames, which is a power-user API, but it seems okay here too.
components/segmenter/src/word.rs
Outdated
| /// ✨ *Enabled with the `compiled_data` and `lstm` Cargo features.* | ||
| #[cfg(feature = "lstm")] | ||
| #[cfg(feature = "compiled_data")] | ||
| pub fn with_lstm_unstable(mut self) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Observation: _unstable is on the owned type and compiled data is on the Borrowed type.
|
|
||
| /// Loads dictionary data for a [`WordSegmenter`] constructed with | ||
| /// [`WordSegmenter::new_for_non_complex_scripts`]. | ||
| pub fn with_dictionary_unstable<D>(mut self, provider: &D) -> Result<Self, DataError> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Should we have the buffer versions, too?
Manishearth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defer to Shane, broadly speaking looks fine
2e8dbfd to
fbb64a1
Compare
fbb64a1 to
c20f629
Compare
Clients want to do custom dictinoary/lstm loading, without doing full custom data loading for segmenters. We can extend
Segmenter::new_for_non_complex_scriptswith methodsload_lstmandload_dictionaryso that the non-complex script data can be loaded from a different source, or behind a Cargo feature, without jumping through the hoops of defining a branching data provider or reloading data that has already been loaded.