Hi, the loading time depends on the language defaults. It looks like Indonesian has a large number of tokenizer exceptions, which take a while to load.

If you don't need these exceptions for your task, you can remove some of them from the language defaults (IndonesianDefaults.tokenizer_exceptions) before loading the pipeline to reduce the loading time. If you then save the pipeline with nlp.to_disk(), the saved copy will include only the modified exceptions, so you can load it directly without repeating the modifications each time.
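
A minimal sketch of that workflow, assuming spaCy v3 and a blank Indonesian pipeline created with spacy.blank("id"). The helper names trim_exceptions and build_trimmed_pipeline and the keep parameter are hypothetical, introduced only for illustration. Which exceptions are safe to drop depends on your task.

```python
from pathlib import Path


def trim_exceptions(exceptions, keep):
    """Return a copy of `exceptions` restricted to the strings in `keep`.

    Hypothetical helper: `exceptions` maps a string to its special-case
    analysis, as IndonesianDefaults.tokenizer_exceptions does.
    """
    return {string: analysis for string, analysis in exceptions.items() if string in keep}


def build_trimmed_pipeline(output_dir, keep=frozenset()):
    """Create a blank Indonesian pipeline with fewer exceptions and save it.

    Hypothetical helper. With the default empty `keep`, every tokenizer
    exception is dropped, which is an assumption about what the task needs.
    """
    # Imported here so trim_exceptions stays usable without spaCy installed.
    import spacy
    from spacy.lang.id import IndonesianDefaults

    # Modify the language defaults BEFORE the pipeline is created, since
    # the tokenizer reads them when it is constructed.
    IndonesianDefaults.tokenizer_exceptions = trim_exceptions(
        IndonesianDefaults.tokenizer_exceptions, keep
    )

    nlp = spacy.blank("id")
    # Save the pipeline so the trimmed tokenizer can be reloaded later.
    nlp.to_disk(Path(output_dir))
    return nlp
```

After calling build_trimmed_pipeline("id_trimmed") once (the directory name is a placeholder), later runs would load the saved pipeline with spacy.load("id_trimmed") instead of repeating the modification.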

Answer selected by geokaragiannis
Labels: lang / id (Indonesian language data and models)
This discussion was converted from issue #11079 on July 05, 2022 11:26.