Loading Indonesian is very slow compared to other languages. #11080
How to reproduce the behaviour
Loading Indonesian is very slow compared to other languages.
For other languages (English, Spanish, etc.) the same operation takes about 0.1 seconds or less.
Your Environment
Replies: 2 comments 2 replies
Hi @adrianeboyd thank you for your response. Is there a way to remove the exceptions without using |
Hi, the loading time depends on the language defaults. It looks like Indonesian has a large number of tokenizer exceptions, which take a while to load.
If you don't need these exceptions for your task, you could remove some of the exceptions from the language defaults (IndonesianDefaults.tokenizer_exceptions) before loading the pipeline to reduce the loading time. If you save this pipeline with nlp.to_disk(), it will only include the modified exceptions, so you can load it directly without having to make the same modifications each time.
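
As a rough illustration of the suggestion above, here is a minimal sketch that assumes spaCy v3 and a blank Indonesian pipeline. Keeping only the shared BASE_EXCEPTIONS is an assumption made for this example (you may want to keep a subset of the Indonesian exceptions instead), and the output directory is a hypothetical temporary location:

```python
import tempfile
from pathlib import Path

import spacy
from spacy.lang.id import Indonesian
from spacy.lang.tokenizer_exceptions import BASE_EXCEPTIONS

# Assumption: the Indonesian-specific exceptions are not needed for the task,
# so only spaCy's shared base exceptions are kept.
# Indonesian.Defaults is the IndonesianDefaults class mentioned in the reply.
Indonesian.Defaults.tokenizer_exceptions = dict(BASE_EXCEPTIONS)

# Create the pipeline only after the defaults have been modified,
# so the tokenizer is built from the reduced exceptions.
nlp = spacy.blank("id")

# Save the pipeline (hypothetical location inside a temporary directory).
output_dir = Path(tempfile.mkdtemp()) / "id_reduced"
nlp.to_disk(output_dir)

# Load the saved pipeline from disk and tokenize a sample sentence.
nlp_reloaded = spacy.load(output_dir)
doc = nlp_reloaded("Saya suka makan nasi goreng.")
tokens = [token.text for token in doc]
print(tokens)
```

Note that dropping exceptions changes how some strings are tokenized, so it is worth checking the tokenizer output on your own data before relying on the reduced pipeline.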