NLTK splitter specifying language #10228
Hi, I was using the NLTK splitter and noticed that it only uses the English-language tokenizer. In my fork of LangChain, I have made a change that adds a `language` parameter to the NLTK splitter and passes it through to `sent_tokenize` in `nltk.tokenize`. Would it be useful to open a PR against the LangChain repo? Thanks
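A minimal sketch of the idea described above, assuming NLTK's `sent_tokenize(text, language=...)` signature. The class name `LanguageAwareNLTKSplitter`, its `split_text` method, and the injectable `tokenizer` argument are hypothetical illustrations. They are not the code from the fork, and not LangChain's actual `NLTKTextSplitter`.

```python
from typing import Callable, List, Optional

# Hypothetical tokenizer type: called as tokenizer(text, language=...) -> sentences.
Tokenizer = Callable[..., List[str]]


class LanguageAwareNLTKSplitter:
    """Hypothetical splitter that forwards a language name to the sentence tokenizer.

    Illustrates passing `language` through to `nltk.tokenize.sent_tokenize`
    instead of relying on its "english" default.
    """

    def __init__(self, language: str = "english",
                 tokenizer: Optional[Tokenizer] = None) -> None:
        self.language = language
        if tokenizer is None:
            # Import lazily so a stub tokenizer can be injected
            # even when NLTK is not installed.
            from nltk.tokenize import sent_tokenize
            tokenizer = sent_tokenize
        self._tokenizer = tokenizer

    def split_text(self, text: str) -> List[str]:
        # Forward the configured language on every call.
        return self._tokenizer(text, language=self.language)
```

With NLTK installed and its Punkt sentence models downloaded, `LanguageAwareNLTKSplitter(language="german").split_text(text)` should select the German model rather than the English one. The injectable `tokenizer` is only a design choice for this sketch. It keeps the class testable without NLTK.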
🤖
That's great to hear that you have created a PR for LangChain! Adding the ability to specify the language for the NLTK splitter would be a valuable addition to the framework: it would make the splitter more useful for text in languages other than English. I encourage you to submit your PR to the LangChain repository. The maintainers will review it and decide whether it should be merged. Thank you for your contribution!
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.