NLTK splitter specifying language #10228

tarekabouzeid · 2023-09-05T12:35:50Z

Hi,

I was using the NLTK splitter and noticed that it only uses the English tokenizer. I have created a PR on my fork of langchain so that the NLTK splitter takes a language parameter and passes it to "sent_tokenize" in "nltk.tokenize".
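To illustrate the idea, here is a minimal sketch of a splitter that forwards a language argument to "sent_tokenize". It is not the code from the PR and not LangChain's actual implementation: the class name `LanguageAwareSentenceSplitter`, its parameters, and the greedy merge step are all assumptions made for illustration. The only real library call is `nltk.tokenize.sent_tokenize(text, language=...)`, which needs the NLTK Punkt data to be downloaded for the chosen language.

```python
from typing import Callable, List, Optional


def _nltk_sent_tokenize(text: str, language: str) -> List[str]:
    # Imported lazily so this module still loads when nltk is not installed.
    from nltk.tokenize import sent_tokenize

    # sent_tokenize defaults to language="english"; here it is passed explicitly.
    return sent_tokenize(text, language=language)


class LanguageAwareSentenceSplitter:
    """Hypothetical splitter for illustration; not LangChain's NLTKTextSplitter."""

    def __init__(
        self,
        language: str = "english",
        separator: str = "\n\n",
        chunk_size: int = 4000,
        tokenizer: Optional[Callable[[str, str], List[str]]] = None,
    ) -> None:
        self._language = language
        self._separator = separator
        self._chunk_size = chunk_size
        # The tokenizer is injectable (an assumption of this sketch) so the
        # merge logic can be exercised without the NLTK data files.
        self._tokenizer = tokenizer or _nltk_sent_tokenize

    def split_text(self, text: str) -> List[str]:
        # Split into sentences with the configured language, then pack them.
        sentences = self._tokenizer(text, self._language)
        return self._merge(sentences)

    def _merge(self, sentences: List[str]) -> List[str]:
        # Greedily pack sentences into chunks of at most chunk_size characters.
        # A single sentence longer than chunk_size becomes its own chunk.
        chunks: List[str] = []
        current: List[str] = []
        current_len = 0
        for sentence in sentences:
            extra = len(sentence) + (len(self._separator) if current else 0)
            if current and current_len + extra > self._chunk_size:
                chunks.append(self._separator.join(current))
                current, current_len = [sentence], len(sentence)
            else:
                current.append(sentence)
                current_len += extra
        if current:
            chunks.append(self._separator.join(current))
        return chunks
```

With NLTK and its Punkt data installed, `LanguageAwareSentenceSplitter(language="german").split_text(text)` would tokenize with the German sentence model instead of the English default.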

Would it be useful to open a PR against the langchain repo?

Thanks

@dosu-bot · 2023-09-05T12:38:09Z

🤖

It's great to hear that you have created a PR for LangChain! Adding the ability to specify the language for the NLTK splitter would be a valuable addition to the framework, making it more versatile for users working with languages other than English. I encourage you to submit your PR to the LangChain repository; the maintainers will review it and decide whether to merge it. Thank you for your contribution!

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.
