It would be convenient to be able to register your own Chunkers #1005
aropb
started this conversation in
2. Feature requests
-
The recommended approach is to load and use custom handlers. When loading files, the "steps" parameter lets you choose which handlers to execute, which in turn lets you customize every aspect of ingestion: extraction, chunking, storage, etc.
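As a rough illustration of the idea in this reply, here is a minimal, language-neutral sketch in Python of a pipeline where the caller picks which registered handlers run. It is not the library's API: `register_handler`, `run_pipeline`, the step names, and the dict-based state are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

# Hypothetical handler type: takes the pipeline state, returns the updated state.
Handler = Callable[[dict], dict]

# Hypothetical registry mapping step names to handlers.
HANDLERS: Dict[str, Handler] = {}


def register_handler(name: str, handler: Handler) -> None:
    """Make a handler available under a step name."""
    HANDLERS[name] = handler


def run_pipeline(state: dict, steps: List[str]) -> dict:
    """Run only the handlers named in `steps`, in the given order."""
    for step in steps:
        state = HANDLERS[step](state)
    return state


def extract(state: dict) -> dict:
    # Stand-in for text extraction: just trims the raw input.
    return {**state, "text": state["raw"].strip()}


def partition_by_sentence(state: dict) -> dict:
    # Custom chunking step: naive split on periods.
    chunks = [s.strip() for s in state["text"].split(".") if s.strip()]
    return {**state, "chunks": chunks}


register_handler("extract", extract)
register_handler("my_partition", partition_by_sentence)

result = run_pipeline(
    {"raw": "  First sentence. Second sentence.  "},
    steps=["extract", "my_partition"],
)
print(result["chunks"])  # ['First sentence', 'Second sentence']
```

Swapping the chunking behaviour here means registering a different handler and naming it in `steps`, which is the approach this reply recommends.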
-
That's what I'm doing now, but why rewrite the entire handler if you only need to replace the chunker?
-
Currently, Chunker instances are created directly inside the handlers (SummarizationHandler, TextPartitioningHandler), so the choice of Chunker is hard-coded.
It would be very convenient to be able to register your own Chunkers in place of the standard ones and have the handlers obtain them through dependency injection.
It is also important to be able to use your own Tokenizer instead of CL100KTokenizer().
At the moment, SummarizationHandler and TextPartitioningHandler create a CL100KTokenizer by default for the default Chunkers.
Thanks.
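To make the request concrete, the following is a minimal sketch in Python of a partitioning step that accepts an injected chunker and tokenizer and falls back to defaults only when none is supplied. Everything in it is an assumption made for illustration: `TextPartitioningStep`, `default_chunker`, `WhitespaceTokenizer`, and `CharTokenizer` are hypothetical names, and the whitespace tokenizer merely stands in for the role CL100KTokenizer plays in the handlers mentioned above.

```python
from typing import Callable, List, Optional, Protocol


class Tokenizer(Protocol):
    def count_tokens(self, text: str) -> int: ...


class WhitespaceTokenizer:
    """Hypothetical default tokenizer: one token per whitespace-separated word."""

    def count_tokens(self, text: str) -> int:
        return len(text.split())


class CharTokenizer:
    """Hypothetical custom tokenizer: one token per character."""

    def count_tokens(self, text: str) -> int:
        return len(text)


# A chunker is any callable with this shape (hypothetical contract).
Chunker = Callable[[str, int, Tokenizer], List[str]]


def default_chunker(text: str, max_tokens: int, tokenizer: Tokenizer) -> List[str]:
    """Greedy word packing: start a new chunk when the budget would be exceeded."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and tokenizer.count_tokens(candidate) > max_tokens:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


class TextPartitioningStep:
    """Uses injected dependencies; creates defaults only when none are given."""

    def __init__(
        self,
        chunker: Optional[Chunker] = None,
        tokenizer: Optional[Tokenizer] = None,
    ) -> None:
        self.chunker = chunker or default_chunker
        self.tokenizer = tokenizer or WhitespaceTokenizer()

    def run(self, text: str, max_tokens: int) -> List[str]:
        return self.chunker(text, max_tokens, self.tokenizer)


text = "one two three four five"
default_step = TextPartitioningStep()
custom_step = TextPartitioningStep(tokenizer=CharTokenizer())

print(default_step.run(text, max_tokens=2))  # ['one two', 'three four', 'five']
print(custom_step.run(text, max_tokens=9))   # ['one two', 'three', 'four five']
```

With this shape, replacing the chunker or the tokenizer is a constructor argument (or a container registration) and does not require rewriting the step itself.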