v0.12.0 - Centralized Chunk Configuration #153
benbrandt announced in Announcements
What's New
This release is a big API change to pull all chunk configuration options into the same place, at initialization of the splitters. This was motivated by two things:
Overall, I think this has aligned the library with the usage I have seen in the wild, and pulls all of the settings for the "domain" of chunking into a single unit.
Breaking Changes
Rust
- Trimming now defaults to `true`, and this does logically make sense as the default behavior.
- `TextSplitter` and `MarkdownSplitter` now take a `ChunkConfig` in their `::new` method.
  - This brings the `ChunkSizer`, `ChunkCapacity` and `trim` settings into a single struct that can be instantiated with a builder-lite pattern.
  - The `with_trim_chunks` method has been removed from `TextSplitter` and `MarkdownSplitter`. You can now set `trim` in the `ChunkConfig` struct.
- `ChunkCapacity` is now a struct instead of a trait. If you were using a custom `ChunkCapacity`, you can change your `impl` to a `From<TYPE> for ChunkCapacity` instead, and you should still be able to pass it in to all of the same methods.
  - `ChunkSizer`s now take a concrete type in their method instead of an `impl`.
Migration Examples
Default settings:
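As a rough illustration of the builder-lite pattern described in the breaking changes, here is a minimal, self-contained sketch. The `ChunkConfig`, `Characters` and `Words` types below are local stand-ins written for this example, and the `with_trim` and `with_sizer` method names are assumptions modelled on the description; they are not the crate's actual definitions, so check the crate documentation for the real signatures.

```rust
// Local stand-ins modelled on the description above. These are NOT the
// text-splitter crate's real definitions; names and fields are assumptions.

/// Stand-in sizer that would measure chunks in characters.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Characters;

/// Stand-in sizer that would measure chunks in words.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Words;

/// Capacity, trim flag and sizer all live in one struct.
#[derive(Debug, Clone, PartialEq)]
struct ChunkConfig<S> {
    capacity: usize,
    trim: bool,
    sizer: S,
}

impl ChunkConfig<Characters> {
    /// Capacity is the only required value; trimming defaults to `true`.
    fn new(capacity: usize) -> Self {
        Self { capacity, trim: true, sizer: Characters }
    }
}

impl<S> ChunkConfig<S> {
    /// Builder-lite: consume `self`, change one field, hand the struct back.
    fn with_trim(mut self, trim: bool) -> Self {
        self.trim = trim;
        self
    }

    /// Swapping the sizer changes the struct's type parameter.
    fn with_sizer<T>(self, sizer: T) -> ChunkConfig<T> {
        ChunkConfig { capacity: self.capacity, trim: self.trim, sizer }
    }
}

fn main() {
    // Default settings: only the capacity is supplied.
    let defaults = ChunkConfig::new(500);
    // Every other option is chained on the same value.
    let custom = ChunkConfig::new(500).with_sizer(Words).with_trim(false);
    println!("{defaults:?}");
    println!("{custom:?}");
}
```

Because each `with_*` method takes and returns the struct by value, there is no separate builder type and no final `build()` call; the config is usable at every step of the chain.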
Hugging Face Tokenizers:
Tiktoken:
Ranges:
Markdown:
ChunkSizer impls
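To make the "concrete type instead of an impl" change more tangible, the following is a small sketch with local stand-in types. The `ChunkCapacity` fields, the `check` method and the `CharCount` sizer are all hypothetical names invented for this example; the crate's real `ChunkSizer` trait has its own method and return type.

```rust
use std::cmp::Ordering;

/// Simplified stand-in for the capacity struct; field names are hypothetical.
struct ChunkCapacity {
    desired: usize,
    max: usize,
}

/// Stand-in sizer trait whose method takes the concrete struct by reference
/// (rather than a generic `&impl Trait` parameter).
trait ChunkSizer {
    /// Less = below the desired size, Equal = within capacity, Greater = too big.
    fn check(&self, chunk: &str, capacity: &ChunkCapacity) -> Ordering;
}

/// Hypothetical sizer that counts Unicode scalar values.
struct CharCount;

impl ChunkSizer for CharCount {
    fn check(&self, chunk: &str, capacity: &ChunkCapacity) -> Ordering {
        let size = chunk.chars().count();
        if size < capacity.desired {
            Ordering::Less
        } else if size > capacity.max {
            Ordering::Greater
        } else {
            Ordering::Equal
        }
    }
}

fn main() {
    let capacity = ChunkCapacity { desired: 3, max: 5 };
    // The method has no generic parameter, so the trait works as `dyn`.
    let sizer: Box<dyn ChunkSizer> = Box::new(CharCount);
    for chunk in ["hi", "hello", "hello world"] {
        println!("{chunk:?}: {:?}", sizer.check(chunk, &capacity));
    }
}
```

In Rust generally, a trait method with an `impl Trait` argument is generic, which prevents the trait from being used as a trait object; a concrete parameter type avoids that. Whether this was a goal of the change is not stated in these notes.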
ChunkCapacity impls
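The `From<TYPE> for ChunkCapacity` migration can be sketched as follows. The `ChunkCapacity` struct here is a simplified local stand-in with hypothetical fields, and `Budget` and `describe` are invented for this example; only the `From`/`Into` mechanics come from the Rust standard library.

```rust
/// Simplified stand-in for the capacity struct; field names are hypothetical.
#[derive(Debug, PartialEq)]
struct ChunkCapacity {
    desired: usize,
    max: usize,
}

/// A single size means "aim for exactly this many units".
impl From<usize> for ChunkCapacity {
    fn from(size: usize) -> Self {
        ChunkCapacity { desired: size, max: size }
    }
}

/// Hypothetical user-defined type that used to implement a capacity trait.
struct Budget {
    soft: usize,
    hard: usize,
}

/// After the change: convert into the struct instead of implementing a trait.
impl From<Budget> for ChunkCapacity {
    fn from(budget: Budget) -> Self {
        ChunkCapacity { desired: budget.soft, max: budget.hard }
    }
}

/// Hypothetical function that accepts anything convertible into a capacity,
/// so call sites can keep passing their own types.
fn describe(capacity: impl Into<ChunkCapacity>) -> String {
    let capacity = capacity.into();
    format!("{}..={}", capacity.desired, capacity.max)
}

fn main() {
    println!("{}", describe(500usize));
    println!("{}", describe(Budget { soft: 200, hard: 500 }));
}
```

Accepting `impl Into<ChunkCapacity>` is one way a method can keep taking custom types after a trait becomes a struct, which matches the note that custom capacities can still be passed to the same methods.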
Python
- `capacity` is now a required argument in the `__init__` and classmethods of `TextSplitter` and `MarkdownSplitter`.
- The `trim_chunks` parameter is now just `trim` in the `__init__` and classmethods of `TextSplitter` and `MarkdownSplitter`.
Migration Examples
Default settings:
Ranges:
Hugging Face Tokenizers:
Tiktoken:
Custom callback:
Markdown:
Full Changelog: v0.11.0...v0.12.0
This discussion was created from the release v0.12.0 - Centralized Chunk Configuration.