Releases: isaacus-dev/semchunk

v1.0.1

02 Jun 11:44

Fixed

  • Documented the progress argument in the docstring for chunkerify() and its type hint in the README.

v1.0.0

02 Jun 11:41

Added

  • Added a progress argument to the chunker returned by chunkerify() that, when set to True and multiple texts are passed, displays a progress bar.

v0.3.2

01 Jun 06:31

Fixed

  • Fixed a bug where a ZeroDivisionError would be raised when a token counter called from merge_splits() returned zero tokens, courtesy of @jcobol (#5) (7fd64eb), fixing #4.

v0.3.1

18 May 12:13

Fixed

  • Fixed a typo in the error messages of chunkerify(), which referred to it as make_chunker().

v0.3.0

18 May 12:06

Added

  • Introduced the chunkerify() function, which constructs a reusable chunker from a tokenizer or token counter. The chunker can also chunk multiple texts in a single call. It speeds up chunking by 40.4% thanks, in large part, to a token counter that avoids having to count the number of tokens in a text when the number of characters in the text exceeds a certain threshold, courtesy of @R0bk (#3) (337a186).
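
The character-threshold shortcut can be illustrated with a minimal sketch. This is not semchunk's actual code: make_fast_token_counter(), max_token_chars and chars_per_token are hypothetical names, and the sketch assumes that a text never has fewer tokens than any of its prefixes. When a text is long, only a prefix is tokenized; if the prefix alone already exceeds the chunk size, the exact count is irrelevant and a sentinel of chunk_size + 1 is returned.

```python
from typing import Callable


def make_fast_token_counter(
    token_counter: Callable[[str], int],
    chunk_size: int,
    max_token_chars: int,
    chars_per_token: int = 6,  # Assumed rough upper bound on characters per token.
) -> Callable[[str], int]:
    """Wrap `token_counter` so that very long texts are only partially tokenized."""
    threshold = chunk_size * chars_per_token

    def fast_token_counter(text: str) -> int:
        if len(text) > threshold:
            # Count only a prefix, padded so the last token is not cut short.
            prefix = text[: threshold + max_token_chars]
            if token_counter(prefix) > chunk_size:
                # The text is certainly too long; the exact count does not matter.
                return chunk_size + 1
        return token_counter(text)

    return fast_token_counter


def word_counter(text: str) -> int:
    """Toy token counter that treats whitespace-separated words as tokens."""
    return len(text.split())


fast_counter = make_fast_token_counter(word_counter, chunk_size=4, max_token_chars=10)
short_count = fast_counter("a b c")         # Below the threshold: exact count.
long_count = fast_counter("word " * 1000)   # Above the threshold: sentinel value.
```

The wrapped counter is only suitable for comparisons against the chunk size, since its result for oversized texts is a sentinel rather than a true count.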

v0.2.4

13 May 11:34

Changed

  • Improved chunking performance with larger chunk sizes by switching from linear to binary search for the identification of optimal chunk boundaries, courtesy of @R0bk (#3) (1e3ddb9).
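
As a rough illustration of why binary search helps here, the sketch below finds the largest number of leading splits that fit within a chunk size using O(log n) token counts instead of O(n). It is a simplified stand-in rather than semchunk's implementation: largest_fitting_prefix() is a hypothetical helper, and it assumes that adding more splits never lowers the token count.

```python
from typing import Callable, Sequence


def largest_fitting_prefix(
    splits: Sequence[str],
    separator: str,
    chunk_size: int,
    token_counter: Callable[[str], int],
) -> int:
    """Return the largest k such that the first k splits, joined, fit in `chunk_size`."""
    low, high = 0, len(splits)  # The first `low` splits are known to fit.
    while low < high:
        mid = (low + high + 1) // 2  # Round up so that the loop always makes progress.
        if token_counter(separator.join(splits[:mid])) <= chunk_size:
            low = mid        # The first `mid` splits fit; try to take more.
        else:
            high = mid - 1   # Too many tokens; take fewer splits.
    return low


def word_counter(text: str) -> int:
    """Toy token counter that treats whitespace-separated words as tokens."""
    return len(text.split())


splits = ["a", "b", "c", "d", "e"]
boundary = largest_fitting_prefix(splits, " ", chunk_size=3, token_counter=word_counter)
```

With five one-word splits and a chunk size of three, the search settles on a boundary after the third split.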

v0.2.3

11 Mar 04:29

Fixed

  • Ensured that memoization does not overwrite chunk()'s function signature.
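
One common way to keep a memoized function's signature intact is functools.wraps, sketched below. Whether semchunk uses this exact approach is an assumption; memoize() and count_words() are hypothetical names used only for illustration.

```python
import functools
import inspect


def memoize(func):
    """Cache results by positional arguments while keeping `func`'s metadata."""
    cache = {}

    @functools.wraps(func)  # Copies the name and docstring and sets __wrapped__.
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


@memoize
def count_words(text: str) -> int:
    """Count whitespace-separated words in `text`."""
    return len(text.split())


# inspect.signature() follows __wrapped__, so it reports the original
# signature rather than the wrapper's generic (*args).
signature = str(inspect.signature(count_words))
```

Without the functools.wraps line, tools that introspect the function would see only the wrapper's (*args) signature and lose the docstring.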

v0.2.2

05 Feb 09:44

Fixed

  • Ensured that the memoize argument is passed back to chunk() in recursive calls.

v0.2.1

05 Feb 09:44

Added

  • Memoized chunk().

Fixed

  • Fixed typos in README.

v0.2.0

07 Nov 12:42

Added

  • Added the memoize argument to chunk(), which memoizes token counters by default to significantly improve performance.
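
To show why memoizing a token counter pays off, here is a minimal sketch using functools.lru_cache. It is an illustration under assumptions, not the library's internals: memoize_token_counter() is a hypothetical helper and the word-based counter stands in for a real tokenizer.

```python
import functools
from typing import Callable, List


def memoize_token_counter(token_counter: Callable[[str], int]) -> Callable[[str], int]:
    """Return a cached version of `token_counter` keyed on the input text."""
    return functools.lru_cache(maxsize=None)(token_counter)


calls: List[str] = []  # Records every text that reaches the underlying counter.


def word_counter(text: str) -> int:
    """Toy token counter that treats whitespace-separated words as tokens."""
    calls.append(text)
    return len(text.split())


cached_counter = memoize_token_counter(word_counter)
first = cached_counter("the quick brown fox")
second = cached_counter("the quick brown fox")  # Served from the cache.
```

Recursive chunking tends to count the same substrings repeatedly, so each repeated text costs one dictionary lookup instead of a full tokenization.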

Changed

  • Improved chunking performance.