Commit f025107

Improved chunking performance.
1 parent 0e591fa commit f025107

2 files changed: +5 -4 lines changed

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -1,8 +1,13 @@
 ## Changelog 🔄
 All notable changes to `semchunk` will be documented here. This project adheres to [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.1.1] - 2023-11-07
+### Changed
+- Improved chunking performance.
+
 ## [0.1.0] - 2023-11-05
 ### Added
 - Added the `chunk()` function, which splits text into semantically meaningful chunks of a specified size as determined by a provided token counter.
 
+[0.1.1]: https://github.com/umarbutler/semchunk/compare/v0.1.0...v0.1.1
 [0.1.0]: https://github.com/umarbutler/semchunk/releases/tag/v0.1.0

src/semchunk/semchunk.py

Lines changed: 0 additions & 4 deletions
@@ -48,10 +48,6 @@ def chunk(text: str, chunk_size: int, token_counter: callable, _recursion_depth:
 
     Returns:
         list[str]: A list of chunks up to `chunk_size`-tokens-long, with any whitespace used to split the text removed."""
-    
-    # If the text is already within the chunk size, return it as the only chunk.
-    if token_counter(text) <= chunk_size:
-        return [text]
     
     # Split the text using the most semantically meaningful splitter possible.
     splitter, splits = _split_text(text)
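
For context, the lines removed from `chunk()` in this hunk formed a guard that returned the text unchanged when it already fit within `chunk_size`. Below is a minimal standalone sketch of that check. `word_count` and `early_return_guard` are hypothetical names introduced here for illustration only; they are not part of `semchunk`, and the sketch does not reproduce the rest of the function.

```python
from __future__ import annotations

from typing import Callable, Optional


def word_count(text: str) -> int:
    """Hypothetical token counter: counts whitespace-separated words."""
    return len(text.split())


def early_return_guard(
    text: str,
    chunk_size: int,
    token_counter: Callable[[str], int],
) -> Optional[list[str]]:
    """Illustrative mirror of the check removed in this commit.

    Returns [text] when the text already fits within chunk_size,
    otherwise None to signal that splitting would be needed."""
    if token_counter(text) <= chunk_size:
        return [text]
    return None


# 3 words fit within a limit of 5, so the text comes back as a single chunk.
print(early_return_guard("a short sentence", 5, word_count))

# 6 words exceed the limit of 5, so the guard does not apply.
print(early_return_guard("one two three four five six", 5, word_count))
```

Note that the guard calls the token counter on the full input once per invocation. Whether that call accounts for the performance change is not stated in the commit itself.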