Commit 0448235

Releasing v0.2.4.
1 parent b3c0ab4 commit 0448235

3 files changed: 7 additions and 6 deletions

CHANGELOG.md

Lines changed: 3 additions & 2 deletions
@@ -1,9 +1,9 @@
 ## Changelog 🔄
 All notable changes to `semchunk` will be documented here. This project adheres to [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [Unreleased] - 2024-XX-XX
+## [0.2.4] - 2024-05-13
 ### Changed
-- Improved chunking performance with larger chunk sizes by switching from linear to binary search for the identification of optimal chunk boundaries.
+- Improved chunking performance with larger chunk sizes by switching from linear to binary search for the identification of optimal chunk boundaries, courtesy of [@R0bk](https://github.com/R0bk) ([#3](https://github.com/umarbutler/semchunk/pull/3)) ([1e3ddb9](https://github.com/umarbutler/semchunk/pull/3/commits/1e3ddb91698f072da1d8a7d809a66467e1d31ff8)).
 
 ## [0.2.3] - 2024-03-11
 ### Fixed
@@ -44,6 +44,7 @@ All notable changes to `semchunk` will be documented here. This project adheres
 ### Added
 - Added the `chunk()` function, which splits text into semantically meaningful chunks of a specified size as determined by a provided token counter.
 
+[0.2.4]: https://github.com/umarbutler/semchunk/compare/v0.2.3...v0.2.4
 [0.2.3]: https://github.com/umarbutler/semchunk/compare/v0.2.2...v0.2.3
 [0.2.2]: https://github.com/umarbutler/semchunk/compare/v0.2.1...v0.2.2
 [0.2.1]: https://github.com/umarbutler/semchunk/compare/v0.2.0...v0.2.1
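
The changelog entry above credits the speed-up to replacing a linear scan with a binary search when locating chunk boundaries. The snippet below is only a rough sketch of that general idea, not the code from the referenced pull request: `count_words`, `longest_fit_linear`, and `longest_fit_binary` are hypothetical names, and the sketch assumes the token count never decreases as more splits are joined together.

```python
from typing import Callable, Sequence


def count_words(text: str) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return len(text.split())


def longest_fit_linear(
    splits: Sequence[str],
    chunk_size: int,
    token_counter: Callable[[str], int],
    joiner: str = ' ',
) -> int:
    """Return how many leading splits fit in chunk_size, trying each prefix in turn."""
    best = 0
    for end in range(1, len(splits) + 1):
        if token_counter(joiner.join(splits[:end])) > chunk_size:
            break
        best = end
    return best


def longest_fit_binary(
    splits: Sequence[str],
    chunk_size: int,
    token_counter: Callable[[str], int],
    joiner: str = ' ',
) -> int:
    """Return the same answer as the linear scan using O(log n) token counter calls.

    Assumes the token count of a prefix never shrinks as the prefix grows.
    """
    low, high = 0, len(splits)
    while low < high:
        # Round up so the loop always makes progress when low + 1 == high.
        mid = (low + high + 1) // 2
        if token_counter(joiner.join(splits[:mid])) <= chunk_size:
            low = mid  # The first `mid` splits fit; try a longer prefix.
        else:
            high = mid - 1  # Too long; the boundary lies before `mid`.
    return low


splits = ['a b', 'c', 'd e f', 'g']
print(longest_fit_linear(splits, 4, count_words))
print(longest_fit_binary(splits, 4, count_words))
```

With a larger chunk size, more splits fit in each chunk, so the linear scan makes proportionally more token counter calls per chunk while the binary search grows only logarithmically, which is consistent with the entry's note that the gain shows up at larger chunk sizes.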

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "semchunk"
-version = "0.2.3"
+version = "0.2.4"
 authors = [
     {name="Umar Butler", email="[email protected]"},
 ]

tests/bench.py

Lines changed: 3 additions & 3 deletions
@@ -1,16 +1,16 @@
 import semchunk
-import semantic_text_splitter
+from semantic_text_splitter import TextSplitter
 import test_semchunk
 import time
 
 chunk_size = 512
-semantic_text_splitter_chunker = semantic_text_splitter.TiktokenTextSplitter('gpt-4')
+semantic_text_splitter_chunker = TextSplitter.from_tiktoken_model('gpt-4', chunk_size)
 
 def bench_semchunk(text: str) -> None:
     semchunk.chunk(text, chunk_size=chunk_size, token_counter=test_semchunk._token_counter)
 
 def bench_semantic_text_splitter(text: str) -> None:
-    semantic_text_splitter_chunker.chunks(text, chunk_size)
+    semantic_text_splitter_chunker.chunks(text)
 
 libraries = {
     'semchunk': bench_semchunk,
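
The hunk above ends before the part of `tests/bench.py` that consumes the `libraries` dictionary, so how the benchmark is actually run is not visible in this commit. As a loose illustration only, a dictionary mapping names to `bench_*` callables could be timed along these lines; `time_library`, `run_benchmarks`, and the two stand-in functions are hypothetical and do not come from the repository.

```python
import time
from typing import Callable, Dict, List


def time_library(bench: Callable[[str], None], texts: List[str]) -> float:
    """Return the wall-clock seconds taken to run `bench` over every text."""
    start = time.perf_counter()
    for text in texts:
        bench(text)
    return time.perf_counter() - start


def run_benchmarks(
    libraries: Dict[str, Callable[[str], None]],
    texts: List[str],
) -> Dict[str, float]:
    """Time each named benchmark function on the same list of texts."""
    return {name: time_library(bench, texts) for name, bench in libraries.items()}


# Stand-in benchmark functions so the sketch runs without either chunking library.
def bench_split(text: str) -> None:
    text.split()


def bench_upper(text: str) -> None:
    text.upper()


results = run_benchmarks(
    {'split': bench_split, 'upper': bench_upper},
    ['some sample text'] * 100,
)
for name, seconds in results.items():
    print(f'{name}: {seconds:.4f}s')
```

Timing every library on the same list of texts keeps the comparison like for like; `time.perf_counter()` is used here because it is intended for measuring short durations.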
